Dr. Dobb's Journal
#381 FEBRUARY 2006
SOFTWARE TOOLS FOR THE PROFESSIONAL PROGRAMMER http://www.ddj.com
64-BIT COMPUTING!
Multiplatform Porting to 64 Bits
Mac OS X & 64 Bits
Examining µC++
Native Queries for Persistent Objects
Dynamic Bytecode Instrumentation
Summer of Code
Range Tracking & Comparison
GIF Images & Mobile Phones
Inside Sudoku
Viewing & Organizing Log Files
Porting Real-Time Operating Systems
$4.95US $6.95CAN
CONTENTS
FEBRUARY 2006, VOLUME 31, ISSUE 2
FEATURES

Multiplatform Porting to 64 Bits 20
by Brad Martin, Anita Rettinger, and Jasmit Singh
Porting 300,000 lines of 32-bit code to nearly a dozen 64-bit platforms requires careful planning.

Mac OS X Tiger & 64 Bits 26
by Rodney Mach
Before migrating to 64-bit platforms, the first question to ask is whether you really need to do so.

Ajax: Asynchronous JavaScript and XML 32
by Eric J. Bruno
Ajax, short for "Asynchronous JavaScript and XML," lets you create dynamic web pages.

Examining µC++ 36
by Peter A. Buhr and Richard C. Bilson
µC++ was designed to provide high-level concurrency for C++.

Native Queries for Persistent Objects 41
by William R. Cook and Carl Rosenberger
Among other benefits, native queries overcome the shortcomings of string-based APIs.

Dynamic Bytecode Instrumentation 45
by Ian Formanek and Gregg Sporar
Dynamic bytecode instrumentation is an innovative technique that makes profiling fast and easy.

Range Tracking & Comparison Algorithms 50
by Kirk J. Krauss
Some information is best viewed as a list of ranges. Kirk presents algorithms for dealing with ranges.

Displaying GIF Images on J2ME Mobile Phones 52
by Tom Thompson
Surprisingly, many Java-based mobile phones couldn't display GIF image files — until now.

Sudoku & Graph Theory 56
by Eytan Suchard, Raviv Yatom, and Eitan Shapir
Understanding graph theory is central to building your own Sudoku solver.

Google's Summer of Code: Part III 58
by DDJ Staff and Friends
Google's Summer of Code resulted in thousands and thousands of lines of code. Here are more students who participated.
FORUM

EDITORIAL 10
by Jonathan Erickson

LETTERS 12
by you

DR. ECCO'S OMNIHEURIST CORNER 14
by Dennis E. Shasha

NEWS & VIEWS 16
by DDJ Staff

PRAGMATIC EXCEPTIONS 24
by Benjamin Booth

Viewing & Organizing Log Files 61
by Phil Grenetz
LogChipper, the tool Phil presents here, lets you view and organize the contents of log files.

OF INTEREST 79
by DDJ Staff

SWAINE'S FLAMES 80
by Michael Swaine

EMBEDDED SYSTEMS PROGRAMMING

Porting an RTOS to a New Hardware Platform 65
by Byron Miller
Porting software to new hardware boards doesn't need to be difficult.

COLUMNS

Programming Paradigms 68
by Michael Swaine
Everything Michael knows he attributes to Roger Penrose's The Road to Reality: A Complete Guide to the Laws of the Universe.

Embedded Space 71
by Ed Nisley
Ed remembers to tell you that memory really does matter.

Chaos Manor 74
by Jerry Pournelle
Beware of Sony's Digital Rights Management (DRM) scheme, which covertly installs itself.

Programmer's Bookshelf 77
by Peter N. Roth
Peter reviews Stephen C. Perry's Core C# and .NET.

NEXT MONTH: The smart thing to do in March is to read our issue on Intelligent Systems.
DR. DOBB'S ONLINE CONTENTS

Online Exclusives
http://www.ddj.com/exclusives/

VB6 to VB.NET Migration
There are millions of Visual Basic 6 developers and an enormous amount of VB6 code. What does the landscape look like for this tremendous pool of legacy code and talent?

The Obsolete Operating System
To some, the modern definition of a computer operating system is obsolete.

COM Interop
.NET guru Juval Lowy explores how COM Interop can allow legacy VB6 applications to coexist in a .NET world.

The C/C++ Users Journal
http://www.cuj.com/

Flexible C++ #13: Beware Mixed Collection/Enumerator Interfaces
When the semantics of collection and enumerator interfaces are blurred, the result can mean trouble.

Dobbscast Audio
http://www.ddj.com/podcast/

SysML: A Modeling Language for Systems Engineering
Chris Sibbald discusses SysML, a visual modeling language for systems engineering applications.

Computer Theft: A Growing Problem
Biometric and computer security expert Greg Chevalier discusses the growing problem of mobile computer theft, and what you can do to combat it.

AADL: A Design Language for Embedded Systems
Peter Feiler discusses the Architecture Analysis and Design Language, a textual and graphical language that supports model-based engineering of embedded real-time systems.

The News Show
http://thenewsshow.tv/

The Feds and IT Failures
The IRS spent nearly $2 billion on business modernization before it began to process even 1 percent of tax returns.

Windows/.NET
http://www.ddj.com/topics/windows/

An Overview of Generics
In the .NET Framework 2.0, C# and Visual Basic .NET support generics.

Dotnetjunkies
http://www.dotnetjunkies.com/

Top 10 Must-Have Features in O/R Mapping Tools
What features would a good O/R mapping tool provide you with, and how can it be beneficial to you?

BYTE.com
http://www.byte.com/

Why Can't Windows Do Windows?
Multimedia apps require lots of desktop real estate, so having two or more displays can be the answer — if you can get them to work.

RESOURCE CENTER
As a service to our readers, source code, related files, and author guidelines are available at http://www.ddj.com/. Letters to the editor, article proposals and submissions, and inquiries should be sent to [email protected]. For subscription questions, call 800-456-1215 (U.S. or Canada). For all other countries, call 902-563-4753 or fax 902-563-4807. E-mail subscription questions to [email protected], or write to Dr. Dobb's Journal, P.O. Box 56188, Boulder, CO 80322-6188. If you want to change the information you receive from CMP and others about products and services, go to http://www.cmp.com/feedback/permission.html or contact Customer Service at Dr. Dobb's Journal, P.O. Box 56188, Boulder, CO 80322-6188. Back issues may be purchased prepaid for $9.00 per copy (which includes shipping and handling). For issue availability, send e-mail to [email protected], fax to 785-838-7566, or call 800-444-4881 (U.S. and Canada) or 785-838-7500 (all other countries). Please send payment to Dr. Dobb's Journal, 4601 West 6th Street, Suite B, Lawrence, KS 66049-4189. Digital versions of back issues and individual articles can be purchased electronically at http://www.ddj.com/.

WEB SITE ACCOUNT ACTIVATION
Dr. Dobb's Journal subscriptions include full access to the CMP Developer Network web sites. To activate your account, register at http://www.ddj.com/registration/ using the web ALL ACCESS subscriber code located on your mailing label.
DR. DOBB’S JOURNAL (ISSN 1044-789X) is published monthly by CMP Media LLC., 600 Harrison Street, San Francisco, CA 94017; 415-947-6000. Periodicals Postage Paid at San Francisco and at additional mailing offices. SUBSCRIPTION: $34.95 for 1 year; $69.90 for 2 years. International orders must be prepaid. Payment may be made via Mastercard, Visa, or American Express; or via U.S. funds drawn on a U.S. bank. Canada and Mexico: $45.00 per year. All other foreign: $70.00 per year. U.K. subscribers contact Jill Sutcliffe at Parkway Gordon 01-49-1875-386. POSTMASTER: Send address changes to Dr. Dobb’s Journal, P.O. Box 56188, Boulder, CO 80328-6188. Registered for GST as CMP Media LLC, GST #13288078, Customer #2116057, Agreement #40011901. INTERNATIONAL NEWSSTAND DISTRIBUTOR: Source Interlink International, 27500 Riverview Center Blvd., Suite 400, Bonita Springs, FL 34134, 239-949-4450. Entire contents © 2006 CMP Media LLC. Dr. Dobb’s Journal® is a registered trademark of CMP Media LLC. All rights reserved.
Dr. Dobb's Journal
PUBLISHER Michael Goodman
SOFTWARE TOOLS FOR THE PROFESSIONAL PROGRAMMER
EDITOR-IN-CHIEF Jonathan Erickson
EDITORIAL
MANAGING EDITOR Deirdre Blake
SENIOR PRODUCTION EDITOR Monica E. Berg
ASSOCIATE EDITOR Della Wyser
COPY EDITOR Amy Stephens
ART DIRECTOR Margaret A. Anderson
SENIOR CONTRIBUTING EDITOR Al Stevens
CONTRIBUTING EDITORS Bruce Schneier, Ray Duncan, Jack Woehr, Jon Bentley, Tim Kientzle, Gregory V. Wilson, Mark Nelson, Ed Nisley, Jerry Pournelle, Dennis E. Shasha
EDITOR-AT-LARGE Michael Swaine
PRODUCTION MANAGER Stephanie Fung

INTERNET OPERATIONS
DIRECTOR Michael Calderon
SENIOR WEB DEVELOPER Steve Goyette
WEBMASTERS Sean Coady, Joe Lucca

AUDIENCE DEVELOPMENT
AUDIENCE DEVELOPMENT DIRECTOR Kevin Regan
AUDIENCE DEVELOPMENT MANAGER Karina Medina
AUDIENCE DEVELOPMENT ASSISTANT MANAGER Shomari Hines
AUDIENCE DEVELOPMENT ASSISTANT Andrea Abidor

MARKETING/ADVERTISING
ASSOCIATE PUBLISHER Will Wise
SENIOR MANAGERS, MEDIA PROGRAMS (see page 78) Pauline Beall, Michael Beasley, Cassandra Clark, Ron Cordek, Mike Kelleher, Andrew Mintz
MARKETING DIRECTOR Jessica Marty
SENIOR ART DIRECTOR OF MARKETING Carey Perez

DR. DOBB'S JOURNAL
2800 Campus Drive, San Mateo, CA 94403
650-513-4300, http://www.ddj.com/

CMP MEDIA LLC
Steve Weitzner, President and CEO
John Day, Executive Vice President and CFO
Jeff Patterson, Executive Vice President, Corporate Sales and Marketing
Bill Amstutz, Senior Vice President, Audience Marketing and Development
Mike Azzara, Senior Vice President, Internet Business
Joseph Braue, Senior Vice President, CMP Integrated Marketing Solutions
Sandra Grayson, Senior Vice President and General Counsel
Anne Marie Miller, Senior Vice President, Corporate Sales
Marie Myers, Senior Vice President, Manufacturing
Alexandra Raine, Senior Vice President, Communications
Kate Spellman, Senior Vice President, Corporate Marketing
Michael Zane, Vice President, Audience Development
Robert Faletra, President, Channel Group
Tony Keefe, President, CMP Entertainment Media
Vicki Masseria, President, CMP Healthcare Media
Philip Chapnick, Senior Vice President, Group Director, Applied Technologies Group
Paul Miller, Senior Vice President, Group Director, Electronics and Software Groups
Fritz Nelson, Senior Vice President, Group Director, Enterprise Group
Stephen Saunders, Senior Vice President, Group Director, Communications Group
Printed in the USA
American Business Press
EDITORIAL
Bits and Bytes…
If you believe everything you read, "64 bits" is this week's bee's knees of computing. Microsoft must think so, as the company recently announced that at least some of its upcoming server offerings will run only on x86-compatible 64-bit processors. In fact, the ready availability of 64-bit platforms is an important step forward. Still, that doesn't necessarily mean it's time to post your 32-bit system on Craigslist or eBay. There's a time and place for everything, including 64 bits.

According to Microsoft's Bob Kelly, the time and place for 64-bit systems is with performance-critical applications such as Microsoft's Exchange 12 e-mail server and its SQL Server database. Other application areas that benefit from 64-bit processors are complex engineering programs, games, and anything that involves audio/video encoding: anything, in other words, that takes advantage of 64-bit arithmetic or requires addressing datasets beyond the 4-gigabyte constraint of 32-bit processors. A 64-bit processor can address up to 16 exabytes of memory — that's 18 billion gigabytes, and more than enough for most compute-intensive applications. Of course, in the spirit of "there's no such thing as a free lunch," the memory used by a 64-bit processor's larger integers and/or pointers can also lead to more paging and disk I/O, thereby degrading performance. This means that while some applications don't need 64-bit integers and/or pointers, they end up paying for them anyway. In short, the fundamental difference between 32-bit and 64-bit processors isn't necessarily the speed of the processor, but the amount of data that can be processed, which at times lends the appearance of faster speed. That said, there are workarounds (some of which involve virtual memory) that let you utilize 64-bit addressing on systems with less than 4 GB of memory, not to mention that you can gain some performance pop by running a 64-bit processor in 32-bit mode. The bottom line is that there's still a lot to learn when it comes to effectively using next-generation platforms, and the sooner we jump on them, the better prepared we will be for the future.

Speaking of the future, anyone who doesn't think the wireless world has found a home in academia hasn't sat in on a college lecture class recently. What with everything from iPods and Instant Messaging to e-mail and FreeCell, there's a whole lot of something going on, most of which seems to have little to do with learning. That's changing, however, with the advent of "Interactive Audience Response Systems," referred to simply as "clickers" — radio frequency (RF) sender/receiver devices that let students and teachers interact in real time.

A typical student/teacher scenario goes something like this: Students buy or rent a clicker (somewhat akin to a TV remote-control device, but with fewer keys) at the beginning of the semester and register it with the school. Students can use a single clicker in multiple classes. When instructors want feedback, students answer, and their responses are instantly available and/or recorded for later review. Because many universities now have wired lecture halls, tracking and storing clicker information for professors isn't a big deal. Alternatively, instructors can plug USB readers into their laptops and store the information locally. With typical systems, up to 1000 student RF keypads can be used per receiver, with up to 82 sessions (channels) running at the same time in close proximity without interference.
There are a number of companies that offer this technology, including Turning Technologies (http://www.turningtechnologies.com/) and eInstruction (http://www.einstruction.com/). eInstruction claims its system is being used in 800 institutions in 50 states and 20 countries, with more than a million devices in the hands of students.

Granted, audience response systems such as these have been around for a while. Early implementations were based on infrared (IR) technology, but RF offers clear advantages in range and in the ratio of sender units to the receiver. Additionally, some vendors offer "virtual clickers" — soft keypads that run on PCs or PDAs that support all the features of standard clickers, but with the added functionality of text messaging, which lets students submit questions to teachers and offers support for responses to fill-in-the-blank and essay questions.

And on a sad note, John Vlissides, coauthor of the seminal book Design Patterns: Elements of Reusable Object-Oriented Software, recently passed away. Along with his coauthors who made up the "Gang of Four," John was a recipient of the Dr. Dobb's Journal Excellence in Programming Award in 1998. He was also the author of several other books, most of which focused on software design and patterns. For much of his career, John was a researcher at IBM's T.J. Watson Research Center. Prior to joining IBM Research, John was a postdoctoral scholar in the Computer Systems Lab at Stanford University, where he codeveloped InterViews. Memories of John have been collected on Ward Cunningham's wiki (http://c2.com/cgi/wiki?JohnVlissides/).
Jonathan Erickson
editor-in-chief
[email protected]
LETTERS
Nuclear versus Wind Energy

Dear DDJ,
Luis de Sousa stated in "Letters" (DDJ, September 2005) that nuclear is not a clean energy due to mining, purifying, and disposing of nuclear wastes. Okay, as a 25-year nuclear health physicist who dealt with nuclear-waste issues in about 15 of the 48 contiguous states, I might agree with the waste issue because our hosed-up government can't find anybody willing to give enough kickbacks to make some Senator or Representative rich enough to make the waste issues work. However, to compare the first two issues — mining and purifying — I have to ask Luis how he expects the windmills to be made. Will the same God that makes the wind provide the metal for the towers, the blades, the housings, and the generators; the metal for the cabling that will run who knows how far from the wind towers; the insulation for these same cables? (As an aside, the creation of insulation for cable is one of the most polluting manufacturing processes known to man. And the generators, breakers, and switches of a sea-mounted wind-powered farm filled with PCBs and other chemicals is just scary!) How about the environmental impact on the sea bed where his "wind-generators" will be placed? I believe when we start comparing the manufacturing of the materials that are used, well, the scales are pretty much balanced.

When nuclear (not "nukular," as GWB would say) people discuss the cleanliness of nuclear power, they are talking about the actual lack of emissions of any pollutants into the atmosphere: I mean sulfuric acid, sulfur dioxide, carbon monoxide, carbon dioxide, hydro- and hyperchloric acids, and the like, that come from burning fossil fuels. Granted, wind has great potential, but if you have driven through northern New Mexico and observed the miles and miles of wind-powered generators (most of them sitting idle, by the way, where land potential is surrendered to make room for 50+ foot wind-turbine blades by the score), well, I cannot consider wind as a viable option, unless we place a few wind turbines around the inner belt of the GW Parkway in Washington, D.C. When Congress is in session, I am certain gigawatts of electricity could easily be generated by the hot air produced.
Ronald R. Goodwin
[email protected]
Piracy versus Privacy

Dear DDJ,
It is reported that Mr. Yale spent his entire life attempting to make a lock he himself could not pick. He never succeeded. Reading Dennis Shasha and Michael Rabin's "Preventing Piracy While Preserving Privacy" (DDJ, October 2005) in the light of this insight leads me to several questions, none of them included in the FAQ:

1. The users of my software operate in remote parts of the globe, where Internet access is unavailable (or prohibitively expensive). Weekly access to your servers is out of the question. Also, I have a mission-critical WinXP PC here on my desk that has never been infected by a virus or adware or spyware trojan. How is this possible, given the notorious fragility of Microsoft software? I never let it on the Internet for any reason. I often transfer files on the local LAN to this Mac, but only through a physical A/B switch that disconnects the Internet when the PC is connected. Who cares about privacy if our mission-critical systems won't work at all under your system?

2. Speaking of the notorious fragility of Microsoft products, and the comparable (adjusted for market penetration) fragility of UNIX-based products, how do you propose to implement a "Supervising Program" that cannot be remotely cracked (to say nothing of local attacks)?

3. What happens if a clever pirate distributes a freeware program (no rights management needed) that runs under your SP and acts as a surrogate SP to run the protected content one step removed from the "Content Identifying" processes of the actual SP? For example, this rogue crypto-SP can process sound files, but instead of sending the sound waves out the speaker port, where the real SP can measure the melodic content, it sends them out to an iPod on the USB bus. Everybody knows the iPod has no direct Internet connection to run your verification protocols. Or else to a rogue USB-to-speaker device sold on the black market? It is arbitrarily difficult for your SP to know it is sound content going out that port.

4. Speaking of a surrogate SP running under the real SP, given that your protocols must be open, how do you prevent rogue SPs from swamping the servers with bogus TTIDs?

5. Who is qualified to upload a CII signature to your "Superfingerprint" server? What happens if a "vendor" tries to upload
a fingerprint that matches an existing fingerprint? In the case of music, I can imagine something keyed to melodic lines matching only if the music is, in fact, the same tune (although much modern "music" is, in fact, tuneless), but I can also imagine a clever programmer designing his software to have a signature that matches the signature of the program he wishes to bore.

These questions arose in just the few minutes it took me to read your article. Crackers have a lot more time to probe for weaknesses. Do you really think your system is any more secure than the existing software-based protection mechanisms?

I think the iPod phenomenon is a much more robust mechanism for reducing the market cost of piracy: The proportion of paid-for music to pirate copies has improved significantly since the iPod came to market. Furthermore, the remaining pirate copies do not represent nearly as great a loss to the content-creation industry as they want you to believe, because most of those "librarians and 12-year-old kids" wouldn't buy it anyway. I was there when Dan Sokol came to the Homebrew Computer Club with 10 copies of Altair Basic (which, as he pointed out, contained no copyright notice anywhere and was, therefore, legally in the public domain), and I watched over the years as those pirate copies were multiplied into thousands of local electronics businesses, so that when they needed a legitimate copy of Basic, they bought the version they knew — from Microsoft! My own Basic was too cheap to pirate, so it never reached the same market penetration. The result: Bill Gates is rich and I am not.
Tom Pittman
[email protected] Dennis and Michael respond: Thanks, Tom. 1. Superfingerprint downloads and callups can occur through intermediaries. So there is no need for a direct connection to the Internet. The fidelity of Superfingerprints is certainly an issue and will require substantial care. 2. The article refers to the Lampson-style boot strategy to assure the integrity of the Supervising Program. Trusted hardware is a part of this solution. 3. Content going out to unprotected devices may not be detected. We agree. 4. There will be a notion of hash-cash to prevent denial-of-service attacks. 5. When Superfingerprints are uploaded, they must be checked against existing ones to ensure that an author’s rights are protected. We will also provide a service to register freeware, so Superfingerprints don’t appear that prevent freeware from running. DDJ http://www.ddj.com
DR. ECCO’S OMNIHEURIST CORNER
Proteins for Fun and Profit
Dennis E. Shasha
Pulling a card out of the inside pocket of his well-tailored, dark suit, the professor presented it to Ecco. It read: Ming Thomas, PhD, protein industrialist.

"I've come with a project," Thomas began after greeting us and taking a seat. "In the early days of molecular biology, people asserted — with the authority that only uncertainty could inspire — that every gene generates one protein.

"Now it seems that there are at least a few genes that produce thousands of proteins. Let me explain how.

"A gene is a sequence of DNA, but, in higher organisms, that DNA alternates between strings that in fact produce portions of proteins (called 'exons') and strings that don't (called 'introns'). Thus, a gene sequence has the form E1 I1 E2 I2 E3 I3…, where the Es represent exons and the Is represent introns.

"Genes can produce many proteins because any (not necessarily consecutive) subsequence of exons can form a protein. For example, E2 E4 E5 can form a protein, as can E1 E2 E7, but E6 E4 E5 cannot, because the ordering E6 E4 violates the order of the original exon sequence. E3 E3 E5 cannot form a protein either, because an exon at a given position cannot be repeated.

"When manufacturing proteins at industrial scale, we can handle up to seven exons. Our expense is directly related to the total length of those exons. We hope you can minimize our expense.

"Our first client wants us to generate 15 hydrophobic proteins that are alanine heavy. They believe these will act like sticky balls floating on top of water, allowing translucent water sculpture. Think Los Angeles swimming pools. We want help designing the exons in order to minimize their size. I know you like warm-ups, so here is one. Suppose we could use only three exons and we wanted to generate the following proteins (where each amino acid is represented by a single letter; for example, Alanine is A):

GA
GAGAS
GAS
RAGA
RAGAS
What would the exons have to be to generate these proteins, trying to minimize the total length of the exons?"

Solution to Warm-Up: The following three exons could do this, having a total length of seven:

RA
GA
GAS
"Just a minute," Ecco interrupted, turning to his 17-year-old niece Liane, who had been listening in. "Liane, isn't the biology here somewhat more complicated?"

"Well, yes, but probably not in an essential way," Liane responded. "DNA doesn't literally consist of amino acids, but rather an alphabet of 'nucleotides' whose nonoverlapping consecutive triplets are translated to amino acids. So, when Dr. Thomas speaks of minimizing the length of the exons, he formally means minimizing the number of nucleotides. Provided each exon's length is a multiple of three, however, the problems are mathematically identical, because minimizing the number of amino acids produced by the exons minimizes the number of nucleotides in the exons themselves."

"I couldn't have explained this better myself," said Thomas, visibly impressed. "For many reasons, we want each exon to generate full amino acids, so each exon's length is in fact a multiple of three. Therefore, we can view each exon as consisting of the amino acid string it generates. Now do you understand the warm-up?"

"Sure," said 11-year-old Tyler. "The protein RAGAS is generated from the RA and GAS exons, for example. RAGA is generated from the first two exons and GAGAS from the last two. So give us your big challenge."

Ming Thomas chuckled. "May I hire your whole family, Dr. Ecco?"

"We're all confirmed puzzle freaks," Ecco responded with a smile. "Do tell us which proteins you want."

"Here they are," said Thomas. "Remember that you are allowed seven exons and we want to minimize the total length (in amino acids) of those exons:
AGPA
APASAG
APASARAGPA
APASARASA
APASARASAPA
CAAPASAGASAPA
CAAPASARAG
CAAPASARPA
CARAPAPAS
CARAPAPASAGASA
CARAPAPASPA
CARAPASA
RAPAPASAGPA
RAPAPASASAPA
RAPASA
1. Can you find an encoding into exons whose total amino acid length is 20 or less?

Liane and Tyler worked this out. "Very nice," said Thomas. "That's better than the solution we had thought of. Very nice work.

"Here is a follow-up question: One of our biochemists says he can manipulate up to 11 exons provided each produces two amino acids. In that case, what is the smallest total amino acid length of exons to create the following 15 proteins?

BAPAFADAFACA
BAPAGAPADA
RABAPAGADAFACA
RASA
RASAGAPAFAFACA
RASATABAPAGAPAFACA
RASATABAPAGAPAFAFA
RATAGAPAFADAFA
SABAPAFADACA
SAPADA
SAPAPAFADAFACA
SATABAGAPADAFA
SATABAPAGADAFACA
SATAPAGAPAFA
TABACA
Ecco helped his nephew and niece solve the problem this time. When Thomas saw the solution, he nodded and said, "Excellent. We have a long consulting arrangement ahead of us."

2. Please give it a shot.

Ecco turned to the children after Thomas left: "The longest protein in Dr. Thomas's last problem had a length of only 18. It is therefore conceivable that nine two-amino-acid exons would have been sufficient. Our solution required 11. Could we have done better?"

3. What do you think?

For the solution to last month's puzzle, see page 70.

DDJ

Dennis, a professor of computer science at New York University, is the author of four puzzle books. He can be contacted at [email protected].
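For readers who want to verify a candidate exon set by machine, a brute-force check is straightforward: try every way of matching a protein against an in-order, no-repeat subsequence of the exons. The following C sketch is ours, not Dr. Ecco's; it simply confirms the warm-up solution above.

#include <stdio.h>
#include <string.h>

/* Return 1 if protein can be built by concatenating a strictly
   increasing subsequence of the exons, searching from index 'from'. */
static int can_form(const char *protein, const char **exons,
                    int nexons, int from)
{
    int i;
    if (*protein == '\0')
        return 1;                      /* everything matched */
    for (i = from; i < nexons; i++) {
        size_t len = strlen(exons[i]);
        if (strncmp(protein, exons[i], len) == 0 &&
            can_form(protein + len, exons, nexons, i + 1))
            return 1;                  /* exon i fits; i + 1 forbids repeats */
    }
    return 0;
}

int main()
{
    const char *exons[] = { "RA", "GA", "GAS" };   /* the warm-up answer */
    const char *proteins[] = { "GA", "GAGAS", "GAS", "RAGA", "RAGAS" };
    int k;
    for (k = 0; k < 5; k++)
        printf("%s: %s\n", proteins[k],
               can_form(proteins[k], exons, 3, 0) ? "formed" : "not formed");
    return 0;
}

The same routine, wrapped in a loop over candidate exon sets, is enough to search for the short encodings the puzzle asks for.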
SECTION A: MAIN NEWS

Dr. Dobb's News & Views
IBM Previews Next-Generation DB2 Database

IBM has unveiled details about Viper, its next-generation DB2 database that is designed to help manage and access data across service-oriented architectures (http://www.ibm.com/db2/xml/). Viper will be the first database with both native XML data management and relational data capability. Scheduled for release in 2006, DB2 Viper will supposedly be able to seamlessly manage both conventional relational data and XML data without requiring the XML data to be reformatted or placed into a large object within the database. DB2 Viper also will simultaneously handle range partitioning, multidimensional clustering, and hashing, and provide XQuery support.
Smart Vehicles Show Off

Among the technology demonstrations presented at the 12th World Congress on Intelligent Transport Systems (ITS) (http://www.itsworldcongress.org/) were those involving:

• Vehicle-Infrastructure Integration (VII) technology, in which "smart" roads with roadside antennas wirelessly communicated information to cars equipped with on-board units — the communication network provides information about travel times and about warnings and locations of work zones or traffic incidents to the driver;
• Integrated Collision Warning Systems, in which conference attendees rode transit buses fitted with a front and side collision warning system designed for use both on highways and in dense urban environments;
• Automated Bus Rapid Transit Technology, in which buses were fitted with sensors, actuators, and computer-based processors that let them perform automated lane maneuvers and precisely dock at boarding platforms; and
• Smart Intersections, in which radar, GPS, and sensors were used to track the position of vehicles approaching intersections and activate warning signs.

ITS is an organization of international researchers, industry professionals, and government officials developing advanced transportation technologies and deployment activities.
Microsoft Opens File Formats

Microsoft has announced that it will open up and submit the file-format technology for its Office products — Word, PowerPoint, and Excel — to the Ecma International standards body. In turn, Ecma will develop and make available documentation of those formats. In addition, Microsoft will make available tools to enable old documents to make use of the open standard format.
Report Says Innovation Is Possible

In a study entitled "Innovation, R&D and Offshoring," University of California at Berkeley researchers Dwight Jaffee and Ashok Bardhan concluded that technological innovation — even if it takes place in emerging international markets — will not spell economic doom. According to their study (http://repositories.cdlib.org/iber/fcreue/reports/1005/), new jobs and economic growth will result in the U.S., particularly in Silicon Valley. Jaffee and Bardhan found that many large U.S. firms are increasingly sending R&D activities offshore by setting up affiliated, intrafirm R&D centers abroad. Their research also shows that smaller firms generally conduct their research in the U.S. — and tend to produce more innovation. At the same time, the authors found that the U.S. market could benefit from the geographical dispersion of innovation and research to India, China, and other transitioning countries.
Iris Recognition Is an Eye Opener

Researchers at the University of Bath have developed a biometric iris-recognition system that uses the colored part of the eye to validate a person's identity (http://www.bath.ac.uk/elec-eng/pages/sipg/irisweb/). According to Professor Don Monro of the Department of Electronic and Electrical Engineering, the algorithm at the heart of the system has produced 100 percent accuracy in initial trials. Monro and his team are currently road testing the technology using a specially constructed database containing thousands of iris images collected from students and colleagues at the university.

Iris recognition, which is regarded as the most accurate biometric recognition technology, works by "unwrapping" a digital image of a person's iris and creating a unique encrypted "barcode" that is stored in a database. The images are captured using a special camera and an infrared light source that helps get over problems caused by shadows and competing light sources. Hundreds of images can be captured in a few minutes, and the team selected 20 from each eye from each volunteer. Monro hopes to build a database with 16,000 iris images.
Sun Announces Postgres Support, ZFS Filesystem

Sun Microsystems will distribute the Postgres database with its Solaris 10 operating system. At the same time, the company announced integration of Solaris ZFS, a 128-bit filesystem with error detection and correction capabilities, into OpenSolaris. Finally, Sun announced plans to integrate Solaris Containers for Linux applications, which lets companies run Red Hat binaries unmodified in Containers on Solaris 10, into OpenSolaris. The Solaris ZFS filesystem supports self-healing data through advanced error detection and correction; task automation that simplifies storage management — in some cases reducing task times from hours to seconds; and built-in storage virtualization that eliminates the complexity of a volume manager.
Financial Industry Is Always a Target

In a recent study entitled "2005 Attack Trends: Beyond The Numbers," security expert Bruce Schneier reports that criminals who are motivated by money are generally better funded, less risk-averse, and more tenacious than run-of-the-mill intruders who are in it for thrills (http://www.counterpane.com/cgi-bin/attack-trends2.cgi). Schneier also pointed out that, although the financial industry ranks second highest in attacks, it is actually the most vulnerable to criminal activity. Of the 13 major vertical markets tracked by Counterpane (the security company Schneier founded), approximately 50 percent of all targeted scans detected by Counterpane occurred within the financial industry. According to Schneier, damaging attacks such as Trojan viruses and bot networks are expected to increase. All categories of organizations are at risk, but the financial industry is expected to remain the highest-risk vertical in the near term.
Security Threats: Cross-Platform Software

For the first time, the SANS Institute has included cross-platform applications as targets in its annual list of top Internet security threats (http://www.sans.org/top20/). The list includes backup programs, media players, antivirus software, PHP-based applications, and database software, among others.
Multiplatform Porting to 64 Bits
Up-front planning is worth the effort
BRAD MARTIN, ANITA RETTINGER, AND JASMIT SINGH
The authors are senior software engineers for Visual Numerics. They can be contacted at http://www.vni.com/.

One project we were recently involved in was the port of a large 32-bit application, which supported 11 platforms, to a 64-bit environment. The application exceeded 300,000 lines of code. Considering that the 32-bit application had parts developed several years ago, there was every likelihood that the code had been modified by a variety of developers. For this and other reasons, we suspected that, among other problems, type mismatches that cause problems for a 64-bit port were likely introduced as modules were added or removed over time.

We ported the 32-bit application to 64-bit to take advantage of the benefits of 64-bit technology: large file support, large memory support, and 64-bit computation, among other features. Our overall approach was an iterative one that alternated between zooming in on detailed issues, such as byte order and refining compiler flags, and stepping back to look at global issues, such as ANSI compliance and future portability of the source-code base.

Our first step was to research 64-bit resources to learn about each of the 11 operating systems' compiler switches, memory models, and coding considerations. To define our starting point, we turned on the compiler warnings for one platform, ran a first build, and examined the build log's messages. With these initial builds and later use of tools such as Parasoft's Insure++ (http://www.parasoft.com/), lint, and native debuggers, we developed a road map of the issues we would encounter. From there, we proceeded to perform a complete inventory of the source code and examine every build configuration. After initial code modifications, debug sessions, and passes through build messages, we had enough information to sort out and prioritize realistic milestones and the specific tasks required to get there.

We reached a significant milestone when we had a running application with enough basic functionality that it could be debugged by running it through our automated test suite, which consists of backward-compatibility tests in addition to new tests built to exercise 64-bit features.

If you have several 64-bit platforms as part of your conversion project, you might be tempted to work on one platform at a time. Once the application is running properly on the first platform, you might move on to the next platform, and so on. However, we found significant advantages to working on all platforms at the same time because:
• Each of the compilers provided different information in its warnings, and looking at the errors from several compilers can help to pinpoint problem areas. • Errors behave differently on different platforms. The same problem might cause a crash on one platform and appear to run successfully on another.
“Some application requirements call for binary data or files to work with both 64-bit and 32-bit applications” A final consideration in approaching this project was to plan ahead for time required for the final release testing phase. Because our newly modified code base is shared across multiple 32-bit and 64-bit platforms, each 32-bit platform would need to be retested as thoroughly as our newly ported platforms, thereby doubling testing time and resources. Cross-Platform Issues There are a number of issues, ranging from compiler warnings to reading/writing binary data, that you can face when porting 32-bit applications that run on multiple 64-bit operating systems. Luckily, compilers can assist in determining 64-bit porting issues. Set the warning flags of the compilers to the strictest level on all platforms, paying close attention to warnings that indicate data truncation or assignment of 64-bit data to 32-bit data. However, one problem with compiler warnings is that turning on stricter warning levels can lead to an overwhelming number of warnings, many of which were automatically resolved by the compiler. The problem is that major warnings are buried within the mass of minor warnings, with no easy way to distinguish between the two. To resolve this issue, we enabled the warnings on multiple platforms and performed concurrent builds. This helped because different compilers give different warnings with different levels of detail. We then filtered the warnings using information from multiple compilers and were able to determine which warnings needed to be fixed.
Dr. Dobb’s Journal, February 2006
http://www.ddj.com
Some application requirements call for binary data or files to work with both 64-bit and 32-bit applications. In these situations, you have to examine your binary format for issues resulting from larger longs and pointers. This may require modifications to your read/write functions to convert sizes and handle any Little- or Big-endian issues for multiple platforms. To get the correct machine endianness, the larger data sizes in 64-bit applications require extended byte swapping. For example, a 32-bit long:

Big Endian = (B0, B1, B2, B3)
can be converted to: Little Endian = (B3, B2, B1, B0)
while a 64-bit long: Big Endian = (B0, B1, B2, B3, B4, B5, B6, B7)
is converted to: Little Endian = (B7, B6, B5, B4, B3, B2, B1, B0).
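In C, that 64-bit swap is usually packaged as a small helper. Here is a minimal sketch; the name swap64 is ours, not taken from the authors' code:

#include <stdint.h>

/* Reverse the byte order of a 64-bit value: (B0..B7) becomes (B7..B0). */
static uint64_t swap64(uint64_t v)
{
    return ((v & 0x00000000000000FFULL) << 56) |
           ((v & 0x000000000000FF00ULL) << 40) |
           ((v & 0x0000000000FF0000ULL) << 24) |
           ((v & 0x00000000FF000000ULL) <<  8) |
           ((v & 0x000000FF00000000ULL) >>  8) |
           ((v & 0x0000FF0000000000ULL) >> 24) |
           ((v & 0x00FF000000000000ULL) >> 40) |
           ((v & 0xFF00000000000000ULL) >> 56);
}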
Most compilers will find mismatched types and correct them during the build. This is true for simple assignments as well as most parameters passed to other functions. The real problems lie in the integer-long-pointer mismatches that are invisible to the compiler at compile time, or in cases where an assumption the compiler makes at compile time is what produces a mismatch. The former concerns pointer arguments and function pointers, while the latter primarily concerns function prototypes.

Passing integer and long pointers as arguments to functions can cause problems if the pointers are then dereferenced as a different, incompatible type. These situations are not an issue in 32-bit code because integers and longs are interchangeable. However, in 64-bit code, they result in runtime errors because of the inherent flexibility of pointers. Most compilers assume that what you are doing is what you intended to do, and quietly allow it unless you enable additional warning messages. It is only during runtime that the problems surface. Listing One, for example, compiles without warnings on both Solaris and AIX (Forte 7, VAC 6) in both 32-bit and 64-bit modes, yet the 64-bit version prints an incorrect value when run. While this problem is easy to find in a short example, it is much harder to spot in a larger code base, and most compilers will not find it. Listing One works properly when built as a 64-bit executable on a Little-endian machine only because the value of arg is entirely contained within the long's four least-significant bytes; even on Little-endian x86 machines, the 64-bit version produces an error at runtime when the value of arg exceeds those four bytes.

With function pointers, the compiler has no information about which function will be called, so it cannot correct or warn you about type mismatches that might exist. The argument and return types of all functions called via a particular function pointer should agree. If that is not possible, you may have to provide separate cases at the point at which the function is called to make the proper typecasts of the arguments and return values.

The second issue concerns implicit function declarations. If you do not provide a prototype for each function that your code calls, the compiler makes assumptions about them. Variations of the compiler warning "Implicit function declaration: assuming extern returning int" are usually inconsequential in 32-bit builds. However, in 64-bit builds, the assumption of an integer return value can cause real problems when the function returns either a long or a pointer (malloc, for example). To eliminate the need for the compiler to make assumptions, make sure that all required system header files are included and provide prototypes for your own external functions.

Hidden Issues

There are, of course, issues that may not be readily apparent at the beginning of the project. For instance, in 64-bit applications, longs and pointers are larger, which also increases the size of a structure containing these data types. The layout of your structure elements determines how much space is required by the structure. For example, a structure that contains an integer followed by a long is 8 bytes in a 32-bit application, but a 64-bit application adds 4 bytes of padding after the first element of the structure to align the second element on its natural boundary; see Figure 1.

[Figure 1: Structure alignment in 32-bit and 64-bit systems. In a 32-bit system, the integer and the long occupy two adjacent 4-byte slots. In a 64-bit system, the 4-byte integer is followed by 4 bytes of padding so that the 8-byte long starts on its natural boundary.]

To minimize this padding, reorder the data structure elements from largest to smallest. However, if data structure elements are accessed as byte streams, you need to change your code logic to adjust for the new order of elements in the data structure. For cases where reordering the data structures is not practical and the data structure's elements are accessed as a byte stream, you need to account for padding. Our solution for these cases was to implement a helper function that eliminates the padding from the data structure before writing to the byte stream. A side benefit of this solution was that no changes were required on the reader side; see Listing Two.

Arrays

64-bit long arrays and arrays within structures will not only hold larger values than their 32-bit equivalents, but they may also hold more elements. Consider that 4-byte variables previously used to define array boundaries and allocate array sizes may also need to be converted to longs. (For help in determining whether existing long arrays should be reverted to integer type for better performance in your 64-bit application, see http://developers.sun.com/prodtech/cc/articles/ILP32toLP64Issues.html.)
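A small sizeof experiment makes the padding visible. This sketch is illustrative rather than the authors' code; the byte counts in the comments assume an LP64 platform:

#include <stdio.h>

struct padded    { int a; long b; int c; };  /* 4 + 4 pad + 8 + 4 + 4 pad = 24 bytes */
struct reordered { long b; int a; int c; };  /* 8 + 4 + 4 = 16 bytes, no padding */

int main()
{
    printf("padded:    %zu bytes\n", sizeof(struct padded));
    printf("reordered: %zu bytes\n", sizeof(struct reordered));
    return 0;
}

On an ILP32 build, both structures are 12 bytes, which is why the problem goes unnoticed until the 64-bit port.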
Coding Practices and Porting Considerations

In addition to following the standard 64-bit coding practices recommended in your operating system's compiler documentation and noted in the resources listed in the Resources section, here are a few considerations and coding tips that will help when planning a 64-bit migration project:

• Convert your source-code base to ANSI C/C++, if possible and realistic. This simplifies your 64-bit port and any future ports.
• Does your target operating system support both 32-bit and 64-bit applications? Find this out ahead of time, as it will impact project decisions. For example, on Solaris, use the system command isainfo to check compatibility with both 32-bit and 64-bit applications:

  % isainfo -v
  64-bit sparcv9 applications
  32-bit sparc applications
• If your source code is not already managed under a version-control system such as CVS (http://www.nongnu.org/cvs), it will be helpful to implement one before porting your code. Due to the large number of global changes we needed to make for porting, we needed to revert to previous code much more often than normal. This made having a version-control system extremely beneficial.
• Does your application use and load 32-bit third-party libraries? If so, it is better to decide during the planning phase whether these libraries should be upgraded to 64 bit. If long data and pointers are not transferred between your main application and a third-party library, then possibly no 64-bit migration is necessary for the library, as long as the operating system is capable of running both 32-bit and 64-bit applications. If the operating system does not have this dual capability, plan on
taking the steps required to migrate the third-party library to 64 bit.
• If your application dynamically loads libraries at runtime and still uses the old load() calls, switch to dlopen() to correct data-transfer problems between the main application and the library module (a minimal sketch appears after this list). This is especially true for older AIX applications coded before dlopen() was available. To enable runtime linking on AIX, use the -brtl option to the linker with the -L ":" option to locate libraries. For compatibility, both your main application and all libraries loaded with dlopen() will need to be compiled using runtime linking.
• Consider backwards compatibility. When porting to 64-bit platforms, backwards-compatibility issues will be even more critical. Consider enhancing your current test suite to include both older 32-bit tests and new 64-bit tests.
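As a rough illustration of the dlopen() pattern mentioned in the list above (the library name libthird.so and the symbol compute are hypothetical stand-ins, not names from our project):

#include <dlfcn.h>
#include <stdio.h>

typedef double (*compute_fn)(double);

int main()
{
    void *handle = dlopen("libthird.so", RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }
    /* dlsym returns void *; cast it to the expected function type */
    compute_fn compute = (compute_fn)dlsym(handle, "compute");
    if (compute != NULL)
        printf("%f\n", compute(2.0));
    dlclose(handle);
    return 0;
}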
PRAGMATIC EXCEPTIONS

Tip #6: Don't Throw Logs

Other than in tornados, logs aren't thrown. They're sawed, chopped, rolled, turned, burned, floated, and even written — but never thrown. This simple physics applies to programming as well. In other words, you shouldn't throw exceptions that have already been logged. And yet, I've seen code such as this:
Bad. This will most likely result in seeing the same exception message at very different points in your program’s execution. The problem is, it’s the same error! Trying to debug this is confusing at best. It also sends an ambivalent and confusing message to the callers of your function. The pitcher is saying to the catcher, “I’ll log this now but, well, I’m not sure…it could be fatal…perhaps you should deal with it, too?” This isn’t only weak minded; it’s also lazy and pathetic. —Benjamin Booth http://www.benjaminbooth.com/
Dr. Dobb’s Journal, February 2006
http://www.ddj.com
Tools

Performing a source-code inventory for a large code base shared across several platforms for 32-bit to 64-bit migration, and assessing the scope of each change, however trivial, can prove to be a daunting task. The potential to overlook conversion problems and introduce new errors is high. However, by using a small arsenal of 64-bit tools and techniques, many of these potential problems can be caught during the precompilation stage, at compile time, and at runtime. Some of the tools available are:

• Precompilation stage. A pass using lint, available with the compiler using the -errchk=longptr64 flag, is effective in catching type-conversion mismatches, implicit function declarations, and parameter mismatches. Example 1 shows typical lint warnings that are red flags for 64 bit. Other lint-type applications are also available, such as FlexeLint (http://www.gimpel.com/html/products.htm).
• Compile-time techniques. Adjust your compiler warning levels so warnings are not suppressed, at least during the initial stages of the project. For multiplatform environments, take advantage of the fact that different operating systems compiling the same source code will complain about different issues. Clearing these warnings should benefit all platforms.
• Compile-time/Runtime tools. Advanced tools, such as Insure++ or Purify for 64-bit for at least one base platform, are a huge benefit in any development environment for both runtime and compile-time issues.
• Runtime tools. Try dbx, provided with each UNIX compiler, and ddd (the data display debugger), a graphical interface for dbx and gdb on UNIX (http://www.gnu.org/software/ddd/).

warning: implicit function declaration: main
warning: argument does not match remembered type: arg #1
warning: passing 64-bit integer arg, expecting 32-bit integer: MyProc(arg 7)
warning: assignment of 64-bit integer to 32-bit integer
warning: function argument ( number ) used inconsistently
warning: comparing 32-bit integer with 64-bit integer

Example 1: Typical lint warnings.

Conclusion

Taking the time to do up-front planning and investigation is worth the effort. Don't get discouraged when nothing in your application is working correctly. Methodical and careful passes through the code will uncover the problem areas. With available memory and dataset sizes growing tremendously each year, the benefits of a 64-bit application are worth the pain of conversion.

DDJ

Listing One

#include <stdlib.h>
#include <stdio.h>

int Func1(char *);

int main()
{
    long arg, ret;
    arg = 247;
    ret = Func1((char *)&arg);   /* passes a long through a char pointer */
    printf("%ld\n", ret);
    return(0);
}

int Func1(char * input)
{
    int *tmp;
    tmp = (int *)input;          /* dereferences the long as an int: truncates on LP64 */
    return(*tmp);
}

Listing Two

typedef struct demo {
    int i;
    long j;
} DEMO;

DEMO test;

/* pout_raw outputs raw bytes to a file */
/* output each element of a structure to avoid padding */
pout_raw ((int) file_unit, (char *) &test.i, sizeof (test.i));
pout_raw ((int) file_unit, (char *) &test.j, sizeof (test.j));
/* the following line of code includes padding */
pout_raw ((int) file_unit, (char *) &test, sizeof (test));

DDJ
Mac OS X Tiger & 64 Bits Migrating to 64 bits only when you need to RODNEY MACH
M
ac OS X Tiger is the first version of the Macintosh operating system that supports 64-bit computing, thereby letting you fully exploit the 64-bit PowerPC G5 processor. However, this does not necessarily mean that you should migrate every application to the 64-bit platform. Most OS X apps don’t need to be ported to 64-bit, and in fact will execute faster as 32-bit applications. The main reason you might want to make an application 64-bit is if it needs to access more than 4 GB of memory. Applications in this category include scientific and engineering programs, rendering applications, and database apps. So before looking at what’s necessary to port your applications to 64-bit, it is a good idea to examine the circumstances that don’t require applications to be ported to 64-bit:
• 64-bit data types. You don’t need to port to 64-bit to gain access to 64-bit data types. For example, long long and int64_t are 64 bit and can be used by 32-bit applications. • Faster code. You should not port to 64bit if your code is performance sensitive and highly tuned for 32-bit. The increased size of 64-bit pointers and long can cause increased cache pressure, as well as increased disk, memory, and network usage, which can lead to application performance degradation.
• 64-bit math. You don’t need to port to 64-bit to do 64-bit arithmetic with OS X on 64-bit PowerPC G5 hardware. The PowerPC supports 64-bit arithmetic instructions in 32-bit mode. You can use the GCC options -mcpu=G5 to enable G5-specific optimizations, as well as -mpowerpc64 to allow 64-bit instructions. Using these two options enables performance gains in 32-bit applications. Apple has announced that the Mac platform will be transitioning to Intel. Intel processors, such as the 64-bit Intel Xeon, require applications to be 64bit to take advantage of the additional 64-bit general-purpose registers (unlike the PowerPC). Therefore, you may need to reevaluate decisions to port to 64-bit once more details about the Intel on Mac architecture become available — especially if your code is integer intensive.
64-Bit Clean Once you determine that an application does need to be 64 bit, then you should make your code “64-bit clean.” The 64bit C data model used by Mac OS X (and all modern UNIX derivatives) is commonly referred to as “LP64.” In the LP64 data model, ints are 32 bit, while longs and pointers are 64 bit. The 32-bit data model is referred to as “ILP32,” and ints, longs, and pointers are all 32 bit. This difference in the size of long and pointer between ILP32 and LP64 can cause truncation issues in code that assumes the same width as int. Many of these 64-bit porting bugs can be detected by using the -Wall -Wformat Wmissing- prototypes -Wconversion Wsign-compare -Wpointer options with GCC. (For more information on general 64bit porting issues, refer to my article “Moving to 64-Bits,” C/C++ Users Journal, June 2005; http://www.cuj.com/documents/ s=9796/cuj0506mach/0506mach.html.) However, there is a 64-bit caveat: Support for 64-bit programming is not available throughout the entire OS X API for 64-bit computing on OS X Tiger. For example, application frameworks such as Cocoa and Carbon are not yet available for 64-bit development. This means you cannot simply recompile 32-bit GUI apps as 64 bit on OS X— only command-line apps can be recompiled as 64 bit. However, this doesn’t mean GUI applications cannot take advantage of 64-bit computing. In the rest of this article, I examine how you work around this issue by porting an example 32-bit OS X GUI application to 64-bit.
Rodney Mach is HPC Technical Director for Absoft. He can be contacted at rwm@ absoft.com.
The Demo Application

The 32-bit demo application that I 64-bit enable here is a simple "array lookup" application. Users enter an index of the array, and the application returns the array value at that index; see Figure 1. I want to migrate this application to 64 bit to take advantage of arrays greater than 4 GB.

The GUI in this example is written in Qt 4 (http://www.trolltech.com/), an open-source C++ application framework that makes it straightforward to write cross-platform native GUIs (Carbon on OS X). At Absoft (where I work), all of our cross-platform developer tools are written in Qt
“The 64-bit C data model used by Mac OS X is commonly referred to as ‘LP64’” for easy maintenance, and native speed on all of our supported platforms (Windows, Linux, and OS X). If your application is not Qt based and uses native OS X APIs, the strategy I present here still applies. The Methodology To convert the 32-bit demo application to 64 bit, I split the 32-bit application into two parts to work around the limitation that only command-line apps can be 64 bit on OS X: • A 64-bit command-line server that does the necessary 64-bit operations such as array allocation and management. • A 32-bit GUI that displays result and interfaces with users. The existing GUI is refactored to launch and communicate with the server. This is the same strategy we used at Absoft with our 64-bit Fx2 debugger on OS X Tiger. The debugger is a 32-bit UI that communicates with a 64-bit back end. Refactoring the application into a 64-bit executable and 32-bit GUI is the most difficult task for most GUI applications. Once you have identified a strategy for 64-bit enabling of the application, you http://www.ddj.com
must decide on the communication method between the 64-bit server and the 32-bit GUI client. There are several mechanisms you can use for communication:

• Communicate using message passing between STDIN and STDOUT of the 64-bit application.
• Use UNIX Domain sockets for same-host communication.
• Use TCP/IP client/server mechanisms.
• Use shared memory or another IPC mechanism.

The method you select depends on the application. The implementation I present here is based on UNIX Domain sockets. UNIX Domain sockets are lightweight, high-performance sockets that enable communication between processes on the same host. If you are familiar with standard TCP sockets, you will find UNIX Domain sockets easy to master. UNIX Domain sockets also assist in future proofing your code by enabling an easy upgrade path to more heavyweight TCP sockets. For example, a future version of your application could have the server run on a PowerPC-based Mac and the GUI client on an Intel-based Mac.

Creating the Server
The server handles allocating the array so you can access more than 4 GB of memory. It also provides an interface that a client can use to look up values from the array. This server can be tested independently of the GUI, letting you hammer out the client-server interaction before refactoring the GUI.
Use fixed-width datatypes for sharing between ILP32 and LP64. Listing One (server.c) is the server source code. In lines 16–18 of Listing One, the code uses fixed-width datatypes such as uint64_t instead of unsigned long long. It is good practice to use fixed-width datatypes when sharing data over a socket, or sharing data on disk, between ILP32 and LP64. This guarantees that the size of the data does not change while communicating between the two different data models. It also future proofs your code against changes in the width of fundamental datatypes and saves you headaches later. These fixed-width datatypes were introduced by C99, and are located in the header file <stdint.h>. While this C99 feature is not technically part of the C++ Standard, it is supported by most C++ compilers (such as Absoft 10.0 a++ and GNU g++).
Use the __LP64__ macro to conditionally compile 64-bit-specific code. When maintaining a single code base for 32- and 64-bit code, you may want to conditionally
compile the code depending on whether it is 64 bit or 32 bit. In this case, I want the defined ARRAY_SIZE to be larger when compiled as 64 bit, to take advantage of the larger memory. Listing Two (absoft.h) shows the __LP64__ macro to use on OS X.
In UNIX Domain sockets, a pathname in the filesystem ("/tmp/foo," for instance) is used as the address for the client and server to communicate. This filename is not a regular filename that you can read from or write to — your program must associate this filename with a socket in order to perform communication. You can identify this special socket using the UNIX command ls -laF on the file; you will see an "=" appended to the filename, indicating that it is a socket:

% ls -laF /tmp/sock
srwxr-xr-x 1 rwm wheel 0 Oct 29 21:51 /tmp/sock=
Returning to the server code in Listing One, the server must be prepared to accept connections, which is done via the socket, bind, and listen calls. On line 26 of Listing One, the socket call creates an endpoint for communication, returning an unnamed socket. The socket call takes three arguments:

• The first argument is the family type. In this case, I use AF_LOCAL to specify the UNIX Domain family.
• The second argument, SOCK_STREAM, provides sequenced, reliable, two-way connection-based bytestreams for this socket.
• The final argument selects the protocol for the family. In this case, zero is the default.
In lines 30–33 of Listing One, I set up the sockaddr_un structure with the filename to use. Note that the SOCK_ADDR filename is defined in the absoft.h header file (Listing Two) as the UNIX pathname "/tmp/sock." The filename is arbitrary, but it must be defined the same in both the client and the server, and it must be an absolute pathname. On line 35, be sure to delete this file, as it may have been left over from a previous instance, to ensure that the bind call succeeds. Next, on line 37, I bind the unnamed socket previously created with the name I just configured. Finally, on line 42, I use the listen call to begin accepting connections on this socket.
On line 46, I sit in a loop and wait to accept connections from the client. Once a connection is received, I read in the array index the user selected on line 54, and return the array value on line 64. Note the use of the readn and writen functions. Regular read/write do not guarantee that all the bytes requested will be read/written in one call. Wrapper functions are used to ensure all bytes are read/written as expected (see util.c, available electronically, "Resource Center," page 6).
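util.c is not reproduced here, but a plausible sketch of the readn wrapper (our reconstruction, matching the prototype in absoft.h; writen is the mirror image using write) looks like this:

#include <errno.h>
#include <unistd.h>

/* Read exactly n bytes from fd, looping over short reads and retrying
 * on EINTR; returns the number of bytes read, or -1 on error. */
ssize_t readn(int fd, void *vptr, size_t n)
{
    char *ptr = (char *)vptr;
    size_t nleft = n;

    while (nleft > 0) {
        ssize_t nread = read(fd, ptr, nleft);
        if (nread < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;             /* real error */
        }
        if (nread == 0)
            break;                 /* EOF: peer closed the socket */
        nleft -= (size_t)nread;
        ptr += nread;
    }
    return (ssize_t)(n - nleft);
}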
Creating the Client
To test the server, create a C client that connects to the server, requests an array index, and fetches the result. You can use this client to test the server interaction before having to refactor the GUI. The client uses the socket and connect calls to talk to the server; see Listing Three for the implementation of the client lookupValue function. The client code should be easy to follow because it is similar to the server, but it uses the connect system call to connect to the already existing server socket.
You may wonder why the server and client were not written in C++. The main reason is portability. C socket implementations are portable to a variety of platforms without the need for third-party libraries or a roll-your-own implementation. If you do need to code the client/server in C++, Qt provides a QSocket class that you can extend to support UNIX Domain sockets.
Refactoring the GUI
At this point, you have a server that allocates the array, and a client that can call the server and fetch values from it. It is now time to tackle the messy part — refactoring the GUI. You must identify everywhere the GUI currently manipulates or queries the array directly, and redirect it to use the client function call instead. Luckily, only one method, Viewer::lookupArray( ) in line 52 of Viewer.cpp (available electronically), is used to look up values in the array. This method is modified on line 54 to call the client lookupValue function in a thread. To leave the original behavior intact, the new functionality is wrapped in a DIVORCE_UI define so you can conditionally compile-in the changes.
To simplify the code, I made all network calls blocking. You can't issue a blocking call from the UI thread in Qt (and most GUI frameworks) without making
the UI unresponsive to users. Therefore, I issue the blocking call to the server inside a thread, and have the thread alert the UI when the blocking network communication has completed. See the FetchDataThread.cpp class (Listing Four) for the implementation of my thread wrapper around the fetchData function.
The run( ) method in Listing Four calls the blocking lookupValue function defined in Listing Three. The method locks a mutex around critical data to ensure thread safety. In line 27 of Viewer.cpp, I use the Qt emit keyword to emit a signal containing the result received from the server. The GUI receives this result by connecting a "slot," in Qt parlance, to the "signal" from the FetchDataThread thread (see lines 40–43 in Viewer.cpp). The end result is the showResult method in Viewer.cpp. It is called to display the results from the server and enable the Lookup button in the application.
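The actual wiring is in Viewer.cpp (available electronically); a sketch of what such signal/slot connections typically look like in Qt 4 (the member names fetchThread and showError are our assumptions; the signal names come from Listing Four) is:

// Inside the Viewer constructor: route the worker thread's signals to
// slots on the GUI object. Qt queues signals emitted from another
// thread, so the slots run safely on the UI thread.
connect( &fetchThread, SIGNAL( fetchedData( const QString & ) ),
         this,         SLOT( showResult( const QString & ) ) );
connect( &fetchThread, SIGNAL( errorOccured( const QString & ) ),
         this,         SLOT( showError( const QString & ) ) );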
Starting and Stopping the Server
The final piece of the puzzle is to have the GUI automatically start the 64-bit server, to make the split appear transparent. The main( ) function in Viewer.cpp uses the Qt class QProcess to launch the server executable on lines 83–88, and shuts the server down on lines 93–97 before the application exits.
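That code is likewise only available electronically; a minimal sketch of the approach with Qt 4's QProcess (the function names, server path, and timeout are our assumptions) is:

#include <QProcess>

QProcess serverProc;                      // lives as long as the GUI does

void startServer() {
    serverProc.start( "./server" );       // the Universal Binary; OS X picks ppc or ppc64
    serverProc.waitForStarted();          // block briefly until the process is running
}

void stopServer() {
    serverProc.terminate();               // politely ask the server to exit
    if ( !serverProc.waitForFinished( 3000 ) )
        serverProc.kill();                // force it after three seconds
}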
Creating a Universal Binary
You may want to ship 32-bit and 64-bit servers so your application can run on a wide variety of Macintosh hardware. Instead of shipping multiple versions of the application, you can create a Universal Binary (also called a "Fat Binary") that lets you ship one server binary that is both 32 bit and 64 bit. A Universal Binary automatically selects the correct code depending on the user's system, without additional coding or user intervention.
It is straightforward to create a Universal Binary using Xcode, or using the lipo tool shipped with OS X. Lipo "glues" your 32-bit and 64-bit applications into one binary. Listing Five is an example makefile that creates a Universal Binary for the server presented here. Use the UNIX file command to examine the resulting binary:

% file server
server: Mach-O fat file with 2 architectures
server (for architecture ppc): Mach-O executable ppc
server (for architecture ppc64): Mach-O 64-bit executable ppc64

Building and Running the Application
To build the application after you have installed Qt, enter:

% qmake ; make ; make -f Makefile.server

at the command line. The qmake utility (included with Qt) creates a Makefile for building the GUI from the Viewer.pro file in Listing Six. Makefile.server builds the server as a Universal Binary. Once the build has completed, you can execute the 64-bit enabled Viewer application by running it from the command line:

% ./Viewer.app/Contents/MacOS/Viewer

Conclusion
With its UNIX heritage and innovative features such as Universal Binaries, OS X is a great platform on which to develop 64-bit applications. Migrating command-line applications to 64 bits is straightforward, and the strategy I've outlined here will help you in 64-bit enabling your GUI applications to harness the full power of Mac OS X Tiger.

DDJ

Rodney Mach is HPC Technical Director for Absoft. He can be contacted at rwm@absoft.com.

Listing One

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include "absoft.h"

int main(int argc, char *argv[])
{
    int listenfd,          /* listen socket descriptor */
        clientfd,          /* socket descriptor from connect */
        i;
    int32_t x;             /* array index from the client */
    uint64_t result;       /* result sent to client */
    static uint64_t bigarray_[ARRAY_SIZE];
    socklen_t clientlen;
    struct sockaddr_un server, client;

    /* Initialize array with known values */
    for ( i = 0; i < ARRAY_SIZE; i++ ) {
        bigarray_[i] = 10000000000000000000ULL + i;
    }
    /* AF_LOCAL is Unix Domain Socket */
    if ((listenfd = socket(AF_LOCAL, SOCK_STREAM, 0)) < 0) {
        perror("socket");
        exit(1);
    }
    /* Setup socket info */
    bzero((char *) &server, sizeof(server));
    server.sun_family = AF_LOCAL;
    strncpy(server.sun_path, SOCK_ADDR, sizeof(server.sun_path));
    /* Unlink file to make sure bind succeeds. Ignore error */
    unlink(SOCK_ADDR);
    /* Bind to socket */
    if (bind(listenfd, (struct sockaddr *)&server, sizeof(server)) < 0 ) {
        perror("bind");
        exit(2);
    }
    /* Listen on socket */
    if (listen(listenfd, LISTENQ) < 0 ) {
        perror("listen");
        exit(3);
    }
    for (;;) {
        printf("Waiting for a connection...\n");
        clientlen = sizeof(client);
        if ((clientfd = accept(listenfd, (struct sockaddr *)&client,
                               &clientlen)) < 0) {
            perror("accept");
            exit(4);
        }
        /* Read the array index the UI has requested */
        readn(clientfd, &x, sizeof(x));
        printf("Read in request for array element %d\n", x);
        if ( x >= ARRAY_SIZE || x < 0 ) {   /* out of range: error */
            result = 0;
        } else {
            result = bigarray_[x];
        }
        /* Print specifier for unsigned 64-bit integer */
        printf("Server sending back to client: %llu\n", result);
        if (writen(clientfd, &result, sizeof(result)) < 0 ) {
            exit(5);
        }
        close(clientfd);
    }
    exit(0);
}
Listing Two

#ifndef ABSOFT_H
#define ABSOFT_H
#include <stdint.h>
#include <stdlib.h>
#define SOCK_ADDR "/tmp/sock"
#define LISTENQ 5
/* When compiled as 64-bit, use a larger array
 * (for the demo, the size is just 1 larger than 32-bit) */
#ifdef __LP64__
#define ARRAY_SIZE 1001
#else
#define ARRAY_SIZE 1000
#endif /* __LP64__ */
/* Protos */
ssize_t readn(int fd, void *vptr, size_t n);
ssize_t writen(int fd, const void *vptr, size_t n);
uint64_t lookupValue(int32_t x);
#endif
Listing Three

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/uio.h>
#include <sys/fcntl.h>
#include <unistd.h>
#include <stdint.h>
#include "absoft.h"

/* Lookup array value at index x
 * by connecting to unix domain socket */
uint64_t lookupValue(int32_t x)
{
    int s;
    struct sockaddr_un remote;
    uint64_t result;

    if ((s = socket(AF_LOCAL, SOCK_STREAM, 0)) < 0 ) {
        perror("socket");
        return(0);
    }
    bzero(&remote, sizeof(remote));
    printf("Trying to connect...\n");
    remote.sun_family = AF_LOCAL;
    strcpy(remote.sun_path, SOCK_ADDR);
    if (connect(s, (struct sockaddr *)&remote, sizeof(remote)) < 0) {
        perror("connect");
        return(0);
    }
    printf("Connected and sending %d\n", x);
    if (writen(s, &x, sizeof(x)) < 0 ) {
        perror("send");
        return(0);
    }
    readn(s, &result, sizeof(result));
    printf("Client received result from server = %llu\n", result);
    close(s);
    return result;
}
Listing Four

#include "FetchDataThread.h"

FetchDataThread::FetchDataThread(QObject *parent)
    : QThread(parent)
{
}
FetchDataThread::~FetchDataThread()
{
    cond.wakeOne();
    wait();
}
void FetchDataThread::fetchData(const int32_t x)
{
    // Hold mutex until function exits
    QMutexLocker locker(&mutex);
    this->x = x;
    if (!isRunning())
        start();
    else
        cond.wakeOne();
}
void FetchDataThread::run()
{
    QMutexLocker locker(&mutex);
    int32_t xv = x;
    // This is the call that blocks
    uint64_t result = lookupValue(xv);
    /* Minimal error checking. Returns 0 if error */
    if ( result == 0 ) {
        emit errorOccured("Error looking up value");
        return;
    } else {
        QString str;
        emit fetchedData( str.setNum(result) );
    }
}
Listing Five

CFLAGS = -Wall -Wformat -Wmissing-prototypes -Wconversion -Wsign-compare -Wpointer-arith

all: server

server32: util.c server.c
	gcc $(CFLAGS) -m32 util.c server.c -o server32

server64: util.c server.c
	gcc $(CFLAGS) -m64 util.c server.c -o server64

server: server32 server64
	lipo -create server32 server64 -output server

clean:
	rm -rf server32 server64 server
Listing Six

# Use the Qt utility "qmake" to build
# a Makefile from this file
TEMPLATE = app
CONFIG += qt release
TARGET +=
DEPENDPATH += .
INCLUDEPATH += .
DEFINES += DIVORCE_UI
HEADERS += Viewer.h
HEADERS += absoft.h
HEADERS += FetchDataThread.h
SOURCES += client.c
SOURCES += util.c
SOURCES += Viewer.cpp
SOURCES += FetchDataThread.cpp
PROGRAMMER’S TOOLCHEST
Ajax: Asynchronous JavaScript and XML

Creating dynamic web pages

ERIC J. BRUNO

HTML was created to enable the publication and display of documents within a specialized browser application. The real power of HTML is its ability to link objects (text and images) on one document to other — entirely separate — documents. The end result is a global set of document pages that connect to one another much like a web; hence the name, "World Wide Web." More important, HTML describes its content and its visual formatting in a manner independent from the actual viewer application. This abstraction led to HTML's immediate popularity, as people were able to create content to be displayed equally on any computer, with any operating system, anywhere on the globe.
Despite its popularity, HTML suffers from one drawback — it's static by nature. In a world full of cheap computing power and otherwise rich desktop applications, static web-based applications seem primitive. It wasn't long before the two most popular web browsers of the time, Netscape Navigator and Internet Explorer, added support for scripting languages such as JavaScript. With the addition of script, a web page could be updated in the browser without the need to request a new page from the server. This began an age of dynamic web pages, much like those we see today that contain popup menus, tool-tips, and the like.
Modern browsers make their HTML content available to embedded script code through an object hierarchy called the Document Object Model (DOM). A page's script can modify its HTML by manipulating parts of the DOM. By including islands of XML data within the HTML, a page's script can show or hide portions of the page and its data based on user actions. This technology is known as Dynamic HTML (DHTML). To avoid embedding potentially large amounts of XML data within a single page, Microsoft added the XMLHttpRequest object to its Internet Explorer. This object can be used to dynamically make an HTTP request to the server, receive XML as the response, and use that XML to update portions of the currently displayed page.

Introducing Ajax
Although the individual techniques and capabilities have been around for some time, the use of HTML, XML, JavaScript, and the XMLHttpRequest object to form a dynamic web-based application has more recently become known as "Asynchronous JavaScript and XML" (Ajax). The name defines the design pattern commonly used to create dynamic web pages, and has helped to define a common model that many browsers now support, such as Mozilla Firefox, Microsoft Internet Explorer, Opera, Konqueror, and Apple Safari.
How much impact can Ajax really have on a web application? To answer this, you need to witness it for yourself. One of the strongest demonstrations of the usefulness of this technique in web design is the application Google Suggest (notice I didn't call it a web site). Start your browser and go to the beta version of Google Suggest (http://www.google.com/webhp?complete=1&hl=en). Think of an obscure word or phrase to search for, and type it into the edit box on the page. As you type each letter, a list appears below the edit box that contains the best 10 matches (words and/or phrases) for what you have typed; see Figure 1. As you type each letter, the suggestions are refined until, more than likely, the word or phrase you were going to type is right there in the list. Simply navigate to the entry in the list with the mouse or keyboard and save yourself some typing.

Figure 1: Google Suggest.

Obviously, Google does not deliver a dictionary of words with the initial Google Suggest page. So how does this work? Ajax is used to make asynchronous requests to the Google servers with the letters you've typed. As you continue to type, response data is received, and the list on the page is updated dynamically. This interaction does not impact you negatively in any way: You don't need to wait for the responses; you don't wait while the entire page is refreshed; and the results are useful. Compare this to the simpler interaction that takes place with a static web application.
Figure 2 shows the basic interaction between a web browser and a web server:

1. The browser makes an HTTP request to the web server.
2. The web server returns HTML to the browser over HTTP.
3. The browser renders the HTML, and waits to repeat this cycle.

Figure 2: Standard interaction between web browser and web server.

With Ajax, the pattern of communication between the browser and the web server is more involved than with a static web page. There is a paradigm shift in the notion that, after the initial page is delivered, the HTML lives at the client, not the server. The server merely delivers raw data in the form of XML that the client uses to update portions of the HTML already on display. Figure 3 shows the more complex interaction that occurs between the Ajax client and server:

1. In response to a user event, JavaScript on the page makes a request to the server, using XMLHttpRequest.
2. The XMLHttpRequest object sends the special request over HTTP to the server.
3. The server (a web or application server of some sort) receives the request, retrieves some data, and returns it to the client over HTTP, formatted as XML.
4. The XMLHttpRequest object provides the data to a JavaScript function on the page.
5. JavaScript on the page updates a portion of the HTML with the data retrieved. For example, a list box may be filled; text entries may be validated, and so on.

Figure 3: Ajax-enhanced browser/server interaction.

Google Suggest is not the only example of Ajax in action. Other Ajax applications include:

• Microsoft Outlook and its web interface; arguably one of the first Ajax applications.
• Google Maps (http://maps.google.com/).
• Google Groups (http://groups.google.com/).
• Google GMail (http://mail.google.com/).
• Amazon A9 (http://www.a9.com/).
• Writely, a free online word processor (http://www.writely.com/).
• Flickr, a photo-sharing site (http://www.flickr.com/).

As a matter of fact, I'm writing this article using the Writely word processor.

Using Ajax
You can develop an Ajax application with nothing more than a browser and a web server that provides some sort of CGI support (such as Apache Tomcat), where you can run server-side code, such as PHP or Java. However, there are toolkits that make it easier to develop Ajax applications — Dojo (http://dojotoolkit.org/), GLM from SourceForge (http://sourceforge.net/projects/glm-ajax/), and DWR from Getahead (http://getahead.ltd.uk/dwr/). I'm using DWR here because it's an Apache-licensed, open-source, Java-based toolkit, and I like its development paradigm.
DWR's development paradigm is interesting because it lets you develop server code as plain old Java objects (POJOs), which you can access from JavaScript within the browser. The JavaScript uses the server-side objects as though they were local; DWR uses Ajax as a proxy between the browser and the server. Nothing gets downloaded to the browser besides the HTML page that has JavaScript embedded. The DWR Java Servlet running on the server transparently maps the Ajax requests and responses to and from the POJOs you supply (Figure 6). DWR also integrates well with frameworks such as Struts, Spring, and Hibernate.

Figure 6: DWR acts as a proxy between the client JavaScript and the server's Java classes.
Sample Magazine Archive Viewer
The application I present here is a magazine archive viewer, meant to display article content from back issues of your favorite magazines, such as DDJ (Figure 4).

Figure 4: Sample Ajax-based application.

To run the sample Ajax application, you need to download the DWR toolkit (http://getahead.ltd.uk/dwr/download/). You can download the toolkit's JAR file to add to an existing Java-based web application, a WAR file to deploy as its own web application, as well as the complete source to both. You will also need a Java Servlet-enabled server, such as Apache Tomcat (http://jakarta.apache.org/tomcat/). Finally, you can download the sample magazine viewer application (available electronically; see "Resource Center," page 6).
To create the web application, create a subfolder named "DDJViewer" in the webapps folder where you have Tomcat (or another Servlet container) installed. Copy the file main.html to this folder. Next, create a subfolder named WEB-INF within the "DDJViewer" folder. Copy the files dwr.xml and web.xml into this folder. Next, within the WEB-INF folder, create two subfolders named "classes" and "lib." Copy the file MagViewerImpl.class into the classes folder. Finally, copy the files dwr.jar and xalan.jar into the lib folder. The resulting directory structure, with proper file placement, should look like Figure 5.

Figure 5: Directory structure for the sample J2EE/Ajax web application.

The MagViewerImpl.java class delivers all of the magazine content to the caller. The methods are:

• getMagazines returns a list of magazines whose articles are available. In this sample, the choices are "Dr. Dobbs Journal," "C/C++ Users Journal," and "Software Development Magazine."
• getPublicationYears returns a list of years for which articles are available.
• getYearTopics returns a list of the monthly topics for the specified magazine and year.
• getIssueDetails returns a list of article titles for the magazine issue specified.
• getArticle returns the content of an article for the magazine issue specified.
The file dwr.jar contains the Java Servlet and supporting Java code for the DWR toolkit. The file MagViewerImpl.class is the sample class that implements the aforementioned methods, which are referenced in the JavaScript within the web application's HTML page. To instruct DWR to expose the methods of any class to the client, you need to add entries into the dwr.xml file. The contents of this file for the sample application are:

<dwr>
  <allow>
    <create creator="new" javascript="MagViewer">
      <param name="class" value="MagViewerImpl"/>
    </create>
  </allow>
</dwr>

This file tells DWR which classes to allow the client JavaScript to create and access. In this case, the <create> and <param> tags specify that there is a Java class MagViewerImpl that the JavaScript can access as MagViewer. Finally, there is the HTML file itself, main.html. This web page contains four drop-down list boxes and one text field. The list boxes allow you to drill down to a specific magazine article, and the text box displays the article contents. Within
the HTML section, DWR requires you to reference the JavaScript that your web page will access. In the sample application, I use the standard DWR engine and utility scripts, dwr/engine.js and dwr/util.js, respectively. The application also references the script generated for our application, dwr/interface/MagViewer.js. These references are:

<title>DDJ Ajax Demo Application</title>
<script type='text/javascript' src='dwr/interface/MagViewer.js'></script>
<script type='text/javascript' src='dwr/engine.js'></script>
<script type='text/javascript' src='dwr/util.js'></script>
The remainder of the page contains the JavaScript that forms the dynamic nature of the web application. The function init (Listing One) is set up as the function to call when the page is first loaded. This function tells DWR to display errors and warnings as pop-up alerts. This is something you may want turned on for debugging, but turned off when deployed to production. Next, the call to DWR's useLoadingMessage function tells DWR to display a message whenever an Ajax request for data is made to the server. The text "Loading…" will be displayed in red in the upper right-hand corner of the page as it waits for the Ajax response. Once the response is received, the message goes away.
Finally, the page's update function is called. Here, a call is made to the MagViewerImpl class's getMagazines method. In the Java implementation for this method, there are no parameters. However, in the JavaScript, the parameter createMagList is supplied. This parameter is not passed to the server; it is instead the name of the JavaScript function that will be called to receive the response data. Because each Ajax call is asynchronous and we don't want to block the page while it waits for data from the server, DWR requires that the first parameter to any server call be a callback function to handle the response (Figure 6). The callback function is defined with the single parameter, data, because it contains the data returned from the server. This code is:

function update() {
    MagViewer.getMagazines(createMagList);
}
function createMagList(data) {
    DWRUtil.removeAllOptions("maglist");
    DWRUtil.addOptions("maglist", data);
}

The result is that the first drop-down list is populated with the magazine names returned from the server. You can write this DHTML code yourself, but the DWRUtil class referenced in the JavaScript makes it easier to perform this task. Each drop-down list on the page has defined, within the HTML, a function to be called when a selection is made:

<select id="maglist" onclick="populateYearList();" style="vertical-align:top;">

In this example, when a selection is made from this drop-down list (which contains magazine names), the JavaScript function populateYearList is called. This function, in turn, calls the getPublicationYears method on the MagViewerImpl class, which results in the population of the second drop-down list. This pattern is repeated with the remaining drop-down lists; as each selection is made, a request is made to the server, and the next drop-down list is populated. Finally, once a specific article is chosen in the last drop-down list, the article contents are requested and displayed within the text
box at the bottom of the page. The HTML for the drop-down lists and the text box can be seen in Listing Two. Listing Three contains the JavaScript functions that are used to dynamically request data and update the contents of the page.

Conclusion
Ajax has helped redefine a technique that has been implemented for years, but has never been standardized. With more dynamic web applications appearing each day, combined with the increasing adoption of broadband Internet connectivity, the browser is being transformed into a rich desktop application. Ajax, and the toolkits that support it, are turning the Web and its protocols into more of a dialog, as opposed to a one-way, browser-to-server conversation. Look for increasing integration of Ajax tools and techniques with popular web and application servers, such as Tomcat and WebSphere. It will be interesting to see what variants of the Ajax technique arise. However, the most important artifacts that will come from Ajax are the powerful, dynamic web applications that transform our use of the Web.

DDJ

Eric is a consultant in New York, and has worked extensively in Java and C++ developing real-time trading and financial applications. He can be contacted at eric@ericbruno.com.

Listing One

if (window.addEventListener) {
    window.addEventListener("load", init, false);
} else if (window.attachEvent) {
    window.attachEvent("onload", init);
} else {
    window.onload = init;
}
function init() {
    DWREngine.setErrorHandler(function(message) { alert(message); });
    DWREngine.setWarningHandler(function(message) { alert(message); });
    DWRUtil.useLoadingMessage();
    update();
}

Listing Two

Choose magazine:
<select id="maglist" onclick="populateYearList();" style="vertical-align:top;"></select>

Choose year:
<select id="yearlist" onclick="populateMonthList();" style="vertical-align:top;"></select>

Choose month:
<select id="monthlist" onclick="populateArticleList();" style="vertical-align:top;"></select>

Choose article to read:
<select id="articlelist" onclick="displayArticle();" style="vertical-align:top;"></select>

Article contents:
<textarea id="articletext"></textarea>

Listing Three

function populateYearList() {
    MagViewer.getPublicationYears(onYearData, maglist.value);
}
function onYearData(data) {
    DWRUtil.removeAllOptions("yearlist");
    DWRUtil.addOptions("yearlist", data);
}
function populateMonthList() {
    MagViewer.getYearTopics(onMonthData, maglist.value, yearlist.value);
}
function onMonthData(data) {
    DWRUtil.removeAllOptions("monthlist");
    DWRUtil.addOptions("monthlist", data);
}
function populateArticleList() {
    var month = 99;
    for ( var intLoop = 0; intLoop < monthlist.length; intLoop++) {
        if ( monthlist[intLoop].selected )
            month = intLoop;
    }
    MagViewer.getIssueDetails(onArticleList, maglist.value,
                              yearlist.value, month);
}
function onArticleList(data) {
    DWRUtil.removeAllOptions("articlelist");
    DWRUtil.addOptions("articlelist", data);
}
function displayArticle() {
    var month = 99;
    for ( var intLoop = 0; intLoop < monthlist.length; intLoop++) {
        if ( monthlist[intLoop].selected )
            month = intLoop;
    }
    var article = 99;
    for ( var intLoop = 0; intLoop < articlelist.length; intLoop++) {
        if ( articlelist[intLoop].selected )
            article = intLoop;
    }
    MagViewer.getArticle(onArticleData, maglist.value, yearlist.value,
                         month, article);
}
function onArticleData(data) {
    articletext.value = data;
}
Examining µC++

High-level object-oriented concurrency for C++

PETER A. BUHR AND RICHARD C. BILSON
Concurrency is the most complex form of programming, with certain kinds of real-time programming being the most complex forms of concurrency. Often, the mechanisms for writing concurrent programs only exacerbate the complexity because they are too low level and/or independent from the language. Currently, concurrency is being thrust upon all programmers indirectly through the push for higher performance in hardware. To maintain Moore's Law, it is becoming necessary to add parallelism at a number of hardware levels — instruction pipeline, multithreading, multicore processors, shared-memory multiprocessors, and distributed clusters.
Some successful attempts have been made to implicitly discover concurrency in a sequential program; for instance, by parallelizing loops and access to data structures. While this approach is appealing because of the simple sequential programming model and the ability to parallelize legacy code, there is a limit to how much parallelism can be found, and current techniques only work on certain kinds of programs. Therefore, explicit concurrent mechanisms are necessary to achieve maximum concurrency potential. Luckily, the two approaches are complementary, and can appear together in a single programming language.

C++ Concurrency
Given the need for explicit concurrency, modern programming languages such as Beta, Ada, Modula-3, Java, and C#, among others, provide some direct support for concurrency. Surprisingly, however, C++ has no concurrency support. During C++'s 20-year history, many different concurrency approaches for C++ have been suggested and implemented, with only varying degrees of adoption. As a result, there is no de facto standard dominating concurrent programming in C++. (In C, there are two dominant, but incompatible, concurrency libraries — Win32 and pthreads.) In this article, we argue that C++'s lack of concurrency is significantly limiting the language's future. This deficiency has also been recognized by the C++ Standards committee, which is currently examining concurrency extensions. We also outline how high-level object-oriented concurrency can be added to C++ through a freely available concurrent dialect of C++ called "µC++" (http://plg.uwaterloo.ca/usystem/uC++.html).

Concurrent Design Principles
There are a number of major design principles for adding concurrency to object-oriented languages, such as:

• Object-oriented design is built on the notion of the class. Hence, concurrency should be built on the class notion, allowing it to leverage other class-based language features.
• All concurrent systems must provide three fundamental properties: thread, a mechanism to sequentially execute statements, independently of (and possibly concurrently with) other threads; execution context, the state needed to permit independent execution, including a separate stack; and mutual exclusion/synchronization (MES), mechanisms to exclusively access a resource and provide necessary timing relationships among threads. These properties cannot be expressed in an architecture-independent way through existing language constructs. (Even algorithms for MES, such as Dekker's algorithm, do not always work without a sufficient memory model.) Therefore, any concurrency system must provide abstractions to implement these properties.
• Because MES causes the most errors for programmers and the greatest difficulty for safe code optimizations, it should be implicit through concurrent language constructs.
• If the routine call is the basis for normal object communication, it should also be used for concurrency. Mixing mechanisms, such as routine call with message-passing/channels, is confusing and error prone, and may lose important capabilities such as static type checking.

Joining the fundamental concurrency properties with the class model is best done by associating thread and execution context with the class, and MES with member routines. This coupling and the interactions among the concurrency properties generate the programming abstractions in Table 1:

• Case 1 in Table 1 is a standard C++ object. Its member routines do not provide MES, and the caller's thread and stack are used to perform execution.
• Case 2 has all the properties of case 1, but only one thread at a time can be executing among the member routines with the MES property, called a "mutex member." Within a mutex member, synchronization with other tasks can be performed. This abstraction is a monitor, which is well understood and appears in many concurrent languages (Java, for instance).
• Case 3 is an object that has its own execution context but no MES or thread; the execution context is associated with a distinguished member in the object. This abstraction is a coroutine, which goes back to the roots of C++ in Simula.
• Case 4 is like case 3 but deals with concurrent access by adding MES. This abstraction is a coroutine monitor.
• Cases 5 and 6 are a thread without a stack, which is meaningless because a thread must have a stack to execute.
• Case 7 is an object that has its own thread and execution context but no MES. This case is questionable because explicit locking is now required to handle calls from other threads, which violates design principle 3.
• Case 8 is like case 7 but deals with concurrent access by adding MES. This abstraction is a task, which is an active object and appears in many concurrent languages (Ada).

Note that the abstractions are derived from fundamental properties, not from ad hoc decisions by a language designer, and each has a particular set of problems it can solve well. Simulating one abstraction with the others often results in awkward solutions that are inefficient; therefore, each has a place in a programming language.

µC++: Concurrency in C++
µC++ was designed using these concurrency design principles and engineered to provide high-level, integrated, lightweight, object-oriented concurrency for C++. By being high level, you can code in a race-free style, which eliminates the need for a complex memory model. By being integrated into the C++ language, the compiler can understand precisely when it can safely perform optimizations. Currently, µC++ is a translator that converts to C++, but its design ultimately assumes it is part of C++.
Figure 1 shows the syntax for adding the programming abstractions in Table 1 to C++. There are two new type constructors, _Coroutine and _Task, extensions of class that implicitly associate the execution-context and thread properties with objects. There are two new type qualifiers, _Mutex and _Nomutex, for qualifying member routines needing the mutual exclusion property and which contain synchronization. There are implicitly inherited members providing context switch and synchronization — suspend( ), resume( ), wait( ), signal( ), and signalBlock( ) — and one new statement, _Accept. Each of these new constructs is explained through examples.

Coroutine
A coroutine is not a concurrent abstraction, but it arises directly from the combination of fundamental concurrency properties, and it supports direct implementation of finite-state machines (FSMs). In µC++, the execution context (stack) for a coroutine object is associated with its distinguished member main; see Listing One. A coroutine type implicitly inherits the member routines resume and suspend, which provide control flow among coroutines. Like a class, a coroutine's public members define its interface, but they also provide the interaction with the coroutine's main; multiple public member routines allow complex, type-safe communication. The resume routine is called from the public members, and the suspend routine is called directly or indirectly from the coroutine's main. The first call to resume starts main, which executes on its own stack. Subsequent resumes restart at the last suspend from main. Routine suspend restarts the last resume executed by a public member. A coroutine object becomes a coroutine when main starts (first resume); the coroutine becomes an object again when main ends.
Listing Two is a simple FSM for recognizing phone numbers of the form (555)opt 123-4567, where the parenthesized area code is optional. Characters of the phone number are passed one at a time to the next member, which returns the current status of the parse. Note how the coroutine main retains its execution location and restarts there when it is resumed; for example, when parsing groups of digits, the coroutine suspends in the middle of a for loop and restarts within the particular loop when resumed.
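As a usage sketch (ours, not from the article, and assuming Listing Two's Phone is in scope), a µC++ program can drive the FSM one character at a time; uMain::main, described later, is the program's starting routine:

#include <iostream>
using namespace std;

void uMain::main() {
    Phone fsm;                                  // coroutine from Listing Two
    const char *number = "(555)123-4567";
    Phone::status s = Phone::MORE;
    for ( int i = 0; number[i] != '\0' && s == Phone::MORE; i += 1 ) {
        s = fsm.next( number[i] );              // each call resumes the FSM
    }
    cout << ( s == Phone::GOOD ? "valid" : "invalid" ) << endl;
}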
The killer application for a coroutine is device drivers, which cause 70–85 percent of failures in Windows/Linux. Many device drivers are FSMs parsing a protocol; for instance:

…STX…message…ESC ETX…message…ETX 2-byte crc…

Here, a network message begins with the control character STX and ends with an ETX, followed by a 2-byte cyclic redundancy check. Control characters can appear in the message if preceded by an ESC. An Ethernet driver is just a complex version of this simple protocol, and the FSM for the Ethernet protocol can be directly coded as a coroutine. Because FSMs can be complex and occur frequently in important domains, direct support of the coroutine is crucial, independent of concurrency.
object properties              member routine properties
thread    stack                No MES                  MES
No        No                   1  class                2  monitor
No        Yes                  3  coroutine            4  coroutine-monitor
Yes       No                   5  reject               6  reject
Yes       Yes                  7  reject               8  task

Table 1: Programming abstractions.
1: class (no stack, no thread, no MES)
class c {
  public:
    m() { }
};

2: monitor (no stack, no thread, MES)
_Mutex class M {        // or _Monitor
    uCondition variables;
  public:
    m() { wait/signal/accept }
};

3: coroutine (stack, no thread, no MES)
_Coroutine C {
    void main() { suspend }
  public:
    m() { resume }
};

4: coroutine monitor (stack, no thread, MES)
_Mutex _Coroutine CM {  // or _Comonitor
    uCondition variables;
    void main() { suspend/wait/signal/accept }
  public:
    m() { resume/wait/signal/accept }
};

8: task (stack, thread, MES)
_Task T {
    uCondition variables;
    void main() { wait/signal/accept }
  public:
    m() { wait/signal/accept }
};

Figure 1: µC++ constructs.
Monitor
A monitor is a concurrency abstraction that encapsulates a shared resource with implicit mutual exclusion and provides for complex synchronization among tasks using the resource; see Listing Three. Any member routine can be qualified with the MES qualifiers _Mutex/_Nomutex, indicating the presence or absence of MES, respectively. Only one thread at a time can be executing among the mutex routines of a monitor object; other threads calling mutex routines of the same object implicitly block. Recursive entry is allowed for the thread currently using the monitor; that is, it may call other mutex members. The MES qualifiers can also qualify a class, which defines the default qualification for the public member routines. Hence, the presence of a single mutex member in a class makes it a monitor. Member variables cannot be MES qualified. The destructor of a monitor is always _Mutex, because a thread terminating a monitor object must wait if another thread is executing in it.
The mutex property ensures exclusive access to the monitor's data by multiple threads. For simple cases, such as an atomic counter, exclusive access is sufficient and the order of access is unimportant. For complex cases, the order of access can be crucial for correctness; for example, one task may need to communicate information to another task and wait for a reply, or a resource may have strict ordering rules with respect to thread usage. Ordering is controlled by threads synchronizing among themselves within a monitor using condition variables and the operations wait( ), signal( ), and signalBlock( ), or an _Accept statement on mutex members.
A condition variable is a place where a task within the monitor can wait for an event to occur by another task using the monitor. Figure 2(a) illustrates the internal parts of a monitor object for synchronization with condition variables. Calling threads wait until no mutex member is active. A condition variable (for example, c) is a queue of waiting threads. The thread active in the monitor object waits on queue c by executing c.wait( ), which either implicitly restarts the last signalled thread or, if there are no signalled threads, releases the monitor lock so a new thread may enter. The active thread may execute c.signal( ) to restart a waiting thread at the front of a condition queue; due to the mutual exclusion property, the signalled thread can only restart after the signalling thread blocks or exits, which is accomplished by having the signalled thread wait temporarily on the hidden urgent condition. Alternatively, the active thread may execute c.signalBlock( ), which makes the active thread wait on the urgent queue and immediately starts the signalled thread at the front of the queue. Using these mechanisms, the order of access within the monitor can be precisely controlled.
Tasks within the monitor can also wait for an event caused by a calling task, using an accept statement. Figure 2(b) illustrates the internal parts of a monitor object for synchronization with an _Accept statement. _Accept selects which mutex member call to execute next (like Ada's select). _Accept(m1) unblocks the thread at the front of the m1 mutex queue after the accepter is implicitly blocked (like signalBlock). If there is no calling task, the accepter waits (on the hidden urgent queue) until a call to the specified member occurs. When the member call ends, the accepter implicitly restarts after the _Accept statement. _Accept can appear at any level of routine nesting in the monitor.

Figure 2: Monitor synchronization: (a) condition variable; (b) accept statement.

The _Accept statement can check multiple mutex members for calls:

_Accept( m1, m2, … );

The call on the first nonempty mutex queue is selected (so the order of member names is important); if no calls are pending, the accepter waits until a call occurs. Finally, each selected member can be separated and supplied with a guard:
_When( conditional-expression ) _Accept( m1 ) statement
else _When( conditional-expression ) _Accept( m2 ) statement
…
else statement
The guard must be true before a mutex queue is considered; if there is a terminating else, the accepter does not block, but rather polls for callers. The statement after an _Accept is executed by the accepter after the mutex call, allowing it to perform different actions depending on which call occurred.

Monitor Examples
Listing Four(a) shows the classic dating-service problem implemented with condition variables, where two kinds of tasks exchange information based on some criteria. In this case, there are girl and boy tasks exchanging phone numbers if they have matching compatibility codes (values 0–19). A girl task checks if a boy with the same code is waiting. If not, she waits; otherwise, she copies her phone number to the shared variable GirlPhoneNo and does a signalBlock to immediately restart the waiting boy while she waits on the urgent queue. The waiting boy restarts, copies his phone number to the shared variable BoyPhoneNo, and returns with the girl's phone number. The waiting girl is then implicitly restarted from the urgent queue after the boy returns, and she now returns with the boy's phone number.
Listing Four(b) shows the classic read/write problem implemented with _Accept, where multiple reader tasks can simultaneously read a resource, but writer tasks must be serialized to write the resource. Tasks access the resource like this:

ReadersWriter rw;

// reader task                 // writer task
rw.StartRead();                rw.StartWrite();
// read resource               // write resource
rw.EndRead();                  rw.EndWrite();
The variables rcnt and wcnt count the number of simultaneous reader or writer tasks using the resource. EndRead/EndWrite decrement the appropriate counter when a task finishes using the resource. StartRead checks if a writer is using the resource, and if so, accepts EndWrite, causing the reader task to wait on the urgent queue and preventing calls to any other mutex member. When the current writer finishes writing, it calls EndWrite; then the waiting reader implicitly restarts in StartRead and increments rcnt. StartWrite begins with the same check for a writer, and takes the same actions as a reader if a writer is using the resource. Alternatively, if there are rcnt readers using the resource, the writer loops and performs rcnt accepts of EndRead, one for each of the completing reader tasks. After the last reader finishes reading and completes its call to EndRead, the waiting writer implicitly restarts and increments wcnt. Because the accept statement strictly controls entry into the monitor, new (calling) tasks may not enter out of order.
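A further illustration (our own sketch, not from the article) is the classic bounded buffer: each mutex member uses an accept statement to block until its complementary operation makes progress possible:

_Mutex class BoundedBuffer {
    int elements[20];
    int front, back, count;
  public:
    BoundedBuffer() : front(0), back(0), count(0) {}
    void insert( int elem ) {
        if ( count == 20 ) _Accept( remove );   // full: wait until a consumer calls remove
        elements[back] = elem;
        back = ( back + 1 ) % 20;
        count += 1;
    }
    int remove() {
        if ( count == 0 ) _Accept( insert );    // empty: wait until a producer calls insert
        int elem = elements[front];
        front = ( front + 1 ) % 20;
        count -= 1;
        return elem;
    }
};

While an accept is outstanding, other callers implicitly block on the monitor, so the counters are always examined under mutual exclusion.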
Coroutine Monitor
The properties of a coroutine and a monitor can be combined to generate a concurrency abstraction for resource sharing and synchronization, along with retaining data and execution state; see Listing Five. A coroutine monitor is ideal for an FSM used by multiple threads, such as a shared formatter printer. The printer is the shared resource called by multiple threads to print each thread's data, and the printer can be a complex FSM organizing the data into rows and columns with appropriate markings and headings. Combining these fundamental properties into a single construct simplifies the job of developing the solution for a complex problem.

Task
The properties of a coroutine monitor can be combined with a thread to generate a concurrency abstraction for an active object that dynamically manages a resource; see Listing Six. Active objects appear in many concurrent languages. The use of both wait/signal on condition variables and accepting mutex members occurs frequently in a task (less so in a monitor). Finally, because the destructor is a mutex member, it can be accepted to determine when to terminate a task or monitor.
Listing Seven(a) shows a basic worker task, Adder, generating the subtotals for each row of a global matrix by summing the elements of a particular row. (Global variables are used to simplify the example.) In µC++, the member routine uMain::main serves as the program's main (starting) routine. This routine reads the matrix and starts a block that creates an array of Adder tasks, one for each row of the matrix. Each task's main starts implicitly after its constructor completes — no explicit start is needed. Similarly, no explicit join is needed, because the block containing the array of tasks cannot end until all the tasks in the array terminate; otherwise, the storage for the tasks could be deallocated while threads are executing. After all tasks in the block terminate, the block allocating the array of tasks ends, and the subtotals generated by each worker task can be safely summed to obtain the total. The constructor for each Adder task selects a specific row to sum by incrementing a shared variable; no mutual exclusion is necessary for the selection, as each task of the array is created serially. The main member of each Adder task adds the matrix elements for its particular row into its corresponding subtotal location.
Listing Seven(b) shows a classic administrator server, where clients call different members for service. The server may provide multiple interface members for different kinds of clients and client requests. A client's call may be serviced immediately or delayed using condition variables. The server's main loops, accepting client calls using the _Accept statement; an accepted call may complete immediately or require subsequent servicing and signalling of the delayed client. Finally, the server's destructor is also accepted, to know when the server is being deleted by another thread.

Miscellaneous µC++ Features
µC++ has a number of other features that integrate concurrency with C++:

• Both termination and resumption exception handling are supported, as well as the ability to raise exceptions among coroutines and tasks; see Listing Eight(a). For resumption, the stack is not unwound, and control returns after the _Resume when the handler completes. The _At clause provides nonlocal delivery of an exception to another coroutine or task. Nonlocal delivery of exceptions is controlled by _Enable and _Disable statements; see Listing Eight(b). Specifying no exceptions enables/disables all exceptions.
• The execution environment can be structured into multiple clusters of tasks and processors. Each cluster has a scheduler to control selection of its tasks to run on its processors; tasks and processors can migrate among clusters.
• C++ streams and UNIX files/sockets are augmented to be thread safe, object-oriented, and nonblocking; for example, safe stream I/O is performed like this:

isacquire( cin ) >> …;
osacquire( cout ) << … << endl;

The declaration at the start of the I/O expression provides the necessary locking of the specified stream for the duration of the expression. There are three classes for accessing sockets: uSocketServer, uSocketAccept, and uSocketClient, which hide most of the socket complexity and
support connectionless and connected protocols, with timeout capabilities.
• Basic real-time programming is available through three extensions of the task:

_RealTimeTask R {…};
_PeriodicTask P {…};
_SporadicTask S {…};

Fixed and dynamic priority schedulers are provided for use with clusters, including a transitive priority-inheritance protocol. The _Accept statement is extended to handle timeouts:

_Accept( M1, M2 ) {…}
else _Accept( M3 ) {…}
else _Timeout( 1 ) {…}    // restart after 1 second if no call
• There is a debug mode for testing, with many assertions and runtime checks, and µC++ also generates reasonable error messages. µC++ compiles on GCC 3.2 or greater and Intel icc 8.1/9.0, for Linux on Intel x86/Itanium and AMD 32/64, Solaris 8/9/10 on SPARC, and IRIX 6.x on MIPS.

Conclusion
Providing concurrency via low-level libraries such as pthreads makes no sense for C++. This approach is error prone and does not integrate with existing C++ mechanisms. Medium-level approaches that attempt to leverage existing language features with a concurrency library also fall short, as programmers still struggle with multiple coding conventions and limitations of use, and some primitive concurrency properties are still hidden from the compiler. To truly help you and the compiler, concurrent programming requires high-level concurrency models and constructs. The three fundamental properties in concurrency — thread, execution context, and mutual exclusion/synchronization — can be integrated directly into C++'s core programming notion, the class, and subsequently work with other C++ mechanisms. This approach retains the language's object-oriented programming model and provides multiple concurrency approaches and models, while requiring only a few new keywords and mechanisms in C++. µC++ is a full implementation of these ideas, providing a system that lets you tackle complex concurrent projects.

DDJ

Peter is an associate professor in the School of Computer Science at the University of Waterloo. He can be contacted at pabuhr@uwaterloo.ca. Richard is a research assistant in the School of Computer Science at the University of Waterloo, working in the Programming Languages Group. He can be contacted at rcbilson@plg.uwaterloo.ca.
Listing One

_Coroutine C {
    void main() {                        // distinguished member / executes on coroutine's stack
        …suspend()…                      // restart last resume
    }
  public:
    void m1(…) {… resume(); …}           // restart last suspend
    void m2(…) {… resume(); …}           // restart last suspend
};
Listing Two

_Coroutine Phone {
  public:
    enum status { MORE, GOOD, BAD };
  private:
    char ch;
    status stat;
    void main() {
        int i;
        stat = MORE;                              // continue passing characters
        if ( ch == '(' ) {                        // optional area code ?
            for ( i = 0; i < 3; i += 1 ) {
                suspend();
                if ( ! isdigit(ch) ) { stat = BAD; return; }
            }
            suspend();
            if ( ch != ')' ) { stat = BAD; return; }
            suspend();
        }
        for ( i = 0; i < 3; i += 1 ) {            // region code ?
            if ( ! isdigit(ch) ) { stat = BAD; return; }
            suspend();
        }
        if ( ch != '-' ) { stat = BAD; return; }  // separator ?
        for ( i = 0; i < 4; i += 1 ) {            // local code ?
            suspend();
            if ( ! isdigit(ch) ) { stat = BAD; return; }
        }
        stat = GOOD;
    }
  public:
    status next( char c ) {                       // pass one character at a time to FSM
        ch = c;
        resume();                                 // activate coroutine
        return stat;
    }
};
Listing Three
_Mutex class M {                                   // default MES for public member routines
    // SHARED DATA ACCESSED BY MULTIPLE THREADS
    uCondition c1, c2[10], *c3 = new uCondition;   // different condition variables
    // default for private/protected is no MES
    void m1(...) { ... /* MES statements */ ... }  // no MES
    _Mutex void m2(...) { ... /* MES statements */ ... };  // MES
  public:
    void m3(...) { ... /* MES statements */ ... }  // MES
    _Nomutex void m4(...) { ... }                  // no MES
    ...                                            // destructor is ALWAYS mutex
};
Listing Four
(a)
_Mutex class DatingService {
    uCondition Girls[20], Boys[20];
    int GirlPhoneNo, BoyPhoneNo;
  public:
    int Girl( int PhoneNo, int code ) {
        if ( Boys[code].empty() ) {
            Girls[code].wait();
            GirlPhoneNo = PhoneNo;
        } else {
            GirlPhoneNo = PhoneNo;
            Boys[code].signalBlock();
        }
        return BoyPhoneNo;
    }
    int Boy( int PhoneNo, int code ) {
        if ( Girls[code].empty() ) {
            Boys[code].wait();
            BoyPhoneNo = PhoneNo;
        } else {
            BoyPhoneNo = PhoneNo;
            Girls[code].signalBlock();
        }
        return GirlPhoneNo;
    }
};
(b)
_Mutex class ReadersWriter {
    int rcnt, wcnt;
  public:
    ReadersWriter() { rcnt = wcnt = 0; }           // constructors have no return type
    void EndRead() { rcnt -= 1; }
    void EndWrite() { wcnt -= 1; }
    void StartRead() {
        if ( wcnt == 1 ) _Accept( EndWrite );
        rcnt += 1;
    }
    void StartWrite() {
        if ( wcnt == 1 ) _Accept( EndWrite );
        else while ( rcnt > 0 ) _Accept( EndRead );
        wcnt += 1;
    }
};
Listing Five
_Mutex _Coroutine CM {                             // default MES for public member routines
    uCondition c1, c2[10], *c3 = new uCondition;   // different condition variables
    void m1(...) { ... /* MES statements */ ... }  // no MES
    _Mutex void m2(...) { ... /* MES statements */ ... };  // MES
    void main() { ... }                            // distinguished member / has its own stack
  public:
    void m3(...) { ... resume(); /* MES statements */ ... }  // MES
    _Nomutex void m4(...) { ... resume(); ... }    // no MES
    ...                                            // destructor is ALWAYS mutex
};
Listing Six
_Task T {                                          // default MES for public member routines
    uCondition c1, c2[10], *c3 = new uCondition;   // different condition variables
    void m1(...) { ... /* MES statements */ ... }  // no MES
    _Mutex void m2(...) { ... /* MES statements */ ... };  // MES
    void main() { ... }                            // distinguished member / has own stack / thread starts here
  public:
    void m3(...) { ... /* MES statements */ ... }  // MES
    _Nomutex void m4(...) { ... }                  // no MES
    ...                                            // destructor is ALWAYS mutex
};
Listing Seven
(a)
const int rows = 10, cols = 10;
int M[rows][cols], ST[rows];

_Task Adder {                        // add specific row
    static int row;                  // sequential access
    int myrow, c;
    void main() {
        ST[myrow] = 0;               // subtotal location
        for ( c = 0; c < cols; c += 1 )
            ST[myrow] += M[myrow][c];
    }
  public:
    Adder() { myrow = row++; }       // choose row
};
int Adder::row = 0;

void uMain::main() {
    // read matrix
    {
        Adder adders[rows];          // create threads
    }                                // wait for threads to terminate
    int total = 0;                   // sum subtotals
    for ( int r = 0; r < rows; r += 1 )
        total += ST[r];
    cout << total << endl;
}
(b)
_Task Server {
    uCondition delay;
    void main() {
        for ( ;; ) {                         // for each client request
            _Accept( ~Server ) {             // terminate ?
                break;
            // service each kind of client request
            } else _Accept( workReq1 ) {
                ...
                delay.signalBlock();         // restart client
                ...
            } else _Accept( workReq2 ) {
                ...
            }
        }
        // shut down
    }
  public:
    void workReq1( Req1_t req ) {
        ... delay.wait(); ...                // service not immediate ?
        // otherwise service request
    }
    void workReq2( Req2_t req ) { ... }
    ...
};
Listing Eight
(a)
_Throw  [ throwable-exception [ _At coroutine/task-id ] ] ;  // termination
_Resume [ resumable-exception [ _At coroutine/task-id ] ] ;  // resumption
(b)
_Enable <E1> <E2> ... {
    // exceptions E1, E2, ... delivered
}
_Disable <E1> <E2> ... {
    // exceptions E1, E2, ... not delivered
}
Native Queries for Persistent Objects Conquering the shortcomings of string-based APIs WILLIAM R. COOK AND CARL ROSENBERGER
William is an assistant professor of computer science at the University of Texas at Austin. Carl is chief software architect at db4objects. They can be contacted at [email protected] and [email protected], respectively.

While today's object databases and object-relational mappers do a great job in making object persistence feel native to developers, queries still look foreign in object-oriented programs because they are expressed using either simple strings or object graphs with strings interspersed. Let's take a look at how existing systems would express a query such as "find all Student objects where the student is younger than 20." This query (and the other examples in this article) assumes the Student class defined in Example 1. Different data access APIs express the query quite differently, as illustrated in Example 2. However, they all share a common set of problems:
• Modern IDEs do not check embedded strings for syntactic and semantic errors. In Example 2, both the field age and the value 20 are expected to be numeric, but no IDE or compiler checks that this is actually correct. If you mistyped the query code — changing the name or type of the field age, for example — all of the queries in Example 2 would break at runtime, without a single notice at compile time.
• Because modern IDEs will not automatically refactor field names that appear in strings, refactorings cause class model and query strings to get out of
sync. Suppose the field name age in the class Student is changed to _age because of a corporate decision on standard coding conventions. Now all existing queries for age would be broken and would have to be fixed by hand.
• Modern agile development techniques encourage constant refactoring to maintain a clean and up-to-date class model that accurately represents an evolving domain model. If query code is difficult to maintain, it delays decisions to refactor and inevitably leads to low-quality source code.
• All the queries in Example 2 operate against the private implementation of the Student class (student.age) instead of using its public interface (student.getAge( )/student.Age in Java/C#, respectively). Consequently, they break object-oriented encapsulation rules, disobeying the principle that interface and implementation should be decoupled.
• You are constantly required to switch contexts between implementation language and query language. Queries cannot use code that already exists in the implementation language.
• There is no explicit support for creating reusable query components. A complex query can be built by concatenating query strings, but none of the reusability features of the programming language (method calls, polymorphism, overriding) are available to make this process manageable. Passing a parameter to a string-based query is also awkward and error prone.
• Embedded strings can be subject to injection attacks.

Design Goals
Our goal is to propose a new approach that solves many of these problems. This article is an overview of the approach, not a complete specification. What if you could simply express the same query in plain Java or C#, as in Example 3? You could write queries without having to think about a custom query language or API. The IDE could actively help to reduce typos. Queries would be fully typesafe and accessible to the refactoring features of the IDE. Queries could also be prototyped,
tested, and run against plain collections in memory without a database back end. At first, this approach seems unsuitable as a database query mechanism. Naively executing Java/C# code against the complete extent of all stored objects of a class would incur a huge performance penalty
because all candidate objects would have to be instantiated from the database. A solution to this problem was presented in "Safe Query Objects" by William Cook and Siddhartha Rai [3]. The source code or bytecode of the Java/C# query expression can be analyzed and optimized by translating it to the underlying persistence system's query language or API (SQL [6], OQL [1,8], JDOQL [7], EJBQL [11], SODA [10], and so on), and thereby take advantage of indexes and other optimizations of a database engine. Here, we refine the original idea of safe query objects to provide a more concise and natural definition of native queries. We also examine integrating queries into Java and .NET by leveraging recent features of those language environments, including anonymous classes and delegates. Therefore, our goals for native queries include:
• 100-percent native. Queries should be expressed in the implementation language (Java or C#), and they should obey language semantics.
• 100-percent object oriented. Queries should be runnable in the language
itself, to allow unoptimized execution against plain collections without custom preprocessing.
• 100-percent typesafe. Queries should be fully accessible to modern IDE features such as syntax checking, type checking, refactoring, and so on.
• Optimizable. It should be possible to translate a native query to a persistence architecture's query language or API for performance optimization. This could be done at compile time or at load time by source code or bytecode analysis and translation.

Defining the Native Query API
What should native queries look like? To produce a minimal design, we evolve a simple query by adding each design attribute, one at a time, using Java and C# (.NET 2.0) as the implementation languages.

(a) // Java
public class Student {
    private String name;
    private int age;
    public String getName() { return name; }
    public int getAge() { return age; }
}

(b) // C#
public class Student {
    private string name;
    private int age;
    public string Name { get { return name; } }
    public int Age { get { return age; } }
}
Example 1: (a) Java class; (b) C# class.
Let's begin with the class in Example 1. Furthermore, we assume that we want to query for "all students that are younger than 20 where the name contains an f."
1. The main query expression is easily written in the programming languages; see Example 4.
2. We need some way to pass a Student object to the expression, as well as a way to pass the result back to the query processor. We can do this by defining a student parameter and returning the result of our expression as a Boolean value; see Example 5.
3. Now we have to wrap the partial construct in Example 5 into an object that is valid in our programming languages. That lets us pass it to the database engine, a collection, or any other query processor. In .NET 2.0, we can simply use a delegate. In Java, we need a named method, as well as an object of some class to put around the method. This requires, of course, that we choose a name for the method as well as a name for the class. We decided to follow the example that .NET 2.0 sets for collection filtering. Consequently, the class name is Predicate and the method name is match; see Example 6.
4. For .NET 2.0, we are done designing the simplest possible query interface. Example 6 is a valid object. For Java, our querying conventions should be standardized by designing an abstract base class for queries — the Predicate class (Example 7). We still have to alter our Java query object slightly by adding the extent type to comply with the generics contract (Example 8).
5. Although Example 8 is conceptually complete, we would like to finish the
(a)
String oql = "select * from student in AllStudents where student.age < 20";
OQLQuery query = new OQLQuery(oql);
Object students = query.execute();

(b)
Query query = persistenceManager.newQuery(Student.class, "age < 20");
Collection students = (Collection)query.execute();

(c)
Query query = database.Query();
query.Constrain(typeof(Student));
query.Descend("age").Constrain(20).Smaller();
IList students = query.Execute();
Example 2: (a) Object Query Language (OQL); (b) JDO Query Language (JDOQL); and (c) db4o SODA (using C#).
derivation of the API by providing a full example. Specifically, we want to show what a query against a database would look like, so we can compare it against the string-based examples given in the introduction. Example 9 completes the core idea. We have refined Cook/Rai's concept of safe queries by leveraging anonymous classes in Java and delegates in .NET. The result is a more concise and straightforward description of queries. Adding all required elements of the API in a step-by-step fashion lets us find the most natural and efficient way of expressing queries in Java and C#. Additional features, such as parameterized and dynamic queries, can be included in native queries using a similar approach [4]. We have overcome the shortcomings of existing string-based query languages and provided an approach that promises improved productivity, robustness, and maintainability without loss of performance.

Specification Details
A final and thorough specification of native queries is only possible after practical experience. Therefore, this section is speculative. We would like to point out where we see choices and issues with the native query approach and how they might be resolved. Regarding the API alone, native queries are not new. Without optimizations, we have merely provided "the simplest concept possible to run all instances of a class against a method that returns a Boolean value." Such interfaces are well known: Smalltalk-80 [2,5], for instance, includes methods to select items from a collection based on a predicate. Optimization is the key new component of native queries. Users should be able to write native query expressions and the database should execute them with performance on par with the string-based queries that we described earlier. Although the core concept of native queries is simple, the work needed to provide a solution is not trivial. Code written in a query expression must be analyzed and converted to an equivalent database query format. It is not necessary for all code in a native query to be translated. If the optimizer cannot handle some or all code in a query expression, there is always the fallback to instantiate the actual
(a) // Java
student.getAge() < 20 && student.getName().contains("f")
(a) // Java
student.getAge() < 20
(b) // C#
student.Age < 20 && student.Name.Contains("f")
(b) // C#
student.Age < 20
Example 4: (a) Java; (b) C#.
Example 3: (a) Java query; (b) C# query.
objects and to run the query expression code, or part of it, with real objects after the query has returned intermediate values. Because this may be slow, it is helpful to provide developers with feedback at development time. This feedback might include how the optimizer "understands" query expressions, and some description of the underlying optimization plan created for the expressions. This will help developers adjust their development style to the syntax that is optimized best, and will enable developers to provide feedback about desirable improved optimizations. How will optimization actually work? At compile or load time, an enhancer (a separate application or a plug-in to the compiler or loader) inspects all native query expressions in source code or bytecode, and generates additional code in the most efficient format the database engine supplies. At runtime, this substituted code is executed instead of the original Java/C# methods. This mechanism is transparent to developers after they add the optimizer to their compilation or build process (or both).
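To make the translation concrete, consider what an enhancer might substitute for the Java predicate of Example 9 when the back end is db4o. The target would be the SODA API shown in Example 2(c), here in its Java form; the exact generated code is up to the optimizer, so the following is only an illustrative sketch (it assumes the Student class of Example 1):

import com.db4o.ObjectContainer;
import com.db4o.ObjectSet;
import com.db4o.query.Query;

public class TranslatedQuery {
    // Equivalent of: student.getAge() < 20 && student.getName().contains("f")
    public static ObjectSet runOptimized(ObjectContainer db) {
        Query query = db.query();
        query.constrain(Student.class);
        query.descend("age").constrain(new Integer(20)).smaller();
        query.descend("name").constrain("f").contains();
        return query.execute();   // the engine can now use field indexes
    }
}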
Our peers have expressed doubts that satisfactory optimization is possible. Because both the native query format and the native database format are well defined, and because the development of an optimizer can be an ongoing task, we are very optimistic that excellent results are achievable. The first results that Cook/Rai produced with a mapping to JDO implementations are very encouraging. db4objects (http://www.db4o.com/) already shows a first preview of db4o with unoptimized native queries today, and plans to ship a production-ready Version 5.0 with optimized native queries. Ideally, any code should be allowed in a query expression. In practice, restrictions are required to guarantee a stable environment and to place an upper limit on resource consumption. We recommend:
• Variables. Variable declarations should be legal in query expressions.
• Object creation. Temporary objects are essential for complex queries, so their creation should also be supported in query expressions.
• Static calls. Static calls are part of the concept of OO languages, so they should be legal.
• Faceless. Query expressions are intended to be fast. They should not interact with the GUI.
• Threads. Query expressions will likely be triggered in large numbers. Therefore, they should not be allowed to create threads.
// Java
public abstract class Predicate<ExtentType> {
    public Predicate() {}
    public abstract boolean match(ExtentType candidate);
}
Example 7: Predicate class.
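The Predicate class also makes the "100-percent object oriented" goal easy to demonstrate: the very same predicate subclass used in Examples 8 and 9 can filter a plain in-memory collection, with no database and no preprocessing. A minimal sketch — the CollectionQuery helper is our own illustration, not part of any proposed standard:

import java.util.ArrayList;
import java.util.List;

public class CollectionQuery {
    public static <T> List<T> query(List<T> candidates, Predicate<T> predicate) {
        List<T> matches = new ArrayList<T>();
        for (T candidate : candidates) {
            if (predicate.match(candidate)) {   // run the native query expression directly
                matches.add(candidate);
            }
        }
        return matches;
    }
}

This is exactly the unoptimized fallback path: correct, typesafe, and testable against ordinary lists, just not index-assisted.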
// pseudoJava
(Student student){
    return student.getAge() < 20 && student.getName().contains("f");
}

// pseudoC#
(Student student){
    return student.Age < 20 && student.Name.Contains("f");
}
Example 5: PseudoJava and pseudoC#.

(a) // Java
new Predicate(){
    public boolean match(Student student){
        return student.getAge() < 20 && student.getName().contains("f");
    }
}

(b) // C#
delegate(Student student){
    return student.Age < 20 && student.Name.Contains("f");
}
Example 6: (a) Java; (b) C#.
• Security restrictions. Because query expressions may actually be executed with real objects on the server, there need to be restrictions on what they are allowed to do there. It would be reasonable to allow and disallow method execution and object creation in certain namespaces/packages.
• Read only. No modifications of persistent objects should be allowed within running query code. This limitation guarantees repeatable results and keeps transactional concerns out of the specification.
• Timeouts. To allow for a limit to the use of resources, a database engine may choose to time out long-running query code. Timeout configuration does not have to be part of the native query specification, but it should be recommended to implementors.
• Memory limitation. Memory limitations can be treated like timeouts. A configurable upper memory limit per query expression is a recommended feature for implementors.
• Undefined actions. Unless explicitly not permitted by the specification, all constructs should be allowed.

new Predicate <Student> () {
    public boolean match(Student student){
        return student.getAge() < 20 && student.getName().contains("f");
    }
}
Example 8: Adding the Java extent type.
It seems desirable that processing should continue after any exception occurs in query expressions. A query expression that throws an uncaught exception should be treated as if it returned false. There should be a mechanism for developers to discover and track exceptions. We recommend that implementors support both exception callback mechanisms and exception logging. The sort order of returned objects might also be defined using native code. An exact definition goes beyond the scope of this article but, using a Java comparator, a simple example might look like Example 10. This code should be runnable both with and without an optimization processor. Querying and sorting could be optimized to be executed as one step on the database server, using the sorting functionality of the database engine.

Conclusion
There are compelling reasons for considering native queries as a mainstream standard. As we have shown, they overcome the shortcomings of string-based APIs. The full potential of native queries will be explored with their use in practice. They have already been demonstrated to provide high value in these areas:
• Power. Standard object-oriented programming techniques are available for querying.
• Productivity. Native queries enjoy the benefits of advanced development tools, including static typing, refactoring, and autocompletion.
• Standard. What SQL has never managed to achieve because of the diversity of
(a) // Java
List<Student> students = database.query<Student>(
    new Predicate<Student>() {
        public boolean match(Student student){
            return student.getAge() < 20 && student.getName().contains("f");
        }
    });

(b) // C#
IList<Student> students = database.Query<Student>(
    delegate(Student student){
        return student.Age < 20 && student.Name.Contains("f");
    });
Example 9: (a) Java; (b) C#.

// Java
List<Student> students = database.query<Student>(
    new Predicate<Student>() {
        public boolean match(Student student){
            return student.getAge() < 20 && student.getName().contains("f");
        }
    });
Collections.sort(students, new Comparator<Student>(){
    public int compare(Student student1, Student student2) {
        return student1.getAge() - student2.getAge();
    }
});
Example 10: Defining the sort order of returned objects.
SQL dialects may be achievable for native queries: Because the standard is well defined by programming-language specifications, native queries can provide 100-percent compatibility across different database implementations.
• Efficiency. Native queries can be automatically compiled to traditional query languages or APIs to leverage existing high-performance database engines.
• Simplicity. As shown, the API for native queries is only one class with one method. Hence, native queries are easy to learn, and a standardization body will find them easy to define. They could be submitted as a JSR to the Java Community Process.

Acknowledgments
Thanks to Johan Strandler for his posting to a thread at TheServerSide that brought the two authors together, Patrick Roomer for getting us started with first drafts of this paper, Rodrigo B. de Oliveira for contributing the delegate syntax for .NET, Klaus Wuestefeld for suggesting the term "native queries," Roberto Zicari, Rick Grehan, and Dave Orme for proofreading drafts of this article, and to all of the above for always being great peers to review ideas.

References
[1] Cattell, R.G.G., D.K. Barry, M. Berler, J. Eastman, D. Jordan, C. Russell, O. Schadow, T. Stanienda, and F. Velez, editors. The Object Data Standard ODMG 3.0. Morgan Kaufmann, 2000.
[2] Cook, W.R. "Interfaces and Specifications for the Smalltalk Collection Classes." OOPSLA, 1992.
[3] Cook, W.R. and S. Rai. "Safe Query Objects: Statically Typed Objects as Remotely Executable Queries." G.C. Roman, W.G. Griswold, and B. Nuseibeh, editors. Proceedings of the 27th International Conference on Software Engineering (ICSE), ACM, 2005.
[4] db4objects (http://www.db4o.com/).
[5] Goldberg, A. and D. Robson. Smalltalk-80: The Language and Its Implementation. Addison-Wesley, 1983.
[6] ISO/IEC. Information technology — database languages — SQL — Part 3: Call-level interface (SQL/CLI). Technical Report 9075-3:2003, ISO/IEC, 2003.
[7] JDO (http://java.sun.com/products/jdo/).
[8] ODMG (http://www.odmg.org/).
[9] Russell, C. Java Data Objects (JDO) Specification JSR-12. Sun Microsystems, 2003.
[10] Simple Object Database Access (SODA) (http://sourceforge.net/projects/sodaquery/).
[11] Sun Microsystems. Enterprise JavaBeans Specification, Version 2.1. 2002 (http://java.sun.com/j2ee/docs.html).

DDJ
Dynamic Bytecode Instrumentation A new way to profile Java applications IAN FORMANEK AND GREGG SPORAR
Profiling Java applications is often considered a black art. There are tools available to help you track down performance bottlenecks and memory-allocation problems, but they are not always easy to use, particularly when profiling large applications. Because large applications tend to be those that most need profiling, this presents a significant problem. It is therefore not surprising that Java applications have a reputation for running slowly — in many cases, this is solely because no performance-related analysis and tuning has been done. Dynamic bytecode instrumentation is an innovative solution to these problems. It lets you control precisely which parts of an application are profiled. As a result, only relevant information is reported, and the impact on application performance is reduced so that even large applications can be profiled easily.
Gregg is a technology evangelist for Sun Microsystems and Ian is project lead and architect of the NetBeans Profiler. They can be contacted at [email protected] and [email protected], respectively.
Obstacles to Profiling
The two biggest obstacles to profiling have been runtime overhead and interpretation of the results. The runtime overhead imposed by a Java profiler can be a showstopper. The instrumentation added by the profiler can cause the application to run differently — which may change the performance problem symptoms, making it harder to find the cause of the problem. At the very least, if the application is running more slowly, it will take longer for it to get to the point where problems occur. In a worst-case scenario, the application might not even run correctly at all — unexpected timeouts caused by slower performance could result in an application crash. After wrestling with those issues, you then have to interpret the results produced by the profiler. This can be overwhelming, even when working on a small application. For large applications, it can be a serious impediment to tracking down the cause of the problem. The larger the application, the higher the likelihood it has code in it that you did not write and therefore might not understand. This is particularly true for web and enterprise applications with several layers of abstraction that are specifically designed to be hidden from developers. Additionally, larger applications tend to have larger heaps and more threads. Profilers will deliver information on all of these things, which is usually more information than you can interpret efficiently. Filters are provided, but frequently are not precise enough and can therefore end up excluding useful information.
Traditional Profiler Technologies
There are two Java APIs available for profiling — the Java Virtual Machine Profiling Interface (JVMPI) and the Java Virtual Machine Tool Interface (JVMTI). Most Java profiling tools use one of these APIs for doing instrumentation of an application and for notification of Virtual Machine (VM) events.
“Dynamic bytecode instrumentation lets you control precisely which parts of an application are profiled”

The most widely used profiling technique is bytecode instrumentation. The profiler inserts bytecode into each class. For CPU performance profiling, these bytecodes are typically methodEntry( ) and methodExit( ) calls. For memory profiling, the bytecodes are inserted after each new or after each constructor. All of this insertion of bytecodes is done either by a postcompiler or a custom class loader. The key limitation of this technique is that once a class has been modified, it does not change. That lack of flexibility causes problems. If you choose to profile the entire application, the overhead of all that instrumentation can cause serious performance problems. But if you use a package or class-based filter to limit the instrumentation, you might end up not profiling an important part of the application. Further, if changing the instrumentation requires restarting the application, the profiling process slows down considerably.
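At the source level, the inserted bytecodes amount to something like the following. This is only a conceptual sketch — the profiler actually rewrites the class file, and the Profiler runtime class and numeric method id here are illustrative, not a real API:

class Profiler {
    static void methodEntry(int methodId) { /* push a timestamp for this thread */ }
    static void methodExit(int methodId)  { /* pop it and accumulate elapsed time */ }
}

class InstrumentedExample {
    void doWork() {
        Profiler.methodEntry(17);        // injected by the profiler
        try {
            // ... original body of doWork() ...
        } finally {
            Profiler.methodExit(17);     // injected; runs even if the body throws
        }
    }
}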
An alternative to bytecode instrumentation is to monitor events generated by the JVMPI or JVMTI interface. Both APIs generate events for method entry/exit, object allocation, monitor entry, and so on. But there are some drawbacks: Using these features disables VM optimizations, and not much flexibility is provided. As a result, most profilers avoid using these features, although there are some VM events that must be tracked this way (garbage collection, for instance).
A third approach is to use sampling instead of instrumentation or event monitoring. The profiler periodically examines the stack of each thread in the VM. The interval is typically between 10 and 100 milliseconds (ms). The stack traces are recorded, and the method at the top of the stack is assumed to have been running since the last examination. So if the interval is 10 ms and method foo( ) is at the top of the stack four times, then the profiler will report that method foo( ) ran for 40 ms. This technique imposes little overhead, but suffers from some drawbacks because the results are generally less accurate. Most importantly, there is no information about the number of times foo( ) was called. Another drawback is that the profiling overhead grows as the number of threads grows. And finally, sampling provides no solution for memory profiling, so if your application has memory-allocation problems, a profiler that does only sampling will be of no use.
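For comparison, the whole sampling approach fits in a few lines. The following toy version uses Thread.getAllStackTraces( ) (available since Java 5) purely to illustrate the estimation; production samplers work through JVMTI:

import java.util.HashMap;
import java.util.Map;

public class Sampler extends Thread {
    private static final long INTERVAL_MS = 10;
    // method name -> estimated running time; note there is no call count
    private final Map<String, Long> estimatedTime = new HashMap<String, Long>();

    public void run() {
        try {
            while (!isInterrupted()) {
                for (StackTraceElement[] stack : Thread.getAllStackTraces().values()) {
                    if (stack.length == 0) continue;
                    // Charge the whole interval to the method on top of the stack --
                    // the source of the inaccuracy described above.
                    String top = stack[0].getClassName() + "." + stack[0].getMethodName();
                    Long soFar = estimatedTime.get(top);
                    estimatedTime.put(top, Long.valueOf((soFar == null ? 0L : soFar.longValue()) + INTERVAL_MS));
                }
                Thread.sleep(INTERVAL_MS);
            }
        } catch (InterruptedException done) {
            // sampling stopped
        }
    }
}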
Figure 1: CPU performance profiling options.
Dynamic Bytecode Instrumentation
A new approach, profiling with dynamic bytecode instrumentation (DBI), offers some important benefits. With DBI, the insertion of instrumentation bytecodes is done at runtime. After you have selected a class or method for profiling, the next time it is invoked by the application, the profiler will insert the necessary instrumentation. This is done using
Figure 2: Information overload. Every application thread is listed.
Figure 3: Incomplete performance results for the run() method.
Figure 4: Complete performance results for the run() method.
the JVMTI's redefineClass(Class[] classes, byte[][] newBC) method. This means you can change the profiling overhead as the application is running. You can even remove all instrumentation so that your application is no longer being profiled. These changes can be made adaptively as you learn more about the behavior of the application. Perhaps best of all, the changes can be made without restarting your application.

CPU Performance Profiling
To get the maximum benefit from DBI, the profiler has to provide additional functionality. For CPU performance profiling, this means intelligent usage of DBI so that only the methods that need profiling have bytecodes added for instrumentation. Performance problems are typically associated with certain operations or functionality in an application. As an example, in a reporting system, a specific report might take longer than others to display. Once you identify the specific method that starts the execution of the feature with the problem, the profiler should be able to instrument just that method, allowing the rest of your application to run at full speed. Limiting the profiling also reduces the output from the profiler so that it only reports on the interesting parts of your application. The top-level methods you select for profiling are root methods. To provide a complete picture of the performance problem, the profiler must instrument not only the selected root method(s), but also all methods that they invoke. This analysis must be done iteratively so that the profiler calculates the transitive closure of the call chain. When an instrumented method is invoked by the application, the instrumentation code must efficiently determine if a root method is in the call chain. If no root method is in the call chain, then the instrumentation code exits immediately to minimize overhead. If a root method is in the call chain, then profiling information is recorded. A profiler that lets you select root methods and then applies DBI intelligently allows you to define the problem in your terms. Filters based on package and class names are still useful, but selecting root methods is a much more precise way of controlling profiling overhead while maintaining useful results in the profiler's output.
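For Java 5 VMs, the same class-swap primitive is also exposed at the Java level through the java.lang.instrument API, which a profiler agent can use instead of calling JVMTI directly. A hedged sketch — producing the instrumented bytecode is the rewriter's job (not shown), and the agent's manifest must declare Can-Redefine-Classes: true:

import java.lang.instrument.ClassDefinition;
import java.lang.instrument.Instrumentation;

public class Redefiner {
    private final Instrumentation inst;   // handed to a -javaagent premain() method

    public Redefiner(Instrumentation inst) { this.inst = inst; }

    // Swap in freshly instrumented (or de-instrumented) bytecode at runtime.
    public void swap(Class target, byte[] newBytecode) throws Exception {
        inst.redefineClasses(new ClassDefinition[] {
            new ClassDefinition(target, newBytecode)
        });
    }
}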
Memory Profiling
As with CPU performance profiling, to get maximum benefit from DBI when doing memory profiling, a profiler must use additional logic. This means smarter techniques for keeping track of objects because that is the largest source of overhead when doing memory profiling. It turns out the smartest technique is one of the simplest: Track only a subset of the allocated objects. Tracking as few as 10 percent of the allocated objects usually yields the same information, but with much less overhead. This is particularly true on larger applications. A profiler should not only track object allocations, it should also help you determine if your application is leaking memory. A memory leak occurs when your application is no longer using an object but is inadvertently still holding a reference to that object, thereby preventing the VM from freeing up the memory used by that object. The sooner you can identify the objects that are potential leaks, the sooner you can reduce profiling overhead by having the profiler use DBI to remove instrumentation from all other classes. Identifying potential leaks can be difficult, especially in large applications. The memory heap in a large application can be so big that attempting to compare heap snapshots frequently offers little help because the leaking objects are not obvious. This is especially true for leaks that start out small but continue to grow over time; sometimes weeks of application uptime are required before the problem is large enough to be noticeable. A statistical approach for identifying memory leaks works well and imposes very little overhead. There are two key metrics: the age of each object and the generation count for each class. The age of an object is the number of garbage collections it has survived. The generation count for a class is the number of different ages of all objects of that class. A low generation count indicates that all the objects of a class have been in memory about the same amount of time. A high generation count (or a generation count that always increases) indicates that your application is continuing to allocate new objects of that class without letting go of references to older objects of that class. These are the best candidates for being memory leaks because applications do not typically intend to periodically allocate long-lived objects. Instead, they typically have long-lived objects that are allocated at roughly the same time and short-lived objects that are periodically allocated and then removed by the VM once they are no longer needed.
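The generation-count bookkeeping just described reduces to very little code. In the following hedged sketch, a real profiler would tag live objects through JVMTI rather than keep a map like this; it only shows the arithmetic behind the metric:

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class GenerationCount {
    // class name -> set of distinct ages (GCs survived) among live instances
    private final Map<String, Set<Integer>> agesByClass = new HashMap<String, Set<Integer>>();

    public void recordLiveObject(String className, int gcsSurvived) {
        Set<Integer> ages = agesByClass.get(className);
        if (ages == null) {
            ages = new HashSet<Integer>();
            agesByClass.put(className, ages);
        }
        ages.add(Integer.valueOf(gcsSurvived));
    }

    // A value that keeps growing as the application runs marks a leak candidate.
    public int generationCount(String className) {
        Set<Integer> ages = agesByClass.get(className);
        return ages == null ? 0 : ages.size();
    }
}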
CPU Performance Profiling Example
We will use the NetBeans Profiler (http://profiler.netbeans.org/) to demonstrate DBI. For CPU performance profiling, we chose the Java2D program, which is one of the demos included with Sun's Java Development Kit (JDK). When we started the profiling session, we configured the NetBeans Profiler to perform instrumentation of the entire application, as in Figure 1. This caused a dramatic slowdown in application startup time: 10 seconds instead of 2 seconds. Application performance was also sluggish. When run without profiling, the application was able to keep up with a rapid series of mouse clicks on its top-level display tabs. When run with full instrumentation, it was not able to keep up. An additional problem is that, with full instrumentation, it took longer to find the information we needed. We were interested in the performance of the java2d.AnimatingSurface.run( ) method. With full instrumentation, we had to wade through 13 threads to find the three that invoke the run( ) method; see Figure 2.
We then restarted the profiling session and applied a filter that removed the instrumentation from all java.* classes. Application performance improved dramatically, but the comprehensiveness of the reported results suffered quite a bit. Compare Figure 3 with Figure 4. In Figure 3, the java.* classes were excluded and as a result important details about how much time was spent executing the methods called by the run( ) method are not available. Figure 4 shows the output when all classes were profiled. To get acceptable application performance and comprehensive results, we used a DBI feature of the NetBeans Profiler to change the instrumentation. We removed
Figure 5: Only the interesting threads are listed.
Figure 6: Memory profiling options.
Figure 7: Default heap view sorted by size.
the filter that had eliminated the java.* classes and we chose a single root method: java2d.AnimatingSurface.run( ). As a result, application performance was very responsive and it was much easier to find the information we needed; only the three threads that had invoked the run( ) method were displayed, as in Figure 5. And because no java.* classes were filtered out, full details of the run( ) method were available; see Figure 4. This application is relatively small, but the lessons learned apply to larger applications as well. When full profiling was done, 19,579 methods were instrumented. Adding a filter to remove the java.* classes dropped the number of methods with instrumentation to 712, but also reduced the amount of information available. With a single root method selected and the java.* classes no longer filtered out, the
number of methods with instrumentation dropped to 344. Additionally, we still had comprehensive profiling results for the method that was of interest.

Memory Profiling Example
The sample application we created to demonstrate memory profiling contains a common Java antipattern: Objects are periodically placed into a Map and then forgotten. This prevents the VM from removing those objects from memory. Given enough time, the application eventually runs out of memory and crashes. The objects are small, however, so the amount of time before a crash occurs could be days or weeks. Instead of waiting for it to crash, we used the NetBeans Profiler to watch the sample application's memory usage. The NetBeans Profiler can track just allocations, or allocations and garbage
Figure 8: Heap view sorted by generation count.
collection. We chose the second option and also to track 10 percent of allocations, as in Figure 6. The NetBeans Profiler has a dynamic display of the heap contents, as in Figure 7. By default, it lists classes by the amount of space that the objects of that class are using. When we switched the display order to be by generation count and then let the application run for a while, the memory leak was easy to spot. Three classes had generation count values that were noticeably larger and were continuing to increase as the application ran (Figure 8). The three classes are java.util.HashMap$Entry, double[ ], and float[ ]. After those were identified, we used a DBI feature within the NetBeans Profiler to turn off the instrumentation of all other classes. This reduced profiling overhead dramatically. Knowing which classes are leaking is usually only part of the challenge because frequently there are leaking and nonleaking instances of a class. In other words, some of the allocations are okay, some are not. The NetBeans Profiler helped us find the allocations that are leaking by showing us stack traces for each allocation. The top-level list of stack traces in Figure 9 lists two methods that are allocating HashMap$Entry objects: addEntry( ) and createEntry( ). The generation count for addEntry( ) is dramatically larger than for createEntry( ), so it is the more likely source of the leak. Expanding the stack trace of addEntry( ) shows that it is called by put( ), which in turn is called by several different methods. Of all the different execution paths that result in the creation of HashMap$Entry objects, one has a generation count that is dramatically larger than the rest. That code path begins with the run( ) method in the aptly named demo.memoryleak.LeakThread class, as in Figure 10. The NetBeans Java editor was used to display the source code, which allowed us to spot the memory leak bug: map.put (new float[1], new double [1]);
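A distilled version of the leaking class looks roughly like the following — this is our reconstruction for illustration, not the authors' actual demo source; only the map.put( ) line is quoted from the article:

import java.util.HashMap;
import java.util.Map;

public class LeakThread extends Thread {
    // Entries accumulate forever: each new float[1] key has identity-based
    // hashCode/equals, so it is unique and can never be looked up or removed.
    private final Map map = new HashMap();

    public void run() {
        while (true) {
            map.put(new float[1], new double[1]);   // the leak: the key is forgotten immediately
            try {
                Thread.sleep(100);                  // slow growth -- noticeable only over time
            } catch (InterruptedException e) {
                return;
            }
        }
    }
}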
Figure 9: The two top-level stack traces for HashMap$Entry allocations.
Conclusion
You do not have to endure untenable amounts of overhead and an overwhelming amount of output to profile your Java applications. Dynamic bytecode instrumentation, when properly applied, can make profiling fast and easy.

References
Dmitriev, Mikhail. "Design of JFluid: A Profiling Technology and Tool Based on Dynamic Bytecode Instrumentation," http://research.sun.com/techrep/2003/abstract-125.html.
Figure 10: Expanded stack trace for HashMap$Entry allocations.
DDJ
Range Tracking & Comparison Algorithms Managing/comparing ranges denoted by base/limit pairs KIRK J. KRAUSS
Some information is best viewed as a list of ranges. Most software developers are familiar with the concept of virtual address ranges occupied by memory heaps, stacks, executable code modules, and the like. Application programs that need awareness of such ranges might store a range list that gets updated over time; for example, when a heap shrinks or grows. Similar range lists could also be used in other applications to represent sections of an image in a motion picture, or age groups of participants in a long-term study, or layers in a collection of sediment samples, or candidate choices at exit polls across the country, and so on. Range-based data is commonly used by accounting, vision, storage, and operating-system software, and elsewhere. In any of these examples, a range might represent a contiguous set of items, such as the depth of certain strata in a sediment sample, or opinions of voters between 25 and 35 years old. Each range in the list might be represented as a pair of numbers — one number that represents the base of the range and another number that represents that range's size or limit. To keep a list of ranges up to date and useful, you need algorithms for the creation, modification, comparison, and deletion of the numeral pairs that represent the ranges in the list. Range-list comparison comes in handy when the ranges can change in size. For example, suppose a range list is used to
Kirk is a software developer working in the Rational Software division of IBM. He can be contacted at [email protected].
represent heap memory in a virtual address space, where ranges of heap memory are listed as a series of base/limit pairs. Each base represents the bottom address of a range of contiguous virtual address space that the operating system has committed for use by that heap. Each limit represents the top of such a range. As the heap grows and/or shrinks, a new list of base/limit pairs is generated for comparison with the existing list. A diff comparison of an existing list and a new, updated one then selects the areas of heap growth and shrinkage, possibly for use in memory-state tracking or similar purposes.

Building and Managing Lists
Range-list creation and destruction are implemented via a pair of algorithms (Listings One and Two) that assume that the ranges to be tracked reside in the virtual address space for a process. You might need to make some minor implementation changes to apply these algorithms to other specific types of range data. To use these algorithms, you should provide code to pinpoint the appropriate beginning and ending points of an item's range, at lines in the pseudocode that are identified via relevant text. You should also provide code that allocates, initializes, and deallocates the range-tracking elements themselves, also at points shown as text in the listings. A notable feature of the range-list creation algorithm is the coalescing of ranges for contiguous items. Multiple objects that occupy contiguous virtual address space are represented by a single range-tracking element. This is done to achieve optimal performance when these range lists are accessed in situations such as heap-range tracking where the boundaries between contiguous items aren't important. If you need to keep track of boundaries between separate ranges even when they are contiguous, you can simplify the algorithm so that it doesn't coalesce contiguous ranges. A part of a range list can be replaced by splicing in a new set of ranges using the Splice Range List algorithm (Listing Three, available electronically; see "Resource Center," page 6).
This saves you from having to replace an entire list when updates are needed for just part of the list. This pseudocode assumes that the ranges to be tracked reside in the virtual address space for a process. With some minor implementation changes, the algorithm could be applied to other types of range data. You should provide code that accesses the appropriate list elements wherever this pseudocode contains statements such as
"look up the next range on the new list." You should also provide code that allocates, initializes, and deallocates the range-tracking elements themselves, at the element insertion and deletion points specified in this pseudocode. The startAddr and stopAddr input parameters denote virtual addresses in the context of this pseudocode. More generically, these values correspond to beginning and ending delimiters that apply to both the old and new range lists. Old list elements corresponding to ranges that do not reside between these delimiters are left in place. New list elements effectively replace the old ones between these delimiters. In some conditions, where an old range spans one of these delimiters, an old list element may be modified to reflect the changes represented via the new list. The pseudocode provides special case logic to address these conditions. Comparing lists of base/limit pairs requires a more complex algorithm than one might first suspect. Before I developed
such an algorithm, a coworker and I first searched for any comparable base/limit, list-wise diff routine. We searched through textbooks and web pages showing algorithms used in computer vision and DNA sequencing. Because we could find no such algorithm, I wrote my own. This algorithm is suitable for any situation where lists of ranges represented by base/limit pairs need to be compared. In the pseudocode for the range-list comparison algorithm (Listing Four, available electronically), I've chosen to represent the two lists as "old" and "new" because diff algorithms are probably most often used to determine which elements of a dataset have been updated between two points in time. Figure 1 shows the kind of information supplied by the range-list comparison algorithm. Given an old and new list, this algorithm indicates which subsets of the list have changed, including whether the changes represent data specific to the old list or the new list. A notable aspect of the algorithm is the number of Boolean variables needed to control the passage through the old and new lists. It is coded this way largely for "debugability" because the state information at any point in the algorithm is rather complex. There is also a performance consideration: This algorithm frequently needs to know whether there is a current and next element on each list, and to repeatedly dig into the list elements, as that information could be costly. Another performance optimization is that the number of comparisons of list elements themselves has been minimized. For each pair of list elements, the algorithm checks for range overlap, for growth or shrinkage at the bottom of the range, and for growth or shrinkage at the top of the range — and only then does it bump forward to do the same thing with the next element(s) of one or both lists. As with the splicing algorithm, to implement range-list comparison, you should provide code that accesses the appropriate list elements wherever this pseudocode contains statements such as "look up the next new range." The diff results are collected at points designated "tell the outside world <whatever>." You could add function calls at these points to update appropriate tracking elements in other lists, and so on. Alternatively, you could collect elements of a third list at these points and return a reference to those results when the diff is complete. A potentially time-saving feature of the range-list comparison algorithm is that, like the splicing algorithm, it lets you specify the boundaries applied to the beginning and ending points of the list. Suppose you're interested in comparing only the ranges of objects that reside within a portion of a heap occupying a subset of the virtual address space. You can specify starting and stopping addresses to obtain a list applicable to the interesting portion of virtual memory. The algorithm ignores any objects that lie outside of these specified boundaries, which you specify as input parameters. No matter what kind of ranges you want to compare, if you want to limit the comparison to part of a larger list, you can apply these parameters to limit the set of ranges that will be compared, to save time. DDJ

Figure 1: Information supplied by the range-list comparison algorithm. [The figure diagrams an old range list and a new range list laid out between the lower and upper ends of the overall space, with spans present only in the new list marked "added" and spans present only in the old list marked "removed."]

Listing One
/* The Create Range List algorithm.  Tracks a current set of      */
/* address ranges for the objects that meet specific criteria.    */
/* Coalesces the ranges of a contiguous set of these objects into */
/* a single range list element, minimizing the list's size and    */
/* optimizing subsequent range list diff performance.  Returns    */
/* the list of ranges that are identified and tracked here.       */

/* Input parameters */
ADDRESS startAddr
ADDRESS stopAddr

/* Local variables */
LIST a new list of ranges
BOOL bListRange = FALSE

/* All other variables are references to address range tracking */
/* list elements. At minimum, there will be a reference to the   */
/* "current" element in the new list being created.              */

Allocate and initialize the header for the new list of ranges
IF (startAddr) DO
    Determine the first object that resides at or above startAddr
    Make that object "current"
END
ELSE DO
    Determine the first object that can be found
    Make that object "current"
END

/* Walk through the address space, building the list of address */
/* ranges that meet a user-defined set of selection criteria    */
WHILE ((the address of the "current" object) < stopAddr) DO
    IF (the "current" object meets range listability selection criteria)
        bListRange = TRUE
    ELSE
        bListRange = FALSE
    IF (bListRange) DO
        IF (there is a "current" range tracking element) DO
            /* Add up the sizes of any contiguous objects that */
            /* meet the selection criteria for our list.       */
            Add the "current" object's size to the size of the "current" range tracking element
        END
        ELSE DO
            /* This is the first listworthy object we've encountered lately. */
            /*** A GOOD BREAKPOINT ***/
            Allocate a range tracking element and make it "current"
            Set the range base to the base of the "current" object
            Set the range size to the size of the "current" object
        END
    END
    IF (there is a "current" range tracking element) DO
        /* Did we just walk past the "upper" end of a set of */
        /* contiguous objects to be listed?                  */
        IF (!bListRange) DO
            /*** A GOOD BREAKPOINT ***/
            Add the "current" range element to the range list
            Update local state so that there is no "current" range element
        END
    END
    find the object at the next "higher" address
    make it "current"
END /* WHILE loop */

IF (there is a "current" range tracking element) DO
    /* Clean up loose ends. */
    Add the "current" range element to the range list
END
RETURN the new list of ranges
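Since the published Listing Four is available only electronically, the following compact Java sweep conveys the core of the comparison idea; it is our hedged sketch, not Kirk's listing. Ranges are long[]{base, limit} half-open pairs, and both lists must be sorted and internally non-overlapping:

import java.util.List;

public class RangeDiff {
    public static void diff(List<long[]> oldList, List<long[]> newList) {
        int i = 0, j = 0;
        long pos = Long.MIN_VALUE;               // everything below pos is already classified
        while (i < oldList.size() || j < newList.size()) {
            long[] o = i < oldList.size() ? oldList.get(i) : null;
            long[] n = j < newList.size() ? newList.get(j) : null;
            boolean inOld = o != null && o[0] <= pos && pos < o[1];
            boolean inNew = n != null && n[0] <= pos && pos < n[1];
            long next = Long.MAX_VALUE;          // nearest base or limit above pos
            if (o != null) next = Math.min(next, pos < o[0] ? o[0] : o[1]);
            if (n != null) next = Math.min(next, pos < n[0] ? n[0] : n[1]);
            if (inNew && !inOld) report("added", pos, next);
            if (inOld && !inNew) report("removed", pos, next);
            if (o != null && o[1] == next) i++;  // old range fully processed
            if (n != null && n[1] == next) j++;  // new range fully processed
            pos = next;
        }
    }
    private static void report(String kind, long base, long limit) {
        System.out.println(kind + ": [" + base + ", " + limit + ")");
    }
}

Each pass classifies the span between the current position and the nearest upcoming base or limit, which mirrors the overlap and bottom/top growth-or-shrinkage checks described above.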
Listing Two
/* The Destroy Range List algorithm.  Frees all the memory */
/* associated with an outdated list of contiguous address  */
/* ranges.                                                 */

/* Input parameter */
LIST a list of ranges

/* All other variables are references to address range tracking */
/* list elements. These include references to the "current" and */
/* "next" elements in the list being destroyed.                  */

look up the first range on the old list and make it "current"
WHILE (there is a "current" list element) DO
    look up the next range on the old list and make it "next"
    delete the "current" list element
    make the "next" element "current"
END
free the list header
Displaying GIF Images on J2ME Mobile Phones With some phones, you have to roll your own GIF decoder TOM THOMPSON
Today's mobile phones are versatile devices. Besides placing calls, they offer features such as Bluetooth connectivity, Internet support, built-in cameras that take pictures or capture video, and 3D graphics rendering capability, to name a few. Surprisingly, they sometimes lack certain important features. For instance, despite their ability to connect to and surf the Web, many mobile phones, from within their own Java environment, are unable to display a Graphics Interchange Format (GIF) file. I discovered this odd omission in graphics capability while writing a program that made use of a mobile phone's GPS APIs. The idea was to get a GPS fix of the phone's location (presumably its owner would be nearby) and present a map of the area. I thought that the hard part of the project would be to generate the location map, while the easy part would be to get the latitude and longitude position fix from the GPS APIs. I didn't anticipate any problems displaying the map, given the mobile phone's rich and varied graphics support.

Tom is a technical writer providing support for J2ME wireless technologies at KPI Consulting Inc. He can be contacted at [email protected].
Typical of software-project scheduling, I got it all backwards: Obtaining the map turned out to be the easy part. I discovered the U.S. Census Department's Tiger server that, if provided latitude and longitude and the desired image size in pixels, renders a map and returns it as a GIF file to you. As you can see in Figure 1, the map is exquisitely detailed, using standard map colors for parks, cities, state boundaries, and highways labeled with their route numbers. The hard part turned out to be displaying that GIF file. It took only several crashes in the debugging emulator to trace the problem to the Java 2 Micro Edition (J2ME) platform, which is the phone's Java runtime environment. Simply put, J2ME couldn't decode and display GIF files. As the census server provides its maps only as GIF files, completing the project required that I roll my own GIF decoder for J2ME. In this article, I show how this was done. The complete source code that implements this technique is available electronically; see "Resource Center," page 6.

The Law of Unintended Consequences
J2ME's inability to display GIFs is particularly baffling considering that some mobile phones can download JPEG, BMP, and animated GIF files as wallpaper. How could this gap in graphics capability occur? The answer lies in the unintended consequences of how J2ME implements security and graphics. The J2ME platform uses a "sandbox" model to carry out its security mechanism. MIDlets — mobile applications that are similar in design to a web browser's Java applet — execute within a separate memory space that's walled off from the phone's native operating system and resources. MIDlets are free to play in the
sandbox, but can't access any hardware or resources outside of the box by themselves. Access to the phone's screen, audio system, and other hardware features is mediated through J2ME API classes. Simply put, if an API isn't available for a particular hardware feature (such as Bluetooth), a MIDlet can't use it. This prevents a malicious MIDlet from crashing the phone just before a critical call is placed, or tampering with the phone's address book, or doing funny things with Bluetooth. Per the specifications of J2ME's support APIs, the execution environment only has to implement the display of Portable Network Graphics (PNG) images. Ironically, support for this one particular format has to do with the legal problems surrounding GIF files. CompuServe invented the GIF format in the 1980s to handle the transfer and display of online graphics. Given the limited capabilities of the graphics hardware of the time, GIF files contained a 24-bit color table, with pixel color information represented as 8-bit values. The 8-bit value
limits GIF files to displaying only 256 colors. Because of the low-bandwidth connections, the file contents were reduced in size through the use of a Lempel-Ziv-Welch (LZW) compression scheme. For a while, the fact that the LZW algorithm was patented wasn't an issue. Then came the Web, and with it, the demand for graphic images soared. At the start of 1995, Unisys — the owner of the LZW patent — and CompuServe began aggressively pursuing fees for the use of GIF images. This event precipitated the development of the PNG format as an alternative to the GIF format, and by late 1996, the PNG file format was adopted as a standard by the W3C. PNG is a lossless image format that supports images with pixel depths ranging from 1 bit (black-and-white) up to 48 bits (what's termed "truecolor"), an alpha channel, and data compression. Crucially, PNG was carefully designed so that its data structures and compression algorithms didn't infringe on any patents. For more information on the PNG file format, see "PNG: The Portable Network Graphic Format," by Lee Daniel Crocker (DDJ, July 1995). In 1999, when Sun Microsystems developed the J2ME platform specification and its supporting APIs, PNG was chosen as the default image format because of its graphics capabilities, small file size, and the fact that it was unencumbered with patent issues. GIF was, to a large extent, sidelined as a graphics format. The situation has improved for GIF since 2004, when the worldwide Unisys patents expired. While GIF has a limited color palette, this range is adequate for most graphic images, and its small size is still valuable for the mobile phone's limited-bandwidth wireless connections. Its animation feature is widely supported by all web browsers. Because of GIF's checkered history, the J2ME platform often doesn't support the format. For the same reason, the native operating systems of many mobile phones don't handle GIF files, either. Even if the phone's native OS happens to support GIFs, the sandbox security mechanism blocks its use unless the vendor goes through the trouble to expose the GIF decoder routine to J2ME.

Checking for Native GIF Support
Once I knew what the problem was, how could I go about fixing it? It might seem that when your MIDlet receives a GIF file, the best course of action is to simply invoke the GIF-decoder routine. Practically, to conserve the phone's limited working memory, you want to check for the presence of any GIF-decoding capability. J2ME has a collection of classes known as the Mobile Information Device Profile (MIDP) that implement the MIDlet's runtime environment and thus constitute its operating system. The MIDP 2.0 implementation provides utility methods that can query the host phone as to its capabilities and features. To determine if J2ME handles a specific media format, you invoke the getSupportedContentTypes( ) method. This method returns a string array of supported media types, and includes the audio and video formats along with the image formats. The strings present this information in MIME-type format. To check for GIF support, you scan this array, looking for the GIF MIME-type string. If there is a match, then the phone's J2ME implementation supports native display of GIF files. Listing One shows how to use getSupportedContentTypes( ) to obtain the media types array, along with loop code that does the array scan. The code sets the Boolean flag, gifSupported, if a match is found. Because this check only has to execute once, a good place for this code is in the class's constructor.
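Since the article's Listing One is only available electronically, here is a hedged reconstruction of that check; the class and method names are ours, while Manager.getSupportedContentTypes( ) is the MIDP 2.0 call the text describes:

import javax.microedition.media.Manager;

public class GifSupport {
    // Returns true if the device's J2ME runtime can decode GIFs natively.
    public static boolean nativeGifSupported() {
        String[] types = Manager.getSupportedContentTypes(null);  // null = all protocols
        for (int i = 0; i < types.length; i++) {
            if ("image/gif".equals(types[i])) {
                return true;
            }
        }
        return false;
    }
}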
(MIDP) that implement the MIDlet’s runtime environment and thus constitute its operating system. The MIDP 2.0 implementation provides utility methods that can query the host phone about its capabilities and features. To determine if J2ME handles a specific media format, you invoke the getSupportedContentTypes( ) method. This method returns a string array of supported media types, which includes the audio and video formats along with the image formats. The strings present this information in MIME-type format. To check for GIF support, you scan this array, looking for the GIF MIME-type string. If there is a match, then the phone’s J2ME implementation supports native display of GIF files. Listing One shows how to use getSupportedContentTypes( ) to obtain the media-types array, along with loop code that does the array scan. The code sets the Boolean flag gifSupported if a match is found. Because this check only has to execute once, a good place for this code is in the class’s constructor.

If the MIDlet executes on one of those rare phones with GIF support, then displaying the image becomes a matter of writing a few lines of code. J2ME stores its graphic data in an Image object. A method in this class, createImage( ), takes as an argument an array whose data is organized into a self-identifying image format such as PNG, GIF, and others. All that’s required is to route the data stream returned from the server into createImage( ), and it generates the Image object for you.

For the more common case, the mobile phone lacks GIF decoder support, so you must invoke the homemade GIF decoder class (call it GifDecoder) to handle the chore. You’d therefore make an instance of the GIF decoder object, supply it with the data stream, and invoke one of GifDecoder’s methods (call it read( )) to read and decode the data. Another method (getImage( )) then extracts the converted image and returns it in an Image object. Listing Two shows the code that tests the state of the gifSupported flag and calls the appropriate method.

Undertaking the Port

Now all that was left to do was to close that gap in Listing Two by writing the GIF decoder class. Seasoned programmers don’t want to reinvent the wheel if they can help it, particularly because the odds were good that a Java-based GIF decoder already existed. Therefore, I immediately Googled to see what code was available on the Internet. Unfortunately, Sturgeon’s Law applies equally to Internet content: 90 percent of what the search turned up was junk. Still, the remaining 10 percent appeared promising and could be sifted through quickly.
I soon happened upon a lightweight Java-based GIF decoder written by Kevin Weiner of FM Software. His decoder class, GifDecoder, has a small resource footprint, provides LZW decompression, and offers methods that can read an image either from a data stream or from a file (http://www.fmsware.com/stuff/gif.html). The decoder can even parse animated GIF files, and it provides methods to help animate such images. In addition, Kevin offered the source code to the community at large, with no stringent copyright conditions. GifDecoder was written for the J2SE platform, but porting this class to J2ME appeared manageable.

Figure 1: GIF map output by the Tiger server.

Figure 2: Format of a GIF89a animated image file.
Figure 3: How an animated GIF image is displayed.
The port of GifDecoder was relatively painless, thanks in large part to the fact that J2ME’s capabilities are identical to J2SE’s in many areas. The largest variations between the two Java platforms lie in their GUI classes. Because GifDecoder processes data behind the scenes for other objects, I avoided having to deal with the GUI differences.

I started by commenting out those GifDecoder methods I didn’t need for my project. I eliminated GifDecoder’s file-based read( ) method at first because my application receives its images over the air through an HTTP connection. I would have handled this differently if the target phones used either the Symbian OS or the JSR 75 FileConnection API, both of which provide file access. However, I did end up modifying the file read( ) method to retrieve data from MIDlet resources. Resources are part of the MIDlet’s archive file, and often store the graphic images the MIDlet uses to draw its GUI.

Next, I changed all instances of BufferedInputStream to DataInputStream. The BufferedInputStream class isn’t available on J2ME, and DataInputStream’s capabilities were adequate for the job. I did spend some time struggling to get the code to read the GIF file’s color tables. I tracked this down to a bug where, when reading an HTTP stream, DataInputStream’s read( ) method can throw an exception if the number of bytes requested is larger than the stream buffer’s size (255 bytes). The workaround was to write a loop that used readByte( ) to fetch the color table, byte by byte. Another modification was to replace Image’s getRaster( ) J2SE method with a pair of MIDP 2.0 methods, getRGB( ) and createRGBImage( ).
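The byte-by-byte workaround might look something like the following minimal sketch. The class and method names and the packed 0xRRGGBB pixel format are my illustrative assumptions, not GifDecoder’s actual code; it also uses readUnsignedByte( ) rather than readByte( ) to sidestep sign extension:

import java.io.DataInputStream;
import java.io.IOException;

class ColorTableReader {
    // Fetch an nColors-entry GIF color table one byte at a time.
    // A single bulk read() of the whole table can throw when the
    // request exceeds the HTTP stream buffer's size, so avoid it.
    static int[] readColorTable(DataInputStream in, int nColors)
            throws IOException {
        int[] table = new int[nColors];
        for (int i = 0; i < nColors; i++) {
            int r = in.readUnsignedByte();
            int g = in.readUnsignedByte();
            int b = in.readUnsignedByte();
            table[i] = (r << 16) | (g << 8) | b;   // pack as 0xRRGGBB
        }
        return table;
    }
}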
Result   Constructor/Method            Description

Constructor:
         GifDecoder(void)              Constructor that generates an instance of
                                       the GIF decoder object.

Methods:
int      getDelay(int frameNumber)     Returns the delay time, in milliseconds,
                                       associated with the image specified by
                                       frameNumber.
Image    getFrame(int frameNumber)     Returns the image specified by frameNumber.
int      getFrameCount(void)           Returns the number of images (frames) in
                                       an animated GIF file.
Image    getImage(void)                Returns the first image in an animated
                                       GIF file.
int      getLoopCount(void)            Returns a value that represents the number
                                       of times the animation should loop.
int      read(DataInputStream is)      Reads the referenced DataInputStream object.
                                       Returns a status code of zero if no errors
                                       occurred during the read.
int      read(InputStream is)          Reads the referenced InputStream object.
                                       Returns a status code of zero if no errors
                                       occurred during the read.
int      read(String name)             Reads the named resource in the MIDlet’s JAR
                                       file. Returns a status code of zero if no
                                       errors occurred during the read.

Table 1: GifDecoder class and methods.
J2SE’s getRaster( ), in one swoop, converts the decoded array of RGB pixels that comprise the picture into the Image object’s native format. Because J2ME lacks getRaster( ), the data translation became a two-step process: The getRGB( ) call converts the decoded GIF pixels into integers, and the subsequent call to createRGBImage( ) translates the integer array into the Image format.

GifDecoder is also able to read GIF files that contain multiple images for animation, plus any control blocks that contain a delay interval associated with each image. See Figure 2 for the structure of an animated GIF file. On J2SE, GifDecoder stores the images and delay values using the ArrayList class. ArrayList isn’t implemented on J2ME, but its superclass, Vector, is. So I changed all appearances of ArrayList to Vector, and rewrote the code to use Vector’s methods. The entire process didn’t take long, and I had an example GIF appearing on the phone’s screen within a few days. The bulk of that time was spent tracking down and identifying the read( ) bug.

Using the GIF Decoder

Adding the GIF decoder to a mobile application is easy: Just include the GifDecoder.java file along with the MIDlet’s source files. In your application code, make an instance of GifDecoder, then use the appropriate methods to read and display the GIF images. Table 1 documents the methods available, and Listing Two shows how it’s done. Once you’ve made an instance of GifDecoder, you invoke the appropriate read( ) method, depending upon whether you obtained the GIF file from a data stream or are reading a MIDlet resource. If the read completes without errors, you call getImage( ), which returns the GIF picture in an Image object. If getImage( ) is applied to an animated GIF file, this method returns the first image in the file.

Presenting an animated GIF is relatively straightforward, thanks to the utility methods GifDecoder provides. You first use getFrameCount( ) to fetch the value that specifies how many images (or frames) make up the animation. This becomes the termination value for a loop counter. Within the loop body, you pass the loop’s index as an argument to both getDelay( ) and getFrame( ) to obtain each frame’s delay time and its corresponding image, respectively. The delay values represent time intervals in hundredths of a second. Because most computers and cellphones have timers with millisecond resolution, getDelay( ) multiplies the value it obtains by 10 to convert the interval into milliseconds.
The result from getDelay( ) can therefore be jammed straight into J2ME’s Thread.sleep( ) method to implement the delay. GifDecoder supports the drawing options specified in the GIF file’s graphic-control extension blocks. It also supports a transparent color, although the phone’s J2ME implementation can affect this feature’s behavior.

You’ll want to execute the GIF animation code in a separate thread, either as part of a Thread class or as part of another class’s run( ) method. This is done so that the MIDlet’s UI — which runs in another thread — is free to respond to any events that the user generates. Listing Three shows how to use a run( ) method to read and display the images in an animated GIF file. Figure 3 shows how the successive images in a GIF can present the animation of a UFO. The serviceRepaints( ) method acts as a simple synchronization mechanism in that it ensures that the image is
drawn completely before the next delay interval and image are fetched and displayed. An example MIDlet, GIF_Display, is available for download, and it displays several animated GIF images. Note that a phone that natively supports the GIF format may only display static images, not animated ones.

When compiled, GifDecoder is only 11 KB in size, suitable for use in any MIDlet that has special graphics needs. Note that this size is before an obfuscator is applied to GifDecoder to further reduce its code footprint. The downside to using GifDecoder is that large and lengthy image animations can consume a lot of memory. An animation that consists of 24 144×52-pixel images can easily consume around 60 KB of memory. On a phone with 256 KB of working memory, that’s over 20 percent of available memory just to support a graphic. You can reduce the animation’s memory footprint by using smaller images and fewer of them.

Conclusion

While a GIF image has a limited color palette, it’s often adequate for most graphics purposes, including the display of some photos. In exchange for the limited color range, GIF images require less storage than JPEG images. In addition, there are plenty of GIF editing and animation tools that let you easily add your own graphics and animations to a MIDlet’s UI. The ability to read, display, and animate the GIF images that appear on many Internet sites is an important asset for any MIDlet performing network tasks. GifDecoder is thus a valuable tool to have in your J2ME programming toolbox.

DDJ

Listing One

private String       mediaTypes[];
private final String GIF_MIME_TYPE = "image/gif";
private boolean      gifSupported;

// Get the media types to check for support of GIF file display
mediaTypes = Manager.getSupportedContentTypes(null);
int count = mediaTypes.length;

// Check list for GIF MIME type; set support flag if present
// (use equals() to compare string contents, not ==)
gifSupported = false;
for (int i = 0; i < count; i++) {
    if (GIF_MIME_TYPE.equals(mediaTypes[i]))
        gifSupported = true;
} // end for

Listing Two

private static final int READ_OK = 0;  // status code from GifDecoder.read()
String url = "http://www.nosuchsite.net/StarTrek/enterprise.gif";
HttpConnection  hC = null;
DataInputStream dS = null;
Image           mapImage = null;

// Open the connection as an HttpConnection; send request
try {
    hC = (HttpConnection) Connector.open(url);
    hC.setRequestMethod(HttpConnection.GET);
    hC.setRequestProperty("IF-Modified-Since", "10 Nov 2000 17:29:12 GMT");
    hC.setRequestProperty("User-Agent", "Profile/MIDP-2.0 Configuration/CLDC-1.1");
    hC.setRequestProperty("Content-Language", "en-CA");
} catch (IOException e) { } // running without safety net!

// Read the data stream for the returned GIF image
int iByteCount;
iByteCount = (int) hC.getLength();
dS = hC.openDataInputStream();

// Does J2ME implementation support native GIF format decode?
if (gifSupported) {
    mapImage = Image.createImage(dS);  // Yes, translate data into an Image
} else {
    // No, do it ourselves: get instance of GIF decoder and decode stream
    GifDecoder d = new GifDecoder();
    if (d != null) {
        int err = d.read(dS);
        if (err == READ_OK) {
            mapImage = d.getImage();
        } // end if
    } // end if
} // end else

Listing Three

// The run method for the class.
public void run() {
    int t;
    if (gifImage != null) {
        while (action) {
            int n = d.getFrameCount();        // Get # of frames
            for (int i = 0; i < n; i++) {     // Loop through all
                gifImage = d.getFrame(i);     // Get frame i
                // Delay duration for frame i in milliseconds
                t = d.getDelay(i);            // Get frame's delay
                repaint();
                serviceRepaints();
                try { Thread.sleep(t); }      // Delay as directed
                catch (Exception ex) {}
            } // end for
        } // end while
    } // end if
} // end run

DDJ
Sudoku & Graph Theory
Algorithms for building solvers
EYTAN SUCHARD, RAVIV YATOM, AND EITAN SHAPIR
Sudoku is a logic puzzle in which there are 81 cells (vertices) filled with numbers between 1 and 9. In each row, the numbers 1,2,3,…,9 must appear without repetition. Likewise, the numbers 1,2,3,…,9 must appear without repetition in the columns. In addition to the row and column constraints, the numbers 1,2,3,…,9 must appear in the nine nonoverlapping 3×3 subsquares without repetition. So in short, the puzzle board is separated into nine blocks, with nine cells in each block (see Figure 1).

There are several possible rules you can use to successfully fill in missing numbers. In this article, we examine two rules — Chain Exclusion and Pile Exclusion — for solving Sudoku puzzles. These rules are at the heart of a Windows-based Sudoku solver that we built using Visual C++. Executables and the complete source code for this solver are available electronically (see “Resource Center,” page 6).

The goal of this logical Sudoku solver is to prove that only one possible number can be assigned to each vertex, and to find that number for each vertex in which the number is not defined.
Eytan, Raviv, and Eitan are software engineers in Israel. They can be contacted at [email protected], [email protected], and [email protected], respectively.
Illogical Sudoku puzzles can also be solved, but they require guesses (see the OK button discussed in the “Implementation” section). We refer to the possible numbers that should be assigned to a row, column, or one of the nine 3×3 subsquares as a “Permutation Bipartite Graph” or nodes. A node consists of a vector of n>1 (n=2,3,4,…) vertices and all possible numbers that can be assigned to these vertices, such that there exists at least one possible match between the vertices of the vector and the numbers 1,2,…,n. For example, the following are nodes:

({1,2,3,5},{2,3},{2,3,4},{3,4},{4,5}), n=5
({1,2,3,7},{3,6},{3,4},{1,4},{5,6,7},{4,6},{2,7},{8,9},{8,9}), n=9
A possible match for the first vector is easy:

1 -> {1,2,3,5}
2 -> {2,3}
3 -> {2,3,4}
4 -> {3,4}
5 -> {4,5}

A possible match for the second vector is more tricky:

2 -> {1,2,3,7}
3 -> {3,6}
4 -> {3,4}
1 -> {1,4}
5 -> {5,6,7}
6 -> {4,6}
7 -> {2,7}
8 -> {8,9}
9 -> {8,9}
A number can only be assigned to a vertex that contains the possibility of assigning that number. For instance, only the following possibilities are accepted: 7 -> {2,7} or 2 -> {2,7}.
Pile Exclusion and Chain Exclusion provide the basis of the logical elimination rules. To understand Pile Exclusion, consider the following node:

({1,2,3,5},{3,6},{3,4},{5,6},{1,7,8,9},{5,7,8,9},{4,6},{6,7,8,9},{1,4}), n=9
The numbers 7,8,9 appear only in three vertices:

{1,7,8,9},{5,7,8,9},{6,7,8,9}
Because there is at least one possible match in the Permutation Bipartite Graph, one vertex will be matched to 7, one to 8, and one to 9. Thus, you can erase the other numbers from these three vertices to get the following three augmented vertices:

{1,7,8,9} -> {7,8,9}
{5,7,8,9} -> {7,8,9}
{6,7,8,9} -> {7,8,9}
and the entire Permutation Bipartite Graph becomes:

({1,2,3,5},{3,6},{3,4},{5,6},{7,8,9},{7,8,9},{4,6},{7,8,9},{1,4}), n=9
As for Chain Exclusion, consider this node:

({1,2,3,7},{3,6},{3,4},{1,4},{5,6,7},{4,6},{2,7},{8,9},{8,9}), n=9
In the second, third, and sixth positions in the vertices vector, you have: {3,6},{3,4},{4,6}
Only the numbers 3,4,6 can be assigned to these three vertices. From this, you infer that 3,4,6 are not a matching option in any of the remaining vertices. Thus, you can erase these numbers from all the other vertices, resulting in a new, simpler graph:

({1,2,7},{3,6},{3,4},{1},{5,7},{4,6},{2,7},{8,9},{8,9}), n=9
You can do the same thing with {1}, so that the resulting graph is:

({2,7},{3,6},{3,4},{1},{5,7},{4,6},{2,7},{8,9},{8,9}), n=9
Algorithms

We now present an algorithm that finds all such chains in polynomial time. The first stage is to find the best bipartite matching using the Ford-Fulkerson maximum-flow algorithm or any other bipartite matching algorithm (see Introduction to Algorithms, Second Edition, by Thomas H. Cormen et al., MIT Press, 2001); call the resulting matching M. Let W be the set of n cells in a row, column, or subsquare. That means that there are n cells or vertices in W when the entire Sudoku puzzle has n×n cells or vertices. Let D be the set of numbers that are matched to W. In an n×n Sudoku, D is the set of numbers 1,2,3,…,n. Let Sk be a minimal subset of k cells out of the n cells in which the possible numbers are exactly k, and let Tk be the set of those k numbers. Take any vertex
u1∈W. Suppose u1∈Sk. We check all the vertices vi1 in D such that the edges (u1,vi1) exist in E. Because Sk is connected only to Tk, the vi1 are all in Tk. We mark u1 so that we will not consider u1 again; that is, we remove u1 from the list of vertices the algorithm will visit. Now we look for all the edges (ui1,vi1) in M that are connected to one of these vertices vi1. Obviously, the ui1 are in Sk for all indices i1, because all the vertices in Tk are matched by M to vertices in Sk. We continue the process recursively: For each ui1, we look for all vi2 in D that are connected to at least one of the ui1 vertices by edges. We mark the ui1 so that these vertices will not be considered again. Again, the vi2 must be in Tk, and we continue and look for all the edges (ui2,vi2)∈M. Obviously, the ui2 are in Sk. Because u1 and the vertices ui1 were removed from the list of vertices that the algorithm visits, there are fewer vertices that can be visited by the algorithm. The process is repeated until all the vertices in W that it can visit are reached.

Because all the vertices it can reach are in Sk, what is left to prove is that there is no vertex u in Sk that is not visited by the algorithm. Suppose there is such a vertex u. Then, obviously, the vertex v in Tk that is matched to u could not be visited either; otherwise, u could be visited. But then, v is also not connected to any one of the vertices in Sk that were visited. So we can define Qk–1=Sk–{u}, which is connected to |Sk|–1 vertices in D. That is a contradiction to the requirement that Sk is minimal.

A simpler algorithm is the Pile algorithm. Let G be a Permutation Bipartite Graph. We would like to find whether there exist vertices vi1,…,vik in D that are all connected to the same k vertices uj1,…,ujk in W. Such vertices are called a “Pile” or “Set.” The algorithm is trivial: Start traversing the vertices vk in D serially in a loop. Activate a second loop within the first loop, and count all the other vertices in D that are connected to the same k vertices uj1,…,ujk in W that vk is connected to. If there are k such vertices vi1,…,vik in D, then you are done. Although this algorithm can be improved, it is efficient, and its runtime is |W|²/2.

After finding a chain, the algorithm can erase all the edges that connect the vertices in Tk to vertices that are not in Sk. This operation is called “Chain Exclusion.” That is, if e=(u,v) with v∈Tk and u∉Sk, then e is removed from the Permutation Bipartite Graph G. After finding a Pile, remove all the edges that connect uj1,…,ujk to D–{vi1,…,vik}.
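As a concrete illustration of the Pile search just described, here is a compact sketch (in Java for brevity; the article’s solver is written in C++, and every name below is illustrative, including the bitmask encoding of the graph):

static int[] findPile(int[] cellsOf, int n) {
    // cellsOf[v] is a bitmask over the n cells in W that number v in D
    // may still occupy (one set bit per edge).
    for (int v = 0; v < n; v++) {
        int cells = cellsOf[v];
        int k = Integer.bitCount(cells);      // v is connected to k cells
        if (k == 0 || k == n) continue;
        int count = 0;                        // the inner loop of the text
        for (int w = 0; w < n; w++)
            if (cellsOf[w] == cells) count++;
        if (count == k) {                     // k numbers share the same k cells
            int[] pile = new int[k];
            int i = 0;
            for (int w = 0; w < n; w++)
                if (cellsOf[w] == cells) pile[i++] = w;
            return pile;  // a Pile: now erase the cells' other edges
        }
    }
    return null;                              // no Pile found
}

The doubly nested loop mirrors the |W|²/2 runtime noted above.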
Figure 1: Sudoku puzzle.

Implementation

We’ve used the Chain Exclusion and Pile Exclusion algorithms described here to build a Windows-based Sudoku solver in Visual Studio C++ 6.00. Executables and the complete source code are available electronically; see “Resource Center,” page 6. You will need to rename the file Sudoku.ex1 to Sudoku.exe. Alternatively, you can compile the entire project Sudoku.dsw or Sudoku.dsp with Visual Studio 6.00. We’ve included a simple console application, Logic.ex1, that demonstrates Chain Exclusion. This file should be renamed to Logic.exe.

In Sudoku.exe, the Logic button calls the class Square3.cpp, and Square3.cpp calls Bipartite.cpp. The Logic button fills as many cells as possible with numbers, using logic only. You can generate puzzles that usually have more than one solution by clicking the Test button when the Unique option doesn’t have a check mark. If the Unique option is checked, the Sudoku puzzle is solved by logic only. The Very Hard option automatically checks the Unique option and takes 5 seconds. You can also enter your own puzzle and press either the Logic button or the OK button. If the puzzle can’t be solved because of contradictions, OK won’t fill it, and Logic will fill as many cells as possible. When a contradiction is encountered, it inserts a double number in a row, column, or square; that can be verified by the Check button. In addition, Sudoku.exe also solves illogical puzzles via the OK button, which calls a recursion that, in each iteration, fills in numbers at the locations with the fewest degrees of freedom.

For more information on Sudoku puzzles, see the Daily Sudoku (http://www.dailysudoku.co.uk/) or the Times Online (http://www.timesonline.co.uk/section/0,,18209,00.html).

DDJ
GOOGLE’S SUMMER OF CODE
FreeBSD/nsswitch and Caching
Nsswitch is an extremely useful subsystem that exists on UNIX platforms such as Linux, Solaris, and FreeBSD. It provides a flexible and convenient way to configure how name-service lookups are done. Nsswitch operates with two basic concepts — database and source. When users query a particular database (password, group, hosts, and so on) for information, nsswitch decides which source (files, NIS, LDAP) this information should be taken from. The basic idea of nsswitch is that hard-coded name-service lookup functions (getpw**, getgr**, gethost**, getaddrinfo, and the like) are never called directly. The function:

nsdispatch(void *retval, const ns_dtab dtab[],
    const char *database, const char *method_name,
    const ns_src defaults[], ...)
is called instead. In turn, it dispatches the call to the appropriate sources.

Name: Michael A. Bushkov
Contact: [email protected]
School: Rostov State University, Russian Federation
Major: Applied Mathematics
Project: FreeBSD/nsswitch and Caching
Project Page: http://wikitest.freebsd.org/moin.cgi/NsswitchAndCachingFinalReport/
Mentors: Brooks Davis and Jacques Vidrine
Mentoring Organization: FreeBSD (http://www.freebsd.org/)
The FreeBSD nsswitch implementation supports various databases: password, groups, hosts, netgroups, and shells. In this project, we extended this list by adding services, rpc, protocols, OpenSSH, and Globus Grid Toolkit 4 databases. To add support for a particular database, we had to implement several nsswitch sources for it, and then replace all hard-coded function calls with nsdispatch calls. To add support for services, rpc, and protocols, we also had to change the interface of the corresponding internal reentrant libc functions to improve compatibility with the Linux/Solaris nsswitch implementations. We’ve changed the interfaces from the HP-UX style:

int getservbyname_r(const char *name,
    const char *proto, struct servent *serv,
    struct servent_data *data);
to the Linux/Solaris style:

int getservbyname_r(const char *name,
    const char *proto, struct servent *serv,
    char *buffer, size_t bufsize,
    struct servent **result);
Because all nsswitch requests are passed through nsdispatch, it’s a great place to organize caching in a general way. Caching can significantly improve system performance. It’s useful for the services database, for example, because the /etc/services file becomes bigger and bigger, and the getserv** functions become slower and slower. We had to modify the nsdispatch code so that it could process the “cache” source. In addition, marshaling and unmarshaling routines were implemented for every nsswitch database type.
Michael A. Bushkov

To use the cache, nsdispatch interacts with the caching daemon, which is built on top of the caching library (both were developed during the Summer of Code). The caching library provides a simple interface for cache organization. It uses hash tables to store data and supports different policies, which are applied when the cache size is exceeded. The caching daemon uses a UNIX socket to communicate with libc to perform read/write operations. An interesting feature of the caching daemon (and the caching library) is multipart caching and the concept of sessions. This approach is very useful for the getXXXent( ) functions. When the first call to getXXXent( ) is made, a write session is opened. If setXXXent( ) or endXXXent( ) is called, the session is abandoned and all its data are freed. If the getXXXent( ) function indicates the successful end of a sequence, the session is gracefully closed and all session data are placed in the cache.

DDJ
Userspace Filesystems Framework for NetBSD
A long time ago, the two competing paradigms for designing an operating system were the monolithic and microkernel approaches. While the performance benefits of monolithic kernels with direct access to memory are undeniable, microkernels have more beauty and theoretical appeal. Since these days everybody is using excessive hardware performance as an excuse to add bloat, it is only fair to use it to add something useful.
Name: Antti Kantee
Contact: [email protected]
School: Helsinki University of Technology, Finland
Major: Graduate student, Computer Science
Project: Userspace Filesystems Framework for NetBSD
Project Page: http://netbsd-soc.sourceforge.net/projects/userfs/
Mentor: William Studenmund
Organization: NetBSD Project (http://www.netbsd.org/)
Implementing a filesystem in userspace is beneficial for several reasons:

• Development can take advantage of a faster bugfix-compile-restart cycle. Also, debugging is easier because it is possible to run the filesystem under a normal userspace symbolic debugger.
• The filesystem can access information that traditionally has been difficult to access from inside the kernel. A simple example could be a web site accessed over HTTP using a readily available HTTP library.
• The actual implementation does not necessarily have to be written in C. Of course, having a userspace API for C is only half the battle (but it’s the larger half).
• Leveraging existing application code written against the well-known libc filesystem API is made possible.

Producing a framework involved attaching a new filesystem to the kernel frameworks, creating a communication pipe to the userspace along with a serialized representation of filesystem operations, and creating an API to which userspace implementations could attach. Adding a new filesystem to the kernel side was mostly a question of legwork. However, one problem was having to think somewhat differently from the typical case: Usually, filesystems are implemented with a clear idea of the semantic effects of each vnode operation. But in this case, a “generic implementation” had to be devised. Communication to the userspace was implemented as a device node, some ioctls, and argument structures. This is an area for future work that may possibly produce a framework for generic kernel upcalls. The userspace API is dictated by the need to have an implementation backing each vfs and vnode operation. Also, the API aims to lift the burden of communication subroutines common to all filesystem implementations without restricting the potential for, say, an asynchronous implementation.

Currently, the framework is still very much in the infant prototyping stage. After the system is stress tested, hardened, and perfected, it would be interesting to investigate providing similar frameworks in NetBSD for other subsystems, such as networking stacks and device drivers.

Antti Kantee

DDJ

gloox: A High-Level Jabber/XMPP Library for C++

gloox was born as part of a university project (XMPPGrid: A Grid Framework) that used Jabber/XMPP as a transport protocol. Because, at that time, there were no C++ XMPP libraries available that suited my needs, I decided to roll my own.

gloox (http://camaya.net/gloox) heavily uses the Observer Pattern. There are listeners (“handlers” in gloox-speak) for almost every imaginable event that can occur, from connection establishment to error conditions. After a connection has been established, everything is event driven, and simple applications, such as bots, can easily do without a mainloop or threads. On the other hand, gloox exposes the necessary interface to manually initiate fetching of data from the socket. Right after the XML parser receives a complete stanza, it is parsed into a Stanza object that offers a convenient interface for taking apart such an XML element. The respective handlers are then called based on the stanza’s type.

The library offers classes to create regular clients as well as components. These only offer basic functionality, but can be extended with several included implementations of so-called Jabber Enhancement Proposals (JEPs) to create a full-featured client/component. In general, using the library is as simple as:

• Creating a new Client or Component object.
• Creating and registering the desired handlers.
• Calling connect( ).

Most protocol enhancements follow a similar approach: They simply register as handlers for one of the Stanza types. For example, the Info/Query (IQ) mechanism of the XMPP spec is an important tool for controlling various aspects of a session. The basic syntax of IQ packets is always the same, and different protocols are distinguished based on the payload of an IQ packet: the child element and its namespace. gloox offers a handler for these namespaces, which makes it extremely easy to implement any IQ-based protocol. Additionally, handlers for the remaining XMPP packet types (called “stanzas” in XMPP) are included, along with a generic tag handler for protocols not using these defined stanza types. While using these interfaces, the higher level layers offer handlers themselves, with data types tailored to their needs. This minimizes the need to know the XMPP protocol by heart if the included classes are used.

Even though it is defined in the XMPP IM spec, Roster Management is an example of such a higher level protocol. The RosterManager registers itself as a handler for IQ stanzas carrying “query” elements qualified by the jabber:iq:roster namespace. It can then add or remove items from a user’s contact list, and react to incoming so-called roster pushes, updated contact list items sent by the server. The RosterManager offers clients a rich interface to be notified about any changes happening to the contact list. Events exist for adding and removing contacts, as well as for changes in subscription states.

The decision to use or activate one (or more) of the protocol enhancements rests with the user of the library. The modular structure allows addition and removal of these enhancements at runtime. More JEPs can easily be implemented, usually by creating handlers for the respective XML namespaces a JEP uses. gloox is licensed under the GPL, and commercial licenses are available.

Name: Jakob Schröter
Contact: [email protected]
School: University of Applied Sciences, Bremen, Germany
Major: Computer Science
Project: gloox
Project Page: http://camaya.net/gloox/
Mentor: Peter Saint-Andre
Mentoring Organization: Jabber Software Foundation (http://www.jabber.org/)

Jakob Schröter

DDJ
SPARQL for Sesame
The initial goal of my project was to write a Java interpreter of the SPARQL query language for use in Sesame, an RDF data server. SPARQL, the first W3C-standardized query language for the RDF data format, is a step toward standardizing the W3C’s Semantic Web vision. The language is reminiscent of SQL — users specify a series of set and value constraints on the data in the server:

SELECT ?title
WHERE { _:book :title ?title .
        FILTER (?title != "Old Title") }
The server then returns data values that fit those constraints. However, RDF data
is not relational and is usually visualized as a graph of data relationships. Therefore, queries are more akin to graph pattern matching, with variables being bound to certain matched parts of the graph.

The first design was a library of classes that processed the query from within the object structure created by parsing the query into an abstract syntax tree. This design, however, suffered from one of the common problems in OO programming — a dependence on inheritance for extension. To customize the interpreter for other servers, one had to subclass certain query objects and rebuild the library.

The final design uses a combination of design patterns to overcome this dependence. The main principle of the design is the separation of interpretation logic and query data, via prolific use of the Strategy pattern. Because abstract syntax trees lend themselves ideally to the Visitor pattern, a visitor is used at interpretation time to walk the AST query structure and bind logic to each part of the query, using an Abstract Factory to create the logic objects.
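The combination might look roughly like the following sketch; every type and method name here is a hypothetical illustration, not the interpreter’s or Sesame’s actual API:

// All names below are illustrative placeholders.
class Expression { }                      // AST payload for a constraint
class Binding { }                         // a candidate variable binding

interface FilterLogic {                   // Strategy: interpretation logic
    boolean matches(Binding binding, Expression expr);
}
interface LogicFactory {                  // Abstract Factory for strategies
    FilterLogic createFilterLogic();
}
interface AstVisitor { void visit(FilterNode node); }

class FilterNode {                        // one kind of AST node
    Expression expr;
    FilterLogic logic;                    // bound at interpretation time
    void accept(AstVisitor v) { v.visit(this); }
}

// The interpreting visitor walks the AST and attaches logic to each
// node, asking the factory for the (possibly customized) strategy.
class BindingVisitor implements AstVisitor {
    private final LogicFactory factory;
    BindingVisitor(LogicFactory factory) { this.factory = factory; }
    public void visit(FilterNode node) {
        node.logic = factory.createFilterLogic();
    }
}

A server that needs custom behavior supplies its own LogicFactory implementation; the AST walk itself never changes.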
Developers wishing to implement a customized query interpreter can shortcut the default logic by using their own factory implementation to rewrite any part of the logic, without ever needing to recompile the main library.

The primary efficiency penalty of the design is found in the data interface between the library and the server. Because most servers use a slightly different data-object representation, every data value used by the interpreter has to be passed through its own adapter, which either passes on method calls or creates a new interpreter-compatible data value. For greater speed, the most computationally intensive set logic in the interpreter can be overridden to let servers do their own native data manipulation.

Hopefully, the benefits of using a standardized specification library will allow server developers to focus more on the front-end server interfaces and underlying persistent storage, and less on the particular quirks of this new query language.

DDJ
Name: Ryan Levering
Contact: [email protected]
School: Binghamton University
Major: Ph.D. candidate, Computer Science
Project: SPARQL for Sesame
Project Page: http://sparql.sourceforge.net/
Mentors: Giovanni Tummarello and Jeen Broekstra
Mentoring Organization: Semedia Semantic Web and Multimedia Group (http://semedia.deit.univpm.it/)
Ryan Levering
TSC-I2: A Lightweight Implementation for Precision-Augmented Timekeeping
The quality of timekeeping is critical for many network protocols and measurement tools. TSC-I2 (TSC-Internet2) ensures accuracy and precision by making TSC rate calibration a continuous process; thus, the accuracy of the interpolation parameters can be ensured, which in turn results in satisfactory clock precision. TSC-I2 maintains a soft clock of its own, periodically comparing this clock to the system clock. During each comparison, it synchronizes itself with the system clock, and adjusts the interpolation rate based on the offset and rate errors relative to the system clock.
Xun Luo

Whenever the accuracy of the soft clock is ensured, TSC-I2 uses this clock to report time to the library user; otherwise, the system clock value is reported. The advantage of this design is that the system clock is enhanced rather than replaced. The clock discipline algorithm is inspired by NTP. A state-machine-controlled PLL (Phase-Locked Loop) traps the rate-induced phase difference between the TSC clock and the system clock. Rate wander is captured within one loop delay, and corrected in three to four following loops. To avoid incorrectly recognizing noise as a rate-induced error, two filters — a popcorn filter and a spike detector — are used.

There are two usage modes: DAEMON and CLIENT. In DAEMON mode, a standalone daemon takes charge of timekeeping, serving one or more clients. In CLIENT mode, the library creates a thread running within the hosting process. Thus, it minimizes the application’s external dependencies. There are also clear distinctions between TSC-I2’s algorithms and their NTP counterparts, mainly due to the different natures of the referencing sources. Readers interested in TSC-I2 internals can visit the project web site, where more details are illustrated.

TSC-I2, which is fully implemented in around 2000 lines of C, is fairly lightweight. It has been published under an open-source license at http://tsc-xluo.sourceforge.net/. TSC-I2 currently supports the IA32, AMD64, and PowerPC architectures, as well as Linux, FreeBSD, Mac OS X, and Microsoft Windows.

Name: Xun Luo
Contact: [email protected]
School: University of Illinois at Chicago
Major: Ph.D. candidate, Computer Science
Project: Timekeeping Using TSC Register
Project Page: http://tsc-xluo.sourceforge.net/
Mentor: Jeff Boote
Organization: Internet2 (http://www.internet2.edu/)

DDJ
WINDOWS/.NET DEVELOPER
Viewing & Organizing Log Files
LogChipper — a generic approach to tracking log content
PHIL GRENETZ
Developers charged with supporting mission-critical applications need to be alerted to problems when they arise in the production environment. Once a problem is identified, it is essential that you can browse and/or search application log files for clues as to the nature and cause of the problem. Only then can the implications of the problem be assessed and resolved as quickly as possible.

Today’s applications are highly distributed. Clients interact with server processes asynchronously. As a result, logged events reflecting user activity are intermixed in the strictly chronological log files with events reflecting numerous types of notifications from server processes. Diagnosing production problems is a form of forensic analysis. Typically, on being alerted to an application error, the first step is to open the log file in a text viewer such as Notepad and search or browse the file for the exceptional event and other events that may have contributed to it. This can be a tedious process, and relevant clues can be missed. In addition, third-party tools maintain log files in their own formats. A generic approach to viewing and organizing log file contents is highly valuable in such an environment. In this article, I present LogChipper — one solution to this problem for the .NET platform.

Phil has developed software for over 20 years in the defense, publishing, database, and financial industries. He can be contacted at [email protected].

Figure 1 shows LogChipper’s user interface. It uses two ListView controls to present the original view and the sorted and filtered view of events. Note the radio buttons for toggling between them. Also note the
checkboxes for enabling autoscroll and dynamic load, and for pausing the loading process. The column chooser lets users select the desired columns and rearrange them. The Format menu is populated by the plug-in parser with items for selecting custom features offered by the plug-in. The plug-in in Figure 1 parses the logs of a popular FIX engine (FIX is a protocol used in the financial industry to send buy and sell orders to the exchanges and to communicate about events on these orders; see http://www.fixprotocol.org/).

Plug-In Architecture

LogChipper is designed as a plug-in framework. Such a framework has three key ingredients.

• The interface is one ingredient. LogChipper is a WinForms application designed as several assemblies; see Figure 2. The assembly named “LogView” contains the application’s host executable. “LogViewCommon” contains definitions of some common enumeration types. “LogViewInterfaces” is the assembly that exposes the definition of the plug-in parser interface. “LogViewPlugin_FIX” contains a specific implementation of the interface for FIX engine logs. One assembly houses the definition of the plug-in interface and the other implements that interface. The plug-in interface ILogViewParser is defined in Listing One. Any parser must implement ILogViewParser. In so doing, it is responsible for processing events into fields of information, assigning the corresponding values to grid columns, and exploiting metadata (such as data type, identifying tag, and column heading).

• Dynamic activation is another key ingredient. To be effective, the application needs to be able to instantiate any parser that implements ILogViewParser at will. In the .NET Framework, the Activator class makes this possible. Listing Two demonstrates what happens when users select a plug-in parser. Note the use of the Activator class method CreateInstanceFrom. There are several overloads, but the one that takes the assembly filename and the name of the type to be
instantiated serves the current purpose best. CreateInstanceFrom constructs an instance of the requested type and returns a System.Runtime.Remoting.ObjectHandle. Calling Unwrap on this handle reconstitutes the object. Casting this object to the requisite interface type completes the process, equivalent to calling CoCreateInstance in COM, but without GUIDs or registry accesses.
• The third key ingredient of a plug-in framework is the ability to insert the plug-in’s custom features into the host application. Recall from Listing One that ILogViewParser defines the method GetPluginMenuItems. The plug-in implements this method by returning an array of objects of type MenuItem, defined in the System.Windows.Forms namespace. A plug-in typically constructs the MenuItem array and linked MenuItem arrays (if any) representing cascading submenus during its initialization. Continuing in Listing Two, after instantiating the plug-in parser object, the host application calls GetPluginMenuItems on the interface and populates the Format menu. The plug-in parser constructs the menus as in Listing Three(a). The menu items allow for three modes of column-heading display:

• Events in the FIX protocol consist of tag-value pairs, where the tags are numeric.
• The parser maintains a mapping between the numeric tags and alphabetic equivalent headings.
• The modes of display provided by this parser are numeric, alphabetic equivalent, and a blend of the two.

Listing Three(b) illustrates one of the Format menu handlers.
You need a way of determining which parser assemblies are available to users. You could iterate over the DLLs in the application directory, using CreateInstanceFrom to determine which DLLs are .NET assemblies that implement ILogViewParser. This solution offers dynamic discovery, but can be expensive. Instead, I decided to list the available parsers in the application configuration file (Listing Four). This way, users can choose from a collection of parsers by familiar format names, while the application instantiates the parser based on the assembly DLL filename.

Multithreaded Design

To let users browse, sort, and filter logs while they are being loaded, the file I/O for loading and parsing the events is placed in its own thread. This requires some thread synchronization; for instance, to ensure that new rows are not being inserted into a grid while new sort or filter criteria are being applied. Listing Five(a) contains a portion of the handler for opening a log file. After restoring the user’s settings for this file, it constructs a new thread to perform the file I/O and parsing. The member variable m_thPopulateList is of type Thread, defined in the System.Threading namespace. Creating the thread is a matter of constructing a Thread object, passing a ThreadStart delegate to its constructor. Delegate is a .NET type representing a callback method with a specified signature. A ThreadStart delegate represents a callback method that takes no arguments and returns void. A ThreadStart delegate is created by passing its constructor a reference to a method that has the proper signature and is designed to perform the work of the thread. In this case, that method is PopulateListThreadFunc.
Listing Five(b) contains a portion of PopulateListThreadFunc demonstrating thread synchronization and indirect communication with the main thread. First, note the use of m_SortParseMutex, a member variable of type Mutex, defined in the System.Threading namespace. Mutex offers a way to ensure that an operation that affects the state of a shared resource from one thread will not conflict with one in progress on another thread. A Mutex instance representing a Win32 mutex kernel object is created for each shared resource. All threads call the WaitOne method on the applicable Mutex instance before beginning an operation that affects the shared resource’s state. WaitOne blocks if another thread holds the mutex, returning only when ReleaseMutex has been called on it. In this case, the ListView controls must be protected from concurrent manipulation by the user and the file I/O thread. To prevent such a change from occurring while a new row is being inserted in its proper sequence into the sorted ListView control, changes to the sort sequence are synchronized via a call to WaitOne on m_SortParseMutex.

“Tailing” the File

Returning to Listing Five(b), note the references to various Boolean flags — the variables m_bInitialLoadInProgress, m_bDynamicUpdate, m_bLoadPaused, and m_bStopRequested. To load new events from the log file as they are written, the I/O loop is continuous. If set to True, the variable m_bInitialLoadInProgress indicates that the end of the file has not yet been reached. Once the end of file is reached, new events (if any) are read from the file after putting the file I/O thread to sleep briefly so as not to hog the CPU when the bulk of the I/O task is finished. Again, the UI thread communicates indirectly with the file I/O thread. The checkbox labeled “Dynamic Update” is initially checked. The variable m_bDynamicUpdate alternates between True and False as users uncheck/recheck the checkbox. While False, “tailing” the file is disabled. Similarly, the variable m_bLoadPaused is synchronized with the state of the Pause Load checkbox.
While False, file loading is disabled. Also, when a user selects Close, Exit, or Open from the File menu and clicks OK on the confirmation prompt, the variable m_bStopRequested is set to True. On detecting that the user has confirmed closing the current file, PopulateListThreadFunc returns.

Figure 1: LogChipper’s user interface.

Multicolumn Sorting

The .NET Framework defines the IComparer interface in the System.Collections namespace. It is used to specify how pairs of objects are compared for sorting and searching purposes. For instance, IComparer is used by the static methods Sort and BinarySearch of the .NET Framework class Array. The default sort behavior of the ListView class is case sensitive, based on item text, the text displayed in the left-most column of the grid. By creating a class that implements the IComparer interface, it is possible to alter this behavior. Listing Six contains the ListViewItemComparer class, which derives from IComparer. Note that it has a custom constructor that takes an array of sort columns, a corresponding array of sort orders, and a reference to the plug-in parser interface. Its implementation of the interface method Compare iterates over the sort columns, starting with the most dominant sort column, using the sort order and data type of each to determine which ListViewItem is greater. The data type of a field determines how to properly compare two items on that field. The parser holds the attributes of all of the fields and exposes them via the plug-in parser interface. Hence the call to the interface method GetSortDataType.

Conclusion

Among other improvements over MFC and other older frameworks, .NET represents a consistent programming model that hides the details of Win32 API programming and offers a rich class library. Although the details of filtering and parsing were beyond the scope of this article, there are many ways to present a UI for filtering the rows of a grid and to perform the filtering task. Likewise, there are numerous techniques for parsing events in a log file. Each format imposes constraints that emphasize one technique over others. The classes in the .NET namespace System.Text.RegularExpressions unleash the power of regular expressions. They can be applied wherever a pattern can be identified in the text. It can be advantageous to have several related log files open at the same time for browsing/searching. A multidocument extension is a planned enhancement for LogChipper.

DDJ
Listing One

/********************************************************
This file is part of the LogChipper(tm) software product.
Copyright (C) 2004 Ivden Technologies, Inc.
All rights reserved.
********************************************************/
public interface ILogViewParser
{
    void Initialize(string sFileName, Mutex sortParseMutex);
    void SetListViewColumnInfo(ref ListViewColumnInfo lvColumnInfo);
    int FormatGridColumns(ListView listViewMain, ListView listViewSort);
    bool ParseLineIntoGrid(string line, ListView listViewMain,
        ListView listViewSort, SortOrder order, ref int nNewRow);
    void GetAllHeadings(string[] sHeadings);
    SortDataType GetSortDataType(int iCol);
    string GetColumnHeading(int iCol);
    int GetColumnWidth(int iCol);
    void DisplayHeading(ListView listView, int iCol, string adornedValue);
    void GetPluginMenuItems(ref MenuItem[] menuItems);
}

Listing Two

/********************************************************
This file is part of the LogChipper(tm) software product.
Copyright (C) 2004 Ivden Technologies, Inc.
All rights reserved.
********************************************************/
System.Runtime.Remoting.ObjectHandle handle =
    Activator.CreateInstanceFrom(sPluginFile + ".dll", "LogViewPlugin.Parser");
m_LogViewParser = (LogViewInterfaces.ILogViewParser)handle.Unwrap();
m_LogViewParser.Initialize(sFormatFile + ".xml", m_SortParseMutex);
m_nColumns = m_LogViewParser.FormatGridColumns(listViewMain, listViewSort);
m_lvColumnInfo = new ListViewColumnInfo();
m_lvColumnInfo.SetSortColumns(m_anSortCols);
m_lvColumnInfo.SetSortOrders(m_anSortOrders);
m_LogViewParser.SetListViewColumnInfo(ref m_lvColumnInfo);

if (m_nColumns >= 1)
{
    m_sFormatFileName = sFormatFile;
    menuFormat.MenuItems.Clear();
    MenuItem[] menuItems = null;
    m_LogViewParser.GetPluginMenuItems(ref menuItems);
    int nMenuItems = menuItems.Length;
    for (int iItem = 0; iItem < nMenuItems; iItem++)
    {
        menuFormat.MenuItems.Add(iItem, menuItems[iItem]);
    }
}

Listing Three

(a)

/********************************************************
This file is part of the LogChipper(tm) software product.
Copyright (C) 2004 Ivden Technologies, Inc.
All rights reserved.
********************************************************/
m_HeaderMenuItems = new MenuItem[3];
m_HeaderMenuItems[0] = new MenuItem("&Numeric Tags",
    new System.EventHandler(HeaderMenu_NumericTags_OnClick));
m_HeaderMenuItems[1] = new MenuItem("&Alphabetic Labels",
    new System.EventHandler(HeaderMenu_AlphaLabels_OnClick));
m_HeaderMenuItems[2] = new MenuItem("&Both",
    new System.EventHandler(HeaderMenu_Both_OnClick));
m_MenuItems = new MenuItem[1];
m_MenuItems[0] = new MenuItem("&Header Format", m_HeaderMenuItems);

(b)

private void HeaderMenu_AlphaLabels_OnClick(object sender, System.EventArgs e)
{
    if (m_headerFormat != HeaderFormat.Alpha)
    {
        m_headerFormat = HeaderFormat.Alpha;
        UpdateColumnHeadings(m_asColLabels);
    }
}

Listing Four

/********************************************************
This file is part of the LogChipper(tm) software product.
Copyright (C) 2004 Ivden Technologies, Inc.
All rights reserved.
********************************************************/

Listing Five

(a)

/********************************************************
This file is part of the LogChipper(tm) software product.
Copyright (C) 2004 Ivden Technologies, Inc.
All rights reserved.
********************************************************/
// Restore selected columns, their widths and order
// and selected sort columns and their sort order.
RestoreViewSettings();
// Populate list view.
m_bInitialLoadInProgress = true;
m_bLoadInProgress = true;
m_thPopulateList = new Thread(new ThreadStart(PopulateListThreadFunc));
m_thPopulateList.Priority = ThreadPriority.Lowest;
m_thPopulateList.Start();

(b)

while (!m_bStopRequested)
{
    int nNewRow = -1;
    while (((line = sr.ReadLine()) != null) && !m_bStopRequested &&
        (m_bInitialLoadInProgress || m_bDynamicUpdate))
    {
        if (m_bStopRequested)
        {
            m_bStopRequested = false;
            m_bLoadInProgress = false;
            return;
        }
        while (m_bLoadPaused)
        {
            Thread.Sleep(100);
            if (m_bStopRequested)
            {
                m_bStopRequested = false;
                m_bLoadInProgress = false;
                return;
            }
        }
        m_SortParseMutex.WaitOne();
        nNewRow = -1;
        if (m_bAutoScroll)
        {
            listViewMain.BeginUpdate();
        }
        bool bParse = m_LogViewParser.ParseLineIntoGrid(line, listViewMain,
            listViewSort, listViewSort.Sorting, ref nNewRow);
        if (m_bAutoScroll)
        {
            listViewMain.EndUpdate();
        }
        if (!bParse)
        {
            m_SortParseMutex.ReleaseMutex();
            break;
        }
        if (m_bAutoScroll && (nNewRow >= 0))
        {
            listViewMain.EnsureVisible(nNewRow);
        }
        m_SortParseMutex.ReleaseMutex();
    }
}

Listing Six

/********************************************************
This file is part of the LogChipper(tm) software product.
Copyright (C) 2004 Ivden Technologies, Inc.
All rights reserved.
********************************************************/
class ListViewItemComparer : IComparer
{
    private int m_nSortColumns;
    private ArrayList m_anSortCols;
    private ArrayList m_anSortOrders;
    private LogViewInterfaces.ILogViewParser m_LogViewParser;

    public ListViewItemComparer()
    {
    }
    public ListViewItemComparer(ArrayList anSortCol, ArrayList anSortOrder,
        LogViewInterfaces.ILogViewParser logViewParser)
    {
        m_nSortColumns = anSortCol.Count;
        m_anSortCols = anSortCol;
        m_anSortOrders = anSortOrder;
        m_LogViewParser = logViewParser;
    }
    public int Compare(object x, object y)
    {
        int nRet = 0;
        for (int iCol = 0; iCol < m_nSortColumns; iCol++)
        {
            nRet = CompareSingleColumn(x, y, (int)m_anSortCols[iCol],
                (SortOrder)m_anSortOrders[iCol]);
            if (nRet != 0)
            {
                break;
            }
        }
        return nRet;
    }
    public int CompareSingleColumn(object x, object y, int iCol, SortOrder order)
    {
        int nRet = 0;
        string s1, s2;
        SortDataType type = m_LogViewParser.GetSortDataType(iCol);
        switch (type)
        {
            case SortDataType.AlphaNoCase:
                nRet = String.Compare(
                    ((ListViewItem)x).SubItems[iCol].Text,
                    ((ListViewItem)y).SubItems[iCol].Text, true);
                break;
            case SortDataType.AlphaCase:
                nRet = String.Compare(
                    ((ListViewItem)x).SubItems[iCol].Text,
                    ((ListViewItem)y).SubItems[iCol].Text);
                break;
            case SortDataType.Date:
            case SortDataType.Time:
                s1 = ((ListViewItem)x).SubItems[iCol].Text;
                s2 = ((ListViewItem)y).SubItems[iCol].Text;
                if ((s1.Length == 0) || (s2.Length == 0))
                {
                    nRet = String.Compare(s1, s2);
                    break;
                }
                try
                {
                    DateTime dt1 = DateTime.Parse(s1);
                    DateTime dt2 = DateTime.Parse(s2);
                    nRet = DateTime.Compare(dt1, dt2);
                }
                // If neither object has a valid date format,
                // compare the two items as strings.
                catch
                {
                    nRet = String.Compare(s1, s2);
                }
                break;
            case SortDataType.Number:
                double d1 = 0;
                double d2 = 0;
                s1 = ((ListViewItem)x).SubItems[iCol].Text;
                s2 = ((ListViewItem)y).SubItems[iCol].Text;
                if ((s1 != null) && (s1.Length > 0)) { d1 = Convert.ToSingle(s1); }
                if ((s2 != null) && (s2.Length > 0)) { d2 = Convert.ToSingle(s2); }
                // Treat equal values as equal rather than greater.
                nRet = (d1 == d2) ? 0 : ((d1 < d2) ? -1 : 1);
                break;
        }
        if (order == SortOrder.Descending)
        {
            nRet *= -1;
        }
        return nRet;
    }
}

DDJ
EMBEDDED SYSTEMS
Porting an RTOS to a New Hardware Platform
Porting RTXC to the NPE-C167
BYRON MILLER
Product development cycles are market driven, and market demands often require vendors to compress development schedules. One approach to this is to simultaneously develop similar products, yet with varying levels of product complexity. However, scheduling pressures coupled with increased product complexity can be a recipe for disaster, resulting in slipped schedules and missed opportunities. Consequently, vendors are always on the alert for silver bullets, yet as developers, we know that they don't exist. That said, it is still in our best interest to seek better ways of compressing development cycles, and one way to do this is to port existing products to new hardware platforms, adding new features along the way.

This is the approach we used to demonstrate a proof of concept when porting a legacy security application to a new hardware platform. Our firm was hired to make enhancements to the client's existing 6502-based product, and we quickly realized that this platform was running out of steam. Specifically, the proposed features would significantly impact performance. Consequently, we proposed three options for fixing this problem:

• Completely rewriting the application on the current hardware.
• Rewriting the application on new, higher performance hardware.
• Migrating portable portions of the application to the new hardware.

Byron is an independent firmware developer specializing in microprocessor and DSP design and development for data acquisition, control, and Internet appliances. He can be reached at [email protected].
After considering the options, we decided to port to new hardware.

Overview
Like most firmware projects, this project consisted of software and hardware, specifically Quadros Systems' RTXC Real-Time Operating System (http://www.quadros.com/) and test code that we ported to NorthPole Engineering's NPE-167 Single Board Computer (http://www.npe-inc.com/), which is based on the Infineon C167 processor. The C167 supports a 16-channel, 10-bit A/D converter, two 16-channel capture and compare units, four-channel PWM, two general-purpose timer units, an asynchronous/synchronous serial channel (USART), a high-speed synchronous serial channel, a watchdog timer, a bootstrap loader, 111 I/O lines, 2K of internal RAM, a 16 priority-level interrupt system, an eight-channel Peripheral Event Controller (PEC), and Controller Area Network (CAN) 2.0B support.

In its basic configuration, the NPE-167 (see Figure 1) contains 512K each of Flash and SRAM. Memory is expandable to 17 MB of combined Flash and SRAM. It also supports a real-time clock, an RS-232 serial port, and a modified Serial Peripheral Interface (SPI) bus used to control four I/O cards.

To bring up a real-time operating system (RTOS) on a new board, most vendors provide Board Support Packages (BSPs): software and documentation that provide guidance on getting the RTOS to execute on new hardware. RTXC's BSP for the C167 comes with the source code for the RTOS kernel and a system-generation utility (RTXCgen) for configuring the number of tasks, timers, and resources used for the specific application. It also includes documentation, some source-code examples, and makefiles for building the kernel and a sample application.

RTXC Overview
The Real-Time eXecutive kernel (RTXC) supports three kinds of priority-based task scheduling: preemptive (the default), round-robin, and time-slice. RTXC is robust; it supports hard deadlines, changeable task priorities, time and
resource management, and intertask communication. It also has a small RAM/ROM code footprint and a standard API, and has been implemented on many processors.

RTXC is divided into nine basic components: tasks, mailboxes, messages, queues, semaphores, resources, memory partitions, timers, and Interrupt Service Routines (ISRs). These components are further subdivided into three groups that
are used for intertask communication, synchronization, and resource management. Moreover, component functionality is accessed via the standard API.

Porting Activities Overview
Porting an RTOS to a new board requires four activities:

• Determining the system's architecture.
• Figuring out what files to change, based on the architecture.
• Making changes to the files; this includes writing the code.
• Creating test code and exercising the board to ensure that the RTOS is working properly.

The first activity is design related, while the others are implementation related. Moreover, the last three activities require an understanding of the new hardware: knowing the specifics of what needs to happen to make the RTOS interact with the board.

System Architecture
The purpose of determining the system architecture requirements is to identify the
hardware and software components that need modifying to get the RTOS up and running on the NPE-167 board. For most porting projects, hardware components include I/O, memory, timers, and other unique peripherals; this project was no different. We had I/O ports controlling the LEDs, CAN bus, serial communication, memory selection, and card-slot selection. Memory included both Flash and SRAM, and memory is selected through the I/O component using the SPI bus, so I/O and memory selection are interrelated. For this project, we also had to identify the timer to run RTXC's real-time clock, the master timer used for all RTOS-based timekeeping functions. Additionally, for this project, we were not going to use any other on-chip peripherals.

Figure 1: NPE-167 SBC.

The best way to identify hardware components is to study the board's schematics. Examining the NPE-167 board revealed that the I/O ports would be key for this project. Why? Because this board uses the processor's general-purpose ports to handle the switches that control CAN bus operation and the board's operating mode, to control the LED outputs, and to select memory. I/O cards are controlled via the SPI bus, rather than I/O ports.

Ports can be configured as either inputs or outputs. Examination of the NPE-167 board showed that 17 ports are used. Eleven ports are used as switch inputs: From the schematic, we saw that switches 1–7 set the MAC address for the CAN device, CAN bus speed is controlled by switches 8–9, and the board operating mode is controlled by switches 11–12. Switch 10 is not used. Four ports control the LEDs, of which there are three: one green, one red, and the third bicolor. Because the bicolor LED needs two lines, four outputs are required to control the three LEDs. Finally, two output ports are used for page selection for extended memory.
RTXC Task definitions - RTXC demo                          06/14/05 23:43

Default task stack size is 512
Number of dynamic tasks is 0

 #  Name      Pri  Entry     Stack  Start  Description            FPU
 1  RTXCBUG     1  rtxcbug     512     -1  RTXC System Debugger     N
 2  SIOIDRV     2  sioidrv     128     -1  serial input drvr        N
 3  SIOODRV     3  sioodrv     128     -1  serial output drvr       N
 4  EXAMPLE1    9  txtask      512     22  example task 1           N
 5  EXAMPLE2    8  rxtask      512     21  example task 2           N

Total number of RAM bytes required = 2480
Total number of ROM bytes required = 78

Figure 2: RTXC task definitions.
Referring to the schematic, we saw that the NPE board addresses up to 512K of memory before having to make use of the page-selection ports. Although we would configure the page-selection ports for the porting process, we didn't have to use them, because the total code footprint of the kernel plus test code is 107K: RTXC's kernel is about 76K, and the porting test code fits within another 31K. In short, we would use only about one-fifth of the default memory to validate the porting process.

The last necessary component for the port was to determine which timer to use as the master time base. Timers are internal to the C167 processor, so they don't show up on the schematic. We had two options: choose a timer and write the code for that timer, or use the BSP default timer. RTXC's C167 BSP uses a timer in its configuration, and a trick to simplify the initial porting process is to use that same default timer. Reviewing the BSP documentation, we discovered that it uses timer 6 for the master timer. Once we determined the components associated with the porting process, we could turn our attention to figuring out which files needed to be changed.

Changing Files
We knew from the previous step that 11 ports were used for input and six ports for output. Because these were general-purpose I/O ports, they needed to be initialized to work as either inputs or outputs. This gave us an idea of where the NPE-specific initialization code needed to go: Initialization code to set up these ports belongs in the startup code, which for this project is the cstart.a66 file located in the Porting directory. Listing One is the code that configures the NPE-167 board I/O. Once configured, the I/O can be used by higher level RTOS and API functions.

Once we figured out where the I/O changes go, we needed to turn our attention to discovering and setting up the master timer. The BSP set up the master timer for us because we were using the default timer 6. Setup code for this timer is located in cstart.a66 and rtxcmain.c; Listing Two is a snippet of the RTXC-specific code.

After analyzing the architecture requirements, we discovered that the only file to change for porting to the NPE-167 board was cstart.a66. Granted, we knew we would have to change other files as well, but those files are application specific.

Changing Files and Writing Code
This brought us to the third step, which was straightforward because we knew what needed to be changed and where. Recall that all changes for basic porting
functionality occurred in cstart.a66. We also needed to write the initialization code itself. We wrote code to initialize the switches that configure CAN, but no other code to deal with CAN, because it is not used in the basic port. For specifics, look at cstart.a66 and search for the npe and rtxc labels to find the code changes specific to this port. When porting to new hardware, you may want to adopt a similar strategy of partitioning the code into hardware-specific and RTOS-specific changes: Partitioning code through the use of labels helps with maintainability.

Test Code
Finally, we needed to create some test code to exercise our port. Building the test code application was a two-step process:

1. We compiled the RTXC kernel into a library object (rtxc.lib).
2. We compiled the test code and linked in rtxc.lib to create the executable.

There are two directories for generating the test code, stored at the same level in the hierarchy. All files for creating rtxc.lib are located in the kernel directory, while the test code-specific files are located in the Porting directory.
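For concreteness, that layout amounts to two sibling directories. This sketch is assembled from the directory and file names mentioned in this article; the exact contents of a given BSP will differ:

    kernel/    - RTXC kernel sources and librtxc.prj; building this
                 subproject produces the rtxc.lib kernel library
    Porting/   - cstart.a66, rtxcmain.c, main.c, the RTXCgen output
                 files (cqueue.c/.h/.def and the like), and NpeEg.prj,
                 which links in rtxc.lib to produce the test executable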
The RTXCgen utility creates a set of files corresponding to each RTOS component. For instance, application queues are defined in three files: cqueue.c, cqueue.h, and cqueue.def. The same holds true for tasks, timers, semaphores, mailboxes, and the rest. Changes to the number of RTOS components are handled by this utility; for example, if we wanted to change the number of tasks used by the test code, we would use RTXCgen to do it. Figure 2 shows the contents of the task definition file for the test code application. Test code files created by RTXCgen are placed in the Porting directory.

Once RTXCgen has defined the system resources, we are ready to build the project. Creating the executable test code requires building two subprojects: the kernel and the test code. We performed builds using the Keil µVision IDE (http://www.keil.com/), which uses project files (*.prj) to store its build information. RTXC kernel creation consists of building the code using the librtxc.prj file located in the kernel directory; invoking the librtxc project compiles, links, and creates the librtxc object in the kernel directory. Building the test code is accomplished using the NpeEg.prj file stored in the Porting directory; invoking the NpeEg project compiles and links the files in the Porting directory, and links in the librtxc object from the kernel directory. The resulting executable is then placed in the Porting directory as well. Once the test code was fully built, we were ready to test the board port.

The test code is a simple application used to validate the porting process. Most of it is in main.c, located in the Porting directory. The application works by starting five tasks: two user and three system. User tasks execute alternately, while system tasks execute in the background. One user task begins running; it outputs data via one of the system tasks to the console, signals the other user task to wake up, and then puts itself to sleep, waiting for the other task to signal it to wake up again. Figure 3 shows the executing test code. (A runnable sketch of this ping-pong structure appears after Listing Two.)

Figure 3: Test code debug session.

Conclusion
That's pretty much it. Porting software to a new hardware board doesn't need to be hard. With a firm plan, and by following this simple process, porting just got a lot easier.

Acknowledgments
Thanks to Joe Tretter and Rick Gibbs at NorthPole Engineering for their assistance.

DDJ

Listing One
; NPE specific code in cstart.a66
;
; NPE-167 Special Function Register Addresses
P2      DEFR    0FFC0H
DP2     DEFR    0FFC2H
ODP2    DEFR    0F1C2H
P7      DEFR    0FFD0H
DP7     DEFR    0FFD2H
ODP7    DEFR    0F1D2H
PICON   DEFR    0F1C4H
CC8IC   DEFR    0FF88H
EXICON  DEFR    0F1C0H
T6CON   DEFR    0FF48H
;----------------------------------------------------------------
; HARDWARE INITIALIZATION FOR NPE-167 A
; This code initializes the processor to work with the peripherals
; on the NPE-167 A.
$IF (WATCHDOG = 1)
        SRVWDT                  ; SERVICE WATCHDOG
$ENDIF
        BCLR    T6CON.6         ; Shut off timer.
; Initialize Ports 2,3,7 and 8 as standard TTL levels.
        MOV     R5,#0
        MOV     PICON,R5
;
; Initialize Port 2.
; Set Output = P2.0: IO Bus Reset
;            = P2.1, P2.2: System LED
;            = P2.3, P2.4: CAN LED
;            = P2.6, P2.7: Memory Page Select
;
; Set Input  = P2.10, P2.11: CAN Speed Select (SW8, SW9)
;
        MOV     R5,#001Fh       ; Set outputs to off.
        MOV     P2,R5
        MOV     R5,#0001h       ; IO is open drain.
        MOV     ODP2,R5
        MOV     R5,#00DFh       ; Set output direction.
        MOV     DP2,R5
;
; Initialize Port 7.
; Set Output = P7.0 - P7.3: IO Bus Slot Select.
;
        MOV     R5,#000Fh       ; Set outputs to off.
        MOV     P7,R5
        MOV     R5,#000Fh       ; IO is open drain.
        MOV     ODP7,R5
        MOV     R5,#000Fh       ; Set output direction.
        MOV     DP7,R5
;
; Setup IO Interrupt (EX0IN).
; Disable external interrupts and set Interrupt level to 7,
; group level to 3, negative edge.
;
        MOV     R5,#001Fh
        MOV     CC8IC,R5
        MOV     R5,#0001h
        OR      EXICON,R5
Listing Two
;========================================================================
; ** Beginning of RTXC specific code **
$IF MICROVISION
;========================================================================
; NULL STACK SIZE DEFINITION
;
; Define the size of the stack for the null task.
; NOTE: Ensure you modify the 'C' level constant of the same name in
;       the RTXCOPTS.H file.
;----------------------------------------------------------------
NULLSTKSZ   EQU     80H
$ELSE
$INCLUDE(rtxcopts.inc)          ; include kernel assembly definitions
$ENDIF
; ** END RTXC specific code **
;=========================================================================

;=========================================================================
; ** Beginning of RTXC specific code **
; This user stack is used only for startup and entry into main()
USTSZ       EQU     40H         ; set User Stack Size to 64 Bytes.
; ** END RTXC specific code **
;=========================================================================

;=========================================================================
; ** Beginning of RTXC specific code **
EXTRN   nullstack:WORD
EXTRN   DPP3:DPP1_INITVALUE:WORD
EXTRN   DPP3:DPP2_INITVALUE:WORD

; Initialize the 'C' variables for task frame initialization
        MOV     DPP1_INITVALUE, DPP1
        MOV     DPP2_INITVALUE, DPP2
$IF NOT TINY
        MOV     R0, #POF nullstack      ; restore user stack pointer
        MOV     R5, #PAG nullstack
        MOV     DPP2,R5
        NOP
        BSET    R0.0FH                  ; User stack uses DPP1
$ELSE
        MOV     R0, #nullstack          ; restore user stack pointer
$ENDIF
        MOV     R10, #NULLSTKSZ
        ADD     R0, R10                 ; get to top of stack
; ** END RTXC specific code **
;===============================================================
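As promised above, here is a host-side analogue of the test application's two alternating user tasks. It uses POSIX threads and semaphores in place of RTXC's kernel services (which are not shown in this article's listings), so it compiles and runs on any POSIX system; only the task names txtask and rxtask are taken from Figure 2.

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

static sem_t wake1, wake2;

static void *txtask(void *arg)          /* plays the role of EXAMPLE1 */
{
    (void)arg;
    for (int i = 0; i < 3; i++) {
        printf("task 1 running\n");     /* "console output" step       */
        sem_post(&wake2);               /* signal the other user task  */
        sem_wait(&wake1);               /* put ourselves to sleep      */
    }
    return NULL;
}

static void *rxtask(void *arg)          /* plays the role of EXAMPLE2 */
{
    (void)arg;
    for (int i = 0; i < 3; i++) {
        sem_wait(&wake2);               /* sleep until task 1 signals  */
        printf("task 2 running\n");
        sem_post(&wake1);               /* wake task 1 again           */
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    sem_init(&wake1, 0, 0);
    sem_init(&wake2, 0, 0);
    pthread_create(&t1, NULL, txtask, NULL);
    pthread_create(&t2, NULL, rxtask, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}

Compiled with cc -pthread, this prints the two messages in strict alternation, which is exactly the behavior the debug session in Figure 3 demonstrates on the target.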
PROGRAMMING PARADIGMS
The Search for Search and Other Inquiries

Michael Swaine

Michael is editor-at-large for DDJ. He can be contacted at [email protected].
When I studied artificial intelligence in graduate school, we were encouraged to think of any programming task as a search of a problem space. You just decide what problem it is that you're trying to solve, figure out how to characterize any solution to that problem, settle on a searchable space that includes all such solutions, and voilà! Your arbitrary programming task has been transformed into the seemingly routine job of writing an efficient search routine.

Of course, this daunting business of inflicting on yourself a radical paradigm shift over to the search model isn't really something you'd want to task your imagination with every day. Writing a spreadsheet application, for example, isn't obviously helped much by viewing the task as a search through a space of programs looking for one that has a lot of little boxes with numbers and formulas in them. Or any other problem-space-search representation that I can think of.

Still, it can be a powerful technique, and in fact, it was responsible for bringing about the first successes in artificial intelligence — the early puzzle-solving and game-playing programs. So say Avron Barr and Edward Feigenbaum in The Handbook of Artificial Intelligence (William Kaufmann, 1981; ISBN 0865760047), and they should know.

An entirely different idea regarding the desirable ubiquity of search is the notion that every viable 21st century software business model can and should be built around search. A corollary of this rather bold theorem is the idea that Microsoft wants to or needs to become Google, a notion that one would be tempted to discount as a gross oversimplification if it weren't for the fact that the person seemingly most responsible
for putting the thought into the technological public's imagination is Bill Gates. And he should know.

What we all know by now is that Microsoft really is searching for a new business model. Maybe they'll find it in one of the many search-related projects underway in Microsoft labs. Maybe they'll adopt Google's when Google gets done with it. What I can tell you is that you'll find here some random observations on search and a brief look at another of those fat books with ambitious titles for which I have an odd fondness.

On my Fat Book Shelf, right next to Stephen Wolfram's A New Kind of Science (1197 pages) and Stephen Jay Gould's The Structure of Evolutionary Theory (1433 pages), stands Roger Penrose's The Road to Reality: A Complete Guide to the Laws of the Universe (1099 pages). I've wanted to have someone explain to me the laws of the universe, and because Penrose won a Wolf Prize with Stephen Hawking "for their joint contribution to our understanding of the universe," he should know.

Search Is Hardly Hard to Find
As I write this, search is much in the news (with Google indexing blogs, news is much in the search, too). Google itself is much in the news. Stock price? You could drive across the United States in an SUV at summer 2005 gas prices for less than the price of a single share of Google stock. CNET writer Declan McCullagh recently wrote a piece canonizing the old search protocol Gopher and its Veronica server, while in The New York Times, John Battelle was raving about the Web 2.0 conference, much of which was about search. Meanwhile, there's a campaign to save the search mascot, Jeeves, and we read that "Bill Gates Visits the Holy Land and Talks Search." Search is everywhere.
News aggregators? Ping servers? Mapping, GPS, people search, social bookmarking, tagging, communities of interest. Just more ways to search.

Some communities of interest, particularly ones that lead to people making contact off the Internet, make some people nervous. And what interests are we talking about? Suicide, for example? Should search engines build technologies to push people who are searching for suicide information toward help? Or is it always a bad idea to subvert the proper working of a search engine? And before you answer, have you ever engaged in Googlebombing?

Which raises the rather naïve but important question of whether you can trust search. Clearly you can't, so what can be done about this? Dogpile is a search aggregator that could suggest the way: Don't trust one engine, but apply some metric of trust over all of them to route around bias and error and Googlebombing.

If All You Have Is a Card Catalog, Everything Looks Like a Lookup
If the Internet is the center of your work, then search is the main task you have to perform. There was a lot of second-guessing in the press about the meaning of Google and Sun agreeing to cooperate on technologies. Mostly this was speculation about how they would target Microsoft. Would the two companies use OpenOffice.org to challenge Microsoft in the Office suite software arena? Or is that yesterday's platform, and would they push harder on the idea of the Internet as the center of the computer user's world? I don't think anyone knows exactly what Sun and Google will accomplish together, but some scenarios seem more likely than others. Here's a question that I think brings some of the speculation into focus: Which is more likely, that Google
copies Microsoft, or that Microsoft copies Google?

It's fun to ask questions like that, but it's not much fun to watch judges grapple with tricky technological issues. When they have trouble deciding whether "Intelligent Design" is religion or science, I worry about their ability to determine the senses in which BitTorrent resembles Grokster. "Torrent files don't contain any data," a defender argued. "This is a search engine scenario. Why aren't Google, Yahoo, or Microsoft getting sued?"

In fact, Google has been threatened over its Google Print technology, despite the fact that it has gone out of its way to avoid copying copyrighted material. As I understand it, Google Print, which allows searching inside books, indexes those books rather than caching their content. This means that the book is nowhere copied, and that it would be extremely difficult to reassemble the book from the index.

But Google does cache content, of course — the content of web pages. The truth is, this routine caching of web pages is much more clearly a case of copying than anything Google is doing with books. When I Google a topic and find several news stories on the topic, and click on a link and find that it has expired and the news service has removed the story or pulled it behind a subscriber wall, I just click back to the search results and go to the cached version. This clearly undermines the intentions of the news service. Is it illegal? Should it be? Making caching illegal could cause serious damage to the way search engines work and the way the Internet works. But this underscores one of the ways in which the Internet, working as it was intended, calls copyright and other intellectual property laws into question.

Is Advertising Search? Search Me
Personally, I think that a lot of problems are better described as challenges in visualization rather than search tasks. Sometimes you know where the information is and you just want to make some sense of it. Yes, you could define that as some sort of search. And I do suspect that when I'm staring at a spreadsheet I may be getting bogged down in data rather than viewing a solution or insight. But I think that sometimes we want to be intimately involved in the search process, and that converts the process into something other than pure search.

Advertising is certainly search. Search with a hook, which is to say fishing, and much of it is of a type of fishing known as "chumming." Throw the bait out on the water and hope that some big fish comes along and you'll be able to snag it.
The opposite of this is targeted advertising, which at first blush seems like a powerful idea that can improve the efficiency of advertising by orders of magnitude. The idea is not new, but technology today makes much greater targeting possible. To a scary degree. But on second glance (and after watching Glengarry Glen Ross again), it seems to me that it’s
not so simple. Selling is about converting a nonprospect into a prospect and into a customer. There is an inherent problem in defining the search space, due to the unwillingness of the salesperson to refine — and thereby reduce — it. Or at least there are conflicting desires. So maybe the picture of advertising as search is not so clear.

The Search for the Meaning of It All
I finally reached the end of the road; that is, the last page of Roger Penrose's The Road to Reality (Alfred A. Knopf, 2004; ISBN 0679454438). It was a tough slog. The math made my head hurt, and I like math. I was going to dedicate this whole column to a review of it, on the principle that if I have to wade through 1100 pages of complex manifolds and holomorphic functions, you should be forced to suffer proportionately. But the truth is, I don't understand this book a whole column's worth.

According to its jacket, this book is addressed to the serious lay reader. Yeah, right. The audience is a little more rarefied than that: Nobody who hasn't done graduate work in mathematics is going to get much out of this book. Not only is it richly endowed with dense footnotes, but most of the footnotes have homework problems in them. Like footnote 27.16: "Give a general argument to show why a connected (3-)space cannot be isotropic about two distinct points without being homogeneous."
And make it snappy, serious lay reader. Whatever its virtues, this book is not for the faint of heart. Reading it, or anyway wading through it, was for me a humbling experience. Not only did I have it rubbed in my face how much math I've forgotten and how little I knew to begin with, but
I've in the past been critical of some things Penrose has written in his more popular and accessible writing, but here I wouldn't dream of critiquing him. Just over my head.

But I did get something out of the book, and I do think that some DDJ readers might find this book interesting, and I'm not sorry that I made the effort. Penrose is a brilliant, important thinker, a collaborator with Stephen Hawking, and he's not kidding about the title. This book is a whirlwind tour through all the important questions in modern physics and all the math needed to truly understand the questions.

I've written here about some unorthodox approaches (Wolfram, Fredkin) to understanding the laws of the universe, approaches that make those laws look like computer programs. Penrose has a different view of these things, but his approach is also challenging to orthodoxy. Although orthodoxy is probably the wrong term when any theory that fits the empirical data that quantum physics works with has to be flat-out crazy.

What makes Penrose et al. germane to this admittedly wide-ranging column is that information is central in all their theories. Information seems to be at the heart of everything; for example, in a brief moment of accessibility to that serious lay reader, Penrose exposes the common misconception that we depend on the energy from the sun for our survival. Nope: Entropy, not energy, is the key. We consume the sun's information.

It took me a while to figure out what Penrose was doing in this book. This, I think, is what he's up to: He wants you to be able to visualize mathematical structures. Spaces, fields, bundle spaces. He provides an enormous number of illustrations, mostly looking like something left in the oven too long: surfaces or solids of odd shapes curled back on themselves. It's a truism that you can't really visualize 4D space, but you can use visualizations to gain insight into four dimensions, just as you can't fully represent 3-space objects on 2D paper, but we manage to model them usefully via projections of various kinds. Penrose pushes this as far as he can. Even if you don't understand all of his helpful diagrams, it's impressive to realize that he works so hard to find a way to visualize every one of these extremely abstract concepts.

In the later chapters, we see what all this visualization work was for, as he introduces concepts in physics that he — as a mathematician — sees mathematically. Now the foundation work in the early math chapters helps you get an idea how he visualizes the bizarre quantum properties of the universe. If you really work to understand the visualizations in the early chapters, you'll be able to visualize the tough stuff in the later, physics, chapters.
Like: Ah, this is one of those things where the pie crust has lots of little fishhooks coming out of its upper surface.

I realize that I'm not giving much of a sense of what Penrose covers in the book. Okay, for example, he critiques ontologies for quantum theory, including the Copenhagen interpretation, the many-worlds view, environmental decoherence, consistent histories, and the pilot-wave approach, and presents his own unorthodox view. He covers the Special and General Theories of Relativity, Quantum Theory and quantum phenomena, and candidates for a Theory of Everything. And like that.

Findings
When he gets to the Theories of Everything, he provides much material for
IDers to get excited about. Like his picture of an anthropomorphic Creator performing an extremely low probability act to place the universe in the immensely low entropy — thus special — Big Bang state. But it's a metaphor, just as he's being metaphorical when he presents Lee Smolin's notion of multiple universes spawned by multiple universes, with a kind of intergalactic natural selection leading to the evolution of better fitted universes. Darwin on the largest scale.

Finally, in chapters 30 and 33, Penrose presents his own approach, which I won't attempt to summarize except to say that he takes on all the giants of quantum theory. His theory is unorthodox, and he acknowledges that it's also short on testable predictions. Its virtues are at present chiefly aesthetic. However, his theory is not alone in being hard to test. Penrose cautions against being seduced by mathematical beauty when the kind of theories you're dealing with are Big Science, where empirical refutability can be hard to come by.

As for me, I'm all for beauty and simplicity and the parsimonious explanation. I'm starting to lean toward that model that says that the universe can be most simply described using 11 dimensions. But maybe just because I want to refer to it as "Occam's Eleven." Sorry.

DDJ
Dr. Ecco Solution
Solution to "Fractal Biology," DDJ, January 2006.

1. With eight nodes, you need only 16 links; see Figure 1.

2. With 12 nodes, you could design three sets of four nodes that are completely connected (requiring 3×6 = 18 links), then add in four links between every pair of sets, leading to an additional 12 links for a total of 30 altogether.
3. If there is no limit on the number of links per protein node, then try Figure 2. You would need only 21 links. We call this the "two-fan design," because each hub creates a fan. You need two fans in case one hub is wounded.

4. Divide the nodes into 96 nodes (the base nodes) that will have six links each and 12 nodes (the switchboard nodes) that will have 59 links each. This results in a total of 642 links. Number the 96 base nodes from 1 to 96. Call 1 through 24 the A nodes, 25 through 48 the B nodes, 49 through 72 the C nodes, and 73 through 96 the D nodes. The 12 switchboard nodes are divided into two groups of six. The first group is called AB, AC, AD, BC, BD, and CD. The second group is called AB', AC', AD', BC', BD', and CD'. Figure 3 shows all the nodes and all the interconnections except the complete graph among all the switchboard nodes (that is, every switchboard node is connected to every other one). Note that XY' has the same connections as XY (both to the base nodes and to the hub nodes).

Why does this work? Any two base nodes, say B and D, are connected through two switchboard nodes (in this case, BD and BD'). Two base nodes are connected through six switchboard nodes if the base nodes are in the same letter group. Any base node is connected to any switchboard node either directly or by connecting to any of the six switchboard nodes that the base node is directly connected to, plus a direct link to the other switchboard node.
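As a worked check of the link count in part 4 (an added computation, not part of the original solution): Each switchboard node's 59 links are 48 to the base nodes in its two letter groups (24 + 24) plus 11 to the other switchboard nodes, and each base node links to the six switchboard nodes bearing its letter. Summing node degrees counts every link twice, so the total is (96 × 6 + 12 × 59)/2 = (576 + 708)/2 = 642 links, as stated.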
DDJ

Figure 1.

Figure 2.

Figure 3. (Base node groups A:1–24, B:25–48, C:49–72, D:73–96; switchboard nodes AB, AB', AC, AC', AD, AD', BC, BC', BD, BD', CD, CD'.)
EMBEDDED SPACE
Memory Matters

Ed Nisley

Ed's an EE, PE, and author in Poughkeepsie, NY. Contact him at [email protected] with "Dr Dobbs" in the subject to avoid spam filters.
The Difficult is that which can be done immediately; the Impossible that which takes a little longer. — George Santayana
Applications programmers regard computer memory as an essentially endless line of identical bytes awaiting their data structures. Systems programmers take a more nuanced view, reserving distinct regions for the operating system, application code and data, and perhaps a few memory-mapped peripherals. Embedded systems folks, alas, must contend with downright weird configurations. The intimate relationship between code and hardware occurs well below the usual levels of abstraction, where physics sets the speed limits and manufacturing costs define what's available. Using a full-featured operating system just adds another layer to that complexity.

Two threads from the 2005 Linux Symposium lead back into memory matters. I'll start with the good news, then proceed to something confirming the growing awareness that system security is really hard.

NAND Versus NOR Versus RAM
About a year ago, I observed that the serial nature of NAND Flash memory precluded running code directly from it. As with all things technological, where there's economic pressure, there's a way to make the seemingly impossible happen. It's just a matter of trade-offs.

The Consumer Electronics Linux Forum BoFS at the Linux Symposium included a brief discussion of a Samsung NAND Flash research project that allows XIP (Execute-in-Place) access, so you can run program code directly from the chip. The paper outlining the technique illustrates just how weird embedded system memory can be. However, first you must understand why
everything you think you know about memory is wrong.

On the small end of the scale, a product's manufacturing cost can make or break it in the marketplace. That cost, in turn, depends on how many chips must be soldered to the board, because the assembly cost can exceed the chip cost. Single-chip solutions reduce both board area and chip count, and may therefore reduce the overall cost even if they're more expensive than the components they replace.

Any tiny gizmo that handles music or video uses NAND Flash memory, which puts vast, cheap bulk storage behind a disk-like serial interface (ignoring, of course, those gizmos with miniature hard drives). That means a minimum of two chips: NAND Flash and a single-chip microcontroller with a CPU and on-chip program and data storage.

That's true for very small systems, but anything big enough for a real operating system requires a few megabytes of storage. Even with today's technology, that means four chips: NAND Flash, NOR Flash for the program, RAM, and the microprocessor. Plus, of course, whatever analog widgetry may be required to turn it into a phone-camera-PDA-web-pod.

In round numbers, a megabyte of NOR Flash costs five times more than NAND Flash and uses three times more power, so there's a mighty force aligned against that fourth chip. Storing less code, compressing it, and using other tricks may allow a smaller NOR Flash chip, but you want to eliminate that thing entirely.

Some of Samsung's current NAND Flash parts use their internal buffer RAM as a tiny XIP random-access memory that's automatically loaded with the contents of a specific NAND page before the CPU boots up. It's small because NAND Flash chips have only a dozen or so address lines that normally select a block within the chip, so there just isn't much address space available.

That XIP code copies the bulk of the program from NAND Flash to external
RAM, then jumps into the main routine. That's a comfortably familiar process occurring in hundreds of millions of larger systems (albeit using disk drives instead of NAND Flash), but in the embedded world it has a severe downside: The system must store two copies of the program code.

Again in round numbers, RAM costs at least 10 times more than NAND Flash and dissipates over five times more power. Program code in RAM tends to be essentially read-only after it's loaded, so you pay top dollar for a huge expanse of RAM that's used as ROM. Worst, the copy in NAND Flash is completely unused after booting. The folks in charge of money don't like hearing that, of course.

NAND XIP
The entire recent history of CPU development revolves around the simple fact that memory-access time is much, much slower than CPU instruction cycle time. In fact, high-end CPUs now have three levels of cache in front of the main memory, each dedicated to anticipating the CPU's next request. Lower performance CPUs don't have quite the same bandwidth pressure (that's why they're lower performance, after all), and small embedded systems tend to get by with relatively poky CPUs.

The Samsung project combines the notion of NAND-as-disk with a liberal dash of Moore's Law to come up with a memory subsystem that the CPU sees as a reasonably high-speed, random-access ROM. They implemented a prototype with an FPGA and some static RAM surrounding a NAND Flash chip, but reducing all that to a single chip is just a matter of time and, perhaps, economics.

The NAND Flash chip is a read-only backing store for the much smaller SRAM cache, with the FPGA implementing the cache-control algorithms. The CPU sees the subsystem as a standard, random-access, read-only memory rather than a
serial-access NAND Flash chip. Reads from memory addresses currently in the cache proceed at SRAM speeds, while cache misses stall the system until the FPGA fetches the corresponding block from the NAND Flash.

The overall performance of any cached memory depends critically on the cache hit ratio: If it's below the mid-to-upper 90 percent range, you're sunk. A crude estimate says that when a cache hit costs 10 nanoseconds and a miss costs 35 microseconds, a 99 percent hit ratio makes the average access time 360 nanoseconds. Ouch! (The short program below reproduces that arithmetic.)

Unlike most cached systems, embedded applications tend to have fairly straightforward execution paths and a very limited number of programs. The Samsung designers analyzed a program trace to prioritize each address block by the number and frequency of accesses, then stored those priorities in the spare data area associated with each NAND Flash block. The FPGA controller can then decide which blocks are most likely to be required next, based on actual knowledge of the program's execution, and fetch new blocks into the lowest priority SRAM cache lines.

Their results for an MP3-player program show roughly 100-ns average access times for a 256-KB SRAM cache. It's not clear whether media data is also streaming through the cache, which could either increase or decrease the hit ratio depending on how the caching algorithm handles relentlessly sequential accesses.

In any event, the net result is random-access memory that's somewhat faster than NOR Flash and somewhat slower than SDRAM. The overall energy cost, measured in nanosecond-milliwatts, is roughly half that of NOR Flash, which may be the single most important parameter for mobile applications. However, the risk of protracted stalls on cache misses requires careful system design to ensure uninterrupted execution of those time-critical music-decoding routines.

That's the sort of memory trade-off embedded-systems designers and programmers must put up with all the time. Beyond the usual requirement for correct functions, even the code's location can scuttle a project. Building a working system sometimes seems impossible, but the success of hand-held gizmos shows that it's merely difficult.
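Here is the promised back-of-the-envelope helper, a quick C check of that estimate using the hit and miss costs quoted above (my arithmetic, not code from the Samsung paper):

#include <stdio.h>

/* Average access time for a cached memory:
   t_avg = hit_ratio * t_hit + (1 - hit_ratio) * t_miss */
static double avg_access_ns(double hit_ratio, double t_hit_ns, double t_miss_ns)
{
    return hit_ratio * t_hit_ns + (1.0 - hit_ratio) * t_miss_ns;
}

int main(void)
{
    /* The column's numbers: 10-ns SRAM hit, 35-us NAND page fetch. */
    printf("99%%   hits: %6.1f ns\n", avg_access_ns(0.99,  10.0, 35000.0));
    printf("99.9%% hits: %6.1f ns\n", avg_access_ns(0.999, 10.0, 35000.0));
    return 0;
}

The first line prints 359.9 ns, the 360-ns figure above; the second shows why every additional nine of hit ratio matters so much here: at 99.9 percent, the average drops to 45 ns.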
Stack Smashing
Back in the land of infinite virtual memory, even applications programmers could benefit from a little memory differentiation, as a good idea can go badly wrong with just the right design decisions. Here's a horror story from the world of big memory that extends down into the embedded world.

Back when Intel introduced the 8080 microprocessor, solid-state memory was still breathtakingly expensive. The 8080, implemented with a countably finite number of transistors, had an 8-bit ALU and 16-bit addresses. Filling that 64-KB address space was more than most folks, myself included, could afford.

The 8086 microprocessor had a 16-bit ALU, but was more-or-less compatible with the 8080 at the assembly language level. In order to access more than 64 KB of memory, Intel introduced segment registers, which basically provided the upper 16 bits of a 20-bit address. Programmers became intimately familiar with the CPU's CS, DS, SS, and ES registers because large programs sprawled their code, data, and stack storage into different 64-KB segments. Those segments were different in name only, as the hardware didn't enforce any access semantics. You could reach any type of data using any segment register with complete impunity. Needless to say, that led to some truly baffling bugs.

The 32-bit 80386 (the less said about the 80286 the better) enhanced the notion of segments to provide memory protection, while grafting paged virtual memory onto the side. You didn't have to use the VM paging hardware, but memory segmentation was mandatory. Segment registers became pointers into tables of segment descriptors and, with the hardware now enforcing access semantics, a once-quirky architecture abruptly grew teeth. Segments contained only code or only data, selected by a single descriptor bit. Code segments could be execute-only or execute-read, while data segments could be read-only or read-write. Stack segments became a specialized data segment, with the ability to grow downward rather than upward.

Once upon a time, I actually wrote a bare-metal protected-mode program with full-throttle segmentation and can state from personal knowledge that figuring out the segmentation was somewhere beyond difficult. While my demo system worked, it became obvious that scaling it up wasn't in the cards. I wasn't alone, as most OS designers opted for "flat model" segmentation in x86 systems.

Although the hardware enforces the segment rules, there is nothing preventing you from defining all the segments to refer to the same chunk of memory. That turned addresses into 32-bit offsets from the common segment base, rather than unique values tied to a specific segment. The fact that you could only write data in a Data segment, pop registers from a Stack segment, and execute code in the Code segment became completely irrelevant. If you could manage to write arbitrary data into the stack segment, you
could easily run it in the code segment without the hardware ever noticing. And that, party people, explains why Windows is so vulnerable to stack-smashing attacks. As it turns out, the Linux kernel has the same exposure. Windows just makes it easier for Other People's Code to gain access to the stack in the first place.

No Execute, No Cry?
The textbook heap and stack implementation puts the two at opposite ends of a common storage block, with the heap growing up from the lowest address and the stack growing down from the highest. All is well, so long as the two never meet.

The C language, lacking any inherent array index checking, makes buffer overruns trivially simple: Feeding a long string into a strcpy() function expecting, say, a username will do the trick (a minimal sketch of the pattern appears at the end of this section). A sufficiently long string not only overflows the target buffer, but can extend all the way up into the stack storage area. In fact, if the string is stored on the stack (where automatic variables within C functions live), you don't even need the heap.

Strings in C, being just linear arrays of bytes, can contain nearly anything except the binary zero terminator that makes this attack possible. Attackers can therefore write both a small program and the register contents that pass control to it into the stack, ready for action when the abused strcpy() function executes a RET instruction.

The details of this process are tedious and depend on exactly what's going on in the attacked program and the OS. However, stack-smashing attack generation can be automated and, should the attacker get it wrong, the attacked program crashes and wipes out the evidence. If the attack happens in the kernel stack, it can take down the entire system.

Various Linux kernel patches have made stack-smashing attacks far more difficult, but its flat-memory layout means they can't be completely eliminated. AMD, with Intel tagging on behind, has added an NX (No-Execute) bit to the virtual page description in x86-64 mode (which obviously applies only to 64-bit programs) that does solve the problem.

All this assumes that nobody in their right mind would want to execute code from the stack. That turns out to be not quite correct, as it's often convenient to build trampolines on the stack, a subject quite outside the scope of this column. In any event, turning off the ability to run code from the stack can break innocent programs doing entirely reasonable things, so changing the OS underneath existing code may require recompiling some applications.
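Here is the minimal sketch promised above: an unchecked strcpy() into a fixed automatic buffer, next to a bounded variant. The function names and the 16-byte buffer size are illustrative, not taken from any real program:

#include <string.h>

/* The classic overrun: anything longer than 15 characters overwrites
   adjacent stack memory, eventually including the saved return address. */
void save_username(const char *username)
{
    char buf[16];
    strcpy(buf, username);              /* no bounds check at all */
    /* ... */
}

/* A bounded copy caps the length and guarantees termination. */
void save_username_safely(const char *username)
{
    char buf[16];
    strncpy(buf, username, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    /* ... */
}

int main(void)
{
    save_username_safely("a-name-of-any-length-is-fine-here");
    return 0;
}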
But the AMD NX bit should solve the problem for new code running in 64-bit mode, right? Nope, not quite.

Frankencode
Although 64-bit CPUs aren't commonly found in current embedded systems, let alone hand-held devices, Moore's Law tells us that it's only a matter of time. Let's suppose you're building a must-be-secure system, using both a CPU and an OS that can prevent code execution from the stack. Does a no-execute stack render buffer overflow attacks harmless, other than perhaps trashing the stack and crashing the program?

I found a paper by Sebastian Krahmer describing how a stack-smashing attack can execute arbitrary code, even on an x86-64 CPU with a properly NX-protected stack. The technique involves stitching together chunks of code from the ordinary library routines that are linked into essentially every compiled program. Basically, an attacker can arrange the stack so that a RET instruction passes control to the last few instructions of a library function that pops the attacker's data into registers.

Synthesizing system calls with the proper parameters requires finding the proper function epilogs and creating the appropriate stack contents to fill the reg-
isters. This is, of course, subject to automation. The buffer overflow manipulates pure data on the (necessarily) writable stack, leaving code execution for already-existing functions in the code segment. The CPU's protection mechanisms have no idea anything is amiss.

The lesson to be drawn from all this resembles the lessons found in copy protection, digital-rights management, and Trusted Computing: The attackers are at least as smart as you are, they have better tools, and they will find a way around whatever technological measures you put in place. Declaring that hardware makes an attack impossible may be strictly correct, but finding an alternative vulnerability is merely difficult.

If you're building an embedded system that must be reliable and secure, getting the code working is just the first step. You must also control the environment around it, the access to it, and the data in it. Concentrating your attention on any one aspect, no matter how tempting, simply shifts the attacks to a weaker entry point.

Happy memories!

Reentry Checklist
The Linux Symposium proceedings are at http://www.linuxsymposium.org/2005/
and http://www.linuxsymposium.org/proceedings.php.

The Samsung paper on XIP NAND Flash is at http://www.iccd-conference.org/proceedings/2003/20250474.pdf. A summary of their existing parts, each sporting a tiny XIP boot block, is at http://www.samsung.com/Products/Semiconductor/Flash/OneNAND_TM/.

The Linux Kernel Mailing List discussion of Ingo Molnar's NX bit patch is at http://kerneltrap.org/node/3240/. Krahmer's explanation of the x86-64 NX exploit is at http://www.suse.de/~krahmer/no-nx.pdf, but the link for reference 3 should be http://www.cs.rpi.edu/~hollingd/comporg/notes/overflow/overflow.pdf. Find more on stack-smashing protection at http://www.research.ibm.com/trl/projects/security/ssp/. There is a description of GCC trampolines at http://www.delorie.com/gnu/docs/gcc/gccint_136.html.

Maybe you can't throw out your Bartlett's Familiar Quotations yet, but http://www.brainyquote.com/ is in the running to replace it. "Everything You Know Is Wrong" is a vintage Firesign Theatre album I haven't heard in quite a while. Bob Marley's "No, Woman, No Cry" is a true classic.

DDJ
CHAOS MANOR
Beware of Sony's DRM

Jerry Pournelle

Jerry is a science-fiction writer and senior contributing editor to BYTE.com. You can contact him at [email protected].
In April 2004, Sony covertly added a Digital Rights Management (DRM) scheme to its music CDs, and all Sony music CDs released after that date incorporated DRM software. This DRM system has no effect on Linux and Macintosh computers or standalone music players. However, if you attempt to play a Sony music CD on a Windows PC, you will be told that you must install a Sony music player. You are advised not to allow that installation.

If you do install the music player, you will also install a rootkit. The term comes from UNIX, where the primary superuser is known as "root," and if you are root, you can do anything. A rootkit is a particular kind of spyware that hides from detection by spoofing the operating system into believing no such spyware exists. The directory in which the rootkit files reside is hidden, and you'll never find it with any normal operating-system command. That's what the Sony music CD system installs on your computer in the name of digital rights protection.

The Sony rootkit is a serious invasion of your system, and is so successful at hiding that third-party spyware people can use it to hide their own malware; at least one is reported to have done so. Moreover, savvy World of Warcraft online players used the Sony rootkit software to hide their cheat software.

It gets worse. Not only does the Sony DRM rootkit hide, but if you detect it, you cannot safely remove it. Attempts to remove it have resulted in blue-screen crashes and the requirement to reformat the disk and reinstall the operating system and all applications. Naturally, all unsaved data were lost, and this happened to experts.

The Sony rootkit was discovered by former DDJ Contributing Editor Mark Russinovich at Sysinternals. His story (see http://www.sysinternals.com/blog/2005/10/sony-rootkits-and-digital-rights.html) makes for fascinating, if horrifying, reading.
But it gets worse. Not only is the Sony DRM rootkit impossible to uninstall, but it "phones home," giving coded information to a server at Sony headquarters. As I write this, no one has any idea of what Sony plans to do with that information.

The important point here is that this is stuff you don't want on your computer, and you can't detect it with any normal antispyware programs. It takes a rootkit detector to find it, as a horrified Mark Russinovich discovered during a test of rootkit-detection software. The Sony DRM rootkit had been on his computer for some time, and he had never suspected a thing. If that can happen to the system internals guru, it can happen to you.

Now the final horror. Under the DMCA, it is very likely a criminal act for you to remove the Sony rootkit from your system. Worse, it is likely a criminal act if I tell you how to remove the Sony DRM rootkit. And thus my advice: Don't buy Sony music CDs, especially if there is any chance at all that they will be played on a Windows PC. Of course, by the time you read this, Sony may have provided an uninstaller for its rootkit DRM system. Still, Sony's actions do not indicate that the company understands the seriousness of this situation, and at this writing Sony has yet to offer an uninstaller I would trust. Stay tuned.

But even if it is legal to remove it, it is dangerous to do so. In particular, booting in DOS and examining the directories to find the rootkit directory, then deleting that, will almost certainly crash your system. The Sony rootkit alters the registry to redirect certain function calls, and if the OS can't find the instructions it has been redirected to, it can't recover.

Sony does supply a patch to your operating system that lets you see the rootkit directory. However, the procedure for getting a legal copy of this patch is tedious. Of course, no sane person wants application software that requires an operating-system patch from a third party. When Microsoft sends you updates and patches, Microsoft knows that code is there. When you patch your OS with soft-
ware supplied by Sony, how is Microsoft to deal with it? I do not advise you to install a Sony-supplied OS patch.

In response, one reader wrote about a possibly useful program:

    "I pass along a tool I found to help deal with the Sony disks.
    http://www.smart-projects.net/ offers a freeware tool to read CDs
    called ISO Buster that sees the disk layout and allows extraction
    of the WAV files."
    John
Note that this program may be illegal under the DMCA, and thus may not be available for long. I leave all conclusions in this matter as exercises for the reader.

Between the public (and artist) outcry and a bunch of lawsuits, it didn't take Sony long to start backing down: First saying that its XCP DRM scheme applied to 20 titles, then later admitting that it was actually 52, Sony decided to pull CDs from the shelf and give customers the opportunity to exchange them for non-XCP versions.

U3: The Next Generation of Thumb Drives
The twice-yearly Demo Conference shows off new technologies and services — up to 70 in two packed days of six-minute demonstrations. Leading off the demo cavalcade at Demo/Fall 2005 was U3 (http://www.u3.com/), makers of embedded technology for "USB smart drives." And the demo was indeed smart: Plug the device into the demo Windows laptop, and the software it contained was available to run there — no installation, no footprint on the host PC. Unplug it (even surprise removals) and it all disappears. There were trialware apps and other bundle deals, which vary depending on the U3 partner. To demonstrate that the technology was ready, I was handed no less than four U3-enabled USB keys from Verbatim, Kingston, SanDisk, and Memorex.

U3's trick took some doing; long-time IT veteran and U3 CEO Kate Purmal hinted at a long development cycle, mostly software to fool the OS into working the way they wanted. U3's software (Windows now, Mac soon) is a real stack, not merely a single-point
hack. We're keenly interested in learning how U3 achieves the application redirection (and the other cleverness). U3 promises much of the info on integrating with its capabilities will be public, so developers can make intelligent use of it. There is a freely downloadable SDK at http://u3.com/developers/downloads/default.aspx.

At home, we ripped open the Verbatim 1-GB U3 drive packaging and plugged it in. As we surmised upon seeing the demo, to Windows, the U3 looks like two devices in one: a CD and a removable disk drive. The CD part autoplays and runs the (not very big) U3 software stack, which, in turn, opens an intro clip and walks you through a demo. From there, you can simply use it as a standard thumb drive, or add your own software and have it with you no matter what computer you might use, without installing anything on that computer. It does put a new icon in your system tray when a U3 device is installed, which you should use to eject instead of Windows' usual icon.

Astute readers will realize U3's basic strategy is exactly the same method that Sony uses to install its "rootkit" software on your computer, though with far more positive intentions. It could probably silently leave other software behind on your computer as well, though that's not the intent, and we've never found any signs of that. Instead, it's a pocket-sized Place For Your Stuff, externally indistinguishable from a standard thumb drive.

The Verbatim 1-GB U3 drive we tested comes with McAfee antivirus, ready to run (again, without installation). We used it to check out a computer that had been running without virus protection, then ejected the Verbatim U3 thumb drive. Actually, as a test, we just yanked it loose without notice (a "surprise removal" in Windows parlance), which caused the U3 software to politely remind us we should use the U3 icon to do that in the future. Lecture complete, it then went completely away. I've done the same test with the Kingston U3 drive, with the same results. It Just Works.

And that is the point of U3's technology: You can have all your favorite software, ready to run, on any Windows-based computer you might use, without installing anything or corrupting the host machine. At Demo/Fall, Kate claimed you can install Microsoft Office right onto a U3 drive. We haven't tried that yet, but what an idea for those willing to rely on the kindness of others.

Prediction: U3-enabled thumb drives are going to become indispensable for road warriors (run your presentation from any available computer!), IT corridor war-
Prediction: U3-enabled thumb drives are going to become indispensable for road warriors (run your presentation from any available computer!), IT corridor warriors (all your favorite fix-it tools, instantly available), and, well, just about anyone else who wants Their Own Stuff no matter what computer they happen to be using. For now, the advanced features only work on Windows, though U3 devices work like any other thumb drive on Macs. We're told full U3 support will come to the Mac, but that's Real Soon Now.

Seagate External Drives

This is Chaos Manor, and our methods are sometimes, well, chaotic. Sometimes things are just so useful, and so ubiquitous here, that we forget to list them. That almost happened with the Seagate USB drives. These come in many sizes and flavors, and everyone loves them, and because we all use them and they Just Work, we nearly forgot them. They're great gifts, and best of all, just about everyone can use another external storage drive, even if they already have one or two.

The most popular Seagate external drives here are the 5-GB "Cookie," which fits in a shirt pocket, goes with you anywhere, and draws its power from the USB connection — it works wonderfully with Lisabetta, my Tablet PC; the 100-GB "book," which is small enough to fit in a briefcase and uses two USB connectors, one for data
and one for power; and the 400-GB model, which has its own wall-brick power supply. One way to use the 100-GB drive is with a powered USB hub (Belkin makes some good ones, and those are what I carry) so you're not draining your laptop. Whatever size you get, you can be sure a Seagate USB external drive is welcome, and they are recommended.

Winding Down

The computer book of the month is Joli Ballew and Jeff Duntemann's second edition of Degunking Windows (Paraglyph Press), which is better than the first edition. You will certainly profit from the chapters on registry cleaning and the recommended tools for doing that. There's sound advice in every chapter, and I can pretty well guarantee you'll learn a few things you didn't know. Recommended.

The second computer book of the month is also from Paraglyph: Jesse M. Torres and Peter Sideris's Surviving PC Disasters, Mishaps, and Blunders. Most of it is just common sense, but if you've just had a disaster, common sense is the one thing you won't have: Just having a book that shows someone else has thought through the situation can help.

DDJ
PROGRAMMER’S BOOKSHELF
Inside C# and .NET
Core C# and .NET
Stephen C. Perry
Prentice Hall PTR, 2005
1008 pp., $49.99
ISBN 0131472275
Peter N. Roth
The target audience for Stephen Perry's Core C# and .NET is the "experienced programmer." I fit that profile, and based on this book, I would guess that at least three years of professional programming is a reasonable minimum to qualify as such.

Part 1 of Core C# and .NET includes an introduction to .NET and C#, C# fundamentals, class design, and working with objects. If you're new to C#, you'll want another text to cover the language in more detail. (I recommend Peter Sestoft and Henrik Hansen's C# Precisely.)

In Part 2, "Using .NET," Perry includes chapters on text manipulation and file I/O, Windows Forms programming and controls, graphics design, fonts/text/printing, XML, ADO.NET, and data binding. The file I/O chapter, in particular, goes a long way toward answering questions that show up in newsgroups. And while the book covers Version 2.0 of C# and .NET, the buzz on Version 3.0 has already started, so you can expect that some of the ADO.NET material is "transitional" (but then, what isn't?).

The so-called "advanced" section (Part 3) addresses the topics of threads, distributed apps, and refinement/security/deployment. While I claim to be an experienced programmer, I must confess that I have never written a threaded or distributed app, so this material was new to me. Part 4 deals with programming for the Internet, and includes chapters on ASP.NET web forms and controls, the ASP.NET application environment, and XML web services. Finally, two appendices cover the differences between .NET 1.0 and 2.0, as well as events and the DataGridView control.

Thus, the text is broadly comprehensive. At the same time, you can do only so much in 1000 pages, so the depth is limited accordingly. Topics average about five or six pages each, which is still a solid chunk for each area addressed. Code examples are downloadable rather than on a CD, which seems to be the current trend in computer books. A bonus download is the Quick Reference — print it, fold it, and stick it into a niche on your desktop (if you have any room left).

Admittedly, .NET and C# are moving targets, and I admire authors who take a shot at them.
Amazon.com Recent Top 5 Programming Books

1. Design Patterns: Elements of Reusable Object-Oriented Software, by Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides
2. HTML for the World Wide Web with XHTML and CSS: Visual QuickStart Guide, Fifth Edition, by Elizabeth Castro
3. UML Distilled: A Brief Guide to the Standard Object Modeling Language, Third Edition, by Martin Fowler
4. Head First Servlets and JSP: Passing the Sun Certified Web Component Developer Exam, by Bryan Basham, Kathy Sierra, and Bert Bates
5. Sun Certified Programmer & Developer for Java 2 Study Guide, by Kathy Sierra and Bert Bates
To that end, Perry does a splendid job, though I have a couple of mild cavils: The presentation of conditional compilation should use the new idiom, which is the [Conditional(symbol)] attribute, rather than the older (and well-known) #if (symbol)/#endif construct (a short sketch contrasting the two idioms appears after this review). And I found the idea of using a command-line compiler "to get started" a little unusual, given the number of free IDEs out there. Still, the text is clear of typos and misspellings.

But somehow, the military outline for oral presentations ("tell them what you're going to tell them, tell them, tell them what you told them") has been carried over into texts. Hence, for each chapter, there is a beginning-of-chapter summary, the chapter itself, and an end-of-chapter summary. This insults our intelligence and wastes paper, because we can easily read the table of contents and scan the chapter to determine what's coming. We can read the chapter to determine what it is, and we can review all the material to determine what it was. Dropping the summaries would also make for a lighter book; alternatively, the freed-up pages would provide more space for the author. As for the blank, dark gray "separator" pages: They're not quite dark enough to be easily visible. A black bleed strip down the edge of the first printed (nonblank!) page of each chapter would be preferable, and would save yet another page.

My rant notwithstanding (hey, I'm entitled, I'm an experienced programmer), in general, Core C# and .NET is an excellent production that meets its stated aim — to provide a foundation for programmers moving to .NET.

DDJ

Peter is the president of Engineering Objects International, producers of commercial and custom C++ components. Peter can be contacted at http://www.engineeringobjects.com/ and [email protected].
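For readers who haven't run across the attribute Peter mentions, here is a minimal sketch contrasting the two conditional-compilation idioms. The TRACE_ON symbol and the Trace method are hypothetical names invented for this illustration; the [Conditional] attribute itself lives in System.Diagnostics and works as shown.

// A sketch, not code from the book: [Conditional] tells the C# compiler
// to drop every CALL to Trace() unless TRACE_ON is defined where the
// caller is compiled (e.g., csc /define:TRACE_ON demo.cs). The method
// body is still compiled; only the call sites vanish.
using System;
using System.Diagnostics;

class ConditionalDemo
{
    [Conditional("TRACE_ON")]       // new idiom: attribute on the method
    static void Trace(string message)
    {
        Console.WriteLine("trace: " + message);
    }

    static void Main()
    {
        Trace("entering Main");     // call elided if TRACE_ON is undefined

#if TRACE_ON                        // old idiom: textual inclusion/exclusion
        Console.WriteLine("trace: inside the #if block");
#endif
    }
}

The attribute's advantage is that call sites stay clean: no #if/#endif pairs scattered through the callers, and the compiler still removes all cost of the calls (including argument evaluation) in non-tracing builds.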
OF INTEREST
TransGaming Technologies has launched SwiftShader, a software-based rendering tool that enables the use of 3D graphics in games and applications without the need for dedicated graphics hardware. SwiftShader supports both pixel and vertex shaders. Direct3D 8- and Direct3D 9-compatible APIs are available, and OpenGL-compatible APIs are under development. Vertex Shader 1.1 and Pixel Shader 1.4 features are currently supported. The company claims that SwiftShader can perform over 50 times faster than Microsoft's Direct3D Reference Rasterizer in tests with sample applications. SwiftShader is currently available for x86 CPUs with Intel's SSE multimedia instruction-set extensions. SwiftShader runs on Microsoft Windows 98 and higher, and on Linux through TransGaming's Cedega portability technology.
TransGaming Technologies Inc.
55 Byward Market, 2nd Floor
Ottawa, ON, Canada K1N 9C3
http://www.transgaming.com/

Aonix has released Version 8.2 of its ObjectAda for Windows development environment. In addition to supporting an Eclipse-based development environment, ObjectAda for Windows integrates current Microsoft platform improvements for compatibility with Microsoft Visual Studio .NET. ObjectAda for Windows includes a Java-call interface, enabling Java applications to be called from an Ada program. ObjectAda for Windows 8.2 includes the comprehensive Ada libraries needed for calling the Windows Win32 and Visual C++ .NET 2003 MFC interfaces from application source code written in Ada. In ObjectAda for Windows, these Ada-binding libraries are fully compatible with the Microsoft Visual Studio .NET 2003 tools and libraries.
Aonix North America Inc.
5930 Cornerstone Court West, Suite 250
San Diego, CA 92121
858-457-2700
http://www.aonix.com/
Systinet has announced Version 6.0 of its Systinet Server family for creating Java and C++ web services. Systinet Server for Java 6.0 is a platform-independent tool for creating and deploying web services in Java and J2EE applications. Version 6.0 includes support for limiting the number of concurrent JMS threads, WS-ReliableMessaging (WS-RM) in clusters, WS-Interoperability (WS-I) Basic Profile 1.1 and WS-I Attachments, publishing to any UDDI registry (including Version 3), and platform support for JBoss 4 and Sun JDK 1.5 on Windows and Solaris. Systinet Developer for Eclipse 6.0 is a free companion product for Systinet Server for Java. It is designed to make it easy to create, test, debug, deploy, and publish Java web services, all from within the Eclipse IDE. Systinet Server for C++ includes interoperability support, WS-Interoperability (WS-I) Basic Profile 1.1 compliance, and easy publishing to any UDDI registry (including Version 3).
Systinet
One Van De Graaff Drive, 5th Floor
Burlington, MA 01803
781-362-1300
http://www.systinet.com/

Smart Bear has released CodeCollaborator, a program that lets you perform peer code inspections via a web browser, Windows GUI client, or command-line utility. CodeCollaborator enforces peer code inspections by integrating with version control, issue tracking, reporting, and other external systems. The customizable workflow provides a variety of options based on current inspection processes; it can range from formal rules-based inspections to casual "buddy pass-arounds." CodeCollaborator's reporting tool provides metrics such as average defects per KLOC and defects per person-hour. It also provides a side-by-side diff viewer with an IM-style audit trail.
Smart Bear Inc.
12885 Research Boulevard, Suite 210
Austin, TX 78750
877-501-5651
http://www.codecollaborator.com/

QNX Software Systems has announced QNX Momentics MultiCore Edition, an integrated set of development tools and OS features for developing software that targets multicore processors. Among other multicore features, Momentics supports Asymmetric Multiprocessing (AMP), for managing individual processors on a chip independently; Symmetric Multiprocessing (SMP); and Bound Multiprocessing (BMP), which combines the resource management of SMP with the application control of AMP, letting you assign applications to specific processors. The package also
includes a system profiler and system-builder tools to support the unique architectures of multicore processors.
QNX Software Systems
175 Terence Matthews Crescent
Ottawa, ON, Canada K2M 1W8
613-591-0931
http://qnxsoftware.com/

SDMetrics has released SDMetrics 2.0, a software-design quality-measurement tool for UML. SDMetrics measures structural design properties such as coupling, size, and complexity of UML designs. SDMetrics also checks design rules to automatically detect incomplete or incorrect designs, and to point out violations of style guidelines such as circular dependencies or naming conventions. UML-design measurement helps you identify opportunities for design refactoring, and supports effort estimation, project planning, and monitoring. SDMetrics features a suite of object-oriented design-quality metrics and rules covering all UML diagram types. You can add new design rules and metrics tailored to your local development process. SDMetrics works with all UML modeling tools and reverse-engineering tools that support XMI, the UML model-interchange format. Version 2.0 includes support for the UML 2.0 and XMI 2.0/2.1 standards, with an extended set of design metrics and rules for the new UML 2 diagrams and model elements.
SDMetrics
In der Lache 17
67308 Zellertal-Harxheim, Germany
+49 6355 954 761
http://www.sdmetrics.com/

Canoo has released an upgrade of its free Eclipse integration plug-in for UltraLightClient, Canoo's library for Rich Internet Application (RIA) development. The plug-in integrates UltraLightClient into the Eclipse IDE, letting you efficiently deliver pure Java-based RIAs. It is compatible with Eclipse 3.1.
Canoo Engineering AG
Kirschgartenstrasse 7
4051 Basel, Switzerland
+41 (61) 228 94 44
http://www.canoo.com/

DDJ

Dr. Dobb's Software Tools Newsletter
What's the fastest way of keeping up with new developer products and version updates? Dr. Dobb's Software Tools e-mail newsletter, delivered once a month to your mailbox. This unique newsletter keeps you up-to-date on the latest in SDKs, libraries, components, compilers, and the like. To sign up now for this free service, go to http://www.ddj.com/maillists/.
SWAINE’S FLAMES
Harold Pinter Eats Shoots & Leaves

Interior. Room, no people. A heavily lacquered bar near the back wall. On one end of this, a lightly lacquered bowl of bamboo shoots and mint leaves. Mike, a young man in his late 50s, enters upstage and rubs the bar with a bar rag. He stops. Silence. Harold, a playwright, enters downstage and looks around.

Harold: Slow night.
Mike: I scared away all the writers.
Silence.
Harold: These any good?
Mike: I put rice vinegar on them.
Slight pause.
Harold: I'll risk it.
He eats shoots and leaves. Pause.
Mike: I shouldn't have done it.
Harold: No, they're good with the vinegar. Or anyway, they'd be worse without it.
Mike: I said terrible things about some famous writers. Drove off my customers.
Harold: Oh, I see.
Small silence.
Harold: Terrible things about — playwrights?
Mike: Journalists. Judith Miller and Bob Woodward.
Slight pause.
Mike: But others, too. This place is normally full of journalists, but I chased them all away.
Pause.
Mike: It was its that sent me over the edge.
Harold: Its what?
Mike: Its. Just its. Journalists can't spell "its." Mainstream writers, bloggers, it's all the same. They don't know when it should have an apostrophe.
Harold: You mean when its should have an apostrophe.
Silence, except for sound of chewing.
Harold: I know what you mean. I blame Aristophanes, myself.
Pause.
Harold: Aristophanes more or less invented punctuation, you know. Put raised dots between words to indicate the lengths of pauses. There's this scene in The Frogs that, in the original Greek—
Pause pause slight pause silence chewing bar-rubbing silence and pause.
Harold: No original Greek then. Right-o. Anyway, nowadays punctuation conveys a lot more. The pacing, yes, but tone and attitude. All the things that you lose when speech gets written down. That's what people are groping for in online writing, after all. The cues that tell the reader how to hear what's written.
Mike: Like smileys. I hate smileys. And all that pseudotagging, like flame on, cleverness off, snark alert.
Harold: But those are just a natural extension of what you might call verbal punctuation marks. Phrases that are only there to tell you how to hear the rest. Just kidding, all seriousness aside, nudge nudge wink wink.
Mike: You know what's the worst? Those writers who give direct instructions to the reader, like Pause or Silence.
Silence, followed by long pause and then measured, thoughtful chewing.
Mike: I feel bad about trashing mainstream journalists, though. The Internet has changed their rules on them.
Harold moves downstage center. Lights down, single spot on Harold.
Harold: They had it all worked out, you know. The print journalists, I mean. Broadcast, too. The MSM they call them now. Mainstream media. They had it to themselves. Nobody had their e-mails, you see. The readers didn't cross the line. They stayed over there, the consumers. The media over here, the producers. You got a nice packaged product that way. Neat. All a mess now. No control. Control is the thing. With control you can be the newspaper of record and still get the scoop. You could. Now the bloggers get all the scoops, and the readers Google you and tell you how to spell its.
Long pause. Lights up slowly.
Harold: You know what you need to do? You need to get yourself some peanuts.
He leaves.
Michael Swaine
editor-at-large
[email protected]

Apologies to Harold Pinter, who was presented with the Nobel Prize for Literature on my birthday (how fair is that), and to Lynne Truss, author of Eats, Shoots & Leaves: The Zero Tolerance Approach to Punctuation, but none to Judy and Bob.