OCTOBER 2004
VOLUME III - ISSUE 10
www.phparch.com
The Magazine For PHP Professionals
Certification Central
TABLE OF CONTENTS

php|architect Departments
Editorial: Home turf advantage!
What’s New!
Security Corner: File Uploads, by Chris Shiflett
Tips & Tricks, by John W. Holmes
exit(0); PHP and the Enterprise, by Andi Gutmans and Marco Tabini

Features
Row, Row, Row Your Boat: ZIP on the Fly with the Streams API, by Chung W. Leong
Driving Multiple Databases Anywhere, by Geoffrey Mondoux
Roll Your Own Template, by Sérgio Machado
Exposing Web Application Data Semantically Using RAP (RDF API for PHP), by Paul Cowles
Integrating PHP and OpenOffice: Using PHP to Dynamically Manipulate and Convert OO Documents, by Bård Farstad
PHP-GTK and the Glade GUI Builder: Building Client Applications with Style, by Tony Leake

October 2004 ● PHP Architect ● www.phparch.com
EDITORIAL
Home turf advantage!
As you may know, last month we held our first on-land conference right here in our hometown of Toronto, Canada (I refer to it as an “on-land” conference simply because our actual first conference was php|cruise, which took place on board a cruise ship). Someone stopped me on the way to lunch one day (note to prospective discussion-starters: never stop Tabini on his way to anything related to food) and told me that he bet I couldn’t wait for the end of the conference to find out whether people thought it was a success. Maybe it was the fact that I had not had breakfast and it was one o’clock in the afternoon, but I simply answered, in a sort-of offhand way, that I knew the conference was a success and I didn’t need anyone to tell me so. The other person looked at me in a funny way, probably thinking that I was some sort of self-centered egomaniac (which is probably not far from the truth; I am a small business owner, after all), and walked away.

The truth, however, is that I actually meant what I said. Once you’ve booked the speakers, reserved the meeting space, fought with the hotel over every little detail and made sure that nobody was going to be asked to sleep in the broom closet, the best you can do is to sit down and watch the event unfold in front of your eyes, and you’ll know immediately whether you’ve done your job well: if you have, you’ll get some sleep. I am happy to report that I managed eight hours of sleep every night of the conference (to be fair, I actually overslept one day, but the main advantage of holding a conference five minutes from your home is that you can be there by 8AM even if you wake up at 8:15). Naturally, making the conference happen was a team effort, and we couldn’t have done it had we not had the best speakers around and a surprisingly (in a good way) attentive audience: the number of “defections” that I noticed was very low.
In the aftermath of the conference, I received lots of congratulatory e-mails not only from the attendees, but from the speakers as well (and from significant others, who were all-around happy about the huge shopping centre right next to the hotel). To all of you, thanks for making php|w such an enjoyable experience, and see you next year!
php|architect
Volume III - Issue 10 October, 2004
Publisher Marco Tabini
Editorial Team Arbi Arzoumani Peter MacIntyre Eddie Peloke
Graphics & Layout Arbi Arzoumani
Managing Editor Emanuela Corso
Director of Marketing J. Scott Johnson
[email protected]
Account Executive Shelley Johnston
[email protected]
Authors Paul Cowles, Bård Farstad, John Holmes, Tony Leake, Chung W. Leong, Geoffrey Mondoux, Sérgio Machado, Chris Shiflett

php|architect (ISSN 1709-7169) is published twelve times a year by Marco Tabini & Associates, Inc., P.O. Box 54526, 1771 Avenue Road, Toronto, ON M5M 4N5, Canada. Although all possible care has been placed in assuring the accuracy of the contents of this magazine, including all associated source code, listings and figures, the publisher assumes no responsibility with regard to the use of the information contained herein or in all associated material.
Contact Information:
General mailbox: [email protected]
Editorial: [email protected]
Subscriptions: [email protected]
Sales & advertising: [email protected]
Technical support: [email protected]

Copyright © 2003-2004 Marco Tabini & Associates, Inc. All Rights Reserved
NEW STUFF
What’s New!

Cgiapp.class.php 1.4 (Default)
What is it?
“Cgiapp is a PHP framework for creating reusable web applications. It is a port of the perl module CGI::Application, with a few minor additions. It uses Smarty as its default template engine. It has been tested with both PHP4 and PHP5.”
To view the online documentation or to download, check out the project’s homepage at: weierophinney.net/matthew/download?mode=view_download&id=11
PHP 5.0.2 released!
“The PHP Development Team is proud to announce the immediate release of PHP 5.0.2 (http://www.php.net/downloads.php#v5). This is a maintenance release that in addition to many non-critical bug fixes, addresses a problem with GPC input processing. All Users of PHP 5 are encouraged to upgrade to this release as soon as possible.”
Some changes include:
• Added interface_exists() and made class_exists() only return true for real classes.
• Implemented periodic PCRE compiled regexp cache cleanup, to avoid memory exhaustion.
• Added new boolean (fourth) parameter to array_slice() that turns on the preservation of keys in the returned array.
Get the latest release from php.net.
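To make two of these changes concrete, here is a short sketch, assuming PHP 5.0.2 or later. The variable $scores and the names Shape and Circle are invented purely for the illustration.

```php
<?php
// array_slice(): the fourth argument (preserve_keys) arrived in PHP 5.0.2.
$scores = array(10, 20, 30, 40);

$renumbered = array_slice($scores, 1, 2);        // keys reset: [0 => 20, 1 => 30]
$kept       = array_slice($scores, 1, 2, true);  // keys kept:  [1 => 20, 2 => 30]

print_r($renumbered);
print_r($kept);

// Since 5.0.2, class_exists() only reports real classes;
// interfaces are checked with interface_exists().
interface Shape {}
class Circle implements Shape {}

var_dump(class_exists('Shape'));      // bool(false)
var_dump(interface_exists('Shape'));  // bool(true)
var_dump(class_exists('Circle'));     // bool(true)
```

Code that used class_exists() to probe for interfaces before 5.0.2 would need to switch to interface_exists() after upgrading.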
eZ publish 3.5 alpha (unstable)
ez.no announces:
“eZ systems is proud to present this first alpha release of eZ publish 3.5. The new version has a lot of new features, but most noticeably, the administration frontend has been completely renewed. Of course, all the bugfixes from the 3.4 branch are incorporated in eZ publish 3.5.”
Get more information from ez.no
Phase 1.0.5 (Default)
Freshmeat.net announces: “Phase is a very small text editor written in PHP. It uses HTML for the interface, and is easily customized. It can access any directory that your platform allows. Phase was designed for localhost access in mind (on your PC running Apache with PHP), and thus it has no security built-in.”
Grab the latest download from Phase’s homepage: http://www.vsy.cape.com/~jennings/phase.html

phpMyFAQ 1.4.2 RC2
phpmyfaq.de announces the release of phpMyFAQ 1.4.2 RC 2. “This version includes tons of bugfixes. Do not use this version in production systems, but test this version and report bugs!”
Get all the info at phpmyfaq.de
MaxDB™ 7.5.00.18 now available for Linux/AMD x86-64
MySQL.com announces: “With the new version 7.5.00.18, MaxDB is shipping with 64-bit support for Linux/AMD platforms. MaxDB™ has a long history in supporting 64-bit platforms since 1995, [when] the database structures were adapted to 64-bit requirements to support the DEC OSF/1 platform. Subsequently, MaxDB was ported to the major 64-bit architectures. Since 1997/1998 IBM AIX, HP-UX, Sun Solaris, and FSC Reliant have been supported. With SAP DB 7.4, the next platform joined the club in 2001: Windows (NT) on the IA64 Itanium architecture. Together with the rapid adoption of Linux, Linux on IA64 was supported in 2002 with SAP DB 7.4.03. MaxDB for HP-UX/IA64 recently has been launched and there are further ongoing porting activities for MaxDB on Linux/IBM PowerPC/64, which should become available with the MaxDB 7.6 alpha-Version in late 2004. At the end of the current list of porting targets are MaxDB on Windows/AMD x86-64 and on Windows/Intel x86-64. Summing up, this shows that MaxDB development has always been aware of the 64-bit landscape and has extensive experience with the challenges of these architectures for nearly 10 years.”
Get all the latest info from mysql.com.
Looking for a new PHP extension? Check out some of the latest offerings from PECL.

apd 1.0.1
http://pecl.php.net/package-info.php?package=apd
APD is a full-featured profiler/debugger that is loaded as a zend_extension. It aims to be an analog of C’s gprof or Perl’s Devel::DProf.

WinBinder 0.23.080
http://pecl.php.net/package-info.php?package=WinBinder
WinBinder is an extension that allows PHP programmers to build native Windows applications. It wraps a limited but important subset of the Windows API in a lightweight, easy-to-use library so that program creation is quick and straightforward.

id3 0.2
http://pecl.php.net/package-info.php?package=id3
id3 enables you to retrieve and update information from ID3 tags in MP3 files. It supports versions 1.0, 1.1 and 2.2+ (only reading text and URL frames at the moment).

zeroconf 0.1.2
http://pecl.php.net/package-info.php?package=zeroconf
Provides an interface for browsing and publishing network services via ZeroConf using Apple's Rendezvous/OpenTalk library. You can browse the network for specific services like database servers (PostgreSQL, Sybase, InterBase), Apple File Sharing, web services via Apache's mod_rendezvous, etc. and discover the IP address and port for each found service.
Check out some of the hottest new releases from PEAR.

XML_Parser 1.2.1
http://pear.php.net/package/XML_Parser/
This is an XML parser based on PHP’s built-in xml extension. It supports two basic modes of operation: “func” and “event”. In “func” mode, it will look for a function named after each element (xmltag_ELEMENT for start tags and xmltag_ELEMENT_ for end tags), and in “event” mode it uses a set of generic callbacks. Since version 1.2.0, there’s a new XML_Parser_Simple class that makes parsing most XML documents easier by automatically providing a stack for the elements. Furthermore, it’s now possible to separate the parser from the handler object, so you no longer have to extend XML_Parser in order to parse a document with it.

I18Nv2 0.8.0
http://pear.php.net/package/I18Nv2/
This package provides basic support for localizing your application, such as locale-based formatting of dates, numbers and currencies. In addition, it attempts to provide an OS-independent way to setlocale() and aims to provide language and country names translated into many languages.

LiveUser 0.13.1
http://pear.php.net/package/LiveUser/
LiveUser is a set of classes for dealing with user authentication and permission management. Basically, there are three main elements that make up this package:
• The LiveUser class
• The Auth containers
• The Perm containers
The LiveUser class takes care of the login process and can be configured to use a certain permission container and one or more different auth containers. That means that you can have your users’ data scattered amongst many data containers and have the LiveUser class try each defined container until the user is found. For example, you can keep all website users, who can apply for a new account online, in the webserver’s local database. You may also want to enable all your company’s employees to log in to the site without the need to create new accounts for all of them. To achieve that, a second container can be defined for the LiveUser class to use. You can also define a permission container of your choice that will manage the rights for each user. Depending on the container, you can implement any kind of permission scheme for your application while having one consistent API. Using different permission and auth containers, it’s easy to integrate newly written applications with older ones that have their own ways of storing permissions and user data. Just make a new container type and you’re ready to go! Currently available are containers using: PEAR::DB, PEAR::MDB, PEAR::MDB2, PEAR::XML_Tree and PEAR::Auth.

HTTP_Request 1.2.3
http://pear.php.net/package/HTTP_Request/
Supports GET/POST/HEAD/TRACE/PUT/DELETE, Basic authentication, Proxy, Proxy Authentication, SSL, file uploads etc.

Services_Weather 1.3.1
http://pear.php.net/package/Services_Weather/
Services_Weather searches for given locations and retrieves current weather data and, depending on the service used, also forecasts. Up to now, GlobalWeather from CapeScience, Weather XML from EJSE (US only), an XOAP service from Weather.com and METAR/TAF from NOAA are supported. Further services will be included if they become available, have a usable API and are properly documented.
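To picture what XML_Parser’s “func” mode does, the sketch below mimics its naming convention (xmltag_ELEMENT for start tags, xmltag_ELEMENT_ for end tags) on top of PHP’s built-in xml extension. It is not XML_Parser’s own code; the dispatcher names funcStart() and funcEnd() and the sample document are made up, and it assumes the xml extension is enabled (it is in most builds).

```php
<?php
// Records which handlers fired, in order.
$log = array();

// "func"-mode style handlers for <item> (names are upper-cased by case folding).
function xmltag_ITEM($attribs)
{
    global $log;
    $log[] = 'start ITEM ' . $attribs['ID'];   // attribute names are folded too
}

function xmltag_ITEM_()
{
    global $log;
    $log[] = 'end ITEM';
}

// Hypothetical dispatchers: look up a function named after the element.
function funcStart($parser, $name, $attribs)
{
    $handler = 'xmltag_' . $name;
    if (function_exists($handler)) {
        $handler($attribs);
    }
}

function funcEnd($parser, $name)
{
    $handler = 'xmltag_' . $name . '_';
    if (function_exists($handler)) {
        $handler();
    }
}

$parser = xml_parser_create();   // case folding is on by default
xml_set_element_handler($parser, 'funcStart', 'funcEnd');
xml_parse($parser, '<list><item id="1"/><item id="2"/></list>', true);

print_r($log);   // <list> has no handler, so only ITEM events are logged
```

Elements without a matching function are silently skipped, which is what makes this style convenient when you only care about a few tags.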
Row, Row, Row Your Boat
ZIP on the Fly with the Streams API
FEATURE
by Chung W. Leong
In human languages, the meaning of words tends to change over time. The word “porcelain” traces its root to “porcus”, Latin for pig. Functions in PHP also have a way of acquiring capabilities beyond what their names suggest. Once upon a time, the “f” in fopen() stood for “file.” Nowadays, fopen() can open many other things.
One thing about PHP that I’ve always found interesting is how much it resembles a real human language. It is extremely flexible. It has a few quirks and irregularities (that tend to drive beginners crazy). And it has an enormous “vocabulary” that is constantly growing. The last time I checked, PHP had somewhere in the neighborhood of 3,500 functions. At times, browsing through the PHP manual can feel like reading the dictionary. You will come across functions you never knew existed, even if you are an experienced coder, much as fluent English speakers would come across words such as ‘idempotent’ or ‘spoonerism’ in the OED. While learning these obscure words probably won’t do much for your English prose, on more than one occasion I have stumbled across functions that had a major impact on projects I worked on. In this article, I will share with you one of these discoveries: the PHP Streams API.
Having a Life Offline
On our web site, we have a large collection of training materials designed to help people who are trying to learn foreign languages. These are highly interactive HTML pages that make heavy use of graphics and audio. The materials are contained in a database-driven content management system written in PHP. A feature that many of our users have requested is the ability to download multiple lessons in a single ZIP file. Because some of them often travel to places in the world where there is no easy access to the Internet (or where access to American web sites is blocked), they wanted to have offline versions of our lessons that they could burn onto a CD-ROM and take with them.
My initial thought on how to implement this was to save the pages, along with the associated media files, to a temporary folder, then spawn an external program to compress them. I quickly realized, though, that this was unworkable. A single package could contain hundreds, sometimes thousands, of files. Writing them all to disk would simply take too long. Either the web browser would drop the connection for lack of network activity, or our user would run out of patience and click Cancel. Any reasonable solution, therefore, had to involve creating the ZIP file in PHP itself.

ZIP on the Fly
ZIP is a relatively straightforward file format. At Zend.com, you can find an excellent article by John Coggeshall that describes how to create one from within a script; I will, therefore, refrain from going into much detail here. A ZIP file consists of two major parts: the data segments and the central directory. A data segment is the compressed contents of a file sandwiched between a header and a trailer. For each file in the archive there is a data segment. The central direc-
REQUIREMENTS
PHP: 4.3.0+
OS: N/A
Other software: N/A
Code Directory: streams
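Listing 1

        $crc = $this->src_crcs[$src_path];
        $dest_path_len = strlen($dest_path);

        // Central directory record for one file (46 bytes + file name)
        $s  = "\x50\x4b\x01\x02";          // signature
        $s .= "\x00\x00";                  // version made by
        $s .= "\x14\x00";                  // version needed to extract (2.0)
        $s .= "\x00\x00";                  // general purpose bit flag
        $s .= "\x08\x00";                  // compression method: deflate
        $s .= "\x00\x00\x00\x00";          // last modification time and date
        $s .= pack("V", $crc);             // CRC-32
        $s .= pack("V", $c_len);           // compressed size
        $s .= pack("V", $unc_len);         // uncompressed size
        $s .= pack("v", $dest_path_len);   // file name length
        $s .= pack("v", 0);                // extra field length
        $s .= pack("v", 0);                // file comment length
        $s .= pack("v", 0);                // disk number start
        $s .= pack("v", 0);                // internal file attributes
        $s .= pack("V", 32);               // external file attributes
        $s .= pack("V", $ds_offset);       // offset of this file's data segment
        $s .= $dest_path;
        echo $s;

        // 42 = 30-byte header + 12-byte trailer around each data segment
        $ds_offset += (42 + $dest_path_len + $c_len);
        $this->cd_len += (46 + $dest_path_len);
        $this->cd_num++;
    }
}

function EchoZipSummary()
{
    // End of central directory record (22 bytes + comment)
    $s  = "\x50\x4b\x05\x06\x00\x00\x00\x00";  // signature, disk numbers
    $s .= pack("v", $this->cd_num);            // entries on this disk
    $s .= pack("v", $this->cd_num);            // total entries
    $s .= pack("V", $this->cd_len);            // size of central directory
    $s .= pack("V", $this->ds_len);            // offset of central directory
    $comment_len = strlen($this->comment);
    $s .= pack("v", $comment_len);             // comment length
    echo $s;
    echo $this->comment;
    return (22 + $comment_len);
}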
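For readers who want to see the whole file layout in one place, here is a minimal, self-contained sketch of an in-memory ZIP writer. It is not the article’s FlyZIP class: to avoid a zlib dependency it “stores” files uncompressed (method 0) and, because the data is already in memory, writes the CRC and sizes into each local header instead of a trailer. The function name buildZip() is hypothetical.

```php
<?php
// Build a ZIP archive in memory from array(name => contents).
function buildZip(array $files, $comment = '')
{
    $data = '';      // local headers + file data
    $central = '';   // central directory records
    $count = 0;

    foreach ($files as $name => $contents) {
        $crc = crc32($contents);
        $len = strlen($contents);
        $offset = strlen($data);   // where this local header starts

        // Local file header: 30 bytes + name, followed by the stored data
        $data .= "\x50\x4b\x03\x04"
               . pack('v', 10)            // version needed to extract (1.0)
               . pack('v', 0)             // general purpose flags
               . pack('v', 0)             // compression method: stored
               . pack('V', 0)             // mod time + date (zeroed)
               . pack('V', $crc)
               . pack('V', $len)          // compressed size
               . pack('V', $len)          // uncompressed size
               . pack('v', strlen($name))
               . pack('v', 0)             // extra field length
               . $name . $contents;

        // Central directory record: 46 bytes + name
        $central .= "\x50\x4b\x01\x02"
                  . pack('v', 20)          // version made by
                  . pack('v', 10)          // version needed
                  . pack('v', 0) . pack('v', 0) . pack('V', 0)
                  . pack('V', $crc) . pack('V', $len) . pack('V', $len)
                  . pack('v', strlen($name))
                  . pack('v', 0) . pack('v', 0)   // extra, comment length
                  . pack('v', 0) . pack('v', 0)   // disk start, internal attrs
                  . pack('V', 32)                 // external attrs (archive bit)
                  . pack('V', $offset)
                  . $name;
        $count++;
    }

    // End of central directory record: 22 bytes + comment
    $end = "\x50\x4b\x05\x06\x00\x00\x00\x00"
         . pack('v', $count) . pack('v', $count)
         . pack('V', strlen($central))
         . pack('V', strlen($data))        // central directory offset
         . pack('v', strlen($comment)) . $comment;

    return $data . $central . $end;
}
```

In a download script, you would send header('Content-Type: application/zip') and a Content-Disposition header before echoing the result. Echoing each piece as it is produced, as FlyZIP does, avoids holding the whole archive in memory.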
which can retrieve data from the web and other sources. In our case, in order to obtain the dynamic HTML pages from our content management system, I would simply make HTTP requests to the local web server.

Putting it All Together
In Listing 2, you will find a simplified version of the download script I employed in our project. On the preceding web page, the user has checked off a number of lessons that she or he wishes to download. The user selections arrive via HTTP POST as an array of lesson identifiers. The script loops through this array, inserting the files for each lesson into the $file_list array. It then adds the supporting images, JavaScript, and CSS files to the list. The keys of $file_list contain the source filenames, while the values contain the destination filenames. After the script has finished building the list, it creates a FlyZIP object and calls its AddFile() method for each of the files. When that’s done, it invokes the EchoToClient() method to send the ZIP file to the browser. To simplify debugging, the script will simply copy everything into a folder if the variable $DEBUG_PATH is defined.
In Listing 3, you will find the code for the functions used in the download script. The AddLesson() function, with help from AddURL(), adds the URLs that constitute a given lesson to the file list. Since URLs are usually not valid filesystem names (‘?’ is not allowed), AddURL() calls Offline() to obtain a suitable destination filename. This function simply takes the script name (minus the .php extension), appends to it all the GET variables, and attaches .html at the end.

Listing 2
Listing 3
Figure 1

While the download script managed to create ZIP files containing the correct files, there was one major snag: all the hyperlinks were broken. Anchor tags in the original online pages, which point to PHP scripts, need to be converted so that they point to the corresponding offline .html filenames. There were a number of ways to fix this. For example, I could have inserted a conditional statement into every link: depending on the value of HTTP_USER_AGENT, it would echo either the online URL or the offline filename. Alternatively, I could have had all the links point to static HTML files, then used Apache rewrite rules to redirect them to the correct PHP scripts. Both of these solutions involve making a lot of changes to the existing code, something I would rather avoid. A more attractive solution would be to replace the links after a page has been retrieved. With regular expressions, this is far from difficult. The tricky part is how to invoke the code that does the replacement. Since the page retrieval occurs within the FlyZIP class (more precisely, in the EchoDataSegments() method),
FTP, and FTPS protocols, as well as for accessing standard streams (stdin, stdout, and stderr) and contents inside compressed files. It also lets you make your own stream wrappers and register them to a custom protocol. In the PHP manual, for example, you will find the VariableStream wrapper, registered to the protocol var:// (see Listing 4). This stream wrapper lets you access a global variable as though it were a file.
A stream wrapper is a class that implements a defined set of methods. When PHP opens a stream, it creates an instance of the stream wrapper class and invokes its stream_open() method. To read from the stream, it invokes stream_read() with the number of bytes desired as a parameter. To write to one, it invokes stream_write() with the data to be written. When a script calls fseek() or ftell() on a stream, PHP satisfies the request by invoking stream_seek() or stream_tell(). The stream wrapper, in turn, must then update or return the position of the current pointer within the stream. When PHP needs to know whether it has reached the end of a stream, it invokes the stream_eof() method. When it needs the statistics of a file (its size, for example), it invokes stream_stat(). When the time comes to close the stream, PHP invokes stream_close() if the method is defined. If it is not, PHP assumes that no cleanup is necessary and nothing happens.
Besides responding to explicit calls to fclose(), PHP will also close a stream when the resource returned by fopen() goes out of scope. For example:

    function Test()
    {
        $f = fopen("var://Hello/", "r");
    }
    Test();
    echo "Hello";
Had VariableStream::stream_close() been declared, PHP would invoke it before printing Hello, because $f goes out of scope when Test() returns.
To connect a stream wrapper to a protocol, you use stream_wrapper_register() (or stream_register_wrapper() in PHP 4.3.0 and 4.3.1). The function takes two parameters: the protocol name and the name of the stream wrapper class. Also known as the “scheme,” the protocol is the part of a URL that comes before the colon. It basically denotes the method for accessing a particular resource. When registering your own wrapper, you must connect it to a unique protocol. You cannot override the built-in wrappers (HTTP, FTP, etc.) or those registered earlier, and once a wrapper is registered, it cannot be unregistered. The name of the protocol must be longer than one letter (otherwise Windows machines would confuse it with a drive letter). It can contain letters, numbers, dashes, plus signs, and periods (but not underscores). Contrary to the recommendation in RFC 1738, protocol names in PHP are case sensitive.
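Here is a compact sketch of a wrapper in the spirit of the manual’s VariableStream, written for a current PHP version. It is not the manual’s exact listing: it adds a stream_close() that records which variable was closed, so you can watch PHP call it both on fclose() and when a handle goes out of scope. The global $closed log is an invented name for the illustration.

```php
<?php
// "var://name" reads and writes the global variable $name.
class VariableStream
{
    public $context;          // populated by PHP; declared to avoid dynamic-property notices
    private $varname = '';
    private $position = 0;

    public function stream_open($path, $mode, $options, &$opened_path)
    {
        $url = parse_url($path);
        $this->varname = $url['host'];
        if (!isset($GLOBALS[$this->varname])) {
            $GLOBALS[$this->varname] = '';
        }
        $this->position = 0;
        return true;
    }

    public function stream_read($count)
    {
        $chunk = substr($GLOBALS[$this->varname], $this->position, $count);
        $this->position += strlen($chunk);
        return $chunk;
    }

    public function stream_write($data)
    {
        $var = $GLOBALS[$this->varname];
        $GLOBALS[$this->varname] = substr($var, 0, $this->position) . $data
            . substr($var, $this->position + strlen($data));
        $this->position += strlen($data);
        return strlen($data);
    }

    public function stream_tell() { return $this->position; }
    public function stream_eof()  { return $this->position >= strlen($GLOBALS[$this->varname]); }
    public function stream_stat() { return array('size' => strlen($GLOBALS[$this->varname])); }

    // Optional: called on fclose() or when the last reference to the handle disappears.
    public function stream_close()
    {
        global $closed;
        $closed[] = $this->varname;
    }
}

if (!stream_wrapper_register('var', 'VariableStream')) {
    die('Failed to register protocol');
}

$closed = array();
$myvar = '';

$fp = fopen('var://myvar', 'r+');
fwrite($fp, "line1\nline2\n");
fclose($fp);                          // stream_close() logs "myvar"

function Test()
{
    $f = fopen('var://Hello', 'r');   // $Hello is created empty if missing
}
Test();                               // stream_close() runs as $f goes out of scope

echo implode(',', $closed), "\n";     // myvar,Hello
```

Note that the protocol name “var” satisfies the naming rules above: more than one letter and no underscores.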
Listing 5
A Function As a File
Now, back to the problem at hand. I need a stream that does the following: retrieve a page from our content management system through HTTP, then perform a search and replace on all the hyperlinks. Instead of making a stream wrapper for this very specific task, I have created a general-purpose one that gets its data from a function. In Listing 5, you will find the code for the FunctionStream wrapper class. The class largely resembles the VariableStream class from the PHP manual. In fact, I created it simply by performing a search-and-replace operation, changing occurrences of $GLOBALS[$this->varname] to $this->data. I did implement stream_stat(), which was missing from VariableStream: file_get_contents() invokes this method and would emit a warning if it were missing.
I have registered the FunctionStream class to the “func” protocol. A URL to a function stream has the format “func://<function>/<path>?<query>”. When FunctionStream’s stream_open() method is invoked, it parses the URL with parse_url(). Using the host part as the function name, it calls the function, passing the path and the query part as parameters. The return value is then saved in $this->data.
Listing 6 shows the updated version of the AddLesson() function. The only difference is the URL root: a URL that looked like http://localhost/lesson.php?lesson=1 before has now become func://GetPage/lesson.php?lesson=1. When the FlyZIP class retrieves the contents of this URL via file_get_contents(), the stream wrapper calls GetPage(). The function retrieves the page from localhost, then changes all the href attributes with the help of preg_replace_callback(). For those not completely fluent in regular expressions, the pattern here matches any string situated between href=" and the closing ". The (?
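The following sketch shows how the pieces described above might fit together. It is a reconstruction from the prose, not the code in Listing 5 or Listing 6: GetPage() here is a stub that builds canned HTML instead of fetching from localhost, and the offline naming scheme in the hypothetical OfflineName() (“lesson.php?lesson=2” becomes “lesson_lesson=2.html”) is an assumption, since the exact format produced by the article’s Offline() is not shown.

```php
<?php
// func://Name/path?query calls Name('/path', 'query') and serves its return value.
class FunctionStream
{
    public $context;
    private $data = '';
    private $position = 0;

    public function stream_open($path, $mode, $options, &$opened_path)
    {
        $url = parse_url($path);
        if (!isset($url['host']) || !function_exists($url['host'])) {
            return false;
        }
        $this->data = (string) call_user_func(
            $url['host'],
            isset($url['path']) ? $url['path'] : '',
            isset($url['query']) ? $url['query'] : ''
        );
        $this->position = 0;
        return true;
    }

    public function stream_read($count)
    {
        $chunk = substr($this->data, $this->position, $count);
        $this->position += strlen($chunk);
        return $chunk;
    }

    public function stream_eof()  { return $this->position >= strlen($this->data); }
    public function stream_tell() { return $this->position; }
    public function stream_stat() { return array('size' => strlen($this->data)); }
}

stream_wrapper_register('func', 'FunctionStream');

// Hypothetical offline name: script name minus .php, plus GET variables, plus .html.
function OfflineName($url)
{
    $parts = parse_url($url);
    if (isset($parts['host']) || !isset($parts['path'])
        || substr($parts['path'], -4) !== '.php') {
        return $url;   // leave external and non-PHP links alone
    }
    $name = basename($parts['path'], '.php');
    if (isset($parts['query']) && $parts['query'] !== '') {
        $name .= '_' . str_replace('&', '_', $parts['query']);
    }
    return $name . '.html';
}

// preg_replace_callback() callback for one href="..." match.
function RewriteHref($m)
{
    return 'href="' . OfflineName($m[1]) . '"';
}

// Stub standing in for the article's GetPage(), which fetches from localhost.
function GetPage($path, $query)
{
    $html = '<p>' . htmlspecialchars($path . '?' . $query) . '</p>'
          . '<a href="lesson.php?lesson=2">Next</a> <a href="style.css">CSS</a>';
    return preg_replace_callback('/href="([^"]*)"/', 'RewriteHref', $html);
}

echo file_get_contents('func://GetPage/lesson.php?lesson=1'), "\n";
```

Because the rewriting happens inside the function that feeds the stream, FlyZIP itself needs no changes: it simply reads a different URL.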
About the Author
Chung Wing Leong is a senior programmer at the National Foreign Language Center at the University of Maryland. His interest is in languages, both of the human and computer varieties. When not coding in PHP, he watches corny Polish soap operas to pass his time.
To Discuss this article: http://forums.phparch.com/176
Driving Multiple Databases Anywhere
FEATURE
by Geoffrey Mondoux
As the communication era continues to grow, the need and desire to link many different systems and environments together increases. This article is designed to show you how to link different types of databases safely, and how to establish meaningful connections to read, write, and understand the wealth of information held in them.
Trying to manage data from disparate database systems can be painful at best. The configurations are different, the SQL dialects are different and the table structures are definitely different. What if there were a way to remove these barriers? What if you could access these different systems in a semi-transparent way, and use the result sets as if there were only one database system back there? That would be good, right? Enter Database_Universal. Let’s dig in.
Universal Database Communication
As we will be discussing how information in different (but similar) databases can be shared, we will need some sort of abstraction layer for accessing them. For this task, we will rely heavily on the PEAR::DB class (http://pear.php.net). PEAR::DB is a wonderful connection class, and I recommend that readers review its documentation and consider using it in future projects.
But PEAR alone won’t be enough. We need to be able to work with these different databases seamlessly. The Database_Universal class (see Listing 1) will handle our database connections, regardless of connection type. It consists of just five methods: two connection methods, two configuration methods, and one query method. We’ll explore those now.
The setConnection() configuration method takes two arrays. The first array is used to create the DSN string; the second array is used to set options for the connection. This is the same information that PEAR::DB uses, so for more detail, see the PEAR::DB documentation for connecting, at: http://pear.php.net/manual/en/package.database.db.intro-connect.php
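As a rough illustration of turning the first array into a DSN string, here is a sketch of a hypothetical helper, buildDsn(). It is not the code in Listing 1; the array keys are modelled on the parts of PEAR::DB’s DSN syntax (phptype://username:password@hostspec/database), and the host names, users and database names are invented. The resulting string, together with the options array, is what you would hand to DB::connect().

```php
<?php
// Hypothetical helper: connection array -> PEAR::DB-style DSN string.
// Note: no escaping is done, so special characters in credentials are not handled.
function buildDsn(array $c)
{
    $dsn = $c['phptype'] . '://';
    if (!empty($c['username'])) {
        $dsn .= $c['username'];
        if (!empty($c['password'])) {
            $dsn .= ':' . $c['password'];
        }
        $dsn .= '@';
    }
    $dsn .= $c['hostspec'];
    if (!empty($c['database'])) {
        $dsn .= '/' . $c['database'];
    }
    return $dsn;
}

// Invented settings for the two stores discussed below.
$gecko  = array('phptype' => 'mysql', 'username' => 'gecko', 'password' => 'secret',
                'hostspec' => 'db1.example.com', 'database' => 'store');
$maniac = array('phptype' => 'pgsql', 'username' => 'maniac', 'password' => 'secret',
                'hostspec' => 'db2.example.com', 'database' => 'inventory');

echo buildDsn($gecko), "\n";    // mysql://gecko:secret@db1.example.com/store
echo buildDsn($maniac), "\n";   // pgsql://maniac:secret@db2.example.com/inventory
```

Keeping the settings as arrays rather than literal DSN strings makes it easy to load them from a table later, as suggested for larger systems.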
The setFetchMode() configuration method allows you to change the type of record sets returned to you, by passing in one of the PEAR::DB result set options. The DB_FETCHMODE_ASSOC option, for example, returns result rows in an associative array keyed by field name, whereas the DB_FETCHMODE_OBJECT option returns result rows as objects. I have used DB_FETCHMODE_ASSOC as the default. Your preference may differ. The connect() and disconnect() methods are self explanatory, and simply connect and disconnect from the currently configured database. The query() method takes care of executing all queries, and returns a standard PEAR::DB_Result object for us to work with. We’ll see more about this method later on. Now that we have our wrapper in place, let’s get started. Setting the stage Suppose we have two different databases running under different database server engines on different
REQUIREMENTS PHP: 4.2 OS: Any Other: PEAR::DB and a database software
Code Directory: databases
17
FEATURE
Driving multiple databases anywhere
servers, and the information in each database needs to be combined into one interface. As an example, let’s say the owner of Gecko’s, a beach and body shop, recently acquired another store, Maniac’s Skate Shop. Each store will keep all of its existing systems, since the two companies are not merging, but the new owner of Maniac’s wants to be able to view the inventory of both stores at once. Suppose Maniac’s system happens to be Postgres-driven, while Gecko’s is MySQL-driven. Needless to say, the table structures are not the same, either. We have been given the responsibility of developing the unified interface, which the owner will use daily to assess purchases and markdowns and to bundle products into packages. Let’s see how we are going to implement it.
The adapter file
Before we can do anything, we need to look at the table structures of both stores. Table 1 shows the layout of Gecko’s and Maniac’s product tables. The columns of these tables will play an important role in crafting the SQL statements and retrieving the data. In order to unify data like this, we need a way of mapping fields between the two databases. This is where the adapter file comes in (see Listing 2). This file defines an SQL query to get product data from each database, and then specifies a mapping between “virtual” fields and the actual fields in each database. The adapter file is the key to flexible collaboration between databases.
The connection settings
Since we are only connecting to two databases, we’ll store the server connection settings in an array (see Listing 3). In larger systems, I’d recommend keeping this information in a database table and loading the required settings into an array or object when needed. Each server setting is broken up into four array keys: “connection”, “options”, “universal_name” and “universal_type”. Only “connection” and “universal_type” have to be filled in. The “options” setting, as its name suggests, is optional. The “universal_name” setting simply provides a friendly string that can be used to show which server is being queried.
Putting it together
With the three files we have so far (the Database_Universal class, the server connection settings and the adapter file), we can now link Maniac’s Skate Shop and Gecko’s Beach and Body Shop together for the owner. Listing 4 shows our integration script. Pretty simple, actually. We first create a Database_Universal object, and then cycle through the servers we want to connect to, running the relevant query from our adapter file. Note how we index the result fields with the information specified in the adapter file. You can see how this method allows us to correlate information from disparate sources.
Taking it further
The owner feels that it is essential to be able to update the inventory of each store on one screen, so in our second example we will be inserting data into the databases we are connecting to. First, let’s look at the second parameter to the query() method of our Database_Universal class. This $data parameter provides data to be inserted into the SQL statement before it is processed. This functionality is actually implemented in the PEAR::DB query() method; we just pass it through. It works a little like the printf() function, in that our SQL statement can contain placeholders, and the $data parameter can contain data to substitute into those placeholders. The placeholder in PEAR::DB is a “?”. For more information about the value replacement functionality of PEAR::DB (also known as prepare/execute), please see the PEAR::DB documentation for querying, at http://pear.php.net/manual/en/package.database.db.intro-query.php
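The read side described so far (an adapter that maps virtual fields to real columns, one settings array per server, and a loop that queries each server) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the article's Listings 2 to 4: the array layout, the helper names map_row() and fake_query(), and the DSN strings are hypothetical, and fake_query() returns canned rows so the example runs without a database server.

```php
<?php
// Hypothetical sketch modelled on the description of Listings 2-4.
// Table and column names follow Figure 1; credentials are placeholders.

// Connection settings: "connection" and "universal_type" are required.
$servers = array(
    'geckos' => array(
        'connection'     => 'mysql://user:secret@localhost/geckos',
        'options'        => array(),
        'universal_name' => "Gecko's Beach and Body Shop",
        'universal_type' => 'mysql',
    ),
    'maniacs' => array(
        'connection'     => 'pgsql://user:secret@localhost/maniacs',
        'universal_name' => "Maniac's Skate Shop",
        'universal_type' => 'pgsql',
    ),
);

// Adapter: per database type, a query plus a map of virtual => real fields.
$adapter = array(
    'mysql' => array(
        'get_products' => 'SELECT product_id, product_name, price FROM inventory',
        'fields' => array('id' => 'product_id', 'name' => 'product_name', 'price' => 'price'),
    ),
    'pgsql' => array(
        'get_products' => 'SELECT id, stock_name, cost FROM stock',
        'fields' => array('id' => 'id', 'name' => 'stock_name', 'price' => 'cost'),
    ),
);

// Hypothetical stand-in for Database_Universal::query(): canned rows only.
function fake_query($serverKey, $sql)
{
    $rows = array(
        'geckos'  => array(array('product_id' => 1, 'product_name' => 'Blue Waves Board', 'price' => '150.00')),
        'maniacs' => array(array('id' => 1, 'stock_name' => 'Zero 1999 Deck', 'cost' => '150.00')),
    );
    return $rows[$serverKey];
}

// Re-key one raw row by the adapter's virtual field names.
function map_row(array $row, array $fields)
{
    $unified = array();
    foreach ($fields as $virtual => $actual) {
        $unified[$virtual] = $row[$actual];
    }
    return $unified;
}

// Cycle through the servers and build one unified inventory list.
$inventory = array();
foreach ($servers as $key => $server) {
    $type = $server['universal_type'];
    foreach (fake_query($key, $adapter[$type]['get_products']) as $row) {
        $item = map_row($row, $adapter[$type]['fields']);
        $item['store'] = $server['universal_name'];
        $inventory[] = $item;
    }
}

foreach ($inventory as $item) {
    printf("%-28s %-18s %s\n", $item['store'], $item['name'], $item['price']);
}
```

Once rows are re-keyed this way, the display code only ever sees id, name and price, whichever table the row came from.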
NOTE: Remember to keep your data from breaking your SQL statement and making it unusable: escape it with functions such as addslashes(). If you are inserting these values through a form on the web, magic_quotes_gpc can be beneficial as well; it automatically adds slashes to all $_GET, $_POST and $_COOKIE data that needs to be escaped. Make sure each value is escaped only once, though: do not call addslashes() on data that magic_quotes_gpc has already escaped.
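To make the placeholder idea concrete, here is a simplified, hypothetical helper, bind_placeholders(), that replaces each “?” in order with the next value, escaped once with addslashes() and wrapped in quotes. It only illustrates the mechanism; it is not how PEAR::DB implements placeholders, it quotes every value as a string, and it does not skip question marks that appear inside string literals in the SQL.

```php
<?php
// Hypothetical helper: simplified "?" placeholder substitution.
function bind_placeholders($sql, array $data)
{
    $parts = explode('?', $sql);
    $values = array_values($data);
    if (count($parts) - 1 !== count($values)) {
        throw new InvalidArgumentException('Number of placeholders and values differ');
    }
    $bound = $parts[0];
    foreach ($values as $i => $value) {
        // Escape each value exactly once, then wrap it in single quotes.
        $bound .= "'" . addslashes((string) $value) . "'" . $parts[$i + 1];
    }
    return $bound;
}

$sql = 'INSERT INTO inventory (product_name, price, quantity) VALUES (?, ?, ?)';
echo bind_placeholders($sql, array("Gecko's Special", '99.00', 2)), "\n";
// INSERT INTO inventory (product_name, price, quantity) VALUES ('Gecko\'s Special', '99.00', '2')
```

If form data has already passed through magic_quotes_gpc, apply stripslashes() before binding; otherwise the apostrophe would be escaped twice and stored with a visible backslash.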
Let’s now revisit the adapter file, because in order to create the new interface, it will definitely need more work.
Revisiting the adapter
Listing 5 contains our expanded adapter file. It has two new elements in each SQL array for communicating with Maniac’s and Gecko’s databases: “add_product” and “change_product_quantity”. We will use “add_product” in this example and “change_product_quantity” later on. As mentioned, PEAR::DB’s prepare/execute functionality will be used to insert the values into the SQL statement. One thing worth noting in this example is the difference in how MySQL and Postgres handle sequencing (or auto-incrementing). MySQL automatically assigns the next ID number in the sequence to the new record. Postgres, however, does not do this, and you must explicitly supply the next ID value for the record you are inserting from a sequence we called id_ref. This is one of the many small differences between database systems, but the adapter file is able to cope with it and maintain a universal query method.
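A sketch of what two “add_product” entries might look like. The SQL text is hypothetical (column names are taken from Figure 1, and MySQL is assumed to fill the id from an AUTO_INCREMENT column), but it shows both points made here: Postgres gets its id from nextval('id_ref'), and both statements list the user-supplied columns in the same order, so a single $data array fits either one.

```php
<?php
// Hypothetical "add_product" entries for the expanded adapter.
$adapter = array(
    'mysql' => array(
        'add_product' => 'INSERT INTO inventory (product_name, price, quantity, product_description) '
                       . 'VALUES (?, ?, ?, ?)',
    ),
    'pgsql' => array(
        // The id comes from the Postgres sequence id_ref.
        'add_product' => 'INSERT INTO stock (id, stock_name, cost, available, info) '
                       . "VALUES (nextval('id_ref'), ?, ?, ?, ?)",
    ),
);

// One $data array, ordered like the placeholders, fits both queries.
$data = array('Sunset Cruiser', '199.00', 5, 'Made-up sample product');

foreach ($adapter as $type => $queries) {
    $placeholders = substr_count($queries['add_product'], '?');
    printf("%s: %d placeholders, %d values\n", $type, $placeholders, count($data));
    // With PEAR::DB, this pair would be passed on as query($sql, $data).
}
// prints:
// mysql: 4 placeholders, 4 values
// pgsql: 4 placeholders, 4 values
```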
Figure 1: Product tables of the two stores
Gecko’s (MySQL), table inventory, columns: product_id | product_name | price | quantity | product_description
Sample products: Blue Waves Board (150.00, “Calm and serene”), Tropic Ocean Board (250.00, “Head to the palm trees”), Insane Torrent Board (350.00, “For the rough riders”)
Maniac’s (Postgres), table stock, columns: id | stock_name | cost | available | info
Sample products: Zero 1999 Deck (150.00, “Street”), Big 8 2001 Deck (250.00, “Pipe”), Major Grind Desk (350.00, “Multi-purpose”)
Putting it together... again
Listing 6 shows the new integration script. We can now use a form to add inventory to either store. With this example, you can see how easy it is to add new products to any type of database, regardless of database engine or structure. We have also specified the fields in our SQL INSERT commands in the same order, which allows us to structure the $data array (for PEAR::DB prepare/execute functionality) the same way for both databases. This standardizes the creation of our data array, and any database should then accept the $data array in its SQL statement. We have also protected the SQL statement with addslashes(). If, however, you are using a PHP configuration with magic_quotes_gpc turned on, you can remove those calls from the example, or check whether it’s enabled and change the code accordingly. To determine which server to add the product to, we use a form element. The <option>s in that <select> element are built from the $servers array, which allows quick identification of the correct server.
A little more functionality, please
Now that we have seen how the adapter file is used more extensively, and how the query() method of Database_Universal helps us with fill-in variables, we are ready to build an example application that adjusts the inventory quantities in both stores and updates each store’s quantities instantly. Our shrewd owner wants more functionality. Listing 7 shows our final integration script. Notice that, in order to update, add and edit information in the easiest way, we must keep track of the server ID for each product using a hidden input field in the HTML form. This allows our script to easily select the proper server on which to perform the query.
Where to go from here
The tools used in this article have many other uses. For instance, I have used them for a caching system that tracks the status of data in over 60 databases. They could also be used to link any type of related data into one interface, as we did in the examples above, or to clone records from one database to another (like replication). Data mining is becoming an important aspect of the Internet. With XML-based protocols such as RSS, SOAP and XML-RPC, the need to provide information in a unified form grows. Database_Universal could be used to create RSS or XML files that contain information from many databases. XML complements Database_Universal very nicely, and it also allows others to use the data in future scripts by simply querying your script to bring all the data together. Without these complementary links, this data might require a spidering script, which can require a lot of upkeep. You can even use this method to populate several databases with records you may have received from a spider or RSS feed reader. This is incredibly important if you are in the position of obtaining content to populate several types of databases.
Another possible use of these tools is to create graphs of live data from content-related databases. This could be useful if you want to track supply/demand, highs/lows or averages across many different databases and present the data in a graphical format that people can read and understand. Also, if you wish to tie non-database connections (spider tools, for instance) into Database_Universal, you just have to create a new type name (along the lines of pgsql or mysql) and check for it in the connect() and query() methods. Your custom code can then connect and query as desired and return the record set.
Help with SQL
There are so many different database standards: ODBC, MSSQL, MySQL, Postgres, SQLite, and so on.
Unfortunately, this means that each one has a slightly different way of connecting or querying. Fortunately, PEAR::DB takes some of that pain away by abstracting out the connection and querying operations. You just need to be able to form the proper SQL statement to achieve your goals. Once you have that, you can place it in the adapter file, assign it a friendly name in the array, and use it without worrying about it again. Here are some websites to assist you in the creation of your own multi-database system:
• SQL Database Reference Manual http://www.sql.org/sql-database/mysql/
• Comparison of Oracle, MySQL, and Postgres DBMS http://det-dbalice.if.pw.edu.pl/detdbalice/ttraczyk/db_compare/db_compare.html
• Gentle Introduction to SQL http://sqlzoo.net/
Conclusion
Data communication and collaboration are important. If you are ever working on a project with a lot of elements to talk to, molding the data to your needs is essential. Not all programmers like the structure of a particular database table or its column names, but with the tools described in this article you can map them to whatever you want and use these new names in your scripts. The adapter takes the aspects that are ugly or that you don’t like, and transforms them into something that is easy for you to work with. It is extremely flexible and allows you to use database operations that differ in syntax while still achieving the same effect. I hope the information provided here was entertaining and informative. The task of collecting information from many databases is an important aspect of my current job, and I have no doubt that this need is shared by many. I have used the techniques presented here with over 10 different database structures, and they have served me well and saved me much time.
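The extension point mentioned under “Where to go from here” (a new type name that connect() and query() check for) might look roughly like the following. Universal_Sketch, the “spider” type and its canned rows are all hypothetical, and the database branches are left as stubs; the sketch also picks the server by the id that, in the final integration script, arrives from a hidden form field.

```php
<?php
// Hypothetical sketch: type dispatch in connect()/query().
class Universal_Sketch
{
    private $servers;
    private $links = array();

    public function __construct(array $servers)
    {
        $this->servers = $servers;
    }

    public function connect($id)
    {
        if (!isset($this->servers[$id])) {
            throw new InvalidArgumentException("Unknown server id: $id");
        }
        $server = $this->servers[$id];
        switch ($server['universal_type']) {
            case 'spider':
                // Custom code would open the remote source here.
                $this->links[$id] = $server['connection'];
                return true;
            case 'mysql':
            case 'pgsql':
                // A real class would hand $server['connection'] to PEAR::DB here.
                throw new RuntimeException('Database types are not wired up in this sketch');
            default:
                throw new InvalidArgumentException('Unsupported type: ' . $server['universal_type']);
        }
    }

    public function query($id, $sql, array $data = array())
    {
        if (!isset($this->links[$id])) {
            $this->connect($id);
        }
        // Only the spider type gets this far in the sketch. Real code would
        // fetch and parse pages from $this->links[$id]; canned rows here,
        // shaped like a database record set.
        return array(array('name' => 'Scraped Board', 'price' => '120.00'));
    }
}

$servers = array(
    'supplier' => array(
        'connection'     => 'http://www.example.com/catalog',
        'universal_name' => 'Supplier catalog',
        'universal_type' => 'spider',
    ),
);

$db = new Universal_Sketch($servers);
// The server id would normally come from a hidden input field.
$id = isset($_POST['server_id']) ? $_POST['server_id'] : 'supplier';
$rows = $db->query($id, 'SELECT name, price FROM catalog');
echo $rows[0]['name'], ' ', $rows[0]['price'], "\n"; // Scraped Board 120.00
```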
About the Author
With over eight years of programming and project experience, Geoffrey Mondoux is a Project Manager and Developer at Hostworks Incorporated (hostworks.ca) as well as the owner and operator of SacredCore (sacredcore.net).
To Discuss this article: http://forums.phparch.com/180
Roll Your Own Template
FEATURE
by Sérgio Machado
Template engines are a “must use” in simple to complex web applications because they provide a way to separate presentation from logic. This makes this kind of software simple to manage and fast to develop. Template engines, however, can also be a performance bottleneck. My purpose here is to build a template system with the following features in mind (in order of importance): lightweight (performance and ease of use), support for operations and functions for data manipulation (power and control to the developer) and simplicity.
REQUIREMENTS
PHP: Any
OS: Any
Other: N/A
Code Directory: template
Feeling the need for separation
Dynamic web applications can become very hard to manage: they change and grow constantly. Suppose you have to develop a corporate intranet web application. You’ll have to handle database data, business logic, presentation logic and application security. You’ll make queries and process your data. You’ll process form posts and create complex data views with tables and graphics (and you may need to support many languages). You’ll need to ensure that users are granted proper permissions and that these permissions are correctly applied: users should be able to see and do what they are supposed to, and should be forbidden to see and do what they are not supposed to. And finally, you’ll have to be able to change all of this according to new requirements.
Typically, script files mix presentation and logic, meaning that PHP code is embedded in HTML code. With all of the tasks above, this mixing makes script files hard to read and change. If the application is being developed by a team, script readability becomes even more important. The solution to this seems to be the use of templates. In template files we define all client-side and data views, removing the HTML code from scripts and making them simpler and more manageable. Templates can also bring other advantages. With them, we can assign a team member specifically to the user interface. First we agree on the available variables and their naming for each view, and then he can build it using stubs for the data source for testing. Another advantage is that you can easily develop themes for your site: you can change whatever you want in a template file, keeping the variables.
Having a template system in PHP usually means having template files with special tags which are parsed by a template engine. This engine has basically two goals: variable binding and variable substitution. Advanced engines add caching mechanisms and precompiling for performance. In the template file definitions you can define loops to iterate over arrays, and logical expressions for conditional formatting; some engines provide arithmetic operations, too. The problem with template engines is that they can cause performance problems because template files need to be parsed. This is especially true if you have complex templates with nested arrays. For example, imagine a page listing items from a database where, for each item, you can take an action using a combo box filled with other items from the database; those items are dynamic, so you cannot hard-code them.
I think template engines should be a non-intrusive component in a system, and I’ve found myself needing to develop one. I agree that we should reuse as much code as we can; we shouldn’t reinvent the wheel. But sometimes the available solutions don’t fit our purposes, and we should keep researching to build better wheels. My purpose here is to build a template system with the following features in mind (in order of importance): lightweight (performance and ease of use), support for operations and functions for data manipulation (power and control for the developer) and simplicity.
Desperately seeking solutions
The main performance killer in a template engine is parsing the template files for variable substitution or template file execution. Caching mechanisms and precompiling could be a solution, but could make the template engine more complex than I want it to be. These mechanisms would need to control which files are compiled and also contain a parsing mechanism with error checking. I had to find a way to remove this overhead, and I did it with a simple and somewhat obvious solution: using PHP as a template definition language. In fact, we can think of PHP as a template engine anyway, because it lets us insert dynamic parts into a document. PHP “as is” doesn’t offer a proper way to separate business logic from presentation, but I decided to find a way to use PHP as a template definition language without turning my template files into complex PHP scripts.
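A minimal sketch of what using PHP as the template definition language means in practice. The template is ordinary PHP using short echo tags and the alternative foreach: ... endforeach; syntax. $pageTitle mirrors the variable name used in Listing 1; $menuItems, the hypothetical file name in the comment and the temporary-file handling are assumptions made so the example is self-contained (normally the template would live in its own file).

```php
<?php
// The template: plain PHP, no custom tags for an engine to parse.
$template = <<<'TPL'
<h1><?= htmlspecialchars($pageTitle) ?></h1>
<ul>
<?php foreach ($menuItems as $label => $url): ?>
  <li><a href="<?= htmlspecialchars($url) ?>"><?= htmlspecialchars($label) ?></a></li>
<?php endforeach; ?>
</ul>
TPL;

// Store it in a temporary file, as if it were header.tpl.php (hypothetical name).
$file = tempnam(sys_get_temp_dir(), 'tpl');
file_put_contents($file, $template);

// "Logic" side: prepare the variables the template expects.
$pageTitle = 'Orders';
$menuItems = array('Orders' => '/orders.php', 'Customers' => '/customers.php');

// PHP itself executes the template; there is no separate parsing step.
include $file;
unlink($file);
```

Short echo tags are always available from PHP 5.4 on; in older versions they depend on the short_open_tag setting.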
Setting the stage
Let’s start developing a small dynamic web page to manage product orders. We’ll have a table of customers who have placed orders. Orders have items, which belong to categories. Orders also have a state: new, shipping, waiting for payment, returned or closed. We want an easy way to manage orders without going back and forth constantly, so orders and items should be on the same page. Suppose this page will be one of many included in a store application. This store application will have a common header and footer, which can change from time to time. This looks like the perfect place to develop our template. The master template will join the results of the other templates: the header, the contents and the footer (see Listing 1). The header will have the page title and the menu options, which come from a database table (see Listing 2). The footer will have the copyright notice, and will be in a separate template because the requirements can change and the footer could contain many things (see Listing 3). I said that the header and footer were common, so why are they not in the master template? Again, it’s because requirements can change, and in the future we may wish to have different headers or footers, just as we can have different main content; it’s good to be ready.
Before going on, I need to explain some choices made in the template files above. I’ve enclosed variables in <?= ?> tags (see the $pageTitle assignment in Listing 1) because I think it looks cleaner than using echo() in template files. In the header template file I’ve used foreach and endforeach for the same reason: I want my template files to be easy to read and change. Remember that foreach requires an array; otherwise PHP will complain: “invalid argument supplied for foreach()”. You can mitigate this problem by casting to an array, or by always assigning array() to your array variables before using them. The first solution silences the warning, but doesn’t mean that everything is working as expected: should the array be empty? With the second solution the warning will still be shown when something is wrong, and we’ll have the chance to correct the problem at development time; if the array should legitimately be empty, we can assign array() to it to hide the warning. This will make the files cleaner, too.
By now we should be able to spot some advantages of using PHP as a template definition language:
• we know the language
• we have full control of the presentation logic because we have a full-featured language to use
• errors will be caught by the PHP interpreter
• we have full syntax highlighting in many editors and development environments.
The most important advantage is what we were searching for: performance. We trust in the PHP engine’s ability, so we give it the task of doing what it does best: interpreting PHP. Unfortunately, we can spot
The template engine
We need a way to bind our main script variables to the template variables, as well as a way to fetch the result of the template file before returning it to the browser. If we were to simply define the template variables in our main script and include() the template files, the result would be sent to the client immediately. Why don’t we want that? Because it limits what we can do with our templates. For example, we would have to make sure we execute the templates in order. And what if we want to execute the same template file again, but with different values?
A little research in the PHP documentation led me to the Output Control functions. These functions allow you to control the output sent to the browser. They are mainly used to prevent output from being sent to the browser before certain HTTP headers are specified using header() or, as in the example in the documentation, before cookies are set using setcookie(). All cookies and headers must be sent before any output is sent. In the Output Control table of contents I found some very useful functions. First, ob_start() turns output buffering on. When output buffering is on, output that would normally be sent to the browser is stored in an internal buffer instead. Second, ob_get_contents() returns the contents of the output buffer. Finally, ob_end_clean() clears the output buffer (without sending it to the browser) and turns output buffering off. Interestingly, output buffers are stackable, so we may nest calls to ob_start(), ob_end_clean() and so on. Now we have a way to execute templates and get the output into a variable, like this:
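A minimal sketch of such a helper, built from the three functions just described plus extract() for variable binding. The name fetch_template() and its $vars parameter are hypothetical, not the article's engine API, and the two tiny templates are written to temporary files only to keep the example self-contained.

```php
<?php
// Hypothetical helper: run a PHP template with the given variables and
// return its output instead of sending it to the browser.
function fetch_template($file, array $vars)
{
    extract($vars, EXTR_SKIP);   // variable binding; never overwrite $file or $vars
    ob_start();                  // start buffering the template's output
    include $file;               // execute the template inside this function's scope
    $output = ob_get_contents(); // copy the buffer into a variable
    ob_end_clean();              // discard the buffer and stop buffering
    return $output;
}

// Two tiny templates (HTML escaping omitted for brevity).
$headerFile = tempnam(sys_get_temp_dir(), 'hdr');
$masterFile = tempnam(sys_get_temp_dir(), 'mst');
file_put_contents($headerFile, '<h1><?= $pageTitle ?></h1>');
file_put_contents($masterFile, '<div><?= $header ?><p><?= $content ?></p></div>');

// Templates can be rendered in any order and reused with other values;
// the header's output is simply passed to the master template as a variable.
$header = fetch_template($headerFile, array('pageTitle' => 'Orders'));
$page   = fetch_template($masterFile, array('header' => $header, 'content' => 'No new orders.'));

unlink($headerFile);
unlink($masterFile);

echo $page, "\n"; // <div><h1>Orders</h1><p>No new orders.</p></div>
```

Because the buffers stack, a template could itself call fetch_template() for a nested piece without disturbing the outer capture.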
Figure 3
Listing 9
$bFOAFNode = new BlankNode("", "bNode2");
$statements[] = new Statement(new Resource("http://www.thehookup.ca/"), new Resource("http://xmlns.com/foaf/0.1/maker"), $bFOAFNode);
$statements[] = new Statement($bFOAFNode, new Resource("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"), new Resource("http://xmlns.com/foaf/0.1/Person"));
$statements[] = new Statement($bFOAFNode, new Resource("http://xmlns.com/foaf/0.1/mbox"), new Literal("mailto:[email protected]"));
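RAP hides the serialization details, but each statement in Listing 9 is simply a subject, a predicate and an object. The plain-PHP sketch below, which does not use RAP, writes those three statements, plus the two WGS84 statements the article adds later for Toronto, as N-Triples. The mailbox and the item URL are made-up placeholders; the mailbox is kept as a literal because Listing 9 uses a Literal, and the vocabulary URIs are the standard FOAF, RDF and W3C WGS84 ones.

```php
<?php
// Turn one term, given as array(kind, value), into its N-Triples form.
function nt_term(array $term)
{
    list($kind, $value) = $term;
    switch ($kind) {
        case 'uri':
            return '<' . $value . '>';
        case 'bnode':
            return '_:' . $value;
        case 'literal':
            // Escape backslash, double quote, newline and carriage return.
            return '"' . addcslashes($value, "\\\"\n\r") . '"';
    }
    throw new InvalidArgumentException("Unknown term kind: $kind");
}

// Serialize (subject, predicate, object) triples, one per line.
function to_ntriples(array $triples)
{
    $out = '';
    foreach ($triples as $t) {
        $out .= nt_term($t[0]) . ' ' . nt_term($t[1]) . ' ' . nt_term($t[2]) . " .\n";
    }
    return $out;
}

$foaf  = 'http://xmlns.com/foaf/0.1/';
$rdf   = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';
$geo   = 'http://www.w3.org/2003/01/geo/wgs84_pos#';
$maker = array('bnode', 'bNode2');                         // blank node, as in Listing 9
$item  = array('uri', 'http://www.example.com/item.html'); // placeholder item URL

$triples = array(
    // Listing 9: the channel has a maker, who is a person with a mailbox.
    array(array('uri', 'http://www.thehookup.ca/'), array('uri', $foaf . 'maker'), $maker),
    array($maker, array('uri', $rdf . 'type'), array('uri', $foaf . 'Person')),
    array($maker, array('uri', $foaf . 'mbox'), array('literal', 'mailto:owner@example.com')),
    // WGS84: the item's latitude and longitude (Toronto).
    array($item, array('uri', $geo . 'lat'), array('literal', '43.6667')),
    array($item, array('uri', $geo . 'long'), array('literal', '-79.4167')),
);

echo to_ntriples($triples);
```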
everything else, it has its strengths and weaknesses. The fact that the most common format for storage and exchange, RDF/XML, is hard to create and understand (unless you use a toolkit such as RAP) leaves many Web developers disillusioned. On the flip side, one of the wonderful things about the Resource Description Framework is that you may describe your Web resources using one or many of the pre-existing vocabularies, or by inventing your own. Given that you can mix and match as you please, you can describe your resources in whatever way you need without sacrificing machine readability. To illustrate this concept, we will take the RSS feed we just created and add statements that use the FOAF RDF vocabulary to describe who created this RSS channel. Let’s take our previous example and add the statements in Listing 9 to the model. These statements, using the FOAF RDF vocabulary, state that the channel has a maker, that the maker is a person, and that this person has the email address [email protected]. Our newly generated RDF/XML file (Listing 10) is nearly identical, but now contains the additional statements. You will notice that the FOAF vocabulary namespace is now included along with the namespaces for Dublin Core, RSS 1.0, RDF and RDF Schema. Running the RDF/XML file through the RSS FeedValidator[7] shows us (Figure 3) that even though we’ve added new FOAF statements that the validator doesn’t understand, the document is still a valid RSS channel. You can add statements using any vocabulary, be it common or proprietary, and the validator will simply ignore the statements for which it has no built-in support. We’ve touched only very briefly on the FOAF RDF vocabulary and, in doing so, have not done it justice. The FOAF community is currently very active and the FOAF RDF vocabulary is becoming very rich. Many new social networking sites are exposing their users’ profiles as FOAF documents, expanding the interoperability of their services.
Associate Geographic Information with Your Syndicated Content Using WGS84
As we’ve just shown, the RSS FeedValidator[7] is a Web service that can process semantic data. In the case of our mixed RSS and FOAF document, it simply ignores the RDF triples that it doesn’t understand. To further demonstrate the value of exposing your Web data semantically, we will again add to our RSS feed, this time augmenting it with GPS data. The W3C has created a WGS84 RDF vocabulary[8] for representing latitude/longitude positions, which semantic innovators are using to add geographic positioning information to blogs, digital photos and other online resources. Taking our existing example, let’s add two statements (Listing 11) to our model stating that the syndicated item (Web page) is from Toronto, Canada. Toronto has a lat/long pair of 43.6667/-79.4167. After regenerating our RDF/XML file (Listing 12), we can see that the Geo RDF vocabulary is listed in the namespaces and that the item now has geographic properties. Running this file through the RSS FeedValidator[7] once again shows that the document is a valid RSS feed. Processing the feed with the W3C’s RDF Validator[9] shows that the document is also a valid RDF file. If you want to see your RDF model as a graph, enable the graphing option when you validate your feed with the RDF Validator[9]. Lastly, let’s send our feed to a Web service called RDF Mapper[10]. In the content field of the online form, enter the URL of your RSS/FOAF/Geo file. You’ll notice that the service is able to pinpoint your syndicated item on a world map, as shown in Figure 4! In fact, RDF Mapper will list your items and allow you to click through to your syndicated content. RDF Mapper is a powerful example of what may soon be possible once Web developers start providing semantic data beyond simple RSS feeds. Imagine a Web in which data is interoperable and exchangeable with Web services without any refactoring. Software agents could also harvest data from the Web, providing new data and services.
Summary
With a few lines of code and the excellent RAP open source project, we’ve created RDF graph models and generated rich RSS feeds that harness multiple freely available semantic vocabularies. Our examination of the RDF Mapper Web service hints at the power of the evolving Semantic Web, and illustrates the value created when Web developers expose their Web data in a semantic format. The complete source code for the examples in this article is available at http://www.thehookup.ca/semaview/rap_phparchitect.zip or in your code download.
References
[1] http://www.w3.org/TR/rdf-primer/
[2] http://www.intellidimension.com/
[3] http://www.wiwiss.fu-berlin.de/suhl/bizer/rdfapi/
[4] http://www.wiwiss.fu-berlin.de/suhl/bizer/rdfapi/phpdoc_classtree.html
[5] http://www.wiwiss.fu-berlin.de/suhl/bizer/rdfapi/tutorial/getting_started.htm
[6] http://www.semanticplanet.com/2003/05/parsingFOAFWithPHP.html
[7] http://feedvalidator.org
[8] http://space.frot.org/draft-geo-draft.html
[9] http://www.w3.org/RDF/Validator/
[10] http://www.mapbureau.com/rdfmapper/
[11] http://www.w3.org/TR/rdf-schema/
About the Author
As Vice President of Development and Operations for Semaview, Inc., Paul is focused on delivering products and services for businesses adopting personal information management and semantic technologies. Paul manages Semaview’s product development and service operations teams.
To Discuss this article: http://forums.phparch.com/177
Integrating PHP and OpenOffice Using PHP to Dynamically Manipulate and Convert OO documents
FEATURE
by Bård Farstad
You have probably already heard about OpenOffice, the Open Source office suite that is now used by millions of people worldwide. One of the key points of OpenOffice’s mission is that documents are saved in file formats that anyone can access. In this article, we will look into the OOo XML storage format to see if it’s really as open as they claim.
REQUIREMENTS
PHP: 4.3
OS: Any
Other: eZ xml and Sablotron XSLT extension
Code Directory: openoffice
Our Document
I have created a very simple document in OpenOffice, which you can see in Figure 1. The document contains two headers (level 1 and 2 respectively), one paragraph with some plain text and an image. It’s a very simple document, but it has the basic structure that makes it perfect for a proof-of-concept example; naturally, you can apply the same concepts that we will be examining in this article to more complex files, but it would be pointless to complicate things while we’re trying to learn something new.
The File Format
OpenOffice documents are saved using the OpenOffice XML format in a .sxw file. This format is, essentially, just a ZIP archive that contains one or more XML files (and, in our case, an image as well). If you extract the ZIP file with the unzip command (which you’ll be able to do even though it has an SXW extension), you will see that its structure is as follows:
.
|-- META-INF
|   `-- manifest.xml
|-- Pictures
|   `-- 10000000000001680000010E84CA0370.jpg
|-- content.xml
|-- meta.xml
|-- mimetype
|-- settings.xml
`-- styles.xml
The meta.xml file contains metadata elements for the document, such as the creation date, editing time and statistics like paragraph and word count. The mimetype file simply contains the MIME type of the current document, in this case application/vnd.sun.xml.writer. The styles.xml file contains definitions of the fonts, alignment, colours used in the document and so forth. In settings.xml, you will find user interface settings for the application with which the document was edited. The manifest.xml file, finally, simply contains a reference to all the files that make up the document. What is of interest to us, however, is stored in content.xml and the JPEG image. The content.xml file, as you may have guessed, is where the actual content is stored, so that’s where we’ll look for our dynamic data. The JPEG file contains, of course, the image that we’ve embedded in our text.
Content Structure
We are now going to take a look at the structure of the XML document. This part requires some knowledge of XML documents and XML namespaces, which are really beyond the scope of this article; if you are not familiar with these, you will find plenty of information on this topic by googling for it or looking up the official XML website at http://w3c.org/XML/. In Listing 1, you can see the most important part of the content.xml file. I have extracted the body part of the XML data, which is where we can find the actual content of the document. All the text is defined using the namespace prefix text, while the image uses the namespace prefix draw. This means that we can quickly identify and extract any text or image data separately. The reason for using namespaces here is to avoid collisions between tags in different contexts, which makes it possible for OpenOffice to reuse a tag name in a different context just by defining a new namespace, rather than finding a new name for it. In our document, the content is marked up with the tags h, p and image. They correspond to a header, a paragraph and an image object. If we look carefully, we can actually read the text as it is, which indicates that the document data is human-readable and accessible in a very simple way. If you consider the paragraph tags, you will see that they have a style definition attribute, style-name, which defines the layout of the paragraph. We are not interested in this part, since we are only going to grab the content. If you want to learn more about the style definitions, you will find them in the styles.xml file. The same goes for all the other tags that have a style-name attribute. The best thing about the OpenOffice file format is that it’s documented, and documented well. There is a link to the documentation in the resources part of this article. Look in this specification for details about the file format.
TIP: if you want to look at the XML files that OpenOffice creates, you should disable size optimization of the XML files. This is done in Tools -> Options -> Load/Save -> General. This causes OOo to add newline characters to its output, making it much more readable.
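Because prefixes are only aliases, a robust way to pull out the text and image data separately is to match on namespace URIs. The sketch below uses PHP's DOM extension (DOMDocument and DOMXPath), not the eZ xml library the article uses. The XML is a hypothetical, trimmed-down content.xml using the OpenOffice.org 1.x namespace URIs, and the XPath prefixes t, d and x are deliberately different from the ones in the document to show that only the URIs matter.

```php
<?php
// Hypothetical, trimmed-down content.xml (OpenOffice.org 1.x namespaces).
$xml = <<<'XML'
<office:document-content
    xmlns:office="http://openoffice.org/2000/office"
    xmlns:text="http://openoffice.org/2000/text"
    xmlns:draw="http://openoffice.org/2000/drawing"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <office:body>
    <text:h text:level="1">Test to see if OpenOffice is open</text:h>
    <text:p>This is just some text to demonstrate normal paragraph text.</text:p>
    <draw:image xlink:href="#Pictures/10000000000001680000010E84CA0370.jpg"/>
  </office:body>
</office:document-content>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

// Register our own prefixes against the namespace URIs.
$xp = new DOMXPath($doc);
$xp->registerNamespace('t', 'http://openoffice.org/2000/text');
$xp->registerNamespace('d', 'http://openoffice.org/2000/drawing');
$xp->registerNamespace('x', 'http://www.w3.org/1999/xlink');

// All text: headers and paragraphs, in document order.
$texts = array();
foreach ($xp->query('//t:h | //t:p') as $node) {
    $texts[] = $node->textContent;
}

// All images: their xlink:href attributes (paths inside the ZIP archive).
$images = array();
foreach ($xp->query('//d:image/@x:href') as $attr) {
    $images[] = $attr->value;
}

print_r($texts);
print_r($images);
```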
Listing 1: body of content.xml (surviving text: “Test to see if OpenOffice is open”, “This is just some text to demonstrate normal paragraph text.”, “Image of Ormevika”)
Extracting the Data
Let us try to perform some automated extraction of our data. To do this, I have chosen to use a simple PHP script and a DOM XML parser. In this example, I used the eZ xml library, which comes as part of the eZ publish CMF, but any other DOM-compatible XML parser should do the job just as well. If you look at the code in Listing 2, you will notice that we start by printing some XHTML text stating that this is an OpenOffice XML import test, followed by a horizontal rule. When PHP starts executing the script, the first thing we do is include the XML library. Once the library is included, we create a new eZXML parser instance, $xml, which, in turn, is used to create a DOM tree using the domTree() function. DOM (short for Document Object Model) is a programming API, defined by the W3C, for accessing XML documents through a tree-like representation of their structure. Since the data we are looking for is stored in the body tag, we try to extract the body node by using its name and its namespace URI. Notice that we do not use the namespace prefix office, but the URI http://openoffice.org/2000/office. The reason for this is that the prefix is only a local alias for the URI within the current XML document and could change in the future; the URI is what actually identifies the namespace. We call the elementsByNameNS() function on our DOM document object to get an array of matching nodes.
Next, we check whether we were able to retrieve exactly one DOM node. We need to do this because a valid document contains exactly one body node; if there are none, or more than one, an error has occurred and we should deal with it accordingly. Once we are sure that we have the right data, we store the body node in the $bodyNode variable and use foreach() to iterate over its child nodes. I've created a function called handleNode() to take care of processing the child nodes. This function returns HTML text that we print out immediately, just to display its contents.
The code for the handleNode() function is shown in Listing 3. In this function, we start by running a switch() on the name of the node/tag. I've added cases for h and p, since these are the tags that show up in our sample document; if any other tag occurs, my script just prints “Unsupported node.” In the case of an h tag (header), we first extract the level attribute, which is used to generate an HTML header of the correct level. This is done with the attributeValueNS() function. Then, we append the text content of the header tag, with surrounding tags, to the variable $htmlTextContent, which holds the HTML we are returning from the function. The p tag (paragraph) is a bit more complicated. The paragraph tag can have child nodes, so we need to do a foreach() on the children.
We then check the name of each child node with a switch(), which, in my implementation, only supports nodes of type #text and image, which is all we need for our small OpenOffice document. If any other node type occurs, we again just print “Unsupported node.” In the case of a #text node, we just append the text content of this node to the paragraph variable. For the image tag, we are only interested in the path to the image itself, so we extract the href attribute (again using the attributeValueNS() function) and use it to create an img XHTML tag. Note that we remove the first character, a # sign, from the href; this gives us the relative path to the file, which we can use directly. We use the ltrim() function to remove this character. After all child nodes are checked, we append their content, wrapped in a p tag, to the script's output for display in XHTML. Finally, we return the contents of the $xhtmlTextContent variable, which contains our converted text.
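For readers on PHP 5 or later, the same walk can be sketched with the bundled DOM extension instead of eZ xml. This is not the code from Listings 2 and 3: the eZ xml calls elementsByNameNS() and attributeValueNS() are swapped for DOM's getElementsByTagNameNS() and getAttributeNS(), convertBody() is a hypothetical name, the sample image file name is made up, and the text, drawing and XLink namespace URIs are assumed from the OpenOffice.org 1.x specification. The structure follows the description above: find exactly one office:body by URI, dispatch on h and p, and strip the leading # from image paths with ltrim().

```php
<?php
const OOO_OFFICE_NS = 'http://openoffice.org/2000/office';
// The following three URIs are assumed from the OpenOffice.org 1.x spec.
const OOO_TEXT_NS = 'http://openoffice.org/2000/text';
const OOO_DRAW_NS = 'http://openoffice.org/2000/drawing';
const XLINK_NS = 'http://www.w3.org/1999/xlink';

/** Converts the office:body of a content.xml string into simple XHTML. */
function convertBody(string $xml): string
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        throw new RuntimeException('content.xml is not well-formed');
    }
    // Match on the namespace URI; the "office" prefix is only a local alias.
    $bodies = $doc->getElementsByTagNameNS(OOO_OFFICE_NS, 'body');
    if ($bodies->length !== 1) {
        throw new RuntimeException("Expected one office:body, found {$bodies->length}");
    }
    $html = '';
    foreach ($bodies->item(0)->childNodes as $child) {
        // Skip whitespace text nodes between block elements.
        if ($child instanceof DOMElement) {
            $html .= handleNode($child);
        }
    }
    return $html;
}

/** Converts one block-level element (text:h or text:p) into XHTML. */
function handleNode(DOMElement $node): string
{
    $name = $node->namespaceURI === OOO_TEXT_NS ? $node->localName : '';
    switch ($name) {
        case 'h':
            // Clamp text:level to the h1..h6 range allowed in XHTML.
            $level = max(1, min(6, (int) $node->getAttributeNS(OOO_TEXT_NS, 'level')));
            return "<h$level>" . htmlspecialchars($node->textContent) . "</h$level>\n";
        case 'p':
            $para = '';
            foreach ($node->childNodes as $child) {
                if ($child->nodeType === XML_TEXT_NODE) {
                    $para .= htmlspecialchars($child->nodeValue);
                } elseif ($child instanceof DOMElement
                    && $child->namespaceURI === OOO_DRAW_NS
                    && $child->localName === 'image') {
                    // OpenOffice starts internal image paths with '#'.
                    $src = ltrim($child->getAttributeNS(XLINK_NS, 'href'), '#');
                    $para .= '<img src="' . htmlspecialchars($src) . '" alt="" />';
                } else {
                    $para .= 'Unsupported node.';
                }
            }
            return "<p>$para</p>\n";
        default:
            return "Unsupported node.\n";
    }
}

// Hypothetical sample; the image file name is made up.
$sample = '<office:document-content'
    . ' xmlns:office="http://openoffice.org/2000/office"'
    . ' xmlns:text="http://openoffice.org/2000/text"'
    . ' xmlns:draw="http://openoffice.org/2000/drawing"'
    . ' xmlns:xlink="http://www.w3.org/1999/xlink">'
    . '<office:body>'
    . '<text:h text:level="1">Test to see if OpenOffice is open</text:h>'
    . '<text:p>Image of Ormevika<draw:image xlink:href="#Pictures/ormevika.jpg"/></text:p>'
    . '</office:body></office:document-content>';

echo convertBody($sample);
```

Throwing on anything other than exactly one body node mirrors the sanity check described above; a CMS importer would probably log the error and reject the upload instead.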
Listing 2
Listing 3
“This format [OpenOffice] is, essentially, just a ZIP archive that contains one or more XML files.”
XML Transformation
When using PHP, you have many different tools at your disposal to work with XML. XSLT transformations have become more and more popular lately, so I am going to show how we could perform the OO-to-XHTML conversion above using the XSLT functionality available in PHP 4. You need to enable the Sablotron XSLT extension when compiling PHP for this to work; you may want to check out the XSLT section of the PHP manual for further details on how to accomplish this.
The PHP code that we will use to perform XSLT transformations is very simple. Once we have created the XSLT processor object using xslt_create(), we can execute the transformation directly by calling the xslt_process() function, which returns the converted XML document, or a whole lot of XHTML in our case. If an error occurs, we can use the xslt_error() function to get details about what went wrong. To free memory, we should always call the xslt_free() function when we're done with the XSLT processing. In Listing 4, you will find a small example written in PHP 4 that shows you how the XSLT transformation takes place. In this example, we just print the converted document if no error has occurred.
Listing 4
As you can see, when dealing with XSLT transformations, writing the PHP code is the easy part; most of the work goes into preparing the XSLT file itself. I have included the XSLT file I used to convert our simple OpenOffice document in Listing 5. We start the XSLT file by defining several different namespaces, including OpenOffice namespaces like office, text and draw. This is needed because these namespaces are used in the OpenOffice file, and we need them to match the tags we are going to transform. The next element in the XSLT file is <xsl:output>. This sets the format of the output document to XHTML using UTF-8 encoding. We also declare that the output document should be indented according to its hierarchy, so that it will be easier for humans to read. To define the transformation rules, we use the <xsl:template> element to instruct the transformation engine as to what it should do whenever a specific node is matched. Our first node is office:body, which we transform into a simple XHTML body tag. Keep in mind that, in this example, I have kept the XHTML body simple; in its current state it will not validate, because the head section is missing. To process the child nodes, we use the <xsl:apply-templates> element, to which we do not supply a select attribute, since we want to process all child nodes of office:body indiscriminately. If we had wanted to process only nodes of a specific type, we could have used the select attribute of the <xsl:apply-templates> element to restrict the transformation as needed. To transform headers and paragraphs, we define a
new template that matches nodes of type text:h and text:p. The transformation works in a very similar way for both nodes: they are directly converted into h1 and p XHTML tags. Again, we use the <xsl:apply-templates> tag to transform their child nodes, just like we did before. This XSLT file does not handle multiple levels of headers; that can be done by matching the value of the text:level attribute on text:h nodes. You can use the <xsl:choose> element together with <xsl:when> and <xsl:otherwise> to express multiple conditional tests. The last part of our XSLT document handles the transformation of draw:image nodes, which become XHTML <img> tags. Since image nodes do not contain children, we do not use the <xsl:apply-templates> element here, resorting instead to <xsl:value-of> to fetch the value of the xlink:href attribute. Since OpenOffice starts the path of the image with a # character, we need to remove it before we use the path. To achieve this, we use the substring() function and, once we have removed the leading # character, we construct the src attribute of our <img> tag using the <xsl:attribute> element.
The End Result
The result of both examples is an XHTML page; you can see a screenshot of the page output by the DOM-based example in Figure 2. The result of the XSLT transformation is exactly the same, so I am not including a screenshot of it. As you can see from the screenshot, we have managed to maintain some of the formatting, and we are also able to show the image embedded in the article. Most importantly, however, we have converted and kept all of our content in a
usable format. Converting an OpenOffice document to XHTML like we have just done is not very hard; after all, you could have just used the “Save as HTML” function built into OpenOffice. However, there is a lot more that you can do here, including manipulating an OO document directly and using OpenOffice to generate output in a variety of formats that are normally hard to work with.
Listing 5
DOM Parsing or XSLT?
Since we looked at two very different approaches to converting our OpenOffice document, you may be wondering whether one is markedly better than the other. Which approach should we use? It is hard to say that one method is better than the other, since that depends on what you are trying to achieve. If you want to perform a direct conversion from OpenOffice to another XML-based format, like XHTML or DocBook, I would recommend that you use XSLT transformations, which were built specifically for this purpose. If, on the other hand, you need to interpret and work with the contents of an OO document (for example, if you want to import the OpenOffice document into a CMS, which is what I have done with eZ publish), then the DOM approach is clearly more appropriate. Whichever method suits your needs, the fact that you can choose is a major plus. There are also other tools you could use, such as the SAX-style XML parser in PHP 4 and SimpleXML in PHP 5.
Figure 2
Content Management
Maybe you're wondering why this article is about the OpenOffice XML format in particular and not about XML documents in general. After all, XML is XML, isn't it? Well, of course it is, but this particular XML format is used by millions of people every day to produce content, and that's what makes the difference. I hope that my article will encourage you to support OpenOffice in your PHP solutions and content management systems. Doing so will make it easier for the end users of your solutions to share their information using a common format. They can also use their favorite word processor to produce content, on multiple platforms, with an interface paradigm they are already familiar with: WYSIWYG formatting, spell checking and the comfort of writing in a dedicated word-processor application instead of a browser.
Future Additions
To really integrate OpenOffice with your applications, you would need to support the file format more thoroughly. Basic text formatting like bold, italic and underline is a must, but these are defined as styles in OpenOffice, so you would need to actually parse the style definitions in order to recognize and convert the font styling properly.
You would, of course, also need to support hyperlinks. These don't pop up too often when you are writing a document with a word processor (the automatic link recognition feature that pretty much every office application seems to have these days can actually be quite annoying if you're writing content destined for offline use), but they are very common when you write content for publication on the Internet. Tables and lists would be the natural next step.
In a content management system setting, you are normally not interested in formatting, so you can most likely ignore most of the style definitions. However, there are some styles in which you might be interested, including the alignment of text and images, so you may want to investigate a little further here. Image size is another layout definition you might be interested in when importing into a CMS. OpenOffice stores only the original, high-resolution version of the image, together with its size in inches. Thus, in order to convert the image for use on the web, you first need to find the width of the OpenOffice document page in inches, which is defined in the styles.xml file, and then calculate the size that the picture will take on the screen. You do this by determining the width of your web page in pixels (typically around 750) and then scaling the image down using this simple formula:
$$\text{image width in pixels} = \frac{\text{image width in inches}}{\text{page width in inches}} \times \text{display page width in pixels}$$
For example, if you have a 4.5” wide image originally placed on a page whose width is 8.5” (the width of a US Letter sheet; an A4 sheet is about 8.27” wide) and you need to resize it for a web page that is 750 pixels wide, the image needs to be scaled to 4.5/8.5*750 ≈ 397 pixels wide. The image height must be scaled by the same factor, but image conversion and scaling programs, for example ImageMagick, will normally preserve the aspect ratio for you automatically, saving you from any further math.
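The scaling rule is easy to wrap in a small helper. This is a sketch with hypothetical function names: it rounds to whole pixels and derives the height from the original pixel dimensions so the aspect ratio is preserved, much as ImageMagick does when you give it only a target width. The 1600x1200 original is an invented example.

```php
<?php
/**
 * Width in pixels that an image should occupy on a web page, given its width
 * on the OpenOffice page, the page width (both in inches) and the web page
 * width in pixels: (image / page) * display.
 */
function webImageWidth(float $imageInches, float $pageInches, int $displayPixels): int
{
    if ($pageInches <= 0.0) {
        throw new InvalidArgumentException('Page width must be positive');
    }
    return (int) round($imageInches / $pageInches * $displayPixels);
}

/** Height that keeps the original pixel aspect ratio at the new width. */
function scaledHeight(int $originalWidth, int $originalHeight, int $newWidth): int
{
    return (int) round($originalHeight * $newWidth / $originalWidth);
}

$width = webImageWidth(4.5, 8.5, 750);       // the example from the text
$height = scaledHeight(1600, 1200, $width);  // hypothetical 1600x1200 original
echo "{$width}x{$height}\n";                 // prints 397x298
```

On an A4 page (about 8.27” wide) the same 4.5” image would come out at roughly 408 pixels, so it is worth reading the real page width from styles.xml rather than hard-coding it.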
Another nice feature worth exploring is building a WebDAV (Web-based Distributed Authoring and Versioning) interface to your application. The system could then import documents automatically whenever OpenOffice saves them to it over DAV; this is, for example, the approach we at eZ systems use when integrating OpenOffice into our CMS application, eZ publish.
Conclusion
With just a few lines of code, we were able to extract loads of essential information from our document and convert it into XHTML. Our script does not currently support much of the OpenOffice format, but we were nonetheless able to do what would be next to impossible (at least in fewer than one hundred lines of code) had we been dealing with a proprietary format like those used by Microsoft Office. This means that you can now interact with a great word processing product rather than having to come up with your own interface, which is unlikely to match OpenOffice's functionality, or even come close to it. For my part, I must say that the OpenOffice project has been rather successful in making the data accessible, but I wish it separated a document's content from its styles better. This problem becomes obvious once you notice that the content.xml file starts with a bunch of font declarations, which should, in my opinion, be stored in the styles.xml file instead. Still, that's a minor issue that's easy to deal with once you know about it, and I think that OO implements a pretty good format, especially when you consider the alternatives!
Resources
OpenOffice XML file format: http://xml.openoffice.org/
OpenOffice XML file spec (PDF): http://xml.openoffice.org/xml_specification.pdf
eZ publish CMF: http://ez.no/ez_publish
PHP XSLT documentation: http://www.php.net/manual/en/ref.xslt.php
Document Object Model introduction: http://www.w3.org/TR/DOM-Level-2-Core/introduction.html
About the Author
Bård Farstad is one of the three co-founders of eZ systems. He has been working professionally with CMS development since 1999 and has written many general-purpose libraries, such as XML parsers and SOAP and XML-RPC libraries (both client and server side). He is also one of the main developers of the eZ publish CMS. In his spare time he likes to play with his daughter and play the guitar, and he is also into aquascaping. You can reach Bård at [email protected].
To Discuss this article: http://forums.phparch.com/175
PHP-GTK and the Glade GUI Builder
Building Client Applications with Style
FEATURE
by Tony Leake
Many of you have probably tried out PHP-GTK to build simple applications. Coding a complex window program can be a time-consuming and error-prone task. However, using the Glade GUI builder can simplify the task and make window building fun again.
In this article, we will build a simple contact management system using PHP-GTK and the Glade GUI builder. For reasons of space, much of the error checking we would do in the real world, such as checking that our database inserts worked and validating user input, will be omitted. I will point out when it may be needed; in any case, it should be fairly obvious when extra checking is required.
Glade Basics
Glade allows a developer to visually create windows by dragging and dropping ‘widgets’ on to them. This means that we do not need to write code to place a button on a form; the code only needs to deal with what happens when that button is clicked. As dealing with GTK can be quite a complex affair, this will save time in building our applications. It also makes experimenting with different layouts fast and easy: if a button does not look right where it is, we just cut and paste it somewhere else.
Glade uses the concept of signals to attach widgets to code. For example, if we have a button called button_submit, it will emit a signal called clicked when the user presses it. We tell Glade to connect that signal to a function, which, by default, will be called on_button_submit_clicked(). All we have to do, then, is write a function of the same name, which will automatically be called when the button is pressed. Glade projects are saved in XML format; in our PHP script, we simply need to call GTK::GladeXml($glade_file) and our window is created.
In order to use Glade within PHP, you will need to compile PHP-GTK with --enable-libglade under Linux, or have libglade.dll installed in your path under Windows. You can find more details about setting up your PHP-GTK installation in the PHP-GTK manual, wiki and mailing list archives. When Glade is first started, the following three windows will be opened:
• The main Glade window (Figure 1) is mostly used for opening and saving projects. It has other options, mainly for generating source code in other languages supported by Glade, but we will not be using them here.
• The properties window (Figure 2) allows us to set various attributes, such as the height, width and name of the currently selected widget. A widget is selected simply by clicking it with the mouse.
• The palette window (Figure 3) is where we choose a widget to place on our form. Simply select a widget in the palette, then click where you want it to appear. If you hover over the icons in the palette, a tooltip will pop up giving you the name of the widget.
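The naming convention behind default handlers, and the idea of connecting a signal to the function that shares its handler's name, can be illustrated in plain PHP. This sketch does not use PHP-GTK at all: defaultHandlerName() and connectByName() are hypothetical helpers that only mimic the name-based lookup that Glade support performs for you, and the entry_input/changed pair is an invented example of a signal with no handler. It assumes PHP 7.1 or later.

```php
<?php
/** Glade's default handler name: on_<widget name>_<signal name>. */
function defaultHandlerName(string $widget, string $signal): string
{
    return "on_{$widget}_{$signal}";
}

/**
 * Hypothetical stand-in for automatic signal connection: for each
 * [widget, signal] pair, look up a user function with the default handler
 * name and return a map of handler name => callable (missing ones skipped).
 */
function connectByName(array $signals): array
{
    $connected = [];
    foreach ($signals as [$widget, $signal]) {
        $handler = defaultHandlerName($widget, $signal);
        if (function_exists($handler)) {
            $connected[$handler] = $handler;
        }
    }
    return $connected;
}

// The handler from the text: called when button_submit emits "clicked".
function on_button_submit_clicked(): string
{
    return 'button_submit was clicked';
}

$handlers = connectByName([['button_submit', 'clicked'], ['entry_input', 'changed']]);
// Simulate the signal firing by calling the connected handler.
echo $handlers['on_button_submit_clicked'](), "\n";
```

The practical consequence is the one the text describes: if you rename a widget in Glade, the handler function in your PHP script has to be renamed to match.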
REQUIREMENTS PHP: 4.3.x + OS: Linux, Windows Other: PHP-GTK 1.01, Glade, PEAR::DB Code Directory: gtk
There is one other window that we will be using, the widget tree, but Glade does not show it by default. Select Show Widget Tree from the View menu in the main window in order to make it visible. The widget tree is a hierarchical view of the widgets that are part of our form; we will see how it is used later on in the article. There is no screenshot of the widget tree, as it is just an empty window when first opened.
A Quick Example
To get warmed up, we will build a simple form with two text boxes and a submit button. In the palette window, click the Window icon, which is at the top left of the palette. This will cause several things to happen: the window icon will appear in the Glade panel, the properties window will spring to life, the widget tree will be populated and, most importantly, our new window will appear, ready for us to add widgets to it. If, for any reason, the window does not appear, just double-click the window icon in the Glade panel. If you look at the properties window, you will see many things that can be changed. Even though we won't use many of them in our examples, it is always a good idea to get to know what options are available. A window is classed as a top-level widget in GTK. A top-level widget can only contain one
other widget, which is usually a layout manager. We will use the table layout manager, which will be very familiar to you if you have ever coded HTML. Select the table widget (third row from the bottom, third from the right) in the palette and click on your window to attach it. In the dialogue window that pops up, set its size to two columns by three rows. In the top-left cell and the one below it, place label widgets (their icons look like a capital ‘A’). Using the properties window, set their Label properties to Input and Output. Next to the labels, place text entry widgets and change their Name properties to entry_input and entry_output. I find it useful to prefix a widget's name with its type, but feel free to come up with your own naming convention; just keep in mind that being consistent will make working with your code a whole lot easier. Now, in the bottom-left table cell, add a button and change its name to button_submit and its label to Submit. You should now have a window that looks like Figure 4. It is not very pretty, but we will look at how to better control our layouts later.
Now, we need to add a signal to let us know when the button is clicked. Select the button, then, in the properties window, select ‘Signals’. Next to the ‘Signal’ text box, click the button with the ellipsis on it; this will bring up a window showing all of the signals that are available to us. Some of these signals belong to the widget itself and some to its parent widgets. As you come across new widgets, it is a good idea to look up their place in the widget hierarchy in the manual; it will give you a better understanding of how GTK and its widgets work. Double-click on the ‘clicked’ signal to select it; this will populate the handler text box with on_button_submit_clicked. Click Add to accept the signal and save your work as first_project.glade. In the same directory that you saved the Glade file in, create a file called first_project.php containing the code from Listing 1. When you start the program, type some text (Hello, for example) into the first text box; when you click the button, You said Hello will appear in the second text box.
Figure 1
Figure 2
Figure 3
Figure 4
Listing 1
Writing a Contact Management System.
So, what is a contact management system? It's really nothing more than a glorified telephone directory, the kind of thing that might be used by a sales department. Users can search for a contact by first, last or company name; once the person is found, more details can be viewed, such as department, telephone number, and so forth. One of the most important things about a contact management system is being able to leave notes on what a call was about, for example, ‘Bob said he didn't need fax rolls this month, call back in July.’ That way, different salespeople can be sure they won't harass poor Bob until he's ready to buy. Thus, when looking at our contact details, we want to see all the notes that have been left, in order, with the date of each note and who left it. We also need ways to leave notes and add new contacts. This is a very simplistic application example that, in the real world, would need much more functionality to be useful. For example, if someone adds a note to call Bob back in July, the system should actually remind us about it when July comes along. However, this article is about learning to use PHP-GTK and Glade, so we should have enough here to work with regardless.
In the following examples, we will use a MySQL database to store our contact details; however, I have chosen to use PEAR DB to simplify things, so if you want to use PostgreSQL instead, only minimal changes will be needed. Also, the database structure has been modeled on simplicity rather than reality; for example, the Contact table contains both the contact's name and company. In a more realistic scenario, these would be normalized into two separate tables, but as this is not a tutorial on database design, I hope you will forgive this bad practice in the interest of preventing my article from taking over this issue of php|a completely.
The User Interface.
In GUI applications, just as in web pages, it is important to present the information to the user in a logical format that is easy to understand. We could, I'm sure, make one large page to handle our whole app. However, we would just end up with a mess on the screen that would overwhelm the user with a mass of out-of-control text boxes, so we will split the application into five simple screens. The first will be the login screen (Figure 5). The user enters his username and password and, if they are validated, is taken to the search screen. When we enter the login screen, we want the cursor to be in the username entry widget, to save the user from having to click with the mouse. These little time-saving features can really make your app faster and easier to use. When designing GUI apps, it can be very useful to get a non-technical user to play with the app at an early stage and see if it works the way they expect. You may be very surprised to find out that something that is very logical to you is not so to the end user. The search screen (Figure 6) allows the user to select the search type (by first, last or company name) from the GtkCombo widget, which is similar to a dropdown box on a web page.
Figure 5
Figure 6
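Mapping the search type chosen in the combo box onto a database column deserves some care, because a column name cannot be passed as a query parameter. Here is a minimal sketch, assuming a contact table with first_name, last_name and company columns and combo labels of “First name”, “Last name” and “Company” (all hypothetical names; the article's schema is not reproduced here). It whitelists the column, escapes LIKE wildcards in the user's text and returns the SQL plus a parameter list that uses ? placeholders, which PEAR DB's query methods accept. It assumes PHP 7.1 or later.

```php
<?php
/**
 * Builds a prefix search on one of a fixed set of columns.
 * Returns [sql, params] for a driver that supports '?' placeholders.
 * Table, column and label names are hypothetical.
 */
function buildContactSearch(string $searchType, string $term): array
{
    // Whitelist: combo box label => column name. Never interpolate raw input.
    $columns = [
        'First name' => 'first_name',
        'Last name'  => 'last_name',
        'Company'    => 'company',
    ];
    if (!isset($columns[$searchType])) {
        throw new InvalidArgumentException("Unknown search type: $searchType");
    }
    if ($term === '') {
        throw new InvalidArgumentException('Enter at least one letter');
    }
    $column = $columns[$searchType];
    // Escape the backslash and LIKE wildcards, then match the term as a prefix.
    $pattern = str_replace(['\\', '%', '_'], ['\\\\', '\\%', '\\_'], $term) . '%';
    $sql = 'SELECT id, first_name, last_name, company FROM contact'
         . " WHERE $column LIKE ? ORDER BY last_name, first_name";
    return [$sql, [$pattern]];
}

[$sql, $params] = buildContactSearch('Last name', 'Sm');
echo $sql, "\n";
print_r($params);
```

Only the whitelisted column name is ever interpolated into the SQL; everything the user types travels as a bound parameter.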
When the value in the combo widget changes, the cursor moves into the search box, thus saving another mouse click. The user then enters the search term, which is composed of one or more letters, tabs to the search button and hits the Enter key. All of our contacts that match the search string are now shown in the list widget and, if we click on one of the results, we will be taken to the result details screen for that contact, which is shown in Figure 7. In it, we can see full details for our contact along with a call history, and we can choose to add a new note if required. The note addition screen (Figure 8) is pretty simple: just type a note and submit it; there is no need to add your username or the date, as the system will do that for you. The add contact screen (Figure 9) is also quite self-explanatory. In our example, no validation is performed before the data is saved to the database; I'll leave that to you as an exercise.
Figure 7
Figure 8
You will notice that we also have a menu bar at the top of the screen, split into two sections: User and Directory. The User menu only has one item, “Log out”. When the user logs out, he will be taken back to the login screen. No other screens can be selected until the user has logged back in. The Directory menu, on the other hand, consists of two entries: Search and New Contact. After the user has logged in, he will be taken to the search screen by default; if he actually wants to add a new contact, he can use the ‘New Contact’ menu item to get there. The Search menu item returns the user to the search screen. All the menu items also have keyboard shortcuts: ALT-o for logging out, ALT-s for searching and ALT-n for creating a new contact.
The GUI design.
Now that we know what the interface should do and how it will look, we have to decide how to implement it. In GTK, just as in HTML, there are many ways to create a layout. Also as with HTML, there is no single right way to do things; it really comes down to experimentation and finding your favorite method. First of all, we need to make our window appear to have five screens. We will do this with the GtkNotebook widget. The notebook is most often used to build tabbed user interfaces, which (like recent versions of Mozilla) allow more than one screen to be stored in a single window. In our case, however, we will turn off the tabs and select pages programmatically. This gives the impression of taking the user to a whole different screen while, in reality, GTK is doing all the work for us. Next, we need to decide how to lay out our widgets.
Figure 9
GTK gives us several options, such as GtkFixed. When placed on a window, this widget allows us to position our widgets on a grid and know they will stay where we placed them even if the user resizes the window. Whenever you're dealing with a scenario that calls for a rigid arrangement of all the elements in your layout, this can be very useful; however, if you prefer to have some of your widgets stretch as the window is resized, GtkFixed can be too restrictive. Another option is to use tables, as we saw in the previous example; however, the method we will use takes advantage of two new widgets: GtkHBox and GtkVBox. The GtkHBox and GtkVBox widgets are similar to a one-dimensional table; a VBox (vertical) allows you to split your page into a number of rows, while an HBox (horizontal) will split it into a number of columns. I find that the most flexible layouts can be achieved by combining HBoxes and VBoxes and nesting them as needed.
Interface Basics
For the purposes of our application, the screen is split into two portions: the menu and the GtkNotebook. Everything else just happens to be part of different pages of the notebook. Start a new Glade project, create a window, and change the window name to directory_win and the title to Directory. Next, add the destroy signal, which will give you the handler name on_directory_win_destroy(). Set the default width and height to 600 by 400 pixels. Now, add a GtkVBox widget to the window, selecting two rows. Don't forget: if you are not sure where the widget you want is, hovering the mouse over the icons on the palette will give you the name of each widget. Next, add a menu bar to the top row and a GtkNotebook to the lower one, choosing five pages. If, at a later stage, we realize that we need an extra page, selecting the notebook in Glade's widget tree will allow us to add more pages. For now, leave the tabs turned on, as this makes things easier during development.
We will only turn them off when we are sure we have finished the application. You will also want to set a border width of 10 for the notebook; this gives all of our pages a uniform border irrespective of what we do on each individual page. Keep in mind that consistency is an important part of GUI design: badly laid out pages will appear to “jump around” as you switch between them, and the user will feel a lot more comfortable if things are where they expect them to be. The “show border” property should be set to “no”; otherwise, our pages will have a box around them (the border will show while the tabs are turned on, no matter what you do). Do play about with these settings, though; the fact that I don't want a box around my pages does not mean that you should necessarily do things the way I like to. In fact, every aspect of how the pages are styled is a matter of personal choice, and you should experiment to find your own preferences. As there will only be one GtkNotebook in the application, we will leave its name at its default value, notebook1. If we had several, a more distinctive name would, of course, save a lot of confusion later on.
Let's now create the menus. Right-click on the menu bar at the top of the window and select ‘Edit Menus’ from the list that pops up. Click the Add button and the word item1 will show up in the left pane, which is essentially a graphical representation of our menu. On the right-hand side of the dialogue, change the label to User; you can delete the entries from the Name and Handler text boxes, as this is just a heading and we don't need to refer to it programmatically. Now, click Add again; this time, change the label to Log Out and the name to logout. In our code, we will use a “camelBack” (also known as “studlyCaps”) naming convention for our functions and variables (e.g., someFunctionName()), which is pretty common in object-oriented code. However, as we will see, it is useful to leave the default Glade handler names with underscores. Our handler here is called on_logout_activate. If we left things as they are, the Log Out item would appear as a separate menu, which is not what we want; it should, instead, be added underneath User. To achieve this, make sure that Log Out is selected and then click the right arrow underneath it. This will indent the item to the right, which identifies it as part of the User menu.
To add a keyboard shortcut (ALT-o), simply check the ALT checkbox in the bottom right, then type o into the text box. For our second menu, add a new item called Directory, then use the left arrow to un-indent it. This makes it a separate heading, to which we can add the following entries:
• Label: Search; Name: search; Handler: on_search_activate; Shortcut: ALT-s
• Label: New Contact; Name: new_contact; Handler: on_new_contact_activate; Shortcut: ALT-n
The Login Screen
We can now move on to designing the login screen.
What we want here are two text boxes, one for the username and the other for the password. Naturally, the password input box should show asterisks, not the actual password. We also need a button to submit the form. Whenever a user tries to log on, we first check that his username and password are correct. If they are, we make a note of who the user is in the application and take him to the second tab in the notebook, which will be the search screen. If the login is unsuccessful, we simply blank both text boxes and put the cursor back in the username field. In the real world, we would probably want to provide the user with an error message, and possibly limit the number of tries he is allowed before the application quits. I will leave these details to you. Select the first tab on the notebook and attach a Vbox to it with three rows. In each of the top two rows, add an Hbox with two columns and, in the bottom row, add an Hbox with one column. It might seem like a waste of time to add this last Hbox with only one column in it, but, as you will see later on, doing so gives us more control over layout. Now, in both of the top two rows, add a GtkLabel in the left-hand column and a GtkEntry in the right. In the third row, add a GtkButton. Set the labels to read Username and Password, and name the GtkEntries entry_login_username and entry_login_password. You will have noticed that, to name widgets, I use the convention of widget type, screen name and widget name separated by underscores. When the signal name is added to this for the callbacks, you end up with some pretty long function names. Personally, I live with it, as it prevents confusion in the code when you have a lot of widgets. If you don’t like the idea of long names, feel free to come up with your own convention. Set the button’s label to Submit and its name to button_login_submit, then add the Clicked signal. The callback function will then be called on_button_login_submit_clicked().
See what I mean about long names? At least they are descriptive! Your window will now look like Figure 10. It is not very attractive and doesn’t look much like the example we saw earlier, so what went wrong? Well, nothing, really; we just need to understand a little more about how GTK lays out its widgets. There are many ways to tell GTK to lay things out when using H- and V-boxes, and that’s what I like about them. Glade’s default is to tell GTK that each widget should take up as much space as it can. If you try resizing the window both smaller and larger, you will see what I mean. Even though this can be great, it’s not what we are looking for here. So, for starters, select each Hbox in turn using the widget tree and, in the Packing tab of the Properties window, set its expand property to no. If you now try resizing the window again, you will see what difference this has made. While you are at it, you may also want to name the H- and V-boxes. This isn’t strictly necessary, but it makes working with the widget tree a little easier when you have a lot of widgets. You can make up your own naming convention, but you will probably want to include the screen name in there somewhere. Next, put a little space between the rows of the Vbox: select each of the Hboxes in turn and set the Padding attribute in the Properties window to 5. Do the same for the two GtkLabels (noticing what effect this has) and the button. Then, for both of the GtkEntries, set the expand property to no, and our window is visually finished. If you try resizing it now, you should notice quite a big difference. There is, incidentally, one last thing to do: right now, the password field will still show plain text, so selecting
First Code
Our application is an excellent opportunity to look at how OO code can be used to deal with PHP-GTK, so we will build it using a class. We will use a single class file named directoryUI.class.php for the whole application and a main.php script to launch it. For now, we will look at the basic code to get the window up and running, to get us logged on and then to switch to the second tab of the notebook. Create a database called Directory and then create the following table (all table creation SQL is designed to work with MySQL, but it should be pretty simple to move it to another DBMS, such as PostgreSQL):

```sql
CREATE TABLE `User` (
  `user_uid` int(11) NOT NULL auto_increment,
  `username` varchar(20) NOT NULL default '',
  `password` varchar(30) NOT NULL default '',
  PRIMARY KEY (`user_uid`)
);
```
In our simple system, the password will be stored in plain text. A production system should use some kind of one-way hashing instead, for example MySQL’s PASSWORD() function or MD5 hashes. Insert a username and password of your choice into your newly created table, then save your Glade work as directoryUI.glade. In the same directory, create a file called main.php and copy the following code into it.
Next, create another file, directoryUI.class.php, containing the code from Listing 2. There’s quite a lot going on, so first let’s see it in action and then look at the code in detail. For starters, change the login details in dbConnect() to match those of your database; consult the PEAR DB documentation if you don’t know how to do this. If you run main.php, your window should start up on the login screen with the cursor in the username field. Log in with the username and password you stored in your database. When you click Submit, the notebook should change to the second page. If you hit ALT-o, you will be logged out again and taken back to the login screen. If you mistype the password or username, the two entry fields will be blanked out and the cursor will again be placed in the username field. So, what’s going on in the code? Well, first of all, we include the PEAR::DB library. If this fails, make sure that it is installed and that the PEAR include path is set up correctly. In the constructor, we simply call a few other functions from our class. There isn’t a lot to say about dbConnect(), apart from the fact that it really should have more error checking. The windowSetup() function, on the other hand, is a lot more interesting:
```php
$this->window = &new GladeXML($this->gladeFileName);
```
This line parses the Glade file and sets up our window. It also sets up an instance variable that keeps a reference to the application for later. The next six lines connect the signals. In PHP-GTK, we can connect signals using signal_autoconnect(), as we saw earlier, or we can connect each widget’s signals individually. The only problem with signal_autoconnect() is that it only works with procedural code; there is no way to tell the function that the signals should be connected to the methods of a certain object. That leaves signal_connect(), but if we have a lot of signals in our application, connecting them all by hand becomes pretty tedious. Since all of our handlers start with the on_ prefix, we can simply get a list of all methods in our class that start with on_ and connect them programmatically in a foreach loop. Next, we get a handle to the notebook and again save it for later:
```php
$this->notebook = &$this->window->get_widget("notebook1");
```
This is pretty important, as it shows how we can get a handle on any widget at any time based on its name, which is one of the most important Glade functions. Next, the on_login_activate() function retrieves handles to the two entry fields and blanks them out. They will already be blank when we start the application, but we will use this function whenever we need to reset the login screen, either because the user has logged out or because of a failed login attempt. It then places the cursor in the username entry field with the grab_focus() function and switches the notebook to the login page. We could do this without the selectPage() function, but I find it easier to give my notebook pages names rather than using their numeric identifiers, as in select_page(1). Numeric identifiers get pretty confusing when you have a lot of pages, and they wreak all sorts of havoc whenever you decide, midproject, to insert a new page somewhere in the middle of a notebook. You will notice that selectPage() uses the $pages array. Each time we add a new page, we just need to add an entry to this array, and then we can select the page with:
```php
$this->selectPage("pageName");
```
The last function called from the constructor is start(), which simply calls GTK::main(). Don’t put any further code in the constructor after this line, because it will not be executed until after the window is closed.
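Two conventions were described above: connecting every on_-prefixed method as a signal handler, and selecting notebook pages by name. Both can be sketched in plain PHP. This is not Listing 2. FakeGlade and FakeNotebook are hypothetical stand-ins that merely record calls, so the sketch runs without the php-gtk extension. The set_page() method on the stub is assumed to mirror the notebook’s page-switching method, and the page names and indices in $pages are illustrative.

```php
<?php
// Hypothetical stand-in for GladeXML: records handler connections
// instead of wiring real GTK signals.
class FakeGlade
{
    public $connected = array();

    public function signal_connect($handlerName, $callback)
    {
        $this->connected[$handlerName] = $callback;
    }
}

// Hypothetical stand-in for the GtkNotebook: remembers the current page.
class FakeNotebook
{
    public $current = 0;

    public function set_page($index)
    {
        $this->current = $index;
    }
}

class DirectoryUISketch
{
    public $window;
    public $notebook;

    // Page names mapped to zero-based notebook indices (assumed layout).
    public $pages = array(
        'login'         => 0,
        'search'        => 1,
        'search_result' => 2,
        'add_note'      => 3,
        'add_entry'     => 4,
    );

    public function __construct($window, $notebook)
    {
        $this->window = $window;
        $this->notebook = $notebook;
        $this->connectSignals();
    }

    // Connect every method whose name starts with "on_" to the Glade
    // handler of the same name, so no signal is listed by hand.
    public function connectSignals()
    {
        foreach (get_class_methods($this) as $method) {
            if (strpos($method, 'on_') === 0) {
                $this->window->signal_connect($method, array($this, $method));
            }
        }
    }

    // Select a notebook page by name; returns false for unknown names.
    public function selectPage($name)
    {
        if (!isset($this->pages[$name])) {
            return false;
        }
        $this->notebook->set_page($this->pages[$name]);
        return true;
    }

    public function on_logout_activate()
    {
        $this->selectPage('login');
    }

    public function on_search_activate()
    {
        $this->selectPage('search');
    }
}
```

In the application itself, $this->window would be the GladeXML object and $this->notebook the widget returned by get_widget("notebook1"). Inserting a tab midproject then only means renumbering the entries in $pages.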
The on_button_login_submit_clicked() function is pretty self-explanatory. We get handles for the two entry widgets to read the text they contain, then query the database to see if the user supplied the correct credentials. If so, we call on_search_activate(), which can also be reached from the menu or by pressing ALT-s. If the username or password was incorrect, we call on_login_activate(), which clears the entry fields and puts the cursor in the username field again. Note the following line in the on_search_activate() function:
```php
if (!$this->user_uid) return;
```
This ensures that the user cannot get to the search screen from the menu without logging in. We now have a working application—admittedly, it doesn’t do a whole lot, but I’m sure you are starting to get the idea of how the rest of it will hang together.
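The login check above compares the submitted password with the plain-text value in the User table. As mentioned when we created that table, a production system should store hashes instead. The sketch below assumes PHP 5.5 or later. password_hash() and password_verify() postdate this article, and today they are preferred over MD5 or MySQL’s PASSWORD(). The function names are hypothetical. The stored hash would be fetched by username alone and compared in PHP, rather than matched in the WHERE clause. A bcrypt hash is 60 characters long, so the varchar(30) password column would need widening; the PHP manual suggests 255.

```php
<?php
// Hash a password before storing it in the User table.
function hashUserPassword($plain)
{
    return password_hash($plain, PASSWORD_DEFAULT);
}

// Check a login attempt against the hash stored for that username.
// $storedHash is null when no row matched the username.
function checkUserPassword($plain, $storedHash)
{
    if ($storedHash === null) {
        return false;
    }
    return password_verify($plain, $storedHash);
}
```

Because each hash embeds its own salt, hashing the same password twice gives different strings. That is why the comparison must go through password_verify() rather than an equality test in SQL.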
The Search Screen
Let’s now look at the search screen. First of all, we want a select box to choose whether to search by first name, surname or company name. Next, we add a “search term” field with a submit button and a clist widget that will show our results. When we select a search type, we want the cursor to move to the search term box. Once we have results in the clist widget, we want to be able to click on a row to select it and move to the results detail page. On the second tab of the notebook, place a Vbox with three rows. In the first row, place an Hbox with two columns. In the first column, add a label widget and set its label to Type, then add a combo box in the second column. The combo box is actually a composite widget: a widget made up of multiple, simpler widgets, in this case a drop-down selection list and an entry field. If you click on the part of the widget that shows an arrow, you will select the drop-down selection list, while if you click on the white area you will select the entry widget. Call the entry widget entry_search_type and add the changed signal. Select the combo box; on the Widget tab of the Properties window, you will find a text box called Items where you can add the items that we want to appear in the combo box. Add ‘Firstname’, ‘Surname’ and ‘Company’ here. Later, we will be able to get hold of the combo’s entry box and simply read back what has been selected. One last thing: as we will be using the selected value as part of an SQL statement, we don’t want the user to be able to change what is in the entry field, so set its editable property to no. In the second row, add an Hbox with three columns and then add the following widgets:
• A label with the text Search
• A GtkEntry named entry_search_term
• A button named button_search_submit with the clicked signal enabled
In the third row, add an Hbox with one column and drop a clist named clist_search_result, with four columns, into it. A clist is a bit like a page from a Microsoft Excel spreadsheet, and it’s perfect for showing database results. When you add a clist to a window using Glade, it is automatically placed inside a scrolled window. If there is more data than fits in the space allocated to the list on screen, the widget will just scroll as needed. Select the scrolled window in the widget tree and set its HPolicy and VPolicy properties to automatic; this ensures that the scroll bars only show when needed. Our clist has four columns, but we are going to hide the first one. It will hold the contact_uid of the database row being shown, which the user doesn’t need to see. Set the titles of the remaining three columns to Firstname, Surname and Company. We also want to add the select_row signal, which means that our on_clist_search_result_select_row() function will be called when the user clicks on a row in the clist. Next, set the padding property to 5 on each of the three Hboxes. On the first two rows, set the expand property to no. Leave it as yes for the Hbox that contains the clist, since we want it to grow as the window is resized. Set the expand properties of the combo and the entry to no.
Now we want our label widgets to be the same size and their text to line up, but the text lengths differ. Select a label and look at the Common tab in the Properties window. You will find height and width properties there. These are not normally used, as the widget will take as much space as it needs to show its text. In our case, however, we have a more stringent layout requirement, so tick the checkbox for width and set the size to 35 pixels; this way, everything will line up better. We also want to left-align the text, which is done on the Widget tab of the Properties window. The text will only align the way you want it to if the wrap text property is set to yes, so set this as well for both labels. Finally, you’ll want to give them both a padding value of 10 and make the entry widget the same width as the combo box (176 pixels on my system). Phew! That was quite a bit of work; your window should now look like the search screen we saw earlier. From now on, I won’t be telling you when to set layout properties. You should be getting the hang of how to do it yourself by now, and a bit of experimenting is always a good thing.

Search Screen Code
Now that we have designed the search screen, we can write the code needed to run it. You can start by creating the two database tables shown in Listing 3. I have put a small amount of data in there as well to get you started. Next, add the code from Listing 4 to your directoryUI.class.php file. This function is called whenever the user selects an item, which changes the value of the entry widget associated with the combo box. It simply retrieves a handle to the search term entry, clears it and gives it the application’s focus. In the search function shown in Listing 5, we read the search type and the search term from their entry widgets and use them in the WHERE clause of the SQL query to select the rows that we want. We then clear the clist
widget to purge any previous results and append the rows from our query. You must make sure that the number of columns in your query result matches the column count of the corresponding clist, or you will get a fatal error. The contact_uid column is not something the user needs to see, but it is essential for us: when the user selects a row, it identifies the corresponding database record. Therefore, we use the GtkCList::set_column_visibility() function to hide it from view, adding the following two lines to the windowSetup() function:
```php
$clist = $this->window->get_widget("clist_search_result");
$clist->set_column_visibility(0, false);
```
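Listing 5 pastes both the combo value and the search term straight into the SQL. The non-editable entry keeps honest users from changing the column name, but a term containing a quote still breaks the query, and a crafted one can alter it. Below is a minimal sketch of a safer query builder. It assumes PEAR DB’s support for ? placeholders in $db->query($sql, $params), and the function name is hypothetical. Placeholders cannot stand in for identifiers, so the combo label is mapped to a column through a fixed list. The LIKE wildcards % and _ in the term are escaped so that they match literally.

```php
<?php
// Build the contact search for a "starts with" match on one column.
// $typeLabel is the combo box text; returns null for an unknown label.
function buildContactSearch($typeLabel, $term)
{
    // Combo labels mapped to the Contact columns they search.
    $columns = array(
        'Firstname' => 'firstname',
        'Surname'   => 'surname',
        'Company'   => 'company',
    );
    if (!isset($columns[$typeLabel])) {
        return null;
    }
    // Escape LIKE metacharacters (and the backslash escape character
    // itself) so the user's text matches literally, then append %.
    $pattern = addcslashes($term, '\\%_') . '%';
    $sql = 'SELECT contact_uid, firstname, surname, company'
         . ' FROM Directory.Contact'
         . ' WHERE ' . $columns[$typeLabel] . ' LIKE ?';
    return array($sql, array($pattern));
}
```

Inside on_button_search_submit_clicked(), the two returned values would then be passed on as $this->db->query($sql, $params), and an unknown label would simply skip the query.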
At this point, you should be able to start the application using the main.php script and search for contacts. Leaving the search term field empty will return all three of our contacts. In a larger application, you may want to prevent users from performing searches of this kind, as they can be quite intensive on your DBMS. You can play with the filtering system by entering different search terms to see what happens.

Search Results Screen
Take a look at the search results screen in the earlier
Listing 3
```sql
CREATE TABLE `Contact` (
  `contact_uid` int(11) NOT NULL auto_increment,
  `firstname` varchar(20) NOT NULL default '',
  `surname` varchar(20) NOT NULL default '',
  `company` varchar(30) NOT NULL default '',
  `job_title` varchar(20) default NULL,
  `department` varchar(20) default NULL,
  `telephone` varchar(11) default NULL,
  PRIMARY KEY (`contact_uid`)
) ENGINE=MyISAM AUTO_INCREMENT=7;

INSERT INTO `Contact` VALUES (1, 'Bart', 'simpson', 'n/a', 'operator', 'operation', '123');
INSERT INTO `Contact` VALUES (2, 'Marge', 'Simpson', 'n/a', NULL, NULL, NULL);
INSERT INTO `Contact` VALUES (3, 'Homer', 'Simpson', 'Springfield Power', 'Console Operator', 'Safety', '0123 34567');

CREATE TABLE `Note` (
  `note_uid` int(11) NOT NULL auto_increment,
  `contact_uid` int(11) default NULL,
  `user_uid` int(11) default NULL,
  `timestamp` timestamp NOT NULL default CURRENT_TIMESTAMP,
  `note` varchar(255) default NULL,
  PRIMARY KEY (`note_uid`)
) ENGINE=MyISAM AUTO_INCREMENT=11;

INSERT INTO `Note` VALUES (9, 3, 1, 20040828093447, 'Homer said he would not talk to me unless I sent him some donuts first.');
INSERT INTO `Note` VALUES (10, 3, 1, 20040828093528, 'Homers colleague said he could not speak to me as he was taking a nap in the broom cupboard');
```
Listing 4
```php
function on_entry_search_type_changed(){
    $search_entry = $this->window->get_widget("entry_search_term");
    $search_entry->set_text("");
    $search_entry->grab_focus();
}
```
screen shot. The top section is again built with a clist, while the bottom section is a GtkText widget. The text widget will serve our purpose here, but I do recommend that you read the documentation on it. Several parts of its implementation are reported as broken and it is deprecated in GTK 2, so use it with caution. I am not going to go into detail about how to achieve the layout. As I mentioned earlier, you should now have all the tools you need to come up with the same result (or even a better one) on your own, so I’ll leave that to you. However, here are a few hints that will hopefully be of assistance:
• Set the height of the clist to a fixed value to stop it from expanding as you resize the window.
• Set the clist column count to two, name it clist_search_result_details and set its Show Titles property to no.
• Name the button button_search_result_add_note, and add the clicked signal to it.
• Name the text box text_search_result_note.

Listing 5
```php
function on_button_search_submit_clicked(){
    $type_widget = $this->window->get_widget("entry_search_type");
    $type = $type_widget->get_text();
    $search_widget = $this->window->get_widget("entry_search_term");
    $search = $search_widget->get_text();
    $query = sprintf("
        SELECT contact_uid
              ,firstname
              ,surname
              ,company
        FROM Directory.Contact
        WHERE %s LIKE '%s%%'"
        ,strtolower($type)
        ,$search
    );
    $result = $this->db->query($query);
    $clist = $this->window->get_widget("clist_search_result");
    $clist->clear();
    while ($row = $result->fetchRow()) {
        $clist->append($row);
    }
}
```

The code for the on_clist_search_result_select_row() function is shown in Listing 6. First, we need to get the contact_uid from the row that the user selected. We store that value in an instance variable, $this->contact_uid, since we will need it in other functions later on in the program. You may remember that contact_uid is in the first column of clist_search_result, the column we hid in windowSetup(). To retrieve it, we start by getting a handle to the clist and use the following line of code to find the index of the selected row:
```php
$selected_row = $clist_search_result->selection[0];
```
Depending on how you use the clist, it is possible to have more than one row selected, so selection[0] gives us the index of the first and, in our case, only selected row. We then use the GtkCList::get_text(row, column) method to get the text from column 0 of the selected row, which contains the contact_uid of the selected person. With the contact ID in our possession, we fetch the selected person’s details from the database and display each field on a separate line of the details clist. We must clear the contents of that list widget first. Otherwise, the new information will be appended to the old each time we do a search, which is definitely not what we want. We then call our showNotes() function from Listing 7 to format and display the user notes, and select the search_result page. showNotes() itself should be pretty much self-explanatory. We select all notes for this contact in reverse chronological order and prefix each one with the date and time at which it was created and the name of the user who created it. The only two text-widget-specific functions that we use here are delete_text() and insert_text(). We call delete_text() with the arguments 0 and -1, causing the widget to delete all text from position 0 (the first character) to position -1, which stands for the end of the text. Different combinations of these values will allow you to manipulate the text in the widget in different ways; check the manual for more details. The insert_text() function, on the other hand, takes as its two arguments the text that should be shown and the position at which it should be inserted, in this case 0 to indicate the start of the widget. In order to select this page, you will need to add a search_result element to the $pages array. Next, we need to add the on_button_search_result_add_note_clicked() function shown in Listing 8 to our code. This simply deletes any previous text from the note widget and selects the add_note page. Don’t forget to add add_note to your $pages array, or it won’t work.

The Add Note Screen
The add note screen is very simple. It contains a text area named text_add_note and a button named button_add_note_submit with the clicked signal enabled. Once again, the layout should be trivial at this point. As a little hint, I set the height of the text widget to a fixed value. The code for the function shown in Listing 9 is equally straightforward. We insert the record into the database, call showNotes() and then selectPage("search_result") to go back to the search result screen, where our new note is now displayed. If you remember, earlier in our project we implemented the on_search_activate() function instead of calling the page directly. Because we connected the menu’s signal to on_search_activate(), we can now return to the search screen from the notes screen either from the menu or from the keyboard using ALT-s.
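The formatting done by showNotes() (Listing 7) can be pulled out into a plain function, which makes it testable without a running GTK window. Below is a minimal sketch with a hypothetical function name. It assumes the rows arrive as associative arrays with date, username and note keys, as produced by the query in Listing 7, and it uses a run of 40 hyphens as the separator.

```php
<?php
// Turn note rows (newest first) into the text shown in the GtkText
// widget: a "date username" line, the note, a blank line, a separator.
function formatNotes(array $rows)
{
    $notes = '';
    foreach ($rows as $row) {
        $notes .= sprintf("%s %s\n%s\n\n%s\n",
            $row['date'],
            $row['username'],
            $row['note'],
            str_repeat('-', 40));
    }
    return $notes;
}
```

showNotes() would then only clear the widget, run the query, collect the rows and call $text_area->insert_text(formatNotes($rows), 0).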
Listing 7
```php
function showNotes(){
    $text_area = $this->window->get_widget("text_search_result_note");
    //clear any old notes from the widget
    $text_area->delete_text(0,-1);
    //get notes
    $query = sprintf("
        SELECT u.username
              ,date_format(timestamp,'%%d-%%m-%%Y %%H:%%i:%%s') as date
              ,note
        FROM Directory.Note n
        LEFT JOIN Directory.User u USING(user_uid)
        WHERE contact_uid = %d
        ORDER BY note_uid desc"
        ,$this->contact_uid
    );
    $result = $this->db->query($query);
    $notes = "";
    while($row = $result->fetchRow(DB_FETCHMODE_ASSOC)){
        $notes .= sprintf("%s %s\n%s\n\n----------------------------------------\n"
            ,$row['date']
            ,$row['username']
            ,$row['note']
        );
    }
    $text_area->insert_text($notes,0);
}
```

Listing 8
```php
function on_button_search_result_add_note_clicked(){
    $text = $this->window->get_widget("text_add_note");
    $text->delete_text(0,-1);
    $this->selectPage("add_note");
}
```

Listing 9
```php
function on_button_add_note_submit_clicked(){
    $text_note = $this->window->get_widget("text_add_note");
    $text = $text_note->get_chars(0,-1);
    $query = sprintf("
        INSERT INTO Directory.Note (
             contact_uid
            ,user_uid
            ,note
        ) VALUES (
             %d
            ,%d
            ,'%s'
        )"
        ,$this->contact_uid
        ,$this->user_uid
        ,$text
    );
    $this->db->query($query);
    $this->showNotes();
    $this->selectPage("search_result");
}
```
Adding a Contact
Again, this is a pretty simple screen, both in layout and code implementation. You can see how the screen is laid out in Figure 9. I’m sure that, by now, you can work out how to make this layout work, so I’ll just run through the widget names and their signals. From top to bottom, the entry widgets are:
• entry_new_contact_firstname
• entry_new_contact_surname
• entry_new_contact_company
• entry_new_contact_title
• entry_new_contact_department
• entry_new_contact_telephone
The submit button is called button_new_contact_submit. Like every button we’ve dealt with in this article, it requires the clicked signal too. Add add_entry to the $pages array so that we can select the page and make it visible. Then add the two functions in Listing 10, which, at this point, should be easy for you to understand: they retrieve the information from the various widgets and store it in the database.
Conclusion
With a little practice, you could design and build the GUI part of our program in around an hour. Add a couple of hundred lines of code and you can see just how quickly you can build a real desktop application using PHP, GTK and Glade. Naturally, you can check out the wiki and mailing lists on http://gtk.php.net for more information and help on how PHP-GTK works, but nothing beats a lot of experimentation (which, incidentally, can also be lots of fun).
About the Author
Tony works for Ebuyer (UK) and is currently developing their warehouse and stock management systems using PHP-GTK. When he is not working, he enjoys playing guitars and golf. He and his wife Lee are expecting their first baby this month. He can be contacted at
[email protected]
To Discuss this article: http://forums.phparch.com/178
Listing 10
```php
function on_button_new_contact_submit_clicked(){
    $firstname  = $this->window->get_widget("entry_new_contact_firstname");
    $surname    = $this->window->get_widget("entry_new_contact_surname");
    $company    = $this->window->get_widget("entry_new_contact_company");
    $job_title  = $this->window->get_widget("entry_new_contact_title");
    $department = $this->window->get_widget("entry_new_contact_department");
    $telephone  = $this->window->get_widget("entry_new_contact_telephone");
    $query = sprintf("
        INSERT INTO Directory.Contact (
             firstname
            ,surname
            ,company
            ,department
            ,job_title
            ,telephone
        ) VALUES (
             '%s'
            ,'%s'
            ,'%s'
            ,'%s'
            ,'%s'
            ,'%s'
        )"
        ,$firstname->get_text()
        ,$surname->get_text()
        ,$company->get_text()
        ,$department->get_text()
        ,$job_title->get_text()
        ,$telephone->get_text()
    );
    $this->db->query($query);
    // clear the form, ready for the next contact
    $firstname->set_text("");
    $surname->set_text("");
    $company->set_text("");
    $department->set_text("");
    $job_title->set_text("");
    $telephone->set_text("");
    $this->on_new_contact_activate();
}

function on_new_contact_activate(){
    if(!$this->user_uid) return;
    $widget = $this->window->get_widget("entry_new_contact_firstname");
    $widget->grab_focus();
    $this->selectPage("add_entry");
}
```
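Listing 10 repeats the same get_text()/set_text() pair for each of the six entries, and, like Listing 9, it quotes the values by hand. A table-driven variant is sketched below. One array maps each Contact column to its entry widget, and the INSERT is built from that map with ? placeholders, again assuming PEAR DB’s placeholder support. $readText is a hypothetical callback standing in for $this->window->get_widget($name)->get_text(), which lets the sketch run without GTK. The function names are likewise illustrative.

```php
<?php
// Contact column => Glade entry widget holding its value.
function contactFieldMap()
{
    return array(
        'firstname'  => 'entry_new_contact_firstname',
        'surname'    => 'entry_new_contact_surname',
        'company'    => 'entry_new_contact_company',
        'job_title'  => 'entry_new_contact_title',
        'department' => 'entry_new_contact_department',
        'telephone'  => 'entry_new_contact_telephone',
    );
}

// Build the INSERT and its parameters; $readText receives a widget
// name and returns the text currently in that entry.
function buildContactInsert($readText)
{
    $columns = array();
    $params = array();
    foreach (contactFieldMap() as $column => $widgetName) {
        $columns[] = $column;
        $params[] = call_user_func($readText, $widgetName);
    }
    $placeholders = implode(', ', array_fill(0, count($columns), '?'));
    $sql = 'INSERT INTO Directory.Contact (' . implode(', ', $columns) . ')'
         . ' VALUES (' . $placeholders . ')';
    return array($sql, $params);
}
```

Clearing the form afterwards could loop over the same contactFieldMap() and call set_text("") on each widget, so adding a field means touching one array.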
Security Corner
File Uploads
by Chris Shiflett
Welcome to another edition of Security Corner. This month’s topic is file uploads, and I will focus on the mechanism you create to allow users to upload files to your application. Unlike typical form data, files are handled uniquely, and PHP uses the $_FILES array to provide you with all of the information you need. However, it isn’t very clear what information the client provides and what comes from PHP itself, so a security-conscious developer can have a difficult time determining what data to trust. This article takes a detailed look at file uploads. It begins with a brief discussion that walks you through the process, along with some example code that implements this feature. A close examination of the basic mechanics of file uploads follows. Finally, I discuss the security risks inherent in this activity, as well as some safeguards and best practices that you can implement in your applications.
In order to let users upload files, you must present them with a typical HTML form. However, because files are not sent in the same way that regular form data is, you must specify a particular encoding:
The enctype attribute is often left out, so you might not be familiar with it. An HTTP request that includes both regular form data and files has a special format, and this attribute tells the browser to use it. The form field for a file is actually very simple:
This is rendered in various ways by the different browsers. Traditionally, the interface includes a standard text field as well as a browse button, so that the user can either enter the path to the file manually or browse for it. In Safari, only the browse option is available. Regardless, the behavior from a developer’s perspective is the same, but you might want to be mindful of the differences in presentation, in case you have very specific instructions for the user. To better illustrate a file upload, I present you with an example HTML form that can be used to allow users to upload attachments to a Web-based e-mail application:
Please choose a file to upload:
You may want to include a special hidden variable named MAX_FILE_SIZE:

This informs the browser of the maximum size (in bytes) allowed by the form. Of course, as with any client-side restriction, this is easily defeated by an attacker, but including this information can act as a guide for your legitimate users. In this example, files up to 1kB in size are allowed, and this limit needs to be enforced in the server-side data filtering. For my demonstration, I omit this restriction. Note: the PHP directive upload_max_filesize also controls this behavior, and post_max_size can potentially restrict it as well; the smaller of the two determines the maximum size allowed.

Figure 1 shows how this form is rendered in my browser, Firefox 1.0 PR running on Fedora Core 2. When the form is submitted, the HTTP request is sent to upload.php. To demonstrate what information is made available to you, upload.php does the following:

To experiment with my form, I choose to upload September’s issue of php|architect, which is a file on my computer named phpa_09-2004.pdf. Upon selecting this file and submitting the form, I get the following response:
```
Array
(
    [attachment] => Array
        (
            [name] => phpa_09-2004.pdf
            [type] => application/pdf
            [tmp_name] => /tmp/phpz1A0zr
            [error] => 0
            [size] => 2704632
        )

)
```
This shows exactly what information PHP provides in the $_FILES superglobal array. However, what it doesn't show is what information can be trusted. At a cursory glance, I suspect that name is provided by the client, but I'm unsure about the rest of the information.

Multipart HTTP Request

In order to clarify things, it is necessary for me to examine the HTTP request, because this shows exactly what is sent by the client. Because the September issue of php|architect is a rather large file, I use something much smaller in this next demonstration. Using the same form, I can upload a file named attachment.txt with the following contents:

```
Security Corner: File Uploads
Chris Shiflett
php|architect
Oct 2004
```
When I upload this file, I see the following:

```
Array
(
    [attachment] => Array
        (
            [name] => attachment.txt
            [type] => text/plain
            [tmp_name] => /tmp/phpnXMOeW
            [error] => 0
            [size] => 68
        )

)
```
Figure 1

The HTTP request sent by my browser is as follows (some optional headers removed for brevity):

```
POST /upload.php HTTP/1.1
Host: example.org
Referer: http://example.org/attach.php
Content-Type: multipart/form-data; boundary=---------------------------11401160922046879112964562566
Content-Length: 298

-----------------------------11401160922046879112964562566
Content-Disposition: form-data; name="attachment"; filename="attachment.txt"
Content-Type: text/plain

Security Corner: File Uploads
Chris Shiflett
php|architect
Oct 2004

-----------------------------11401160922046879112964562566--
```
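As a cross-check on the request above, the following sketch reassembles the same body in PHP and measures it. It assumes the file was saved with Unix (LF) line endings, which is consistent with the 68-byte size PHP reported; build_multipart_body() is a hypothetical helper written for illustration, not a PHP function.

```php
<?php
// Sketch: rebuild the single-part multipart/form-data body shown above.
// build_multipart_body() is a hypothetical helper, not part of PHP.
function build_multipart_body(string $boundary, string $field, string $filename,
                              string $type, string $contents): string
{
    $crlf = "\r\n";
    return '--' . $boundary . $crlf                 // opening delimiter
        . 'Content-Disposition: form-data; name="' . $field
        . '"; filename="' . $filename . '"' . $crlf
        . 'Content-Type: ' . $type . $crlf
        . $crlf                                     // blank line ends the part headers
        . $contents . $crlf                         // CRLF precedes the next delimiter
        . '--' . $boundary . '--' . $crlf;          // closing delimiter
}

// The boundary from the request above (27 hyphens followed by digits).
$boundary = str_repeat('-', 27) . '11401160922046879112964562566';
// The file's contents, assuming LF line endings (68 bytes in total).
$contents = "Security Corner: File Uploads\nChris Shiflett\nphp|architect\nOct 2004\n";

$body = build_multipart_body($boundary, 'attachment', 'attachment.txt', 'text/plain', $contents);
echo 'Content-Length: ' . strlen($body) . "\n"; // prints "Content-Length: 298"
```

Note that each delimiter line carries two more hyphens than the boundary parameter in the Content-Type header, and the final delimiter also ends with an extra pair of hyphens.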
It is not necessary to understand the format of this request, but it should be easy to spot the contents of the file and its associated metadata. The name attribute is the name of the form field given in attach.php, and the filename attribute is the name of the local file on the user's computer. Based on this example request, it seems that only the name and type elements of $_FILES (taken from the filename attribute and the part's Content-Type header) are potentially dangerous. It's difficult for an attacker to do much damage by changing the name of the form field itself, primarily because your code references it by name (that is, changing the name will usually result in the code accessing a variable that does not exist; the only caveat is when the code loops through all form data).

In order to better appreciate what an attacker can accomplish, use the code in Listing 1 to perform your own tests. You will need to change the domain name as well as make sure the /upload.php script exists on your server. If you can figure out a way to alter the tmp_name or size, you will have discovered a dangerous opening for an attacker.

Practical Risks

Surprisingly, there are very few additional risks associated with file uploads, although it is important to remember the risks associated with any form processing: you have no assurance as to the format or size of anything sent in each request. Validating the format of a file depends entirely on your specific application, and binary files present additional challenges. Although I do not discuss these approaches here, anti-virus software, file signatures, and the like can be used to help prevent certain malicious file types. Although these are blacklist approaches and fundamentally flawed, they may be your only option.

The filename (attachment.txt in my example) is provided by the client, and it should be filtered before being used in any capacity. If your requirements allow it, you can ignore this information completely and choose your own name, which eliminates this particular risk entirely.

Theoretical Risks

There are two elements of $_FILES for which you will probably want to implement additional safeguards: tmp_name and size. While I have been unable to uncover a specific exploit in my research, there are best practices available that prevent the theoretical attacks in which the client manipulates this information. To be assured that the file named in tmp_name is actually the file that was uploaded with the form (and not an arbitrary file given by the user, such as /etc/passwd), you can use is_uploaded_file() as follows:
```php
<?php
if (is_uploaded_file($_FILES['attachment']['tmp_name'])) {
    /* $_FILES['attachment']['tmp_name'] is valid. */
}
?>
```
This is particularly important in situations where some or all of the contents of the uploaded file are displayed to the user (perhaps for verification). If you plan to simply move the temporary file to another location in the filesystem, PHP provides you with a function, move_uploaded_file(), that first checks whether the file given is an uploaded file:
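A minimal sketch of that approach follows. store_upload() is a hypothetical helper, and generating a random name with random_bytes() (available since PHP 7) simply applies the earlier advice to ignore the client-supplied filename; neither is taken from the original column.

```php
<?php
// Sketch: store an upload under a server-chosen name.
// store_upload() is a hypothetical helper; $destDir is supplied by the caller.
function store_upload(array $file, string $destDir): ?string
{
    $tmpName = $file['tmp_name'] ?? '';

    // Ignore the client-supplied filename entirely and generate our own.
    $newName = bin2hex(random_bytes(16));

    // move_uploaded_file() returns false (and moves nothing) unless $tmpName
    // is a file that PHP itself received through an HTTP POST upload.
    if (!move_uploaded_file($tmpName, $destDir . '/' . $newName)) {
        return null;
    }
    return $newName;
}

// Usage inside upload.php (the directory is hypothetical):
// $stored = store_upload($_FILES['attachment'] ?? [], '/var/uploads');
```

Because move_uploaded_file() performs the same check as is_uploaded_file(), handing it an arbitrary path such as /etc/passwd just returns false.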
If you want to be sure of the file's size, you can use a standard PHP filesystem function such as filesize() after verifying that the file is a valid uploaded file:
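One way to do that is sketched below, assuming filesize() as the filesystem function; upload_size() is a hypothetical helper, and the 1024-byte limit mirrors the 1kB example from earlier in this column.

```php
<?php
// Sketch: trust only a size measured on the server, and only for a real upload.
// upload_size() is a hypothetical helper.
function upload_size(array $file): ?int
{
    $tmpName = $file['tmp_name'] ?? '';
    if (!is_uploaded_file($tmpName)) {
        return null; // not a genuine upload, so don't inspect it at all
    }
    $size = filesize($tmpName);
    return $size === false ? null : $size;
}

// Usage: enforce the 1kB limit server-side rather than trusting
// the size element that arrives in $_FILES.
$size = upload_size($_FILES['attachment'] ?? []);
if ($size === null || $size > 1024) {
    echo "Rejecting upload\n";
}
```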
Until Next Time...

It might seem unnecessary to validate data when your research does not reveal a method by which the client can modify it. This approach of adding safeguards that seem superfluous is known in the industry as defense in depth, and I strongly recommend it. Theoretical attacks have been known to materialize into real attacks, and you'll be glad that you're prepared. If you happen to find a way to modify the tmp_name or size element of a file in $_FILES (with register_globals disabled and at least version 4.1.0 of PHP), please let me know.

You can now eliminate file uploads from your list of worries, and I hope that I've also been able to provide some clarity regarding the underlying mechanism. Until next month, be safe.
About the Author
Chris Shiflett is a frequent contributor to the PHP community and one of the leading security experts in the field. His solutions to security problems are often used as points of reference, and these solutions are showcased in his talks at conferences such as ApacheCon and the O’Reilly Open Source Convention, and in his articles in publications such as PHP Magazine and php|architect. “Security Corner,” his monthly column for php|architect, is the industry’s first and foremost PHP security column. Chris is the author of the HTTP Developer’s Handbook (Sams), a coauthor of the Zend PHP Certification Study Guide (Sams), and is currently writing PHP Security (O’Reilly). As a member of the Zend PHP Education Advisory Board, he is one of the authors of the Zend PHP Certification. He is also leading an effort to create a PHP community site at PHPCommunity.org. You can contact him at
[email protected] or visit his Web site at http://shiflett.org/.
Tips & Tricks
By John W. Holmes

Standard PHP Library (SPL) Functions

I'm always on the lookout for new features and code snippets from PHP5 that I can share with you. The SPL is an extension included in the core PHP5 distribution that will make you rethink how you use objects, arrays and loops. Unfortunately, there isn't a whole lot written on it yet. The extension (we'll get to what it does in just a second) is maintained by Marcus Boerger and, while there is some documentation in the manual, a more in-depth version can be found at:
www.php.net/~helly/php/ext/spl/

The SPL is a “collection of interfaces and classes that are meant to solve standard problems.” Its goal is to present a series of classes and interfaces that will promote a common coding standard and code reuse. With the current version of PHP5, the SPL mainly focuses on “iterators” and how you can use them with your objects. An iterator is “an object or routine for accessing items from a list, array or stream one at a time.” Using the iterator classes and interfaces of the SPL is going to allow you to create objects that behave like arrays. You’ll be able to define the methods for next(), current(), and so on to determine what’s returned as your object is iterated through. Your object can actually be iterating through the lines of a file, files in a directory, rows in a query result, data from a stream or anything else that you can access “one at a time” or loop through.

Why would you want to treat an object as an array, you ask? You always hear people preaching about
“object oriented” programming, and it seems like backpedaling to use arrays instead of the methods of your objects, right? Well, you're still using your methods, but indirectly. You can almost think of it as an abstraction layer. An abstraction layer for databases allows you to use a common set of methods and properties to access a variety of databases. Iterators are going to allow you to use a common set of array functions (foreach(), each(), next(), and so forth) to traverse anything your object wants to interface with.

If you've visited any PHP forums or the PHP-general mailing list, then you've probably noticed how people love to use file() to read the lines of a file into an array and then loop through the array and do whatever they need to. Listing 1 shows a short PHP script that uses this technique to simply echo each line of a file. The problem that you may encounter with this method, though, is that the entire file is loaded into memory as an array before it's iterated through, so the larger the file gets, the more of your system resources you use.

So, how could our new knowledge of the SPL and iterators be used in this situation? To solve the memory issue mentioned above, you'll want to create an iterator class that only reads and returns one line at a time as you loop through the array. This way, you only use enough memory for one line instead of the entire file. In order to do this, we're going to create our own class that implements the Iterator interface defined in the SPL. The Iterator interface requires us to have current(), key(), next(), rewind() and valid() methods in our class.

The code in Listing 2 shows how such a class would be created. Each of the required interface methods is defined, along with a __construct() method that is called when we instantiate an object from this class. We use the constructor to open the requested file.

Looking in detail at the FileLineIterator class, you'll see that it starts off by defining several private variables that keep track of the opened file for us. The __construct() method opens the file passed when new FileLineIterator is called. The current() method returns the text of the line we're currently reading, so long as the file pointer and valid flag are true. The key() method works along the same lines as current(), but returns the key of the line we're reading.

next() is where the meat of this simple class is. Each call to next() verifies that the file pointer is valid and then tries to read a line from the file using fgets(). If a line was successfully read, the line number is incremented and the valid flag is set to true to indicate that the current element is valid. The rewind() method resets the line number and then calls next() to ensure the first line is loaded from the file. Throughout the iteration of an object of this class, the valid() method is called to see whether the current element is valid. Once next() fails to retrieve a line from the file, the valid flag is set to false so that the iteration ends.

At the bottom of Listing 2 is an example of how this class is used. A $file object is instantiated from it and then looped through using foreach(). If you compare this part of the code to Listing 1, you'll see that there is very little change for the same result, except that now we have abstracted where the actual data is coming from.

Why is that abstraction important? Say, for instance, that you're searching the file for lines that match a certain criterion, such as length.
Since you want to ignore the other lines, you can just modify the next() method to test for length and, if your criterion is not met, retrieve the next element. Listing 3 shows an example of a modified next() method that does this. Line length is just one example; you could also use a regular expression to retrieve lines matching a pattern, among many other things. You could implement this as a different class from the one above (extending it so you only redefine the one method). Alternatively, you could make the exclusion of lines dependent upon an extra parameter passed when the object is created, such as $file = new FileLineIterator('test.txt', '/abc[0-9]{3}/');, where the second parameter is a regular expression that returned lines must match.

There are a few things to take note of as you examine this class and move on to create your own. The first is the sequence in which methods are called as the object is iterated through. To find out what this is, use your favourite debugger, or go back into each of the methods and add an echo statement to declare which method is being executed:

```php
function next() {
    echo 'next ';
    // ... the rest of next() as before ...
}
```
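For experimenting with that trace, here is a minimal sketch of a class in the spirit of the FileLineIterator described above, written for PHP 8 (PHP 8.1 and later emit deprecation notices if the return types on the Iterator methods are left off). It is not the code from Listings 2 and 3: the property names, the protected visibility (so that a subclass can reuse the properties), the trimming of line endings, and the MinLengthLineIterator subclass, which illustrates the length filter by redefining only next(), are all assumptions.

```php
<?php
// Sketch: iterate over a file one line at a time via the SPL Iterator interface.
// Property names and visibility are illustrative, not those of Listing 2.
class FileLineIterator implements Iterator
{
    protected $fp;                    // file handle, or false if fopen() failed
    protected int $lineNo = 0;        // number of the current line (first line is 1)
    protected ?string $line = null;   // text of the current line, line ending removed
    protected bool $isValid = false;  // true while $line holds a real line

    public function __construct(string $filename)
    {
        $this->fp = fopen($filename, 'r');
    }

    public function rewind(): void
    {
        $this->lineNo = 0;
        if ($this->fp) {
            rewind($this->fp);  // the global rewind() resets the file pointer
        }
        $this->next();          // load the first line so current() has data
    }

    public function next(): void
    {
        $this->isValid = false;
        if (!$this->fp) {
            return;
        }
        $line = fgets($this->fp);   // reads one line; false at end of file
        if ($line !== false) {
            $this->lineNo++;
            $this->line = rtrim($line, "\r\n");
            $this->isValid = true;
        }
    }

    public function valid(): bool
    {
        return $this->isValid;
    }

    public function current(): mixed
    {
        return $this->isValid ? $this->line : null;
    }

    public function key(): mixed
    {
        return $this->isValid ? $this->lineNo : null;
    }

    public function __destruct()
    {
        if ($this->fp) {
            fclose($this->fp);
        }
    }
}

// Hypothetical variant that skips short lines by redefining only next().
class MinLengthLineIterator extends FileLineIterator
{
    public function __construct(string $filename, private int $minLength)
    {
        parent::__construct($filename);
    }

    public function next(): void
    {
        do {
            parent::next();
        } while ($this->isValid && strlen($this->line) < $this->minLength);
    }
}

// Usage: only one line is held in memory at a time.
$path = tempnam(sys_get_temp_dir(), 'fli');
file_put_contents($path, "first line\nab\nthird line\n");

foreach (new FileLineIterator($path) as $n => $text) {
    echo "$n: $text\n";   // 1: first line, 2: ab, 3: third line
}
foreach (new MinLengthLineIterator($path, 5) as $n => $text) {
    echo "$n: $text\n";   // 1: first line, 3: third line
}
unlink($path);
```

Because rewind() calls $this->next(), the subclass's filter also applies to the first line, and the keys keep counting the skipped lines, so they remain true line numbers in the file.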
Now, run the example script again and take note of what happens. The sequence followed by the script to retrieve the first element is rewind -> next -> valid -> current -> key. Note that next is only in there because we manually call it within the rewind() method. If that weren't done, there would not be any line data to return once the sequence reached the current() method. Another thing to be aware of is that the key() method is called whether you request the key in the foreach() loop or not. Thus, even if you use foreach($file as $line), the key() method is still called.

One caveat before you get too excited over this: while $file in our example can be used as an array in foreach(), it's not an actual array. This means that not all array functions are going to work with $file, only those that deal with iterating over arrays. You cannot use array_search(), in_array(), and so on, for example. In fact, if you call var_dump() on $file after it's created, you'll see that it contains only the private declared variables. The array elements returned in foreach() only exist once the object is iterated over.

Finally, you may be wondering why the file is read in the next() method instead of current(). Although it doesn't happen in this example, the current() method can be called multiple times during an iteration. If we were to use fgets() within it, we'd advance the file pointer and cause subsequent calls to current() within the same iteration to return bad values.

This example barely scratches the surface of the features and functionality of the iterator classes and interfaces in the SPL. In doing my research for this column, I came
across an article by Harry Fuecks that has a great introduction to the SPL at www.sitepoint.com/print/php5standard-library. Harry gives his definition of the SPL and what it can do, along with some good code examples that help explain things further. A few examples from Marcus are also included in the PHP source in the ext/spl/examples/ directory, or you can view them in the online CVS at http://cvs.php.net/php-src/ext/spl/examples/. Marcus' examples show you how to use the SPL to access the file system, databases, and other items in a variety of ways.

The iteration classes and interfaces can be a powerful way for you to present objects to your programmers (or your own code) that have the familiar interaction characteristics of arrays but the flexible backend of object oriented programming. Take a look through your code and consider turning those common array iterations into SPL iterators.

Submit Your PHP5 Tips and Tricks

PHP5 has been out there for a while now. With all of the new features, there's got to be some new tips or tricks that you want to share with everyone. Send your submission to
[email protected] and if it’s published, you’ll get a free issue of php|architect.
About the Author
John Holmes is a Captain in the U.S. Army and a freelance PHP and MySQL programmer. He has been programming in PHP for over 4 years and loves every minute of it. He is currently serving as a Company Commander at Ft. Gordon, Georgia, where he is stationed with his wife and two sons.
Not Quite Random

A quick correction to a tip that was published last month. Chris Cowell wrote me to point out that my random line selector wouldn't be as random as it should be. Since the procedure used fseek() to go to a random character in the file, longer lines with more characters would have a better chance of being selected. With my implementation of this procedure, all of the file's lines had between 9 and 12 characters, so I don't see this as too big of a deal. It may be to you, though, if your lines have a wide variety of lengths. Thank you to a loyal reader for pointing this out so that we professionals can always have the best code.
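Assuming the tip returned the line containing the random offset, each line's chance of being picked is proportional to its length, so a 12-character line would come up about 1.3 times as often as a 9-character one. A different technique, reservoir sampling, gives every line an equal chance. The sketch below illustrates that alternative; it is not last month's code, and random_line() is a hypothetical name.

```php
<?php
// Sketch: pick one line uniformly at random using reservoir sampling.
// random_line() is a hypothetical helper, not the fseek()-based tip.
function random_line(string $filename): ?string
{
    $fp = is_readable($filename) ? fopen($filename, 'r') : false;
    if ($fp === false) {
        return null;
    }
    $chosen = null;
    $count = 0;
    while (($line = fgets($fp)) !== false) {
        $count++;
        // Keep the current line with probability 1/$count; after n lines,
        // each of them has ended up chosen with probability 1/n.
        if (mt_rand(1, $count) === 1) {
            $chosen = rtrim($line, "\r\n");
        }
    }
    fclose($fp);
    return $chosen;   // null for an empty file
}

echo random_line(__FILE__), "\n";   // a random line of this very script
```

The trade-off is speed: this reads the whole file on every call (while holding only one line in memory), whereas seeking to a random offset touches only a small part of the file.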
exit(0);
PHP and the Enterprise
by Andi Gutmans and Marco Tabini
Andi's Braindump

Although there are quite a few lighthouses of PHP usage in the enterprise, it still has not gained widespread adoption. CIOs tend to buy more into "mainstream" technologies, such as J2EE, even though they often neither use nor require the features that these platforms offer. I don't plan to do a side-by-side comparison with J2EE in this column, but very few J2EE users are actually using the full J2EE platform. The majority are using servlets and/or JSP and are not using "enterprise" features such as two-phase commits, messaging or Enterprise JavaBeans. In fact, the majority of these applications would not only do very well with PHP, but would probably do better, especially as far as development time and cost are concerned.

The importance of enterprise adoption is clear. As it grows, the interest of the whole software industry in PHP grows. Additional growth means bigger investments in the technology itself, more demand for PHP developers and architects, improvements in the tools around the language, and
better salaries. A few years ago, someone called PHP the Web's best kept secret. The reason was that, if you look at the Netcraft survey, PHP seems to be mainstream. However, the press coverage and recognition that PHP actually gets fall well short of what the Netcraft numbers would suggest. This doesn't mean that the situation isn't improving (and I think the release of PHP 5 is definitely going to improve PHP's standing in the world of development), but we still have a long way to go.

So how do we convince the software industry that PHP developers aren't mere "scripters," or that our platform can be used for "business-critical" applications? I think there isn't a single "catch-all" answer to this question. There are many things that have to fall into place together. For example, PHP developers inside large organizations should continue evangelizing the technology within their companies. I have seen many scenarios in which PHP came out of the woodwork of a non-critical application and caught the attention of management, who were impressed by the short development cycle and high performance, and were willing to hear more about it. Especially now that Linux has penetrated large companies and the psychological barrier of open source has been broken in many places, events like these should become more and more commonplace.

In addition, the need for PHP-related case studies is great. Many companies look at one another when making decisions about technology. If you manage to bring PHP into your large organization, try to share it with the world. If you can get permission to do so, there's nothing like letting the rest of the industry know about your experience: what problems you had, how you solved them, and what the solution's architecture looked like. There are definitely ways in which the backing of PHP by commercial companies can help, but I think that there's a huge amount of work the PHP community can do on its own. I have only mentioned a few ways, but there are probably many more. I think it would be great if we'd see more talk of PHP in the
leading software industry publications and I think we are not far from that point. Marco’s Perspective There’s something to be said for timeliness. Andi and I usually agree on a topic on which to write for our next exit(0) about a month before we have to turn in our articles—as good techies, we can write threethousand-words-long e-mails without thinking twice, but we fret whenever we have a deadline for an article because we’re afraid that we won’t make it, so a good month’s worth of writing time is a good thing to have. In the month that has passed since we decided that we’d be talking about the E-word, I’ve caught sight of a lot of discussion on the topic. For the most part, the pundits have been focusing on whether PHP is “enterprise ready.” My answer to that question is always another question: “what does that mean?” If the person treats me like a demented ape—because everybody should know what enterpriseready mean, shouldn’t they?—I know that my interlocutor has no idea of what he’s asking. What he wants to know, in fact, is whether there are enough companies that use PHP to make him feel safe that he’s making the right decision. (Incidentally, the funniest item I read while researching for this article is a posting on a Java forum where someone was asking his colleagues for “arguments against PHP.”) If enterprise-ready means whether PHP can provide the type of functionality, stability and performance that a large scale, business-critical application, then we all know that that’s the case—and that’s exactly where the problem lies: we know it, but making the “right” people know about it is the real challenge lies. If you look at the really well-known “PHP success stories” in the enterprise—like Yahoo!—they were most often brought upon by insiders who, like October 2004
●
PHP Architect
●
Andi suggests, championed the introduction of PHP in their organization by bringing its strengths to the attention of management.

The largest problem that I think we, as a community, have right now in bridging the gap between the needs of the enterprise and the PHP world is not a technical one. If anything, the problem is that there isn't enough documentation to easily make a business case for the adoption of our platform in a company. If you look at publications like Information Week, which is very popular among CIOs everywhere, you'll notice that the articles they publish are all about showing how companies around the globe have solved problems using tools offered by "enterprise-ready" software providers. Because many of these publications are controlled-circulation magazines (meaning that the publisher gives away the magazine for free to qualified readers), the articles that appear in them tend to tout the virtues of software whose developers are willing to pay for advertisements on their pages. This is not because the publisher is selling out, but because those companies already have an inroad to the editorial departments, and their well-paid marketing staff can provide plenty of ready-made case studies for publication. Nothing wrong with that; it's just the way this particular industry
works, but you can see how difficult it becomes to attract the attention of these publications when the strengths of PHP are competing against well-funded marketing departments with plenty of success stories of their own.

Given the current situation, our best bet is to bring the companies that use PHP out of the closet. Many companies I've dealt with don't feel that it's their job to evangelize PHP, and they are probably right. However, most companies also don't mind being on the receiving end of publicity, and case studies are always a great way to showcase their accomplishments to their board of directors, their investors and the world at large. I've heard people complain that corporations worry about the stigma associated with using open-source software, but that should be history by now.

Case studies and business-oriented information are pretty high up on my list of things that I would like to see more of. php|a doesn't necessarily lend itself to this kind of material; we are a technical publication and I intend to keep things this way for the foreseeable future. However, I think that even purely technical people still need as much ammo as possible to further the adoption of PHP, be it within their own companies or for their clients, and, therefore, a case study or two every now and then wouldn't be all that out of place. If you have a story about PHP adoption that you think would be worth publishing (be it on our pages or on our website), write to us at
[email protected]. I’ll make sure we pick up the discussion and see where it leads.
php|a