Zend Studio 3.0 is the official PHP IDE of php|cruise
We’ve got you covered, from port to sockets.
php | Cruise
Port Canaveral • Coco Cay • Nassau
March 1st - March 5th 2004
Signup deadline: Feb 15, 2004
ENJOY LEARNING PHP IN A FUN AND EXCITING ENVIRONMENT, AND SAVE A BUNDLE!

Features
Andrei Zmievski - Andrei's Regex Clinic, James Cox - XML for the Masses, Wez Furlong - Extending PHP, Stuart Herbert - Safe and Advanced Error Handling in PHP5, Peter James - mod_rewrite: From Zero to Hero, George Schlossnagle - Profiling PHP, Ilia Alshanetsky - Programming Web Services, John Coggeshall - Mastering PDFLib, Jason Sweat - Data Caching Techniques

Plus: Stream socket programming, debugging techniques, writing high-performance code, data mining, PHP 101, safe and advanced error handling in PHP5, programming smarty, and much, much more!

Visit us at www.phparch.com/cruise for more details.
                   php|Cruise      Traditional PHP Conference*
Conference Pass    $ 899.99**      $ 1,150.00
Hotel              Included        ($ 400.00)
Meals              Included***     ($ 200.00)
Totals:            $ 899.99        $ 1,750.00

You Save $ 850

* Based on average of two major PHP conferences
** Based on interior stateroom, double occupancy
*** Alcohol and carbonated beverages not included
TABLE OF CONTENTS

Departments
Editorial
What's New!
Product Review: SQLyog
Product Review: 2003 Quebec PHP Conference DVD, by Marco Tabini
Security Corner, by Chris Shiflett
Tips & Tricks, by John W. Holmes
exit(0); Why Can't We All Just Get Along?, by Marco Tabini

Features
Write SMS Applications With PHP and Gnokii, by Eric Persson
Offline Content Management with PHP-GTK, by Morgan Tocker
Writing PHP Extensions: Managing Arrays, by Wez Furlong
The Need For Speed: Optimizing your PHP Applications, by Ilia Alshanetsky
Profiling PHP Applications, by George Schlossnagle
Caching Techniques for the PHP Developer, by Bruno Pedro
February 2004
●
PHP Architect
●
www.phparch.com
You’ll never know what we’ll come up with next
Existing subscribers can upgrade to the Print edition and save! Login to your account for more details.
php|architect
Visit: http://www.phparch.com/print for more information or to subscribe online.
Your charge will appear under the name "Marco Tabini & Associates, Inc." Please allow up to 4 to 6 weeks for your subscription to be established and your first issue to be mailed to you. *US Pricing is approximate and for illustration purposes only.
Choose a Subscription type:
Canada/USA                                    $ 83.99 CAD   ($59.99 US*)
International Surface                         $111.99 CAD   ($79.99 US*)
International Air                             $125.99 CAD   ($89.99 US*)
Combo edition add-on (print + PDF edition)    $ 14.00 CAD   ($10.00 US)
Country: ___________________________________________ Payment type: VISA Mastercard
*By signing this order form, you agree that we will charge your account in Canadian dollars for the “CAD” amounts indicated above. Because of fluctuations in the exchange rates, the actual amount charged in your currency on your credit card statement may vary slightly. **Offer available only in conjunction with the purchase of a print subscription.
To subscribe via snail mail - please detach/copy this form, fill it out and mail to the address above or fax to +1-416-630-5057
EDITORIAL RANTS

Welcome to the February 2004 issue of php|architect. As I write this, I'm sitting in my office (about forty degrees Celsius warmer than outside and, therefore, a much better place to work in than the local park), suffering from an awful cold and sitting by a collection of (clean) tissues discreetly stashed on my desk, ready for use. As you can expect, I'm not particularly happy about either fact (make that three facts: the cold outside, the cold in my body, and the fact that I'm sitting in an office when I could really be somewhere else far away from anything that even remotely resembles a computer). Incidentally, with php|cruise coming at the beginning of March, I should hopefully be able to get rid of at least two problems, and I'm still working on finding a way to avoid computers during that trip.

But I ramble, a clear sign that the cold medicine is wearing off. Let me instead tell you something about this month's issue.

With the popularity that PHP enjoys nowadays comes the fact that it is used as the backbone of more and more high-traffic sites. A simple consequence of this is that an increasing number of developers are "hitting the wall" and finally feeling the limits of what the "let's just do it in PHP" approach can do. Building a website is always a high-wire balance of budgeting, respecting deadlines and writing the best code possible, but there's nothing quite as bad as finding out that the way you've done things is incapable of meeting the demands of your website. And, by the time you realize that you have a problem, it's usually too late to think about a solution short of calling your travel agent and inquiring about that non-extradition country you heard of.

Therefore, this month we dedicate a fair amount of room to the performance management of PHP applications. George Schlossnagle's article, based on an excerpt from his latest book, published by SAMS, talks about profiling, a concept that I have very rarely seen associated with PHP applications. Profiling takes the guesswork out of understanding where the bottlenecks in your application are, allowing you to focus on finding the best possible resolution.

The problem with profiling is that it only allows you to identify the problems and not solve them. Luckily, Ilia Alshanetsky and Bruno Pedro offer two other excellent articles on improving the performance of PHP without affecting the code itself (if you can, why not avoid the risk of introducing even more bugs?). While Ilia focuses on ways to make the PHP interpreter itself run faster, Bruno examines the topic of caching, both at the network and script level.

This month we also start a new column, Security Corner, written by Chris Shiflett. The number of security advisories, patches, break-ins and source-code thefts that we see reported in the media every day has

Continued on page 8...
php|architect
Volume III - Issue 2
February, 2004

Publisher
Marco Tabini

Editorial Team
Arbi Arzoumani
Peter MacIntyre
Eddie Peloke

Authors
Ilia Alshanetsky, Wez Furlong, John Holmes, Bruno Pedro, Eric Persson, George Schlossnagle, Chris Shiflett, Morgan Tocker

php|architect (ISSN 1709-7169) is published twelve times a year by Marco Tabini & Associates, Inc., P.O. Box 54526, 1771 Avenue Road, Toronto, ON M5M 4N5, Canada. Although all possible care has been placed in assuring the accuracy of the contents of this magazine, including all associated source code, listings and figures, the publisher assumes no responsibility with regard to the use of the information contained herein or in all associated material.
PHP 4.3.5 RC1
PHP.net has announced the release of PHP 4.3.5 RC1:
PHP 4.3.5RC1 has been released for testing. This is the first release candidate and should have a very low number of problems and/or bugs. Nevertheless, please download and test it as much as possible on real-life applications to uncover any remaining issues. The list of changes can be found in the NEWS file.
For more information visit: http://qa.php.net/

PHP Community Logo Contest
Following Chris Shiflett's recent announcement of the PHP Community Site, he is holding a contest to find a logo that embodies the spirit of the PHP community. Everyone is welcome to participate, and you can submit as many entries as you like. Please send all entries to [email protected] and include the name with which you want to be credited. The contest ends 29 Feb 2004, and php|architect is offering a free PDF subscription to the winner.
For updated news about the contest, as well as a chance to view the current entries, visit: http://www.phpcommunity.org/logos/
Good luck to all who enter!

ZEND Studio 3.0.2
Zend has announced the release of the Zend Studio 3.0.2 client. What's new? Zend.com lists some of the bug fixes as:
• ZDE didn't load when using a new keymap config from an older version.
• Save As Project didn't always work.
• Server Center activator tried to open the wrong URL.
• .js files were not opened with JavaScript highlighting.
• Shift-Delete and Shift-Backspace didn't work properly.
• Find&Replace was very slow under Linux.
• Add Comment sometimes erroneously commented out a line that wasn't selected.
• Added configurable limit for the number of displayed syntax errors.
There have also been improvements to the debugger, code completion, code analyzer, IE toolbar, and some Mac OSX changes.
Get more information from Zend.com.
NEW STUFF
MySQL Administrator
MySQL.org announces:
MySQL Administrator is a powerful new visual administration console that makes it significantly easier to administer your MySQL servers and gives you better visibility into how your databases are operating. MySQL Administrator integrates database management and maintenance into a single, seamless environment, with a clear and intuitive graphical user interface. Now you can easily perform all the command line operations visually, including configuring servers, administering users, dynamically monitoring database health, and more.
Get more information from: http://www.mysql.com/products/administrator/index.html
Looking for a new PHP Extension? Check out some of the latest offerings from PECL.

opendirectory 0.2.2
Open Directory is a directory service architecture whose programming interface provides a centralized way for applications and services to retrieve information stored in directories. The Open Directory architecture consists of the DirectoryServices daemon, which receives Open Directory client API calls and sends them to the appropriate Open Directory plug-in.

statgrab 0.1
libstatgrab is a library that provides a common interface for retrieving a variety of system statistics on a number of *NIX-like systems. This extension allows you to call the functions made available by the libstatgrab library.

Sasl 0.1.0
SASL is the Simple Authentication and Security Layer (as defined by RFC 2222). It provides a system for adding pluggable authentication support to connection-based protocols. The SASL Extension for PHP makes the Cyrus SASL library functions available to PHP. It aims to provide a 1-to-1 wrapper around the SASL library to provide the greatest amount of implementation flexibility. To that end, it is possible to build both a client-side and a server-side SASL implementation entirely in PHP.

SQLite 1.0.2
SQLite is a C library that implements an embeddable SQL database engine. Programs that link with the SQLite library can have SQL database access without running a separate RDBMS process. This extension allows you to access SQLite databases from within PHP. Windows binary available from: http://snaps.php.net/win32/PECL_STABLE/php_sqlite.dll

Check out some of the hottest new releases from PEAR.

DB 1.6.0 RC4
DB is a database abstraction layer providing:
• an OO-style query API
• a DSN (data source name) format for specifying database servers
• prepare/execute (bind) emulation for databases that don't support it natively
• a result object for each query response
• compatibility with PHP 4 and PHP 5
• much more...
DB layers itself on top of PHP's existing database extensions. The currently supported extensions are: dbase, fbsql, interbase, informix, msql, mssql, mysql, mysqli, oci8, odbc, pgsql, sqlite and sybase (DB style interfaces to LDAP servers and MS ADO (using COM) are also available from a separate package).

System_ProcWatch 0.4
With this package, you can monitor running processes based upon an XML configuration file, XML string, INI file or an array where you define patterns, conditions and actions.

Net_IMAP 0.7
Provides an implementation of the IMAP4Rev1 protocol using PEAR's Net_Socket and the optional Auth_SASL class.

XML_Beautifier 1.1
XML_Beautifier will add indentation and line breaks to your XML files, replace all entities, format your comments and make your document easier to read. You can influence the way your document is beautified with several options.
PHPWeather 2.2.1
PHP Weather announces the release of version 2.2.1. PHP Weather makes it easy to show the current weather on your webpage. All you need is a local airport that issues the special weather reports called METARs. The reports are updated once or twice an hour.
Get more information from: http://sourceforge.net/projects/phpweather/

PHPEclipse Debugger
PHP Eclipse adds PHP support to the Eclipse IDE Framework. This snapshot introduces the first version of the PHPEclipse debugger plugin.
For more information visit: http://www.phpeclipse.de

MySQL and Zend Working Together
From Zend and MySQL: these two have joined forces to strengthen open source Web development.
MySQL AB, developer of the world's most popular open source database, and Zend Technologies, designers of the PHP Web scripting engine, today announced a partnership to simplify and improve productivity in developing and deploying Web applications with open source technologies. Through the alliance, the companies are improving compatibility and integration between the MySQL database and Zend's PHP products to make it easier for businesses to use complete open source solutions, such as the popular LAMP (Linux, Apache, MySQL and PHP) software stack. As part of the partnership, MySQL AB and Zend are offering partner products to their respective customers, enabling easier product procurement and deployment for Web application infrastructures. The companies will also commit development resources to design product integration and compatibility modules for both vendors' platforms.
For more information visit: www.zend.com

SAXY 0.3
SAXY is a Simple API for XML (SAX) XML parser for PHP 4. It is lightweight, fast, and modeled on the methods of the Expat parser for compatibility. The primary goal of SAXY is to provide PHP developers with an alternative to Expat that is written purely in PHP. Since SAXY is not an extension, it should run on any Web hosting platform with PHP 4 and above installed. This release allows CDATASection tags to be preserved, rather than converted to Text Nodes.
For more information visit: http://www.engageinteractive.com/saxy/
php|a
Editorial: Continued from page 5

convinced us that, at the very least, one should be able to protect his sites from malicious usage, in the hope that all the other companies we rely on to maintain their software will do so in a serious way.

Finally, we bring you three more articles that, we hope, will tickle your fancy. The first one, written by Eric Persson, shows you how you can build an SMS gateway using PHP and a few other inexpensive components. SMS is not yet very popular here in North America, but, judging from the number of people I see glued to their cell phones whenever I visit my native Italy, it is very widely used in Europe.

In his article on offline news management, Morgan Tocker writes about how PHP-GTK, that most hidden of PHP gems, can be used to improve content management by providing a proper GUI application that doesn't require you to completely rewrite all your code.

Finally, last but not least, Wez Furlong picks up where his article from last month left off and delves into the deep bowels of the Zend Engine to show you how a PHP extension written in C can manipulate PHP arrays. It's not quite as easy as from a script... but close enough once you know what you're doing.

Well, that's it for this month. By the time I write my next editorial, I plan to be either boasting about my suntan or complaining about sunburn. Either way, you can expect me to report on our adventure on the high seas. Until then, happy reading!
Write SMS Applications With PHP and Gnokii
F E A T U R E
by Eric Persson
SMS, shorthand for Short Message Service, is the standard used by cellular phone networks worldwide to allow their customers to exchange small text messages using their handsets. Despite its limitations, SMS is very popular with cell phone users, and it has rapidly become a widely-used bridge between the Internet and mobile users.
Despite the fact that it sounds like some mysterious Italian pasta, Gnokii is really just a project that aims to develop tools and drivers for Nokia mobile phones, that is, software that makes it possible to control a Nokia phone physically connected to your server via a serial port. Gnokii works like the Nokia Data Suite, which is shipped with more advanced models from Nokia: you can use it to send SMS messages, edit contacts and so on, pretty much everything you normally do with your thumb on the phone's keypad.

Gnokii itself is composed of many tools, including a set of GUI applications that facilitate the remote operation of the telephone; we are really only interested in a small subset of these tools called smsd, or SMS daemon, which provides an interface for rapid access to the phone's SMS capabilities. With the SMS daemon up and running, we can use PHP to interact with the phone, send and receive SMS messages and, of course, build whatever logic we need based on the content of the messages that we receive and send.

In short, my goal with this article is to show you how to configure software and hardware so that you can get the same kind of service as you would normally obtain from a big company selling mobile services like SMS gateways, but at a fraction of the price.

Major Components of the Final Application
The final application that we will create throughout this article is a simple SMS server that awaits a message from a user and acts on its contents. It is made up of three major components:
• A Nokia cell phone, which must be connected properly to the server.
• The smsd application from the Gnokii package, which must, of course, be compiled and configured correctly.
• The PHP scripts that provide the actual server functionality.

The flow of the application will be as follows:
• The user sends an SMS message to the server.
• The smsd daemon picks it up and automatically puts it into its database.
• Our server scans smsd's database periodically for new messages.
• When a new message arrives, its contents are examined and the server acts on them, for example by replying to the user with another message.
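The flow described above can be sketched without a phone or a database. The following is only an illustration under stated assumptions: the inbox and outbox are plain PHP arrays standing in for smsd's tables, `process_inbox()` and the handler map are hypothetical names that do not appear in the article's listings, and the syntax targets a current PHP release rather than the PHP 4 used elsewhere in this article.

```php
<?php
// Hypothetical in-memory stand-in for one polling pass over smsd's inbox.
// $handlers maps a keyword to a callable that returns the reply text.
function process_inbox(array &$inbox, array $handlers): array
{
    $outbox = [];
    foreach ($inbox as &$sms) {
        if ($sms['processed']) {
            continue; // already handled on an earlier pass
        }
        // Treat the first word of the message as the keyword.
        $parts = preg_split('/\s+/', trim(strtolower($sms['text'])));
        $keyword = $parts[0];
        if (isset($handlers[$keyword])) {
            $reply = $handlers[$keyword]($sms['text'], $sms['number']);
        } else {
            $reply = 'Sorry, I did not understand that.'; // default action
        }
        $outbox[] = ['number' => $sms['number'], 'text' => $reply];
        $sms['processed'] = true; // mark the row so it is not handled twice
    }
    unset($sms); // break the reference left behind by foreach
    return $outbox;
}

$handlers = [
    'hello' => function (string $message, string $sender): string {
        return 'Hello to you my friend!';
    },
];

// Made-up phone numbers, for illustration only.
$inbox = [
    ['number' => '+10000000001', 'text' => 'Hello there', 'processed' => false],
    ['number' => '+10000000002', 'text' => 'Weather?', 'processed' => false],
];

$outbox = process_inbox($inbox, $handlers);
foreach ($outbox as $reply) {
    echo $reply['number'], ': ', $reply['text'], "\n";
}
```

In the real application the two arrays would be replaced by queries against the tables that smsd maintains, and the loop would run repeatedly with a short pause between passes.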
REQUIREMENTS
Hardware needed
When it comes to cellular communications, the bad thing about hardware is that it often costs a lot of money, but the goal of this project is precisely to provide a low-cost alternative, so the expenses associated with it should be quite reasonable. What you'll need in terms of hardware is a Nokia phone and a serial cable to hook it up to your server. I will, of course, expect that you already have a server and that it is capable of running the Gnokii tools and PHP.

In my environment, I have used a Nokia 3310, which is quite new but not very expensive, and works perfectly for my needs. There are no "official" connection cables available for the 3310, but a company from the UK called Cellsavers (http://www.cellsavers.co.uk) has come up with a very ingenious serial cable with a connector that you can fit behind the battery on the phone. For those who don't know, there are 4 metal pins that are probably used by Nokia to install software and perform other programming on the phone, and those nice folks at Cellsavers managed to figure out how to use them to control the phone through a serial port. There might be other companies supplying the same type of product, but I have not seen any around.

Another important note about the hardware is that you will need to get a battery charger for the phone. One often comes with the package, and you can plug it in and leave the phone on forever without having to worry about the batteries.

Installing Gnokii and smsd
Before starting to install Gnokii and smsd, make sure you have MySQL installed and working properly on your server. Installing Gnokii is quite straightforward: it involves little more than the usual configure-make-make install steps. However, there are some configuration options that I find important.

The first might be a matter of taste, but I like to place everything belonging to Gnokii in /usr/local/Gnokii. Therefore, I will use --prefix=/usr/local/Gnokii when invoking configure.

Next, the --without-x configuration switch indicates that we will not need to use the xgnokii GUI application to send SMS messages and manage the phone. If you want to take a look at the graphical tools, you can of course skip this parameter, but on a Unix server where you normally do not have X Windows installed you'll get a whole lot of errors if you do so.

The last parameter is --enable-security, which turns on a lot of security-related features in the package, like the ability to change the PIN number. I find them useful, so I usually turn them on.

The resulting configure line will be as follows:
./configure --prefix=/usr/local/Gnokii --without-x --enable-security
Listing 1
  1  <?php
  2  /*
  3
  4      Sms parsing utility for gnokii's smsd database.
  5      LISTING 1
  6  */
  7  error_reporting(E_ALL);
  8
  9  $CONFIG = array();
 10  $CONFIG['keywords_directory'] = './keywords/';
 11  $CONFIG['default_email'] = 'eric@persson.tm';
 12  $CONFIG['database_username'] = 'root';
 13  $CONFIG['database_password'] = '';
 14  $CONFIG['database_hostname'] = 'localhost';
 15  $CONFIG['database_database'] = 'sms';
 16
 17
 18  /*
 19      Error function, will handle an error in desired way.
 20      Maybe add some notification functionality to notify an admin?
 21  */
 22  function return_error($error=''){
 23
 24      echo date('Y-m-d H:i:s').': '.$error;
 25      exit();
 26  }
 27
 28
 29  /*
 30      Read through the keywords directory and gather filenames and keywords,
 31      that should be matched by a message.
 32  */
 33  function read_keywords(){
 34      global $CONFIG;
 35
 36
 37      $keywords = array();
 38
 39      $dh = opendir($CONFIG['keywords_directory']);
 40      if( $dh ){
 41          while( false !== ($filename = readdir($dh)) ){
 42              if( ereg('^([a-z0-9_]*)\.php$', $filename, $match) ){
 43                  $keywords[$match[1]] = $CONFIG['keywords_directory'].$filename;
 44                  echo date('Y-m-d H:i:s').': '.$match[1].chr(10);
 45              }
 46          }
 47
 48          if( sizeof($keywords)==0 )
 49              return_error('Keyword directory was empty.');
 50          else
 51              return $keywords;
 52
 53      }else{
 54          return_error('Keyword directory could not be opened.');
 55      }
 56
 57  }
 58
 59  /*
 60      This function will be executed if a message arrives that
 61      doesn't match any keyword.
 62  */
 63  function default_action($message, $sender){
 64      global $CONFIG;
 65
 66      #mail($CONFIG['default_email'], 'Unhandled sms', 'Message:'.$message."\n".'Sender:'.$sender);
 67      echo 'DEFAULT ACTION'.chr(10);
 68  }
 69
 70  /*
 71      This function takes the message and the sender as arguments
 72      and then performs an ereg match against each keyword.
 73  */
 74  function match_message($message, $sender){
 75      global $keywords;
 76
 77      $match_message = ereg_replace('([^a-z0-9]*)', '', strtolower($message));
 78      reset($keywords);
 79      while( list($keyword, $phpfile)=each($keywords) ){
 80          if( ereg($keyword, $match_message) ){ // line missing in this copy; reconstructed from the article's description
 81
 82              include_once($phpfile);
 83              if( function_exists($keyword) )
 84                  $keyword($message, $sender);
 85
 86              return true;
 87          }
 88      }
 89
 90      default_action($message, $sender);
 91      return false;
 92  }
 93
 94  /*
 95      A faster message match process, its not as nice to user mistakes as the one
 96      above, but its an option, if speed/processor time is important.
 97  */
 98  function match_message_fast($message, $sender){
 99      global $keywords;
100
101      if( strpos($message, ' ')>0 )
102          $match_part = substr($message, 0, strpos($message, ' '));
103      else
104          $match_part = $message;
105
106      $match_part = trim(strtolower($match_part));
107
108      if( isset($keywords[$match_part]) ){
109          include_once($keywords[$match_part]);
110          if( function_exists($match_part) )
111              $match_part($message, $sender);
112          return true;
113      }
114
115      default_action($message, $sender);
116      return false;
117
118  }
119
120  /*
121      Just for speed measurements.
122  */
123  function diff_microtime($microtime, $end_microtime){
124
125      $start_seconds = substr($microtime, strpos($microtime, ' '), strlen($microtime));
126      $start_fraction = substr($microtime, 0, strpos($microtime, ' '));
127      $end_seconds = substr($end_microtime, strpos($end_microtime, ' '), strlen($end_microtime));
128      $end_fraction = substr($end_microtime, 0, strpos($end_microtime, ' '));
129
130      return sprintf('%1.3fs', (($end_seconds-$start_seconds)+$end_fraction-$start_fraction));
131
132  }
133
134
135  /* Connect to the mysql database */
136  $connection = mysql_connect($CONFIG['database_hostname'], $CONFIG['database_username'], $CONFIG['database_password']);
137
138  /* Check if the connection was successful, otherwise return an error. */
139  if( !$connection )
140      return_error('Could not connect to mysql at '.$CONFIG['database_hostname']);
141
142  /* Select the database that contains the smsd tables */
143  mysql_select_db($CONFIG['database_database'], $connection) or return_error('Could not select database');
144
145
146  /* Call the read_keywords() function to get all keywords currently available. */
147  $keywords = read_keywords();
148
149  /* Reverse-sort the keywords by the array's key element. */
150  krsort($keywords);
151
152  /*
153      Application main loop.
154      We will loop a fixed number of times and then finish this script, to prevent memory clogging.
155  */
156  for( $i=0; $i<10000000; $i++ ){
157      $microtime = microtime(); // speed measurements
Once you've downloaded the Gnokii tarball from http://www.Gnokii.org (the latest version at the time of this writing is Gnokii-0.5.5), you can decompress it and start the compilation process:

# gzip -dc Gnokii-0.5.5.tar.gz | tar -xof -
# cd Gnokii-0.5.5
# ./configure --prefix=/usr/local/Gnokii --without-x --enable-security
# make
# make install
Listing 1: Continued
158
159      $res = mysql_query('SELECT * FROM inbox WHERE processed=0 ORDER BY insertdate ASC');
160      while( $sms = mysql_fetch_array($res) ){
161
162          /* Decide which matching function you wish to use. */
163          match_message($sms['text'], $sms['number']);
164          //match_message_fast($sms['text'], $sms['number']);
165
166          mysql_query('UPDATE inbox SET processed=1 WHERE id='.$sms['id']);
167      }
168
169      /* Enable this for some performance statistics.
170      $end_microtime = microtime();
171      echo "time:".diff_microtime($microtime, $end_microtime)."\n";
172      */
173
174      sleep(1);
175  }
176
177  ?>

If you don't encounter any major problems during the building process, you will end up having your copy of Gnokii installed in /usr/local/Gnokii. Before we test the phone interface, however, we still need to create the /etc/gnokiirc file, which holds some important configuration options, like information on where the phone is connected and which model it is. My /etc/gnokiirc file looks like this:

[global]
port = /dev/ttyS1
model = 3310
initlength = default
connection = serial
bindir = /usr/local/Gnokii/sbin/

Make sure that you have connected your phone to the serial port that you specified in the configuration. Also, check the model of your phone and enter it accordingly. The initlength variable controls the number of characters sent to the phone during initialization; you don't normally want to change this setting. Unless you have problems with the connection, I suggest that you use the default value (at least initially). The connection variable should be set to serial, since we'll be connecting to the phone using the serial port. In case you're wondering, it's possible to configure it to use an infrared connection instead.

Now, it's time to test it all and see if everything works fine. A good starting point here is to try and send out an SMS message using Gnokii:

# cd /usr/local/Gnokii/
# bin/Gnokii --sendsms "+xxxxxxxxxx"
Gnokii Version 0.5.5
Please enter SMS text. End your input with <cr><control-D>:
Success Regards, Eric
Send succeeded!
Clearly, you will need to replace the xxxxxxxxxx above with a real, working phone number that you can check for the message to arrive (you could, in fact, use the same number as the cell phone you're using to send the message). If you don't receive the message, or if you get an error, you may want to step back and look at the configuration and build procedure once again, just to make sure that you haven't missed anything.

The next step consists of configuring smsd so that we can send messages out onto the network programmatically. It's obviously important to have Gnokii working first, since smsd relies on the same runtime configuration libraries. The smsd source code is located in the /smsd/ folder under the directory where you unpacked the Gnokii tarball.

Smsd can work either with a database or with a filesystem but, for the purposes of this article, we will only focus on configuring it to use MySQL. The daemon is not compiled by default when you compile Gnokii, so that will have to be our next step. You will need to manually edit the Makefile and change every instance of the path to the MySQL installation in the DB Modules section. Next, you can build the executables:

# make
# make libmysql.so
# make install
Listing 2
<?php
/*
    Example keyword file
    LISTING 2
*/

function hello($message, $sender){

    $return_message = 'Hello to you my friend!';
    /* Output a log message */
    echo date('Y-m-d H:i:s').': Answered message "'.$message.'" from "'.$sender.'" with "'.$return_message.'"'.chr(10);

    /* Send the reply to the sender; escape the values before building the query */
    mysql_query('INSERT INTO outbox SET number="'.mysql_real_escape_string($sender).'", text="'.mysql_real_escape_string($return_message).'", processed_date="0", insertdate=now(), error=0, dreport=0');

}

Setting Up the smsd Database
Since we want to use smsd with MySQL, we need to create a database for it to use. For simplicity's sake, we'll call it sms and grant a new MySQL user with login sms and password sms access to it. Naturally, if you move into a production environment where security is a concern, you may want to use a more secure username/password combination. Keep in mind that anyone who can access your sms database can insert rows into the outbox and therefore send messages from the connected phone. On a larger system, the possibility for abuse is certainly there, and therefore security is worth at least some consideration.

In the smsd directory of your tarball, you will also find a SQL file called sms.tables.mysql.sql that contains the table definitions needed to run the daemon. All you need to do is import these into your database and you are all set to go. There is also a file for those who prefer PostgreSQL, but we will focus on MySQL here.

Installing daemontools
The daemontools package is a collection of tools that can be used to monitor and manage UNIX-based services. Its installation procedure is quite straightforward, since there aren't too many options or configuration directives. The only thing to keep in mind is that some differences in newer versions of glibc (2.3.1 and above) may require you to patch the daemontools source before you try to compile it.

The patch you need is called the "errno-patch" and fixes an incompatible declaration of the errno variable made in the source. I've seen some people claim that this problem is caused by bad programming practices, but the error really only started popping up when changes were made to glibc, so I'm not too sure as to how true that is. Whatever the real reason, if you encounter this problem, simply patch the source and you'll be just fine. If you need to download the patch, you can get it from http://www.qmail.org/moni.csi.hu/pub/glibc-2.3.1/. Then, follow the daemontools installation instructions, which you can find at http://cr.yp.to/daemontools/install.html.
If you're not familiar with patching software, this is done by downloading the software, extracting it, and then using the patch program to effect the actual changes in the source code. More information about the errno patching process and daemontools can be found at http://www.qmail.org/moni.csi.hu/pub/glibc-2.3.1/INSTRUCTIONS but, generally speaking, you can get away with something like this:

# tar zxvf daemontools-0.76.tar.gz
# cd admin/daemontools-0.76
# patch -p1 < /path/to/daemontools-0.76.errno.patch
On to Some PHP

Our main PHP script will be as small and efficient as possible, since it will be running as a daemon on our server all the time. Its main task will be to check if there are new messages in the SMS inbox table and, if so,
match them against the possible keywords that we have created, so that the appropriate action can be taken. We'll call this script smsparse.php.

First of all, let's decide how we're going to structure our application. Since our main goal is to respond to certain keywords, we'll start by creating a few "keyword scripts", which are really nothing more than standalone PHP files stored in a subdirectory called keywords. For example, if we wanted to define a keyword called hello, our directory structure would look like this:

./keywords/
./keywords/hello.php
./smsparse.php
As you can see, each keyword has its own PHP file. We simply use the keyword as the filename for the script that contains the actions associated with it in order to simplify the entire process.

Let's now have a look at smsparse.php, which you can see in Listing 1. At the beginning of the script (in the read_keywords function), we read through the contents of the keyword directory. Each file found in the directory is matched against the ereg() pattern on line 41 and, if that operation is successful, a new item with the keyword as a key and the file's path as the value is added to the array that the function returns at the end of its execution.

As you can see, on line 148 we sort the array in a descending fashion based on the length of each key. We do this so that longer keywords are checked for first and we don't end up in a situation where a word like "eat" is matched instead of "Seattle" because it is shorter.

The main portion of the application works by executing a loop indefinitely. At every iteration, we check if a new message has arrived in the inbox and, if that is the case, match the message against the active keywords that we have identified at the beginning of the script, and finally sleep for 1 second before the next cycle.

For the actual matching process, I have written two alternatives that use different approaches. The first one, match_message(), is the most fault tolerant but also the slowest one. The second one, match_message_fast(), is not as tolerant but will save some CPU resources by using a faster algorithm. The difference probably won't be dramatic, but on a heavily loaded server or with a large list of keywords it may well have an impact on the overall performance of the system.

In match_message() (lines 74-92), the message is first cleaned of unwanted characters such as non-alphanumeric values and spaces, and converted to lowercase. Next, the function cycles through all the keywords and performs an ereg() match against the "clean" version of the message. If a match occurs, the PHP file corresponding to the keyword is included and executed. The match_message_fast() function, on the other hand, works by taking the first word in the message and converting it to lowercase. The word is then used to perform a search in the keyword array and, if a match is found, the appropriate PHP file is included and executed.

Writing Keyword Scripts

Since keyword scripts are an idea I came up with specifically for this article, it's probably a good idea to discuss them a little. Essentially, a keyword script simply contains code that determines what happens when a keyword is matched. To make it possible for multiple scripts to coexist, the actual functionality is stored in a function that has the same name as the keyword that corresponds to a particular script.

Let's assume, for example, that we want to match the word "hello" at the beginning of a message and reply with an SMS of our own. In this case, we'd have to write a PHP script, called hello.php, similar to the one shown in Listing 2. As you can see, the file contains a function, called hello(), that accepts the incoming message and the sender's phone number as arguments.

Sending a reply to the sender through SMS is a simple process: all we need to do is add a row to the outbox table of the sms database. The SMS daemon will periodically poll the database for new outgoing messages and send them automatically.

[Figure 1: process list output]
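The keyword handling described above (longest keywords first, a first-word lookup, and a function named after its keyword) can be sketched in a few lines. This is only an illustration under stated assumptions: every function name below is hypothetical, the hello() function returns its reply text instead of inserting a row into the outbox table, and preg_split() is used in place of ereg(), which later PHP versions removed.

```php
<?php
// Hypothetical names throughout; this is not the article's Listing 1 or 2,
// and nothing here touches the sms database.

/** A keyword function: same name as its keyword, returns the reply text. */
function hello(string $message, string $sender): string
{
    return "Hello to you too, $sender!";
}

/** Sort a keyword => script-path map so that longer keywords come first. */
function sort_keywords_by_length(array $keywords): array
{
    uksort($keywords, function ($a, $b) {
        // Descending by length; alphabetical order breaks ties.
        $diff = strlen($b) - strlen($a);
        return $diff !== 0 ? $diff : strcmp($a, $b);
    });
    return $keywords;
}

/** Return the keyword matched by the first word of $message, or null. */
function match_first_word(string $message, array $keywords): ?string
{
    $words = preg_split('/\s+/', trim($message), -1, PREG_SPLIT_NO_EMPTY);
    if (!$words) {
        return null;
    }
    $first = strtolower($words[0]);
    return array_key_exists($first, $keywords) ? $first : null;
}

/** Call the function named after the matched keyword, if there is one. */
function dispatch(string $message, string $sender, array $keywords): ?string
{
    $keyword = match_first_word($message, $keywords);
    if ($keyword === null || !function_exists($keyword)) {
        return null;
    }
    return $keyword($message, $sender); // variable function call
}

$keywords = sort_keywords_by_length([
    'eat'     => './keywords/eat.php',
    'hello'   => './keywords/hello.php',
    'seattle' => './keywords/seattle.php',
]);
```

In a real keyword script the reply would be written to the outbox table rather than returned; returning a string simply keeps the sketch free of database code.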
Your Own PHP Daemon: Using daemontools

The last step in our quest consists of setting up our PHP script to run as a daemon. You could, in theory, simply run the script and detach it from the console, but if you're running a proper server, a more robust configuration is required, and this is where the daemontools package comes into play. The configuration of daemontools is a bit complicated compared to the other packages we have seen in this article because it involves a relatively large number of files and directories. However, once one realizes that there is method to the madness, it's not quite so bad. Given the amount of space allotted for this article, I will leave it up to you to get daemontools up and running; the documentation is very clear and there are plenty of resources for this purpose on the Net.

When daemontools is installed it creates a directory called /service. This will contain information on all the various services that daemontools is running; a program called svscan watches the /service directory and starts a supervise process for each service, and supervise takes care of starting the service and keeping it running as needed. Compared to "normal daemons", which are started at boot time, daemontools services are started by supervise and, if any of them is killed or dies unexpectedly, supervise itself takes care of restarting them automatically. Therefore, daemontools is an excellent solution if you want your services to be running all the time and be monitored for failures of any kind. However, not all services are suitable to run with this package: they have to behave in a certain manner that makes it possible for supervise to interact with them in an automated fashion. Luckily, most applications can be modified so that they are compatible with supervise, and our smsparse script is no exception.

First of all, we must ensure that the script can be run without having to explicitly invoke the PHP interpreter. Under a UNIX shell, this is done by introducing a "shebang", that is, a special line at the beginning of the file that tells the shell which interpreter the script should be run through in order for it to be executed. Let's start by figuring out where PHP is installed:

# whereis php

On my machine, a RedHat 8 server, the output shows that the PHP interpreter's binary is installed in /usr/local/bin/php.

It's now time to create a service directory for our service. We'll start by creating a "service" directory for smsparse in the /usr/local/smsparse/ directory, where I will assume that you have stored the smsparse.php script and its underlying directory structure with all the keyword scripts. We will call the directory supervise-smsparse:

# mkdir -p /usr/local/smsparse/supervise-smsparse/

We'll use the service directory to store all the information that supervise needs to run smsparse correctly. Next, we will focus on getting smsparse running. In the /usr/local/smsparse/supervise-smsparse/ directory, create an executable text file called run that contains the following two lines of shell commands:

#!/bin/sh
exec /usr/local/bin/php -q /usr/local/smsparse/smsparse.php

That's it! If we now create a symlink from the /service directory to our newly created folder, supervise will automatically take care of starting and monitoring our server:

# ln -s /usr/local/smsparse/supervise-smsparse/ /service/supervise-smsparse/

We will do the same for smsd, so that we can have supervise monitor the Gnokii daemon process as well. As described earlier, we installed Gnokii in /usr/local/Gnokii/ and, therefore, the smsd binary will reside in /usr/local/Gnokii/bin/smsd. As with our server, we will create a subdirectory to house the execution files for smsd:

# mkdir -p /usr/local/Gnokii/supervise-smsd/

Next, we'll write a new run file:

#!/bin/sh
exec /usr/local/Gnokii/bin/smsd -u sms -p sms -d sms -m mysql

Finally, to start the smsd run file, we link the supervise-smsd directory into /service with:

# ln -s /usr/local/Gnokii/supervise-smsd/ /service/supervise-smsd/

If you now check your process list, you should see your smsparse and smsd processes listed (that is, if you have done everything right):

# ps axf
Keeping Tabs on the Situation

Now that we have turned our smsparse script into a true daemon, we could use some logging capabilities so that we can diagnose any problems properly should anything go wrong. As part of the daemontools package, you will
find a small program, called multilog, that is capable of logging the output of a service directly to a set of automatically-rotated logfiles. This means that, if we set up our service settings properly, we won't even need to write any special code for the purpose of creating activity logs! To enable the logging functionality, start out by creating a log directory in supervise-smsparse:

# mkdir -p /usr/local/smsparse/supervise-smsparse/log
The logging process acts much like a normal process running under supervise. It needs its own directory and run file; therefore, we need to create a special run file at /usr/local/smsparse/supervise-smsparse/log/run that contains the following commands:

#!/bin/sh
exec multilog t ./main
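Because multilog simply records whatever the service writes to standard output, the logging side of the PHP script can stay very small. What follows is a minimal sketch, assuming a helper named log_line() (a hypothetical name, not taken from Listing 1) that prefixes each message with a date('Y-m-d H:i:s') stamp:

```php
<?php
// Hypothetical helper; log_line() is an illustrative name, not the
// article's own code.

/** Build one log line prefixed with a local "Y-m-d H:i:s" timestamp. */
function log_line(string $message, ?int $now = null): string
{
    $stamp = date('Y-m-d H:i:s', $now ?? time());
    return $stamp . ': ' . $message . "\n";
}

// Under supervise, standard output is handed to the log/run process.
echo log_line('Starting sms parser...');
```

If multilog's own timestamp is enough, the date prefix can be dropped and the message written as-is.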
Multilog supports a wide range of arguments, which, in turn, make it possible to create very complex logging rules. Our command line above, however, is quite simple and really just means "add a timestamp on each line, and store the logfiles in ./main". The timestamp added by the t argument represents the number of Temps Atomique International (TAI) seconds since 1970-01-01 00:00:10 TAI. As you might remember from Listing 1, we prepend a date('Y-m-d H:i:s') string to each line of output and, therefore, we will actually have double timestamps in the log file (naturally, you can modify the script to omit its timestamp, or change the multilog invocation to do the same).

We don't need to link the log directory directly from /service. The supervise program will execute the run file it contains automatically for us. However, you must restart supervise to make it aware of the new log directory. You can, once again, use the svc program to send a TERM signal to the service:

# svc -t /service/supervise-smsparse/

A new look at the process list (see Figure 1) will show you that smsparse has been started again, together with the logging process. This means that our services are now managed by supervise and will run indefinitely, all the while providing us with a nice logfile, which we can monitor by using the tail utility:

# tail /service/supervise-smsparse/log/main/current
@400000003ffc1dfc22cb14e4 2004-01-07 15:54:05: Starting sms parser...
@400000003ffc1dfc22cb2484 2004-01-07 15:54:05: hello
@400000003ffc1dfc22cb2c54 2004-01-07 15:54:05: success

This example shows that smsparse was started correctly, and that 2 keywords were found, hello and success. As you can see, the TAI timestamp at the beginning of each line is a bit cryptic, but it can be translated into a human-readable form by piping the tail output through tai64nlocal like this:

# tail /service/supervise-smsparse/log/main/current | \
tai64nlocal
2004-01-07 15:57:27.380601500 2004-01-07 15:55:46: Starting sms parser...
2004-01-07 15:57:27.380605500 2004-01-07 15:55:46: hello
2004-01-07 15:57:27.380607500 2004-01-07 15:55:46: success

Conclusion

The easiest way to test your new Gnokii setup is to grab another cell phone and send an SMS message containing the word "Hello" to your Gnokii phone. If all goes well, smsparse will pick it up and reply back with the message we entered in the hello keyword script.

As you have probably realized by now, it's not that hard to set up a mobile service through which you can exchange information with your users by utilizing SMS. Even if you're not in the business of running SMS gateways, you could use it for a variety of other activities. For example, you can use it to provide "fun" services, like interactive voting, or a useful server monitoring interface for your internal network. The list of possibilities is very long, and my clients have shown great interest in using SMS as a complement to other services.

If you're worried about scalability, this solution may not be for you, as it will have trouble handling a very large number of messages on a daily basis. However, it is so inexpensive that it could well be a good starting point for a more serious implementation. The good news is that you'll be able to stay with Gnokii even if your needs grow, as newer versions of the package are slated to support multiple phones.
About the Author
When Eric's not out skiing or hiking, he's working as a freelance developer on various projects. His current focus is finishing his education in open-air alpine environments.
To Discuss this article: http://forums.phparch.com/126
Offline Content Management with PHP-GTK
F E A T U R E
by Morgan Tocker
Over the years, I have had the opportunity to work on a few content management systems for websites of varying complexity. While each CMS is a little different from the others, I can’t help but think that sometimes I find myself performing the same hacks and workarounds over and over just to get around the limitations of HTML. The desired output of the majority of our PHP work must be web based—but management of the content doesn’t have to be.
Welcome to the world of PHP-GTK. Why introduce GTK to a largely web-based language? Well, convenience and portability come to mind, for example. Sometimes it's not feasible to write a Java Swing interface when you've invested so much time in your PHP classes, as you need to rewrite large portions of code. While it could be done, you'd have to fork your code into two projects, and use two different languages. That's not something you can easily convince many clients to do.

Content management, a very common task for most websites these days, represents a typical example of an activity that is often performed directly through the web but that could really be best served by a "true" GUI-based client application. In most circumstances, creating a separate application is an expensive proposition, due to the duplication of code involved, the additional expertise needed and the difficulty of using a language that will run properly on a wide variety of platforms.

In this article, we'll tackle porting an existing HTML-based news manager to PHP-GTK, and you'll see how easy it is to make the jump from Web to GUI with this powerful, if often neglected, platform. In creating our project, we'll start with a data abstraction layer and a traditional HTML interface that we'll ditch later on. This article gets a little complex, so as a prerequisite please install PHP-GTK, and create a table in MySQL with the schema shown in Listing 1. An SQL dump with a few sample rows of data can be found in the files for this article; it's always great to have some sample data to work with.
The Data Abstraction Layer

As a general rule, I create a data abstraction layer for every complex project I work on. Some people swear by this approach, others swear at it. My personal praise goes to abstraction layers because I can do things like automatically change the modified date of a record without remembering to do it in each instance of SQL code. An abstraction layer can also validate data and check the credentials of the person trying to perform changes in a multi-user situation. Consider the code in Listing 2, which represents a simple data abstraction for a news item. Once you have this example up and running, you can test creating a row in the database with the code from Listing 3. As you can see, once the abstraction layer is established, we don't even have to worry about embedding SQL statements in our code.

Listing 1

+----------+--------------+------+-----+---------+----------------+
| Field    | Type         | Null | Key | Default | Extra          |
+----------+--------------+------+-----+---------+----------------+
| id       | int(11)      |      | PRI | NULL    | auto_increment |
| author   | varchar(64)  |      | MUL |         |                |
| story    | text         |      |     |         |                |
| created  | int(10)      | YES  |     | NULL    |                |
| modified | int(10)      | YES  |     | NULL    |                |
| subject  | varchar(255) | YES  |     | NULL    |                |
+----------+--------------+------+-----+---------+----------------+
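To make the two benefits mentioned above concrete (the modified date is always refreshed, and input is validated in one place), here is a minimal sketch. It is not the data.php class from Listing 2: the class name NewsUpdateBuilder, the list of editable fields and the news table name are assumptions made for illustration, and the code only builds a parameterised statement rather than talking to MySQL.

```php
<?php
// Hypothetical sketch; NewsUpdateBuilder and its field list are illustrative
// names, not the article's data.php.

final class NewsUpdateBuilder
{
    /** Columns that callers are allowed to change. */
    private const EDITABLE = ['author', 'subject', 'story'];

    /**
     * Return [sql, params] for updating one field of one record.
     * The modified column is refreshed on every call.
     */
    public static function build(int $id, string $field, string $value): array
    {
        if (!in_array($field, self::EDITABLE, true)) {
            throw new InvalidArgumentException("Unknown field: $field");
        }
        if ($field === 'author') {
            // Strip stray whitespace so 'Morgan Tocker ' and 'Morgan Tocker'
            // are stored identically.
            $value = trim($value);
        }
        $sql = "UPDATE news SET $field = ?, modified = UNIX_TIMESTAMP() WHERE id = ?";
        return [$sql, [$value, $id]];
    }
}
```

Because the field name is checked against a fixed list and the value travels as a bound parameter, callers never assemble SQL themselves, which is the point of the abstraction layer.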
REQUIREMENTS
PHP: 4.1+ (4.3 or greater recommended)
OS: Windows, Linux
Applications: PHP-GTK, MySQL
Code: http://code.phparch.com/20/4
Code Directory: gtk-cms
An HTML-based News Manager

Listings 4 through 6 provide the basis for a very simple news management system based entirely on the web. Listing 4 (index.php) is the home page of the system, which creates a list of all the news available in the database. Listing 5 (edit.php) provides the necessary interface for editing the news items and Listing 6 (save.php) takes care of saving our changes to the database.

Although this example works well, there are a few problems with it. First of all, we have no data integrity. For example, the author "Morgan Tocker" is probably the same as the author "Morgan J Tocker" and "M. Tocker". But if I wanted to compile a list of authors (SELECT distinct(author) FROM news WHERE visible = '1';), it might well contain each of the three individual authors that were just mentioned, since we are allowing each user to enter his or her name every time a
news item is created or edited. Another problem is the handling of whitespace in the author's name. 'This ' does not equal 'this' and ' this ' does not equal ' this'. Got it? Don't laugh, it happens. In an eternal struggle to keep data clean, we can use trim() to zap off the unwanted whitespace, or use an HTML <select> to solve the typos in our first example. This would work, but it comes with another limitation: we couldn't easily add more authors to the list. You could add a field called "other author", or write a bit of JavaScript with an item called "Other.." on the list, whereby an onchange() event would prompt the user for the name of the new author, and then recreate the list dynamically. What I'd actually like to see here, however, is a combo field. A combo box is neither a text field nor a select box; it's actually both of them at the same
Listing 3 (the opening of this listing is missing from the source)

[...]id." \n";

/* Set some properties for the news item */
$news->set_property('author', 'Morgan Tocker');
$news->set_property('subject', 'An article by Morgan');
$news->set_property('visible', '1');
$news->set_property('story', 'This is the body of my message');
?>
Listing 2 (data.php; the opening of this listing is missing from the source)

[...] $value)
        $this->$field = $value;
    } else {
      // Insert a dummy entry into the database
      mysql_query("INSERT into article (created, modified) VALUES (UNIX_TIMESTAMP(), UNIX_TIMESTAMP())");
      $this->id = mysql_insert_id();
    }
  }

  function set_property($property, $value) {
    // set a $property to value in the database
    // Note - this mechanism causes n(fields) database queries
    // to save data. There are alternatives
    mysql_query("UPDATE article SET $property = '$value', modified = UNIX_TIMESTAMP() WHERE id = '".$this->id."'");
    $this->$property = $value;
  }
}
?>
time, and it's a blessing (or a curse if you prefer) to all modern operating systems that someone left it out of the HTML 4.0 specification.

Getting Your Feet Wet With GTK

Since the kind of functionality that we want cannot be provided by a web browser (at least not without a massive amount of custom work), we'll have to turn elsewhere, and that's where PHP-GTK comes into play. Our PHP-GTK application actually provides a "true" GUI to our news management system, and works on a different machine from that of the webserver. The core of the application is shown in Listing 7.

As you can see, the PHP-GTK version of the news manager is a bit more complex than the plain-HTML one, although the length of the script is quite deceptive, since the functionality of the three scripts that made up the previous application has now been incorporated into a single one. At the core, however, the application is extremely simple. Essentially, we create a set of GTK objects, and connect them to various handlers, which, in turn, are automatically called by the system when a specific event takes place, such as the user clicking on a button. Figure 1 shows you the application
running on a Linux system. The PHP-GTK application requires a copy of data.php, which was our Listing 2, so, if you update your class library, be sure to copy it over to your PHP-GTK application. Naturally, this is a great aspect of writing all your applications with the same language, since you're able to happily recycle your code as many times as you want, and you can run it on a variety of platforms.

There is a configuration option in our data.php which chooses the MySQL server to connect to. In the web server's case, it's probably localhost. In the case of the PHP-GTK application, however, you will probably be connecting to the database remotely and, therefore, you should enter the IP or hostname of your server.

Now that the application is running, notice how the combo box used for the author's name makes the application easier to use. Rather than having to build additional pages or cumbersome JavaScript-based solutions, we can rely on the combo box to allow the user to either choose an existing author or create a new one through a single control.

Remembering Data

I'm an Apple Cocoa programmer, and Cocoa applications feature a concept called "defaults". A default is
Figure 1
basically the PHP equivalent to a session that never expires. It's a variable that you can set, and that will remain available to you indefinitely, even if you shut down the application and launch it again. Defaults can be really handy for settings and preferences, although they are not quite as easy to implement in a PHP-GTK application as they are in Cocoa. Luckily, I've written a PHP script to store this data, so you won't have to. It creates a file called $SCRIPT_NAME.session, where it stores default information. When you first install (or execute) the application, be sure to create this file in advance with the proper permissions, so that no error will be output even if the user under which the script is running does not have write access to the folder where the defaults file resides. To tap into the features of defaults, you'll need to include defaults.php (Listing 8) at the beginning of your file.
Creating a default is the same as creating a session variable. The GTK application can store data in the $_SESSION superglobal, and the same data will be available on relaunch.
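The idea of a value that survives a relaunch can also be illustrated without the session machinery. The sketch below is not the article's defaults.php: it is a simplified stand-in that serializes an array to a file, and the function names, the file name and the last_author key are all assumptions made for the example.

```php
<?php
// Hypothetical stand-in for defaults.php: a plain serialize()-to-file store
// rather than a custom session handler.

/** Load previously saved defaults, or an empty array if there are none. */
function load_defaults(string $file): array
{
    if (!is_readable($file)) {
        return [];
    }
    $raw = file_get_contents($file);
    if ($raw === false || $raw === '') {
        return [];
    }
    // Only plain values are restored here; objects are deliberately refused.
    $data = unserialize($raw, ['allowed_classes' => false]);
    return is_array($data) ? $data : [];
}

/** Write the defaults back to disk; returns false on failure. */
function save_defaults(string $file, array $defaults): bool
{
    return file_put_contents($file, serialize($defaults)) !== false;
}

// Usage: remember the last author between launches.
$file = sys_get_temp_dir() . '/news-manager.session';
$defaults = load_defaults($file);
$defaults['last_author'] = 'Morgan Tocker';
save_defaults($file, $defaults);
```

A session-handler approach has the advantage that application code keeps using $_SESSION unchanged; the explicit load and save calls here simply make the persistence step visible.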
If you look at the source for defaults.php (Listing 8), you will notice that it really works by setting itself up as a custom session handler that simply saves the information to a file. The code is very simple, although explaining how custom session handlers work is beyond the scope of this article. You can, however, refer to Sean Coates' excellent article on this topic in the January 2004 issue of php|a.

Making the GTK-APP work offline

Now that we have a GUI-based application that doesn't require a browser and a web server to run, the next step would be to make it independent of the database as well, so that you can use it as a completely "offline" application that can be run even when no connectivity is available.

Listing 6 (the opening of this listing is missing from the source)

[...] $value) {
  if (get_magic_quotes_gpc())
    $news->set_property($field, $value);
  else
    $news->set_property($field, addslashes($value));
}
We're 90% there already. All we really have to do is build a proper system of caching and check to make sure no changes have occurred since our last update. There are two generally accepted ways of performing this last operation:

• Checking if the data has changed from the data we grabbed.
• Checking to see if the timestamp or the last-modified date is more recent than the timestamp from when we grabbed the record.

For our application, I am going to select the second of these choices, given that it's easier to compare timestamps than it is to compare content, particularly if there's a lot of it. However, keep in mind that timestamps are always going to be based on the local machine's clock and, without the database acting as a broker to determine absolute time, it's possible that your content will de-synchronize, thus causing unwanted inconsistencies. Here's how we'll be performing our up-to-date checks (the opening of the listing is missing from the source):

[...]modified <= $database_copy->modified) {
  // Provide a warning - our copy is out of date
} else {
  // you may update safely
}
?>
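Since only part of the original check survives, here is a self-contained sketch of the same timestamp comparison. It assumes that the cached copy remembers when it was fetched and that the database row carries a modified timestamp; the function name and parameters are hypothetical.

```php
<?php
// Hypothetical sketch of the timestamp comparison; names are illustrative.

/**
 * True when the database row was modified at or after the moment we
 * fetched our copy, meaning the copy can no longer be trusted.
 */
function copy_is_out_of_date(int $fetchedAt, int $databaseModified): bool
{
    return $fetchedAt <= $databaseModified;
}

$fetchedAt = 1073487000;        // when the local copy was grabbed
$databaseModified = 1073487356; // modified column read from the database

if (copy_is_out_of_date($fetchedAt, $databaseModified)) {
    $status = 'warn: our copy is out of date';
} else {
    $status = 'safe to update';
}
```

Treating equal timestamps as out of date errs on the side of caution, since a change made within the same second as the fetch cannot be ruled out.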
Caching Content

Since we cannot store the information in the database, we need a means to cache our information until we can synchronize it. Given that they provide a persistent offline storage mechanism, defaults seem to be the perfect choice here. We are going to cache each of the objects for later retrieval by adding an update_cache() method to our data.php class, which you can see in Listing 9. For example, to check if we have a cache for record ID 6, we can test whether $_SESSION['record'][6] is an object.

Caching code (only the end of the listing survives in the source)

[...] $subject) {
      $tmp = new news($id);
      // You may even store an object in a default
      $_SESSION['record'][$id] = $tmp;
    }
  }
}
?>

To make the synchronization process faster, we could also accept cached data that is less than 72 hours old as good, without making the round trip to the database to check whether it has changed (the opening of this fragment is missing from the source):

[...]modified + (3600*72)) {
  // we have recent cache for 6
}
?>
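Both cache tests described above (is there an object for this record, and is it less than 72 hours old) can be combined in one small function. This is a sketch under stated assumptions: $cache stands in for $_SESSION['record'], the function and constant names are hypothetical, and the current time is passed in so that it could come from SELECT UNIX_TIMESTAMP() instead of the local clock.

```php
<?php
// Hypothetical sketch; $cache stands in for $_SESSION['record'].

const CACHE_MAX_AGE = 3600 * 72; // 72 hours, in seconds

/**
 * True when $cache holds an object for $id whose modified timestamp is
 * less than CACHE_MAX_AGE seconds before $now.
 */
function has_recent_cache(array $cache, int $id, int $now): bool
{
    return isset($cache[$id])
        && is_object($cache[$id])
        && $now < $cache[$id]->modified + CACHE_MAX_AGE;
}

// Usage with a stand-in record object.
$record = new stdClass();
$record->modified = 1073487356;
$cache = [6 => $record];
$fresh = has_recent_cache($cache, 6, 1073487356 + 3600);
```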
In this case, however, you really want to make sure that your time is properly synchronized with the MySQL server—you may choose to get your current time by executing a SELECT UNIX_TIMESTAMP() on the database server. Before we write the data back to the database, we will have to check to see that no changes have occurred
while the application was working offline. If there were changes, we will need to display a proper warning, for example by showing a dialogue box.

Where to go from here

In order for the application to be more versatile, you may want to integrate it with the equivalent of an "Outbox", where changes to content are written to, but no updates take place straight away. The outbox will just be another array of records saved in your defaults, very similar to a cache but organized in a different way that makes it easier to catch and revise updates before they take place. A good news management system could work similarly to the way most mail clients work, with the potential to work both online and offline depending on whether a connection to the database is available.

Once this mechanism is in place, you can take advantage of the application's layout to add more functionality, such as workflow management. For example, if your environment calls for the approval of news items before they are published, you could manage the entire flow of operations through a series of "drop boxes" where each item is deposited by users with the proper credentials. Another possible improvement would be to include the possibility of marking certain changes or new news items as "drafts", so that you can save them (without publishing them on to the database) and work on them
later. Finally, the editing method is very basic and would be much more effective, particularly for non-technical users, if it were based on a more advanced interface. Interestingly enough, PHP-GTK also supports Scintilla, a very advanced open-source component that plugs into GTK to provide extended editing capabilities (once you download it from http://www.scintilla.org/, you can compile it into your version of PHP-GTK with ./configure --enable-scintilla --enable-gtkhtm). By working a Scintilla component into your system, you could make the editing process much easier for your users.
About the Author
Morgan Tocker is a freelance developer living and working in Brisbane, Australia. His consultancy business, www.icedotblue.com, is responsible for all sorts of php hacks.
To Discuss this article: http://forums.phparch.com/127
Tips for Writing Applications with PHP-GTK

Error Checking

The lifespan of your typical PHP-GTK application is usually longer than that of its web-based counterparts. It will have to keep running for several hours, with functions being called over and over again. For a GTK application, you may find that you will want to manage your error handling, and check the integrity of your variables frequently. While you should be doing this with web-based applications, too, there is less of an opportunity for laziness in GTK. For example, I had a problem with an earlier version of PHP-GTK where the incorrect data seemed to be returned intermittently, and my application crashed and burned. In going through it with a fine-tooth comb I checked the integrity of data at a few points and, if it didn't return the expected results, I either tried again or produced a 'nicer' error. In summary, it's a good idea to check that an item is still an array/object/integer (or whatever it was supposed to be) and that it is not empty/null. Personally, I look forward to the release of PHP 5 and exception handling, when GTK & PHP can be taken to the next level and it will become easier to tackle these issues.

Portability, Recycling, and Reusing

Another good idea is to try and store the important parts of your code nested in function calls, as opposed to using the traditional linear approach. Keeping in mind the way callbacks work, you will find it easier to work with both a web-based and a GTK version of the same application if they both use OOP techniques. Finally, try to separate your code from your desired output, so that you can create a file like data.php and share it between the two without the need to branch your code.
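The "check, retry, then fail nicely" habit described under Error Checking might look like the following. This is a sketch only: fetch_with_check() is a hypothetical helper, not part of PHP-GTK, and it relies on exceptions, so it assumes PHP 5 or later rather than the PHP 4 releases the article targets.

```php
<?php
// Hypothetical helper; fetch_with_check() is an illustrative name and is
// not part of PHP-GTK or the article's code.

/**
 * Call $fetch up to $attempts times until it returns a non-empty array.
 * Throws a RuntimeException with a readable message if it never does.
 */
function fetch_with_check(callable $fetch, int $attempts = 3): array
{
    for ($i = 0; $i < $attempts; $i++) {
        $result = $fetch();
        // Integrity check: right type, and not empty.
        if (is_array($result) && !empty($result)) {
            return $result;
        }
    }
    throw new RuntimeException("No valid data after $attempts attempts");
}
```

A GTK signal handler could wrap its data access in this helper and show the exception message in a dialog instead of letting the application fall over.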
Writing PHP Extensions: Managing Arrays by Wez Furlong
F E A T U R E
As we saw last time, writing PHP extensions in C isn't quite as difficult as you might think. In this issue, we're going to dive into the hash API and use it to traverse arrays and fetch values from them.
In the last issue, we talked a little about the Zend Engine internals and how they relate to writing an extension, about how to create an extension skeleton using the ext_skel tool, how to write extension functions and access their parameters (using a scanf() style function), how to return simple types (like strings and integers) and how to build up a PHP array. We covered a fair amount of ground, but there are still plenty more things to learn about PHP extension writing. In this issue, we're going to look at arrays again and see how it is possible to build multi-dimensional arrays and how to traverse the elements of, or look up a particular value from, an array.

Multi-Dimensional Arrays

As we saw last time, PHP arrays are implemented using hash tables. This approach allows indexing the array using a string or integer key to fetch its values. Since a hash table is not a native C type, fetching its values is not quite as simple as with native C arrays. On top of that, the Zend Engine has no built-in support for multidimensional arrays; they are simply implemented by storing another array in the appropriate slot of the hash table. This can be a difficult or daunting prospect for the budding extension author, especially considering the state of the internals documentation, even though it is actually quite simple to implement.

For our first example, let's create a two dimensional array where the first dimension contains a list of first names and the second dimension a list of surnames. If you're not sure what I mean, Listing 1 contains the PHP script equivalent for the C code in Listing 2. The contents of Listing 1 should be self-explanatory, so let's take a look through Listing 2 now, line by line.

Lines 1 through 5 declare a C-style 2D array. The two sets of square brackets tell the compiler that it has two dimensions; the first dimension has 3 slots, while the second dimension has 2 slots. These correspond to the 3 sets of first and last names that we are going to use to initialize our PHP array. Lines 7 and 8 are comments describing the prototype for the function. Hopefully you will recall that these comments, although they have no effect on the code itself, are an important coding convention that helps to remind you how the function is intended to be used. Line 9 uses the PHP_FUNCTION macro to declare the actual PHP function. Lines 11 and 12 declare some temporary variables: i will represent the person whose name we are adding, and j will indicate if we are looking at their first or last name. The tmparray variable, as its name implies, will act as temporary storage for the array we create for each person. Line 14 initializes the PHP function's return_value as an array, and then we begin a loop on line 16 which will step through each person in our names array, using the variable i as the counter. For
REQUIREMENTS PHP: 4.3+ OS: N/A Other Software: Working PHP source and compiler environment Code: http://code.phparch.com/20/2 Code Directory: extensions
23
FEATURE
Writing PHP Extensions: Managing Arrays
each person, we allocate a PHP variable using the MAKE_STD_ZVAL() macro, we set it up as an array (lines 17 and 18), and then we step through each of their names and add them as string elements to our temporary array (lines 19 to 21). Having prepared our "person" array, we need to add it to our "people" array—the return value for the function (line 24). The code should be fairly simple to follow, although you might be wondering about two things in particular. The first thing you might ask is whether you should (or should not) worry about freeing the temporary array value. In this case you should not free it—we "gave" it to the Zend Engine when we used add_next_index_zval(), and the engine will take care of freeing it at the appropriate time. If we were to free it ourselves, we would cause a crash some time later in the script that would be difficult to track down. The other question you might be asking is whether we need to return something from the function. The answer is no—the C function prototype is declared as a void function, so it has nothing to return in the usual sense. Instead, PHP passes us a return_value variable that we populate—it is this variable that will be passed back into your PHP script when the function returns. Since the first thing we are doing is setting up the return_value, we don't need to do anything special after the loops that populate it and, therefore, we simply "fall out" of the bottom of the function. As you can see, building a multi-dimensional array is Listing 1 1 2 3 4 5 6 7 8 9 10
not that hard. Although my example is quite succinct, the same principle can be used to build PHP arrays with any number of dimensions—you simply create a new intermediate array to hold the contents of the dimension you want to add, and then add it. You're not limited to strings for the values either—you can use any valid zval value (integers, real numbers, strings, resources and boolean values, or even resources if you want to). Now that you are have mastered returning multidimension arrays, how about looking at working with multidimensional arrays that have been passed into your function? Getting Stuff Out of Arrays There are two things that you will typically want to do with an array that has been passed to your function— either you want to look up a specific keyed value and do something with it, or you want to step through all values and do something with each of them. We'll deal with the first of these now. So far, we've used some really convenient macros to add items to arrays—these macros insulate us from the not-so-pretty guts of the hash table implementation. However, we've now reached a point where we must step beyond these macros—because there are no macros for fetching an item from an array. Before we delve in, it's worth thinking for a minute about how you use arrays in your PHP scripts. Imagine that you have a PHP script that accepts a couple of $_GET parameters—name and age—and displays them on some kind of e-card. Let's also pretend that the age parameter is optional-the e-card will happily display something good regardless of whether the age parameter is passed or not. PHP (being the nice flexible thing that it is), will allow you to access the age parameter using $_GET['age'] syntax, even if it is not there (the value returned to your script will be NULL in that case and, at worst, the interpreter will print out a warning message to indicate that the element does not exist). 
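The ownership rule described above for the intermediate array (once it has been added to its parent, the parent is responsible for freeing it) can be modelled without the Zend Engine at all. What follows is a minimal sketch in plain C, not Zend API code: toy_value, toy_new_array(), toy_new_string(), toy_append(), toy_free() and build_people() are hypothetical names invented for this illustration, and the three sample names are made up.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a PHP value: a string, or an array of child values.
 * This is not the Zend zval; every name here is hypothetical. */
typedef struct toy_value {
    char *str;                 /* set for string values, NULL for arrays */
    struct toy_value **items;  /* children of an array value */
    size_t count;              /* number of children */
} toy_value;

toy_value *toy_new_array(void)
{
    return calloc(1, sizeof(toy_value));
}

toy_value *toy_new_string(const char *s)
{
    toy_value *v;
    assert(s != NULL);
    v = calloc(1, sizeof(toy_value));
    if (v == NULL) return NULL;
    v->str = malloc(strlen(s) + 1);
    if (v->str == NULL) { free(v); return NULL; }
    strcpy(v->str, s);
    return v;
}

/* Appends child to parent. On success (0) the parent owns the child and the
 * caller must not free it; on failure (-1) the caller still owns it. */
int toy_append(toy_value *parent, toy_value *child)
{
    toy_value **grown;
    if (parent == NULL || child == NULL) return -1;
    grown = realloc(parent->items, (parent->count + 1) * sizeof(*grown));
    if (grown == NULL) return -1;
    parent->items = grown;
    parent->items[parent->count++] = child;
    return 0;
}

/* Frees a value and, recursively, everything it owns. */
void toy_free(toy_value *v)
{
    size_t i;
    if (v == NULL) return;
    for (i = 0; i < v->count; i++) toy_free(v->items[i]);
    free(v->items);
    free(v->str);
    free(v);
}

/* Builds people[i] = { first, last } with two nested loops. */
toy_value *build_people(void)
{
    static const char *names[3][2] = {
        { "Ada", "Lovelace" }, { "Alan", "Turing" }, { "Grace", "Hopper" }
    };
    toy_value *people = toy_new_array();
    int i, j;

    if (people == NULL) return NULL;
    for (i = 0; i < 3; i++) {
        toy_value *person = toy_new_array();
        for (j = 0; j < 2; j++) {
            toy_value *name = toy_new_string(names[i][j]);
            if (toy_append(person, name) != 0) {
                toy_free(name);
                toy_free(person);
                toy_free(people);
                return NULL;
            }
        }
        if (toy_append(people, person) != 0) {
            toy_free(person);
            toy_free(people);
            return NULL;
        }
        /* person now belongs to people: no toy_free(person) here */
    }
    return people;
}
```

Calling build_people() and later toy_free() on the result releases every nested value exactly once. Freeing a "person" after it has been appended would be a double free, which is the toy equivalent of the hard-to-trace crash mentioned above.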
If you are slightly more strict with your code, you might first want to check that the age value is present by using isset() and then take a different course of action. This is a simple validation of input parameters and, while PHP allows you to be a lazy script coder, it doesn't allow you to be a lazy extension author: you must check whether an element is present before you access it, since the NULL you get back from the hash API is the kind that causes a crash if you don't handle it properly.

With that in mind, take a look at Listing 3, which represents our hypothetical e-card generating function. The idea is that you pass an array of values to the function, and it will pull out the name and age.

Listing 3

     1 /* {{{ proto void phpa_emit_ecard(array fields)
     2    Emits a personalized e-card greeting */
     3 PHP_FUNCTION(phpa_emit_ecard)
     4 {
     5     zval *array;
     6     zval **name = NULL, **age = NULL;
     7
     8     if (FAILURE == zend_parse_parameters(ZEND_NUM_ARGS()
     9             TSRMLS_CC, "a", &array)) {
    10         return;
    11     }
    12
    13     if (SUCCESS == zend_hash_find(Z_ARRVAL_P(array), "name", sizeof("name"),
    14             (void**)&name)) {
    15         convert_to_string_ex(name);
    16         zend_printf("Hello '%s' ", Z_STRVAL_PP(name));
    17     }
    18
    19     if (SUCCESS == zend_hash_find(Z_ARRVAL_P(array), "age", sizeof("age"),
    20             (void**)&age)) {
    21         convert_to_long_ex(age);
    22         zend_printf("Happy %ldth birthday!", Z_LVAL_PP(age));
    23     } else {
    24         zend_printf("Happy birthday!");
    25     }
    26
    27 }
    28 /* }}} */

Lines 1 to 3 are the usual prototype comments and the PHP_FUNCTION declaration. Next, we declare a variable to point to the array passed in as the parameter to the function on line 5. This is the same as the way that we declared the temporary array variable in the last example. Line 6 declares two variables to hold the name and age values; they are declared as zval ** because the hash table stores zval * and returns a pointer to its storage address. This allows you to modify the stored value if you wish, but you don't want to do something like that unless you are really confident in your abilities. In my experience, it's better to just stick to using the main API functions.

The next thing is fetching the array parameter using zend_parse_parameters(). The "a" format code indicates that we want an array value; we are storing it into the variable named array. If the user doesn't supply a single array as the parameter, an appropriate warning message is displayed and our function will return a NULL value (remember that the default return value is NULL, so we don't need to do anything special to get a NULL value here).

Now we're in new territory: zend_hash_find() is the function to use to look up a value by string key. It accepts four parameters: the first is a pointer to a hash table, the second is a pointer to the key string, the third is the length of the key, including the NUL terminator, and the fourth is a pointer to a zval ** that will receive the value if it exists. You can get at the hash table contained in a zval using the Z_ARRVAL_P() macro. Before you use it, you must make sure that the zval really does reference an array value, otherwise you will get garbage results and most likely a crash. In this case, zend_parse_parameters() has already performed the check for us (we told it we wanted an array), so we don't need to do anything further.

The zend_hash_find() function returns SUCCESS if the element exists or FAILURE if it does not. Beware: the values for SUCCESS and FAILURE are such that you must always explicitly compare for the value you want to check. Do not assume that SUCCESS will evaluate to TRUE or that FAILURE will evaluate to FALSE.

Another potential gotcha is the length of the string key: it must include the NUL terminator for the string. The convention used within PHP is to use the sizeof() operator when you are passing a string that you know at compile time, since sizeof() on a constant string resolves to the string length plus one for the terminator. It's handled at compile time and saves your CPU a few cycles when you call the function from your script. However, if you don't know the string at compile time (perhaps it was passed as a parameter to your function too), you should not use the sizeof() operator, because it will resolve to the size of a string pointer, not the size of the string itself. So, at runtime, you need to call strlen() and add one to the result to arrive at the correct length for a key.

You might be wondering about the (void**) cast on the last parameter to zend_hash_find(). It is just there to keep the compiler from issuing a warning that does not apply here. Remember that this function wants to return a pointer to its storage for the element? In C, a generic pointer to something has the type void *, and when you want to return a value by reference in C, you add an extra asterisk, so the type becomes void **. Since we are dealing with data that is already a pointer, we have an extra level of indirection that makes our fourth parameter appear to be a void *** equivalent. This causes the compiler to issue a warning because it looks like we might have made a mistake. In this case we are safe, so we use the cast to hide the warning. Be very careful though: it is still very easy to make mistakes when dealing with all these pointers, even if you are an experienced C coder.

Back to the listing then. We have now managed to fetch the name element from the array that was passed to our function, and now we want to print it out as a string. It is very important to stress that the value we have is a zval and that, beyond that, we don't know anything else about it. If it is a string, we can just print out the string value, but if it has any other type it will need to be converted first, otherwise we risk crashing the engine. The convert_to_string_ex() API call will handle this situation for us in the best possible way: it will do nothing
if the value is already a string, otherwise it will convert it to a string by making a copy of the value and converting the copy. The reason for making a copy is that you don't want to change the original value directly, since this would be reflected in the script as a sudden "magical" change in the type of that array element.

Now that we have the name in string form, we simply print it out to the output buffer mechanism using the zend_printf() function (it's equivalent to the printf() function you'd call from your PHP scripts, but channels its output through the scripting engine, so that it can be inserted properly in the script's overall output buffer).

Note that we are using Z_STRVAL_PP() to access the underlying string value. Earlier in the article we used Z_ARRVAL_P() to get at an array value. You can see that the names and functions of these two macros are similar and reasonably intuitive: the former returns a string value while the latter returns the array value (the underlying hash table). The potentially confusing part of the names is the trailing P or PP. What does that mean? Each P represents a level of pointer indirection, so if you are accessing a zval *, you should use the _P version of the macro, but if you are accessing a zval ** you should use the _PP version of the macro. There are a whole bunch of related macros that allow you to access the string value, string length, integer value, floating point value and so on. Keep in mind that you should not use these macros unless you know that the zval is of the appropriate type.

Having now printed the name, we proceed to look up the age. This is done in a similar way to the above, but this time we want to print the age as a number, so we use convert_to_long_ex() to ensure that we have an integer value, and Z_LVAL_PP() to access that value. If the age was not found in the array, a more generic message is used instead of an age-specific salutation.

That's it, our function is complete. Or is it? When we print out the name using zend_printf(), we are relying on the string being a regular C-style NUL-terminated string, since that is what the printf() family of functions expects. Since any string in PHP could potentially be a binary string (maybe it is a far-eastern multi-byte string), we may end up clipping the string at the wrong point and generating broken output. The fix for this situation is to use the PHPWRITE() macro instead and pass Z_STRVAL_PP(name) and Z_STRLEN_PP(name) as its parameters.

If you want to access an array element using an integer index, you can use the zend_hash_index_find() function. It takes 3 parameters: the first is the hash table, the second is the integer value of the key and the third is a pointer to a zval **. In other words, you use it in the same way as you use zend_hash_find(), but instead of passing the string and the string length, you pass the integer value of the key.

Listing 4

     1 /* {{{ proto void phpa_iterate_array(array array)
     2    For each element of the array, print the key and value */
     3 PHP_FUNCTION(phpa_iterate_array)
     4 {
     5     zval *array;
     6     char *strindex;
     7     int strindexlen;
     8     long intindex;
     9     zval **item;
    10     HashPosition pos;
    11
    12     if (FAILURE == zend_parse_parameters(ZEND_NUM_ARGS()
    13             TSRMLS_CC, "a", &array)) {
    14         return;
    15     }
    16
    17     zend_hash_internal_pointer_reset_ex(Z_ARRVAL_P(array), &pos);
    18
    19     while(SUCCESS == zend_hash_get_current_data_ex(Z_ARRVAL_P(array), (void**)&item, &pos)) {
    20
    21         switch (zend_hash_get_current_key_ex(Z_ARRVAL_P(array), &strindex, &strindexlen, &intindex, 0, &pos)) {
    22             case HASH_KEY_IS_STRING:
    23                 /* binary safe */
    24                 zend_printf("string(%d): ", strindexlen);
    25                 PHPWRITE(strindex, strindexlen);
    26                 break;
    27
    28             case HASH_KEY_IS_LONG:
    29                 zend_printf("long: %ld ", intindex);
    30                 break;
    31         }
    32
    33         zend_printf(" => ");
    34
    35         convert_to_string_ex(item);
    36         PHPWRITE(Z_STRVAL_PP(item), Z_STRLEN_PP(item));
    37
    38         zend_printf("\n");
    39
    40         /* don't forget to do this, otherwise you'll end up in an infinite loop */
    41         zend_hash_move_forward_ex(Z_ARRVAL_P(array), &pos);
    42     }
    43 }
    44 /* }}} */

Iterating Arrays

Now that we know how to pull specific items out of an array, what about doing the equivalent of foreach(), so that we can print a list of names? Before we delve into the C code, let's just refresh our memories about how we can iterate arrays in the PHP script itself.
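Before leaving lookups behind, two of the string-length points made in that discussion can be checked in plain C: the key length expected by zend_hash_find() (characters plus the NUL terminator), and the way "%s" output clips a string at its first NUL byte while length-based output does not. This is a minimal sketch using only the standard C library, not PHP's API; LITERAL_KEY_LEN, runtime_key_len(), write_with_percent_s() and write_binary_safe() are hypothetical names chosen for the illustration, and fwrite() merely stands in for a length-aware writer such as PHPWRITE().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Key length for a string literal: resolved at compile time and already
 * includes the NUL terminator. Only valid for literals and char arrays. */
#define LITERAL_KEY_LEN(lit) (sizeof(lit))

/* Key length for a string only known at run time. sizeof(key) here would
 * give the size of the pointer, so strlen() + 1 is needed instead. */
size_t runtime_key_len(const char *key)
{
    assert(key != NULL);
    return strlen(key) + 1;
}

/* Formats with "%s": output stops at the first NUL byte in data.
 * Returns the number of bytes written. */
int write_with_percent_s(FILE *out, const char *data)
{
    assert(out != NULL && data != NULL);
    return fprintf(out, "%s", data);
}

/* Writes exactly len bytes, embedded NUL bytes included.
 * Returns the number of bytes written. */
size_t write_binary_safe(FILE *out, const char *data, size_t len)
{
    assert(out != NULL && data != NULL);
    return fwrite(data, 1, len, out);
}
```

For a seven-byte payload such as "abc\0def", the "%s" version writes only three bytes, while the length-based version writes all seven.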
There are three different ways to iterate over an array in a PHP script. The first and simplest approach, familiar to programmers coming from other languages, is to use an integer counter and step through the elements from 0 to the number of elements minus one using a for loop. This first approach is fine if your array is only ever indexed by consecutive integers, but this doesn't always hold true in PHP. That leads us on to the second method. Arrays have an internal position pointer that you can adjust using the end(), next(), prev(), current(), each() and reset() functions. Using various combinations of these allows you to step through and fetch elements from the array. This method is useful, but since these functions operate on the internal array pointer, anything else that changes that pointer while you are looping will mess up the loop. The final approach is to use the foreach() control structure
that was introduced in PHP 4. foreach() works in a similar way to each() and next(), although it has a little more tolerance to things messing with the internal position pointer, since it creates a copy of the array before working on it.

It should be apparent that touching the internal array pointer while inside a looping control structure is a bad thing, so we want to do something that is more like the traditional for-loop approach, and store the array position in a local variable in our extension function. Of course, we want it to work with string keys as well as integer keys. Let's look at Listing 4, which demonstrates how to iterate an array and print out the keys and values.

Lines 1-3 have the familiar prototype comments and PHP_FUNCTION declaration. Lines 5-10 declare the variables that we will be using: a variable to reference the array parameter, another to hold a pointer to the key if it is a string, another for the length of that string, a long to hold the integer value of the key if it is not a string, a zval ** to hold the element value and, lastly, a HashPosition variable that will keep track of where we are in the array (you can think of this as being a bit like the integer index you would use in a traditional for-loop, except that it works with string indices too). Lines 12-15 validate the function parameters to ensure that we receive only a single array.

Now we are ready to begin the actual iteration. The first thing we want to do is initialize our HashPosition variable so that it points to the first element of the array. This is achieved by calling zend_hash_internal_pointer_reset_ex() and passing it the hash table from the array and a pointer to the pos variable. The name of this function is a little misleading: it doesn't touch the internal pointer at all.

We want to keep looping until we run out of elements, so let's use a while-loop and check the return value of the zend_hash_get_current_data_ex() function. This function is similar to zend_hash_index_find(), except that instead of passing an integer index, we are passing our hash position. If the function returns SUCCESS, it will have stored the value of the current array element in our item variable. If there are no more elements, it will return FAILURE instead; we use this fact to break out of the while-loop at the appropriate point.

We also want the key for this element; we can use zend_hash_get_current_key_ex() to get it. This function is a little bit complicated, since it needs to be able to return a string key (and its length) or an integer key, so it requires that you pass suitable variables to receive those values. It's important to stress that requirement: even if you are only interested in integer keys, you still need to pass valid pointers for the string and length. The opposite is also true: if you only want strings, you still need to supply a variable to hold integer values.
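The loop shape described above (reset a caller-owned position, fetch the current element, inspect its key type, then advance, which Listing 4 does with zend_hash_move_forward_ex()) can be modelled in plain C. This is a toy sketch, not the Zend hash API: toy_table, toy_entry, toy_position and the toy_* functions are hypothetical, and the table is just a fixed array. TOY_SUCCESS and TOY_FAILURE are given the values 0 and -1, which is how Zend defines SUCCESS and FAILURE, so the sketch also shows why the result has to be compared explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define TOY_SUCCESS 0      /* same value as Zend's SUCCESS */
#define TOY_FAILURE (-1)   /* same value as Zend's FAILURE */

enum toy_key_type { TOY_KEY_IS_STRING, TOY_KEY_IS_LONG };

typedef struct toy_entry {
    enum toy_key_type type;
    const char *str_key;   /* used when type == TOY_KEY_IS_STRING */
    long int_key;          /* used when type == TOY_KEY_IS_LONG */
    const char *value;
} toy_entry;

typedef struct toy_table {
    const toy_entry *entries;
    size_t count;
} toy_table;

/* The cursor lives in the caller, so nothing inside the table changes
 * while iterating (the role HashPosition plays in Listing 4). */
typedef size_t toy_position;

void toy_reset(const toy_table *t, toy_position *pos)
{
    assert(t != NULL && pos != NULL);
    *pos = 0;
}

int toy_current(const toy_table *t, const toy_entry **out,
                const toy_position *pos)
{
    assert(t != NULL && out != NULL && pos != NULL);
    if (*pos >= t->count) return TOY_FAILURE;
    *out = &t->entries[*pos];
    return TOY_SUCCESS;
}

void toy_forward(const toy_table *t, toy_position *pos)
{
    assert(t != NULL && pos != NULL);
    if (*pos < t->count) (*pos)++;
}

/* Prints "key => value" for every entry; returns how many were visited. */
size_t toy_dump(const toy_table *t, FILE *out)
{
    toy_position pos;
    const toy_entry *e;
    size_t seen = 0;

    toy_reset(t, &pos);
    /* "while (toy_current(...))" would be wrong: TOY_SUCCESS is 0, which
     * C treats as false, and TOY_FAILURE is -1, which C treats as true. */
    while (TOY_SUCCESS == toy_current(t, &e, &pos)) {
        switch (e->type) {
        case TOY_KEY_IS_STRING:
            fprintf(out, "string: %s", e->str_key);
            break;
        case TOY_KEY_IS_LONG:
            fprintf(out, "long: %ld", e->int_key);
            break;
        }
        fprintf(out, " => %s\n", e->value);
        seen++;
        toy_forward(t, &pos);  /* without this the loop never ends */
    }
    return seen;
}
```

Because the position is an ordinary local variable, two loops over the same table (even nested ones) cannot disturb each other, which is the property the _ex() functions are used for in the listing.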
The zend_hash_get_current_key_ex() function returns one of three values: HASH_KEY_IS_STRING indicates that the key is a string key, HASH_KEY_IS_LONG indicates that the key is an integer index and HASH_KEY_NON_EXISTANT indicates that there is no element at the current position. I'm using a switch statement to print the key correctly based on its type. It is worth noting that there is no need to check for HASH_KEY_NON_EXISTANT here, since zend_hash_get_current_data_ex() will have returned FAILURE before we reach this point.

The rest of the code inside the loop should be self-explanatory by now, except for the very last line: we need to advance to the next element before continuing with the next iteration of the loop, and we achieve that using zend_hash_move_forward_ex().

Summing Up

By now you should be feeling pretty good about working with arrays in your PHP functions. We've seen how to build up arrays and multi-dimensional arrays, and how to pull values out of an array by string key and by numeric key. We've also seen how to iterate through the contents of an array. All this should give you plenty of ammunition for when you decide to move your PHP code over to C.
About the Author
Wez Furlong is the Technical Director of The Brain Room Ltd., where he uses PHP not only for the web, but also as an embedded script engine for Linux and Windows applications and systems. Wez is a Core Developer of PHP, having contributed SQLite, COM/.Net, ActivePHP, mailparse and the Streams API (and more), and is the "King" of PECL, PHP's Extension Community Library. His consulting firm can be reached at http://www.thebrainroom.net.
To Discuss this article: http://forums.phparch.com/124
The Need For Speed Optimizing your PHP Applications
F E A T U R E
by Ilia Alshanetsky
The ever growing popularity of the web is putting a continually growing stress on the software and hardware used to power the common website. This article will help you combat the growing server loads and increase your web serving capacity without resorting to costly hardware upgrades.
Before starting on our quest for performance, let me pass along a small word of caution. Making your applications faster is certainly a noble goal but, unfortunately, it will often require a fair bit of time and frequently expose or introduce bugs. It is absolutely critical that you do not begin optimization prematurely, as doing so will virtually guarantee that deadlines will be missed and that the likelihood of ending up with a working program will be slim. Only optimize your applications once the code has been completely written, tested and deemed acceptable, and always set specific performance levels you seek to attain. Without a specific goal, you can just keep on optimizing forever, as there will always be some other tricks and tune-ups you could apply.

Now that we've gotten the standard optimization disclaimer out of the way, let's get to the fun part: doing the actual work. While you can certainly gain significant performance increases from optimizing your PHP code, this is usually one type of optimization you would want to leave until the very end, when all other options are exhausted. Optimizing the actual script can be a fairly drawn-out process and there is always a risk of breaking working code. Whenever possible, it is better to optimize things outside of your code that will have a positive impact on the performance of your applications. As you can probably guess, the focus of this article will be optimizations that do not actually require code modification and still make your PHP applications run much faster.
Getting Started

The first step consists of optimizing the PHP executable itself, which will make all the scripts executed by it run faster. This can be done by making your C compiler, such as gcc, work harder when compiling PHP and tuning the binary executable it generates for maximum performance. This optimization is performed by specifying several settings to the compiler via the CFLAGS environment variable. This variable, in turn, is used by the configuration script, which then passes these values on to the compiler at build time.

It is important to note that while I am mentioning these options only in the context of PHP, these optimization flags are applicable to all parts of the system, and the more efficient the system, the faster it will be able to run everything, including your PHP applications. Below is an example of a modified PHP building procedure, which leaves room for compile-time tuning.

    export CFLAGS="-O3 -msse -mmmx -march=pentium3 \
        -mcpu=pentium3 -mfpmath=sse -funroll-loops"
    ./configure
    make
    make install
What do these options do? The first one, -O3, indicates what level of optimization the compiler should use. Normally, PHP uses only -O2, which is considered to be "safe", as too much optimization can cause stability issues. However, given the evolution of compilers, -O3 is, in my experience, just as safe and many projects have already adopted it as their default optimization level. The main difference between the two is that -O3 enables function inlining, which allows the compiler to optimize out some functions by replacing function calls with a copy of their code. Another optimization technique that is enabled by -O3 is register renaming, which allows the compiler to take advantage of unused registers for various tasks; this is very handy on modern processors with large numbers of registers that are frequently left unused.

The downside of -O3 is that it makes the generated code nearly impossible to debug, since the register rearrangement creates a situation where a valid backtrace in the event of a crash cannot be generated. However, since you should not encounter crashes in a production environment, this is a fairly acceptable loss in most situations.

In our compilation script above, we have a set of options that tell the compiler in a fair bit of detail what processor the server has and what features it supports. This allows the compiler to apply various tricks and optimizations that are specific to a particular CPU (a Pentium III in our case). This is not normally done when producing binaries for distribution, since the goal there is to generate portable code that can run on as many models of CPU for a particular architecture as possible. Of course, enabling CPU-specific targeting means that the portability of the generated binary will be limited to a single processor type. For example, code tailored for the Pentium III via the -march and -mcpu switches (such as the one in my example) will not work on older Pentiums and AMD processors. If you are compiling PHP for a server farm that uses all types of CPUs, you may not want to use CPU tailoring options, as they would require you to compile a separate PHP executable for every CPU type.

The other three options, -msse, -mmmx and -mfpmath=sse, indicate that my processor supports these extended instruction sets and tell the compiler that it should try to use them to generate more optimal code. SSE and MMX are primarily math-related instruction sets and their usage can significantly accelerate any mathematical operations the underlying C code needs to perform.

The last option I specify, -funroll-loops, tells the compiler that it should unroll any small loops. The effect is a reduction in the number of instructions the processor needs to execute, since there is no more loop.
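What unrolling means can be shown by doing it by hand. The sketch below is only an illustration of the transformation, written in plain C with hypothetical function names (sum_rolled() and sum_unrolled4()); it is not what gcc literally emits, and it shows partial unrolling by a factor of four rather than the complete removal of a small fixed-length loop.

```c
#include <assert.h>
#include <stddef.h>

/* Rolled form: every element costs one addition plus one counter
 * increment, one comparison and one branch. */
long sum_rolled(const int *v, size_t n)
{
    long total = 0;
    size_t i;

    assert(v != NULL || n == 0);
    for (i = 0; i < n; i++) {
        total += v[i];
    }
    return total;
}

/* Unrolled by four: the loop bookkeeping is paid once per four elements,
 * at the price of the loop body appearing four times in the binary. */
long sum_unrolled4(const int *v, size_t n)
{
    long total = 0;
    size_t i = 0;

    assert(v != NULL || n == 0);
    for (; i + 4 <= n; i += 4) {
        total += v[i];
        total += v[i + 1];
        total += v[i + 2];
        total += v[i + 3];
    }
    for (; i < n; i++) {       /* the 0 to 3 elements left over */
        total += v[i];
    }
    return total;
}
```

Both functions return the same sum for any input; only the amount of loop overhead and the size of the generated code differ.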
However, the resulting binary will be slightly larger since, instead of a single instance of the code in the loop, you'll now have the code inside the loop repeated as many times as the loop would have run.

Configuring PHP Properly

Now that we have set our compiler options, let's review the configuration of PHP itself, as that, too, can have a significant impact on performance. In most cases, PHP is used for serving web pages, usually as an Apache server module. The standard approach is to compile PHP as a shared Apache module that the web server then loads on startup. This is the recommended approach, as it allows for easy PHP upgrades that do not require recompilation of Apache. However, this is most definitely not the most performance-friendly approach. When generating a dynamically loadable module, the linker will add a series of hooks to allow the module to be loaded, which, among other things, does not allow the compiler to optimize the generated code to the fullest. The end result is that the compiled PHP executable is anywhere between 10% and 25% slower than it would be had it been compiled statically into Apache.

    # PHP configure line
    ./configure --with-apache=/path/to/apache_source

    # Apache configure line
    ./configure --activate-module=src/modules/php4/libphp4.a
The configuration procedure above will compile PHP directly into Apache, making PHP part of the Apache server executable. As you can imagine, this means that upgrades of Apache or PHP will require you to recompile both packages. However, given the infrequent releases of both projects and the relatively quick compilation, the extended build procedure is more than made up for by the performance increase.

You can offset the increase in compilation time caused by the static compilation by reducing the number of extensions PHP compiles, and that will also increase performance. By default, PHP compiles a number of extensions that you may never use and that, in the end, only increase the size of your PHP binary, causing it to use more memory. Worse yet, some extensions will initialize various buffers and parameters on every request, slowing down the data serving process. You should try to compile only the extensions you need and disable extensions that you do not intend to use.

    ./configure \
        --disable-all \
        --disable-cgi \
        --disable-cli \
        --with-apache=/path/to/apache_source \
The example above uses the --disable-all configuration flag to disable, in one go, all extensions that are enabled by default, saving the time needed to find all of the default extensions and disable them one by one. It will also automatically disable any new enabled-by-default extensions that appear in the future, without you having to go through the configuration manually. The --disable-cgi and --disable-cli configuration directives explicitly disable the generation of the CLI and CGI SAPIs, whose compilation is not automatically disabled by the --disable-all flag. Since only the Apache SAPI is needed, there is no need to waste time building binaries that will not be used. Once all the unneeded SAPIs and extensions have been disabled, the needed extensions are enabled and the compilation process can begin.

The end result is a smaller binary, which is especially important for SAPIs such as CGI and CLI, where the startup costs occur on every request. A smaller binary will load that much faster, allowing it to get to code processing more quickly. More importantly, unneeded initializations will not be performed, making PHP work faster in all instances, regardless of the underlying SAPI.

Optimizing the INI File

With the PHP configuration and compilation out of the way, it's time to turn to the PHP.INI configuration directives, which can be used to improve the overall performance of your scripts as well. I'll begin with the register_globals option, which is already off by default as of PHP 4.2.0. However, many people still have it enabled, since their configuration was never updated as they upgraded their versions of PHP. This option makes PHP register a potentially large number of variables based on user and system input, and it also makes certain security exploits possible. It is recommended to keep this option off and use the readily available super-globals to access the data passed by the user through POST and GET queries or browser cookies.
You can further optimize the process of creating variables based on user input by changing the variables_order directive. It indicates which sources of information should be used to populate the superglobals, as well as the order in which they should be considered when building $_REQUEST, which is a cumulative result of the contents of other superglobals. By default, this option has a value of EGPCS, meaning that data from the system environment, the server environment, as well as user GET/POST/COOKIE input is stored. Storage and creation of array elements inside superglobals can take a hefty amount of memory and will have a negative impact on performance, as this process is repeated during every single request. Therefore, you can improve the overall performance of your system by reducing the number of superglobals that are being created. In most situations, this means that you can set the value of variables_order to just GPC, so that only the data passed by the user in the GET/POST queries or through cookies is stored inside superglobals. The effect of this choice is a much faster input parsing procedure and a smaller memory footprint. If you need to use environment or system parameters, you can fetch them individually using the getenv() function instead, so that the cost is paid only when a value is actually needed rather than on every request.
February 2004 ● PHP Architect ● www.phparch.com
Beyond the standard superglobals, PHP also creates special variables that are used to store data that is passed via the command line. In a web environment, your PHP scripts will never be passed arguments in such a manner and, therefore, creating those variables is not necessary. You should disable register_argc_argv, which is the PHP setting responsible for the creation of these variables, to further speed up your scripts. Keep in mind that, if you use the CLI SAPI, you will need to leave this option enabled, otherwise your scripts will not be able to retrieve arguments passed to them via the command line.
When parsing user input, PHP automatically escapes the data to prevent the user from injecting special characters that can potentially result in undefined behavior in certain portions of your scripts. This automation is not always needed, since not all data fetched from the user is used in a way that gives special characters a chance to cause trouble. It would be better to disable this automation by turning off the magic_quotes_gpc directive and manually escape the data as needed using addslashes(), or whatever escaping function is most appropriate for the situation.
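The following is a rough sketch of manual, context-specific escaping with magic_quotes_gpc turned off. The input value, table name and log path are invented for illustration; addslashes() stands in as the generic fallback (a database-specific escaping function is preferable where one is available), and escapeshellarg() is used here because it quotes a single argument. Neither the query nor the command is actually executed.

```php
<?php
// Pretend this value arrived in $_GET with magic_quotes_gpc disabled.
$name = "O'Reilly; ls";

// SQL context: escape only the value that is placed inside the query string.
$sql = "SELECT * FROM authors WHERE name = '" . addslashes($name) . "'";

// Shell context: quote the value as one argument so ';' cannot start a new command.
$cmd = 'grep -c ' . escapeshellarg($name) . ' /var/log/app.log';

echo $sql, "\n";
echo $cmd, "\n";
```

Data that never reaches a query or a command line is left untouched, which is where the time and memory savings come from.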
For example, in some cases you need to use special escaping functions that are specifically tailored to secure data in a particular context, such as escapeshellcmd() for command lines and mysql_escape_string() for MySQL queries. The advantages of doing your own escaping are numerous. First of all, you only escape what you need, thus reducing the amount of time PHP spends parsing user input. You also save memory, as the escaping process will allocate twice as much memory to store an escaped string as it would for an unescaped one. Moreover, you also get a better-designed application that does not depend on a particular server configuration and is capable of working securely in an environment where magic_quotes_gpc is disabled.
FEATURE
The Need For Speed: Optimizing your PHP Applications
Beyond variable creation, there are a number of other INI settings that are important for optimization purposes. By default, every PHP response includes an X-Powered-By header, which shows what version of PHP you are running. For the purposes of rendering the page, this header is completely useless and, unless the user fetches the headers manually, it will never even be visible. In fact, just about the only people who can make use of this field are those trying to compromise your system, who for that purpose need to determine what software is being run on it. It would be prudent, therefore, to disable sending of this header by setting the expose_php setting to off. Not only will this make a potential attacker's job more difficult, but it will also save a little bit of bandwidth and slightly increase performance by not sending useless data over the connection with your client.
Speaking of sending data across the wire to your users, this is another area where proper INI configuration can be of much use. By default, PHP will print the data to the user as soon as your script outputs it, resulting in many write operations, each sending a small bit of data to the socket. This can become quite slow, especially for large pages, since many system calls will need to be performed to write the data and at least some browsers will re-render the page each time a small chunk of data is received, making the user's experience less than pleasant. The alternative is to buffer the data in memory and send it in large chunks, thus reducing the number of writes to the socket and potentially speeding up the rendering time on the client. Output buffering can be enabled and controlled via the output_buffering option, which allows you to specify how big the memory buffer used to store a script's output should be. Ideally, you would want this buffer to be about the same size as the average page you send to your clients; this way, your average script output can be sent across the wire in one large chunk.
Figure 1
At the same time, you should be careful not to create overly large buffers, as each PHP instance will have a buffer of its own—and, with many instances running at the same time, this can add up to quite a few megabytes, potentially exhausting all available memory. Another solution that can accelerate the process of sending data to the user is compression. PHP supports a GZIP-compressed output buffer handler that can be used to compress the data sent to the user in a manner that is automatically recognized by most modern browsers. For those users with compatible browsers, compression will reduce the size of the page many times over. The decrease in page size is especially convenient for users with slow connections, for whom this technique can shave off several seconds from the time it takes to load each page. In addition, faster data transmission allows server processes to be freed earlier, which, in turn, makes it possible for your server to handle a greater number of requests in any given timespan. Another pleasant side effect (on a very large scale) is the reduced bandwidth bill; I have seen bandwidth usage cut by as much as 40-50% by simply introducing compression. Better yet, implementing this feature does not require any code modification and it can be enabled by simply setting output_handler to ob_gzhandler inside the php.ini file. Alternatively, you can enable it for individual virtual hosts inside httpd.conf or specific directories via .htaccess, or even via ini_set() inside scripts that output large quantities of text. You should, however, keep in mind that compressing the data does require CPU power, and will increase the server load slightly. However, in most cases the benefits of faster loading pages, minimized bandwidth usage and reduced number of server processes will outweigh the inevitable slight increase in CPU usage. 
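To get a feel for both ideas, the sketch below buffers a page in memory and then compares its raw size with its gzip-compressed size. It is only an illustration: the generated markup is made up, gzencode() requires the zlib extension (hence the guard), and on a real server you would normally rely on the output_buffering and output_handler settings rather than measuring by hand.

```php
<?php
// Collect the whole page in memory instead of writing each fragment to the client.
ob_start();
for ($i = 0; $i < 500; $i++) {
    echo "<tr><td>Row $i</td><td>Some repetitive markup</td></tr>\n";
}
$page = ob_get_clean();   // one large chunk, ready to be sent in a single write

$rawSize = strlen($page);

// Compress a copy to see how much bandwidth compression could save.
$gzSize = function_exists('gzencode') ? strlen(gzencode($page)) : null;

echo "raw:  $rawSize bytes\n";
if ($gzSize !== null) {
    echo "gzip: $gzSize bytes\n";
}
```

Running this against a copy of one of your own typical pages gives a quick estimate of a sensible buffer size and of the likely bandwidth savings.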
On occasion, you may find yourself using PHP not only to send data, but also to retrieve it from a remote source (for example, when implementing a network client like an e-mail application that has to retrieve messages from an IMAP server). In these situations, it is important to keep in mind that the Internet is not a local storage medium, and getting data out of it can be quite slow. You probably don't want to spend too much time waiting for the external source to respond to your query, or you may run the risk of bogging down your whole server. To prevent endless waiting, you should use the default_socket_timeout setting, which allows you to define how many seconds PHP should wait before giving up on fetching data from a remote source. This is especially important in a web environment, since while your script is waiting for data its web server instance cannot be used to serve other requests, potentially requiring the creation of additional processes and resulting in an increased server load.
In addition to remote sockets, you are likely to be working with local sockets in the form of database connections. Tuning your connection parameters is a very important step that will prevent connection overload, which may result in a performance drop and refused connections leading to broken pages. I recommend that you use the max_links and max_persistent options that exist for most database interfaces to specify how many connections PHP may keep open at any one time. By default, these options are set to -1 (unlimited), which in most situations is not a good idea, since it could lead to PHP trying to open more connections than your database server can handle. This setting is especially important when using persistent connections, which in an Apache environment will soon result in each child having its own connection open to the database. It is absolutely critical to ensure that there are strict controls to prevent persistent connections from taking up all possible database sockets, thus causing the DB server to refuse all other connections. In many instances (for example, if you run a shared host), it may be prudent to disable persistent connections altogether via the allow_persistent directive. This will automatically convert all attempts to open persistent connections into regular connections and help prevent a possible overload on your server.
PHP's INI settings include several directives that limit the operations that PHP can perform, such as the ability to access and manipulate files and the amount of memory allocated by the interpreter.
These settings are quite useful in a shared environment, where you want to keep a tight leash on your users to ensure that they are not abusing the system but, in a dedicated environment where you control a majority (if not all) of the PHP code executed by the interpreter, they only serve to slow down often-used functionality. Thus, for performance reasons it is better not to use the safe_mode, open_basedir and memory_limit directives in dedicated environments; the checks performed by PHP to enforce them are quite expensive and can lead to significant performance losses if enabled.
Beyond the Configuration
Besides optimization tricks and configuration tuning, there are several other methodologies that can improve the performance of PHP applications without actually having to dabble in the application's source code. The first and foremost of these tools is an opcode cache, sometimes referred to as a "PHP compiler", although the term is really misused. Under normal circumstances, before a PHP script can be run it must first be parsed and converted to a series of instructions (opcodes) that the Zend Engine can understand. This is a fairly fast process, but in large scripts with many include files it can take up a significant amount of time. Even in smaller applications, reading the PHP script from disk and parsing it every single time before execution can add up. It is quite wasteful, since for the most part the scripts rarely change between executions and there is really no need to parse the code from scratch every single time.
This is where an opcode cache comes in. Instead of repeated parsing, the generated instructions are stored inside shared memory (or on disk), so that further access to the script does not require reparsing. Additionally, because the opcodes are often stored directly in memory, file system operations are reduced to a simple check to determine whether or not the script has changed since it was cached, thus further improving performance. Most opcode cache implementations (and there are several of them on the market nowadays) go even further and actually optimize the opcodes before storing them. During the traditional compilation process, the PHP parser tries to speed up the opcode generation process and does not always generate the most optimal instructions for the Zend Engine to execute. With an opcode cache, since the parsing is only done once, it makes sense to spend some time analyzing the generated opcodes and optimizing them so that their execution can be as fast as possible. The end result is that, with an opcode cache in place, you may see your PHP's performance improve by anywhere from 40% to 600%.
As far as opcode caching products go, for the most part all available solutions offer just about the same level of performance, with some minor differences. My current favorite is Turck MMCache (http://turckmmcache.sourceforge.net/), which was originally developed by Dmitry Stogov.
This particular compiler comes with a particularly efficient opcode caching mechanism and a powerful optimizer that in most cases can allow you to squeeze in a few extra requests per second compared to its competition. This cache also includes a few other features, such as a memory session handler and a content caching mechanism, which can be used to further improve the performance of your PHP applications. Unfortunately, at this time Dmitry is unable to dedicate time to the project and the development of MMCache has stalled. However, a number of volunteers have promised to continue maintaining the project and hopefully will pick up where Dmitry left off.
The Zend Performance Suite (ZPS) is a commercially available PHP acceleration package offered by Zend that also implements an opcode cache and an optimizer as well as content caching capabilities. The big plus of ZPS is that it is designed with both experienced and novice users in mind and provides a very powerful and user-friendly interface to its components. This is especially useful when configuring content caching, which in MMCache, for example, can require a bit of manual labor and testing. However, unlike MMCache, ZPS is not free. Its licensing model starts at about $499 per server, which may put it out of the price range of small site operators.
Aside from ZPS, there is also APC, an Open Source initiative that has made big strides in the past year. Its performance is similar to that of ZPS and MMCache, but the lack of a good optimizer makes it a little slower in certain situations. Given its active development, however, there is little doubt that it will eventually be able to match the capabilities of the other implementations. I should also mention the IonCube PHP Accelerator, which was one of the original free opcode cache implementations. It still works quite well with the PHP 4.3 series, but has not had any new visible developments in over a year and consequently does not perform as well as MMCache or APC in most situations.
A Hidden Cache
Regardless of whether or not an opcode cache is used, most scripts will still perform a fair number of file system operations. These can become a major bottleneck because, while processor and memory speeds keep increasing, hard-drive speeds remain quite slow. It does not take much to reach the maximum read or write speed of a drive, which is usually just a few dozen megabytes per second. For ultimate performance, it is best to eliminate all filesystem operations. While this may seem like an impossible goal, a wonderful invention called a "ramdisk" makes it attainable without much effort.
A ramdisk is really little more than the emulation of a hard drive in memory; as far as programs (including your PHP scripts) are concerned, it appears to be just another run-of-the-mill disk partition. However, the data written to a ramdisk is actually stored directly in the system's memory, where data throughput is measured in hundreds of megabytes per second. Nearly all operating systems support ramdisks, but Linux actually goes a step further and allows a ramdisk to be bound to a physical drive or directory. This means that, while you get all the benefits of writing and reading data to memory, you also do not risk losing that data in the event of a system crash or reboot, since the kernel will automatically synchronize it back to the physical drive as needed. Incidentally, it's also very easy to turn on this feature; all you need is someone with root access and a few spare minutes:
mount --bind -ttmpfs /tmp /tmp
mount --bind -ttmpfs /home/webroot /home/webroot
The example above binds two commonly used directories: the temporary directory (frequently used for session storage and other common operations) and the directory where the web site files can be found. The end result is that virtually all file operations commonly performed by PHP are accelerated through the reduction in file I/O overhead. At the same time, reliability is not sacrificed for the sake of performance, making this an ideal solution even for the most demanding of websites. The only downside of this speed-up is that the ramdisk uses your memory and, therefore, binding large directories can eat up quite a bit of space that would otherwise be available to your applications. Thus, you need to exercise a bit of caution to ensure that directories mapped to ramdisks do not end up consuming all available memory and force the operating system to use its much slower swap memory facilities.
And We Didn't Even Touch a Line of Code!
As you have probably realized by now, there are many ways to improve the speed of PHP applications without having to perform potentially dangerous code changes. Equally important is the fact that the changes for the most part require very little time to implement and can result in massive performance improvements. This does not mean that you should abandon the practice of optimizing the code itself, which is, of course, an important tool for making your applications faster. However, when time is of the essence and the pressure is on, it is always good to know a few tricks to make the code run faster without having to tinker with it.
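For convenience, the php.ini recommendations made throughout this article can be gathered into a single fragment. Treat it as a sketch rather than a drop-in configuration: the directive names are the ones discussed in the text (the mysql. prefix assumes the MySQL extension; other database extensions use their own prefix), while every numeric value is an illustrative assumption that should be tuned for your own site.

```ini
; Input handling
register_globals = Off
variables_order = "GPC"
; Leave register_argc_argv On if you run command-line (CLI) scripts
register_argc_argv = Off
; Escape data manually, in the context where it is used
magic_quotes_gpc = Off

; Output
expose_php = Off
; Assumed value: aim for roughly the size of your average page, in bytes
output_buffering = 4096
output_handler = ob_gzhandler

; Remote and database connections
; Assumed value, in seconds
default_socket_timeout = 10
; Assumed limits: keep them below what your database server can handle
mysql.max_links = 20
mysql.max_persistent = 10
; Consider Off on shared hosts, as discussed in the text
mysql.allow_persistent = Off
```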
About the Author
Ilia Alshanetsky is an active member of the PHP development team and is the current release manager of PHP 4.3.X. Ilia is also the principal developer of FUDforum (http://fud.prohost.org/forum/), an open source bulletin board, and a contributor to several other projects. He can be reached at [email protected].
To Discuss this article: http://forums.phparch.com/128
SQLyog
PRODUCT REVIEW
www.Webyog.com
by Eddie Peloke
I have never been a fan of administering MySQL databases via the command line. The output of queries is difficult to read and, unless you are a MySQL expert, you need to keep a manual at your side for all the necessary syntax. So, as soon as I began developing with MySQL, I looked for a GUI-based administration tool to speed up the development process. I eventually found MySQL Front and have used it ever since; I have tried many other administration tools over the past few years, but MySQL Front has always been my favourite choice. However, all that changed as soon as I had the opportunity to use SQLyog. SQLyog is a MySQL GUI tool presented by Webyog.com. Webyog.com describes it as an “easy to use, compact and very fast graphical tool to manage your MySQL database anywhere in the world.” Let’s see how it does in this part of the world.
The Details
The SQLyog version I reviewed is 3.63, tested on a Windows machine. SQLyog is currently not available for Linux or Macintosh, but don’t worry: a product called SQLyog Max is in the works and will include Linux and Mac support. SQLyog includes a very wide array of functionality that is certain to make even the most hard-core command-line fans happy:
QUICK FACTS
Description: SQLyog is a very fast, compact and simple to use GUI tool to manage your MySQL server. The software is primarily for the users who work with MySQL during the development process. Like MySQL, SQLyog also follows the principle of the 14th century philosopher monk Occam. We follow his rule known as Occam’s razor: No complexity beyond what is necessary.
Supported platforms:
• Windows: 98, 2000, XP
Price:
• 1-9 licenses: $49/license
• 10-49 licenses: $39/license
• Site License: $695
• Site License (Educational / Non Profit Organizations): $395
• It is compatible with MySQL 4.1, fully InnoDB compliant and supports very fast data retrieval operations.
• It can import data from an ODBC source, with the option to import data through a query. It is also capable of copying entire databases from one server to another.
• A schema and data synchronization tool is included to provide manual replication of database contents.
• You can use it to schedule various jobs for automatic execution at a later date.
• It provides fast client-side sorting and filtering.
• It can execute multiple queries returning thousands of rows per result set. It was written entirely in C/C++/Win32 APIs using native MySQL C APIs.
• You can drop all tables of a database with a single click.
• It allows you to edit BLOBs with support for Bitmap/GIF/JPEG formats.
• You can profile queries for performance analysis.
• Despite being GUI-based, it is very keyboard friendly: you can access 99% of the features of SQLyog with the keyboard.
• It allows access to MySQL’s running statistics, and can view and kill other user processes. You can also perform table diagnostics (check, optimize, repair, analyze).
• You can use it to change table types to ISAM, MYISAM, MERGE, HEAP, InnoDB, BDB.
Now that we have the details out of the way, let’s load it up and see exactly what SQLyog can do. For those of you playing along at home, there is a 30-day trial version of SQLyog available for download from their website. Within five minutes of the end of my download, I had SQLyog installed and was working with my databases.
The Layout
SQLyog’s layout is very clean and uncluttered. The application consists of three main working panes. The left-hand pane gives you a tree view of your databases. You can expand each database to show tables, columns and indexes. There are several right-click options, depending on where in the tree you are located. Sitting on the database, you can right click to alter the table structure, manage indexes, manage relationships, import data, export data, view data, and so on, while highlighting the individual columns gives you the right-click option to drop the column or manage the column’s indexes. All of these options are also available via the application’s main menu, but I found it convenient to have them as right-click pop-up menus.
Figure 1
The top right-hand pane is the query pane, where you can type your select queries, table alterations, insert statements, and so on. The query pane automatically highlights the basic MySQL syntax but, unfortunately, not any of your table or column names. This isn’t a necessity, of course, but I have found it helpful with other db tools such as Toad for Oracle. The query pane does, however, give you some nice right-click options; one of the most interesting I found was the use of templates. SQLyog comes with a list of predefined MySQL statements such as ‘alter table’, ‘create indexes’, and others; clicking on any of the templates will pop the statement into the query editor, where you can then simply add your parameters such as table name, columns to be affected, and so on.
Once a query is executed, the results appear in a nice “Excel-style” tabular pane with column headers, as is the case with most GUI database tools. The interesting thing about SQLyog’s results pane is that it does more than simply show you the query output: you can view any messages returned by MySQL, view the objects connected to a selected table or database, and view your query history.
Can We See the Menu Please?
SQLyog’s menu contains many of the same items that can be accessed via right clicks or through the top toolbar, so that you can perform a variety of operations, like executing queries, table diagnostics, and structure synchronization, from any of these places. What I like most about the menu and toolbar is their nice, clean, well organized layout. I was able to quickly find the tools and options needed to do just about any database-related task I needed.
Some of the more interesting items in the menu are the Database Synchronization tool, Structure Synchronization tool, SQLyog's Job Agent and the HTML Schema tool. Using the database and structure synchronization tools will allow you to, as the names imply, synchronize the data or structure of two MySQL databases. These can be very helpful tools if you are working with separate development and production databases. Along with the synchronization tools, SQLyog ships with the SQLyog Job Agent (available free for Linux users), which allows you to schedule your synchronization tasks. This can be very helpful if you need to synchronize databases on a regular basis.
Figure 2
The HTML Schema tool is another nice feature, as it allows you to quickly create an HTML representation of your db. The created schema shows table structures with columns and indexes. The generated schema also contains hyperlinks, which allow you to quickly find specific tables.
Moving Data in and Out
Another nice thing about SQLyog is its array of options for moving data in and out of your databases. You can export data as batch scripts, and export query results as XML, CSV (you can select the terminator) or HTML. Choosing the HTML option will create an HTML page with all of your data presented in HTML tables. The import functions allow you to run batch files, import from CSV and, of course, execute SQL scripts. SQLyog also contains an "Import Wizard," which gives you the power of importing from other ODBC data sources, such as Oracle, MSSQL, DB2 and Access. If you're porting your data to MySQL from external sources, the Wizard can help you cut down the migration time significantly.
The Future
The future of SQLyog looks bright, thanks to the announcement of an SQLyog Max release sometime in mid-2004. SQLyog Max will be a complete re-write of SQLyog to support multiple operating systems including Windows, Linux, Mac and *nix. According to Webyog, there are several new features to look out for in the upcoming release:
• Full Multithread Support: Multiple queries can be executed simultaneously. Queries can be terminated in the middle of execution.
• Unicode and Internationalization: Fully Unicode compliant, the new version will display Unicode data (MySQL 4.1) correctly. SQLyog will be available in multiple languages.
• High Performance Editor: The new, highly scalable editor will allow the editing of very large files without loss in performance, as well as provide support for ToolTips, autocompletion and syntax highlighting for a variety of languages, such as PHP, HTML, XML, Perl and Python.
• Tabbed Interface: A new, Visual Studio .NET-like tabbed interface will support multiple documents, and allow you to get rid of modal dialogs in important operations like Data Editing and Table Structure Editing.
• More polished interface, with new icons in menus and dialogue windows, as well as an improved toolbar and context-sensitive help.
• Support for latest MySQL: SQLyog Max will also be compatible with the latest version of MySQL and with the new Stored Procedure Editor in MySQL 5.x.
Figure 3
Along with some of the new features, SQLyog Max will be given a few enhancements over SQLyog that should help to make it a nice well-rounded database tool. These include a 'Query Builder', an Entity-Relationship diagramming tool, and the ability to shut down and start up MySQL, and will provide the shot in the arm that SQLyog Max needs to become a top MySQL GUI administration tool.
What I Liked
There are many things I liked about SQLyog. It is very easy to use and I found it quick and responsive. The menu is nicely laid out and organized with many of the options only a click away. Options like the synchronization tools and table diagnostics (you can select to optimize, check, analyze, or repair your selected tables) are nice features that can make a developer's life much easier. Out of all the tools I have tried, this has become my MySQL tool of choice.
What I Didn't Like
I have not really found anything yet that I don't like about SQLyog. Many of the items on my 'would be nice to have' list should be addressed with the release of SQLyog Max: the E-R diagramming tool and the ability to start and stop MySQL from SQLyog Max will be welcome additions. The one main area, however, that could be improved is the help system. Right now, the help simply consists of HTML files, which do a good job of helping with SQLyog, but do not function well as a reference when you're in a bind. It would be nice to see a searchable-index approach, as well as more MySQL-related help.
Conclusion
Overall, I really like SQLyog. I use it daily in my PHP/MySQL development and it is now my primary database tool. It is easy enough to use for the beginner, but has enough options for the professional MySQL developer. The lack of some advanced features, like the diagramming tool, keeps it from getting five stars, but it is a workhorse that I will not give up easily.
php|a
Figure 4
Profiling PHP Applications
FEATURE
by George Schlossnagle
If you program PHP professionally, there is little doubt that, at some point, you will need to improve the performance of an application. If you work on a high-traffic site, this might be a daily or weekly endeavor for you; if your projects are mainly intranet ones, the need may arise less frequently. At some point, though, most applications need to be "retuned" in order to perform as you want them to.
When I'm giving presentations on performance tuning PHP applications, I like to make the distinction between tuning tools and diagnostic techniques. Among the tuning tools are caching methodologies, system-level tunings, database query optimization, and improved algorithm design. I like to think of these techniques as elements of a toolbox, just as a hammer, a torque wrench, or a screwdriver are elements of a handyman's toolbox. Just as you can't change a tire with a hammer, you can't address a database issue by improving a set of regular expressions. Without a good toolset, it's impossible to fix problems; without the ability to apply the right tool to the job, the tools are equally worthless.
In automobile maintenance, choosing the right tool is a combination of experience and diagnostic insight. Even simple problems benefit from diagnostic techniques. If I have a flat tire, I may be able to patch it, but I need to know where to apply the patch. More complex problems require deeper diagnostics. If my acceleration is sluggish, I could simply guess at the problem and swap out engine parts until performance is acceptable. That method is costly in both time and materials. A much better solution is to run an engine diagnostic test to determine the malfunctioning part.
Software applications are in general much more complex than a car's engine, yet I often see even experienced developers choosing to make "educated" guesses about the location of performance deficiencies. During the spring of 2003, the php.net Web sites experienced some extreme slowdowns. Inspection of the Apache Web server logs quickly indicated that the search pages were to blame for the slowdown. However, instead of profiling to find the specific source of the slowdown within those pages, random guessing was used to try to solve the issue. The result was that a problem that should have had a one-hour fix dragged on for days as "solutions" were implemented but did nothing to address the core problem.
Thinking that you can spot the critical inefficiency in a large application by intuition alone is almost always pure hubris. Much as I would not trust a mechanic who claims to know what is wrong with my car without running diagnostic tests or a doctor who claims to know the source of my illness without performing tests, I am inherently skeptical of any programmer who claims to know the source of an application slowdown but does not profile the code.
This article focuses on using the APD profiler for PHP to profile code. APD is a Zend extension, meaning that it hooks deep into PHP itself to get accurate and low-cost performance measurements. Although products like Xdebug and DBG provide some profiling capabilities, APD offers the most comprehensive set.
REQUIREMENTS
PHP: 4.x or 5.x
OS: Any
Applications: N/A
Code: http://code.phparch.com/20/1
Code Directory: profile
Installing and Using APD
APD is part of PECL and can thus be installed with the PEAR installer:
# pear install apd
After APD is installed, you should enable it by setting the following in your php.ini file:
zend_extension=/path/to/apd.so
apd.dumpdir=/tmp/traces
APD works by dumping trace files that can be postprocessed with the bundled pprofp trace-processing tool. These traces are dumped into apd.dumpdir, under the name pprof.pid, where pid is the process ID of the process that dumped the trace. To cause a script to be traced, you simply need to call this when you want tracing to start (usually at the top of the script):
apd_set_pprof_trace();
APD works by logging the following events while a script runs:
• When a function is entered.
• When a function is exited.
• When a file is included or required.
Also, whenever a function return is registered, APD checkpoints a set of internal counters and notes how much they have advanced since the previous checkpoint. Three counters are tracked:
• Real Time (a.k.a. wall-clock time): the actual amount of real time passed.
• User Time: the amount of time spent executing user code on the CPU.
• System Time: the amount of time spent in operating system kernel-level calls.
After a trace file has been generated, you analyze it with the pprofp script. pprofp implements a number of sorting and display options that allow you to look at a script's behavior in a number of different ways through a single trace file. Here is the list of options to pprofp:
pprofp
Sort options
-a  Sort by alphabetic names of subroutines.
-l  Sort by number of calls to subroutines.
-r  Sort by real time spent in subroutines.
-R  Sort by real time spent in subroutines (inclusive of child calls).
-s  Sort by system time spent in subroutines.
-S  Sort by system time spent in subroutines (inclusive of child calls).
-u  Sort by user time spent in subroutines.
-U  Sort by user time spent in subroutines (inclusive of child calls).
-v  Sort by average amount of time spent in subroutines.
-z  Sort by user+system time spent in subroutines. (default)
Display options
-c  Display real time elapsed alongside call tree.
-i  Suppress reporting for PHP built-in functions.
-m  Display file/line locations in traces.
-O  Specifies maximum number of subroutines to display. (default 15)
-t  Display compressed call tree.
-T  Display uncompressed call tree.
Of particular interest are the -t and -T options, which display a call tree for the script, and the full range of sort options. As indicated, the sort options allow functions to be sorted either on the time spent in that function exclusively (that is, not including any time spent in child function calls) or on time spent inclusive of child calls.
In general, sorting on real elapsed time (using -r and -R) is most useful because it is the amount of time a visitor to the page actually experiences. This measurement includes time spent idling in database access calls waiting for responses and time spent in any other blocking operations. Although identifying these bottlenecks is useful, you might also want to evaluate the performance of your raw code without counting time spent in input/output (I/O) waiting. For this, the -z and -Z options are useful because they sort only on time spent on the CPU.
A Tracing Example
To see exactly what APD generates, you can run it on the simple script shown in Listing 1. Figure 1 shows the results of running this profiling with -r. The results are not surprising of course: sleep() takes roughly 1 second to complete.
Listing 1
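The code of Listing 1 is not reproduced here. As a rough sketch only, a script consistent with the description (hello() calling sleep(), goodbye() doing nothing expensive) might look like the following. The argument values, the echo statements, and the function_exists() guard are assumptions made for illustration, not the original listing.

```php
<?php
// Hypothetical reconstruction: only hello(), goodbye(), and the sleep()
// call inside hello() are confirmed by the article text.

// Start tracing only when the APD extension is actually loaded, so the
// sketch also runs on installations without it.
if (function_exists('apd_set_pprof_trace')) {
    apd_set_pprof_trace();
}

function hello($name)
{
    echo "Hello $name\n";
    sleep(1); // appears in the call tree as a child call of hello()
}

function goodbye($name)
{
    echo "Goodbye $name\n";
}

hello('George');
goodbye('George');
```

With APD enabled, running the script once should leave a single pprof.pid file in apd.dumpdir, which can then be passed to pprofp.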
The sleep() call actually takes slightly longer than 1 second; this inaccuracy is typical of the sleep function in many languages, and you should use usleep() if you need finer-grained accuracy. The hello() and goodbye() functions are both quite fast. All the functions were executed a single time, and the total script execution time was 1.0214 seconds.
To generate a full call tree, you can run pprofp with the -Tcm options. This generates a full call tree, with cumulative times and file/line locations for each function call. Figure 2 shows the output from running this script. Note that in the call tree, sleep() is indented because it is a child call of hello().
Profiling a Larger Application
Now that you understand the basics of using APD, let's employ it on a larger project. Serendipity is open-source weblog software written entirely in PHP. Although it is most commonly used for the weblogs of private individuals, Serendipity was designed with large, multiuser environments in mind, and it supports an unlimited number of authors. In this sense, Serendipity is an ideal starting point for a community-based Web site that offers weblogs to its users. As far as features go, Serendipity is ready for that sort of high-volume environment, but the code should first be audited to make sure it will be able to scale well, and a profiler is perfect for this sort of analysis.
One of the great things about profiling tools is that they give you easy insight into any code base, even one you might be unfamiliar with. By identifying bottlenecks and pinpointing their locations in code, APD allows you to quickly focus your attention on trouble spots. A good place to start is profiling the front page of the Web log. To do this, the index.php file is changed to dump a trace. Because the Web log is live, you do not want to generate a slew of trace files by profiling every page hit, so you can wrap the profile call to make sure it is called only if you manually pass PROFILE=1 on the URL line.
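A minimal sketch of such a guard is shown below, assuming it sits at the very top of index.php. The helper name should_profile() is hypothetical, and the function_exists() check is an added precaution rather than something the article prescribes.

```php
<?php
// Hypothetical helper: true only when the request explicitly asks for a
// trace, e.g. index.php?PROFILE=1
function should_profile(array $query)
{
    return isset($query['PROFILE']) && $query['PROFILE'] === '1';
}

// Trace this request only on demand, and only if APD is loaded.
if (should_profile($_GET) && function_exists('apd_set_pprof_trace')) {
    apd_set_pprof_trace();
}

// ... the regular index.php code would continue here ...
```

On a public site it is probably wise to restrict a switch like this further (by client address, for instance), since anyone who knows the parameter can otherwise trigger trace dumps.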
Figure 1
Figure 2
Figure 3 shows the profile results for the Serendipity index page, sorted by inclusive real times (using -R). I prefer to start my profiling efforts with -R because it helps give me a good idea which macro-level functions in an application are slow. Because the inclusive timing includes all child calls as well, "top-level" functions tend to be prominent in the listing.
The total time for this page was 0.1231 seconds, which isn't bad if you are running your own personal site, but it might be too slow if you are trying to implement Serendipity for a large user base or a high-traffic site. The include_once() function is the top-ranked time-consumer, which is not uncommon in larger applications where a significant portion of the logic is implemented in include files. Note, though, that include_once() not only dominates the inclusive listing, but it seems to dominate the exclusive listing as well. Figure 4 verifies this: rerunning the profile with pprofp -r shows that include_once() takes 29.7% of the runtime, without counting any child function calls.
What you are seeing here is the cost of compiling all the Serendipity includes. After all, one of the major costs associated with executing PHP scripts is the time spent parsing and compiling them into intermediate code. Because include files are all parsed and compiled at runtime, you can directly see this cost in the example shown in Figure 4.
Luckily, you can immediately optimize away this overhead by using a compiler cache. Figure 5 shows the effect of installing APC and rerunning the profiles. As you can see, include_once() is still at the top of inclusive times (which is normal because it includes a large amount of the page logic), but its exclusive time has dropped completely out of the top five calls. Also, script execution time has almost been cut in half.
If you look at the calls that remain, you can see that these are the three biggest offenders:
• serendipity_plugin_api::generate_plugins
• serendipity_db_query
• mysql_db_query
You might expect database queries to be slow. Database accesses are commonly the bottleneck in many applications. As predicted earlier, the high real-time cost of the database queries is matched with no user and system time costs because the time that is spent in these queries is exclusively spent on waiting for a response from the database server.
The generate_plugins() function is a different story. Serendipity allows custom user plug-ins for side navigation bar items and comes with a few bundled examples, including a calendar, referrer tracking, and archive search plug-ins. It seems unnecessary for this plug-in generation to be so expensive. To investigate further, you can generate a complete call tree with this:
pprofp -tcm /tmp/pprof.28986
Listing 2
227 print ("
    ");
228 for ($y=0; $y<7; $y++) {
229     // Be able to print borders nicely
230     $cellProp = "";
231     if ($y==0) $cellProp = "FirstInRow";
232     if ($y==6) $cellProp = "LastInRow";
233     if ($x==4) $cellProp = "LastRow";
234     if ($x==4 && $y==6) $cellProp = "LastInLastRow";
235
236     // Start printing
237     if (($x>0 || $y>=$firstDayWeekDay) && $currDay<=$nrOfDays) {
238         if ($activeDays[$currDay] > 1) $cellProp.='Active';
239         print("
Figure 6 shows a segment of the call tree that is focused on the beginning of the first call to serendipity_plugin_api::generate_plugins(). The first 20 lines or so show what seems to be normal lead-up work. A database query is run (via serendipity_db_query()), and some string formatting is performed. About midway down the page, in the serendipity_drawcalendar() function, the trace starts to look very suspicious. Calling mktime() and date() repeatedly seems strange; in fact, date() is called 217 times in this function. Looking back up to the exclusive trace in Figure 5, you can see that the date() function is called 240 times in total and accounts for 14.8% of the script's execution time, so this might be a good place to optimize.
Figure 6
Fortunately, the call tree tells you exactly where to look: serendipity_functions.inc.php, lines 245-261 (shown in Listing 2). This is a piece of the serendipity_drawcalendar() function, which draws the calendar in the navigation bar. Looking at line 244, you can see that the date() call is dependent on $month, $currDay, and $year. $currDay is incremented on every iteration through the loop, so you cannot cleanly avoid this call. You can, however, replace it:
date("Ymd", mktime(0,0,0, $month, $currDay, $year))
This line makes a date string from $month, $currDay, and $year. You can avoid the date() and mktime() functions by simply formatting the string yourself:
sprintf("%4d%02d%02d", $year, $month, $currDay)
However, the date calls on lines 248, 249, 250, 258, 259, and 260 are not dependent on any variables, so you can pull their calculation outside the loop. When you do this, the top of the loop should precalculate the three date() results needed ($date_m, $date_Y, and $date_j). Then lines 248-250 and 258-261 should both become this:
if ($date_m == $month && $date_Y == $year && $date_j == $currDay) {
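The two changes can be sketched together as follows. This is not the Serendipity code: find_today() is a hypothetical stand-in for the calendar loop, the optional $now parameter exists only to make the sketch repeatable, and the date() format characters ('m', 'Y', 'j') are an assumption inferred from the variable names.

```php
<?php
// Hypothetical stand-in for the calendar loop: returns the "Ymd" stamps
// of the days in the given month that match the current date.
function find_today($month, $year, $nrOfDays, $now = null)
{
    $now = ($now === null) ? time() : $now;

    // Loop-invariant values: computed once, not on every iteration.
    $date_m = date('m', $now);
    $date_Y = date('Y', $now);
    $date_j = date('j', $now);

    $matches = array();
    for ($currDay = 1; $currDay <= $nrOfDays; $currDay++) {
        // Stands in for date("Ymd", mktime(0, 0, 0, $month, $currDay, $year)).
        $stamp = sprintf("%4d%02d%02d", $year, $month, $currDay);

        if ($date_m == $month && $date_Y == $year && $date_j == $currDay) {
            $matches[] = $stamp;
        }
    }
    return $matches;
}
```

The loose == comparisons are kept from the article's snippet; they let the zero-padded string from date('m') compare equal to an integer month.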
Implementing this simple change reduces the number of date() calls from 240 to 38, improves the speed of serendipity_plugin_api::generate_plugins() by more than 20%, and reduces the overall execution time of the index page by 10%. That's a significant gain for a nine-line change and 15 minutes' worth of work!
This particular example is easy to categorize as a simple case of programmer error. Putting an invariant function inside a loop is a common mistake for beginners, but dismissing it is itself a mistake, for a number of reasons:
• Experienced programmers as well as beginners make these sorts of mistakes, especially in large loops where it is easy to forget where variables change.
• In a team environment, it's extremely easy for simple inefficiencies like these to crop up. For example, a relatively simple task (such as writing a calendar) may be dispatched to a junior developer, and a casual audit of the work might fail to turn up this sort of error.
• Inefficiencies like these are almost never revealed by intuition. If you approach the code base from afar, it is unlikely that you'll think that the calendar (largely an afterthought in the application design) is a bottleneck. Small features like these often contain subtle inefficiencies; 10% here, 15% there, and they quickly add up to trouble in a performance-sensitive application.
Spotting General Inefficiencies
Profilers excel at spotting general inefficiencies. An example might include using a moderately expensive user function repeatedly when a built-in function might do, or frequently using a function in a loop where a single built-in function would do the job. Unlike the analysis done earlier in this article, which uses the inclusive timings, mild but widespread issues are often better diagnosed by using exclusive time ordering.
My favorite example of this sort of "obvious" yet largely undetectable inefficiency occurred during the birth of APD. At the company where I was working, there were some functions to handle making binary data (specifically, encrypted user data) 8-bit safe so that they could be set into cookies. On every request to a page that required member credentials, the user's cookie would be decrypted and used both for authentication and as a basic cache of their personal data. User sessions were to be timed out, so the cookie contained a timestamp that was reset on every request and used to ensure that the session was still valid.
This code had been in use for three years and was authored in the days of PHP3, when non-binary-safe data (for example, data containing nulls) was not correctly dealt with in the PHP cookie-handling code, and before rawurlencode() was binary safe. The functions looked something like the one you can see in Listing 3. On encoding, a string of binary data was broken down into its component characters with unpack(). The component characters were then converted to their hexadecimal values and reassembled. Decoding effected the reverse.
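Listing 3 is not reproduced here. A minimal sketch of functions in the style just described might look like the following; the names hexencode() and hexdecode() and every implementation detail are assumptions for illustration, not the original code.

```php
<?php
// Hypothetical userspace encoder: unpack the bytes, convert each one to
// two hex digits, and append to the result string in a loop.
function hexencode($data)
{
    $ascii = unpack('C*', $data);   // 1-indexed array of byte values
    $retval = '';
    foreach ($ascii as $byte) {
        $retval .= sprintf('%02x', $byte);
    }
    return $retval;
}

// Reverse operation: read two hex digits at a time and rebuild the bytes.
function hexdecode($data)
{
    $retval = '';
    $len = strlen($data);
    for ($i = 0; $i < $len; $i += 2) {
        $retval .= chr(hexdec(substr($data, $i, 2)));
    }
    return $retval;
}
```

The per-byte loop with string appends is the part that becomes costly as the input grows, which is the behavior the profile exposed.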
On the surface, these functions are pretty efficient, or at least as efficient as they can be when written in PHP. When I was testing APD, I discovered to my dismay that these two functions consumed almost 30% of the execution time of every page on the site. The problem was that the user cookies were not small (they were about 1KB on average), and looping through an array of that size, appending to a string at every cycle, is extremely slow in PHP. Because the functions were relatively optimal from a PHP perspective, we had a couple of choices:
• Fix the cookie encoding inside PHP itself to be binary safe.
• Use a built-in function that achieves a result similar to what we were looking for (for example, base64_encode()).
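The second choice can be sketched in a few lines. The wrapper names cookie_encode() and cookie_decode() are hypothetical; base64_encode() and base64_decode() are the standard built-ins.

```php
<?php
// Hypothetical wrappers around the built-in base64 functions.
function cookie_encode($binary)
{
    return base64_encode($binary);
}

function cookie_decode($text)
{
    // Strict mode returns false for input outside the base64 alphabet.
    return base64_decode($text, true);
}

// Binary data containing a NUL byte and a high byte survives the round trip.
$raw  = "\x00\x01binary\xff data";
$safe = cookie_encode($raw);
```

The whole string is handled in a single internal call, so there is no per-byte work done in PHP code.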
We ended up choosing the former option, and current releases of PHP have binary-safe cookie handling. However, the second option would have been just as good. A simple fix resulted in a significant speedup. This was not a single script speedup, but a capacity increase of 30% across the board.
As with all technical problems that have simple answers, the question from on top was "How did this happen?" The answer is multifaceted but simple, and it is the reason all high-traffic scripts should be profiled regularly:
• The data had changed. When the code had been written (years before), user cookies had been much smaller (less than 100 bytes), and so the overhead was much lower.
• It didn't actually break anything. A 30% slowdown since inception is inherently hard to track. The difference between 100ms and 130ms is impossible to spot with the human eye. When machines are running below capacity (as is common in many projects), these cumulative slowdowns do not affect traffic levels.
• It looked efficient. The encoding functions are efficient, for code written in PHP. With more than 2,000 internal functions in PHP's standard library, it is not hard to imagine failing to find base64_encode() when you are looking for a built-in hex-encoding function.
• The code base was huge. With nearly a million lines of PHP code, the application code base was so large that a manual inspection of all the code was impossible. Worse still, with PHP lacking a hexencode() internal function, you need to have specific information about the context in which the userspace function is being used to suggest that base64_encode() will provide equivalent functionality.
Without a profiler, this issue would never have been caught. The code was too old and buried too deep to ever be found otherwise.
Note that there is an additional inefficiency in this cookie strategy. Resetting the user's cookie on every access could guarantee that a user session was expired after exactly 15 minutes, but it required the cookie to be re-encrypted and reset on every access. By changing the expiration time window to a fuzzy one (between 15 and 20 minutes for expiration), you can change the cookie-setting strategy so that it is reset only if it is already more than 5 minutes old. This will buy you a significant speedup as well.
Removing Superfluous Functionality
After you have identified and addressed any obvious bottlenecks that have transparent changes, you can also use APD to gather a list of features that are intrinsically expensive. Cutting the fat from an application is more common in adopted projects (for example, when you want to integrate a free Web log or Web mail system into a large application) than it is in projects that are completely home-grown, although even in the latter case, you occasionally need to remove bloat (for example, if you need to repurpose the application into a higher-traffic role).
There are two ways to go about culling features. You can systematically go through a product's feature list and remove those you do not want or need. (I like to think of this as top-down culling.) Or you can profile the code, identify features that are expensive, and then decide whether you want or need them (bottom-up culling). Top-down culling certainly has an advantage: it ensures that you do a thorough job of removing all the features you do not want. The bottom-up methodology has some benefits as well:
• It identifies features. In many projects, certain features are undocumented.
• It provides incentive to determine which features are nice and which are necessary.
• It supplies data for prioritizing pruning.
In general, I prefer using the bottom-up method when I am trying to gut a third-party application for use in a production setting, where I do not have a specific list of features I want to remove but am simply trying to improve its performance as much as necessary.
Let's return to the Serendipity example. You can look for bloat by sorting a trace by inclusive times. Figure 7 shows a new trace (after the optimizations you made earlier), sorted by exclusive real time. In this trace, two things jump out: the define() functions and the preg_replace() calls.
In general, I think that it is unwise to make any statements about the efficiency of define(). The usual alternative to using define() is to utilize a global variable. Global variable declarations are part of the language syntax (as opposed to define(), which is a function), so the overhead of their declaration is not as easily visible through APD. The solution I would recommend is to implement constants as class constants by using const. If you are running a compiler cache, these will be cached in the class definition, so they will not need to be reinstantiated on every request.
The preg_replace() calls demand more attention. By using a call tree (so you can be certain to find the instances of preg_replace() that are actually being called), you can narrow down the majority of the occurrences to this function:
function serendipity_emoticate($str) {
    global $serendipity;
    foreach ($serendipity["smiles"] as $key => $value) {
        $str = preg_replace("/([\t\ ]?)" . preg_quote($key,"/") . "([\t\ \!\.\)]?)/m", "$1$2", $str);
    }
    return $str;
}
where $serendipity['smiles'] is defined as shown in Listing 4. Listing 5, on the other hand, shows the function that actually applies the markup, substituting images for the emoticons and allowing other shortcut markups.
Listing 5
The first function, serendipity_emoticate(), goes over a string and replaces each text emoticon, such as the smiley face :), with a link to an actual picture. This is designed to allow users to enter entries with emoticons in them and have the Web log software automatically beautify them. This is done on entry display, which allows users to re-theme their Web logs (including changing emoticons) without having to manually edit all their entries. Because there are 15 default emoticons, preg_replace() is run 15 times for every Web log entry displayed.
The second function, serendipity_markup_text(), implements certain common text typesetting conventions. This phrase:
*hello*
is replaced with this:
<strong>hello</strong>
Other similar replacements are made as well. Again, this is performed at display time, so that you can add new text markups later without having to manually alter existing entries. This function runs nine preg_replace() and eight str_replace() calls on every entry.
Although these features are certainly neat, they can become expensive as traffic increases. Even with a single small entry, these calls constitute almost 15% of the script's runtime. On my personal Web log, the speed increases I have garnered so far are already more than the log will probably ever need. But if you were adapting this to be a service to users on a high-traffic Web site, removing this overhead might be critical. You have two choices for reducing the impact of
SECURITY CORNER
Other methods include the Refresh header, whether passed as a legitimate HTTP header or by using a meta tag's http-equiv attribute. The point is to get the user to visit a remote URL that includes a session identifier of the attacker's choosing. This is the first step in a basic attack, and the full-circle attack is illustrated in Figure 1. If successful, the attacker is able to bypass the necessity of capturing or predicting a valid session identifier, and it is subsequently possible to launch additional and more dangerous types of attacks.
Figure 1
Think you're not vulnerable? Consider the code in Listing 1. Save this code as session.php somewhere where you can test it. After you ensure that you have no existing cookies from the same host (clear all cookies if you're not certain), use a URL ending in session.php?PHPSESSID=1234 to visit the page. For example, http://host/session.php?PHPSESSID=1234. The script should output 0 on your screen upon your first visit. Reload the page a few times, and you should notice the number incrementing each time, indicating the number of previous visits.
Listing 1
With a different browser, or even an entirely different computer, go through the exact same initial steps. Upon visiting the URL for the first time, you will notice that you do not see 0. Rather, it recalls your previous session. Thus, you have impersonated the previous user. Now, if you consider that this all began with a session identifier being passed in the URL, you should see the basic danger that session fixation presents. Unlike a typical scenario, PHP did not generate the session identifier.
There are a few shortcomings to this simplistic type of attack. The most important shortcoming is that the target application must use the session identifier passed to it; otherwise, this attack will fail. If your session mechanism is nothing more than session_start(), your applications are vulnerable, as the previous demonstration illustrates. In order to prevent this specific vulnerability, you should always ensure that a new session identifier is used whenever you are starting a session for the first time.
Listing 2
<?php
session_start();
if (!isset($_SESSION['initiated'])) {
    session_regenerate_id();
    $_SESSION['initiated'] = true;
}
?>
There are many ways this can be achieved, and one example is given in Listing 2 (this approach, too, has at least one weakness, so wait until you finish this article before deciding on the solution that best fits your needs). If the code in Listing 2 is used to start all sessions, any existing session will always have a session variable named initiated that is already set. If this is not the case, the session is new. The call to session_regenerate_id() replaces the current session identifier with a new one, although it retains the old session information. So, if the attacker coerced a user into using an external link to your application that contains the session identifier, this approach will prevent the attacker from knowing the new session identifier, unless the session has already been initiated.
A Sophisticated Attack
A more sophisticated session fixation attack is one that first initiates a session on the target site, optionally keeps the session from timing out, and then executes the steps mentioned previously. An alternative to the approach used in Listing 2 is to call session_regenerate_id() whenever a user successfully logs in, since this is the moment the session data becomes sensitive for most applications. For example, whenever you validate a user's username and password, you might set a session variable that indicates success:
$_SESSION['logged_in'] = true;
Just prior to setting such a session variable, a call to session_regenerate_id() can help to protect against a session fixation attack:
session_regenerate_id();
$_SESSION['logged_in'] = true;
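Put into the context of a login step, the idea might be sketched as follows. The function complete_login(), its parameters, and the injectable $regenerate callback are all hypothetical; the callback exists only so the sketch can be exercised outside a web request, and in a real script it would stay at its default of session_regenerate_id.

```php
<?php
// Hypothetical login step. $credentialsValid stands in for whatever
// username/password check the application performs.
function complete_login(array &$session, $credentialsValid, $regenerate = 'session_regenerate_id')
{
    if (!$credentialsValid) {
        return false; // nothing changes for a failed login
    }

    // Swap the identifier before the session starts carrying privileges,
    // so an identifier planted by an attacker is no longer the live one.
    call_user_func($regenerate);
    $session['logged_in'] = true;

    return true;
}
```

In an application this would be called after session_start(), for example as complete_login($_SESSION, $passwordMatches), where $passwordMatches is the result of the application's own check.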
In fact, a good approach is to always regenerate the session identifier whenever the user's privilege level changes at all, including situations where the user must re-authenticate due to a timeout. By doing this, you can be sure that a session fixation vulnerability is not the weakest aspect of your access control mechanism. This approach is more secure than the previous example, because it adds another significant obstacle for an attacker to overcome, and it prevents sophisticated attacks where a valid session is first created and maintained. Unfortunately, it still has at least one weakness, although your application design may already prevent it.
An Advanced Attack
In the most advanced type of session fixation attack, the attacker first obtains a valid account on the target application, and this is typically only appealing when the attacker can do so anonymously. On some PHP applications, the login page is a separate script, such as login.php, and this script may not check the user's state, because it seems safe to assume that the user has not been authenticated. On the contrary, this approach can allow an attacker to create a session, log into the application with that session, optionally keep the session from timing out, and use the URL to the login page to launch the attack. If the login page accepts the new user's login but fails to regenerate the session identifier (because the privilege level has not changed), a vulnerability exists. This scenario may seem unlikely, but a thorough examination of your code with this situation in mind is well worth your time. There are two easy ways to prevent this particular issue:
1. Have the login page recognize the user's state.
2. Always regenerate the session identifier on the receiving script, regardless of the user's state.
Until Next Time...
A good generic recommendation for preventing session fixation attacks is to regenerate the session identifier anytime the user provides authentication information of any kind. Be wary of passing along such a simplistic catch-all suggestion, however, because misinterpretations are likely when someone is unfamiliar with the type of attack being prevented. There is no substitute for a good understanding of session fixation, and it is possible that the best prevention for your applications is not even mentioned in this article. Hopefully, you can now eliminate session fixation from your list of serious security risks with which to be concerned. If you develop a particularly creative method of prevention, I would love to hear it. Until next month, be safe.
About the Author
Chris Shiflett is a frequent contributor to the PHP community and one of the leading security experts in the field. His solutions to security problems are often used as points of reference, and these solutions are showcased in his talks at conferences such as ApacheCon and the O'Reilly Open Source Convention, his answers to questions on mailing lists such as PHP-General and NYPHP-Talk, and his articles in publications such as PHP Magazine and php|architect. Security Corner, his new monthly column for php|architect, is the industry's first and foremost PHP security column. Chris is the author of the HTTP Developer's Handbook, published by Sams Publishing, and is currently writing PHP Security, to be published by O'Reilly and Associates. In order to help bolster the strength of the PHP community, he is also leading an effort to create a PHP community site at PHPCommunity.org. You can contact him at [email protected] or visit his Web site at http://shiflett.org/.
Tips & Tricks By John W. Holmes
Using FULLTEXT Searches to Prevent Duplicates
Depending upon the purpose of your website, you may have to deal with the issue of duplicates: duplicate news posts, forum posts, articles, reviews, or even malicious users quickly posting spam wherever they can. You may be able to make use of a FULLTEXT search on the user’s data before it’s put in the database to either catch or flag duplicates, though.
The premise is that a FULLTEXT search of the complete data from the user on your entire table of content will catch submissions that are exactly the same or even similar. Catching the same exact data is easy, as a simple comparison will work, but to really make this worthwhile, we want to catch data that is similar. That’s where the FULLTEXT search comes in handy. I’m going to talk about and show examples for how to handle this with MySQL, but other database systems should have a similar method to do this.
When running a FULLTEXT search, MySQL will return a relevancy number. If more keywords in your data are found within the content of the particular row, MySQL will assign it a higher relevancy number. This relevancy is the key to catching duplicates that are similar, but not exactly the same. So even though two news items on the latest accomplishments of “Spirit on Mars” are not exactly the same, they’ll contain enough key words about the most recent event to trigger a high enough relevancy that you can catch or flag.
This relevancy “threshold” is something that would need to be tweaked depending upon your specific application, though. It will require you to do a little testing with some sample data from your application. I did some tests with the current news from Slashdot (www.slashdot.org), for example, and determined that a relevancy number of 30 was adequate. This would allow for news items of the same general topic (Linux, SCO, Mars, etc.) to be posted, yet flag news items that were very similar, i.e. a sentence removed or reworded, yet the same general news. Listing 1 shows some simple code that you can use to get you started on this. Add your articles, news, etc., and continue adjusting the relevancy threshold until you reach a point where duplicates are caught, but same-topic items are not.
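As a rough illustration of the idea, independent of Listing 1, the relevancy check might be sketched like this. The table name articles, the column body (assumed to carry a FULLTEXT index), the function names, and the use of PDO with bound parameters are all assumptions for the sake of the example, not the column's original code.

```php
<?php
// Hypothetical schema: table `articles` with a FULLTEXT index on `body`.
// MATCH ... AGAINST yields a relevancy number per row; the expression is
// repeated in WHERE because a column alias cannot be referenced there.
function similarity_sql()
{
    return 'SELECT id, MATCH(body) AGAINST(:t1) AS rel
              FROM articles
             WHERE MATCH(body) AGAINST(:t2) > :threshold
             ORDER BY rel DESC';
}

// Returns the rows whose relevancy to $text exceeds $threshold; an empty
// result suggests the submission is not a near-duplicate.
function find_similar(PDO $db, $text, $threshold)
{
    $stmt = $db->prepare(similarity_sql());
    $stmt->execute(array(
        ':t1'        => $text,
        ':t2'        => $text,
        ':threshold' => $threshold,
    ));
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

A submission handler could call find_similar($db, $submittedText, 30) and flag the post for review whenever any rows come back, adjusting the threshold as described above.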
Again, the uses for this really depend upon your application. Some forums use a method like this to find FULLTEXT matches against the subject of a new thread and provide a list of possibly “related” threads. Running a match against a user’s last 5 posts and flagging the users with a high relevancy may catch those who are spamming forums with the same or similar posts (like those web hosting companies). News sites with multiple people managing submissions may be able to reduce the number of duplicate stories making it to the web site. If you come up with a use for a method like this, let me know and we may feature your implementation of this idea in a future issue.

Listing 1

    [...]
    {$_POST['rel']}";
    $rs = mysql_query($query) or die("1: " . mysql_error());

    if(mysql_num_rows($rs))
    {
        echo '
    [...]

Using the MySQL PASSWORD() Function… Don’t.

If you’re using the PASSWORD() function in your applications to control your own user system, you shouldn’t be. For one thing, and this is the main news of this tip, MySQL 4.1 is changing how the hashing function works so that it will produce a 41-character hash instead of the 16 characters it uses now. So the CHAR(16) columns that
are being used now will not be long enough. The length also changed within the 4.1 series itself: the preliminary format in MySQL 4.1.0 was 45 characters long, and 4.1.1 settled on 41. This should be transparent to users, though, as these functions should only be used by the MySQL authentication system and not within your own applications:

"An upgrade to MySQL 4.1 can cause a compatibility issue for applications that use PASSWORD() to generate passwords for their own purposes. (Applications really should not do this, because PASSWORD() should be used only to manage passwords for MySQL accounts. But some applications use PASSWORD() for their own purposes anyway.)" (http://www.mysql.com/doc/en/Password_hashing.html)

The other thing to take from this is that pre-4.1 clients will not be able to connect to 4.1 servers. This means that all of the various GUI programs for interfacing with MySQL will need to be updated with the new client. This shouldn't be a big deal for those still under active development, but unsupported yet still popular programs may finally have to be dropped entirely.

Print or Echo?

Most people think that print and echo are, essentially, two interchangeable PHP keywords. For most practical purposes, that's true enough: both will send a string to the script's output. However, they are not exactly the same. echo is a language construct, which the parser built into the PHP interpreter transforms directly into a special set of commands. It has a special syntax (for example, it requires no parentheses) and cannot be used as part of an expression. print, on the other hand, behaves like a function whose return value is always int(1). Therefore, you could, in theory, use print as part of an expression, although for all practical purposes that's pretty pointless.

No Dumping

While we're on the subject of printing information, let's talk about var_dump(). This function is often used as a quick-and-dirty method for debugging scripts.
You know the drill: get to a problematic spot in your code, and dump one or more variables to the screen. When you're working on a website, the problem with this technique is that the output of var_dump() contains no HTML formatting and is, therefore, difficult to read in a browser. In particular, if you're outputting a very large array that contains many nested elements, everything will be displayed on the same line and create a confusing mess. The simplest way of making the output readable (short of writing your own version of var_dump() or using the PEAR::Var_dump package) is to output a <pre> tag before your actual output:

    echo "<pre>";
    var_dump ($my_var);
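To see why the <pre> tag helps, it can be useful to look at what var_dump() actually emits. The sketch below captures the dump of a made-up $my_var with output buffering and counts its line breaks. The sample data is invented, and the htmlspecialchars() call is an addition that is not part of the original tip; it is there so that values containing HTML are displayed rather than rendered by the browser.

```php
<?php
// Sample data, invented for illustration.
$my_var = array('rover' => 'Spirit', 'tags' => array('mars', 'nasa'));

// Capture what var_dump() would have printed.
ob_start();
var_dump($my_var);
$dump = ob_get_clean();

// The dump is plain text spread over several lines; a browser collapses
// those line breaks unless the text sits inside a <pre> element.
$line_breaks = substr_count($dump, "\n");

// Escape the dump so values containing HTML are shown, not rendered.
$html = '<pre>' . htmlspecialchars($dump) . '</pre>';
echo $html;
```

The line breaks are already in the dump, so all the <pre> element does is stop the browser from folding them into a single line.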
However, you now need to write two lines of code whenever you want to dump something to the browser, and if you have to do a lot of mucking around to find where your problem is, this approach can become very time-consuming. Luckily, var_dump() supports a variable number of parameters, so you can dump more than one variable with a single call. Therefore, all you really need to do is change your code ever so slightly to include the <pre> tag in your call to var_dump():

    var_dump ('<pre>', $my_var);

You can even add a closing </pre> tag at the end if you want to leave the remainder of the script's output undisturbed.

Let's Talk About Other Databases and PHP5

Most tips here and there deal with PHP interacting with MySQL, but there are hosts of other databases out there that seamlessly interact with PHP. I write about what I use, but I encourage any of the readers out there to send in your tips, tricks, and knowledge about using other database systems with PHP. Let's see some PostgreSQL, Firebird, Oracle and even MSSQL tips on here. If you're using one of these databases, you're not alone. Share the tricks that make your life easier with the rest of us!

Also, those of you not already playing with PHP5 should be. Those of you who already are, send in your tips and tricks for converting common bits of code from PHP4 to PHP5. These are the kind of tips people are going to be looking for in the coming months when, hopefully, PHP5 comes out officially.
About the Author
John Holmes is a Captain in the U.S. Army and a freelance PHP and MySQL programmer. He has been programming in PHP for over 4 years and loves every minute of it. He is currently serving at Ft. Gordon, Georgia as a Company Commander with his wife and two sons.
February 2004 ● PHP Architect ● www.phparch.com
Why Can’t We All Just Get Along?
exit(0);
By Marco Tabini
Sometime during the month of February, I happen to celebrate my birthday. For some people, birthdays are a cause for happiness and, if you're young enough, for lots and lots of beautiful gifts. Being past the beautiful gift stage myself, however, the cause for happiness is long gone. Even though I am not old (and wouldn't admit to being old even if I were old, which, in case you missed it, I most definitely am not), another birthday means that yet another year has gone by without me finally overtaking Bill Gates as the world's nerdiest man. Finding myself in the month of February, therefore, is the equivalent of stepping into a puddle of chocolate ice cream only to find out that it wasn't ice cream after all, and someone forgot all about the "scoop the poop" rule at the local park.

Along with me being one step closer to Programmer Heaven (that mystical place where there are no computers whatsoever and the most advanced form of technology is the "light switch"), everything else is also one year older, including PHP. Granted, it might be slightly self-centered to measure the advancing age
of PHP against the advancing age of yours truly, but it helps me put things into perspective.

Where was PHP a year ago? Well, I'm tempted to say that it was pretty much in the same state as it is today, but that would be a gross oversimplification. Leaving PHP5 aside for a moment, PHP4, the "stable" PHP platform, has not changed dramatically since then, although some of the new features that have become available have made a bit of an impact. The introduction of file_get_contents(), for example, ended once and for all the need to write those ultra-hated three lines of code where just one is more than enough.

What has really changed in the last year is the level of familiarity and confidence that people have with PHP. I see it, in particular, in the complexity of the articles that reach our office these days. When we started out, our goal was to delve into the lesser-known and more advanced capabilities of PHP, and that we did. The articles that we receive today, however, sometimes surpass even my wildest dreams. This month, for example, we feature an article on
writing an SMS gateway using nothing more than PHP, an open-source package and a common Nokia cell phone. It almost sounds like something taken out of a MacGyver episode; I'm sure the Swiss Army Man himself would approve. Surely, a messaging solution is not everyone's bag but, to be able to consider developing an application like that, one has to be very confident in the capabilities of the chosen platform. And that we are, because PHP has shown that it is here to stay, and that it has the ability to deliver and to deliver consistently.

Unfortunately, our ever-increasing confidence in PHP4 is going to make adopting PHP5 much more difficult. For better or worse, intimate knowledge of a development platform means learning to cope with its deficiencies, and sometimes to take advantage of them in ways that its original designers never anticipated. The history of computer programming is full of cases where this tenet holds true. For example, for those of you old (and crazy) enough to remember, some enthusiasts learned that the feature that made it possible to change the border colour of the Apple IIGS' screen could be used to write text and draw animations outside of the actual addressable area of the screen, and thus created some of the most memorable demos seen until that day. (Sadly, the GS died an awful death shortly thereafter, so the computer I bought with six months' worth of savings is now safely stowed away out of reach of my wife's Spring Cleaning.)

On a more mundane level, a bug in the on-chip microcode for the original Intel 8088 CPU was once used by the IBM PC BIOS to determine whether a computer was equipped with an 8088 or an 8086. When the 286 and later x86 models came out, a different feature of the processor, originally intended for other uses, was exploited for identification, until Intel finally caught on and started to provide a CPUID command as part of the Pentium's machine language
instruction set.

The underlying characteristic of all these examples is that they made use of features in ways that the original designers of their platforms had not anticipated. Given that we've had a few years to get intimately acquainted with it, it's more than likely that something similar has happened with PHP4, and all sorts of things will start breaking once PHP5 is officially introduced. I know at least some of my code will (taking shortcuts, also known as "ugly hacks", with OOP in PHP4 is all but inevitable), and I won't be too happy about it.

How does one go about addressing a problem like this? Well, I envision one of three possible solutions:

• PHP5 features a "PHP4Compatibility" mode. The likelihood of something like this happening is pretty much on the same level as Canada winning the Soccer World Cup (for those who don't follow soccer, let me be a bit less cryptic: it's not going to happen). Even if it were technically feasible, it would be a pretty dumb solution to the problem. I can only begin to fathom the disastrous consequences of trying to simulate PHP4's erroneous handling of OOP in PHP5.

• The "you pays your moneys and you takes your chances" approach: no facility is created to allow for the peaceful coexistence of PHP4 and PHP5 code in the same project (or on the same server). In my opinion, this would also be a pretty bad move, given that most people can't afford to run two servers just to be able to use both platforms, and no one wants to rewrite entire applications unless they really have to.
• PHP4 and PHP5 are made to work well together. Being able to run both PHP4 and PHP5 code on the same server, so that new code can run alongside old code, would be as excellent an outcome as one could possibly hope for.

Let's take a closer look at this last possibility. The best possible scenario would be to run both PHP4 and PHP5 scripts on the same server and let them share everything, from cookies to session information. On some level, this may be feasible, but I think that, more realistically, one should aim lower and simply hope to be able to run PHP4 and PHP5 projects on the same server. With this approach, if, say, MySQLAdmin or Gallery don't work with PHP5, you can still use them together with your other PHP5-compatible websites without breaking the bank.

In order for this to happen, I think that at least two events must take place. First, the PHP Team must ensure that such coexistence is possible, and that crystal clear instructions are provided for everybody who wants to take advantage of it. In particular, the instructions should be easy to implement for hosting companies: considering the level of competitiveness and the low profit margins they are used to working with, a simpler solution will mean that more providers will be able to take advantage of it, resulting in cheaper and more stable hosting for everyone.

Another important step in the direction of better coexistence between different versions of PHP is in the naming convention for PHP files. Right now, we're all used to ending our files in .php. However, when PHP5 comes along, if we want to be able to use both PHP4 and PHP5 on the same server, we'll have to find another solution, since most web servers determine a file's type by its extension. If you're planning to write PHP5-only software, it might be worth thinking about using something different, such as .php5, for example. This is not a new idea (it's already commonly done with PHP3 code, which ends in .php3); I just think that it should be done as a matter of routine for every version of the language.

The final nail in the coffin of PHP4 will be proper user education. As of this writing, documentation for PHP5 is still rather sparse, and it will probably take a while before that changes in a meaningful way. It's important to understand that by "documentation" I don't mean the run-of-the-mill references, be it online or in print, that tell you how the new OOP functionality or the new extensions work. I'm referring to those resources that tell you how the new features of the language can be used. There is a major difference between explaining what an exception is and showing someone that, in practical terms, it can be used to cut down and simplify error checking to a minimum without any appreciable loss of functionality, just like there is a difference between showing you how to access a database and using that functionality to build an SMS gateway.

The problem with making these "advanced" techniques available to the general public is that... there has to be someone capable of understanding them and putting them to practical use first, and this is more likely to happen with a mature and well-understood language like PHP4 than with a new (and in some cases dramatically improved) platform like PHP5. One has to start somewhere, however, and hopefully we'll see more and more PHP5 tutorials on the web. As always, we'll try to bring you a good balance: introducing new features that are waiting for you in PHP5, but keeping a watchful eye on the fact that, on a day-to-day basis, we are all still more likely to use PHP4 for quite a while.
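To make the point about exceptions a little more concrete, here is a minimal sketch of how a single try block can stand in for a check after every call. The three functions and the "port=" setting are hypothetical, invented purely for illustration, and the exception classes used are the standard ones that ship with PHP5's SPL.

```php
<?php
// Hypothetical helpers, invented for illustration.

// Read a configuration file, or throw if it cannot be read.
function load_config($path)
{
    if (!is_readable($path)) {
        throw new RuntimeException("Cannot read $path");
    }
    return file_get_contents($path);
}

// Extract a "port=NNNN" setting, or throw if there is none.
function parse_port($config)
{
    if (!preg_match('/^port=(\d+)$/m', $config, $m)) {
        throw new UnexpectedValueException('No port setting found');
    }
    return (int) $m[1];
}

// One try block replaces a separate error check after every call.
function configured_port($path, $default)
{
    try {
        return parse_port(load_config($path));
    } catch (Exception $e) {
        return $default;
    }
}
```

Whichever step fails, the caller simply gets the default back, and the code in between stays free of return-value checks.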