This copy is registered to: Rodney Burruss [email protected]

VIRTUAL CLASSROOMS

Online Training Courses from php|architect

Zend PHP Essentials

Our introductory PHP course, Zend PHP Essentials, was developed for us and Zend Technologies by PHP expert Chris Shiflett, co-founder of the PHP Security Consortium. This 19-hour course provides a thorough introduction to PHP development, with particular care to "doing things right" by covering security, performance and the best development techniques. Rather than cramming as much theory as possible, PHP Essentials provides a thoroughly practical approach to learning PHP, thus ensuring that each student will be able to write good PHP code in a real-world setting by the end of the course.

Zend PHP Certification Training

If you want to become a Zend Certified Engineer, this course is the best preparation tool that you'll ever find! Designed by some of the same Subject Matter Experts who also helped write the exam itself, this course covers every single topic that is part of the exam. The Zend PHP Certification Training (course) provides a complete overview of the exam, and doubles as an excellent refresher course in PHP for any developer.

Zend Professional PHP Development

This is our advanced course for the professional PHP developer. This course picks up from where PHP Essentials ends and provides a thorough, in-depth analysis of advanced features found in both PHP 4 and PHP 5, including object-oriented programming and design patterns, XML development, regular expressions, encryption, e-mail manipulation, performance management and advanced databases.

Course: Zend PHP Essentials
Description: Covers PHP 4 and PHP 5; provides a thorough practical introduction to PHP; covers security and performance
Start Dates: Every month
Duration: 7 Sessions, 19 Hours, 3 Weeks
Tutoring: Yes
Prerequisites: None
Cost: $769.99 US ($999.99 CAD)

Course: Zend PHP Certification Training
Description: Covers every topic in the exam; provides an excellent refresher course for PHP at all levels
Start Dates: Every month
Duration: 7 Sessions, 19 Hours, 3 Weeks
Tutoring: Yes
Prerequisites: Zend PHP Essentials
Cost: $644.99 US ($838.99 CAD)

Course: Zend Professional PHP Development
Description: Covers advanced PHP 4 and PHP 5 topics; perfect for going "beyond the basics" and learning the true power of PHP
Start Dates: Every month
Duration: 7 Sessions, 19 Hours, 3 Weeks
Tutoring: Yes
Prerequisites: Zend PHP Essentials
Cost: $769.99 US ($999.99 CAD)

• All our courses are delivered entirely online using an innovative system that combines the convenience of the Internet with the unique experience of being in a real classroom.
• All sessions take place in real time, and the students can interact directly with the instructor, via voice or text messaging, as if they were in a real classroom.
• In most cases, our system requires no software installation and works with the majority of operating systems and browsers, including Windows, Mac OS and Linux, as well as Internet Explorer, Firefox and Safari.
• All courses include a generous amount of homework and in-class exercises to ensure that the students assimilate each topic thoroughly.
• Tutoring is available (via e-mail) throughout the duration of the entire course.
• Each class includes a complete set of recordings that the students can peruse at their leisure.

For more information, visit our website at http://www.phparch.com/phptraining or call us toll-free at (877) 630-6202 (416-630-6202 outside Canada and the U.S.)

09.2005

DEPARTMENTS

EDITORIAL: The Whining Stops Here
WHAT’S NEW
TIPS & TRICKS: Input Filtering, Part 3: Ensuring Input Received is Input Expected, by Ben Ramsey
TEST PATTERN: State of Confusion, by Marcus Baker
PRODUCT REVIEW: FUDforum 2.7.1, by Peter B. MacIntyre
SECURITY CORNER: PHP Security Audits, by Chris Shiflett
Exit(0);: Atomic Orange, by Marco Tabini

FEATURES

Roll Your Own Database Abstraction Module, by Jason Lustig
An Introduction to PDO: Uniform Database Access in PHP 5.x, by Ilia Alshanetsky
What are Trackbacks and Why Do They Exist?, by Chris Cornutt
End-to-End Testing with PHP and Internet Explorer, by Oz Solomon

Download this month’s code at: http://www.phparch.com/code/

EDITORIAL

php|architect™

The Whining Stops Here

PHP has long been attacked by those who like to complain, usually about parts of the language that “don’t [quite] work properly,” or issues that have sprung up as a result of PHP’s constant evolution (but reluctance to break backwards-compatibility). How many times have you had to consult the manual to refresh your memory on the order of the needle and haystack parameters? Unfortunately, there’s no way to “fix” this particular issue without breaking every script in the history of PHP that has ever used the in_array() function.

Bogus complaints aside, one actually valid argument against PHP that I’ve seen recurring amongst the pundits is the lack of a built-in, common database access mechanism. Sure, there are a number of database abstraction packages floating around the PHP world. Some of these are even quite mature and feature-rich. Still, none have been bundled with PHP (with the exception of PEAR::DB), nor have they received the de facto PHP Core Seal of Approval.

Enter PHP Data Objects (PDO), one of the first (if not the first) true, compiled PHP extensions to allow uniform database access for the majority of popular database platforms. Not only is it an actual PHP extension rather than the more common PHP user-land code (which generally means that the code will be fast, and PDO meets this expectation), but it will also be bundled with PHP 5.1, which should be released “Real Soon Now.” This is great news for everyone who uses PHP to communicate with a database.

One of the main PDO developers, and a name you’re likely to recognize, Ilia Alshanetsky, has written an introduction to this wonderful new extension, and we’re proud to be running it in this issue. If you’re anxious to try out PDO, but not so anxious as to immediately upgrade to PHP 5.1 (or a release candidate), the extension has been available in PECL for a while now for anyone who is running at least PHP 5.0.

Back to the pundits: one thing to remember in this argument is that PDO doesn’t claim to be a database abstraction layer, but a common database access interface. True database abstraction is nearly impossible to maintain. Consider database-specific SQL, such as MySQL’s NOW() versus MSSQL’s GETDATE(). So, PDO aptly defers this behavior to the user and doesn’t attempt to rewrite queries (for the most part; see the part of the article that discusses prepared statements and emulation). That’s why another approach, such as the one described in Jason Lustig’s piece (in this issue), would lend itself nicely to a common access interface such as PDO. Jason’s code could easily accommodate PDO, while allowing the user to specify RDBMS-specific SQL.

Looks like the PHP-haters will have to find something else to whine about. In the meantime, we PHP-lovers will go about our lives, eating up new features with enthusiasm. Happy reading!

Volume IV - Issue 9 September, 2005

Publisher Marco Tabini

Editor-in-Chief Sean Coates

Editorial Team Arbi Arzoumani Peter MacIntyre Eddie Peloke

Graphics & Layout Aleksandar Ilievski

Managing Editor Emanuela Corso

News Editor Leslie Hill [email protected]

Authors Ilia Alshanetsky, Marcus Baker, Chris Cornutt, Jason Lustig, Peter B. MacIntyre, Ben Ramsey, Chris Shiflett, Oz Solomon

php|architect (ISSN 1709-7169) is published twelve times a year by Marco Tabini & Associates, Inc., P.O. Box 54526, 1771 Avenue Road, Toronto, ON M5M 4N5, Canada. Although all possible care has been placed in assuring the accuracy of the contents of this magazine, including all associated source code, listings and figures, the publisher assumes no responsibilities with regards of use of the information contained herein or in all associated material.

Contact Information: General mailbox: [email protected] Editorial: [email protected] Subscriptions: [email protected] Sales & advertising: [email protected] Technical support: [email protected] Copyright © 2003-2005 Marco Tabini & Associates, Inc. — All Rights Reserved

September 2005 ● PHP Architect ● www.phparch.com

What’s New

PHP 5.1 RC 1

php.net announces the release of PHP 5.1 RC 1.

"PHP 5.1 Release Candidate 1 is now available! If all goes well, this RC will be followed by a release within a couple of weeks. Some of the key improvements of PHP 5.1 include:
• PDO (PHP Data Objects) - A new native database abstraction layer providing performance, ease-of-use, and flexibility.
• Significantly improved language performance mainly due to the new Zend Engine II execution architecture.
• The PCRE extension has been updated to PCRE 6.2.
• Many more improvements including lots of new functionality & many bug fixes, especially in regards to SOAP, streams and SPL.
• See the bundled NEWS file for a more complete list of changes.
Everyone is encouraged to download and test this beta, although it is not yet recommended for mission-critical production use."

Get your hands on the latest release at php.net.

MySQL 4.1.14

MySQL announces the release of version 4.1.14. Some new changes include:
• SHOW CHARACTER SET and INFORMATION_SCHEMA now properly report the Latin1 character set as cp1252.
• MySQL Cluster: A new -P option is available for use with the ndb_mgmd client. When called with this option, ndb_mgmd prints all configuration data to stdout, then exits.
• The output of perror --help now displays the --ndb option.
• NDB: Improved handling of the configuration variables NoOfPagesToDiskDuringRestartACC, NoOfPagesToDiskAfterRestartACC, NoOfPagesToDiskDuringRestartTUP, and NoOfPagesToDiskAfterRestartTUP should result in noticeably faster startup times for MySQL Cluster.
• Added support of WHERE clause for queries with FROM DUAL.
• Added an optimization that avoids key access with NULL keys for the ref method when used in outer joins.
• Added new query cache test for the embedded server to the test suite, there are now specific tests for the embedded and nonembedded servers.
• Release also contains several bug fixes.

Grab the latest release from mysql.com.

phpGroupWare 0.9.16.008

The phpGroupWare team is proud to announce their latest release, 0.9.16.008. What is phpGroupWare? phpGroupWare.org describes it as:

"phpGroupWare-formerly known as webdistro-is a multi-user groupware suite written in PHP. It provides about 50 web-based applications, such as Calendar, Address Book, an advanced Projects manager, To Do List, Notes, Email, Newsgroup and Headlines Reader, a File Manager and many more applications. The calendar supports repeating events and includes alarm functions. The email system supports inline graphics and file attachments. The system as a whole supports user preferences, themes, user permissions, multi-language support and user groups. It includes modules to set up and administer the working environment. The groupware suite is based on an advanced Application Programming Interface (API)."

Get more info at phpGroupWare.org.


Check out some of the hottest new releases from PEAR.

MP3_ID 1.2.0RC2 This class offers methods for reading and writing information tags (version 1) in MP3 files.

File_Find 1.0.0 File_Find, created as a replacement for its Perl counterpart, also named File_Find, is a directory searcher that handles globbing and recursive directory searching, as well as a slew of other cool features.

PHPUnit 1.3.0 PHPUnit is a regression testing framework used by developers to implement unit tests in PHP. This version is to be used with PHP 4.

Mail 1.1.8 PEAR's Mail package defines an interface for implementing mailers under the PEAR hierarchy. It also provides supporting functions that are useful to multiple mailer backends. Currently supported backends include: PHP's native mail() function, sendmail, and SMTP. This package also provides a RFC822 email address list validation utility class.

DB_DataObject_FormBuilder 0.18.1 DB_DataObject_FormBuilder will aid you in rapid application development using the DB_DataObject and HTML_QuickForm packages. For a quick, but working, prototype of your application, simply model the database, run DataObject's createTable script over it, and write a script that passes one of the resulting objects to the FormBuilder class. The FormBuilder will automatically generate a simple but working HTML_QuickForm object that you can use to test your application. It also provides a processing method that will automatically detect if an insert() or update() command has to be executed after the form has been submitted. If you have set up DataObject's links.ini file correctly, it will also automatically detect if a table field is a foreign key and will populate a select box with the linked table's entries. There are many optional parameters that you can place in your DataObjects.ini or in the properties of your derived classes, and will be used to fine-tune the form generation, gradually turning the prototypes into fully-featured forms. You can take control at any stage of the process.

Net_Curl 1.2.2 Provides an OO interface to PHP's curl extension.

php|architect Releases New Design Patterns Book

We're proud to announce the release of php|architect's Guide to PHP Design Patterns, the latest release in our Nanobook series. You have probably heard a lot about Design Patterns, a technique that helps you design rock-solid solutions to practical problems that programmers everywhere encounter in their day-to-day work. Even though there has been a lot of buzz, however, no-one has yet come up with a comprehensive resource on design patterns for PHP developers, until today. Author Jason E. Sweat's book, php|architect's Guide to PHP Design Patterns, is the first comprehensive guide to design patterns written specifically for the PHP developer. This book includes coverage of 16 design patterns with a specific eye to their applications in PHP when building complex web applications, both in PHP 4 and PHP 5 (where appropriate, sample code for both versions of the language is provided). For more information, visit http://www.phparch.com/shop_product.php?itemid=96.


Looking for a new PHP Extension? Check out some of the latest offerings from PECL.

pecl_http 0.12.0 pecl_http's features and functionality include:
• Building absolute URIs
• RFC compliant HTTP redirects
• RFC compliant HTTP date handling
• Parsing of HTTP headers and messages
• Caching by "Last-Modified" and/or ETag (with 'on the fly' option for ETag generation from buffered output)
• Sending data/files/streams with (multiple) ranges support
• Negotiating user preferred language/charset
• Convenient request functions built upon libcurl
• PHP 5 classes: HttpUtil, HttpResponse, HttpRequest, HttpRequestPool, HttpMessage

APC 3.0.8 APC is the Alternative PHP Cache. It was conceived of to provide a free, open, and robust framework for caching and optimizing PHP intermediate code.

ingres 1.1 This extension supports Computer Associates's Ingres Relational Database.

DTrace 1.0.2 Allows Solaris' dtrace to instrument PHP.

PHPEd 4.0

NuSphere announces the latest release of their php IDE: PHPEd 4.0. The announcement lists some of the main features of the new release as:
• Advanced, efficient and highly customizable EDITOR with support for object-oriented coding. Code highlighter, user-defined shortcuts, instant syntax analysis, code insight, code templates and much more.
• Sophisticated PHP DEBUGGER that can operate both locally and in the remote mode. Debugger module for the latest php version 5.0.4 is included in the package.
• PHP PROFILER. PhpED profiler shows executing time for each line, function or module of the code with tenth milliseconds precision. All the bottlenecks in the code are located quickly and efficiently.
• Project-wide CODE EXPLORER in PhpED IDE shows all php classes, methods, properties, functions and variables in every detail.
• Enhanced project management and deployment. Support for FTPS (TLS/SSL), SFTP and WebDAV/HTTPS (SSL) protocols make deployment and data transfer secure.
• Integrated MySQL, MSSQL, Oracle and UltraSQL/PostgreSQL clients. Connect to a database directly from the IDE. Browse databases, run SQL queries and work with database content without leaving the IDE.
• Integrated CVS client. Review changes in old versions of a source files to track bugs while working on the same project in a team of developers.
• NuSOAP Wizard. Easily build professional web services in PHP using the NuSoap library.
• Enhanced integration. PhpED IDE can be easily integrated with 3rd party tools. The product is delivered with the embedded CSE HTML Validator LITE and PolyStyle Formatter. PhpED IDE includes a number of pre-configured tools like PHP documentor, HTML Tidy and a CVS client.
• Support for international character sets, including UTF-8. True Unicode editing is now available. PhpED IDE can be used to create web sites in different encodings and natural languages.

For all the latest info, visit NuSphere.com.


TIPS & TRICKS

Input Filtering, Part 3: Ensuring Input Received is Input Expected by Ben Ramsey

This year has seen an increased focus on PHP security, and this is good for the language, developers, and business community. One phrase that comes to mind when discussing secure coding practices is Chris Shiflett’s mantra of “filter input, escape output.” While we know what this means in a general sense, practical examples elude us. This month’s installment of Tips & Tricks concludes the series on filtering input, providing practical examples and helpful tips to filter input using regular expressions, test for the length of data, and ensure acceptable values.

REQUIREMENTS
PHP: n/a
CODE DIRECTORY: tips

Part one of this series introduced the need to filter input and explained why all input, whether from a user or an RSS feed, should be considered tainted. I also introduced the whitelist approach as a best practice for filtering input. Part two further explained the whitelist approach, exploring the use of the ctype functions as excellent tools to implement a whitelist-based filter.

Recall from parts one and two the HTML form used for discussion. I have included a modified version of this form in Listing 1. For the purposes of the present discussion, I have added the age, color, and username fields. Listing 2 shows the processing form as seen at the end of part two.

Rounding out my three-part series on filtering input, this installment of Tips & Tricks includes discussion on using regular expressions to filter input, testing for the length of input, and ensuring the presence of acceptable values (e.g. from select, radio, or checkbox form fields).

Listing 1

[The markup of the HTML form did not survive in this copy; only the listing's line numbers remain.]

Listing 2

    <?php
    function filter($input, $whitelist) {
        $clean = array();
        foreach ($input as $key => $value) {
            // Only fields named in the whitelist are considered at all.
            if (array_key_exists($key, $whitelist)) {
                switch ($whitelist[$key]) {
                    case 'string':
                        $clean[$key] = (ctype_print($value)) ? $value : '';
                        break;
                    case 'int':
                        $clean[$key] = (ctype_digit($value)) ? $value : '';
                        break;
                }
            }
        }
        return $clean;
    }

    $post_whitelist = array(
        'name'        => 'string',
        'street'      => 'string',
        'city'        => 'string',
        'state'       => 'string',
        'postal_code' => 'int',
        'phone'       => 'string',
        'email'       => 'string'
    );

    if ($_POST) {
        $clean = filter($_POST, $post_whitelist);
    }
    ?>

Filtering with Regular Expressions

In last month’s column, I discussed using PHP’s built-in character type (ctype) functions to filter input. When application design allows, the ctype functions provide a fast and easy-to-use interface to implement a whitelist approach to filtering input. However, application design doesn’t always allow this, and the ctype functions lack flexibility. For example, ctype_alpha() only checks for alphabetic characters, while ctype_digit() checks for only numeric characters. ctype_alnum() checks for both, but then it doesn’t allow for the presence of spaces, underscores, hyphens, or any other non-alphanumeric characters (nor do the previous two mentioned functions). On the other hand, ctype_print() is too open, allowing all printable characters, and this isn’t always a desired approach.

When you know exactly what characters you want to allow, it’s best to restrict input to those characters, and only those characters. So, ctype_alnum() is good for usernames, and ctype_digit() is good for five-digit U.S. zip codes, but ctype_print() isn’t necessarily good for a first and last name, an email address, or a phone number. Good application design defines what characters these fields should accept; good filtering accepts only these characters.

Enter PHP’s Perl-Compatible Regular Expression (PCRE) functions. These functions make up for their slowness (as compared to the ctype functions) with increased flexibility and power. Regular expressions can be used to match just about anything and can perform some amazing tasks.

Take, for example, the name field in Listing 1. In Listing 2, I define it as a “string” type and then the filter() function filters it using ctype_print(). The decision to use ctype_print() over ctype_alpha() should be clear: I wanted to allow users to enter a space between their first and last names. However, now users can enter all sorts of random characters, characters that should not be acceptable for a name, so I turn to a regular expression to match a name. First, I come up with the following to replace the ctype_print() function:

    $clean[$key] = (preg_match('/^[A-Z ]*$/i', $value)) ? $value : '';
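To make the trade-off between the ctype checks and this first pattern concrete, here is a minimal sketch. The helper name is_plain_name() and the three sample inputs are illustrative assumptions, not part of the original listings.

```php
<?php
// Hypothetical helper wrapping the first name pattern from the text:
// letters and spaces only, case-insensitive because of the /i flag.
function is_plain_name($value) {
    return preg_match('/^[A-Z ]*$/i', $value) === 1;
}

// Illustrative inputs: a plain name, a name with markup, a name with an apostrophe.
$samples = array('Ben Ramsey', 'Ben<b>Ramsey', "Tim O'Reilly");

foreach ($samples as $sample) {
    printf(
        "%-14s alpha=%d print=%d name=%d\n",
        $sample,
        ctype_alpha($sample),   // rejects all three: space, <, > and ' are not letters
        ctype_print($sample),   // accepts all three: every character is printable
        is_plain_name($sample)  // accepts only "Ben Ramsey"
    );
}
```

The middle sample shows why ctype_print() is too open for a name field, and the last one shows where the letters-and-spaces pattern becomes too strict.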

That first pattern works well for names such as “Ben Ramsey,” but suppose I want Tim O’Reilly or Tim Berners-Lee to fill out my form; I’ll need to allow more characters. Also, assuming I want to use the “string” type as a general purpose string filter, I’ll want to make the regular expression a bit more liberal, but not too liberal. I’m still in control, so I want to accept only a small range of characters, a range of characters I deem acceptable. A better, “general purpose” regular expression for matching strings is:

    /^[-A-Z0-9\.\'"_ ]*$/i

I won’t go into the particular details of how regular expressions work; there are books and Web sites for that. I will, however, share a few of my preferred regular expressions for filtering standard types of information, such as e-mail addresses, phone numbers, and postal codes.

Looking back at Listing 2, I defined the postal code with the “int” type, which works well in certain circumstances when only the five-digit U.S. zip code is acceptable, but what if I want to accept a zip+4 postal code? These are typically written as “12345-1234,” and will cause ctype_digit() to return FALSE, because of the hyphen. Since the “int” type is useful in other situations (e.g. the age field), I won’t rewrite its definition. Instead, I’ll create a new type for “postal,” and create a regular expression to accept either a five-digit zip code or a zip+4 code (with or without the hyphen).

    /^(\d{5})[\-]?(\d{4})?$/

Likewise, the e-mail and phone number fields in Listing 2 are of the “string” type, but I know that there are acceptable patterns I want to match for both of these. Plus, my existing “string” regular expression doesn’t allow the “@” symbol or parentheses. Thus, I create an “email” type and define its regular expression as:

    /^[^@\s]+@([-a-z0-9]+\.)+[a-z]{2,}$/i

I also create a “phone” type, giving it the following expression:

    /^[(]?(\d{3})[)]?[\s]?[\-]?(\d{3})[\s]?[\-]?(\d{4})[\s]?[x]?(\d*)$/

These two regular expressions will match most e-mail addresses or U.S. phone numbers. In fact, the expression used for phone numbers here can extract all the parts of a standard phone number to the matches parameter of preg_match(), if desired.

It should be noted, however, that the e-mail address regular expression used above will not match some addresses considered compliant according to RFC 822 guidelines. Take the following input, for example: “John Doe (home address) <[email protected]>”. According to RFC 822 guidelines, this full string is acceptable, but the e-mail regular expression will reject it. Also, addresses that contain no TLD, such as jdoe@example, are valid RFC 822 addresses. If RFC 822 compliance is necessary, then Listing 3 provides an alternative e-mail address filtering method using the PEAR::Mail package. This can also be accomplished using imap_rfc822_parse_adrlist() if PHP is compiled --with-imap. If portability is a concern, however, I suggest using the PEAR::Mail package.

Listing 3

[The code of this ten-line listing did not survive in this copy; only its line numbers remain.]

Listing 4

    <?php
    // The first ten lines of this listing did not survive in this copy.
    // The constant definitions and the opening of $post_whitelist are
    // reconstructed from the patterns given in the text and from the
    // constant names used in filter() below.
    define('STRING', '/^[-A-Z0-9\.\'"_ ]*$/i');
    define('EMAIL', '/^[^@\s]+@([-a-z0-9]+\.)+[a-z]{2,}$/i');
    define('PHONE', '/^[(]?(\d{3})[)]?[\s]?[\-]?(\d{3})[\s]?[\-]?(\d{4})[\s]?[x]?(\d*)$/');
    define('POSTAL_US', '/^(\d{5})[\-]?(\d{4})?$/');

    $post_whitelist = array(
        'name' => array(
            'type' => 'string',
            'maxlength' => 50
        ),
        'street' => array(
            'type' => 'string',
            'maxlength' => 100
        ),
        'city' => array(
            'type' => 'string',
            'maxlength' => 50
        ),
        'state' => array(
            'type' => 'option',
            'options' => array(
                'Alabama',
                'Alaska',
                'Arizona'
            )
        ),
        'postal_code' => array(
            'type' => 'postal',
            'maxlength' => 10
        ),
        'phone' => array(
            'type' => 'phone',
            'maxlength' => 25
        ),
        'email' => array(
            'type' => 'email',
            'maxlength' => 255
        ),
        'age' => array(
            'type' => 'int',
            'maxlength' => 3
        ),
        'color' => array(
            'type' => 'option',
            'options' => array(
                'blue',
                'red',
                'green',
                'yellow'
            ),
            'multiselect' => TRUE
        ),
        'username' => array(
            'type' => 'username',
            'maxlength' => 16
        )
    );

    if ($_POST) {
        $clean = filter($_POST, $post_whitelist);
    }

    function filter ($input, $whitelist) {
        $clean = array();
        foreach ($input as $key => $value) {
            if (array_key_exists($key, $whitelist)) {
                $filtered = NULL;
                // Skip any field that is longer than its declared maximum.
                if (isset($whitelist[$key]['maxlength'])
                    && (strlen($value) >
                        $whitelist[$key]['maxlength'])) {
                    continue;
                }
                switch ($whitelist[$key]['type']) {
                    case 'string':
                        $filtered = (preg_match(STRING, $value))
                            ? $value : NULL;
                        break;
                    case 'int':
                        $filtered = (ctype_digit($value))
                            ? $value : NULL;
                        break;
                    case 'option':
                        if (is_array($value)) {
                            // !empty() avoids a notice for fields that do
                            // not define 'multiselect' at all.
                            if (!empty($whitelist[$key]['multiselect'])) {
                                $filtered = array();
                                foreach ($value as $option) {
                                    if (in_array($option,
                                        $whitelist[$key]['options'])) {
                                        $filtered[] = $option;
                                    }
                                }
                            }
                        } else {
                            $filtered =
                                in_array($value, $whitelist[$key]['options'])
                                ? $value : NULL;
                        }
                        break;
                    case 'username':
                        $filtered = (ctype_alnum($value))
                            ? $value : NULL;
                        break;
                    case 'email':
                        $filtered = (preg_match(EMAIL, $value))
                            ? $value : NULL;
                        break;
                    case 'phone':
                        $filtered = (preg_match(PHONE, $value))
                            ? $value : NULL;
                        break;
                    case 'postal':
                        $filtered = (preg_match(POSTAL_US, $value))
                            ? $value : NULL;
                        break;
                }
                // Only values that passed their check reach $clean.
                if (!is_null($filtered)) {
                    $clean[$key] = $filtered;
                }
            }
        }
        return $clean;
    }
    ?>
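The patterns from this section can also be tried on their own before they are wired into filter(). The following is a minimal sketch: the $patterns array, the match_parts() helper, and the sample values are illustrative assumptions, and the phone pattern assumes an optional parenthesised area code, since the text says the type must allow parentheses.

```php
<?php
// Patterns as discussed in the text; the array keys are illustrative labels.
$patterns = array(
    'string' => '/^[-A-Z0-9\.\'"_ ]*$/i',
    'postal' => '/^(\d{5})[\-]?(\d{4})?$/',
    'email'  => '/^[^@\s]+@([-a-z0-9]+\.)+[a-z]{2,}$/i',
    'phone'  => '/^[(]?(\d{3})[)]?[\s]?[\-]?(\d{3})[\s]?[\-]?(\d{4})[\s]?[x]?(\d*)$/',
);

// Hypothetical helper: returns preg_match()'s $matches array
// (full match first, then the captured groups), or NULL on rejection.
function match_parts($pattern, $value) {
    return preg_match($pattern, $value, $matches) ? $matches : NULL;
}

// Illustrative sample values, one per pattern.
$samples = array(
    array('string', "Tim O'Reilly"),
    array('postal', '12345-1234'),
    array('email',  'jdoe@example'),
    array('phone',  '(404) 555-1234 x42'),
);

foreach ($samples as $sample) {
    list($type, $value) = $sample;
    $parts = match_parts($patterns[$type], $value);
    echo $type, ': ', $value, ' => ', is_null($parts) ? 'rejected' : 'accepted', "\n";
}
```

For the phone sample, the captured groups hold the area code, exchange, line number, and extension separately, which is the "extract all the parts" behavior mentioned in the text. The address without a TLD is rejected, matching the caveat about RFC 822.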

Testing Input Length

In part one of this series, I mentioned that, while the maxlength attribute of the HTML input tag controls how much data a user may enter when properly using a form located on the host site, it does not restrict the amount of data that a user may post when using a form located on another Web site, or when posting by some other means (see part one for more information). Likewise, client-side validation with JavaScript may provide good measure for practicing “defense in depth,” as well as a potentially better user experience, but it will not restrict the actual data that can be sent to the form processing script from somewhere else (e.g. another form on another Web site). Thus, it is necessary to perform all input filtering, or validation, on the server side, in addition to any client-side validation. Regardless of whether you filter input at the client, you must always filter input at the server.

I have seen many sites that provide a maxlength attribute in their input tags but fail to test the length of the field from the server side. This leaves the processing script open to receive all lengths of data, which can lead to database constraint violation errors and, potentially, more dangerous issues. Checking the length of input, however, is simple, and, coupled with the maxlength attribute, it is easy to determine that a user is abusing the form if input received is longer than the expected length.

Listing 4 is a finalized version of the filter() function that incorporates all that I have discussed thus far. Notice how I have expanded $post_whitelist to include more information about each form field. Now, I associate an array with each field that defines the type of input to check against, in addition to several other details. One of those details is maxlength, which I check in the filter() function with:

    if (isset($whitelist[$key]['maxlength'])
        && (strlen($value) > $whitelist[$key]['maxlength'])) {
        continue;
    }
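Pulled out of filter(), the same length test can be sketched in a few lines. The $maxlengths array and the sample $input are hypothetical stand-ins for $post_whitelist and $_POST; only the isset()/strlen() comparison and the continue come from the listing.

```php
<?php
// Hypothetical, trimmed-down whitelist: field name => maximum length.
$maxlengths = array('name' => 50, 'age' => 3);

// Sample input, as it might arrive from a form hosted on another site.
$input = array('name' => 'Ben Ramsey', 'age' => '1234567');

$clean = array();
foreach ($input as $key => $value) {
    if (isset($maxlengths[$key]) && (strlen($value) > $maxlengths[$key])) {
        continue; // too long: this field never reaches $clean
    }
    $clean[$key] = $value;
}

print_r($clean);
```

Note that strlen() counts bytes rather than characters, so for multibyte input a character-based count such as mb_strlen() (where the mbstring extension is available) may be the better fit.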

www.phparch.com

Here, I use the continue statement to skip to the next item in the [foreach] loop, essentially excluding this value from the $clean array if it contains more data than expected. Since I have maxlength defined for these fields in my form, I am confident that no user using my form is able to enter more data than expected. If the input contains values that are longer than their respective maxlength, then I can assume that the user is abusing my form in some way, and I can safely exclude the input from the $clean array. Ensuring Acceptable Values In much the same way that maxlength cannot be relied upon to stop would-be attackers from sending unlimited amounts of data to form processing scripts, the values displayed in HTML select, radio button, and checkbox lists are not the only values that can be posted. Thus, it is necessary to filter the values of these fields and ensure that the input received is input expected. Again, this is not a hard practice to implement, but it does require more code. Take another look at Listing 4. In $post_whitelist, I’ve also added the “option” type, and for each item specified as type “option,” I have listed the expected options in the “options” array. For flexibility, I’ve also added the “multiselect”

13

TIPS & TRICKS

Input Filtering, Part 3

parameter that is defined on fields in which more than one item may be selected (e.g. checkboxes or menu lists).

In the filter() function, under the “option” case of the switch statement, I check whether the input received is an array. If it is, then I further check to ensure that I’m allowing the user to select more than one item. If not, then the input received shouldn’t be an array, and I discard the data and move on. If it is a multi-select field, then I check to ensure that every item in the array matches those defined in the “options” parameter for the field. If it’s not an array, then I simply check to ensure that it matches one of the “options.” If it does, then I keep it; if not, then it is discarded.

If a value is not acceptable (that is, it doesn’t conform to expectations), then I don’t keep it. It doesn’t get added to the $clean array. Notice how all values in Listing 4 are now set to NULL if they don’t conform to expectations. Then, I check whether the value is null. If it is, I don’t save it to $clean. In part two of this series, recall that I did save it to the $clean array, with an empty value. I no longer do that and instead choose to completely discard the reference to the field. Now, the worst thing that can happen when working with user input is that a field doesn’t exist, but that’s easy to check and report.

Moving Right Along

Over the past three issues, I have given an in-depth look at input filtering in PHP. This discussion has covered such topics as “why to filter”, “using ctype functions and regular expressions”, and “validating the length and acceptable values of received input.” I have discussed this all the while promoting a whitelist approach to ensure that input received is input expected.

For future installments of Tips & Tricks, I would like to know what tips and tricks you are using. Please send your tip and/or trick to [email protected], and, if I use it, you’ll receive a free digital (PDF) subscription to php|architect. Until next time, happy coding!
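The option check described in this column can be sketched in isolation. This is a minimal illustration and not the column's Listing 4: the function names filter_option() and filter_input_options() and the whitelist entries (color, toppings) are made up for the example; only the "options" and "multiselect" keys follow the text.

```php
<?php
// Hypothetical whitelist: field names and option values are illustrative.
$post_whitelist = array(
    'color'    => array('type' => 'option',
                        'options' => array('red', 'green', 'blue')),
    'toppings' => array('type' => 'option', 'multiselect' => true,
                        'options' => array('cheese', 'ham', 'olives')),
);

// Returns the value if it matches the whitelist rule, or NULL if it does not.
function filter_option($value, array $rule)
{
    $options = $rule['options'];
    if (is_array($value)) {
        // Arrays are only acceptable for multi-select fields.
        if (empty($rule['multiselect'])) {
            return null;
        }
        foreach ($value as $item) {
            // Strict comparison sidesteps PHP's loose type juggling.
            if (!is_string($item) || !in_array($item, $options, true)) {
                return null;
            }
        }
        return $value;
    }
    if (is_string($value) && in_array($value, $options, true)) {
        return $value;
    }
    return null;
}

// Copies only the values that pass the check into the returned array.
function filter_input_options(array $input, array $whitelist)
{
    $clean = array();
    foreach ($whitelist as $field => $rule) {
        if (!isset($input[$field])) {
            continue;
        }
        $value = filter_option($input[$field], $rule);
        if ($value !== null) {
            $clean[$field] = $value;
        }
    }
    return $clean;
}
```

Calling filter_input_options($_POST, $post_whitelist) would then yield an array that holds only whitelisted values; a field that fails the check is simply absent, which matches the "discard the reference" behavior described above.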

About the Author


Ben Ramsey is a Technology Manager for Hands On Network in Atlanta, Georgia. He is an author, Principal member of the PHP Security Consortium, and Zend Certified Engineer. Ben lives just north of Atlanta with his wife Liz and dog Ashley. You may contact him at [email protected] or read his blog at http://benramsey.com/.

To Discuss this article: http://forums.phparch.com/252


FEATURE

Roll Your Own Database Abstraction Module by Jason Lustig

You may already use database abstraction in your applications, perhaps through one of the available database abstraction layers, such as PEAR::DB or PDO (see the PDO article in this issue), but what about various idiosyncrasies in the actual SQL? Perhaps you’ve never even considered this problem. This article will help you tame the data abstraction beast.

How does Adobe keep Photoshop working on both Windows and Mac OS, or Microsoft keep Office portable? Often, people take the route of maintaining separate codebases for different platforms. Mega-corporations have the resources to pull it off, but a smaller firm or even a lone coder probably couldn’t do it particularly efficiently. It’s one of the reasons why the Mozilla project decided to go with XUL as their frontend instead of maintaining different sets of code for Windows, Mac OS, Linux, and whatever else happened to come around. Prior to XUL, if the Netscape developers had to make a change, they had to update every codebase individually, and it was a major hassle.

Web applications give us a little more freedom. HTML is fantastically portable: as long as there is a decent web browser for your desktop platform of choice, you will be able to access and work with your web applications. It has been argued that Microsoft has neglected Internet Explorer for exactly this reason: innovating too

September 2005 ● PHP Architect ● www.phparch.com

REQUIREMENTS
PHP: 4
Code Directory: abstraction

much in the browser space would kill the desktop, which is their big cash cow. Web applications are even more portable on the server side, because most of the languages (be it PHP, Perl, or even some ASP, through emulators such as Chili ASP) can run on almost any web server in any operating system (within reason).

The bottleneck to ultimate portability turns out to be the data itself. If you can abstract your data, then you will never be tied down again! This is, in a way, the “holy grail” of web application development: how can you make the database code portable but at the same time readable and hand-tuned for every database that you are writing for? How can I take advantage of low-level locking in Oracle when my MySQL code doesn’t


even have transactions? How can I abstract my data to an extent that it can be used by all kinds of databases? It’s possible; I’ve done it. I was able to port my 200,000-line web application from MySQL to PostgreSQL in about two hours on a lazy Sunday afternoon.

What Is Data Abstraction, and Why Bother?

Data abstraction is when your application does not have to worry about where its data comes from. In the world of web applications, because most people use databases to handle their data, usually this translates into database portability: the ability for your application to interact with all different kinds of databases.

Is portability worth it? In a perfect world, our computers would work properly most of the time, and we wouldn’t have any reason to switch operating systems, web browsers, etc. Why would we want to keep our code portable? Are there such big advantages that make the hassle worth the pain and suffering? (Because it is extra work to keep code portable, since you need to test across multiple systems.) It depends on your goals. There are definite advantages to portable code, such as opening up the market for your application to a larger group of people, avoiding lock-in, and more, but there are also disadvantages.

Grow your market. If you are selling or otherwise making a computer program that other people will use, whether it’s a web-based application or not, it would be great to be able to offer it to more people. That’s the reason why the big guns (like Adobe) keep their software running on both Mac and Windows platforms. If they picked only one system to support, it would really cut costs, but would also alienate a large group of potential customers. The more databases your web application supports, the larger the number of people who might be interested in purchasing or downloading it.

Portability keeps your code more readable and more maintainable. If you use a modular approach to data abstraction, as I do, or even if you use an abstracted set of functions like query() instead of mysql_query(), then your code will be easier to read and maintain down the line. This is something nobody will argue against!

Avoid lock-in. In my “real” job, I work in retail doing market research. Our current setup uses Microsoft SQL Server to mirror all of the data in our point of sale system, so that we can mess around and not have to worry about corrupting our actual, production data. This setup generally works pretty well. One day, we were having some trouble with the server, and my boss, who is pretty smart, and has some technical background, said “Would it help if we switched to Oracle?” The answer to this is, really, “I have no idea if it would help if we switched to Oracle.” We don’t need any of its fancy table locking features or anything like that, and SQL Server has been pretty good to us so far. It would be a lot of work to import our databases. The reason that I said “no, Oracle would not help us much at all,” is because we have many scripts, programs,


nightly jobs, and other little bits of code written for SQL Server and fine-tuned to cater to its nuances and bugs. To port all of this code would take weeks and would not save us nearly enough time, in the long run. We’ve been locked in. Now, there is nothing particularly wrong with this, because we are doing everything internally and really there is no reason why we would want to switch to another database. But if we had to, we would really be in a bind. Unfortunately, it is much more difficult to keep code like this portable than it is to keep web applications portable.

“Too Portable” or “Too Abstract?”

It depends on what you are trying to accomplish. Just like many other processes that improve performance, grow your market, and make things easier to do, the concepts of data abstraction and portability function under the law of diminishing returns. What this means in English terms (as opposed to the economic mumbo-jumbo that it really consists of) is that as you make your code more and more portable, the benefit that you get out of each additional step tends to decrease. So, when you first abstract your database, swapping mysql_query() for PEAR::DB, or another similar abstraction layer like ADODB, the relative increase in productivity will be greater than when you then go and abstract your queries, or do something crazy like begin to use an XML-based definition of your database structure.

The key is to find a balance. You need to determine


the point at which you are kidding yourself, where additional abstraction will cease to help you out. When you’ve reached this critical juncture, you should stop fussing around and get to programming your real application.

This isn’t to say that abstracting your data isn’t worth it. But depending on the application you are writing and the job it is supposed to do, sometimes abstraction isn’t worth the time that you would spend to maintain it. A simple formula might be: the time spent maintaining data abstraction, divided by (the time it takes to write the application in the first place multiplied by the amount of time you plan to spend maintaining the application). If the result of this formula is greater than one, it probably isn’t worth it to abstract the data any more than you have to, in order to get it working properly without killing yourself with PHP’s arcane function names. Otherwise, it makes sense to abstract the data to your heart’s delight. Luckily for us, making a good abstraction layer is easy enough, and the learning curve is gentle enough, that the time to maintain data abstraction is usually low enough to make it worth it most of the time.

Let’s Get to Business, Shall We?

We’ll begin with some simple pseudo-code to connect to a database, pull some data, and then display it. We are going to eventually abstract away different portions of the database code, in varying amounts, to try to find the “sweet spot” where we’ve balanced portability with the time we’ll spend on further abstraction. See Listing 1.

Easy enough, right? We are already using some sort of basic database abstraction. We don’t call mysql_query() or pg_pconnect() anywhere in here; we have abstracted away the PHP functions so that we can rewrite the class to connect to an additional database. In fact, you might notice that the function names are similar to the ever-popular PEAR::DB abstraction class. It’s my personal favorite, because it is simple and takes care of most of the hard work for you, and at the same time it does not force you to abstract your database calls any further than you want. Additionally, the SQL code itself is pretty portable: the language is standardized to a certain extent; you can assume that basic SELECT, UPDATE, and DELETE statements, JOINs, etc. will work on most modern databases.

The tricky part with writing code like this is that you need to test it on all supported databases. When you make a change to the SQL, it might break some databases and not others. It increases the amount of QA work that needs to be done, while minimizing the amount of actual code you have to write.

One important thing to remember is: databases are already a form of abstraction. They abstract away the


idea of data sitting on the disk in zeroes and ones, and let you think about it as tables and rows. SQL stands for Structured Query Language, and the theory is that it should be standard across all the different database engines. So, if you wrote your code with standard SQL, it should be portable… right?

The problems that arise are often related to database-specific extensions to the SQL standard. “Why use extensions?” you might ask. “Just stick to the standard; databases should be standards-compliant, just

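Listing 1

<?php
// Only the lines from query() onward survive in this copy. The setup
// above them is a reconstruction in the PEAR::DB style the text
// mentions; the DSN and the SQL are placeholders.
require_once 'DB.php';
$db = DB::connect('mysql://user:password@localhost/mydb');
$sql = 'select * from mytable';

$result = $db->query($sql);
while ($result->fetchInto($row)) {
    var_dump($row);
}
?>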

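Listing 2

<?php
// Only the lines from query() onward survive in this copy. The branching
// on $dbtype is a reconstruction of what the text describes; the DSN is
// a placeholder.
require_once 'DB.php';
$dbtype = 'mysql';
$db = DB::connect($dbtype . '://user:password@localhost/mydb');

$sql = 'select ';
if ($dbtype == 'mssql') {
    $sql .= 'top 10 ';      // SQL Server limits rows with "top"
}
$sql .= '* from mytable';
if ($dbtype == 'mysql') {
    $sql .= ' limit 10';    // MySQL limits rows with "limit"
}

$result = $db->query($sql);
while ($result->fetchInto($row)) {
    var_dump($row);
}
?>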


like web browsers!” The reality is that databases just aren’t always so standards-compliant. MySQL (before version 5) didn’t support stored procedures, and has a number of different table types, many of which handle locking and transactions differently. Oracle and Microsoft SQL Server each have a hundred handy little features that they have added to the standard which, in theory, make it easier to write applications. These features often serve as convenience functions, and allow you to do things like grab only one row, quickly.

Why not take advantage of these extra features? If you don’t, you are just hurting your application by making it slower. But, if you have the SQL itself hardcoded into your main code, there is no way to really do this, right? Wrong. If you were so inclined, you could dynamically generate the SQL query, based on the database platform you are using.

Say, for example, that you want to select the top ten rows from a table, and want to support both MySQL and Microsoft SQL Server. These two databases use different syntaxes to limit the number of rows returned from a query. SQL Server uses “top xx” and MySQL uses “limit xx”. However, the code in Listing 1 could be adapted to support both databases, as in Listing 2.

Easy enough, right? In theory, yes, but it makes your code impossibly hard to maintain, especially if, one day, you decide that you also want to support Oracle, PostgreSQL, Firebird, and maybe also dBase or SQLite. Additionally, it is less secure because it opens the door to making some big mistakes, since you are always generating the SQL statement on-the-fly. What if you mess up and put something inside the “$sql .=” portion that shouldn’t be there? This opens the $sql variable up to a possible injection attack. It is a hacker’s paradise.

Roll Your Own Language

Let’s say you just want to have one set of database code to rule them all. You could go the route of abstracting the idea of your query, and then write a class that will generate the SQL as necessary. You could add the ability to set optimization flags, if the database can handle it. Depending on the database, your SQL generator will either pay attention to these flags or pretend they don’t exist. Let’s look at the same code again but with a made-up SQL generator (Listing 3).

In this latest attempt to abstract our database query, we have gone to great lengths to tell our code what we are trying to do. Essentially, the db_query::generate() function can figure out which database we want to talk to, and create an optimized query at will. You don’t even need to use a function-based abstraction; you can create XML files that describe your queries, or even your entire database structure, making it human-readable, as well.

But is it the best way? Personally, I don’t think so. You end up just writing your own query language that needs to be debugged and audited for security. You’d have to maintain another complex abstraction layer in your application, when you could instead be writing

Figure 1: An Ideal Web Application

Figure 2: A Unified Binary contains executable code for both the x86 and PowerPC (PPC) architectures in one file.

Figure 3: The directory structure of our application makes it easy to make new data abstraction modules and to differentiate between them.


another simple layer. All too often, people over-abstract their applications and focus too much on the framework and not on making features that make their application cool and fun to use.

Unified Binaries, Unified Abstraction

Over the past few years, while working on various applications, I have developed a method which, in my opinion, is the best that I have seen. It’s a system that allows you to create new modules (or port your code to new databases) quickly and easily. In fact, this method makes it so that your “core” application never actually touches the database or whatever sort of data store you’re using. This opens up all sorts of interesting possibilities, because your application doesn’t care which database stores the data. It really doesn’t even need to be a database. You could write a module that stores your data in flat files, or even shared memory, if you wanted to. We’ll cover that, later.

Most people will agree with the idea that modular applications are a good thing. This “ideal complicated program” is made up of modules that interact with each other, through interfaces, abstracting away the ugliness of any code that may reside underneath. Other portions of your program can assume (within reason) that this abstraction layer simply works and you will never have to think so much about what’s actually happening. You will only need to work with the data that is returned from the modules. Using a simple, standard way of returning errors from the database modules, failures can be easily handled, as well.

For those who don’t understand all that architecture mumbo-jumbo, let’s draw a picture of this “ideal” web application (Figure 1). The idea, here, is that each level of the application takes care of one aspect of displaying a page, whether it is generating the HTML code that is sent to the browser (templates), the “business logic,” sanitizing users’ input data, or anything else that a typical web application must do.
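To make the separation concrete, here is a minimal sketch of a core script that never touches the data store directly. Everything in it is assumed for illustration: the function names data__get_products(), render_products() and products_page(), the in-memory array standing in for a flat file or database, and the convention of returning false on failure are not taken from the article's code.

```php
<?php
// Data tier (hypothetical module): could be backed by SQL, flat files or
// shared memory. Here an in-memory array stands in for the data store.
function data__get_products($limit)
{
    $store = array(
        array('id' => 1, 'name' => 'Widget'),
        array('id' => 2, 'name' => 'Gadget'),
        array('id' => 3, 'name' => 'Gizmo'),
    );
    if (!is_int($limit) || $limit < 1) {
        return false; // assumed convention: false signals a failure
    }
    return array_slice($store, 0, $limit);
}

// View tier: turns rows into HTML and knows nothing about storage.
function render_products(array $rows)
{
    $html = "<ul>\n";
    foreach ($rows as $row) {
        $html .= '<li>' . htmlspecialchars($row['name']) . "</li>\n";
    }
    return $html . "</ul>\n";
}

// Business logic: asks the data tier for rows and handles the error case.
function products_page($limit)
{
    $rows = data__get_products($limit);
    if ($rows === false) {
        return "<p>Sorry, the product list is unavailable.</p>\n";
    }
    return render_products($rows);
}

echo products_page(2);
```

Swapping the body of data__get_products() for a real query would not require any change to the other two functions, which is the property the tiers are meant to provide.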
This makes “n-tiered applications”, where n represents the number of tiers (also known as levels, modules, or by many other names). The most popular and well-known of these n-tiered models in the web application space is the three-tiered application, also known as Model-View-Controller. In the Model-View-Controller architecture, you have three levels: a database (“model”), the business logic (“controller”) and HTML generator (“view”). There are many benefits to this model, especially in terms of scalability. You can put each of these three tiers on different groups of servers, and if you need to be able to support more users, just throw more hardware at your application. Having multiple-tiered applications is great for other reasons as well, including cleaner code, and better documentation. You can also pull out modules and replace


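Listing 3

<?php
// Only the lines from "fields[]" onward survive in this copy. db_query is
// the made-up SQL generator named in the text; creating it with $dbtype
// is an assumption. $db is a connection object as in Listing 1.
$query = new db_query($dbtype);
$query->fields[] = '*';
$query->table = 'mytable';
$query->check('something = 5');
$query->limit_rows(10);

$result = $db->query($query->generate());
while ($result->fetchInto($row)) {
    var_dump($row);
}
?>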

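Listing 4

<?php
// Only the lines from "fetchInto" onward survive in this copy. The setup
// and the DB::isError() check are reconstructed from the description in
// the text; the DSN is a placeholder.
require_once 'DB.php';
$dbtype = 'mysql';
$db = DB::connect($dbtype . '://user:password@localhost/mydb');

$result = sql__select_top10_from_mytable();
if (!DB::isError($result)) {
    while ($result->fetchInto($row)) {
        var_dump($row);
    }
}
?>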

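Listing 5

<?php
// The body of this listing is missing from this copy. This version is a
// reconstruction from the description in the text: it includes the file
// of the same name from the module directory for the current $dbtype.
function require_sql($file)
{
    global $dbtype;
    require_once 'sql/' . $dbtype . '/' . $file;
}
?>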

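Listing 6

<?php
// Partly reconstructed: the function name, the comment, the SQL string
// and the query() call survive; the rest is inferred from Listing 7.
function sql__select_top10_from_mytable()
{
    // database connectivity class
    global $db;
    $sql = 'select * from mytable limit 10;';
    return $db->query($sql);
}
?>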

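Listing 7

<?php
// Only the lines from "quote($test)" onward survive in this copy. The
// parameter name comes from that fragment; the where clause is an
// assumption borrowed from the condition used in Listing 3.
function sql__select_top10_from_mytable($test)
{
    global $db;
    $sql = 'select * from mytable where something = '
         . $db->quote($test) . ' limit 10;';
    return $db->query($sql);
}
?>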


them with others that have the same API, but work in a totally different manner, underneath. This is where it gets interesting with regard to our data abstraction problem.

Another great advantage of web applications is that, for the most part, especially if you use a language like PHP, they are dynamically compiled and run. This means that you can interchange files at will, and users will not be able to tell the difference. We’ll take advantage of this, to create multiple database modules that work along a set interface to our business logic. In this way, to create a new module (in other words, to support a new database system) all we need to do is port one database module’s code to the new database, and voila! Your application has been ported to a new database.

Usually, when people talk about database modules, it is for the most part constrained to database connectivity, as we looked at before. Connectivity defines how your application talks to the database and sends queries and other messages back and forth. We can still use a connectivity layer with this system if we want to, but ultimately, because of our modular system, it does not matter where we are getting the data nor where we are storing it, so long as it conforms to the set interface that our “business logic” knows how to deal with. This modular tier that I propose won’t live on a different server (though you could put the files on one), because it is actually a part of the “controller” level of the application. It is surprisingly similar to Apple’s “Unified Binary” approach to compiling programs for both the PowerPC and x86 CPUs, which is why I like to refer to it as “Unified Abstraction.”

What is a “Unified Binary,” how does it work, and what does this have to do with data abstraction? Well, Apple has a peculiar situation coming up where it will be supporting two CPU families: IBM’s PowerPC, which is what Macintosh computers have used for the past ten years or so, and Intel’s Pentium (x86) family of processors.
This presents a major problem for software developers. What are you going to do about developing for both processors, since a binary compiled for PowerPC won’t run on x86, and vice-versa? It’s a very similar problem to our issue with databases. The solution that Apple came up with is this: within the “application” that you create there are really two binary programs. One is compiled for the PowerPC processor, and the other for x86. When you open up a Unified Binary, Mac OS will just use whichever binary is compatible with your computer, and it can use resources (internationalization files, images, etc.) normally, because they are just normal files.

We will use a similar method. When you create the PHP script for your web application, you will write it as a core file that doesn’t really care about which database you’ve chosen; this is similar to the resource files. It figures out which database we are working with, and then calls the appropriate database module, which is analogous to the different binaries for PowerPC and x86.

The key is that your application somehow needs to know which database it is using. Somewhere, you are storing the database connection credentials, such as the username, password, hostname, and so on. In this same place, you can keep information about whether you are connecting to MySQL, Oracle, Microsoft SQL Server, or even a flat-file database. We can add an extra line, $dbtype = 'mysql', to our examples.

The main scripts that live in your web-root, which is what people see when they come to your site, won’t contain any actual database calls. Rather, they call functions that return database records. Alternatively, you can use an object-oriented approach, though I prefer simple functions because they lead to less code, which is, in turn, less complicated.

In Listing 4, the sql__select_top10_from_mytable() function should ideally return a PEAR::DB_Result object. We use the DB::isError() function to check that our query worked properly. You may have noticed that Listing 4 won’t run because it’s missing the declaration of the sql__select_top10_from_mytable() function. This is because the listing contains only the core script, which hasn’t yet called the database module.

Let’s create a script called dbtest.php, and place it in our application’s root directory. We could create a subdirectory called sql, and within that, another directory, mysql, pgsql or whatever we want. This nested directory would contain our database module. In that way, we can create new modules simply by creating another directory beneath sql, such as mssql or oracle.

How does the PHP file know where to find the SQL file associated with it? Listing 5 shows a function that performs this task. Within our main script, we can just add the line require_sql('dbtest.php'); and our file will be included. Within sql/$dbtype/dbtest.php is the function shown in Listing 6.
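A loader like require_sql() could take roughly the following shape. This is a sketch under stated assumptions, not the article's Listing 5: the function names sql_module_path() and require_sql_checked() and the list of allowed module names are invented for the example. Checking $dbtype against a fixed list and reducing the file name with basename() keeps a bad configuration value from pulling in a file outside sql/.

```php
<?php
// Hypothetical helper: builds the path to a database module file.
function sql_module_path($dbtype, $file, $base = 'sql')
{
    // Assumed set of module directories that exist under sql/.
    $allowed = array('mysql', 'pgsql', 'mssql', 'oracle');
    if (!in_array($dbtype, $allowed, true)) {
        return false; // unknown module: refuse rather than guess
    }
    // basename() drops any directory part of the requested file name.
    return $base . '/' . $dbtype . '/' . basename($file);
}

// Includes the module file that matches the core script, if it exists.
function require_sql_checked($dbtype, $file)
{
    $path = sql_module_path($dbtype, $file);
    if ($path === false || !is_file($path)) {
        return false;
    }
    require_once $path;
    return true;
}
```

A core script would call require_sql_checked($dbtype, 'dbtest.php') once near the top and stop with an error page if it returns false.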
Of course, you could name the function anything you like, but I usually choose to prefix the names with sql__ (and then, usually with something dealing with the name and location of the associated core script, in a larger application), because this way, functions won’t have the same name, thus avoiding naming conflicts. You could also pass it variables, as shown in Listing 7. In this way, you could have a similar function that uses the alternate method of limiting rows, within the mssql module. Your application would be none the wiser; it would just proceed as normal, and wouldn’t care at all if you used limit or top within the query. You can optimize each query for each specific database, as much as you like, and you’ll not have to worry about the fact that all those obscure keywords might fail on another database system.
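The effect can be shown without a database by having each module return its SQL string. In this sketch the two closures stand in for the files sql/mysql/dbtest.php and sql/mssql/dbtest.php; in the layout described above, each would be an ordinary function with the same name in its own file. The helper top_rows_sql() and the $modules array are illustrative names only.

```php
<?php
// Each entry stands in for one database module's version of the function.
$modules = array(
    'mysql' => function ($n) {
        // MySQL limits rows with a trailing LIMIT clause.
        return 'select * from mytable limit ' . (int) $n;
    },
    'mssql' => function ($n) {
        // SQL Server limits rows with TOP after SELECT.
        return 'select top ' . (int) $n . ' * from mytable';
    },
);

// Core code picks the module by $dbtype and never sees the dialect.
function top_rows_sql(array $modules, $dbtype, $n)
{
    if (!isset($modules[$dbtype])) {
        return false; // no module for this database type
    }
    return call_user_func($modules[$dbtype], $n);
}

echo top_rows_sql($modules, 'mysql', 10), "\n";
echo top_rows_sql($modules, 'mssql', 10), "\n";
```

The calling code is identical for both database types; only the module that answers the call differs.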


To port your application to a new database, all you’ll need to do is take the database module whose SQL syntax is closest to the one you are porting to, duplicate its directory within sql/, and rename it appropriately (e.g. to oracle or dbase). Then, just go in and change the SQL calls so that they take advantage of the new database’s features, and voila! You now support a new database type! The reason why you would start by copying the module for the database whose syntax is most similar to your new database is to require the fewest possible changes to the SQL within the module.

Maintenance

“Alright,” you might be saying, “this sounds interesting, but also it seems like a lot of work to maintain!” It really isn’t that much work, once you’re used to it. When you want to change a database query, you just need to change the SQL in each of the database modules. It’s also easy to add new functions, because, if you first write only a simple function that doesn’t use advanced and non-portable features of your favorite database, you can just copy the function over to your other modules and then go and make each one take advantage of your table hints or other bells and whistles. Of course, if you keep your database modules well-documented, maintenance is easier, as well.

Conclusion

Data abstraction can be done in many ways. The method that I have suggested is one that I personally prefer because of the ease of porting applications to new databases and data storage methods. It isn’t for everyone or for every project: just as some quick-and-dirty applications don’t necessarily separate content from logic using templates, sometimes abstraction isn’t worth it. Database abstraction at the SQL level is one of those things that doesn’t usually hurt too much, and helps out in the long run.

About the Author


Jason Lustig is a student at Brandeis University in Boston. He is a freelance programmer who dabbles in database and application design, and works part-time doing market research and data mining.

To Discuss this article: http://forums.phparch.com/248



FEATURE

An Introduction to PDO by Ilia Alshanetsky

A common complaint of the anti-PHP “expert” is the lack of a bundled, uniform database access component. With the advent of an improved object model in PHP 5.0, a few of PHP’s core developers decided that the time had come to fill this hole with PHP Data Objects (PDO). The package itself has been in PECL for quite a while now, but with the upcoming release of PHP 5.1, PDO will be bundled in the main PHP distribution. What does it do? How does it work? One of PDO’s main developers explains.

REQUIREMENTS
PHP: 5.0+
OS: N/A
Other Software: PDO and an appropriate driver: http://pecl.php.net/pdo
Code Directory: n/a

Nearly everyone who has ever employed PHP has used it to talk to a database system. In most cases, a database provides a highly flexible and capable information storage and retrieval engine, ideal for data gathering and analysis. It is really no wonder that database use is so prevalent in the developer community.

As with most popular tools, there are often multiple approaches to the same problem, and database systems are no different from the norm. There are literally dozens of different database systems all competing for your attention as the best way of dealing with information. PHP (the language of choice for millions of developers) unsurprisingly supports the majority of these database engines, to ensure that no one is left out or feels neglected.

In most instances, the development of a database interface in PHP is not the result of a master plan or even a consequence of a well-planned specification, designed to provide the ideal method of database communication. More often than not, it is the result of a situation where a developer needed to have PHP connect to a previously unfamiliar database. By taking some existing code, possibly from other database extensions, and adjusting it to work for their particular database, the developer creates an initial interface. Usually, other users and developers then come up with tweaks, additions and refinements to the initial code base that eventually evolves into a full database extension.

While this approach has proven to be quite effective over the years, it does pose one particular problem: the PHP APIs for talking with most databases are relatively similar, but are far from identical. This problem is most apparent in the functions defined by the various database extensions. Each has its own, distinct, set of functions. For example, the MySQL extension uses mysql_fetch_row() to retrieve a record as an array of elements, while PostgreSQL makes use of pg_fetch_row(). Aside from the differences in the names, the parameter order of the functions is also inconsistent. Using MySQL and PostgreSQL as examples, the former’s query execution function does not require a database connection resource, and if one is provided, it takes the last position in the function call’s parameter list. In PostgreSQL, and several other extensions, a database resource is required, and must be supplied as the first parameter to the function. Documenting the differences between the various extensions would probably require an entire book, and is far beyond the scope of this article.

The API difference is something that is of little concern to developers who only communicate with a particular database; it does, however, present a serious problem to those who need to support multiple database back-ends. This has led to the creation of numerous database abstraction libraries. These range from simple ones that merely choose the right native function for the job, and possibly juggle the arguments, to complex and ultimately slow beasts that not only abstract the interface, but also try to handle various incompatibilities between the database systems themselves.

This has been somewhat of a pet peeve for the PHP core development community. This is why we decided, during LinuxTag 2003, to address the issue with PHP Data Objects (PDO). PDO was designed to use the latest PHP 5 object orientation support to provide a common API for all database systems with which PHP can communicate. By creating a common database communication interface, the need for the majority of database wrappers is eliminated. Because it was written in C, rather than PHP, the interface is very fast, and has minimal, if any, overhead compared with the native interface. Furthermore, PDO aimed to identify common operations that are performed on a database, and provide easy and convenient means of applying (or emulating if necessary) them, for all supported databases.
These abilities include: • execution of INSERT/UUPDATE/DDELETE queries • retrieval of data from a database in various forms: • as an array • as an object (new of pre-existing) • into bound variables • as a string • retrieval of all rows as a multi-dimensional array • prepared statement querying • the use of transactions • auto-commit support • the ability to normalize the case of table columns Thus, the “only” thing the code author needs to worry about is the differences in the databases themselves, which is simple enough as long as you use standard SQL. Current State of Affairs At this time, PDO has reached the majority of the initially-set goals and offers nearly all of the initiallyplanned features. It also includes support for all major databases with


which PHP can communicate:
• MySQL 3 and 4 (pdo_mysql)
• PostgreSQL (pdo_pgsql)
• SQLite 2 and 3 (pdo_sqlite; in fact, PDO is the only way to connect PHP to SQLite 3)
• Oracle (pdo_oci)
• Firebird (pdo_firebird)
• MSSQL and FreeTDS (pdo_dblib)
• ODBC (pdo_odbc)

All of the drivers (with the possible exception of the Firebird driver) are quite stable and are regularly tested for both bugs and functionality. At the present time, some are already being used on production systems. Nonetheless, PDO and its drivers are a relatively new addition to PHP and, as such, may contain some yet-to-be-discovered bugs, so consider yourself warned.

Installing PDO

How do you get PDO? In PHP 5.1 (which should be out shortly), the PDO core extension and its SQLite driver are enabled by default. Other drivers are part of the standard distribution; however, they need to be explicitly enabled via a configuration switch. These usually take the form --with-pdo-[database_type]=[interface_lib_path]. For example, to enable MySQL support you would use --with-pdo-mysql=/usr/local/mysql, assuming that the MySQL client library can be found in /usr/local/mysql.

For PHP 5.0.X users, the situation is a bit different. Because PDO is not part of the standard distribution, it must instead be downloaded and installed from the PECL repository, or downloaded in binary form (for Win32 users) from http://snaps.php.net/. For installation from PECL, you simply need to execute the following commands:

    pear install pdo
    pear install pdo_[driver]    # (example: pear install pdo_sqlite)

Upon execution, these commands will download the latest stable PDO release and then automatically compile it. The next step involves loading the compiled PDO modules into PHP via php.ini:

    ; *NIX users
    extension=pdo.so
    extension=pdo_sqlite.so

    ; Win32 users
    extension=php_pdo.dll
    extension=php_pdo_sqlite.dll
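Once php.ini has been edited, it can be worth confirming that the core extension and the drivers were actually picked up. The following is a minimal sketch rather than an official procedure; it relies on PDO::getAvailableDrivers(), which belongs to the released PDO API and is not covered in this article.

```php
<?php
// Is the PDO core loaded at all?
$pdoLoaded = extension_loaded('pdo');

// PDO::getAvailableDrivers() lists the registered driver names,
// for example "sqlite" or "mysql".
$drivers = $pdoLoaded ? PDO::getAvailableDrivers() : array();

echo "PDO core loaded: ", ($pdoLoaded ? "yes" : "no"), "\n";
echo "PDO drivers: ", implode(', ', $drivers), "\n";
```

If a driver you enabled is missing from the list, the extension line for that driver (or its load order) is the first thing to check.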

In PHP 5.0.x, there is no automatic handling of module dependencies; therefore, it is absolutely imperative that the PDO extension, itself, be loaded prior to any of its drivers. Failure to follow the correct loading sequence will usually result in a prompt crash, due to the driver


attempting to access information that is not yet available. PHP 4 users are, unfortunately, out of luck. PDO relies heavily on OO features only found in PHP 5 and higher, and simply does not work on previous releases.

Starting to use PDO

The first step in using PDO is not too dissimilar from using any other database interface. It requires the creation of a database connection handle, which, in the case of PDO, involves instantiating a PDO object. The constructor takes a number of parameters, but the only required argument is the DSN. The DSN, in most cases, defines the hostname and the database to talk with. For some databases, like PostgreSQL and Firebird, it can also be used to specify the login and password; for most databases, however, this information is supplied via the 2nd and 3rd arguments to the constructor, respectively. The constructor also takes an optional 4th argument that can be used to specify an array of attributes. These additional directives include ones that can only be set during the connection initiation phase and that adjust the entire connection, for features like auto-commit, as well as regular attributes such as the error reporting mode.

    // MySQL connection
    new PDO('mysql:host=localhost;dbname=testdb', $login, $passwd);
    // PostgreSQL
    new PDO('pgsql:host=localhost port=5432 dbname=testdb user=john password=mypass');
    // SQLite
    new PDO('sqlite:/path/to/database_file');
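To make the optional 4th argument concrete, here is a small sketch that passes an attribute array to the constructor and reads the setting back. It assumes the pdo_sqlite driver is loaded and uses an in-memory database, so no login or password is needed. It is written against released PHP versions, where the attribute constants are spelled as class constants (PDO::ATTR_ERRMODE) rather than the PDO_ATTR_ERRMODE form used at the time of writing.

```php
<?php
// Attributes handed to the constructor apply from the moment of connection.
$options = array(
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION, // throw on failed queries
);

// Login and password are not used by SQLite, so both are null here.
$db = new PDO('sqlite::memory:', null, null, $options);

// Read the setting back to confirm that it was applied.
$mode = $db->getAttribute(PDO::ATTR_ERRMODE);
var_dump($mode === PDO::ERRMODE_EXCEPTION);
```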

When it comes to the DSN parameter, it must always start with a database identifier, such as mysql: that allows PDO to determine which underlying driver to use. The remaining attributes indicate the actual connection parameters. In most cases—as demonstrated with the MySQL driver—the connection tokens are separated with a semicolon. One notable exception is


PostgreSQL, where the database client natively supports its own DSN style, so PDO accepts the native format to make things easier. SQLite is another exception to the rule: the database is just a file, so the only token (aside from the driver identifier) is either the path to the database or the special ":memory:" string for memory-based databases. The connection process is really the one place in PDO where differences between databases are exposed; the rest of the code is standard.

As with most object-oriented extensions, failure during object construction, which translates to connection failure in this case, will cause PDO to throw an exception of type PDOException. A thrown exception is something that you definitely want to catch. Uncaught exceptions in PHP result in PHP's native engine raising a fatal error, which terminates the currently-running script.

    try {
        $db = new PDO(…);
    } catch (PDOException $e) {
        echo $e->getMessage();
    }
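A failure path is easy to try out without a database server. In the sketch below the driver name "nosuchdriver" is deliberately bogus (a hypothetical value chosen for illustration), so the constructor has no driver to hand the DSN to and throws. The handler prints the message along with the location reported by the standard exception methods.

```php
<?php
$caught = false;

try {
    // No driver with this name exists, so construction must fail.
    $db = new PDO('nosuchdriver:host=localhost;dbname=testdb');
} catch (PDOException $e) {
    $caught = true;
    // What went wrong, and where the exception was raised.
    echo "Connection failed: ", $e->getMessage(), "\n";
    echo "Raised in ", $e->getFile(), " on line ", $e->getLine(), "\n";
}
```

In a real application the message would more likely go to a log than to the page, since it can reveal connection details.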

In most cases, the message component of the exception should provide sufficient information to indicate why the connection to the database could not be established. As with all exceptions, additional debug methods are available: getFile() and getLine() report the location of the code that triggered the exception, and getTrace() returns the list of function and method calls that led up to the offending code.

In some cases, it may be undesirable to include the DSN string, with its database authentication details, directly inside the script. For those situations, PDO provides two alternatives to the default mode demonstrated in the previous examples. One approach is to store the entire DSN in an INI setting and reference it via a special "name" token, which aliases the pdo.dsn.name configuration directive. The INI directive is without any scope restrictions, meaning that it can be set in php.ini, httpd.conf or .htaccess on Apache servers, or even defined via ini_set() within the script itself.

    ini_set("pdo.dsn.name", "sqlite::memory:");
    $db = new PDO("name");

As you might have guessed, the token is actually the last part of the INI setting name. So, you could easily do new PDO("ilia"), in which case the DSN will be fetched from the pdo.dsn.ilia INI directive. While the naming convention is somewhat amusing, this does have practical uses. By being able to use a custom name, each application can define its own connection string, without creating conflicts. For example, FUDforum can use pdo.dsn.fudforum, while phpMyGallery, which could be running on the same virtual host, would use


pdo.dsn.gallery, and so on.

Another way to denote the DSN involves the use of the uri: prefix, followed by the path to a configuration file that contains the connection string. This method is a bit inefficient, since it requires an extra file access for every database connection attempt, but in some cases it may be worth it.

    $db = new PDO("uri:/etc/app/config" . md5($_SERVER['DOCUMENT_ROOT']));

For example, let's say you want to install 50 instances of the same application, all using the same code base but different databases. Each application can be made to look for a separate DSN, which could be determined by the md5 hash of the document root where the application runs. In a scenario like this, the code does not need to be altered in any way, and the custom configuration files can be easily generated by the installation process.

Executing Queries

Once the database connection is established and the PDO object is available, a number of operations can be performed via a variety of methods. These methods include exec(), which was designed for the execution of queries that perform an operation but do not return a record set. Examples of this type of query are the UPDATE, INSERT, and DELETE operations. Upon successful execution of the query, the exec() method will return the number of rows that were affected by the operation. If no rows were affected, the value 0 will be returned, and in the event that the query failed due to an error, the function will return a boolean FALSE.

    $rows_affected = $db->exec("INSERT INTO my_table (row1,row2) VALUES(1,2)");
    if ($rows_affected === FALSE) { // query has failed
        $einfo = $db->errorInfo();
        echo $db->errorCode() . ": " . $einfo[2] . "\n";
    }
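The difference between "no rows affected" and "query failed" can be seen side by side in the following sketch. It assumes the pdo_sqlite driver and an in-memory database; the table no_such_table is a hypothetical name used only to provoke an error. Silent mode is set explicitly because newer PHP releases default to throwing exceptions instead of returning FALSE.

```php
<?php
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_SILENT);
$db->exec("CREATE TABLE my_table (row1 INT, row2 INT)");

// Valid query that matches no rows: exec() returns the integer 0.
$none = $db->exec("UPDATE my_table SET row2 = 5 WHERE row1 = 99");

// Invalid query against a missing table: exec() returns FALSE.
$bad = $db->exec("UPDATE no_such_table SET row2 = 5");

var_dump($none === 0);      // succeeded, nothing to update
var_dump($bad === false);   // genuine failure
var_dump(!$none && !$bad);  // a loose check cannot tell the two apart
```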

Because PHP is a loosely typed language, a casual comparison will cause both FALSE and 0 to evaluate to the same thing, due to internal type normalization. Consequently, if in my code example I had used "if (!$db->exec())" or "$db->exec() or die();", the error condition would be triggered for a perfectly valid operation that didn't actually fail. To avoid this situation, the "error check" is performed as a separate operation that compares the returned value to FALSE in a type-sensitive manner via ===.

In the event of an error, such as a failed INSERT, there are two methods available for retrieving information about the cause of the failure. First, we have the errorCode() method, which returns an appropriate SQLSTATE code: a 5-byte-long alphanumeric string, indicative of the error that has occurred. SQLSTATE codes are a cross-database standard for reporting errors, and by returning them, PDO provides a database-independent and consistent way of identifying errors. If a bit more detail about the error is required, the errorInfo() method can be employed to return the database's native error code and error message as the 2nd and 3rd array elements, respectively. This is generally useful in situations where queries have failed due to syntax problems and you want to see the part of the query that the database was incapable of parsing.

PDO's Approach to Error Handling

While we are on the topic of errors, let's quickly examine how the PDO extension handles problematic situations. With the exception of a failed connection, which results in the throwing of an exception, PDO keeps quiet regarding errors and, unlike many other database extensions, does not emit warning or error messages. The failed operation simply returns FALSE and leaves it up to the developer to detect and handle the situation. This is, however, something that can be easily altered by changing PDO's error handling mode via the setAttribute() method. This method is the primary mechanism for changing PDO settings (although settings can also be changed via the 4th parameter of PDO's constructor). The attribute of interest in this case is PDO_ATTR_ERRMODE, which controls the error handling. The possible values are:
• PDO_ERRMODE_SILENT: the default mode of operation, where errors and warnings are not raised
• PDO_ERRMODE_WARNING: triggers warnings when an operation fails, and
• PDO_ERRMODE_EXCEPTION: makes PDO throw exceptions on any failed query

    // an example of PDO::setAttribute
    $db->setAttribute(PDO_ATTR_ERRMODE, PDO_ERRMODE_WARNING);
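As an illustration of the exception mode, the sketch below switches a connection over and then inspects a deliberately failing query with both errorCode() and errorInfo(). It assumes the pdo_sqlite driver, uses the hypothetical table name no_such_table, and is written with the class-constant spelling (PDO::ATTR_ERRMODE) found in released PHP versions.

```php
<?php
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$sqlstate = null;
$einfo    = array();

try {
    // The table does not exist, so this query is expected to fail.
    $db->exec("INSERT INTO no_such_table (a) VALUES (1)");
} catch (PDOException $e) {
    $sqlstate = $db->errorCode();  // five-character SQLSTATE code
    $einfo    = $db->errorInfo();  // SQLSTATE, native code, native message
    echo "SQLSTATE {$sqlstate}, driver code {$einfo[1]}: {$einfo[2]}\n";
}
```

With exceptions enabled, a failed query can no longer slip past a forgotten return-value check, which is the main reason to prefer this mode during development.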

Settable Attributes

Settable attributes in PDO are divided into three groups. The first consists of "connection time only" attributes, whose values can only be specified when establishing the database connection. These include PDO_ATTR_TIMEOUT, which defines the maximum number of seconds PDO will wait for the database system to respond, and PDO_ATTR_PERSISTENT, which can be used to toggle the use of persistent connections. Unlike the native extensions, PDO does not require a separate connection function for persistent connections; the regular connect operation handles this:


new PDO($dsn, $login, $pass, array(PDO_ATTR_TIMEOUT=>5,PDO_ATTR_PERSISTENT=>1));

The persistent connections implementation in PDO has an additional feature, which may seem a bit unusual, but is actually quite useful. This feature gives developers the ability to "name" the persistent connection, by specifying a string value as the persistent setting's attribute. This functionality allows multiple, completely distinct persistent connections to exist within the scope of a single script:

    new PDO($dsn, $login, $pass, array(PDO_ATTR_PERSISTENT=>"con2"));

The second group consists of standard settings that are supported by all PDO drivers, such as PDO_ATTR_CASE, which defines the case of the column names, as well as PDO_ATTR_ERRMODE, which we've already covered. The former ensures that when it comes to retrieving data into an associative array, the keys, which are based on column names, are predictable. Different database systems have different rules pertaining to the handling of column names; this makes a normalization routine necessary for consistent database behavior.

The final class of attributes is intended for setting options that are specific to a certain database. These, in most cases, are used to expose database-specific features via PDO, or to overcome the limitations of a given native driver. One such attribute is the long-winded PDO_MYSQL_ATTR_USE_BUFFERED_QUERY, which enables the use of buffered queries within the MySQL driver. By default, the MySQL driver uses unbuffered queries, which are more memory efficient, but prevent you from working with multiple result sets at the same time. You can usually spot attributes in this class by checking whether their name includes the name of the database.

Since there is a way to set attributes, logic dictates that there would also be a way to determine their existing values. In PDO, this is made possible via the getAttribute() method, which takes the attribute constant and returns its current value. In addition to the settable attributes, this mechanism can also be used to retrieve information about the database you are currently working with. For example, the PDO_ATTR_SERVER_INFO attribute can be used to retrieve information about the server with which you are communicating, and PDO_ATTR_SERVER_VERSION will return the server's version string (e.g. 4.1.11-max for MySQL). Another handy, non-settable attribute is PDO_ATTR_CONNECTION_STATUS, which enables retrieval of the connection status.

This particular attribute becomes especially useful in instances where persistent connections are being used, as those may time out due to extended periods of inactivity. It is possible to use this attribute to easily determine whether the persistent connection acquired by PDO is usable or not.
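The case-normalization attribute and the read-only informational attributes can be exercised together. The sketch below assumes the pdo_sqlite driver; the table and its mixed-case column names are invented for the demonstration, and the constants use the class-constant spelling of released PHP versions. PDO::ATTR_DRIVER_NAME is part of the released API and is not mentioned in the article.

```php
<?php
$db = new PDO('sqlite::memory:');

// Ask PDO to upper-case every column name it hands back.
$db->setAttribute(PDO::ATTR_CASE, PDO::CASE_UPPER);

$db->exec("CREATE TABLE foo (Id INTEGER PRIMARY KEY, userName TEXT)");
$db->exec("INSERT INTO foo (userName) VALUES ('john')");

$row  = $db->query("SELECT Id, userName FROM foo")->fetch(PDO::FETCH_ASSOC);
$keys = array_keys($row);
print_r($keys); // the keys no longer depend on how the schema was typed

// Informational attributes are read with getAttribute().
$driver  = $db->getAttribute(PDO::ATTR_DRIVER_NAME);
$version = $db->getAttribute(PDO::ATTR_SERVER_VERSION);
echo "{$driver} {$version}\n";
```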


Post-INSERT Record ID Retrieval

Backing up a little, let's return to the INSERT query example that we performed via exec(). Many database systems support an auto-incremented ID that is added to every inserted row. This ID allows for quick and simple identification of the row, for all sorts of purposes. In MySQL, this is done by setting the column specification to INTEGER AUTO_INCREMENT PRIMARY KEY; in SQLite, it's as simple as INTEGER PRIMARY KEY; in PostgreSQL, a SERIAL column type that is attached to a sequence serves this need, and so on. It is common to retrieve this auto-created value so that the inserted record can be associated with another subset of data. PDO offers the lastInsertId() method to facilitate the retrieval of this value. The method, when executed after a successful insert, returns the row identifier.

    $db->exec("CREATE TABLE my_table " .
              "(id INTEGER PRIMARY KEY, a INT,b INT,c INT)");
    $db->exec("INSERT INTO my_table (a,b,c) VALUES(1,2,3)");
    $id = $db->lastInsertId();

There is a bit of a peculiarity with this functionality when it comes to PostgreSQL. By default, the value returned is the OID, which is an internal row counter assigned to each table record. This value is not the value of the current SERIAL column; however, it can be resolved via a query such as this:

    $oid = $db->lastInsertId();
    $q = "SELECT id FROM tbl_name WHERE oid={$oid}";
    $id = $db->query($q)->fetchColumn();

This is, however, rather inconvenient and somewhat slow, as the process requires execution of an additional query and retrieval of the fetched results. To improve the situation, the authors of the PostgreSQL driver added an optional parameter to the lastInsertId() method: the sequence name for the affected table. By specifying this parameter, you allow the underlying code to query the sequence directly and make the method return the desired ID right away. The sequence names (which are generally picked automatically by the database) are predictable, given the name of the table. For example, given the counter column name of id and the table name of foo, the generated sequence name would be foo_id_seq.

Transactions

When performing database modification operations, such as inserts and updates, it is often necessary to maintain consistency between operations. One common way to address this is by grouping queries into transactions, which maintain data integrity. Another benefit of transactions is the ability to undo (or, in industry terms, roll back) a transaction if an error occurs while processing the


queries found within. This returns the affected tables to the state they were in prior to transaction initiation. On the other hand, if all queries were performed successfully, then the changes can be committed to the database in a quick and consistent manner. To encourage transaction use, PDO provides three methods for working with them.

    while (1) {
        $db->beginTransaction(); // start transaction
        for ($i = 0; $i < 10; $i++) {
            if ($db->exec("INSERT INTO foo …") === FALSE) {
                $db->rollBack(); // query failed, abort
                break 2;
            }
        }
        $db->commit();
        break;
    }
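The same commit-or-roll-back pattern can also be written with exception mode doing the failure detection, which avoids the nested break. This is only a sketch of one possible arrangement, assuming the pdo_sqlite driver; the table and its NOT NULL column are invented so that the second insert is guaranteed to fail.

```php
<?php
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE foo (id INTEGER PRIMARY KEY, val INT NOT NULL)");

$db->beginTransaction();
try {
    $db->exec("INSERT INTO foo (val) VALUES (1)");
    $db->exec("INSERT INTO foo (val) VALUES (NULL)"); // violates NOT NULL
    $db->commit();
} catch (PDOException $e) {
    $db->rollBack(); // undoes the first insert as well
}

// Count what survived the transaction.
$count = (int) $db->query("SELECT COUNT(*) FROM foo")->fetchColumn();
var_dump($count);
```

Because the failing statement sits inside the transaction, the successful insert before it is discarded too, which is exactly the consistency guarantee described above.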

The beginTransaction() method, as the name suggests, initializes a new transaction, inside which any number of queries can be executed. In the event that a query fails, the transaction can be aborted via the rollBack() call, or, if no errors were detected, the queries can be committed to the database via the commit() method call.

Data Retrieval

Now, we get to the interesting part: data retrieval. Here, PDO shows, by far, the greatest amount of flexibility and capability compared to any other database extension. But, before we get to data retrieval, let's quickly examine the process by which we can execute queries that need to return a record set. For data retrieval queries that are to be executed only once throughout the script, there is a query() method. This method takes the query string as the first parameter and, if the query execution is successful, returns a PDOStatement object that represents the fetched result set. The actual selected data can be extracted from this object in a number of ways. One approach is to use the fetch() method, which by default will return each row as an array with both numeric (column position) keys and string (column name) keys.

    $rows = $db->query("SELECT id FROM foo");
    while ($row = $rows->fetch()) {
        var_dump($row); // Ex: array(0 => 1, id=>1)
    }

In most cases, having both sets of keys is somewhat pointless and quite inefficient. So, the fetch() method allows you to specify the desired array keys via its parameter. This can be PDO_FETCH_NUM, in which case only numeric keys (the fastest option) will be used, starting from key 0 for the first column and incrementing by one for each subsequent column. Alternatively, if you want to use slightly slower, but more user-friendly associative keys, you can set the fetch mode to PDO_FETCH_ASSOC. Another possible fetch mode is PDO_FETCH_OBJ, in which case the returned row will be represented in the form of an object, an instance of stdClass, where column names become the object properties.

    $rows = $db->query("SELECT id FROM foo");
    while ($row = $rows->fetch(PDO_FETCH_OBJ)) {
        var_dump($row); // object(stdClass)#1 (1) {["id"]=> int(1)}
    }

In case these three fetch modes are not enough, PDO introduces yet another fetch mode, PDO_FETCH_LAZY. In many situations where results are fetched, only a portion of the data ends up being used. Ideally, the extra, unused data would simply be left unselected, but it may sometimes be needed for certain conditional operations. Lazy fetch allows you to retrieve the row as an object, akin to the PDO_FETCH_OBJ mode. This mode, however, only populates the properties with their respective values when they are accessed. This means that if, while working with a result containing 10 columns, you've only used 5, PHP would only allocate memory for the 5 columns that were used, reducing the overhead involved in the data retrieval process.

The object representation of a result set does not necessarily need to use the default stdClass. PDO provides a way to create an instance of any class and populate its properties with the retrieved values. The column names will be used to reference object properties, as is the case with the default functionality. If a property is already defined, it will be assigned a value, and if no existing property with a matching name is found, it will be created dynamically. This functionality is exposed via the little-known fetchObject() method, which takes a class name as the first parameter and an optional array of arguments to pass to the class' constructor.

    $stmt = $db->query("SELECT * FROM user WHERE id=1");
    $reg = $stmt->fetchObject("user_data");


Using this retrieval mechanism, we've filled $reg with an instance of the user_data class, populated with data from columns found in the user table. Object-based data retrieval is not limited to the creation of new objects for each result set; an existing object can be populated with the retrieved data. This makes for a much more performance-friendly solution, since object creation can be a slow process. To accomplish this trick we need to call the setFetchMode() method, which is used to set the retrieval mode. This mechanism provides a bit more flexibility than passing the mode via fetch(), by allowing us to specify another mode-related value, which, in this case, is the class instance.

    $reg = new user_data;
    $stmt->setFetchMode(PDO_FETCH_INTO, $reg);
    while ($stmt->fetch()) {
        // $reg == user_data class filled with from-db data
    }
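Both object-based approaches can be compared in one short script. The following is a sketch under stated assumptions: the pdo_sqlite driver is loaded, the user_data class shown here is a hypothetical stand-in for whatever class an application would use, and the users table and its rows are invented. Constants use the class-constant spelling of released PHP versions.

```php
<?php
// Hypothetical stand-in for the article's user_data class.
class user_data
{
    public $id;
    public $login;

    public function label()
    {
        return "#{$this->id} {$this->login}";
    }
}

$db = new PDO('sqlite::memory:');
$db->exec("CREATE TABLE users (id INTEGER PRIMARY KEY, login TEXT)");
$db->exec("INSERT INTO users (login) VALUES ('john')");
$db->exec("INSERT INTO users (login) VALUES ('jane')");

// fetchObject(): a new user_data instance is created for the row.
$stmt  = $db->query("SELECT id, login FROM users WHERE id=1");
$first = $stmt->fetchObject('user_data');
echo $first->label(), "\n";

// PDO::FETCH_INTO: one existing object is refilled on every fetch.
$reg  = new user_data;
$stmt = $db->query("SELECT id, login FROM users ORDER BY id");
$stmt->setFetchMode(PDO::FETCH_INTO, $reg);

$labels = array();
while ($stmt->fetch()) {
    $labels[] = $reg->label();
}
print_r($labels);
```

Since the class defines its own methods, each fetched row is immediately usable as a domain object rather than as a bare data container.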

To further simplify and accelerate the data retrieval process, PDO offers the fetchAll() convenience method within PDOStatement objects. It works in pretty much the same way as fetch(), except that instead of retrieving only a single record, fetchAll() will retrieve all records from the result cursor, in the form of an array. Each array element will be an array or an object, depending on the fetch mode that is specified via the method's single parameter.

    $rows = $db->query("SELECT id FROM foo LIMIT 1")->fetchAll();
    print_r($rows);
    /* array (
        array(0 => 1, 'id' => 1)
    ) */

As with fetch(), the default retrieval mode is PDO_FETCH_BOTH. The main goal of this method is to simplify the retrieval of small result sets, where it is faster to build an array of results and then iterate through it than to call the fetch() method for every record found.

In instances where the result consists of just a single column, the process can be optimized even further. Rather than retrieving an array, the value can be fetched in the form of an immediately usable string, by specifying the PDO_FETCH_COLUMN fetch mode. When combined with fetchAll(), it provides an array of values that can be accessed directly. For example, if I wanted to see the complete list of tables starting with the prefix "foo_" (in MySQL), I could simply execute the following bit of code:

    $tables = $db->query("SHOW TABLES LIKE 'foo_%'")->fetchAll(PDO_FETCH_COLUMN);
    print_r($tables);
    /* array(
        "foo_bar",
        "foo_baz",
        …
    ) */
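SHOW TABLES is specific to MySQL, but the single-column technique itself is portable. As a hedged illustration, the sketch below does the equivalent lookup against SQLite, where table names are stored in the sqlite_master catalog. It assumes the pdo_sqlite driver; the three table names are invented for the demonstration.

```php
<?php
$db = new PDO('sqlite::memory:');

// Create a few tables so there is something to list.
foreach (array('foo_bar', 'foo_baz', 'other') as $name) {
    $db->exec("CREATE TABLE {$name} (id INTEGER PRIMARY KEY)");
}

// One column per row, so each array element is a plain string.
$tables = $db->query(
    "SELECT name FROM sqlite_master
      WHERE type = 'table' AND name LIKE 'foo_%'
      ORDER BY name"
)->fetchAll(PDO::FETCH_COLUMN);

print_r($tables);
```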

Even though PDO_FETCH_COLUMN is supported by the fetch() method, PDO offers a dedicated single column fetching method, fetchColumn(). In the default mode of operation, this method will fetch the first column from the result set and return the retrieved value in the form of a string. If the desired column is not the first one, you can specify the numeric position of the column via an optional argument. Keep in mind that, as with most things in PHP, the column count begins at 0.

    $rows = $db->query("SELECT id FROM tbl");
    while ($id = $rows->fetchColumn()) {
        …
    }
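One detail worth keeping in mind with a loop like the one above is that it ends on any value PHP considers false, not only at the end of the result set. The sketch below, which assumes the pdo_sqlite driver and an invented two-row table, contrasts the loose test with a strict comparison against FALSE, and also shows the optional column argument.

```php
<?php
$db = new PDO('sqlite::memory:');
$db->exec("CREATE TABLE tbl (id INTEGER, name TEXT)");
$db->exec("INSERT INTO tbl VALUES (0, 'zero')");
$db->exec("INSERT INTO tbl VALUES (1, 'one')");

// Loose test: stops at once, because the first id is 0.
$loose = array();
$stmt  = $db->query("SELECT id, name FROM tbl ORDER BY id");
while ($id = $stmt->fetchColumn()) {
    $loose[] = $id;
}

// Strict test: only FALSE ends the loop.
// The argument 1 selects the second column, since counting starts at 0.
$names = array();
$stmt  = $db->query("SELECT id, name FROM tbl ORDER BY id");
while (($name = $stmt->fetchColumn(1)) !== false) {
    $names[] = $name;
}

var_dump(count($loose), count($names));
```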

When the result set consists of a single row, the column value can be dereferenced directly from the returned PDOStatement object, in a way similar to the one I've used with fetchAll() in the previous examples. This is possible due to the improved object support in PHP 5, which allows access to an object directly as the return value of another operation.

    // quote() adds the surrounding quotes itself
    $qry = "SELECT id FROM users WHERE login="
         . $db->quote($_POST['login'])
         . " AND passwd='" . md5($_POST['pwd']) . "'";
    if (!$db->query($qry)->fetchColumn()) {
        show_login_prompt(); // no matching user: ask again
        exit;
    } else {
        create_user_session();
    }

In the above example, this functionality simplifies the process of validating the user authentication information and determining whether the user should be logged in to the system or prompted once again for authentication information.

Special Character Handling

The previous example also introduces a new, previously unseen method, quote(). This particular method is used to escape values that are passed to the database system, preventing SQL injection. This is PDO's equivalent of mysql_real_escape_string() for MySQL, pg_escape_string() for PostgreSQL, and so on. The underlying functionality is provided through the database's native escaping mechanism. If one is not available, then a boolean FALSE is returned, indicating that prepared statements should be used instead of direct query execution (more on that in a bit).

Result Iterator

One of the neat tricks that is possible with PDOStatement objects is the ability to iterate through them as if they were an array of


results, via the foreach construct, thus avoiding the need to call any functions and methods, which of course leads to greater performance. The iterator approach represents the fastest mechanism of retrieving data, even exceeding that of fetchAll(), since it does not require the pre-fetching of all results and their subsequent storage in memory.

    foreach ($db->query("SELECT id FROM foo") as $v) {
        // $v == array(0=>1, 'id'=>1)
    }

One "limitation" of this approach to data retrieval is that there is seemingly no way to indicate the fetch mode. PDO does provide a solution for this; it's just not as obvious as with the other fetching methods. The workaround involves passing the fetch mode via the second, optional parameter of the query() method.

    foreach ($db->query("SELECT id …", PDO_FETCH_COLUMN) as $v) {
        // $v == 1
    }

Now, instead of getting a complicated array, each $v variable (representing a row) is a simple and immediately-usable string value. This mechanism supports all of the same modes as the ones supported by fetch(). To make things even more interesting, setting the fetch mode inside the query() method saves you from having to specify it inside the fetch() calls, or having to explicitly call the setFetchMode() method, making the code a bit simpler.

Parameter Binding

Another approach to data retrieval involves variable binding. In this case, rather than creating a new variable with an array or object container for every record, an existing variable or variables are automatically populated with the returned information. This approach can be quite handy in many situations, such as template population, where in most instances the retrieved data needs to be assigned to template variables. With variable bindings, this can be done completely automatically, simplifying the code and, in some instances, also improving its performance. To use variable bindings, the bindColumn() method of the PDOStatement object needs to be used, prior to data retrieval, to create associations between the result columns and the PHP variables which will be populated with the relevant values.

    $stmt = $db->query("SELECT login, pass FROM user");
    $stmt->bindColumn(1, $login);
    $stmt->bindColumn(2, $pass);

The bindColumn() function, at its very minimum, requires two arguments. The first is the numeric position of the column to which you'd like to bind the variable. This position, interestingly enough, starts the count at 1, rather than 0. This is a bit of an inconsistency when it comes to PHP, and even other parts of PDO that normally start at 0. For better or worse, the developers have decided to follow the approach used by other similar interfaces in other languages, so be careful. If using the numeric position of the column seems like too much of a pain, due to this inconsistency, you can, of course, use the name of the column as a point of reference. However, to do this, you need to know the case of the column name. If the returned column is "FOO", and you try to bind "foo", the association


process will fail. The second parameter to the function is much simpler; it is simply the variable whose value will be populated by the fetch process. As you can imagine, bindColumn() takes this value by reference.

    $stmt = $db->query("SELECT id,login FROM user");
    $stmt->bindColumn('id', $id);
    $stmt->bindColumn('login', $login);
    while ($stmt->fetch(PDO_FETCH_BOUND)) {
        // $login == current value of login column (as a string)
        // $id == current value of id column (as an integer)
    }
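For a version that can be run as-is, the sketch below binds one column by position and one by name against an in-memory SQLite table. It assumes the pdo_sqlite driver; the users table and its rows are invented, and the fetch mode uses the class-constant spelling (PDO::FETCH_BOUND) of released PHP versions.

```php
<?php
$db = new PDO('sqlite::memory:');
$db->exec("CREATE TABLE users (id INTEGER PRIMARY KEY, login TEXT)");
$db->exec("INSERT INTO users (login) VALUES ('john')");
$db->exec("INSERT INTO users (login) VALUES ('jane')");

$stmt = $db->query("SELECT id, login FROM users ORDER BY id");
$stmt->bindColumn(1, $id);          // by position, counted from 1
$stmt->bindColumn('login', $login); // by name, case must match

$seen = array();
while ($stmt->fetch(PDO::FETCH_BOUND)) {
    // $id and $login are refreshed on every successful fetch.
    $seen[] = "{$id}:{$login}";
}
print_r($seen);
```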

When it comes to fetching the data, the PDO_FETCH_BOUND mode is passed to the fetch function, which ensures that the returned value is not a variable with data, but merely a boolean indicator used to determine whether further records are available. The data itself will, of course, be available through the bound variables, whose values will be appropriately adjusted on every successful fetch.

Partial Data Retrieval

When it comes to data retrieval, PDO tries to take the most memory-efficient approach possible. This involves the use of unbuffered queries, which do not require prefetching of the complete result set into memory.


The consequence of this optimization is that pending results remain active on the connection until they are retrieved. If an attempt is made to execute another query prior to the retrieval of all records returned by the previous operation, an error condition will be triggered. To keep this issue from becoming a real problem, the script could forcibly retrieve all rows by executing while($stmt->fetch()); to ensure that no rows are left over. However, that would be highly inefficient, as it would result in the retrieval and temporary storage of unnecessary data. A much easier and far more efficient solution involves the use of the LIMIT clause to limit the result set to the subset you intend to use. If that is not possible, then PDO provides the closeCursor() method, which can be used to forcibly terminate a result set that has not yet been completely retrieved.

    $stmt = $db->query("SELECT * FROM foo");
    while ($res = $stmt->fetch()) {
        /* some code that may set $abort to TRUE */
        if ($abort) {
            $stmt->closeCursor();
            break;
        }
    }
    // now can safely execute another query.
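The calling sequence can be tried with the sketch below, which reads a single row, discards the rest with closeCursor(), and then runs a second query. It assumes the pdo_sqlite driver and an invented five-row table. SQLite is used here only because it needs no server; it may not enforce the pending-results restriction the way a client/server database does, so the example shows the order of calls rather than the failure it prevents.

```php
<?php
$db = new PDO('sqlite::memory:');
$db->exec("CREATE TABLE foo (id INTEGER PRIMARY KEY)");
for ($i = 0; $i < 5; $i++) {
    $db->exec("INSERT INTO foo DEFAULT VALUES");
}

$stmt  = $db->query("SELECT id FROM foo ORDER BY id");
$first = $stmt->fetchColumn(); // read only the first row
$stmt->closeCursor();          // discard the four pending rows

// The connection is now free for the next query.
$total = (int) $db->query("SELECT COUNT(*) FROM foo")->fetchColumn();
echo "first id: {$first}, total rows: {$total}\n";
```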

Prepared Statements

There is yet one more feature of PDO that has, so far, been neglected, and deserves a mention. One of the core capabilities of PDO is the ability to use prepared statements, regardless of native database support for this functionality. Prepared statements are a very interesting bit of functionality, as they increase both the security and the performance of an application. Prepared statements work by allowing separation between the parsing of the query and its execution. For example, when the query() method is used to execute dynamic SQL, every instance of a query call involves the database parsing the query and then executing it. While the query parsing process is quite fast, if you end up executing the same (or a similar) query multiple times, it does make things somewhat inefficient. With prepared statements, on the other hand, the query is pre-parsed, leaving places for dynamic tokens. The generated statement can then be reused multiple times. The execution step now merely needs to substitute the tokens with the given values, effectively eliminating all but one query parsing operation.

The security advantage comes from the fact that tokens are no longer treated as part of the query, as is the case with dynamic query execution, and will always be interpreted as a value and nothing more. This means that SQL injection through those values is no longer a possibility, and you don’t need to escape the input using the quote() method, which provides an extra bit of performance. Making use of prepared statements in PDO is a fairly simple process that consists of just two steps. The first

September 2005

●

PHP Architect

●

www.phparch.com

step is the compilation of the given SQL query into a statement, via the prepare() method.

    $stmt = $db->prepare("INSERT INTO foo (a,b) VALUES(?,?)");

If the query contains variable values, they can be represented using the “?” character or assigned by name, which may make the query a bit easier to understand.

    $stmt = $db->prepare("INSERT INTO foo (a,b) VALUES(:a, :b)");

Upon successful query parsing, a PDOStatement object will be returned that can then be used to execute queries based on the previously compiled SQL. This is accomplished via the execute() method, which takes an optional parameter: an array of values to be substituted into the dynamic tokens of the compiled query. As you can probably imagine, the number of values in the array must match that of the dynamic tokens in the query. If unnamed tokens are used, then the parameter array should be a simple, one-dimensional array in which every element corresponds to a token.

    $stmt->execute(array(1,2));

On the other hand, if named tokens were used, then the complete token name, including the “:” character, should be used as an associative array key that points to the desired value.

    $stmt->execute(array(':a'=>1, ':b'=>2));
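To see why bound tokens are treated as values and never as SQL, here is a minimal sketch. It assumes the pdo_sqlite driver and an in-memory `foo` table, and the hostile-looking string is purely illustrative. The string is stored verbatim rather than being interpreted as part of the query, and the same compiled statement is reused for a second row.

```php
<?php
// Assumption: pdo_sqlite is available; table and values are illustrative.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE foo (a TEXT, b TEXT)');

// Parsed once, executed twice with different values.
$stmt = $db->prepare('INSERT INTO foo (a, b) VALUES (:a, :b)');

$hostile = "x'); DROP TABLE foo; --";
$stmt->execute(array(':a' => $hostile, ':b' => 'plain'));
$stmt->execute(array(':a' => 'second', ':b' => 'row'));

// The table still exists and the first value came through untouched.
$rows = $db->query('SELECT a, b FROM foo ORDER BY rowid')
           ->fetchAll(PDO::FETCH_NUM);
```

Had the same string been concatenated into a query passed to query() or exec(), it would have needed quote() first; with a prepared statement no escaping step is involved.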

The dynamic tokens can be bound to variables, so that the array need not be created and passed via execute() each time. This can be particularly useful if the data is coming from another source, such as a CSV file, and is already broken down into variables.

    $stmt = $db->prepare("INSERT INTO users (name, email) VALUES(?, ?)");
    $stmt->bindParam(1, $name, PDO_PARAM_STR, 255);
    $stmt->bindParam(2, $email, PDO_PARAM_STR, 32);

The bindParam() method associates a variable with a particular dynamic token, by using the token’s position. Once again, counting starts at one rather than zero. For each bound variable, we specify a type, based on the PDO_PARAM constants, which tells the database how to treat the input data. For strings, it is also possible, and recommended, to specify the maximum length of possible values to facilitate internal optimizations for various database systems. Once the variables have been bound, their values can be populated from the CSV file, and subsequent execute() calls can be used to insert these values into the database.

    $fp = fopen("users.csv", "r");
    while ($csv = fgetcsv($fp, 1024)) {
        list(,$name,$email,) = $csv;
        $stmt->execute();
    }
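The bind-once, execute-many pattern can be sketched in a self-contained way. The example assumes the pdo_sqlite driver and, to stay runnable without a users.csv file, swaps the CSV reader for a small hard-coded array of records; the table layout and addresses are made up. It uses the PDO::PARAM_STR spelling found in released PHP versions.

```php
<?php
// Assumption: pdo_sqlite is available; $records stands in for CSV rows.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (name TEXT, email TEXT)');

$stmt = $db->prepare('INSERT INTO users (name, email) VALUES (?, ?)');
$name = null;
$email = null;
$stmt->bindParam(1, $name, PDO::PARAM_STR, 255);   // positions start at 1
$stmt->bindParam(2, $email, PDO::PARAM_STR, 255);

$records = array(
    array('Ann', 'ann@example.com'),
    array('Raj', 'raj@example.com'),
);
foreach ($records as $record) {
    // Assigning to the bound variables is enough; execute() reads them.
    list($name, $email) = $record;
    $stmt->execute();
}

$count  = (int) $db->query('SELECT COUNT(*) FROM users')->fetchColumn();
$emails = $db->query('SELECT email FROM users ORDER BY rowid')
             ->fetchAll(PDO::FETCH_COLUMN);
```

Because bindParam() holds references, the variable names used in the loop must match the ones that were bound; binding one variable and assigning another would silently insert stale or empty values.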


The use of prepared statements is not limited to a particular query type. With the exception of table creation and modification queries, nearly every other query can be made into a prepared statement. In the case of SELECT, the same fetch process as we’ve seen before can be used to retrieve the data once execute() has completed. Not all databases support prepared statements, and in some instances their support may only supply part of the internal support required by PDO. In those cases, an emulation layer that is built into PDO will be used to replace the absent features of the database in question. For example, in the case of the MySQL driver, prepared statements are only available in version 4.1.3 or later, and therefore will be emulated for older releases. In some situations, such as in the case of PostgreSQL, the database may support the functionality natively, but this support is poor. When using native prepared statements, PostgreSQL sometimes fails to optimize the query properly, leading to slower execution times. For this reason, the PDO_PGSQL_ATTR_DISABLE_NATIVE_PREPARED_STATEMENT attribute was added, which, if enabled, makes the driver use PDO’s emulation layer, rather than the native functionality.

Utility Functions

Aside from the previously mentioned mechanisms, PDO also provides a number of utility methods to facilitate various operations. These include PDOStatement’s columnCount() method, which will return the number of columns inside a result set. It can be coupled with the getColumnMeta() method of the same object to retrieve information about the contents of a particular column (provided by the database), including some PDO-specific information that is available for all database drivers.

    $c = $stmt->columnCount();
    for ($i=0; $i < $c; $i++) {
        $meta_data = $stmt->getColumnMeta($i);
    }

As you’ve probably guessed, the enumeration of columns starts at zero, following the standard PHP convention. The returned value is an associative array containing the following data set:

• native_type – the PHP data type
• driver:decl_type – the data type of the column, according to the database
• flags – any flags particular to this column, in array form
• name – the name of the column, as returned by the database, without any normalization
• len – maximum length of a string column; may not always be available, and will be set to -1 if it isn’t
• precision – the numeric precision of this column
• pdo_type – the column type according to PDO, as one of the PDO_PARAM constants

Another useful PDO utility function is yet another PDOStatement method, rowCount(). For databases that use buffered queries, this method can be used to determine the total number of rows found in the result set. In the event that the database does not support this functionality, or where unbuffered queries are being used, the size of the result set is not known, and the returned value will be -1.

    $stmt = $db->query("SELECT * FROM users");
    $stmt->rowCount(); // returns -1, since unbuffered queries are used by default
    $db->setAttribute(PDO_MYSQL_ATTR_USE_BUFFERED_QUERY, 1);
    $stmt = $db->query("SELECT * FROM users");
    $stmt->rowCount(); // returns a valid row count, since the query is now buffered

Additional utility and database-specific functions may be added in the future, and if you have an idea for a generally useful PDO function, feel free to voice your suggestion at http://bugs.php.net/, via a feature request, or on the PHP-Internals mailing list. Either way, suggestions (or better yet, patches) are more than welcome. This concludes our brief tour of PDO and its functionality, which has hopefully convinced you to consider PDO as the interface for your next project.


About the Author


Ilia Alshanetsky is the principal of Advanced Internet Designs Inc., which specializes in security auditing, performance analysis and application development. He is the author of FUDforum (http://fudforum.org), a highly popular, Open Source bulletin board, focused on providing the maximum functionality at the highest levels of security and performance. Ilia is a core PHP Developer, an active member of PHP’s QA team, and was the Release Master for the PHP 4.3.x series. He has authored and co-authored a number of extensions, most notably SHMOP, PDO, SQLite and GD, and is responsible for a large number of bug fixes and performance tweaks in the language. A prolific lecturer and writer, Ilia can be found speaking at international conferences. He is frequently published in print and online magazines on a variety of PHP topics, and is also the author of an upcoming book on PHP security. Ilia can be reached at [email protected].

To Discuss this article: http://forums.phparch.com/249


What Are Trackbacks And Why Do They Exist

by Chris Cornutt

If you’ve been around the internet for any length of time, chances are you’ve seen a weblog. Chances are, if you’ve seen a weblog, then you’ve seen a trackback. You might not have known it at the time or even understood what it was, but more and more of the blogging tools out there are using them. So, what are these elusive trackbacks and why do they even exist?

Back in August of 2002, a group called Six Apart (creators of the Movable Type weblog system) decided that there needed to be a way for one blog to inform another when linking to it. Sure, the administrator of the linked blog could just look at their web server logs and see where the hits were coming from, but trackbacks offer a dynamic way, for not only the site’s owner but also other visitors to the page, to see how many other sites had linked to them. These links could include anything from the URL of the linking site, to the site’s name, or even a snippet of the story from which it came. Six Apart created a technical specification for the transmission of these “pings,” back and forth between sites, and published it. Shortly after that, in October of 2002, they made some modifications to the specification, changing things like the protocol to use when sending trackbacks, and implemented some of the auto-discovery options. So, with spec in hand, several


REQUIREMENTS
PHP: n/a
CODE DIRECTORY: trackbacks

RESOURCES
URL: http://www.ioncube.com/
URL: http://www.zend.com/
URL: http://www.sourceguardian.com/
URL: http://www.phpaudit.com/

of the existing blogging tools set out to implement this handy notification mechanism. They envisioned sites automatically linking to other sites, links upon links, dynamically relating blog content pages to one another. Some of the tools that have worked trackbacks into their structure include Movable Type (obviously), WordPress, Radio, and Serendipity.

Unfortunately, as is usually the case with any kind of automatic resource that’s put out in the public view, people have seen fit to abuse trackbacks, filling people’s pages with random links and other such spam. Of course, bloggers are no strangers to spam on their pages, as comment spam is a prevalent problem these days, as well. Thankfully, there have been several efforts to help squelch the spam problem in the form of PEAR classes and other independent projects. Most of the popular blogging tools, however, don’t have built-in support for things like this. They do allow you to remove the offending trackbacks easily, but when you have hundreds coming in a day (yes, it happens) that’s just not practical. Support for this kind of filtering is getting better, though; the WordPress blogging software, for example, has made a large, concerted effort to integrate filtering into its code.

Tobias Schlitt has created a PEAR class that seeks to help with the problem too: Services_Trackback. This package provides all of the basic functionality needed by someone who would like to implement trackbacks on their site: sending a trackback, receiving a trackback, autodiscovery of trackback URLs, etc. Where it really shines, though, is in the filtering techniques that it employs. There’s much more than simple word list filtering here; it also offers regular expression matching, a DNS blacklist option, and sub-URL matching.

Other bloggers go the other route, however, and simply turn trackbacks off completely. They either don’t have the time to worry about the filtering or just don’t want the hassle of having to deal with them. Of course, to others, they simply seem like glorified links, serving no other purpose than to allow someone else to shamelessly promote themselves on someone else’s blog. Some users out there see the idea of trackbacks as somewhat rude, allowing anyone and everyone to post pretty much whatever they want to an entry of yours. It’s pretty easy to see how the possibility for spam and abuse wouldn’t be far behind this one.
With the protocol open to anyone, and no inherent security for the posts, there’s really not much stopping someone who wants to spam their message all over your pages. Granted, some of the blogging software out there does its best to try to limit the spam that’s received, but with automation of the trackbacks being such an easy thing, it’s almost not worth it, at times. Some bloggers have left trackbacks on for a while, only to be burned by a spammer coming into their site and abusing its trackback interface.

Of course, the use of trackbacks isn’t limited to just weblog software; there are people who have stretched the use of this handy little protocol to make it do other things for them. One example comes from the weblog of Matthew Haughey, founder of Metafilter.com, in which he actually posts a “Now Playing” item to his site from either Winamp or iTunes. He has scripted an interface from these two pieces of software through a trackback-style interface and, with the help of DoSomething/AppleScript, he has been able to send the information to his site. The protocol is pretty open,

Listing 1

    function sendPing($data_array){
        $request=""; $content="";
        $url_parts=parse_url($data_array['tb_url']);
        // values are URL-encoded so that they form a valid POST body
        $content.=(isset($data_array['title'])) ?
            "title=".urlencode($data_array['title'])."&" : "";
        $content.="url=".urlencode($data_array['url'])."&";
        $content.="blog_name=".urlencode($this->site_name)."&";
        // pass along any query string that was on the trackback URL
        $content.=isset($url_parts['query']) ? "&".$url_parts['query']."\n" : "";

        $request.="POST ".$url_parts['path']." HTTP/1.0\r\n";
        $request.="Host: ".$url_parts['host']."\r\n";
        $request.="Content-Type: ";
        $request.="application/x-www-form-urlencoded; ";
        $request.="charset=utf-8\r\n";
        $request.="Content-Length: ".strlen($content)."\r\n";
        $request.="\r\n";
        $request.=$content;

        // debugging output: what was sent and what came back
        echo "<pre>request:<br />".$request."</pre>";
        $response=$this->socket($request,$url_parts['host']);
        echo "<br /><br />";
        echo "<pre>response: ".htmlspecialchars($response).
            "</pre>";
    }

Listing 2

    function socket($request,$host){
        $string="";
        $fp=fsockopen($host,80,$errno,$errstr);
        if($fp){
            fwrite($fp,$request);
            // read the whole response until the server closes the connection
            while(!feof($fp)){
                $string.=fread($fp,1024);
            }
            fclose($fp);
        }else{ echo "Error: ".$errno.": ".$errstr."<br />"; }
        return $string;
    }

Listing 3

    function handlePing(){
        $contents="";
        $arr=array();
        // php://input exposes the raw body of the POST request
        $fp=fopen("php://input","r");
        while(!feof($fp)){ $contents.=fread($fp,1024); }
        fclose($fp);
        $parts=explode("&",$contents);
        foreach($parts as $key => $value){
            if(!empty($value)){
                // split on the first "=" only, then decode both halves
                $p=explode("=",$value,2);
                $arr[urldecode($p[0])]=isset($p[1]) ? urldecode($p[1]) : "";
            }
        }
        return $arr;
    }

Listing 4

    [...]
        $content.="blog_name=".urlencode($this->site_name)."&";
        $content.=isset($url_parts['query']) ? "&".$url_parts['query']."\n" : "";

        $request.="POST ".$url_parts['path']." HTTP/1.0\r\n";
        $request.="Host: ".$url_parts['host']."\r\n";
        $request.="Content-Type: "
            ."application/x-www-form-urlencoded; "
            ."charset=utf-8\r\n";
        $request.="Content-Length: ".strlen($content)."\r\n";
        $request.="\r\n";
        $request.=$content;

        echo "<pre>request:<br />".$request."</pre>";
        $response=$this->socket($request,$url_parts['host']);
        echo "<br /><br />";
        echo "<pre>response: ".htmlspecialchars($response).
            "</pre>";
    }
    function handlePing(){
        $contents="";
        $arr=array();
        $fp=fopen("php://input","r");
        while(!feof($fp)){ $contents.=fread($fp,1024); }
        fclose($fp);
        //echo "contents: ".$contents."<br />";

        $parts=explode("&",$contents);
        foreach($parts as $key => $value){
            if(!empty($value)){
                $p=explode("=",$value,2);
                $arr[urldecode($p[0])]=isset($p[1]) ? urldecode($p[1]) : "";
            }
        }
        //echo "<pre>"; print_r($arr); echo "</pre>";

        return $arr;
    }
    //------------------------------------------
    function socket($request,$host){
        $string="";
        $fp=fsockopen($host,80,$errno,$errstr);
        if($fp){
            fwrite($fp,$request);
            while(!feof($fp)){
                $string.=fread($fp,1024);
            }
            fclose($fp);
        }else{
            echo "Error: ".$errno.": ".$errstr."<br />";
        }
        return $string;
    }
    }

    //------------------------------------------------
    $tb=new trackbackManage();
    if(isset($_POST['url']) && !isset($_POST['tb_submit'])){
        $arr=$tb->handlePing();
        echo "<pre>"; print_r($arr); echo "</pre>";
    }elseif(isset($_POST['tb_submit'])){
        unset($_POST['tb_submit']);
        $tb->sendPing($_POST);
    }else{
    ?>
    [...]


and can be adapted to more uses than just the typical commenting and linking done on most blogs. The real potential behind trackbacks can be seen in the fact that a trackback is more than just a normal link to another page. It’s more of a meta-link, providing more information than just a referrer in your server’s web logs. If you have a site that doesn’t currently have (or can’t really use) trackbacks, you might consider one of the trackback “hosting” services such as HaloScan.com. They offer a service that, with “just two lines of code”, can offer you all the benefits of having trackbacks on your site. They offer services like the banning of commenters, CSS templating, and a custom RSS feed for the trackbacks/comments left on your site.

The How

So, now that we’ve talked about how trackbacks can be used, and their potential for abuse (unfortunately), how can we actually create these handy little “pings?” Well, thankfully, the protocol is a very simple one, as shown on the Six Apart pages. Trackbacks use a REST model (Representational State Transfer) and are created the same way that a normal HTTP call is performed. The request consists of a formatted POST request with certain variables set. Only one of these is required, the URL that the ping is coming from, and the rest are optional, but helpful: the title of the entry, an excerpt from the page, and the originating blog name, for reference. Each kind of weblog system that I looked into seemed to have its own interface for accepting trackbacks, with none of them resembling each other in name or URL to call. A formatted request looks like this:

    POST http://www.example.com/trackback/5
    Content-Type: application/x-www-form-urlencoded; charset=utf-8

    title=Foo+Bar&url=http://www.bar.com/&excerpt=My+Excerpt&blog_name=Foo

It’s a normal HTTP POST request to the URL specified by the original entry, with a specific Content-Type, and the data encoded in the typical POST format. According to the specification for the request format, you must send a Content-Type header (set to “application/x-www-form-urlencoded; charset=utf-8”) in order for the request to be successfully accepted. Of course, the formatting of the values in the data section of the request must conform to the character set that you’ve indicated. If the request was successful, the trackback script should respond with an XML response:

    <?xml version="1.0" encoding="utf-8"?>
    <response>
    <error>0</error>
    </response>
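A client might check such a response along these lines. This is only a sketch: parseTrackbackResponse() is a hypothetical helper that is not part of the article's class, it assumes the SimpleXML extension is loaded, and it assumes (as I read the TrackBack specification) that a failed ping carries its explanation in a separate message element, falling back to the text of the error element if there is none.

```php
<?php
// Hypothetical helper: returns array(bool $ok, string $message).
function parseTrackbackResponse($xml)
{
    // Suppress libxml warnings; malformed input is reported via the return value.
    $doc = @simplexml_load_string($xml);
    if ($doc === false || !isset($doc->error)) {
        return array(false, 'unparseable response');
    }
    if (trim((string) $doc->error) === '0') {
        return array(true, '');
    }
    // Assumption: failures describe themselves in <message>.
    $message = isset($doc->message)
        ? trim((string) $doc->message)
        : trim((string) $doc->error);
    return array(false, $message);
}

$ok  = parseTrackbackResponse('<response><error>0</error></response>');
$bad = parseTrackbackResponse(
    '<response><error>1</error><message>Duplicate ping</message></response>'
);
$junk = parseTrackbackResponse('not xml at all');
```

In a real sender, the string passed in would be the body of the HTTP response, with the status line and headers stripped off first.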

On success, the error tag will contain a “0”. Otherwise, it will contain a “1”, and an accompanying message tag will carry an error message such as “We already have a trackback from that URL on this post.”

Basically, when a given user posts a new note on his blog, he can enter a trackback URL for the entry that he’s posting about. His blogging software then takes this URL (and possibly some other information about the entry) and creates a POST request destined for it. One thing to look out for when posting a trackback: the URL for the trackback interface is usually different from that of the post being linked to. This URL can usually be found somewhere on the destination page, usually right between the comments and the main body of the post. The software will then send the request to this URL and the remote blog’s software will interpret it. There are many reasons that a post could be denied; one common error occurs when you’ve already posted a trackback to the target entry.

There is also a provision in the trackback specification that allows for the auto-discovery of the trackback URL for a given post, but the weblog software has to support this feature. The method that a client uses to autodiscover the trackback URL for each post involves looking at the RDF/RSS file for the site. Most of the software that’s out there automatically creates a syndication file for your site, allowing people to subscribe with their aggregators and see when you make a new post. This same syndication file is used to share the trackback URL with your visitors. There is metadata placed in each entry of the RDF (for each post) with the trackback namespace:

Note the “trackback:ping” URL that’s provided there at the end. That’s what your script would need to look out for. Unfortunately, as I mentioned, it doesn’t seem like many sites really use this format, so you might be stuck with parsing the page in an effort to find it. Thankfully, just about every page I’ve seen that has trackbacks enabled used the word “trackback” in the link for it on the post. Grabbing the page and parsing out this URL isn’t too big of a problem, as is evidenced, unfortunately, by the rampant trackback spam that goes on.

The Code

So, the formatting is great and all, you say, but when do we get to the actual code? Well, ask and ye shall receive. I’m going to show you a little class that I whipped up to deal with the sending and receiving of trackbacks. This is a very simplified version of something that could get really complex really quickly, so don’t expect much more than a simple send/receive. I’ll give you the code, then walk you through what it’s doing, line by line. The sendPing function (Listing 1) does just what it


sounds like: sends a trackback ping to another site. The $data_array that’s passed in contains information, from a form submission in this case, and can have the following values: tb_url, title, and url. Remember, the URL parameter is the only one that has to be set. If it’s not, chances are your request will be rejected.

After initializing the $request and $content variables, I break out the parts of the trackback URL (tb_url) that’s been given with parse_url(). The output of this function gives me all the data about the URL, including the path of the script, the host it’s being sent to, and the query that was on the end. It returns more than that, but for our purposes, that’s all we need. Then, we start building the content of the request: the data, not the headers. The $content variable is built according to the Six Apart specification. Once we have the content, we can build the actual request, in the $request variable. We needed to make the content of the request first so that we could use the strlen() function to give us a “Content-Length”. The value of $request is appended to, and a normal POST request (with the required Content-Type) is created. Add a final line with the content itself and your request is complete.

My code echo()s the $request for debugging purposes, so I could see what it was sending. After that, there’s a call to the $this->socket() function, which is another function in the class (Listing 2). The socket() function takes the hostname and the data for a request. It’s called with the request that we’ve created, as well as the hostname pulled from the URL that sendPing() was given. A socket is opened to port 80 on the remote host and, if this operation is successful, the request is posted to the remote script. Once the request has been made and sent, the script looks for a response with fread() and, if found, appends it to the $string variable. The connection is closed and $string is returned to the sendPing() function.
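The request-building half of that walkthrough can be isolated into a small function that is easy to inspect without opening a socket. This is a sketch under stated assumptions rather than the article's code: buildPingRequest() is a hypothetical name, it lets http_build_query() do the URL-encoding, and it keeps any query string on the request line instead of appending it to the body as Listing 1 does.

```php
<?php
// Hypothetical helper: returns the raw HTTP/1.0 request for a ping.
function buildPingRequest($tbUrl, array $fields)
{
    $parts = parse_url($tbUrl);
    $path  = isset($parts['path']) ? $parts['path'] : '/';
    if (isset($parts['query'])) {
        $path .= '?' . $parts['query'];
    }

    // The body must exist first so Content-Length can be computed.
    $body = http_build_query($fields, '', '&');

    return "POST " . $path . " HTTP/1.0\r\n"
         . "Host: " . $parts['host'] . "\r\n"
         . "Content-Type: application/x-www-form-urlencoded; charset=utf-8\r\n"
         . "Content-Length: " . strlen($body) . "\r\n"
         . "\r\n"
         . $body;
}

$request = buildPingRequest('http://www.example.com/trackback/5', array(
    'url'       => 'http://www.bar.com/',   // the only required field
    'title'     => 'Foo Bar',
    'blog_name' => 'Foo',
));
list($head, $body) = explode("\r\n\r\n", $request, 2);
```

The resulting string could be handed to a function like socket() in Listing 2 unchanged; splitting it into $head and $body here is only for inspection.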
If there is an error when opening the socket, control passes to the else clause, and the error will be displayed. If all has gone well, your trackback (with the data you specified) should be on the page you submitted it to. Having the echo() statements in the sendPing() function can really help in the debugging process, since web servers sometimes don’t return what you think they should. Ideally, though, the request that I laid out should work with any server out there.

When it comes to receiving a ping, you want to do just the opposite: your script needs to look for a POST request from another page. Since most sites will have a separate script to handle their trackbacks, you could simply rely on the assumption that any data posted to your trackback script is, in fact, a trackback request. Of course, that seems a little scary to me, so I threw an if on my page to check if $_POST[‘url’] was set. Since this is a required field for the trackback call, I figured that this is an acceptable check.
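The receiving side described here can be sketched as two small functions: one that turns a raw POST body into an array and rejects it when the required url field is missing, and one that formats the XML reply. Both names are hypothetical, parse_str() is swapped in for manual splitting, and the error/message reply layout follows my reading of the TrackBack specification rather than anything shown in the article.

```php
<?php
// Hypothetical helper: returns the ping fields, or null if "url" is missing.
function parsePingBody($rawBody)
{
    $fields = array();
    parse_str($rawBody, $fields);   // splits on & and =, URL-decoding as it goes
    if (!isset($fields['url']) || !is_string($fields['url']) || $fields['url'] === '') {
        return null;
    }
    return $fields;
}

// Hypothetical helper: builds the XML reply; null means success.
function pingResponse($errorMessage = null)
{
    $head = '<?xml version="1.0" encoding="utf-8"?>' . "\n<response>\n";
    if ($errorMessage === null) {
        return $head . "<error>0</error>\n</response>";
    }
    return $head . "<error>1</error>\n<message>"
         . htmlspecialchars($errorMessage, ENT_QUOTES, 'UTF-8')
         . "</message>\n</response>";
}

$ping    = parsePingBody('url=http://www.test.com&title=my%20title&blog_name=mine');
$missing = parsePingBody('title=no+url+here');
$reply   = ($missing === null) ? pingResponse('url is required') : pingResponse();
```

In a live script the raw body would come from file_get_contents('php://input') and the reply would be echoed with a text/xml Content-Type header; those parts are left out so the sketch runs from the command line.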


Now, for the functionality that handles the ping, let’s take a look at Listing 3. First, we initialize the $contents variable that we’ll use to grab the contents of the POST request. Then, using the special “php://input” protocol in fopen(), we can grab everything that was fed to the script. When PHP is running in the context of an HTTP request, this will grab all of the data except for the headers. There is also a $GLOBALS[‘HTTP_RAW_POST_DATA’] variable that can be accessed to get the same kind of information, but this variable is only populated when the server’s always_populate_raw_post_data ini directive is true. The POST data is appended to the $contents variable, inside the while loop, and the stream is closed. To get the values from the POST data into something we can use, we explode on the “&” character, which is used to separate the values in a POST request, and put those into an array, $parts. Then, with a foreach loop, we go through each of the entries, exploding them on the “=” character (separating the key from the value), and set those in the $arr array. What we end up with is something like this:

    url=http://www.test.com&title=my%20title&blog_name=mine
    // becomes:
    [url] => http://www.test.com
    [title] => my title
    [blog_name] => mine

This value in the $arr variable is then passed back out of the script to be used elsewhere. One thing that we’re not really going to look at, however, is auto-discovery of the trackback URLs. I’ve included an example of how to possibly parse out a trackback URL, along with the full class code in Listing 4, so you can see one possible approach to this mechanism. The example uses a simple form to post the information through the trackback URL, and is set up in a manner where you can post the trackback to the same page, and it will understand how to handle it. For example, if you named the page “foo.php”, once you created the class instance and have the form ready, you could put http://www.mysite.com/foo.php in the “Their Trackback URL” field and the script should pick it up just fine. When the form is submitted, you’ll see the POST request that it sends, in the output, along with a response from the same script, eliminating the need to have two scripts for testing. When the POST request comes in, the script ensures that the “url” value is set, and that the “tb_submit” field is not; this differentiates a trackback call from a regular (non-trackback) form submission.

That’s pretty much it: using three simple functions, you have a base PHP class to help you send and receive trackbacks. As I said earlier, it’s nothing fancy and there’s no filtering, or other special features, in there; it’s just an introduction to the format and will send and receive. From here, you can add on whatever features you like: filtering, better error checking, etc.

In Summary

Trackbacks were designed to be a simple thing from the start, a friendly way to let other site owners know that you found their content interesting, and have linked to it in a post of your own. They can be very useful when used in the right manner; they can provide a meta-link from site to site, helping to link content and share ideas among a site’s visitors. The goal of articles referencing other articles, referencing other articles, etc. is a decent one. Unfortunately, too many people out there have seen fit to abuse this simple “heads up” from one blogger to another. It’s all too common to come across posts that talk of problems with spam in trackbacks, or even stories of bloggers who have shut them off altogether. It’s not too hard to tell when a site has been hit by a trackback spammer, either. The numbers don’t lie; a posting with a large number of unrelated trackbacks is not hard to find. Since comments and trackbacks often occupy the same areas on most of the blogging tools out there, comment spam and trackback spam seem to go hand in hand. There are also those out there who argue that trackbacks, in themselves, really aren’t worth much to the online community. They argue that if you want to say something to the person posting on the blog, you

should just leave a comment. Trackbacks are the comment equivalent of a Post-it Note. Their reasons range from the obvious spam difficulties all the way to what is described as “uselessness.” Some bloggers suggest that trackbacks really aren’t needed, and that a normal link to an entry is all that’s really necessary. They point out that trackbacks aren’t centralized enough to do any kind of good. Yes, they link back and forth between blogs, helping to bridge the gap that the user would otherwise have to search for, but since they are unique to each weblog and aren’t really stored in any centralized manner, they’re not really useful for much more than a “look at me! I linked to you!” sort of message.

Of course, logically following that argument, the topic of Technorati is brought in. Technorati is a site (http://technorati.com/) that indexes and mines the data from millions (15.5 million at the time of writing) of weblogs. The site includes the ability to search all of the content and show only results pertinent to your search. Many of the developers who see trackbacks as pointless seem to think that Technorati is a much better solution. Not only does the site index the content to make it searchable, but using its search feature, you can see which blogs have linked to the site of your choosing. The results include the name of the site, a link to it, the brief bit of content surrounding the search term, and how long ago the item was posted. They’ve even introduced an API that you can connect to in order to perform these searches right on your own page (but using it is a topic for another time). For example, a visitor could come to your site, view one of the postings on your blog, and see, courtesy of Technorati, an up-to-the-minute list of sites that link to that very page.

Sound familiar? Well, it should. Using this functionality, Technorati could very easily replace trackback functionality for any site out there. All a site owner would need is access to the API, one of the many libraries out there for accessing it, and a bit of patience to get it up and running.

Trackbacks, on the whole, seem to be sticking around for a bit longer, but with things like the Technorati example above, it’s only a matter of time before they get phased out. Sure, you can argue the old “what happens if they just go away” mentality that applies to so many situations, but the benefits of such a centralized system really make linking between blogs much more useful. In the meantime, though, trackbacks are still the de facto method of letting a blog’s maintainer know that you’re linking to her content.
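To make the receiving side described earlier a little more concrete, here is a minimal sketch of handling an incoming ping. It assumes the field names (url, title, excerpt, blog_name) and the XML reply format from the TrackBack specification. The function names handleTrackbackPing() and trackbackResponse() are hypothetical and are not part of the class in Listing 4.

```php
<?php
// Hypothetical helper: build the XML reply defined by the TrackBack
// specification. An <error> value of 0 means success, 1 means failure.
function trackbackResponse($error, $message = '')
{
    $xml  = '<?xml version="1.0" encoding="utf-8"?>' . "\n";
    $xml .= "<response>\n<error>" . ($error ? 1 : 0) . "</error>\n";
    if ($error && $message !== '') {
        $xml .= '<message>' . htmlspecialchars($message) . "</message>\n";
    }
    return $xml . "</response>\n";
}

// Hypothetical helper: validate the POSTed fields of a ping.
// Returns array(cleaned ping or null, XML response to send back).
function handleTrackbackPing(array $post)
{
    $url = isset($post['url']) ? trim($post['url']) : '';
    if (!preg_match('#^https?://#i', $url)) {
        // The url field is the only one the specification requires.
        return array(null, trackbackResponse(true, 'A valid url is required'));
    }
    $ping = array(
        'url'       => $url,
        // Per the specification, the title falls back to the url.
        'title'     => isset($post['title']) ? strip_tags($post['title']) : $url,
        'excerpt'   => isset($post['excerpt']) ? strip_tags($post['excerpt']) : '',
        'blog_name' => isset($post['blog_name']) ? strip_tags($post['blog_name']) : '',
    );
    return array($ping, trackbackResponse(false));
}
```

A page would typically call handleTrackbackPing($_POST), store the returned ping if it is not null, and echo the XML with a text/xml content type. Spam filtering of the kind discussed above would slot in just before the ping is accepted.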

About the Author

Chris has been involved with PHP and its community for about five years now, most of that running his site, PHPDeveloper.org - a site devoted to bringing the most up-to-date, informative news and community happenings to the forefront. He’s a Zend Certified Engineer and works as a web site administrator at a large natural gas utility in Dallas, Tx.

To Discuss this article: http://forums.phparch.com/250

FEATURE

End-to-End Testing with PHP and Internet Explorer

by Oz Solomon

Automated testing can greatly improve the quality of your product. In this article, Oz presents a framework for creating automated tests that can simulate end-user activity. By leveraging the full faculty of Internet Explorer, these tests can do just about anything that your users can do.

A few years ago, my team was suddenly pulled away from the project it was working on and commissioned to write a new transaction processing system for our company. Despite the complexities and strict reliability requirements imposed on the system, I was not given enough of a budget to hire any QA staff, let alone purchase expensive testing software. Fortunately, I manage a team of extremely bright people, and together we were quickly able to conquer the reliability beast. We were able to consistently keep a schedule that called for new releases of our system every 3-4 weeks. The level of reliability and maintainability required to make such a tight schedule realistic was achieved through our extensive use of automated tests.

In this article, I will present a framework similar to the one that we developed to perform end-to-end testing of our web applications. This framework uses nothing more than PHP and Internet Explorer, and can be used to test anything Internet Explorer can get its hands on, be it PHP pages, JSP pages or hand-coded HTML.

Unit Tests vs. End-To-End Tests

Before I discuss the framework, allow me to quickly review the differences between unit tests and end-to-end tests. A unit test is a piece of code that exercises one function or functional unit and ensures that it works correctly. For example, if you write a function, validateCreditCardNum(), that checks for valid credit

REQUIREMENTS

PHP: 4.3.2+ (except 4.3.10), 5.x
OS: Microsoft Windows
Other Software: Internet Explorer, PHPUnit (PEAR)
Code Directory: endtesting

card numbers, the function’s unit tests would exercise it with various credit card numbers, ensuring that the returned values are correct on every call.

In contrast, end-to-end tests exercise multiple components or the entire product to ensure that all the pieces work properly together. For example, imagine that you have a web form that accepts a credit card number. That form then submits to a PHP script that validates the credit card number and does some other processing. An end-to-end test would submit valid and invalid credit card numbers through the form, checking for proper responses from the application. Notice that you are no longer checking the validateCreditCardNum() function by itself. You are now checking the application as a whole. By running these tests, you will know, indirectly, that:

• Your application is properly submitting the form
• The backend code is properly passing credit card numbers to the validation function
• The validation function is working as expected
• The application is properly rendering errors

Depending on your configuration, even the simplest end-to-end test can exercise many components, such as database layers, caching systems and graphics
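As a concrete illustration of the unit-test side of this comparison, here is a minimal sketch. The article names validateCreditCardNum() but does not show its body, so the Luhn checksum implementation below is an assumption, and the plain assert() calls merely stand in for the PHPUnit test cases a real suite would use.

```php
<?php
// Assumed implementation: accept 13 to 19 digits (spaces and dashes allowed)
// that pass the Luhn checksum. The article does not specify the real rules.
function validateCreditCardNum($number)
{
    $digits = preg_replace('/[\s-]/', '', (string) $number);
    if (!preg_match('/^\d{13,19}$/', $digits)) {
        return false;
    }
    $sum = 0;
    $double = false;
    // Walk from the rightmost digit, doubling every second digit.
    for ($i = strlen($digits) - 1; $i >= 0; $i--) {
        $d = (int) $digits[$i];
        if ($double) {
            $d *= 2;
            if ($d > 9) {
                $d -= 9;
            }
        }
        $sum += $d;
        $double = !$double;
    }
    return $sum % 10 === 0;
}

// A unit test exercises this one function in isolation with known inputs.
$cases = array(
    array('4111111111111111', true),     // widely used test number
    array('4111 1111 1111 1111', true),  // same number with spaces
    array('4111111111111112', false),    // wrong check digit
    array('not a card number', false),
);
foreach ($cases as $case) {
    assert(validateCreditCardNum($case[0]) === $case[1]);
}
```

An end-to-end test would never call this function directly. It would type the same numbers into the web form and inspect the page that comes back.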

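Listing 1
<?php
// Start Internet Explorer through COM. The opening lines of this listing
// are missing from the text; this instantiation is reconstructed from context.
$ie = new COM("InternetExplorer.Application");
$ie->visible = true;

// Navigate to the PHP web site
$ie->Navigate2("http://www.php.net");
?>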

Listing 2
<?php
// Start Internet Explorer through COM (opening lines reconstructed from context)
$ie = new COM("InternetExplorer.Application");
$ie->visible = true;

$ie->Navigate2("http://www.php.net");

// pause so we can actually see the IE window
sleep(2);

// tell our instance of the Internet Explorer application
// to shut down
$ie->Quit();
?>

Listing 3
<?php
// Start Internet Explorer through COM (opening lines reconstructed from context)
$ie = new COM("InternetExplorer.Application");
$ie->visible = true;

$ie->Navigate2("http://www.php.net");

// Wait for the document to load
com_message_pump(1000); // 1 second

// Access the HTML source
$doc = $ie->document;
$html = $doc->body->innerHTML;

// Find and print out all the headlines. We know that all
// the headlines are surrounded by
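Listing 3 waits a fixed second for the page to load. A more defensive variation, sketched here under assumptions, polls the browser's Busy and ReadyState properties (both part of Internet Explorer's automation interface) until the page reports that it is complete. The helper name waitForDocument(), the IE_READYSTATE_COMPLETE constant and the timeout values are hypothetical and do not come from the article's framework.

```php
<?php
// READYSTATE_COMPLETE as reported by Internet Explorer's ReadyState property.
define('IE_READYSTATE_COMPLETE', 4);

// Hypothetical helper: poll until the browser reports that the page has
// finished loading, or until $timeoutMs elapses. Returns true on success.
function waitForDocument($ie, $timeoutMs = 10000, $stepMs = 100)
{
    for ($waited = 0; $waited <= $timeoutMs; $waited += $stepMs) {
        if (!$ie->Busy && $ie->ReadyState == IE_READYSTATE_COMPLETE) {
            return true;
        }
        if (function_exists('com_message_pump')) {
            com_message_pump($stepMs);   // let COM process pending events
        } else {
            usleep($stepMs * 1000);      // fallback where COM is unavailable
        }
    }
    return false;
}
```

With this in place, the fixed com_message_pump(1000) call could be replaced by something like waitForDocument($ie), with the script bailing out if it returns false.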

exit(0);

Atomic Orange by Marco Tabini

For a business, any business, growing is good, and painful (hence, I suppose, the proverbial phrase “pain is so close to pleasure”). Sometimes, once in a blue moon, it’s also funny.

As our book business keeps becoming more and more successful, I keep having to find better ways to print our books. The economy of printing is a funky one (I’ve always thought that accountants who used to work in the printing industry must have been responsible for the foundation of the Hollywood economy, where a movie can make half a billion dollars without ever turning a profit) and difficult to navigate for the beginner. Suffice it to say that the vast majority of the costs connected with printing a book are not related to the actual printing, but rather to the setup: developing the film, etching the plates, and so on. Thus, printing, say, 1,000 copies of a book can cost only marginally less (in relative terms) than printing 10,000 (the problem being, of course, where you chuck the other 9,000 copies you don’t actually need). Besides making for interesting coffee-time talk, this also means

that the cost of printing the same book varies wildly between printers, depending on how quickly they think they can set up a title, and how much they charge for their setup work. As a result, the money-conscious publisher (or, as other people prefer to refer to me, the tight tyrant) finds himself having to shop around for the best price on a book-by-book basis.

A couple of weeks ago, as we were about to run the first print of Jason Sweat’s php|architect’s Guide to PHP Design Patterns, I happened to drop a request for quotation with a printer we had never used, but whose pricing structure seemed very promising. Mindful of how much I dislike clients who send me requests for proposals without enough information to provide a meaningful price, I made sure to include as much detail as possible with my RFQ: trim size, page count, run size, and so on. Unlike the software world, where pretty much every project is completely different from the next, there is only so much that can be done with a book; therefore, I wasn’t expecting much in the way of problems. Imagine my surprise when I received a quote for something

that, had the title of our book not been on it, I would have simply thought had been faxed to the wrong speed-dial. Wrong trim size, spiral binding (huh? Did I order notebooks?) and quantities high enough to teach the savage people of the Deep Amazonian Forest, as yet untouched by western civilization, the ins and outs of the Iterator pattern. To me, that’s like asking your local grocer how much two pounds of oranges cost, only to be told the price of three tonnes of fissionable plutonium. So, the morning juice is out of the question, but the afternoon build-your-own-nuke fest is on.

Needless to say, the printer didn’t even get a call back. Everybody makes mistakes… just as long as they don’t make them with my stuff. In business, like in so many other aspects of life, first impressions are very important, and my first impression of this company was that I’d be shipped ten thousand copies of the latest Martha Stewart Living, which I have a feeling most of our readers wouldn’t have been interested in.

php|a
