Giveaway Contest : PHP and MongoDB Web Development Beginner’s Guide

I’m happy to announce that Packt Publishing has arranged a giveaway contest for the book PHP & MongoDB Web Development: Beginner’s Guide. All you have to do is participate in a small programming challenge. Each winner of the contest will receive a copy of the book, free of charge!

The challenge is to build a standalone, reusable HTTP session manager using PHP and MongoDB.

The details of the contest are available here.

We hope that this contest will spark your interest in building web apps using PHP and MongoDB. Happy Coding!


PHP and MongoDB Web Development Beginner’s Guide – Thoughts of a first-time author


Social networking doesn’t always make you procrastinate; sometimes it pays off! When @packtauthors tweeted that they were looking for someone to author a book on PHP and MongoDB, I made contact. A few weeks later I signed a contract to write the book. And six months after that, I am pleased to announce that PHP and MongoDB Web Development Beginner’s Guide is published and out for sale!

In this post I intend to share a few words about the motivation behind the book and the journey of a first-time author.

The Motivation

I’m a supporter of the idea that MongoDB can potentially be the new M in LAMP. The data storage requirements of web applications have changed a lot during the past 4-5 years. Instead of producing content of their own, the most popular websites host content created by their users. This content is diverse in nature and humongous in volume. Mapping such diverse data into a rigid data structure gets harder as the volume grows. This is where the ‘Flexible Schema’ nature of MongoDB fits really well. MongoDB is also easy to learn; developers with relational database experience should have little trouble adapting to it. There is a lot of similarity between the underlying concepts of an RDBMS and MongoDB (think documents for rows, and collections for tables). Developers don’t need to wrestle with radical ideas such as column-oriented or graph-theory-based data structures, as some other NoSQL databases require. Finally, it is open-source, freely available (GNU AGPL license), supports multiple platforms (Windows/Linux/OS X), has great documentation and a very co-operative community, and plays nicely with PHP! All this has led me to believe that in the near future MongoDB will be where MySQL is right now: the de facto database for web application development (I would urge you to read Stephen O’Grady’s article, which makes more persuasive arguments). And since PHP is the dominant language for web programming, writing a book on web development with PHP and MongoDB felt just right.

The intended audience for this book is web developers who are completely new to MongoDB. It focuses on application development with PHP and MongoDB rather than on MongoDB alone. The first few chapters ease the reader into MongoDB by building a simple web application (a blog) and handling HTTP sessions with MongoDB as the data back-end. In the later chapters the reader will learn to solve ‘interesting’ problems, such as storing real-time web analytics, hosting and serving media content from GridFS, and using geospatial indexing to build location-aware web apps. The reader will also brainstorm about scenarios where MongoDB and MySQL can be used together as a hybrid data back-end.

The Inspiration

Scott Adams, the creator of the famous Dilbert comic strip, wrote an inspirational article in the Wall Street Journal. I’m going to quote a few lines here:

“I succeeded as a cartoonist with negligible art talent, some basic writing skills, an ordinary sense of humor and a bit of experience in the business world. The ‘Dilbert’ comic is a combination of all four skills. The world has plenty of better artists, smarter writers, funnier humorists and more experienced business people. The rare part is that each of those modest skills is collected in one person. That’s how value is created.”

These words moved me. I like programming and I like writing, and although there are smarter programmers and better writers out there, by combining these two passions I could potentially produce something of value. Besides, I had an amazing learning experience with MongoDB. I had built an API analytics solution with MySQL which became difficult to handle as the volume of data grew. I started playing with MongoDB as a potential alternative. A month later I moved all the data from MySQL to a more solid and scalable solution based on MongoDB. I wanted to share this learning experience through a series of blog posts but lacked the personal discipline and commitment to do so. Being obligated to deliver a book within tight deadlines solved that problem!

I must also thank Nurul Ferdous, my friend and former colleague, who is a published tech author himself. His guidance and influence have been instrumental.

The Journey

My journey as an author writing a book for the first time has been an exhausting yet amazing one! I work in a tech startup, which naturally means longer than usual hours and harder than usual problems to solve. I would come home late and tired, research MongoDB topics, plan how to deliver the message to the reader, write code, test and debug the code, write the content in a text editor, and fight with Microsoft Word so that the content had the formatting required by the publisher. Then on weekends I would revise and rewrite most of what I had done over the week and hustle to make the deadline. Nevertheless, it has all been a rewarding experience.

In the rewrite phase I had a lot of help from the technical reviewers: Sam Millman, Sigert De Vries, Vidyasagar N V and Nurul Ferdous. They corrected my errors, showed me what more could be added to the content and what should be gotten rid of, and helped me communicate complicated topics to readers in a clearer way. I convey my sincere appreciation to them!

Time to end this lengthy blog post. I hope you find this book enjoyable and use it to build some really cool PHP-MongoDB apps! I will then consider my endeavor to be a success.

Modifying PDF files with PHP

Last week, a friend of mine asked me to help him with a programming problem that he had been wrestling with for some time. The problem sounds simple:

  1. Take a PDF file
  2. Write something at the footer of each page of that file

And this had to be done with PHP.

Although several PHP libraries are available for dealing with PDF files, none seemed able to modify the contents of an existing PDF file. Their manuals and tutorials are full of examples of how to create PDFs on the fly. After spending a few fruitless hours trying to get the much-recommended PDFLib installed on my Mac and working with MAMP, I painfully realized that this library is for commercial use only. The free version leaves a horrible watermark of their site address on the generated PDF documents.

My search for a solution took me to FPDF, an open-source library for PDF file generation in PHP. In their FAQ section, I found a link to an extension of the library, named FPDI. This one was seemingly capable of ‘manipulating’ PDF files in an ad hoc fashion. It extracts the contents of each page in the file, uses it as a template, lets you put text and shapes on the template and then outputs the modified file. Excited, I got into coding and after an hour of labor finally succeeded in achieving my goal! Thank God for creating open source!

Enough talk, now let’s get our hands dirty!

First we need to have the following libraries downloaded and unzipped. They are just packages of PHP scripts that you require/include in your own script. No need to deal with .dll/.so extensions.

  1. FPDF
  2. FPDI
  3. FPDF_TPL

Keep them in the same directory as your script, or in the include path. The following code snippet gives a basic idea of how to get started:

require_once('fpdf/fpdf.php');
require_once('fpdi/fpdi.php');

$pdf = new FPDI();
//Turn off automatic page breaks so that text written near the
//bottom edge stays on this page instead of starting a new one
$pdf->SetAutoPageBreak(false);
$pdf->AddPage();

//Set the source PDF file
$pagecount = $pdf->setSourceFile("my_existing_pdf.pdf");

//Import the first page of the file
$tpl = $pdf->importPage(1);
//Use this page as template
$pdf->useTemplate($tpl);

//Print Hello World at the bottom of the page

//Go to 1.5 cm from bottom
$pdf->SetY(-15);
//Select Arial italic 8
$pdf->SetFont('Arial','I',8);
//Print centered cell with a text in it
$pdf->Cell(0, 10, "Hello World", 0, 0, 'C');

//Save the result to a file
$pdf->Output("my_modified_pdf.pdf", "F");

The above code takes a PDF file “my_existing_pdf.pdf”, and creates a copy of it “my_modified_pdf.pdf” with “Hello World” printed at the centre bottom of the first page.

That’s it! To achieve the goal I outlined at the start of this post, I extended the FPDI class and overrode the Footer() method to print a customized footer on each page.

I only wish that the PHP online manual did NOT have an entire section dedicated to PDFLib, a non-free commercial library, and pointed instead to free ones such as FPDF or TCPDF. It could have saved me hours.

Working on NetBeans 6.5, and loving it

NetBeans has recently released version 6.5 of their powerful IDE. It comes with PHP support, which can either be downloaded as a plug-in or installed as a stand-alone module. I started working with it a few days ago and am really impressed. Here are a few things that I like about it:

User friendly

Creating new projects is really easy. They can be created either from scratch or from existing source files. The interface is clean and useful. It has file and project explorers, a navigation panel to quickly access the methods/members of your class files, a palette for creating HTML pages in a drag-and-drop manner, and lots more.

Code Assistance

It provides the basic PHP/JavaScript/HTML/CSS code auto-completion features that most other IDEs have. Interestingly, you can add your PHP framework/library files to the include path of your project and have the IDE suggest methods/members of the classes you have included in your script. It also supports code assistance for jQuery, Prototype and Scriptaculous.

Lightweight

It has a smaller footprint on system resources than Eclipse PDT and Aptana Editor, and even than its predecessor NetBeans 6.1. It loads much faster than those on my Windows XP machine. I have yet to try it out on Ubuntu.

It’s free!

Yes. The last but not the least!

The one thing I don’t like about it is that it no longer separates the project-specific files from the source files, as 6.1 did. I work on a Windows desktop connected to a Linux server through a Samba share. So I am left with ‘nbproject’ folders on the development server, which I have to remove manually when migrating the sources to the production server.

Still, NetBeans 6.5 is a decent 8 out of 10 in my book.

Download Link:

http://www.netbeans.org/downloads/index.html

Here are a couple of screencasts to get started with the IDE:

http://blogs.sun.com/netbeansphp/entry/demo_of_the_php_support
http://blogs.sun.com/netbeansphp/entry/demo_of_the_php_distribution

MySQL Prepared Statements and PHP : A small experiment

Consider a PHP-MySQL application where the information of 1000 users is being retrieved from the database by running a for loop:

for($i = 1; $i <= 1000; $i++){

$query = "SELECT * FROM user WHERE user_id = $i";

//run the query and fetch data

}

In each iteration, the first thing the MySQL engine does is parse the query to check its syntax. Then it sets up the query and runs it. Since the query remains unchanged during each iteration (except for the value of user_id), parsing the query each time is unnecessary overhead. In such cases prepared statements are the better fit. A prepared statement is just like a typical query, except that it has ‘placeholders’ that are supplied values at run time. The prepared statement in this case will look like this:

"SELECT * FROM user WHERE user_id = ?"

Notice the placeholder (‘?’) for the value of user_id in the query. Now the MySQL engine needs to parse the query only once, and can then execute it 1000 times, each time binding the placeholder to the user_id value supplied by the PHP script. Parsing the query ahead of time in this way can give a noticeable performance boost.
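To make the parse-once, execute-many idea concrete, here is a minimal sketch. It is a stand-in rather than the setup used in this post: it assumes PDO with an in-memory SQLite database (so that it runs without a MySQL server), and the columns of the `user` table and the `fetchUsernames()` helper are hypothetical.

```php
<?php
// Sketch only: PDO + in-memory SQLite stands in for MySQL so the example
// runs without a server. Table layout and helper name are hypothetical.

function fetchUsernames(PDO $db, array $userIds): array
{
    // The statement is prepared (parsed) once...
    $stmt = $db->prepare('SELECT username FROM user WHERE user_id = ?');

    $names = [];
    foreach ($userIds as $id) {
        // ...and executed many times, each time with a different bound value.
        $stmt->execute([$id]);
        $name = $stmt->fetchColumn();
        $stmt->closeCursor();
        if ($name !== false) {
            $names[$id] = $name;
        }
    }
    return $names;
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE user (user_id INTEGER PRIMARY KEY, username TEXT)');
$db->exec("INSERT INTO user (user_id, username) VALUES (1, 'alice'), (2, 'bob')");

// Id 3 does not exist, so only two names come back.
print_r(fetchUsernames($db, [1, 2, 3]));
```

The same shape carries over to MySQLi: prepare outside the loop, bind and execute inside it.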

The MySQL Improved extension in PHP, more commonly known as MySQLi, provides an API to work with prepared statements. The documentation at the online PHP manual is good enough to get you started on how to use them on your PHP application, so I’ll not go through it. Instead, I am going to share the results of my personal experiments on comparing performances of traditional and prepared SQL statements.

I conducted the experiment on a demo project which has a large amount of data. I wrote two separate scripts on our development server, both of which performed the same operation: joining two related tables (one with over 150,000 records, the other with 350,000) and fetching some data. One script used a regular SQL statement, the other a prepared statement. Each script was executed three times and the time required to fetch the data was measured on each pass.

The First script: traditional SQL statement

//Get the database link
$dbLink = getDBLink();

$timeStart = microtime(true);

for ($i = 0; $i < 162038; $i++) {

    $query = "SELECT article_id, article_name, username as author FROM articles a LEFT JOIN user u ON (a.author_id = u.user_id) WHERE article_id = $i";

    if ($result = $dbLink->query($query)) {
        //fetch the row, then free the result set
        $obj = $result->fetch_object();
        $result->close();
    } else {
        die("Failed to execute query: " . $dbLink->error);
    }

}

$timeEnd = microtime(true);
$dbLink->close();

//measure the time difference
$timeDiff = $timeEnd - $timeStart;

echo "Total time: $timeDiff seconds";

Output:

First Pass -> Total time: 25.5793459415 seconds
Second Pass -> Total time: 25.1708009243 seconds
Third Pass -> Total time: 25.2259421349 seconds

Average: 25.32536300023 seconds

The Second Script : using prepared statement

//Get the database link
$dbLink = getDBLink();

$query = "SELECT article_id, article_name, username as author FROM article a LEFT JOIN user u ON (a.author_id = u.user_id) WHERE article_id = ?";

$stmt = $dbLink->stmt_init();

//the statement is parsed only once, before the loop starts
if (!$stmt->prepare($query)) {
    die("Failed to prepare statement: " . $dbLink->error);
}

$timeStart = microtime(true);

for ($i = 0; $i < 162038; $i++) {

    //bind the parameter
    $stmt->bind_param('i', $i);
    //execute the statement
    $stmt->execute();
    //bind the result, fetch it, then free it
    $stmt->bind_result($articleId, $articleName, $author);
    $stmt->fetch();
    $stmt->free_result();

}

$timeEnd = microtime(true);

$stmt->close();
$dbLink->close();

//measure the time difference
$timeDiff = $timeEnd - $timeStart;

echo "Total time: $timeDiff seconds";

Output:

First Pass -> Total time: 20.1434290409 seconds
Second Pass -> Total time: 20.182309866 seconds
Third Pass -> Total time: 20.6448199749 seconds

Average: 20.32351962726 seconds

The task takes about 20% less time with the prepared statement (20.32 seconds versus 25.33 seconds on average), a significant performance boost.

Other than performance, prepared statements can also improve application security by guarding against SQL injection. Check out this informative blog post on that topic.
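For anyone who wants to check the arithmetic, the short sketch below recomputes the two averages and the relative saving from the six timings listed above. The function names are made up for the illustration; the saving works out to roughly 19.75%.

```php
<?php
// Timings (in seconds) copied from the three passes of each script.
$traditional = [25.5793459415, 25.1708009243, 25.2259421349];
$prepared    = [20.1434290409, 20.182309866, 20.6448199749];

// Arithmetic mean of a list of timings.
function average(array $values): float
{
    return array_sum($values) / count($values);
}

// How much smaller $after is than $before, as a percentage of $before.
function percentReduction(float $before, float $after): float
{
    return ($before - $after) / $before * 100;
}

$avgTraditional = average($traditional);
$avgPrepared    = average($prepared);

printf("Traditional: %.2f s, prepared: %.2f s\n", $avgTraditional, $avgPrepared);
printf("Reduction: %.2f%%\n", percentReduction($avgTraditional, $avgPrepared));
```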

Harrison Fisk at MySQL AB wrote a very good article on MySQL prepared statements. Don’t forget to check out the section ‘When should you use prepared statement?’ if you read it.

Should Readability suffer for Performance?

Being a PHP developer, I often spend my spare time at work browsing through blogs and articles on good coding practices, performance optimizations, tips, tutorials and so on. There have been some very useful and informative writings on these topics which helped me a lot. However, some of the optimization tips in these articles strike me as trivial: they may make your code a few nanoseconds faster at the expense of readability. For example, this article introduces a faster approach for checking the length of a string:

if (strlen($foo) < 5) { echo "Foo is too short"; }

vs

if (!isset($foo[4])) { echo "Foo is too short"; }

The second approach, according to the article, is faster. Maybe, but should I quit using strlen() from now on? As far as I know, in a professional environment the quality of code is measured not only by how fast it runs (performance) but also by how easily someone other than the coder can comprehend it (readability). It is more convenient to use strlen() to check the string length, because even someone who didn’t know the function could guess what it does from its name. Why obfuscate the code for a performance improvement that is barely noticeable?
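The isset() trick is also easy to get subtly wrong, which is another argument for strlen(). A small sketch to check that the two forms agree: it assumes square-bracket offsets (the curly-brace offset syntax is no longer accepted by PHP 8) and uses offset 4, since a string shorter than 5 characters has no character at offset 4. The function names are hypothetical.

```php
<?php
// Two ways to test "shorter than 5 characters".
function isTooShortStrlen(string $s): bool
{
    return strlen($s) < 5;
}

function isTooShortIsset(string $s): bool
{
    // A string with at least 5 characters has a character at offset 4.
    return !isset($s[4]);
}

// Compare both checks on strings of length 0, 4, 5 and 6.
foreach (['', 'abcd', 'abcde', 'abcdef'] as $sample) {
    var_dump(isTooShortStrlen($sample) === isTooShortIsset($sample));
}
```

Having to stop and think about which offset is the right one is exactly the kind of readability cost in question.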

Another example of this, pushing an item in an array:

array_push($stack, $var) vs $stack[] = $var;

The second approach is claimed to be faster. I’ll still use the first one though. It’s clear and convenient, and I don’t mind if it’s a bit ‘slow’.

Finally, there is an ongoing debate on the use of __autoload(). It clearly is better than require/include/require_once because it loads the scripts on demand at runtime, making the code cleaner and more manageable. However, it has been said that __autoload() and other magic methods decrease performance and that their use should be avoided. Personally I would prefer to use __autoload() since I code for maintainability. Performance can be optimized at the hardware level.

Update: The SPL autoload functions can be used instead of __autoload() to improve performance.
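As a rough sketch of the SPL approach (worth knowing in any case, since __autoload() itself was removed in PHP 8), the snippet below registers a loader with spl_autoload_register(). The one-class-per-file layout, the `registerClassDirectory()` helper and the `DemoGreeter` class are all assumptions made up for the illustration.

```php
<?php
// Hypothetical layout: one class per file, named <ClassName>.php,
// all stored in a single directory.
function registerClassDirectory(string $baseDir): void
{
    spl_autoload_register(function (string $class) use ($baseDir): void {
        $file = rtrim($baseDir, '/\\') . DIRECTORY_SEPARATOR . $class . '.php';
        if (is_file($file)) {
            require $file;
        }
    });
}

// Usage: build a throwaway directory with one class file in it.
$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'autoload_demo_' . uniqid();
mkdir($dir);
file_put_contents(
    $dir . DIRECTORY_SEPARATOR . 'DemoGreeter.php',
    '<?php class DemoGreeter { public function greet() { return "hello"; } }'
);

registerClassDirectory($dir);

// The file is only loaded here, when the class is first needed.
$greeter = new DemoGreeter();
echo $greeter->greet(), "\n";
```

Several loaders can be registered this way, and PHP tries them in order until one of them defines the class.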

PHP ‘Good’ Practices

This article lists a number of coding practices that should be followed in professional environments. Most of these practices are not my invention; they have been accumulated from numerous blog and forum posts written by expert PHP developers across the Internet. I am just listing them here, along with a few of my own conventions, as a personal checklist for code review. If you benefit from any of these, I’ll take that as a bonus 🙂

Disclaimer: I didn’t go with the usual ‘Best Practice’ term as I am personally opposed to it. A coding practice should not be called best just because everyone follows it. Someone could come up with a better convention, thus invalidating the previous one. Hence the title ‘PHP Good Practice’ (I think ‘Best Practice so far’ would be a more accurate term, but it doesn’t sound as cool).

1. Use ‘===’ instead of ‘==’ for equality. ‘===’ compares both the types and values of its operands, unlike ‘==’ which only compares values. (That is why 0 == ‘0’ returns true in PHP, but what about 0 == ‘test’ ? Take a look.)
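A few comparisons to try out, as a minimal sketch. The 0 == ‘test’ case is deliberately left out because its result depends on the PHP version: it is true before PHP 8 and false from PHP 8 onwards.

```php
<?php
var_dump(0 == '0');       // loose: the numeric string is converted to a number first
var_dump(0 === '0');      // strict: int versus string, so not identical
var_dump('1' == '01');    // loose: both are numeric strings, compared as numbers
var_dump('1' === '01');   // strict: two different strings
var_dump(null == false);  // loose: both are "falsy"
var_dump(null === false); // strict: different types
```

Each loose comparison above evaluates to true, and each strict one to false.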

2. Avoid function calls within for() loop control blocks. For example,

for( $i=0; $i<count($x); $i++ )

Here count($x) is called on every iteration. Instead, call it once before the loop and store the result.
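A minimal sketch of the hoisted version, using a hypothetical three-element array:

```php
<?php
$x = ['a', 'b', 'c'];

// Evaluate the length once, outside the loop condition.
$length = count($x);

$seen = [];
for ($i = 0; $i < $length; $i++) {
    $seen[] = $x[$i];
}

// Alternative: foreach needs no counter and no count() call at all.
$seenAgain = [];
foreach ($x as $value) {
    $seenAgain[] = $value;
}

print_r($seen);
```

Note that hoisting is only safe when the array is not resized inside the loop.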

3. When dealing with strings, see if you can use the string library functions and avoid the regex. For example, suppose you are looking for a certain word (say ‘XML’) inside a string. You can simply use strpos() instead of preg_match(), since the word you are looking for is unchanging.

4. An improper use of strpos() would be like this:

if(strpos($string, $substring))
{ /* do something */ }

strpos() returns the position of the first occurrence of the substring within the string, or false if the substring is not found. The above condition will fail if you are looking for ‘abc’ inside ‘abcdef’, because the match is at position 0 and PHP treats 0 and false as equal in a loose comparison. The following code handles this situation better:

if(strpos($string, $substring) !== FALSE)
{ /* do something */ }

This further supports point 1.
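The trap is easy to reproduce. In this small sketch the `containsText()` wrapper is a hypothetical helper, and the argument order follows strpos() itself: the string being searched comes first, then the text to look for.

```php
<?php
// Hypothetical helper; argument order mirrors strpos($haystack, $needle).
function containsText(string $haystack, string $needle): bool
{
    return strpos($haystack, $needle) !== false;
}

var_dump(strpos('abcdef', 'abc'));        // int(0): found at the very start
var_dump((bool) strpos('abcdef', 'abc')); // bool(false): the truthiness trap
var_dump(containsText('abcdef', 'abc'));  // bool(true)
var_dump(containsText('abcdef', 'xyz'));  // bool(false)
```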

5. Use single quotes for wrapping strings. PHP looks for variables inside double quotes, which makes things slightly slower.

6. If a class method can be static, declare it static. Speed improvement is by a factor of 4.

7. var_dump() is better than print_r() for debugging as it prints both type and value of the data. You cannot tell an empty string from NULL using print_r().
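The difference is easy to see by capturing what each function prints. This is a minimal sketch; `captureDump()` is a hypothetical helper, and the exact var_dump() formatting may differ if an extension such as Xdebug overrides it.

```php
<?php
// Capture what var_dump() prints for a value, as a string.
function captureDump($value): string
{
    ob_start();
    var_dump($value);
    return trim(ob_get_clean());
}

$values = ['empty string' => '', 'null' => null];

foreach ($values as $label => $value) {
    // print_r() with true as its second argument returns the text instead of printing it.
    printf(
        "%-12s print_r: [%s]  var_dump: [%s]\n",
        $label,
        print_r($value, true),
        captureDump($value)
    );
}
```

Both print_r() results are empty, while var_dump() reports a zero-length string in one case and NULL in the other.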

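8. $row['id'] is 7 times faster than $row[id], because the unquoted id is first looked up as a constant (and on PHP 8 it is an error if no such constant exists).

9. To find out the time when the script started executing, $_SERVER['REQUEST_TIME'] is preferred to time().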

10. Variables should be initialized before being used. Makes the code less sloppy. Also incrementing an initialized variable is faster than incrementing an uninitialized one.

11. I recently came across some code where the developer checks whether a certain constant is defined before invoking an action. The code was like this:

if(MY_CONSTANT)
{
    //do something
}
else
{
    //do something else
}

If for some reason MY_CONSTANT is not defined, PHP will still execute the statements in the if block, because MY_CONSTANT will then be treated as the string ‘MY_CONSTANT’ and if(MY_CONSTANT) will evaluate to true. (From PHP 8 onwards an undefined constant throws an Error instead.) The proper way to do this is to pass the name of the constant, as a string, to defined():

if(defined('MY_CONSTANT'))
{
    //do something
}
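A short sketch of the check in action. `MY_FEATURE_FLAG`, `OTHER_FLAG` and the `flagEnabled()` helper are hypothetical names used only for this illustration.

```php
<?php
// MY_FEATURE_FLAG and OTHER_FLAG are hypothetical constant names.
define('MY_FEATURE_FLAG', true);

// Pass the constant's name as a string; defined() only looks the name up.
var_dump(defined('MY_FEATURE_FLAG')); // bool(true)
var_dump(defined('OTHER_FLAG'));      // bool(false)

// Check that the constant exists first, then look at its value.
function flagEnabled(string $name): bool
{
    return defined($name) && constant($name) === true;
}

var_dump(flagEnabled('MY_FEATURE_FLAG')); // bool(true)
var_dump(flagEnabled('OTHER_FLAG'));      // bool(false)
```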

12. Close database connections and free resources once you are done with them. It improves execution speed.

13. To improve robustness of the application, make use of try-catch statements. Define distinct exception classes for distinct operations (e.g. DatabaseException, FileNotFoundException) and handle them properly for graceful crash of the code.
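One way this could look, as a minimal sketch: the two exception classes reuse the names mentioned above, while `readConfig()`, `loadSettings()` and the fallback value are hypothetical.

```php
<?php
// Hypothetical exception classes, one per kind of failure.
class DatabaseException extends RuntimeException {}
class FileNotFoundException extends RuntimeException {}

function readConfig(string $path): string
{
    if (!is_file($path)) {
        throw new FileNotFoundException("Config file not found: $path");
    }
    return file_get_contents($path);
}

function loadSettings(string $path): string
{
    try {
        return readConfig($path);
    } catch (FileNotFoundException $e) {
        // A missing file is recoverable here: fall back to defaults.
        return 'defaults';
    } catch (DatabaseException $e) {
        // A database failure is not handled here; pass it up to the caller.
        throw $e;
    }
}

echo loadSettings('/no/such/file.ini'), "\n";
```

Because each failure has its own class, every catch block can decide separately whether to recover, log or rethrow.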

14. Avoid the ‘PHP White Screen of Death’ in production environment. Use custom fatal error handlers. Here is a good article about them.