Playing golf with PHP
One of my geekiest hobbies is playing golf. Not the 18 holes of weathered drudgery kind, but the 18 hours of staring at the same 2 lines of code kind. If you still don't know what I'm talking about, the aim of the game is to solve a given problem in the smallest amount of code (fewest strokes... geddit?).
Splitting a string into quoted and unquoted sections
<?php
$str = 'unquoted section "quoted" and unquoted "another quoted \"with a subquote\" section" more unquoted';
preg_match_all('/"(?:\\\\.|[^"])*"|[^"]+/', $str, $m);
print_r($m);
?>
Output
Array
(
    [0] => Array
        (
            [0] => unquoted section
            [1] => "quoted"
            [2] => and unquoted
            [3] => "another quoted \"with a subquote\" section"
            [4] => more unquoted
        )
)
Function to readable
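<?php
$str = 'Some_fugly_FunctionNaming_schemeHere';
// The /e modifier was removed in PHP 7, so use preg_replace_callback() instead.
$str = preg_replace_callback('/(?<=[a-z\d])([A-Z])|_([a-zA-Z])/', function ($m) {
    // The last group is the one that matched; upper-case it and prefix a space.
    return ' ' . strtoupper(end($m));
}, $str);
echo $str; // Some Fugly Function Naming Scheme Here
?>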
Linear to nested array
<?php
$a = array('a', 'b', 'c');
foreach (array_reverse($a) as $v) {
    $b = isset($b) ? array($v => $b) : array($v);
}
print_r($b);
?>
Output
Array
(
    [a] => Array
        (
            [b] => Array
                (
                    [0] => c
                )
        )
)
Removing duplicate characters
<?php
$str = 'foobarrrr';
$str = preg_replace('/(.)\1+/s', '$1', $str);
echo $str; // fobar
?>
The adventure of PHP and the magic quotes
Back in PHP 2, the "magic quotes" setting seemed like a great idea. It would automatically escape all of your input so you didn't have to worry about those pesky SQL injections. Any dodgy characters entered by the user would be automatically escaped by a backslash.
Like register_globals, it helped lower the barrier of entry to building a dynamic website by removing some of the complexity. However, it certainly wasn't without sacrifice: problems with the implementation quickly appeared and continued to abound for the next ten years. Finally, in PHP 5.2.2, we got an implementation which (as far as its intentions went) seemed to be bug free, but of course by then it was turned off by default and was already slated to be dropped in PHP 6.
Why are magic quotes so bad?
- Insufficiency
  This is the single biggest problem with magic quotes. Even if you follow all the instructions right and code defensively, relying on magic quotes to escape strings for database queries is still not secure. Magic quotes uses addslashes() for its escaping, which doesn't care about the character encoding of your database, meaning it's vulnerable to multi-byte character attacks as explained by Chris Shiflett.
- Portability
  You simply cannot rely on magic quotes being on or off if you expect to distribute your PHP script to others, since many hosts don't provide a way of changing the setting.
- Complacency
  The very concept of magic quotes fosters complacency. Why bother validating input if it's going to be escaped anyway? All too often on IRC I see code which completely neglects validation: strings and supposedly numeric values going straight into the query with no effort being made to check them. Examples can be found in countless books and tutorials, and they are still being written.
- Inconsistency
  Depending on which version of PHP you use, you can expect vastly different results from magic quotes. As mentioned above, PHP 5.2.2 was the first version in which magic quotes actually did what it said on the tin. Some features which may or may not work include: escaping double quotes; escaping the $_SERVER and $_ENV arrays; escaping the keys of arrays.
- Performance
  When magic quotes is on, everything is escaped. You might have a page which only uses PHP to output the current date, but if the user decides to send 8 MB of post data, it's going to be escaped whether you like it or not, before your script even starts. Extreme examples aside, any posted value which you wouldn't need to escape is another useless call to addslashes().
- Inconvenience
  As convenient as it is not worrying about having to escape anything, once you're aware of magic quotes' flaws, they become a real pain in the arse. We've all seen sites suffering from double escaping, where content ends up being actually output like "the Pope\'s underpants".
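The double escaping mentioned under Inconvenience is easy to reproduce by hand. This is only a sketch: it uses addslashes() to stand in for what magic quotes does to incoming data, and the variable names are made up for illustration.

```php
<?php
// What magic quotes would already have done to the incoming value.
$input = addslashes("the Pope's underpants");

// A script that escapes again without checking the setting.
$twice = addslashes($input);

// stripslashes() undoes exactly one level of addslashes().
$restored = stripslashes($input);

echo $input, "\n";    // the Pope\'s underpants
echo $twice, "\n";    // the Pope\\\'s underpants
echo $restored, "\n"; // the Pope's underpants
```

Store $twice in a database and the stray backslash shows up on the page, which is exactly the symptom described above.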
Coping with magic quotes
The number one piece of advice regarding magic quotes has to be: turn them off! However, that's easier said than done. Since the setting takes effect before the script is run, you can't just change it from within the script.
The luckiest people host their own sites or have a host agreeable enough to turn it off in php.ini or the web server configuration. The slightly less lucky people can still control PHP settings per-directory, for example by the use of an .htaccess file.
Everyone else is forced to try to cope with it at runtime, and this is where the real fun begins. Since PHP 4, there's been the handy get_magic_quotes_gpc() function, which is most commonly seen used like the following example from the PHP manual:
<?php
if (!get_magic_quotes_gpc()) {
    $lastname = addslashes($_POST['lastname']);
} else {
    $lastname = $_POST['lastname'];
}
?>
Unfortunately, get_magic_quotes_gpc() is not aware of the magic_quotes_sybase setting, which changes the escape method to Sybase-style doubled quotes, and also knows nothing of the various inconsistencies of magic quotes. Therefore you could easily end up with the wrong kind of escaping applied to your data.
The other common approach is to try to completely undo the effects of magic quotes, in the hope that it will end up as if magic quotes was never turned on, and then proceed with proper escaping. This solution means that at worst you'll be double escaping - there's no chance of anything dodgy getting through as long as your proper escaping is effective. However, it has some problems of its own.
At the primitive end of things, you might see code like the following:
<?php
if (get_magic_quotes_gpc()) {
    $_GET = array_map('stripslashes', $_GET);
    $_POST = array_map('stripslashes', $_POST);
    $_COOKIE = array_map('stripslashes', $_COOKIE);
    $_REQUEST = array_map('stripslashes', $_REQUEST);
}
?>
Ignoring the potential double-escaping, this solution has two fundamental problems. Firstly any arrays in the input will be converted to the string "Array", which is slightly less useful. Secondly it only fixes the values, ignoring the keys.
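The second problem is easy to see in isolation. A minimal sketch, with a hand-made array standing in for $_POST (the key and value are invented for illustration):

```php
<?php
// Simulated escaped input: both the key and the value carry a backslash.
$post = array('user\\\'s' => 'it\\\'s');

// array_map() passes only the values to the callback and keeps the keys as-is.
$fixed = array_map('stripslashes', $post);

var_dump($fixed);
```

The value comes back as it's, but the key is still user\'s.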
It is quite simple to make a recursive function to tackle the first problem, and that's what the PHP manual opts for on its page on Disabling Magic Quotes:
<?php
if (get_magic_quotes_gpc()) {
    function stripslashes_deep($value)
    {
        $value = is_array($value) ? array_map('stripslashes_deep', $value) : stripslashes($value);
        return $value;
    }
    $_POST = array_map('stripslashes_deep', $_POST);
    $_GET = array_map('stripslashes_deep', $_GET);
    $_COOKIE = array_map('stripslashes_deep', $_COOKIE);
    $_REQUEST = array_map('stripslashes_deep', $_REQUEST);
}
?>
The recursion means that it won't destroy your arrays, but it still suffers from the second problem, not unescaping the keys, and it introduces yet another problem: arbitrary recursion depth. Consider the following query string:
foo[][][][][][][][][][]=1&foo[][][][][][][][][][1]=2
Fixing those two values alone takes 22 function calls. Imagine how much work a single request could cause using the maximum post data size, which is 8 MB by default. That's a pretty effective basis for a DoS attack.
In practice PHP will fall over far before it completes all that work, mainly because it sucks at recursion. A new configuration setting (max_input_nesting_level) was added in 5.2.3 to control the maximum recursion depth, but hosts providing 5.2.3 or above are sadly in the minority.
The easy solution is to add another argument and a depth check to that function, but that leaves the problem of unescaping the keys, and the double-escaping problem we've been ignoring from the beginning.
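For what it's worth, here is roughly what that depth check could look like, with the keys handled while we're at it. This is a sketch only: stripslashes_limited() and its $maxDepth argument are names I've made up, and dropping branches that are nested too deeply is just one possible policy.

```php
<?php
// Hypothetical helper, not part of PHP or PHP_Compat.
function stripslashes_limited($value, $maxDepth = 10, $depth = 0)
{
    if (!is_array($value)) {
        return stripslashes($value);
    }
    if ($depth >= $maxDepth) {
        // Too deeply nested: drop the branch instead of recursing further.
        return array();
    }
    $out = array();
    foreach ($value as $k => $v) {
        // Keys may have been escaped too, so strip them as well.
        $k = is_string($k) ? stripslashes($k) : $k;
        $out[$k] = stripslashes_limited($v, $maxDepth, $depth + 1);
    }
    return $out;
}

// Simulated escaped input with three levels of nesting.
$input = array('na\\\'me' => array('O\\\'Reilly', array('a\\\\b')));
print_r(stripslashes_limited($input, 2));
```

With a limit of 2 the innermost array comes back empty; with 3 or more everything survives. It does nothing about double escaping, though.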
Enter, PHP_Compat. Aidan Lister pioneered this awesome PEAR package and I've proudly been helping out for the last couple of years.
In our upcoming release, PHP_Compat 1.6.0, we've added an environment module which includes magic_quotes_gpc among the settings it fully supports changing at runtime. Because of the nature of the package we've gone with the second approach, trying to restore the environment to exactly how it was before magic_quotes_gpc took effect. However we use an iterative algorithm so recursion depth isn't a problem and we take into account every inconsistency going back to PHP 4.0, so you shouldn't experience any double-escaping at all.
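To give an idea of what "iterative" means here, this is a minimal sketch of walking the input with an explicit stack instead of recursion. It is not the PHP_Compat code and ignores all the version-specific inconsistencies; stripslashes_iterative() is a name invented for this example.

```php
<?php
// Hypothetical sketch; not PHP_Compat's actual implementation.
function stripslashes_iterative(array $input)
{
    $output = array();
    // Each frame pairs a source array with a reference to its destination.
    $stack = array(array($input, &$output));
    while ($stack) {
        $frame = array_pop($stack);
        $src = $frame[0];
        $dst = &$frame[1];
        foreach ($src as $k => $v) {
            $k = is_string($k) ? stripslashes($k) : $k;
            if (is_array($v)) {
                // Queue the nested array rather than calling ourselves.
                $dst[$k] = array();
                $stack[] = array($v, &$dst[$k]);
            } else {
                $dst[$k] = stripslashes($v);
            }
        }
        unset($dst);
    }
    return $output;
}

$input = array('na\\\'me' => array('O\\\'Reilly', array('a\\\\b')));
print_r(stripslashes_iterative($input));
```

The nesting depth now only affects how large $stack grows, not how deep the call stack goes.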
The future
Until they're gone for good, we have to spread the message that magic quotes are evil: turn them off! The fewer people make use of magic quotes, the sooner we can be rid of them. PHP 6 is continually being delayed and at the rate people have moved to PHP 5 it will take at least another ten years for everyone to be on PHP 6. I'm sure we'll see plenty more adventures along the way.
Update:
For completeness I think I should point out here that yet another problem with magic_quotes_gpc emerged with PHP 5.2.7: it was completely broken.
Also Chris Shiflett added in a kind comment on phpdeveloper.org:
One thing I would add to your list is the fact that magic quotes is an escaping solution. Escaping input provides a unique set of challenges, because it's an inappropriate design decision. Because data is not necessarily destined for exactly one context, it is only important to be sure input is valid. (Filter input.) Each context can then be handled as needed, typically when the data leaves PHP for another context. (Escape output.)
Magic quotes violates this simple tenet, and in my opinion, this is the root cause of most of the problems related to using it.
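A small sketch of "filter input, escape output" in practice. The $raw array stands in for $_POST, and the field names and the 0 to 150 range are assumptions made up for the example.

```php
<?php
// Hypothetical raw input, standing in for $_POST.
$raw = array('age' => '42', 'name' => "Tom O'Neil <b>");

// Filter input: reject anything that is not a whole number from 0 to 150.
$age = filter_var($raw['age'], FILTER_VALIDATE_INT, array(
    'options' => array('min_range' => 0, 'max_range' => 150),
));
if ($age === false) {
    die('Invalid age');
}

// Escape output: encode for the context the data is leaving to, here HTML.
$html = '<p>' . htmlspecialchars($raw['name'], ENT_QUOTES, 'UTF-8') . ' is ' . $age . '</p>';
echo $html, "\n";
```

The same name going into a database query would be escaped differently (or better, bound as a parameter), which is why escaping everything up front for one context can't work.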
Random string
<?php
require_once('preg_expand.php');

/**
 * Generates a random string.
 *
 * @param int $length
 *   The length of the string
 * @param string $ranges
 *   The characters to use, formatted like a regex character class (a-fA-F0-9)
 * @param bool $unique
 *   Every character in the result is unique or not
 * @return string
 *   The resulting string.
 */
function random_string($length = 6, $ranges = 'a-zA-Z0-9', $unique = false)
{
    $out = '';
    $keys = array();
    $chars = preg_expand($ranges, 0);
    if ($unique) {
        // array_rand() returns a single key, not an array, when asked for one
        $keys = (array) array_rand($chars, $length);
    } else {
        for ($i = $length; $i--;) {
            $keys[] = array_rand($chars);
        }
    }
    foreach ($keys as $k) {
        $out .= $chars[$k];
    }
    return $out;
}

/**
 * Quick and simple version
 */
function random_string_simple($length = 6)
{
    $chars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
    for ($i = $length, $out = ''; $i--;) {
        // Square brackets: the $chars{...} offset syntax was removed in PHP 8
        $out .= $chars[rand(0, 61)];
    }
    return $out;
}
?>
Example
<?php
echo '<h4>Default:</h4>', random_string(), '<br /><br />';
echo '<h4>8 characters, A-F0-9:</h4>', random_string(8, 'A-F0-9'), '<br /><br />';
echo '<h4>6 characters, a-z, unique:</h4>', random_string(6, 'a-z', 1), '<br /><br />';
echo '<h4>Simple:</h4>', random_string_simple();
?>
Output
Default:
shAjlw
8 characters, A-F0-9:
9A181CE7
6 characters, a-z, unique:
fgkqsw
Simple:
RsxWZj
preg_expand
<?php
/**
 * Expands all characters from the given PCRE character class definition.
 *
 * @param string $in
 *   The input character class (not contained in '[]')
 * @param bool $string
 *   Set to true to return the result as a string
 * @param int $begin
 *   The start of the character range to consider
 * @param int $end
 *   The end of the character range to consider
 * @return mixed
 *   All the characters specified by the input. This is a string if the $string
 *   argument is true (the default), or else an array with one element per
 *   character. Returns false if the input is not a non-empty string.
 */
function preg_expand($in, $string = true, $begin = 0, $end = 127)
{
    if (!is_string($in) || !($len = strlen($in))) {
        return false;
    }
    $range = range(chr($begin), chr($end));
    $in = strtr($in, array('[' => '\[', ']' => '\]', '/' => '\/'));
    // Invert the class, then delete everything the inverted class matches
    $in = ($in[0] == '^' ? substr($in, 1) : '^' . $in);
    $in = preg_replace('/[' . $in . ']/', '', implode($range));
    return $string ? $in : str_split($in);
}
?>
Example:
<?php
echo 'Alphabet: ', preg_expand('a-z', 1), '<br /><br />';
echo 'Hexadecimal: ', preg_expand('A-F\d', 1), '<br /><br />';
?>
Output:
Alphabet: abcdefghijklmnopqrstuvwxyz
Hexadecimal: 0123456789ABCDEF
secondsh
<?php
/**
 * Turns a number of seconds into a human readable time.
 * This function is NOT calendar aware, months and years are based on the
 * average days per year in the Gregorian calendar.
 *
 * @param int $in
 *   the input number of seconds
 * @param int $limit
 *   the maximum number of units to return (0 for no limit)
 * @param string $sep
 *   the separator placed between units
 * @return string
 *   a string representation of the input, broken down into larger units
 */
function secondsh($in, $limit = 0, $sep = ', ')
{
    $units = array('second', 'minute', 'hour', 'day', 'week', 'month', 'year');
    $factors = array(1, 60, 3600, 86400, 604800, 2629746, 31556952);
    $ret = array();
    // Work down from years to seconds, tracking what has been accounted for
    for ($i = 7, $bal = 0; $i--;) {
        $cur = floor(($in - $bal) / $factors[$i]);
        // Skip empty units, except seconds when the input itself is 0
        if (!$cur && ($i || $in)) {
            continue;
        }
        $bal += $cur * $factors[$i];
        $ret[] = $cur . ' ' . $units[$i] . ($cur != 1 ? 's' : '');
    }
    if ($limit) {
        $ret = array_slice($ret, 0, $limit);
    }
    return implode($sep, $ret);
}
?>
Example
<?php
$a = array(0, 1, 30, 60, 3599, 3600, 100000, 2419200, 2500000, 555555555);
echo '<strong>Default</strong><table>';
foreach ($a as $v) {
    echo '<tr><td style="padding-right:20px">' . $v . '</td><td>' . secondsh($v) . '</td></tr>';
}
echo '</table>';
echo '<br /><br />';
echo '<strong>Limit 2</strong><table>';
foreach ($a as $v) {
    echo '<tr><td style="padding-right:20px">' . $v . '</td><td>' . secondsh($v, 2) . '</td></tr>';
}
echo '</table>';
?>
Output
Default
| 0 | 0 seconds |
| 1 | 1 second |
| 30 | 30 seconds |
| 60 | 1 minute |
| 3599 | 59 minutes, 59 seconds |
| 3600 | 1 hour |
| 100000 | 1 day, 3 hours, 46 minutes, 40 seconds |
| 2419200 | 4 weeks |
| 2500000 | 4 weeks, 22 hours, 26 minutes, 40 seconds |
| 555555555 | 17 years, 7 months, 1 week, 20 hours, 39 minutes, 9 seconds |
Limit 2
| 0 | 0 seconds |
| 1 | 1 second |
| 30 | 30 seconds |
| 60 | 1 minute |
| 3599 | 59 minutes, 59 seconds |
| 3600 | 1 hour |
| 100000 | 1 day, 3 hours |
| 2419200 | 4 weeks |
| 2500000 | 4 weeks, 22 hours |
| 555555555 | 17 years, 7 months |
http_file_exists
<?php
/**
 * HTTP version of file_exists
 *
 * @param string $path the path to test
 * @param bool $redir accept redirects (3xx)
 * @return bool true if the file exists, false otherwise
 */
function http_file_exists($path, $redir = false)
{
    $path = parse_url($path);
    if (!isset($path['host'])) {
        return false;
    }
    $fp = fsockopen($path['host'], 80, $errno, $errstr, 4);
    if (!$fp) {
        return false;
    }
    if (!isset($path['path'])) {
        $path['path'] = '/';
    }
    $request = "HEAD $path[path] HTTP/1.0\r\n"
             . "Host: $path[host]\r\n"
             . "Connection: close\r\n\r\n";
    fputs($fp, $request);
    // Only the status line (the first line of the response) matters
    $response = fgets($fp, 1024);
    // Close the socket on every path, including success
    fclose($fp);
    $pattern = '#^HTTP/[\d\.]+ ' . ($redir ? '[23]' : '2') . '#';
    return $response !== false && preg_match($pattern, $response) === 1;
}
?>
Example
<?php
$p = 'http://www.rajeczy.com';
echo $p, ': ';
var_dump(http_file_exists($p, 1));
echo '<br /><br />';
$p = 'http://www.rajeczy.com/something_nonexistant';
echo $p, ': ';
var_dump(http_file_exists($p, 1));
?>
Output
http://www.rajeczy.com: bool(true)
http://www.rajeczy.com/something_nonexistant: bool(false)