Wednesday, August 29, 2007

A Stitch In Time

I've got a probem with date sorting.

I have a PHP/mySQL application which requires sorting a dataset by year.

As mySQL won't let you store just the year, I originally used a varchar (text) field to store the year, YYYY, which sorts nicely.

However, I now have a client who wants to be more specific with some dates only, ie they want to be able to store YYYY, YYYY/MM or YYYY/MM/DD in the same field and still have it sort nicely.

OK, so I can force them to input the full date (when required) in the YYYY/MM/DD format, but that looks really ugly when output to the web page as a human-readable date.

So my problem is this:

  • If I want human-readable dates, like 1 January 2007, and go for a text field, they won't sort properly as 1 April 2007 comes before 1 January 2007 in that case.
  • If I opt for the date field, it would sort properly but I can't store dates which are just month/year or year only, as mySQL throws a wobbler and protests when you don't have the full date input.
Storing the "date" as a string: 2007/08/01 (the pubDate) sorts nicely, and I had hoped using the following PHP:
<?php echo date('j F Y', strtotime($row_rsByDate['pubDate'])); ?>
would solve the problem, since it gives a nice date 1 August 2007. But there are still issues with just storing part of the date - see screenshot below:

The entries are sorted correctly (most specific at the top, descending order). The first date (red) is the nice output of the PHP function - the second (puce green) is the actual stored text.

It falls over horribly when only the YYYY/MM is stored (returning 1st January 1970) or YYYY (returning today's date).

EUREKA!

After much discussion (via the comments on this post and my buddies on Twitter) , I was almost ready to explode with frustration at not being able to find a solution. Turns out, that's exactly what I needed to do!

James Aylett suggested I use the PHP explode() and implode() functions to write a custom parser. I didn't actually need the implode() half as it happens. Being a relative newbie to PHP, I'd never heard of the explode function, so thanks for the pointer, James!

So, to recap. The database column should be stored as TEXT or VARCHAR(10) not DATETIME. That way, all variations of YYYY, YYYY/MM or YYYY/MM/DD are acceptable, and will sort correctly. This will look horrible if put straight into the page, so use something like the following to present it nicely to the web page:

<?php $split = explode("/",$row_rsByDate['pubDate'],3);
print_r($split[2]);
echo " ";
if ($split[1]==1) print_r("Jan");
elseif ($split[1]==2) print_r("Feb");
elseif ($split[1]==3) print_r("Mar");
elseif ($split[1]==4) print_r("Apr");
elseif ($split[1]==5) print_r("May");
elseif ($split[1]==6) print_r("Jun");
elseif ($split[1]==7) print_r("Jul");
elseif ($split[1]==8) print_r("Aug");
elseif ($split[1]==9) print_r("Sep");
elseif ($split[1]==10) print_r("Oct");
elseif ($split[1]==11) print_r("Nov");
elseif ($split[1]==12) print_r("Dec");
echo " ";
print_r($split[0]);
?>

explode() - separates the string found in pubDate into its component parts (the separator is set by the first parameter, "/"), giving the $split array. This containts up to 3 elements, depending on how specific the stored "date" is.
$split[2] is the DD day element and is printed directly.
$split[0] is the YYYY year element and can be printed directly.
$split[1] is the MM month element, which requires the pretty formatting. So a match against each month number prints out a more human-readable month. Voilà!

Another screenshot - this time the green output is the unprocessed pubDate text field, and the preceding red date is the bespoke-parsed version - allowing for all cases of "date" format.

Some caveats:
I wouldn't normally recommend spoofing the date like this - if it's a real date (specified to day, month, year for every record) then use the DATETIME type in your database - this will sort and you can perform arithmetic on it quite readily (eg next month, last week etc).

I only went through this pain because of the special case I was finding - the publication "dates" of newspaper articles (day, month, year), journals (month, year) and books (year only) which my data required. Using this solution lets it all sort properly and look nice to viewers. Win!