Wednesday, August 29, 2007

A Stitch In Time

I've got a probem with date sorting.

I have a PHP/mySQL application which requires sorting a dataset by year.

As mySQL won't let you store just the year, I originally used a varchar (text) field to store the year, YYYY, which sorts nicely.

However, I now have a client who wants to be more specific with some dates only, ie they want to be able to store YYYY, YYYY/MM or YYYY/MM/DD in the same field and still have it sort nicely.

OK, so I can force them to input the full date (when required) in the YYYY/MM/DD format, but that looks really ugly when output to the web page as a human-readable date.

So my problem is this:

  • If I want human-readable dates, like 1 January 2007, and go for a text field, they won't sort properly as 1 April 2007 comes before 1 January 2007 in that case.
  • If I opt for the date field, it would sort properly but I can't store dates which are just month/year or year only, as mySQL throws a wobbler and protests when you don't have the full date input.
Storing the "date" as a string: 2007/08/01 (the pubDate) sorts nicely, and I had hoped using the following PHP:
<?php echo date('j F Y', strtotime($row_rsByDate['pubDate'])); ?>
would solve the problem, since it gives a nice date 1 August 2007. But there are still issues with just storing part of the date - see screenshot below:

The entries are sorted correctly (most specific at the top, descending order). The first date (red) is the nice output of the PHP function - the second (puce green) is the actual stored text.

It falls over horribly when only the YYYY/MM is stored (returning 1st January 1970) or YYYY (returning today's date).

EUREKA!

After much discussion (via the comments on this post and my buddies on Twitter) , I was almost ready to explode with frustration at not being able to find a solution. Turns out, that's exactly what I needed to do!

James Aylett suggested I use the PHP explode() and implode() functions to write a custom parser. I didn't actually need the implode() half as it happens. Being a relative newbie to PHP, I'd never heard of the explode function, so thanks for the pointer, James!

So, to recap. The database column should be stored as TEXT or VARCHAR(10) not DATETIME. That way, all variations of YYYY, YYYY/MM or YYYY/MM/DD are acceptable, and will sort correctly. This will look horrible if put straight into the page, so use something like the following to present it nicely to the web page:

<?php $split = explode("/",$row_rsByDate['pubDate'],3);
print_r($split[2]);
echo " ";
if ($split[1]==1) print_r("Jan");
elseif ($split[1]==2) print_r("Feb");
elseif ($split[1]==3) print_r("Mar");
elseif ($split[1]==4) print_r("Apr");
elseif ($split[1]==5) print_r("May");
elseif ($split[1]==6) print_r("Jun");
elseif ($split[1]==7) print_r("Jul");
elseif ($split[1]==8) print_r("Aug");
elseif ($split[1]==9) print_r("Sep");
elseif ($split[1]==10) print_r("Oct");
elseif ($split[1]==11) print_r("Nov");
elseif ($split[1]==12) print_r("Dec");
echo " ";
print_r($split[0]);
?>

explode() - separates the string found in pubDate into its component parts (the separator is set by the first parameter, "/"), giving the $split array. This containts up to 3 elements, depending on how specific the stored "date" is.
$split[2] is the DD day element and is printed directly.
$split[0] is the YYYY year element and can be printed directly.
$split[1] is the MM month element, which requires the pretty formatting. So a match against each month number prints out a more human-readable month. Voilà!

Another screenshot - this time the green output is the unprocessed pubDate text field, and the preceding red date is the bespoke-parsed version - allowing for all cases of "date" format.

Some caveats:
I wouldn't normally recommend spoofing the date like this - if it's a real date (specified to day, month, year for every record) then use the DATETIME type in your database - this will sort and you can perform arithmetic on it quite readily (eg next month, last week etc).

I only went through this pain because of the special case I was finding - the publication "dates" of newspaper articles (day, month, year), journals (month, year) and books (year only) which my data required. Using this solution lets it all sort properly and look nice to viewers. Win!

7 comments:

Anonymous said...

You could go for an "elegant hack" that will enter 1st Jan YYYY when that is just entered or 1st MM YYYY when they do both a month and year.

Human readable dates can be done with a combination of the strtotome and date functions - so for instance:

date('whatever options you want to show, strtotime($['dateFromDB'])');

Any good?!?!

Caz Mockett said...

Ross, thanks for the suggestion, but as you can see, it doesn't parse the right "date" if the date/month aren't supplied. Gah!

Anonymous said...

Sorry Caz - have re-read my comment and while it made sense in my head that is indeed a magical place!

Attempt 2:

Store your dates as a datefield, or a timestamp field.

Have some PHP logic that will pad out the dates as required (so that 2000 would actually be added into the database as 1st January 2000) and so on.

You can use my previous comment to format your dates as required.
AND
Things should sort as required!?!

If they want things to sort better tell them to be more specific ;>

Anonymous said...

I appreciate this might not be particularly helpful right now, but in my experience it's always best to store dates as a date-specific field (whether that's a timestamp or a date) regardless of the level of precision you intend to use. Pretty much as Ross says; if you set it to January 1st YYYY at 00:00:00 then those additional specifics are there waiting to be set as and when you need them. Until that point, you can just use the year part without problems. Also, using this format can help get around problems with US vs UK date formats when you start adding days and months.

Caz Mockett said...

Guys,

Whilst I see what you are saying with the date formatting (and padding), it still doesn't really solve my dilemma. Even if (some) dates get padded with extra day/month info, others will have the correct thing stored, and so the application won't "know" which portion of the date to take as true - it's an all or nothing solution.

I can't perceive any need for adding dates togeather, since it was just a method of indicating when something was published. For articles in the newspaper, it needs to be specified to day level, journals are often just month/year and books are only specified as year of publication.

I wonder if I can put the lid back on the can of worms or not...!

Caz

James Aylett said...

While dates should indeed be stored as DATE, I don't think this *is* a date - it's a more complex datatype which can't be expressed in native SQL (you'd need an equivalent of union typing or something).

By the way, rather than all the print_r() invocations, I'd be inclined to:

(a) have a stringify function for your magical data type (useful for logging, if you ever need to email, etc.)

(b) convert your textual type that comes out of the database into a PHP struct, or better an object that knows how to format itself -- (a) then becomes a method

(c) use PHP date formatting functions so that localisation isn't such a pain

- J

Anas said...

Hi Caz!

I am a web designer myself located in South Africa but as of yet have not started programming in PHp and MYSQL. Can you suggest a good book to start of with?

Nice blog by the way.