Friday, December 15, 2006

Lies, Damned Lies, And Statistics

No matter who coined the phrase, it has often been used to point out how misleading the conclusions drawn from statistics can be. I'm usually pretty wary of them myself, but sometimes your server stats are your best friend.

I have recently been involved with the redesign of a website which has been online since the end of 2003. It was originally written by another team, and it contained many nested tables and a few styles, but was not terribly semantic. The team I work with have been looking after the site's content since its original launch, and a few months ago, the site owner asked us to give it a fresh new look and a bit of a reorganisation. It had grown organically since its inception, and things had got a little muddled; there was a feeling that documents in certain areas of the site just weren't being found.

We undertook some user-centred design, testing our new proposals with paper wireframes and some open and closed card sorting. On the basis of these results, we tinkered a bit more and tested again. Then we set about reorganising the content and making the pages much more semantic - lists of documents were coded as lists, for example. I'll admit that one table remains for the basic layout, but this was pretty much prescribed by the templating system in use on the server. Everything else has been pared down to keep tag soup to a minimum.

On 1st November, the new-look site was relaunched. Fast forward a month, and I ran a statistics check on the site, comparing results from October 2006 (old style) and November 2006 (after relaunch). The results were startling.

October - Access Statistics

  • Total page impressions: 98,037
  • Top URL was the site root (no surprise)
  • 4th was the search page, with 1,530 hits.
  • That suggested people weren't finding what they were looking for.
  • "Responses" section (where the bulk of the answers to the public's FOI requests were published) was at 46th, with only 262 hits.
  • We were regularly publishing responses to very similar questions, because users hadn't found the existing answers on the site before making their own requests.

November - Access Statistics
  • Total page impressions: 108,632
  • Top URL was the site root; 2nd was the new "Responses" index page, with 1,658 hits.
  • Search page had plummeted to 437th - with a mere 24 hits!
  • Other (new) subpages of the responses section were getting plenty of traffic as people explored the new way of accessing the documents.
  • We are still publishing lots of responses to requests, but the number of near-duplicate queries has dropped significantly.
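To put the headline changes into numbers, here is a small PHP sketch of the percentage-change arithmetic, using the October and November figures from the lists above. `percentChange()` is just an illustrative helper name, not part of any stats package.

```php
<?php
// Percentage change from a "before" figure to an "after" figure.
// A negative result means the figure fell.
function percentChange($before, $after)
{
    return ($after - $before) / $before * 100;
}

// October 2006 (old site) vs November 2006 (relaunched site).
printf("Page impressions: %+.1f%%\n", percentChange(98037, 108632)); // +10.8%
printf("Search page hits: %+.1f%%\n", percentChange(1530, 24));      // -98.4%
```

So total traffic rose by about a tenth, while traffic to the search page almost disappeared.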

Page Weight Statistics
Some pages were completely restructured in terms of their content, but about 20 retained pretty much their original information - they were just recoded from tables to lists. I did a before-and-after comparison on these too.

The smallest page started out at 16KB and went down to 7KB (a 56% reduction); the largest page was originally 119KB and dropped to 20KB (an 83% reduction). On average, these 20 files were reduced in size by 73%. Not bad in itself, but when you multiply the saving by the number of page impressions, you get an idea of the considerable reduction in bandwidth being used.
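The arithmetic behind those figures is simple enough to sketch in PHP. The function names are illustrative, and the 1,000-view figure is a hypothetical traffic number chosen only to show how the per-page saving scales; it is not taken from the site's stats.

```php
<?php
// Percentage reduction in page size (sizes in kilobytes).
function percentReduction($beforeKb, $afterKb)
{
    return ($beforeKb - $afterKb) / $beforeKb * 100;
}

// Bandwidth saved grows with traffic: per-page saving multiplied by the views.
function kbSaved($beforeKb, $afterKb, $impressions)
{
    return ($beforeKb - $afterKb) * $impressions;
}

printf("Smallest page: %.0f%% smaller\n", percentReduction(16, 7));   // 56%
printf("Largest page:  %.0f%% smaller\n", percentReduction(119, 20)); // 83%

// Hypothetical: 1,000 views of the largest page.
printf("Saved on 1,000 views: %d KB\n", kbSaved(119, 20, 1000));      // 99000 KB
```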

Conclusions
So there you have it - some numbers to back up the principles of good user-centred design. I felt that the search page statistic was the most significant, and it certainly backed up the old adage that if you have decent navigation and a clear information hierarchy, people won't need to use the search, but will naturally find things themselves.

Pub Standards Party

As if last Saturday's BBC Backstage Xmas party wasn't enough, a group of hardy drinkers gathered last night for the Pub Standards Christmas party.

Much beer was consumed. It was Norm!'s birthday, and he'd very kindly paid for food to be provided, so the gannets soon swooped and demolished that too. The CSS Div had made a very sticky chocolate confection in Birthday Boy's honour, and that was handed round. It was very rich; one tiny piece could send you into hyperglycaemic shock.

Frances was quite tipsy, although she's sworn me to secrecy on that one. Oops! And before I left, Matt and Patrick had persuaded me to go along to Matt's birthday showing of Raiders Of The Lost Ark tomorrow. Renting the whole cinema! Extravagance, or what? Most of us just rent the DVD. Ah, well, I could not provide a good alibi for not attending, so I'll be in the audience at 3pm - Matt might even give me a free "John Wayne" badge...

Thursday, December 14, 2006

A Quick Roundup

Screen Reader Demo
I've been very poor at keeping up with my blogging of late. I meant to post a few days ago after attending an excellent Screen Reader demo at Test Partners. Steve Green led the session, and John Welsman (Freelance Assistive Technology Consultant) amazed us all with his speed and dexterity at negotiating the web with JAWS. John also brought along his lovely dog, Dalton, who sat very quietly in the corner all afternoon.

It's only when you witness a blind user having to negotiate the tag soup and badly muddled code that still makes up a huge proportion of the web that you really appreciate why semantic markup is so important.

  • A website may look great, but take off those stylesheets, and if there are no headings in a document, the user has no way of knowing which sections are which, without wading through loads of text.
  • JAWS announces how many items are in a list, so if navigation items are marked up as a list rather than dumped in a table, it is very easy to get a mental picture of them.
  • A visually impaired user builds up a mental model of the page in a completely different way to a sighted user - there is no sense of left or right, everything is top-down - so the source order of your code is vital in aiding their understanding.
  • Consistency is the byword - not just in the way things look, but in source order too. Mark up pages the same way, and you will give a blind user a head start in visualising the next page they visit on your site, because some of the mental model from pages they have already visited will still apply.
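To make the heading and list points concrete, here is a minimal PHP sketch that renders a group of links as a heading plus an unordered list instead of table cells. The function name and the example link texts and URLs are hypothetical, purely for illustration; the point is that every page built with the same helper gets the same structure and source order.

```php
<?php
// Render a heading followed by a <ul> of links (no layout table).
// A screen reader can announce the heading and how many items the list holds.
function renderNavList($heading, array $links)
{
    $html = '<h2>' . htmlspecialchars($heading) . "</h2>\n<ul>\n";
    foreach ($links as $text => $url) {
        $html .= '  <li><a href="' . htmlspecialchars($url) . '">'
               . htmlspecialchars($text) . "</a></li>\n";
    }
    return $html . "</ul>\n";
}

// Hypothetical navigation items.
echo renderNavList('Responses', array(
    'Responses by date'    => '/responses/by-date/',
    'Responses by subject' => '/responses/by-subject/',
));
```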
Frances also did a write-up of the event, so I won't repeat too much here. Steve Green also makes some excellent comments on Frances' post.

BBC Backstage Xmas Bash
The geek Christmas event of 2006 was great fun, at The Cuban bar in Citipoint. It was ably organised by Ian Forrester, and nearly 400 folks attended. It was really nice to catch up with The Usual Suspects, and I was also able to chat with a few people I hadn't had the pleasure of talking to before, including Eric Meyer, who was in London running a 2-day Carson Workshop. I'm kicking myself that I wasn't able to go along and share Eric's knowledge, but work wouldn't pay! Maybe next time?

Glutton For Punishment
Talking of parties, I'm just about to head off to the Pub Standards Christmas party this evening, so I'd better get my skates on...

Wednesday, December 06, 2006

Cowboys (Me) And Indians (That'll be Apache)

I'm A PHP Newbie
For some weeks, I've been meaning to try my hand at some PHP development, having done most of my projects to date with .NET. I bought the excellent book Blog Design Solutions in September, and have been gradually reading my way through it in my spare time. It gives advice on installing and tweaking some of the most common blog engines, such as Movable Type, ExpressionEngine, WordPress and Textpattern, while the last chapter leads you through building your own blogging solution.

I thought this was a good place for a PHP newbie to start, since there were copious examples and plenty of advice about setting up your test environment - a notorious minefield to negotiate on your own.

Setting Up The Test Environment
Unfortunately, this is where the pain and suffering began... I downloaded the latest stable Apache release (as advised by the book), which was supposedly 2.2.3, as the Windows MSI installer. It half loaded, but would not run as a service on my WinXP Pro machine. The Apache icon appeared in my SysTray, but its context menu was blank, and Apache did not appear in my list of services to start manually! After going round the loop several times, I gave up and went back to the 2.0.59 release instead - which worked first time!

Because I've already got IIS running as my default web server on localhost (port 80), I had to tell Apache to use a different port - 8080 is the conventional choice for a second web server. You then include the port number in the URL to have Apache serve your pages:

http://localhost:8080/blog/index.php
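As a quick sanity check that both servers are answering, a small PHP script can try to open a TCP connection to each port. This is a sketch: `fsockopen()` is a standard PHP function, while `portIsOpen()` and the port labels are my own illustrative names. It assumes Apache was moved to 8080 with a `Listen 8080` line in httpd.conf.

```php
<?php
// Return true if something accepts TCP connections on $host:$port.
function portIsOpen($host, $port, $timeoutSeconds = 1.0)
{
    $errno = 0;
    $errstr = '';
    // @ suppresses the warning fsockopen() raises when the connection is refused.
    $socket = @fsockopen($host, $port, $errno, $errstr, $timeoutSeconds);
    if ($socket === false) {
        return false;
    }
    fclose($socket);
    return true;
}

// IIS on the default port, Apache on the alternate one.
foreach (array(80 => 'IIS (default)', 8080 => 'Apache') as $port => $label) {
    printf("%-14s port %-5d %s\n", $label, $port,
        portIsOpen('127.0.0.1', $port) ? 'listening' : 'not answering');
}
```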
I had already installed MySQL 5.0.22 a while back, along with useful tools such as the accompanying Administrator, Query Browser and Migration Toolkit, but had not really used any of it in anger since. Consequently, it took a few minutes to remember what I'd chosen as the root password when I opened MySQL Administrator! Eventually, I set up my database (all very straightforward with the Administrator tool; you don't have to go messing about with SQL statements to make a new table etc., it's all done from a neat little GUI). By now I'd also got PHP 5.2.0 installed.

Connecting To The Database
The next hurdle came when I tried actually running a PHP page with a database connection. I kept getting an error:
Call to undefined function mysql_connect()
After some reading around in my copy of Beginning PHP and MySQL 5 (another book which has been propping up the coffee table of late, but came into its own for this), it turns out that PHP 5 does not ship with MySQL support built in; you have to download some extra libraries and then go fiddling around with the php.ini file. I found this tutorial page really useful in explaining what was needed. And for all the knocking that Microsoft gets in various quarters, I don't ever remember this much effort being required to set up IIS to run with the .NET framework! Bah, humbug.
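Before fiddling with php.ini, it helps to ask PHP itself what it has loaded. A minimal sketch, using the built-in `extension_loaded()` and `function_exists()`; the function name `mysqlExtensionReport()` is hypothetical. On a Windows PHP 5 install, enabling the legacy extension typically means uncommenting the `extension=php_mysql.dll` line in php.ini, making sure `extension_dir` points at PHP's ext folder, and restarting Apache.

```php
<?php
// Report which MySQL extensions this PHP build has loaded.
// mysql_connect() only exists when the legacy "mysql" extension is enabled;
// "mysqli" is the newer interface, which also shipped with PHP 5.
function mysqlExtensionReport()
{
    return array(
        'mysql extension'  => extension_loaded('mysql'),
        'mysqli extension' => extension_loaded('mysqli'),
        'mysql_connect()'  => function_exists('mysql_connect'),
    );
}

foreach (mysqlExtensionReport() as $name => $available) {
    echo str_pad($name, 18), $available ? 'available' : 'missing', "\n";
}
```

If `mysql_connect()` shows as missing, the "Call to undefined function" error above is exactly what you'd expect. (On PHP 7 and later it will always be missing, since the legacy mysql extension was removed; mysqli is the replacement.)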

Once the environment was properly configured, the actual blog development wasn't too bad. I had a few "moments" of frustration trying to chase down some syntax typos which caused various things to blow up, but you get used to that with hand coding!

Telling The Time
Another tricky thing to get right is date and time formatting. My PHP book explained how to use PHP to display the current date:
<?php echo date("l, jS F Y"); ?>
On the day of writing, this gives you "Wednesday, 6th December 2006".
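Called with one argument, `date()` formats the current time. Passing a Unix timestamp as the second argument formats any moment instead, which is what you need for a stored post date. A small sketch, building the timestamp with `mktime()`; the UTC timezone is an assumption made so the output doesn't depend on the server's php.ini setting.

```php
<?php
// Fix the timezone so date() doesn't depend on (or warn about) php.ini.
date_default_timezone_set('UTC');

// mktime(hour, minute, second, month, day, year)
$posted = mktime(22, 45, 0, 12, 6, 2006);

// l = full day name, j = day without leading zero, S = ordinal suffix,
// F = full month name, Y = four-digit year; H:i = 24-hour time.
echo date('l, jS F Y', $posted), "\n"; // Wednesday, 6th December 2006
echo date('H:i', $posted), "\n";       // 22:45
```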

There are occasions when you want to format the date in the SQL statement itself, and getting your head round a seemingly arbitrary set of case-sensitive specifiers in the format string is difficult. That's where Dan Winchester's guide to MySQL date_format was also very handy. You might use something like this:
SELECT post_id, title, post,
DATE_FORMAT(postdate, '%W, %D %M %Y') AS dateposted, DATE_FORMAT(postdate, '%H:%i') AS timeposted
FROM posts WHERE post_id=$post_id LIMIT 1
dateposted would contain "Wednesday, 6th December 2006" as before, and timeposted would contain "22:45". I split these in two so the formatted date could be displayed separately from the time - if you make two posts in a day, it's nice not to repeat the day/date element every time. If you wanted to lump them together, just use this instead:
SELECT post_id, title, post,
DATE_FORMAT(postdate, '%W, %D %M %Y at %H:%i') AS dateposted
FROM posts WHERE post_id=$post_id LIMIT 1
This will give "Wednesday, 6th December 2006 at 22:45" as one string.
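The benefit of keeping date and time separate shows up on a listing page. Here is a sketch of printing each day's date once, with the time beside each post. The `$rows` array is hypothetical sample data, standing in for the result of the split SELECT above run without the `WHERE post_id=... LIMIT 1` clause and ordered by postdate, newest first; `renderPostList()` is an illustrative name.

```php
<?php
// Print a date heading only when the day changes; each post shows its time.
function renderPostList(array $rows)
{
    $html = '';
    $currentDate = null;
    foreach ($rows as $row) {
        if ($row['dateposted'] !== $currentDate) {
            $currentDate = $row['dateposted'];
            $html .= '<h2>' . htmlspecialchars($currentDate) . "</h2>\n";
        }
        $html .= '<h3>' . htmlspecialchars($row['title']) . '</h3>'
               . '<p>Posted at ' . htmlspecialchars($row['timeposted']) . "</p>\n";
    }
    return $html;
}

// Hypothetical rows: two posts on the same day, one on the previous day.
$rows = array(
    array('title' => 'Second post', 'dateposted' => 'Wednesday, 6th December 2006', 'timeposted' => '22:45'),
    array('title' => 'First post',  'dateposted' => 'Wednesday, 6th December 2006', 'timeposted' => '09:10'),
    array('title' => 'Older post',  'dateposted' => 'Tuesday, 5th December 2006',   'timeposted' => '18:30'),
);
echo renderPostList($rows);
```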

Future Developments?
So now, I have my own blogging engine running on my localhost using PHP and MySQL. I'm not about to share the new blog with the world, as it largely consists of a personal diary and various rants, but it's been a very worthwhile exercise in dipping my toes in the murky PHP waters.

I may decide to develop the code further, and perhaps use it to host this blog on my own server in due course, but for the moment, it's staying right here at Blogger.