An Exercise in Irrelevance - Restoring Kblog

We’ve been working for some time now on our lightweight semantic publishing environment, kblog. Near the beginning of the last academic year, unfortunately, we were compromised through a zero-day vulnerability. With the press of academic life and teaching, it has taken me a long, long time to get the show back on the road. Finally, however, I think we have achieved this. The main cause of the delay was, simply, that I didn’t have time; at other stages of the year it would not have been so much of a problem, but in October there is very little give in my working week. An additional problem, however, was that restoring kblog took a lot more effort than it should have.

If you are only interested in kblog itself, well, it should be up now. The rest of this post is going to be a technical write-up.

First, we decided to take the opportunity to move kblog over to a larger machine. It was running on a small virtual machine previously. In general, this coped well with the traffic, but we wanted something with a bit more memory; it was a little stretched while we held workshops and lots of people were authoring at once. We have also secured the machine further than before. I won’t go into too many details here, for obvious reasons, but I think it is less likely to get hacked in future, although the risk is still there.

Kblog started out as an experiment: both a development machine for the knowledgeblog environment and a service for people to read. This is not really a tenable situation; in future, we’ll only upload new kblog plugins after a longer period of testing. I’ve also removed some of the third-party plugins, because I want to move knowledgeblog to running under wordpress DEBUG without warnings. For a while, this is likely to result in a few missing pieces of functionality, gravatars and versioning being the most obvious examples. The server will now ONLY be for the web. Mailing lists and suchlike will move onto Google code; why host something we do not have to?

To restore cleanly, we decided to use a fresh install. No PHP from wordpress has been retained from the old site; everything has been checked out anew. We have also been through the database and, as far as we can tell, there is no malicious content there, although we cannot discount the possibility.

The restoration itself was much harder than expected, for historical reasons. Kblog was originally a Wordpress MU installation, the multisite version of wordpress. Since Wordpress 3.0 came out, however, this has not been an independent install; we ported over to being a standard Network-installed Wordpress 3.0 a while back. This historical baggage turned out to be a substantial impediment to restoration: the standard install and restore process doesn’t work under these circumstances. Essentially, we got the infamous “Error establishing a database connection”. With some help from the forums, I tried testing the MySQL connection (it worked), the privileges (which were correct), and a clean installation in the same database (by swapping the table prefixes and installing anew). After a minor diversion with my rewrite rules, I could still find no problems or mistakes with my install.

At this point, I resorted to a full debug of the Wordpress load. The error started in wp-settings.php, at the call to wp_not_installed(). Chasing it down a bit further got me to is_blog_installed(). This checks for the options table, which appeared not to exist, and then for other WP tables, which DID exist, at which point it calls dead_db(). Despite having just connected to the database, Wordpress prints the entirely unhelpful “Error establishing a database connection” message at this point.
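The decision flow I traced can be modelled roughly as below. This is only a sketch of the behaviour as I have described it, written in Python rather than PHP; it is not Wordpress’s actual code, and the function name install_check, the table list and the return strings are my own inventions for illustration.

```python
def install_check(existing_tables, options_table="wp_options",
                  other_tables=("wp_users", "wp_usermeta", "wp_posts")):
    """Model of the check described above (hypothetical, not Wordpress code)."""
    tables = set(existing_tables)
    # The options table is looked for first; if it is there, all is well.
    if options_table in tables:
        return "installed"
    # No options table, but other Wordpress tables exist: the database is
    # treated as broken, which is where the misleading message comes from.
    if any(name in tables for name in other_tables):
        return "dead_db"
    # Nothing there at all: a genuinely fresh database.
    return "not_installed"


# Shared tables present, but no options table where it was expected.
print(install_check(["wp_users", "wp_usermeta", "wp_blogs"]))
```

The point of the model is that the “dead database” branch is reached by a missing table, not by a failed connection, which is why testing the connection told me nothing.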

After much poking around, I found the cause. Rather than using a normalised schema in its database, wordpress uses duplicate tables. So, rather than having WP_POSTS with a blog id column, each blog has an independent set of tables, alongside a few which are shared between all blogs: a sort of “pseudoschema”. Between WPMU and WP3 this pseudoschema changed; the tables for the first blog were prefixed wp_1_ in MU, but are now just wp_. Subsequent blogs are wp_2_, wp_3_ and so on. Not ideal, but I guess it simplifies the SQL, as most of Wordpress deals with only a single blog at a time.
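To make the naming concrete, here is a minimal sketch of the prefix rule as I understand it from the tables in my own database. The function table_prefix and its legacy_mu flag are hypothetical names of mine, not anything from Wordpress itself.

```python
def table_prefix(blog_id, base="wp_", legacy_mu=False):
    """Return the per-blog table prefix under the naming described above."""
    # Under WP3 the first blog uses the bare prefix; under WPMU it was wp_1_.
    if blog_id == 1 and not legacy_mu:
        return base
    # Every other blog (and blog 1 under WPMU) gets its id in the prefix.
    return f"{base}{blog_id}_"


print(table_prefix(1))                  # first blog, WP3 naming
print(table_prefix(1, legacy_mu=True))  # first blog, WPMU naming
print(table_prefix(2))                  # second blog, either scheme
```

Only the first blog differs between the two schemes, which is why the damage was confined to it.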

Now, normally, wordpress copes with this. Something in wp-config.php tells Wordpress to look for wp_1_ rather than wp_ when that wp-config.php has resulted from an upgraded WPMU installation. Unfortunately, I had done a fresh install. I realised that this was a risk, but I decided that “fixing” the database was the best option. Having an ex-WPMU database had already caused problems, and in the future this is only going to get worse. So, I renamed the tables with a bit of SQL.
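The renaming amounts to one MySQL RENAME TABLE statement per table. A sketch of how those statements could be generated is below; I did not do it this way at the time, the helper rename_statements is hypothetical, and the table names in the example are only illustrative. It also assumes that no table with the target name already exists.

```python
def rename_statements(tables, old_prefix="wp_1_", new_prefix="wp_"):
    """Build RENAME TABLE statements for tables carrying the old prefix."""
    statements = []
    for name in tables:
        # Shared tables and other blogs' tables do not start with wp_1_,
        # so they are left alone.
        if name.startswith(old_prefix):
            new_name = new_prefix + name[len(old_prefix):]
            statements.append(f"RENAME TABLE `{name}` TO `{new_name}`;")
    return statements


# Illustrative table list; in practice it would come from SHOW TABLES.
for statement in rename_statements(["wp_1_posts", "wp_1_options",
                                    "wp_2_posts", "wp_users"]):
    print(statement)
```

Reading the generated statements before running them is the whole point of generating them rather than renaming directly.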

This worked; problem solved? Well, nearly. All the users from my first blog (served now from the wp_ tables, previously from the wp_1_ tables) had disappeared, as had all my roles. Again, more forum searching found the problem. Not only does Wordpress use a pseudoschema, it also stores table prefixes in the database. So, wp_options had a row with an “option_name” of “wp_1_user_roles”. Again, more SQL.

update wp_options set option_name = "wp_user_roles" where option_name = "wp_1_user_roles";

and back came my roles, but still no users. In the end, I grepped through the database dump (I am sure that this is not how databases are supposed to work) for “wp_1”. The culprit was the wp_usermeta table. This SQL revealed the problem.

select * from wp_usermeta where meta_key like "wp\_1\_%";

With a bit of Emacs hacking and Perl, I generated the 10 statements of this form…

update wp_usermeta set meta_key = "wp_capabilities" where meta_key = "wp_1_capabilities";

which finally solved the issue and, voilà, users are back.
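For the record, the same statements could be generated with a few lines of Python instead of Emacs and Perl. This is a sketch rather than what I actually ran: the helper usermeta_updates is hypothetical, and the key list in the example is illustrative, not the real set of ten. The keys are pasted straight into the SQL, which is only reasonable because they come from my own database dump.

```python
def usermeta_updates(meta_keys, old_prefix="wp_1_", new_prefix="wp_"):
    """Build one UPDATE per distinct meta_key carrying the old prefix."""
    statements = []
    # sorted(set(...)) removes duplicates and gives a stable order.
    for key in sorted(set(meta_keys)):
        if key.startswith(old_prefix):
            new_key = new_prefix + key[len(old_prefix):]
            statements.append(
                f'update wp_usermeta set meta_key = "{new_key}" '
                f'where meta_key = "{key}";'
            )
    return statements


# Illustrative keys; the real list would come from the select shown earlier.
for statement in usermeta_updates(["wp_1_capabilities", "wp_1_user_level",
                                   "nickname"]):
    print(statement)
```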

So, all a bit messy really. In the end, I can see why Wordpress decided to go for a pseudoschema approach, but then putting the table prefixes into two tables as well? To me, this goes too far, and is not a great design solution. Combined with the entirely unhelpful error message, it has cost me 3 or 4 days of hacking, time which could have been better spent. Add to that the lack of any straightforward way of forcibly resetting the passwords for a network install (more Emacs, Perl and SQL!), and I don’t think that wordpress has been overly helpful here.

Regardless, the situation now appears to have been resolved. While I may be slightly irritated with Wordpress, it does show the strength of free software; as is often the case, the community were very helpful in providing me with the pointers I needed, and where they could not help, I did have the option of reading, altering and debugging the code. It may be the nuclear option, and one best avoided, but it is there if you need it.

In the meantime, I have been playing around with kcite. I was going to make it cleverer for the next release, but in the end, I have decided to focus on making it broader. I now have preliminary support for both arXiv and datacite IDs and metadata. Neither was a great deal of effort, but both are strategically important: data and preprints should be considered first-class citizens. Next up, support for referring to Kblog URLs directly.