Archive for the ‘dev’ Category

Announcement: New Perlwikipedia maintainer

December 8, 2007

Well, I finally bit the bullet today and stepped down as maintainer of Perlwikipedia, my MediaWiki bot framework. My successor is ST47, a fellow admin on enwiki who serves on the Bot Approvals Group and has more bots than I have fingers.

I can’t say that it hasn’t been a long time coming, but I think that ST47 will do a much better job as maintainer than I did. He’s enthusiastic about Wikipedia, is a great Perl hacker, and has written more bolt-on enhancements to Perlwikipedia than there are original lines of code.

In any case, I believe we’ll see a brand-spankin’-new Perlwikipedia release in the near future, one that’s more shiny and can do your dishes.


libgnomecanvas woes

December 1, 2007

Jhbuild was merrily cranking away at Evolution deps when libgnomeprintui spit out an error about “Too many open files.” So, I cranked up my per-user open files limit in /etc/security/limits.conf to 4096. I logged out and back in again, and the error was still there.
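
A quick way to check whether a limits.conf change is actually visible to a fresh login session is to read the open-files limit from inside a process started in that session. The following is a minimal Python sketch, not part of the original troubleshooting; the function name is made up for illustration, and it relies only on the standard-library resource module (Unix only).

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-files limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = open_file_limits()
    # The soft limit is the one that triggers "Too many open files".
    print(f"soft={soft} hard={hard}")
```

If the soft value already matches what was set in limits.conf and the build still fails, the limit itself is probably not the real problem.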

Turns out, on older versions of libgnomecanvas and gail, like the ones that jhbuild uses by default, the two libraries have circular dependencies.

Solution? When jhbuild fails, go to the shell and switch to the libgnomecanvas directory. Then, execute

svn switch

Then do the “./ && make && make install” business.

Now, in the gail directory, do

svn switch

Build it. Now you can exit the shell and re-run the jhbuild configure with the circular dependencies resolved.


Jhbuild headaches

November 30, 2007

I’ve spent the last day or so wrestling with Jhbuild, Gnome’s build-from-SVN program. I thought, “All I want is to build Evolution; it’s all I ask.” Nah, that would be too easy!

There’s a reason that they say trunk is unstable. I’ve had more build errors trying to get jhbuild to do a clean run on Evolution than I did trying to compile everything from source on Fedora Core 4.

I’ll post the problems I encounter and their solutions, assuming I finally manage to get this thing built. It might be easier just to grab Fedora’s SRPMs. Meh, I’m a developer, let’s take the hard way.


Postgres and the Ultimate Hitchhiker’s Guide Part One

October 10, 2007

So, after some remarkably easy setup with xml2sql and PostgreSQL 8.3beta1, I’ve finally loaded all mainspace articles and templates into a database system. Now the hard part starts.

In order to generate the HTML for each article, I need a copy of that article’s wikitext. Pulling each article out individually, writing it to a file, and then running the hacked-together parser I found on that file is terribly slow. So instead, I’m pulling down the wikitext for every article in one pass and storing it in a flat text file for later parsing.

The interesting part about this is that, with a plain query, Postgres likes to run the whole thing, cache the result to disk, and only then replay it to the client. This is remarkably inefficient for my query, which returns about 2 million rows totaling roughly 8 GB. Solution? Postgres cursors!

A cursor is essentially a way to hand Postgres a query without having it run the whole thing up front. Then, each time I issue a FETCH command, the server executes just enough of the query to return the number of rows I asked for, without putting the entire result on disk. Now that’s efficient (or better suited for my hardware, anyway).

So right now, my hackish Perl script is fetching about 1,000 articles every 5-10 seconds and pushing them to disk. Should be done in no time…
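
The Perl script itself isn't shown here, so what follows is only a rough Python sketch of the same fetch-a-batch, write-a-batch loop. Everything named in it is an assumption for illustration: the function name, the page table with title and body columns, and the one-JSON-array-per-line output format. It is written against the generic Python DB-API and uses an in-memory SQLite database as a stand-in so that it runs anywhere. Against Postgres, one option would be to pass in a psycopg2 named cursor (conn.cursor(name="...")), which is a server-side cursor, so each fetchmany call turns into a FETCH on the server.

```python
import io
import json
import sqlite3


def dump_in_batches(cursor, out, batch_size=1000):
    """Write (title, text) rows from `cursor` to `out`, one JSON array per line.

    Rows are pulled `batch_size` at a time instead of all at once.
    Returns the total number of rows written.
    """
    total = 0
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:  # an empty batch means the result set is exhausted
            break
        for title, text in rows:
            # JSON escapes embedded newlines, so each record stays on one line.
            out.write(json.dumps([title, text]) + "\n")
        total += len(rows)
    return total


if __name__ == "__main__":
    # Stand-in database; the table and column names are hypothetical.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page (title TEXT, body TEXT)")
    conn.executemany(
        "INSERT INTO page VALUES (?, ?)",
        [("Earth", "Mostly\nharmless."), ("Towel", "Very useful."), ("Vogon", "Poetry.")],
    )
    cur = conn.execute("SELECT title, body FROM page")
    buf = io.StringIO()
    written = dump_in_batches(cur, buf, batch_size=2)
    print(written, "rows written")
```

The batch size is the main knob: bigger batches mean fewer round trips to the server, smaller ones keep less in memory at a time.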


Perlwikipedia version 1.0

September 8, 2007

Well, after finally remembering that I own a blog, here’s an announcement: The Perlwikipedia development team is pleased to announce that Perlwikipedia version 1.0 has been released! Perlwikipedia is a MediaWiki bot framework written in Perl, which can be used to develop bots and other tools that need to edit or get information from any MediaWiki-based site. You can download a copy of the framework from