Archive for October, 2007

Chinese firewall evasion project

October 18, 2007

So about a month ago, a user contacted me, asking if I could convert a hardblock I enacted on an open proxy to a softblock. In accordance with Wikipedia policy, I denied the request. A short while later, I realized that the user was an established editor who would be a great loss to China-related articles. This is where I had the proverbial “lightbulb moment.”

If I could set up password-protected, SSL-enabled proxy servers, then users could contact me to use the servers and edit Wikipedia without violating policy! Now, about three weeks later, I’ve kicked off the WikiProject on closed proxies, a project designed to coordinate efforts between users who operate closed proxy servers. Currently, we have two members (myself and ST47, another admin), with a third user who has expressed interest. I’m the only proxy sysop at the moment, but hopefully we can get more online. Sometime in the next week or so, once we have more than one server up and everything is running smoothly, we’ll start accepting email-based applications for server accounts.

The proxy server I run is powered by Apache 2.2.6, with mod_ssl covering encryption and a somewhat-working mod_rewrite-based method of denying account creation. Apache’s built-in htpasswd-configured authentication, with flat text files, provides the authentication backend.
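
To make the account-creation filter concrete, here is a minimal sketch of the kind of check such a rule performs, written in Python rather than as actual mod_rewrite directives. It is not the rule the server runs. The function name `is_account_creation` is hypothetical, and the sketch assumes that signup requests target the `Special:Userlogin` page with `type=signup` in the query string.

```python
from urllib.parse import urlsplit, parse_qs, unquote

# Assumption: signup requests target this special page with type=signup.
SIGNUP_PAGE = "special:userlogin"


def is_account_creation(url: str) -> bool:
    """Return True if the URL looks like an account-creation request."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    # The page title arrives either as ?title=... or as the last path
    # segment (e.g. /wiki/Special:Userlogin).
    path_title = unquote(parts.path.rsplit("/", 1)[-1])
    title = query.get("title", [path_title])[0]
    kind = query.get("type", [""])[0]
    return title.lower() == SIGNUP_PAGE and kind.lower() == "signup"
```

A proxy applying this predicate would refuse any request for which it returns True and pass everything else through.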

I don’t believe I have any legal issues to worry about unless users start using the proxy to post libel and other bad things, and even then I don’t think a case would hold up in court.

Now all I need to do is wait for open registration, and hope that men in riot gear with big guns, black vans, and helicopters don’t show up at my house…



Postgres and the Ultimate Hitchhiker’s Guide Part One

October 10, 2007

So, after some remarkably easy setup with xml2sql and PostgreSQL 8.3beta1, I’ve finally loaded all mainspace articles and templates into a database system. Now the hard part starts.

In order to generate the HTML for each article, I need to have a copy of the wikitext for that article. Since going through each article individually, loading it into a file, and then using the hacked-together parser I found on the resulting file is terribly slow, I’m pulling down the wikitext for every article and storing it in a flat text file for later parsing.

The interesting part about this is that, by default, the entire result set gets buffered before my script sees a single row. This is remarkably inefficient for my query, which returns about 2 million rows that total about 8 GB. Solution? Postgres cursors!

A cursor is essentially a way to hand Postgres a query without running it to completion right away. Then, each time I issue a FETCH command, the server executes just enough of the query to return the number of rows I asked for, without building the entire result set up front. Now that’s efficient (or better suited for my hardware, anyway).
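
As a rough illustration of the DECLARE/FETCH pattern (a Python sketch, not the Perl script used for the dump), the generator below pulls rows in fixed-size batches. It assumes a DB-API style connection whose driver opens a transaction implicitly on the first statement, as psycopg2 does by default. The names `stream_rows` and `article_cur` are made up for this example.

```python
def stream_rows(conn, query, batch_size=1000, cursor_name="article_cur"):
    """Yield rows from `query` one batch at a time via a server-side cursor.

    `query` and `cursor_name` are interpolated directly into the SQL, so
    they must come from trusted code, never from user input.
    """
    cur = conn.cursor()
    # DECLARE only registers the query; no rows are sent to the client yet.
    cur.execute(f"DECLARE {cursor_name} CURSOR FOR {query}")
    try:
        while True:
            # Each FETCH returns at most batch_size rows.
            cur.execute(f"FETCH {int(batch_size)} FROM {cursor_name}")
            rows = cur.fetchall()
            if not rows:
                break
            yield from rows
    finally:
        # Ending the transaction discards the cursor. The query is
        # read-only, so rolling back loses nothing.
        conn.rollback()
```

Usage would look something like `for row in stream_rows(conn, "SELECT old_text FROM text"): ...`, where the table and column names are assumptions about the imported schema. Only one batch is held in client memory at a time, which is the point of the exercise.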

So right now, my hackish Perl script is fetching about 1,000 articles every 5-10 seconds and pushing them to disk. Should be done in no time…


New project

October 1, 2007

So after trying out the Wikipedia-on-a-CD project, I’ve decided that it’s not enough. Wikipedia is the sum of all human knowledge! Surely it should be equivalent to an Earthly Hitchhiker’s Guide! So, amongst all of the other projects I have going right now, I’ll be taking the full Wikipedia dump, getting the relevant namespaces, and generating HTML for all 2 million articles, which will subsequently be stored on a CD. Only God knows how.

Take that, Douglas Adams!


Ohio LinuxFest 2007

October 1, 2007

OLF this year was a blast! It was the first conference I’ve been to, so I’m sure I thought it was better than it really was, but it was still fun. Here’s a quick summary of how it went:

Friday night: Arrive at the hotel. Sleep.

Saturday, 7-9: Wake up, get breakfast, get checked-in at the convention center. Got a bag o’ schwag from the organizers; had all sorts of cool stuff from the sponsors. Got my t-shirt (which is REALLY nice) and meal tickets.

Saturday, 9-12: Listened to Max Spevack, Fedora Project Leader, give the keynote, while trying to get wireless working. Apparently there wasn’t a big enough pipe available for internet, so I idled on the conference ircd. Went to the ZenOss talk and found my new favorite monitoring system. Discovered that the Developing a Linux Distro talk was canceled, and the Ubuntu for Beginners talk wasn’t interesting, so I went onto the show floor, met some of the GNOME and PostgreSQL guys, and bought a GPLv3 shirt.

Saturday, 12-1: Found out about a potential keysigning, but missed it. Went over by the food court to try to find free wireless, but failed miserably (Stupid T-Mobile). Got Subway, looked through the conference schwag I got so far. Headed back to the conference and found a mythical room filled with free Google Code shirts. Awesome.

Saturday, 1-2: No interesting talks, so went to the Fedora BoF instead. I think I’ve just decided I’ll join the project 🙂

Saturday, 2-6: Went to the Cfengine talk, some really cool stuff there. Found another keysigning, got my key verified and signed by two other guys there. Headed to the Python talk, which was made problematic because I didn’t have the visual module installed. Still a good talk. Went to the Linux Link Tech Show raffle, didn’t win anything. Caught the PostgreSQL 8.3 talk; their new XML and uuid datatypes are phenomenally cool, as well as their new anti-VACUUM improvements.

Saturday, 6-?: Listened to about 15 minutes of Drew Curtis’s talk. Didn’t want to wait until 7 for the ending keynote, so I went for a final walk around the show floor, got another guy to sign my key, and then packed up and headed home.

Hopefully we can get more organized for the keysigning next year. I might be able to get a Wikipedia BoF going as well!