Wikipedia’s Tor Problem

Today I noticed an essay on Tor. This essay, while very interesting, brought something slightly disturbing to my attention: another admin had performed a bot-assisted blocking run on “suspected” Tor servers. This isn’t new; admins have been doing these sorts of runs under the radar for a while. Go back and look at the CharlotteWebb RFAR. Still, this was a fairly large run, and I wasn’t comfortable assuming there was no collateral damage.

So, I went ahead and wrote several Perl scripts to grab all pages linking to Template:Tor, the standard template used to notify editors that an IP is a blocked Tor server. This list went into a Postgres database. Then, another script checked every IP and decided if it was really blocked or not. Only about 12 of the 740 IPs with this template weren’t blocked, nothing major. I removed the template from the IPs’ talk pages and went on. Then, I used a Python script distributed with Tor to get a list of all exit nodes that can access the Wikipedia servers. This also went into Postgres. Now for the problematic part.

I ran an SQL query that took all blocked IPs marked as Tor nodes, then checked if they were actually Tor nodes. The list of supposed Tor nodes contained 740 IP addresses. Want to know how many were really Tor nodes?

87.

That’s right, there are currently 653 IP addresses that were, at one point, probably Tor nodes, but now they aren’t. 653 innocent IP addresses. Now, to put this in context, let’s examine how many REAL Tor nodes are blocked.
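The comparison behind those numbers can be sketched roughly as follows. This is a minimal illustration, not the original Perl scripts or query: it uses Python with an in-memory SQLite database standing in for Postgres, and the table names, the function name `split_tagged_blocks`, and the sample addresses (taken from the reserved documentation ranges) are all assumptions.

```python
import sqlite3


def split_tagged_blocks(tagged_blocked, live_exits):
    """Split blocked, Tor-tagged IPs into (still_tor, stale).

    tagged_blocked: IPs that carry the Tor template and are blocked.
    live_exits:     IPs currently acting as Tor exit nodes.
    Table and column names are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tagged_blocked (ip TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE live_exits (ip TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT OR IGNORE INTO tagged_blocked VALUES (?)",
        [(ip,) for ip in tagged_blocked],
    )
    conn.executemany(
        "INSERT OR IGNORE INTO live_exits VALUES (?)",
        [(ip,) for ip in live_exits],
    )

    # Blocked IPs that really are exit nodes right now.
    still_tor = [row[0] for row in conn.execute(
        "SELECT t.ip FROM tagged_blocked t "
        "JOIN live_exits e ON e.ip = t.ip ORDER BY t.ip"
    )]
    # Blocked IPs with no matching live exit node: the stale blocks.
    stale = [row[0] for row in conn.execute(
        "SELECT t.ip FROM tagged_blocked t "
        "LEFT JOIN live_exits e ON e.ip = t.ip "
        "WHERE e.ip IS NULL ORDER BY t.ip"
    )]
    conn.close()
    return still_tor, stale
```

Called with the real lists, the length of the second result would correspond to the 653 figure above; the sample data in any quick test is made up.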

I used the block-checking script to check the list of actual, live, Tor nodes that could access Wikipedia. There were 1553 Tor exit nodes when I ran the query. Guess how many were blocked.

269.

To save you the math, that means we are NOT preventing 82.7% of Tor exit nodes from accessing Wikipedia. That’s a great statistic, considering that Wikipedia’s policy on Tor is to disable editing access for Torified users.
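For anyone who wants to check the math anyway, here is a small sketch using only the counts quoted in this post. The function names are made up for illustration.

```python
def unblocked_share(total_exits, blocked_exits):
    """Percentage of live exit nodes that are NOT blocked, to one decimal."""
    return round(100 * (total_exits - blocked_exits) / total_exits, 1)


def stale_blocks(tagged_ips, confirmed_tor):
    """Tagged IPs that are no longer Tor nodes."""
    return tagged_ips - confirmed_tor


print(unblocked_share(1553, 269))  # 82.7
print(stale_blocks(740, 87))       # 653
```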

Now, this isn’t a perfect study. I’m not taking into account autoblocks, rangeblocks (which I don’t believe show up on Special:Ipblocklist when you look up a single IP in the range), and other things I can’t scan for. Even so, it means we have a relatively huge hole through which users can “abuse.” However, I highly doubt they will.
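If a list of blocked ranges were available, the rangeblock gap could be closed with a check along these lines. This is only a sketch under that assumption: the function name and the sample CIDR ranges are hypothetical, and it relies on Python’s standard `ipaddress` module rather than anything I actually ran.

```python
import ipaddress


def covering_rangeblocks(ip, blocked_ranges):
    """Return the blocked CIDR ranges (as given) that contain the IP.

    blocked_ranges is assumed to be a list of CIDR strings such as
    "192.0.2.0/24"; where that list comes from is left open here.
    """
    addr = ipaddress.ip_address(ip)
    return [
        cidr
        for cidr in blocked_ranges
        # strict=False tolerates ranges written with host bits set.
        if addr in ipaddress.ip_network(cidr, strict=False)
    ]
```

An exit node would then count as blocked if it has either a direct block or a non-empty result from this function.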

People need to stop taking WP:OP so seriously if they aren’t going to enforce it. I can’t begin to count how many open proxies and Tor nodes I’ve seen blocked that have since been closed or switched to a different IP. Meanwhile, the IP is still blocked, usually for periods of 5 years or more. If you block a proxy, you need to follow up on it! Administrators can’t just assume Tor nodes have static IPs; I, for one, operate a center Tor node (read: A node that can’t allow traffic out, except to other Tor nodes) on a dynamic IP address. We need to start taking more responsibility for our blocks and stop issuing fire-and-forget blocks that will, at some point when the IP changes, affect legitimate users.

I’m probably going to start testing the waters to see how the community would react to a TawkerbotTorA clone. Perhaps now that we’re seeing more adminbots, they’ll finally realize that adminbots are useful for some tasks. Based on what I’ve seen in my study, a bot would certainly be more effective and accurate than some administrators.

~alex
