wagons, ho!

October 6, 2008

The great pre-WotLK armory scan has begun at last. I can see that, in some respects, this is a bit of a waste of time since our Lich King friend is going to change a lot of things. Still, it does no real harm and it will be a test of my armory-crawling algorithm.

It’s worth saying a few words on the ideas behind armory-crawling strategies. First, what is the problem we need to solve? The problem is that Activizzard does not provide a GetAllCharacters() function. There is no way to write a loop like this:

for i = 0 to Armoury.GetNumberOfCharacters() - 1
    Fetch Character(i)
    Add Character(i) To Database
end for

The solution involves using the search functions that Blizz do provide in creative ways. It is possible to search for all guilds by a specific name and for all characters by a specific name. It is also possible to get a couple of lists of character names – guild rosters and arena team rosters. So there is a pattern here – given a valid name (of a guild or an arena team) we can get other names that are not known to us. Then, using those character names, we can ask for the XML of each character.
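To make that pattern concrete, here is a minimal Python sketch. It assumes a small made-up list of characters in place of the real armory. The functions `search_guilds`, `guild_roster` and `search_characters` are hypothetical stand-ins for the guild search, the roster lookup and the character search; they are not Blizzard's actual interface. Arena team rosters are left out for brevity.

```python
from typing import NamedTuple, Optional


class Toon(NamedTuple):
    name: str
    realm: str             # region-realm, e.g. "US-Aman'Thul"
    guild: Optional[str]   # None for an unguilded character


# Hypothetical in-memory stand-in for the armory (made-up data).
WORLD = [
    Toon("Nooblette", "US-Aman'Thul", "Deathknights"),
    Toon("Nooblette", "EU-Silvermoon", "Sin"),
    Toon("Nooblette", "US-Stormrage", None),
    Toon("Ihealu", "US-Stormrage", "Deathknights"),
]


def search_guilds(guild_name):
    """Every (guild, realm) pair that uses this guild name, in any region."""
    return {(t.guild, t.realm) for t in WORLD if t.guild == guild_name}


def guild_roster(guild, realm):
    """Character names on one specific guild's roster."""
    return {t.name for t in WORLD if t.guild == guild and t.realm == realm}


def search_characters(name):
    """Every character with this name, on every realm, guilded or not."""
    return [t for t in WORLD if t.name == name]


# A known guild name yields a roster; a name on that roster yields
# guild names we did not know about before.
names = guild_roster("Deathknights", "US-Aman'Thul")  # {"Nooblette"}
namesake_guilds = {t.guild for t in search_characters("Nooblette") if t.guild}
new_guilds = namesake_guilds - {"Deathknights"}        # {"Sin"}
```

Each lookup takes one valid name and hands back names we did not have, which is all the crawler needs.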

My algorithm starts off with some valid guild names, which can be harvested from other web sites that track guild progress in the game. It then uses a simple trick: it asks for all guilds of that name. A lot of guild names are popular and are reused throughout the Blizz-o-sphere (which here means across both the US and European servers; I have not investigated the Asian servers yet). For example, there are over 100 “Deathknights” guilds in the US region alone. “Sin” seems to be another popular one.

So just asking the armoury for “Deathknights” gives us over 100 guild rosters straight off. Each roster may provide several hundred character names. For each character on those rosters, we then repeat the trick: we ask for all characters of that name. For example, we might have found a toon “Nooblette” of Deathknights-US-Aman’Thul. We then ask for all characters called “Nooblette” on all servers and get back, say, 10 Nooblettes. Nine of those characters will not be in Deathknights; only one will be, the one we started with from the guild roster. But the beautiful thing is that those nine will be in other guilds (or unguilded).

So, as well as individual characters, we also get new, valid guild names, which we can then query for, and the process repeats itself.
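The loop described above can be sketched as a breadth-first crawl that alternates between guild names and character names. This is a minimal illustration, not the actual crawler: it assumes an in-memory list of made-up characters, and the inner helpers are hypothetical stand-ins for the armory's guild search, roster and character search.

```python
from collections import deque
from typing import NamedTuple, Optional


class Toon(NamedTuple):
    name: str
    realm: str
    guild: Optional[str]  # None for unguilded characters


def crawl(world, seed_guild_names):
    """Breadth-first crawl: guild names -> rosters -> namesakes -> new guild names."""
    # Hypothetical in-memory versions of the three armory lookups.
    def search_guilds(guild_name):
        return {(t.guild, t.realm) for t in world if t.guild == guild_name}

    def roster(guild, realm):
        return {t.name for t in world if t.guild == guild and t.realm == realm}

    def search_characters(name):
        return [t for t in world if t.name == name]

    seen_guilds = set(seed_guild_names)  # guild names already queued
    seen_names = set()                   # character names already searched
    found = set()                        # discovered (name, realm) characters
    queue = deque(seed_guild_names)
    while queue:
        for guild, realm in search_guilds(queue.popleft()):
            for char_name in roster(guild, realm):
                if char_name in seen_names:
                    continue
                seen_names.add(char_name)
                # Repeat the trick: every character with this name, on any realm.
                for toon in search_characters(char_name):
                    found.add((toon.name, toon.realm))
                    # Namesakes in other guilds hand us fresh guild names.
                    if toon.guild is not None and toon.guild not in seen_guilds:
                        seen_guilds.add(toon.guild)
                        queue.append(toon.guild)
    return found, seen_guilds


# Usage on a toy world (names and realms are made up for illustration).
WORLD = [
    Toon("Nooblette", "US-Aman'Thul", "Deathknights"),
    Toon("Nooblette", "EU-Silvermoon", "Sin"),
    Toon("Nooblette", "US-Stormrage", None),
    Toon("Ihealu", "EU-Silvermoon", "Sin"),
    Toon("Ihealu", "US-Kilrogg", "Obscure Guild"),
    Toon("Asdfgh", "US-Kilrogg", None),
]
found, guilds = crawl(WORLD, ["Deathknights"])
# guilds == {"Deathknights", "Sin", "Obscure Guild"}; found has 5 characters,
# and Asdfgh of US-Kilrogg is never reached.
```

Keeping a set of character names already searched means each name search is issued only once, however many rosters the name turns up on.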

There is an amazing fan-out effect in this algorithm. From an initial seed of just a couple of guild names, it quickly captures literally thousands of new guild names and many thousands of toon names. When I started this project, I thought that it would be necessary to have an initial seed of guild names running into the hundreds. But that is not so. On my last test run, I was able to harvest 50,000 characters with an initial seed of just one guild name. And 50,000 was not the limit; the crawler was still going strong when I stopped the run.

The principle behind this rapid fan-out is that a lot of names are reused. The priest who decides to call themselves “ihealu” may think they are being original but… um… well… Never mind; I make no claims to originality either. Also, the game suggests names, which a lot of people take up, so those names are repeated across the servers.

The weakness of the algorithm is that it won’t find unguilded toons with genuinely original names: “asdfgh” and “zxcvbn” may well escape our notice. Characters in small guilds whose members all have odd names may well be missed too. But that doesn’t really matter; all we need is a sample large enough to be representative of the player base. The crucial thing is that large guilds with unique names do get picked up, because they almost invariably contain at least one character with a standard-ish name. That’s how the algorithm achieves its broad coverage.
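One way to see exactly which characters slip through is to treat the crawl as reachability in a graph whose nodes are guild names and character names, with an edge wherever some character of that name belongs to some guild of that name. The sketch below is that reformulation rather than the crawler itself; `discoverable` and the toy data are hypothetical.

```python
from collections import defaultdict, deque


def discoverable(toons, seed_guild):
    """Return the set of (name, realm) characters a crawl from seed_guild can reach.

    toons: iterable of (name, realm, guild_or_None) tuples.
    An edge joins a guild name to a character name whenever some character of
    that name belongs to some guild of that name; unguilded characters add no edge.
    """
    toons = list(toons)
    edges = defaultdict(set)
    for name, _realm, guild in toons:
        if guild is not None:
            edges[("guild", guild)].add(("char", name))
            edges[("char", name)].add(("guild", guild))

    start = ("guild", seed_guild)
    reached, queue = {start}, deque([start])
    while queue:
        for nxt in edges[queue.popleft()]:
            if nxt not in reached:
                reached.add(nxt)
                queue.append(nxt)

    # A name search returns every character with a reached name, guilded or not.
    return {(n, r) for n, r, _g in toons if ("char", n) in reached}


# Toy data (made up): an odd-named guild reachable through a common name,
# plus two characters the crawl cannot reach.
TOONS = [
    ("Nooblette", "US-Aman'Thul", "Deathknights"),
    ("Nooblette", "US-Stormrage", None),
    ("Nooblette", "EU-Silvermoon", "Qwzrtx Pact"),
    ("Qlorbx", "EU-Silvermoon", "Qwzrtx Pact"),
    ("Asdfgh", "US-Kilrogg", None),
    ("Zxcvbn", "US-Kilrogg", "Tiny Odd Guild"),
]
found = discoverable(TOONS, "Deathknights")
missed = {(n, r) for n, r, _g in TOONS} - found
# missed == {("Asdfgh", "US-Kilrogg"), ("Zxcvbn", "US-Kilrogg")}
```

The unguilded Asdfgh adds no edges at all, and Zxcvbn's small guild shares no member name with anything reached, so both stay invisible. The odd-named Qwzrtx Pact is still found because one of its members is a Nooblette.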

On this run I’ve started in the same way – with “Deathknights” as the single seed. We’ll see just how many characters can be discovered from this humble beginning.

