back to the future

February 2, 2011

Just a brief note to say that I haven’t abandoned all hope of getting this site going again. Even though I’m not playing MMOs at the moment, it seems a shame to just let everything sit idle. My basic infrastructure runs without too much effort, so it is no great problem to refresh the data every couple of months.

The main obstacle is that Blizz is now serving the up-to-date data in HTML format rather than XML. My page-scraping code needs to change to cope with that. Fortunately, however, the Blizz engineers are serving up valid XHTML, which means that XPath expressions can still be used to extract the data we need.

If I’ve been a good little engineer then only my XPaths need to change and nothing else…

There is a danger that the XPath paths can become more than a bit baroque because they have to navigate through all the HTML markup to get to the data nodes, although there are tricks to get the XPath engine to do a lot of the searching.
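As a rough illustration of letting the XPath engine do the searching, here is a minimal Python sketch using the standard library's ElementTree. The XHTML fragment, its ids and its class names are invented for the example and are not the real armoury markup. The point is only that a descendant search (`.//`) avoids spelling out every wrapper element on the way down to the data nodes.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML fragment: element ids and classes are made up for
# illustration and do not reflect the real armoury pages.
SAMPLE_XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div id="wrapper">
      <div class="profile">
        <ul>
          <li class="stat" id="health"><span class="value">32150</span></li>
          <li class="stat" id="mana"><span class="value">8420</span></li>
        </ul>
      </div>
    </div>
  </body>
</html>"""

# Valid XHTML lives in the XHTML namespace, so the paths need a prefix.
NS = {"x": "http://www.w3.org/1999/xhtml"}


def extract_stats(xhtml: str) -> dict:
    """Return {stat id: integer value} for every <li class="stat">."""
    root = ET.fromstring(xhtml)
    stats = {}
    # ".//" searches all descendants, so the path does not have to walk
    # html/body/div/div/ul step by step.
    for li in root.findall(".//x:li[@class='stat']", NS):
        value = li.find("x:span[@class='value']", NS)
        if value is not None and value.text:
            stats[li.get("id")] = int(value.text)
    return stats


stats = extract_stats(SAMPLE_XHTML)
print(stats)
```

ElementTree only supports a subset of XPath, so a fuller engine may be needed for more elaborate expressions. The descendant-search idea carries over unchanged.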

Anybody looking for inspiration on how to parse the XHTML should check out these posts by a geek blogger called Kastang. That’s the method I’ll be using when I get back to all this.

blog status

October 29, 2010

Sigh. You can see that things have gone quiet here. I’m moving on to new projects and interests. But I’m hoping to set aside a bit of time to see how… um… cataclysmic… the changes in the game are going to be for my database.

Everything is set up to keep the data pages up to date without a lot of effort on my part. So long as the armoury data doesn’t change too much, that is! If I can find a way to keep the site current I’ll do that.

But no promises as to timescale. Sorry.

updated for patch 3.3.5

August 13, 2010

I’ve refreshed all the reports over at the Google Appengine site to bring them up to date, except for the ones related to twinks and bg performance. I should have enough data to refresh them early next week.

No dramatic changes meet my eye, but alas I’m a bit busy with other things so I’ve only given the reports a quick once-over.

druid forms at last!

July 29, 2010

I love the smell of data in the morning. It smells like… victory!

Druid forms; phew! It’s taken a while. The count of moonkin and tree forms was always straightforward since these forms derive from a talent. But now, thanks to all my correspondents on the question of feral druids and thanks to the power of modern datamining packages, I’m happy that I’ve got a reasonable estimate of those level 80 feral raiders who favour bear form and those who favour cat form.

The number of unclassifiable ferals is still a bit high but that is mainly due to the number of feral druids who do not raid. Only a tiny number of feral raiders really have an equal investment in cattish and bearish talents and glyphs. There may also be roles for cats and bears in PvP, but I don’t have a solution for estimating those.

Still we’ve got most of the answers we were after.

These numbers are for patch 3.3.3 and are the percentages based on all level 80 druids. So without further ado:

Form Popularity
Moonkin 26%
Tree 40%
Feral Raiders Cat-oriented 11%
Feral Raiders Bear-oriented 17%
Unclassifiable Ferals 6%

If we estimate cats and bears as a percentage of feral combat druids only, we get this:

Form Popularity
Feral Raiders Cat-oriented 33%
Feral Raiders Bear-oriented 50%
Unclassifiable ferals 16%
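The rescaling from "share of all druids" to "share of ferals" can be sketched in a few lines of Python. This works from the rounded percentages in the first table, so it lands near, but not exactly on, the figures above (roughly 32 / 50 / 18). The published figures were presumably computed from the underlying counts rather than the rounded values, which is an assumption here.

```python
# Shares of all level 80 druids, copied from the first table (rounded values).
share_of_all = {
    "cat": 11,
    "bear": 17,
    "unclassifiable": 6,
}

# Ferals as a share of all level 80 druids.
feral_total = sum(share_of_all.values())

# Re-express each group as a percentage of ferals only.
share_of_ferals = {
    form: round(100 * pct / feral_total, 1)
    for form, pct in share_of_all.items()
}

print(feral_total)
print(share_of_ferals)
```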

Percentages are based on active specs only, to keep things simple. That still strikes me as reasonable, over a large sample, since the percentages reflect what you’d see in-game on average.

I’ve got a couple more posts to come which explain in detail how the feral estimates are derived. And I’ll put up the data set so interested people can have a play around with it and see if there is any better way to cluster the bears and the cats. I’ll also add these tables to my druids reports over at the Google appengine site.

But that’s as good an estimate as I know how to get. And it was a fun ride getting to this point too.

the talented Mr Druid

July 28, 2010

Now that we’ve got a dataset which can give us feral druids who:

  • are consistently geared and spec-ed and
  • are serious participants in instance-running and raiding

then we can move to the next stage: trying to find ways to partition that set into bears and cats.

This is where the visualization tools in a datamining package really come into their own. We can add a third dimension to any cluster by using colour. And we can quickly iterate through all the data dimensions to see which ones produce the best clusters. In these charts I’m filtering out all feral druids with spellpower gear and all who have run fewer than 75 instances or raids.
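For anyone wanting to reproduce that kind of filter outside a datamining package, a minimal Python sketch might look like this. The record layout, the field names and the spellpower cut-off are all assumptions made for the example. Only the 75-instance threshold comes from the text.

```python
from dataclasses import dataclass


@dataclass
class Toon:
    # Hypothetical record layout; the real data set's columns may differ.
    name: str
    spellpower: int
    instances_run: int


SPELLPOWER_LIMIT = 500  # assumed cut-off for "spellpower gear"
MIN_INSTANCES = 75      # from the text: drop toons with fewer than 75 runs


def is_serious_feral(toon: Toon) -> bool:
    """Keep toons without significant spellpower and with enough runs."""
    return (toon.spellpower < SPELLPOWER_LIMIT
            and toon.instances_run >= MIN_INSTANCES)


sample = [
    Toon("bearish", spellpower=0, instances_run=210),
    Toon("caster_gear", spellpower=1800, instances_run=300),
    Toon("casual", spellpower=0, instances_run=12),
    Toon("borderline", spellpower=0, instances_run=75),
]

kept = [toon for toon in sample if is_serious_feral(toon)]
print([toon.name for toon in kept])
```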

I’m still plotting health vs mana, to keep the charts consistent across posts, but we’re getting close to the point where we will have to find different stats to graph. We know now that mana is irrelevant and health only a partial indicator of tank-ness. But for the time being, the main cluster that results from that plot is good enough.

Now we want to know which talents can partition the cluster. (And we could ask the same question of glyphs or character stats too.) How about Primal Gore? This is the result – not a lot of partitioning going on there:

Primal Gore - not effective in clustering

Thanks to various comments, it’s clear that there is a set of talents which people expect to effectively partition the cluster. Popular suggestions have included: Thick Hide, Natural Reaction and Protector of the Pack for bears and Shredding Attacks, Predatory Instincts, King of the Jungle, Survival Instincts and Natural Shapeshifter for cats.

Now we can look at each of those in detail:


Protector of the Pack cluster

Natural Reaction cluster

Thick Hide cluster


Survival Instincts Cluster

Shredding Attacks cluster

Predatory Instincts cluster

Natural Shapeshifter cluster

King of the Jungle cluster

You can see that some talents appear to be better than others at defining two distinct clusters. They all have a bit of partitioning effect, but some are better than others at producing the largest “distance” between the two clusters. Predatory Instincts produces clear gold and light blue clusters but Natural Shapeshifter produces more of a greenish middle ground which means that many players in both camps have put a point or two into it.

Datamining clustering algorithms work by calculating “distances” between data points along each of the data dimensions then aggregating those distance measures across all the dimensions. For example, the distance between a toon which has 3 points in Thick Hide and a toon which has zero points in the talent could be measured as “3” and then a sum of all distances could produce a measure of how distinct one toon is from another (although the algorithms generally use more sophisticated maths than just that.)
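The simple version of that distance measure, summing the absolute differences in talent points, can be sketched as follows. The three builds are made-up examples rather than toons from the data set. Real packages generally use other metrics (Euclidean distance, normalised values and so on), so treat this as the plainest possible illustration.

```python
def talent_distance(a: dict, b: dict) -> int:
    """Sum of absolute point differences across every talent dimension."""
    talents = set(a) | set(b)
    # A talent missing from a build counts as zero points.
    return sum(abs(a.get(t, 0) - b.get(t, 0)) for t in talents)


# Hypothetical builds, for illustration only.
bear = {"Thick Hide": 3, "Natural Reaction": 3, "Predatory Instincts": 0}
other_bear = {"Thick Hide": 3, "Natural Reaction": 2, "Predatory Instincts": 0}
cat = {"Thick Hide": 0, "Natural Reaction": 0, "Predatory Instincts": 3}

# One dimension on its own: 3 points versus 0 points gives a distance of 3.
print(talent_distance({"Thick Hide": 3}, {"Thick Hide": 0}))

# Across all dimensions the two bears are close, the bear and cat far apart.
print(talent_distance(bear, other_bear))
print(talent_distance(bear, cat))
```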

So we want the talents with the greatest distance between the two clusters. You can have a look at the charts and see which ones you think are the best ones. I’ll put up my numbers on that in the next post.

Now if we use the better of those talent dimensions as inputs to our clustering algorithm we get this:

Clusters in five talent dimensions.
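To give a feel for what a clustering algorithm does with several talent dimensions at once, here is a bare-bones k-means sketch in Python. K-means is a stand-in chosen for the example and is not necessarily the algorithm behind the chart. The choice of five talents and the six sample builds are also assumptions made up for illustration.

```python
import math

# Assumed dimensions, in order: Thick Hide, Natural Reaction,
# Protector of the Pack, Shredding Attacks, Predatory Instincts.
bears = [(3, 3, 3, 0, 0), (3, 2, 3, 0, 0), (3, 3, 2, 1, 0)]
cats = [(0, 0, 0, 2, 3), (0, 0, 1, 2, 3), (1, 0, 0, 2, 3)]
points = bears + cats


def distance(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(cluster):
    """Per-dimension average of a non-empty list of points."""
    return tuple(sum(col) / len(cluster) for col in zip(*cluster))


def k_means(points, centroids, max_iterations=100):
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster,
        # leaving it in place if the cluster ended up empty.
        new_centroids = [mean(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # nothing moved, so the clusters are stable
        centroids = new_centroids
    return centroids, clusters


# Deterministic start: seed with the first and last points.
centroids, clusters = k_means(points, [points[0], points[-1]])
for cluster in clusters:
    print(len(cluster), cluster)
```

Real packages also handle choosing the number of clusters, picking starting centroids and scaling the inputs, none of which this sketch attempts.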

The crucial thing here is that the blue cluster, which are the toons with bear-ish talents, extends right along the health x-axis. No doubt serious tanks are picking gear, gems and enchants that boost health. But since we are looking for a count of all tanks, all the way from those running 5-toon instances to those in the endgame raids, we should expect that there will be a wide spread of health between those just starting out and those nearer the end of the raiding dungeon chain.

That’s one reason why I’m about to abandon the health vs mana thing and move onto other character stats. More about that in the next post. But the reason we can make decisions like that is due to the insights that this data visualization gives us.

the truth is in there

July 20, 2010

Thanks to all the correspondents who commented on my feral druids datamining experiment. I’m happy that I’ve got a reasonable estimate for the number of bear tanks now. But I’m holding back from putting up my final word on the subject since I’m trying to encourage a couple of people to write up their own analysis first.

You may recall we left off with a simple graph of health vs mana for level 80 feral druids that produced two very distinct clusters – a red and a blue one – sorta like one of those political maps of the USA except with all the republicans and democrats clumped together in separate parts of the country.

And the key question was… um… While we’re on that subject… Can anybody explain to me why those political maps always colour the conservatives red and the liberals blue? It’s very confusing to a foreigner since just about everywhere else in the world, red is associated with the left or progressive side and blue with the Tory or conservative side.

Remember that great movie from the Reaganite ’80s? It was Red Dawn, not Blue Dawn. But I digress…

And the key question was: what were those red ferals doing at the high mana end of the scale? I had my doubts that there could be so many toons carrying mismatched specs and gear. But I’ve been convinced that, yes, there is something not quite right there. A simple filter that drops those toons in the sample with significant spellpower gear basically makes the red cluster disappear.

Now that might not sound like progress – ending up with one cluster – but don’t forget that the power of these datamining algorithms is that they cluster in multiple data “dimensions”. To the eye, there is one cluster, because we are drawing the graph in two “dimensions”: health and mana.

But as soon as we add some talent and glyph dimensions, the big blue blob starts to break up into separate clusters. And this time, there is a good match between the talents and glyphs that we expect to distinguish cats from bears and the actual location of each cluster in the multi-dimensional space.

But it’s a whole lot easier to show you than to tell you, so I’ll leave you with a simple illustration of how that all works. We can add a third dimension to the graph by using colour. The datamining packages that I’m playing with are very good at that sort of visualization, as you can see here.

With the spellpower toons gone, the high mana group has also mostly gone and the shape of the blue cluster has become clearer as the graph scale has changed. Then we overlay, say, a cat glyph:

Feral Druids with Glyph of Shred

and a bear glyph:

Feral Druids with Glyph of Maul

and the clusters within the cluster become pretty clear. Thanks again to Narkondas for the key clues that inspired those graphs.

The datamining algorithms will generate a count of the toons in each cluster, but I’ll leave that till the next post. As you can imagine, with a big clump of ferals filtered out, the percentage of bear tanks in the overall mix is getting smaller.

I should also say that I’m about to collect a new data set and update my armoury reports since the data is getting a bit old and stale. As usual that will take a week or so.

practical cats

July 10, 2010

Thanks to various people for input on that last post. I’m still happy that the blue cluster represents the feral druid bear tanks. Characters in that cluster are stacking all the stats recommended by the various bear tanking blogs, including agility.

But I’m happy to admit that the red cluster is more of a mystery – something at least partly to do with cat form, but there are some oddities there.

All the druids in the sample are feral druids – the balance and resto ones were all filtered out in the database query. Undoubtedly there are druids with two specs who forget to swap gear when they swap specs, but could there really be so many? Those two clusters are pretty dense – to me they represent lots of players following a standard pattern rather than something that could be the result of mistakes.

I’ve got a lot more charts to post on this question, but I see from the comments that I haven’t quite selected all the right stats. Let me fix that on Monday and we’ll have another look at what’s going on.

UPDATE: More interesting comments. Thanks all. Unfortunately RL affairs are diverting me for the next couple of days but I’ll get back to it as soon as I can.

Just briefly, though:

I have done an analysis of talent distribution and yes, talents seem to be very poor predictors of raiding roles.

I agree that the red cluster may represent hybrid behaviour and I suspect now that there is possibly no way to get a sense of how many players heavily lean towards cat over bear.

Thanks especially to Narkondas who pointed out something that I hadn’t properly considered. I talked about the blobs representing players who were “stacking” certain stats. But that has to be proved; it is not a starting point. The mana available to the red blob toons may simply be the default mana values granted by the gear etc that the toon is wearing. Unfortunately the armoury picture is static and doesn’t replace mana with energy when the toon is in cat form (otherwise we’d have a foolproof way of counting cats).

UPDATE 2: D’Oh! Yes there is an error in the talents data. The database query was picking up some talents from the inactive spec. Thanks to Narkondas for spotting that. I’ve replaced the files in the previous post with corrected versions. Every entry now adds up to 71 points or less. Also I’ve now become convinced that all those high spellpower/high mana toons really are running with gear that is not ideal for feral builds – either by accident or by design. So I’ve added a spellpower column into the talents data to see if we can see any patterns there. Or the spellpower column can be used to filter out those toons that are um… trying to subvert the dominant feral paradigm…
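A sanity check along those lines is easy to sketch in Python: sum the talent points in each row and flag anything over 71. The row layout and the sample values are hypothetical. Only the 71-point ceiling comes from the text.

```python
MAX_TALENT_POINTS = 71  # ceiling mentioned above for a level 80 toon


def over_budget(rows):
    """Return the names of rows whose talent points exceed the ceiling."""
    return [row["name"] for row in rows
            if sum(row["talents"].values()) > MAX_TALENT_POINTS]


# Hypothetical rows: the second has points leaking in from another spec.
rows = [
    {"name": "toon_a",
     "talents": {"Balance": 0, "Feral Combat": 55, "Restoration": 16}},
    {"name": "toon_b",
     "talents": {"Balance": 14, "Feral Combat": 55, "Restoration": 16}},
]

suspect = over_budget(rows)
print(suspect)
```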