feral druid forms yet again
July 9, 2010
A long, long time ago on a website far, far away there appeared a post which argued that feral druid bear tanks were in short supply. The article also made the perfectly reasonable point that armoury datamining sites had no data on the popularity of the various druid forms.
Now, being the kind of nerd who likes a challenge (is there any other kind?) I couldn’t let that one go past. But how to solve the problem? Druids get some of their forms through talents; easy enough to get a count of the toons invested in those talents. But cats and bears were not so straightforward.
As an old SQL hacker of the “in Codd we trust” school, my first cunning plan to get the numbers on feral druid bear tanks basically went like this:
- Mine data
- SELECT level 80 feral druids FROM toons
- GROUP BY ???
Unfortunately, that didn’t work so well, for a number of reasons. I now understand one very interesting reason why it didn’t work: the feral talents that we expected to use to identify cats and bears are not actually distributed that way. But more on that later.
The solution involved spending some time getting up to speed with more sophisticated datamining algorithms. These algorithms are also based on a sort of GROUP BY principle, but are capable of grouping (or “clustering”, as the datamining jargon has it) across multiple data dimensions. They can easily handle the 85 talents in the three druid trees and, in essence, can group samples of druids into clusters of toons in an 85-dimensional space. Alternatively, we can cluster on character stats – health, mana, strength, agi etc – or on any combination of talents, stats and playstyle numbers that interest us.
These algorithms also use calculus techniques to find the borders of each cluster in a way that can tolerate outliers. This is important in real-life data, but also important in WoW data, where there is always a small but significant number of players who insist on being… individuals…
Up until a few days ago, I was expecting to have to put off working on the druid forms question until I had a really good understanding of these algorithms. But that is not necessary, for the simple reason that the data we are dealing with is not all that complex. Consider the following graph:
Here we have selected for toons that have a history of running instances and raid dungeons, and have filtered out the toons that are serious PvPers. What we have are two groups that in essence are clustering themselves – no real datamining required.
The bottom, horizontal, group is emphasizing health over mana. This group is stacking stamina, agility, armour and dodge (I’ll prove all those things in the next post) and selecting some of the talents we’d expect for bears. In other words, Tanking 101. The top group is selecting for mana over health and is stacking intellect and spirit. This group has taken some of the cat talents too, although as we’ll see in the next post, talents do not seem to be a great predictor of role.
That graph was generated from a 500-toon data set. So we’re ready to cluster and count the full sample. And voilà:
You can see there are some outliers, but the bulk of the sample falls neatly into the two clusters. What you can’t see properly from the chart is that the blue tank cluster is in fact more populous than the red DPSers. It’s just that their health and mana stats don’t vary much, so the cluster is more dense. That’s where a clustering algorithm is needed to get a count of the population of each blob.
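To make the "count the population of each blob" step concrete, here is a minimal sketch using plain k-means on (health, mana) pairs. The post does not say which clustering algorithm was actually used, so k-means is only a stand-in, and the health and mana values below are invented for illustration (60 tightly packed "tanks" and 40 more spread-out "DPSers"); none of them come from the real data sets.

```python
from collections import Counter


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_two(points, iterations=10):
    """Split points into two clusters with plain k-means (Lloyd's algorithm)."""
    # Deterministic start: the first point, plus the point farthest from it.
    centroids = [points[0], max(points, key=lambda p: squared_distance(p, points[0]))]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min((0, 1), key=lambda c: squared_distance(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


# Invented (health, mana) pairs: a dense blob and a looser one.
tanks = [(42000 + 100 * (i % 7 - 3), 5500 + 50 * (i % 5 - 2)) for i in range(60)]
dps = [(26000 + 400 * (i % 9 - 4), 11000 + 300 * (i % 7 - 3)) for i in range(40)]
points = tanks + dps

centroids, labels = kmeans_two(points)
counts = Counter(labels)
shares = {label: count / len(points) for label, count in counts.items()}
print(counts[0], counts[1])  # 60 40
```

The dense blob covers far less area on a scatter plot than the looser one, yet the label counts show it holds more toons, which is the point the chart alone cannot make.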
And the answer? There are 13,187 level 80 feral druids in the sample who have done more than 75 instances and raids and have done no arenas. That’s my (somewhat generous) working definition of a PvE raider. It’s also a problematic definition because the arena stats are historical – they don’t prove that the toon was not geared up for raiding when the armoury snapshot was taken.
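The working definition of a PvE raider can be written down as a simple filter. This is only a sketch: the `Toon` record and its field names (`level`, `spec`, `instances_and_raids`, `arenas`) are hypothetical and may not match the columns in the actual data sets, and the sample rows are made up.

```python
from dataclasses import dataclass


@dataclass
class Toon:
    # Hypothetical record layout; the real data sets may use other column names.
    level: int
    spec: str
    instances_and_raids: int
    arenas: int


def is_pve_raider(toon: Toon) -> bool:
    """Level 80 feral druid with more than 75 instances and raids and no arenas."""
    return (
        toon.level == 80
        and toon.spec == "feral"
        and toon.instances_and_raids > 75
        and toon.arenas == 0
    )


# Made-up rows for illustration only.
sample = [
    Toon(level=80, spec="feral", instances_and_raids=120, arenas=0),    # raider
    Toon(level=80, spec="feral", instances_and_raids=300, arenas=45),   # has arena history
    Toon(level=80, spec="feral", instances_and_raids=75, arenas=0),     # not more than 75
    Toon(level=80, spec="balance", instances_and_raids=200, arenas=0),  # not feral
    Toon(level=80, spec="feral", instances_and_raids=76, arenas=0),     # raider
]

raiders = [toon for toon in sample if is_pve_raider(toon)]
print(len(raiders))  # 2
```

Note that the second row is excluded purely on its arena history, which is exactly the weakness of the definition: that toon might well be raiding today.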
Of those raiders, 60% are in the blue tank cluster and 40% are in the red DPS cluster. So that’s one useful piece of information: feral druid raiders do seem to prefer to tank rather than DPS by a narrow majority.
But the PvE raiders are less than half of the total sample. So the worst-case scenario, in which no toon outside the raider group is set up to tank, is that on any given day only about 30% of level 80 feral druids are set up for tanking: 60% of just under half the sample (although, to repeat, it is not likely that every arena player is geared for PvP all the time).
The data is from patch 3.3.3.
Then there is the question of effective tanks. Some of those blue crosses in the bottom left-hand corner of the chart are probably not seriously gearing up for very much at all. That will be the subject of the next post, when we will use some of the wonderful data visualization tools in these datamining packages to look much more closely into the dark heart of that big blue blob.
Meanwhile, if you’d like to play around with the data for yourself, here are the data sets I’m using. I’ve got a small set of feral talent builds, a larger set of builds and a large set of character stats. Each data set contains counts of instances and raids, battlegrounds and arenas played so you can filter on the raiders.
NB these data sets have been corrected and updated on 12 July. If you downloaded them before that, apologies for the error, and please download them again: