the talented Mr Druid
July 28, 2010
Now that we’ve got a dataset which can give us feral druids who:
- are consistently geared and spec-ed and
- are serious participants in instance-running and raiding
we can move to the next stage: trying to find ways to partition that set into bears and cats.
This is where the visualization tools in a datamining package really come into their own. We can add a third dimension to any cluster by using colour. And we can quickly iterate through all the data dimensions to see which ones produce the best clusters. In these charts I’m filtering out all feral druids with spellpower gear and all who have run fewer than 75 instances or raids.
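As a rough sketch of that filter, assuming each toon is a simple record: the field names `has_spellpower_gear` and `runs` and the sample toons are made up for illustration and are not the columns of the real dataset.

```python
# Hypothetical records: the field names and values are invented for this sketch.
toons = [
    {"name": "Ursa", "has_spellpower_gear": False, "runs": 210},
    {"name": "Mittens", "has_spellpower_gear": False, "runs": 74},
    {"name": "Moonfire", "has_spellpower_gear": True, "runs": 300},
    {"name": "Claws", "has_spellpower_gear": False, "runs": 75},
]

MIN_RUNS = 75  # threshold from the post: drop anyone with fewer than 75 runs


def is_serious_feral(toon, min_runs=MIN_RUNS):
    """Keep toons without spellpower gear that have at least min_runs runs."""
    return not toon["has_spellpower_gear"] and toon["runs"] >= min_runs


kept = [t["name"] for t in toons if is_serious_feral(t)]
print(kept)
```

A toon with exactly 75 runs stays in, since only those with fewer than 75 are filtered out.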
I’m still plotting health vs mana, to keep the charts consistent across posts, but we’re getting close to the point where we will have to find different stats to graph. We know now that mana is irrelevant and health only a partial indicator of tank-ness. But for the time being, the main cluster that results from that plot is good enough.
Now we want to know which talents can partition the cluster. (And we could ask the same question of glyphs or character stats too.) How about Primal Gore? This is the result – not a lot of partitioning going on there:
Thanks to various comments, it’s clear that there is a set of talents which people expect to partition the cluster effectively. Popular suggestions have included Thick Hide, Natural Reaction and Protector of the Pack for bears, and Shredding Attacks, Predatory Instincts, King of the Jungle, Survival Instincts and Natural Shapeshifter for cats.
Now we can look at each of those in detail:
You can see that some talents are better than others at defining two distinct clusters. They all have a bit of a partitioning effect, but they differ in how much “distance” they put between the two clusters. Predatory Instincts produces clear gold and light blue clusters, but Natural Shapeshifter produces more of a greenish middle ground, which means that many players in both camps have put a point or two into it.
Datamining clustering algorithms work by calculating “distances” between data points along each of the data dimensions, then aggregating those distance measures across all the dimensions. For example, the distance between a toon which has 3 points in Thick Hide and a toon which has zero points in the talent could be measured as “3”, and a sum of the distances across all talents could then produce a measure of how distinct one toon is from another (although the algorithms generally use more sophisticated maths than just that).
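To make that concrete, here is a minimal sketch of the simple sum-of-differences measure described above (often called Manhattan distance). The three toons and their point allocations are invented for illustration, and real clustering packages typically use more refined measures.

```python
def talent_distance(a, b):
    """Sum of absolute point differences across every talent in either toon."""
    talents = set(a) | set(b)
    return sum(abs(a.get(t, 0) - b.get(t, 0)) for t in talents)


# Invented example toons: talent name -> points spent.
bear = {"Thick Hide": 3, "Natural Reaction": 3, "Predatory Instincts": 0}
cat = {"Thick Hide": 0, "Natural Reaction": 0, "Predatory Instincts": 3}
hybrid = {"Thick Hide": 3, "Natural Reaction": 0, "Predatory Instincts": 3}

print(talent_distance(bear, cat))     # 3 + 3 + 3 = 9
print(talent_distance(bear, hybrid))  # 0 + 3 + 3 = 6
print(talent_distance(cat, hybrid))   # 3 + 0 + 0 = 3
```

The bear and the cat sit furthest apart, and the hybrid lands somewhere in between, which is the kind of toon that shows up as middle ground on the charts.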
So we want the talents with the greatest distance between the two clusters. You can have a look at the charts and see which ones you think are the best ones. I’ll put up my numbers on that in the next post.
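One hypothetical way to put a number on how well a talent splits the population is to look at how spread out the points are. The sketch below scales each toon's points to the range 0 to 1 and multiplies the population variance by 4, so a talent that half the toons max out and half ignore scores 1, while a talent everyone dabbles in scores near 0. This score is an assumption made for illustration, not the measure used for the numbers in the next post, and the sample allocations are invented.

```python
from statistics import pvariance


def split_score(points, max_points):
    """Spread of talent points, scaled so a perfect 50/50 split scores 1.0."""
    scaled = [p / max_points for p in points]
    # Variance of values in [0, 1] peaks at 0.25, so multiply by 4.
    return 4 * pvariance(scaled)


# Invented allocations for six toons in two 3-point talents.
clean_split = [0, 0, 0, 3, 3, 3]    # half take it fully, half skip it
middle_ground = [1, 2, 1, 2, 1, 2]  # everyone puts in a point or two

print(round(split_score(clean_split, 3), 3))
print(round(split_score(middle_ground, 3), 3))
```

A talent that every toon maxes out would score 0 here as well, which is the desired behaviour: it tells us nothing about bear versus cat.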
Now if we use the better of those talent dimensions as inputs to our clustering algorithm we get this:
The crucial thing here is that the blue cluster, which contains the toons with bear-ish talents, extends right along the health x-axis. No doubt serious tanks are picking gear, gems and enchants that boost health. But since we are looking for a count of all tanks, all the way from those running 5-toon instances to those in the endgame raids, we should expect a wide spread of health between those just starting out and those nearer the end of the raiding dungeon chain.
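The post doesn't name the clustering algorithm behind that chart, so as a stand-in here is a minimal two-cluster k-means sketch run on talent points alone. The toons are invented, each tuple holding points in (Thick Hide, Natural Reaction, Predatory Instincts), and the starting centres are picked deterministically purely to keep the example repeatable.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points, iterations=10):
    """Tiny k-means with k=2: returns a cluster label per point and the centres."""
    # Deterministic start: the first point and the point farthest from it.
    first = points[0]
    centres = [first, max(points, key=lambda p: sq_dist(p, first))]
    labels = []
    for _ in range(iterations):
        # Assign each point to its nearest centre.
        labels = [0 if sq_dist(p, centres[0]) <= sq_dist(p, centres[1]) else 1
                  for p in points]
        # Move each centre to the mean of its members.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centres[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centres


# Invented toons: (Thick Hide, Natural Reaction, Predatory Instincts).
toons = [(3, 3, 0), (3, 2, 0), (2, 3, 0), (0, 0, 3), (0, 1, 3), (1, 0, 3)]
labels, centres = two_means(toons)
print(labels)
```

Health plays no part in the clustering here, which is why a talent-based cluster is free to stretch across the whole health axis.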
That’s one reason why I’m about to abandon the health vs mana thing and move on to other character stats. More about that in the next post. But we can only make decisions like that because of the insights this data visualization gives us.