druid cats and bears again
March 2, 2010
This is a guest post by Darush.
A while back Zardoz posted an estimate of feral druid builds (bears vs. cats). The original analysis was based on human expertise: bears usually choose certain talents while cats choose others. However, one can take another approach. There are inherent differences in the builds for each spec, and computational methods that know nothing about World of Warcraft might be able to identify these differences by examining the raw data. Armed with Matlab and some free time, I tried to explore the great kitty question.
A quick note about myself: I have been playing World of Warcraft since Wrath of the Lich King was released. I currently have two characters on Cairne, Darush (a hunter) and Azabroth (a priest). Outside of the game, I’m a PhD student in computational biology, and my thesis deals with identifying subpopulations in cell data. Interestingly, one can easily replace “cell data” with the druid data Zardoz mined from the armory.
For those who would just like the executive summary: Out of the data Zardoz sent me, 27% of the toons could be bears and 65% could be cats. Be aware that there are some ifs-and-buts associated with these numbers and they should be treated with due caution.
Now for the details:
I started with a very large table that included the agility, stamina, dodge rating, and talent points allocation for 28,970 toons who had a Feral build (either active or not). I initially wanted to assess how well agility and stamina predict whether a toon is a cat or a bear. Plotting stamina vs. agility, I see the following:
First of all, notice that annoying flat line at the bottom. These are toons with low agility, which means they’re not wearing feral gear in their feral spec. Since I don’t want to confuse my algorithm, I removed these. I ended up with 20,365 toons and the following plot:
Two separate populations emerge. There are the high stamina/low agility toons, which I’m guessing are mostly bears, and the high agility/low stamina toons, which are probably mostly cats. Unfortunately, there is a huge overlap, especially in the lower attributes range. It appears that stamina and agility will not be enough to decide who is what.
Now, for some magic! I used an algorithm called PCA (principal component analysis). In a nutshell, PCA attempts to score the variability in the data. The algorithm takes a long list of numbers (in this case, I used the talent builds for each toon) and outputs a series of scores for each toon. Each number in the series is a component; the first number for each toon is the first component, the second is the second component, etc. Intuitively, PCA compresses the data by discarding less variable information.
(Please notice that this is a very hand wavy explanation. Apologies to all mathematicians, physicists and computer scientists in the crowd.)
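For readers who would like to see the idea in code, here is a minimal sketch of PCA in plain Python. It is not the Matlab code used for this post: the six talent rows are made up for illustration, and the helper names (center, covariance, dominant_eigenpair, pca) are hypothetical. It finds the directions of largest variance by power iteration on the covariance matrix, which is one of several ways to compute PCA.

```python
import math
import random


def center(rows):
    """Subtract each column's mean so scores measure distance from the average build."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of rows that are already centered."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dominant_eigenpair(m, iters=500):
    """Power iteration: the direction a symmetric matrix stretches the most."""
    rng = random.Random(0)
    v = [rng.random() + 0.1 for _ in m]
    length = math.sqrt(sum(x * x for x in v))
    v = [x / length for x in v]
    for _ in range(iters):
        w = mat_vec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(m, v)))
    return v, eigenvalue


def pca(rows, n_components=2):
    """Return per-row scores and the component directions."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components = []
    for _ in range(n_components):
        v, lam = dominant_eigenpair(cov)
        components.append(v)
        # Deflate: remove the variance already explained by this component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    scores = [[sum(x * c for x, c in zip(row, comp)) for comp in components]
              for row in centered]
    return scores, components


# Made-up talent point rows (four imaginary talents per toon).
# The first three lean toward talents 1-2, the last three toward talents 3-4.
builds = [
    [5, 3, 0, 0], [5, 2, 0, 1], [4, 3, 1, 0],
    [0, 0, 5, 3], [1, 0, 5, 2], [0, 1, 4, 3],
]
scores, components = pca(builds)
for build, (pc1, pc2) in zip(builds, scores):
    print(build, round(pc1, 2), round(pc2, 2))
```

In this toy data the first three rows get first-component scores of one sign and the last three get the other sign, which is the kind of split that shows up as two clouds in a plot of the first two components.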
After running PCA using just the talent specs, I looked at the first and second components, and ended up with this plot:
We can clearly see two populations. I guessed that one was bears, the other cats. Usually at this step I would run another fancy algorithm to identify these populations automatically and score how different they are; such algorithms are called clustering algorithms, and each group is a cluster. For this analysis I chose the lazier path and drew two arbitrary lines by eye:
The top green cluster has 5,571 toons. The bottom one has 13,226. There are 1,568 blue toons, which are undecided (7.7% of the data).
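The two-line split can be sketched in a few lines of Python. This is only an illustration: the slopes and intercepts below are hypothetical (the real lines were drawn by eye on the plot), and the function names are made up. The counts at the end are the ones quoted above.

```python
def height_above_line(point, slope, intercept):
    """Positive if the (pc1, pc2) point lies above the line, negative if below."""
    pc1, pc2 = point
    return pc2 - (slope * pc1 + intercept)


def assign(point, upper_line, lower_line):
    """Top cluster above the upper line, bottom cluster below the lower line."""
    if height_above_line(point, *upper_line) > 0:
        return "top"
    if height_above_line(point, *lower_line) < 0:
        return "bottom"
    return "undecided"


# Hypothetical (slope, intercept) pairs, NOT the lines used in the post.
UPPER_LINE = (0.0, 1.0)
LOWER_LINE = (0.0, -1.0)

example_points = [(0.3, 2.5), (-1.0, -3.0), (0.0, 0.2)]
labels = [assign(p, UPPER_LINE, LOWER_LINE) for p in example_points]
print(labels)

# Shares of each group, using the counts reported in the post.
counts = {"top": 5571, "bottom": 13226, "undecided": 1568}
total = sum(counts.values())
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}
print(total, shares)
```

The three counts add up to the 20,365 toons that survived the agility filter, so every toon is either in one of the two clusters or undecided.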
It’s time for a guessing game. I am guessing one of these groups is bears and the other is cats. It is quite possible that I am completely mistaken, but since this is one of the major differences between feral druids (and the initial purpose of my analysis), I decided to give it a shot. If I am correct, then which one of these is the bear cluster, and which one is the cat cluster? I decided to do some more speculation. Let’s examine the stamina vs. agility plot again:
It appears that toons around black line 1 are mostly bears and toons around black line 2 are mostly cats. Again, as mentioned previously, there is a huge overlap. Fortunately, the overlap ends at some point. We can guess that points to the right of red line 1 are “definitely bears”, since they have very high stamina, while points above red line 2 are “definitely cats”, since they have very high agility.
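As a rough sketch, the two red lines amount to a pair of cutoffs. The Python below assumes the lines are a plain stamina cutoff and a plain agility cutoff, and the numbers are hypothetical placeholders rather than the values used for the plot.

```python
# Hypothetical cutoffs; the post does not state the actual positions of the red lines.
STAMINA_CUTOFF = 3000
AGILITY_CUTOFF = 1500


def gear_label(stamina, agility):
    """Label a toon only when its gear points clearly one way."""
    high_stamina = stamina > STAMINA_CUTOFF
    high_agility = agility > AGILITY_CUTOFF
    if high_stamina and not high_agility:
        return "definitely bear"
    if high_agility and not high_stamina:
        return "definitely cat"
    return "unclear"  # inside the overlap, or past both cutoffs


print(gear_label(3400, 900))   # stamina past its cutoff only
print(gear_label(2100, 1700))  # agility past its cutoff only
print(gear_label(2200, 1000))  # neither
```

Most toons will come back "unclear", which is fine: the point is only to collect a small set of toons whose role is hard to argue with.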
What shall we do next? On one hand, from the PCA (talent spec only) we have the top green cluster (which I called c1) and the bottom green cluster (c2). On the other hand, from the stamina vs. agility plot, we have the “definitely bear” and the “definitely cat” toons. We can now ask the following four questions:
- Does c1 have many “definitely bears”?
- Does c1 have many “definitely cats”?
- Does c2 have many “definitely bears”?
- Does c2 have many “definitely cats”?
We answer these using a statistical test called a hypergeometric test, also called a one-tailed Fisher’s exact test. The answers are yes, no, no, yes. Therefore, we can say that c1 is highly enriched for bears while c2 is highly enriched for cats.
(For the statistically inclined, all four p-values are lower than 10^-20.)
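For the curious, the enrichment question can be sketched in Python with nothing but the standard library. The function name is made up, and the counts in the example are invented for illustration; the post does not report how many “definitely bear” or “definitely cat” toons landed in each cluster.

```python
from math import comb


def hypergeom_enrichment_p(population, marked, drawn, marked_drawn):
    """One-tailed hypergeometric p-value.

    Probability of seeing at least `marked_drawn` marked toons when `drawn`
    toons are picked at random, without replacement, from `population` toons
    of which `marked` are marked.
    """
    upper = min(marked, drawn)
    ways_total = comb(population, drawn)
    ways_tail = sum(
        comb(marked, k) * comb(population - marked, drawn - k)
        for k in range(marked_drawn, upper + 1)
    )
    return ways_tail / ways_total


# Invented numbers: 1,000 toons, a cluster of 300, 100 "definitely bears"
# (80 of them in the cluster) and 100 "definitely cats" (5 in the cluster).
p_bears = hypergeom_enrichment_p(1000, 100, 300, 80)
p_cats = hypergeom_enrichment_p(1000, 100, 300, 5)
print(p_bears, p_cats)
```

A tiny p-value means the cluster holds far more of that group than a random pick of the same size would, which is what "enriched" means here. A p-value near 1 means there is no sign of enrichment.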
It is time for some truth in advertising. If I presented this analysis to my thesis adviser, she would probably hang me, rez me, hang me again, and then /gkick me out of my PhD program. There is much unsubstantiated guesswork involved, a mix of rules of thumb and hunches. The good news is that Zardoz manually examined some of the toons in c1 and c2, and did not identify any contradictions with my analysis.
And, of course, both of us will be glad to hear any remarks and comments.