New Computer Ranking System

Schweizerkas · Post by **Schweizerkas** » Sun Mar 09, 2008 10:04 pm

Hi,
I've been working for a little over a month on a computer ranking system that ranks individual players using a Markov chain technique. I'll give a brief description of the technique, and the present results. If you're bored by details, and just want to see the results, scroll down to the bottom of the post.

Overview

This quizbowl rating system is a Markov chain based rating, heavily based on the LRMC (Logistic Regression/Markov Chain) model used to rank college basketball teams. It uses head-to-head results between individual players to estimate the probability that Player A is better than Player B, after which these probabilities are fed into a transition matrix. Using standard Markov chain techniques, a vector of steady-state probabilities is then calculated, from which a rating for each player can be derived.

Basic Assumptions

There are several difficulties in trying to rate individual quizbowlers. Since quizbowl is by its nature a team game, we must somehow extract a player's individual performance from that of his team. The major issues are correcting for
strength of opponents and the "shadow effect" of one's teammates.

This rating tries to eliminate (or at least decrease) the effects of strength of schedule and the shadow effect by only relying on the results of head-to-head matchups. For example, assume Team 1 (with Player A, B, C, and D) plays Team 2 (with Player E, F, G, and H). Then, to calculate how good Player A is, we can compare how many points he scores to how many points E, F, G, and H score. Each of these 4 comparisons can be treated as a "match", and for each "match", Player A has a certain "points difference", i.e., Points Scored(Player A) - Points Scored (opponent). My basic assumption in this ranking is that if A has a positive points difference over an opponent, then A is "better" than that opponent. Since we are only looking at head-to-head matchups, A and B are both competing versus the same players for tossups, so this assumption should be approximately correct.

Of course this assumption is not perfect. For example, if A's teammates neg a lot, these negs will prevent A from scoring on that tossup, but will not prevent A's opponents from scoring. However, this effect should tend to average out, since A's opponents are also harmed by their own teammates negging. But if A has particularly neg-prone teammates, A could be underrated by this ranking scheme. A correction-factor for negs may be applied to this ranking in the future, but for now, I expect this correction would be small and should not effect rankings much.

A further problem with this assumption is that A's teammates may "shadow" particular subjects that are A's specialties. For example, if A is a player who is great at history, but his teammates are also great at history, he will be hurt more by this overlap than his opponents (who on average have a wide variety of knowledge). However, I doubt that any computer rating can accurately measure this effect, and even if it could, we would still have to give a (fairly subjective) answer to the question of how much a rating should value specialized knowledge versus general knowledge of the entire canon. In short, I'm not really worried about this issue.

In any case, we should note that both the negging-effect and the subject-specific-shadow-effect would probably tend to cause this rating method to slightly underrate players with good teammates, if anything. However, I would expect this underrating to be very small compared to the case if we were to simply rank players by PPG.

Calculation

In each game, Player A can have as many as four "matches." From the point difference in each match, I calculate a probability that A is better than his opponent (Player B). If you want to see details of how I calculate this probability, see this page:
http://www.stanford.edu/~blindqui/cgi-b ... theory.php

Now, if I have data for N different players, I can calculate the head-to-head results for Player i vs. Player j for every i,j < N. From this, I can calculate a probability that Player i is better than Player j for every combination of (i,j). Using this information, I then fill an NxN transition matrix "t" with transition probabilities t_ij (see the LRMC paper for a detailed explanation of how these are calculated).

What is the interpretation of this transtion matrix t? Basically, let's imagine we have an ensemble of voters (voting for the best quizbowler), who initially have some arbitrary random choices for who they think is the best quizbowler. Let's say one random voter thinks Player i is the best at time=0. Then, at time=1, with probability t_ij, he will change his vote for best player to Player j. We then iterate this procedure over and over. Of course, an individual voter will just keep on moving his vote around over and over, but the entire ensemble of voters will eventually reach a steady state distribution, regardless of their initial distribution of votes. (This is a so-called Markov chain, which is explained in many textbooks, or lots of places on the Internet). If the steady state distribution is a vector denoted p_i, where Sum(p_i)=1, then p_i gives
a measure of how good Player i is. I then apply a correction to p_i to correct for how many different opponents Player i has played, and then I renormalize the p_i so that they add up to 100 (just for aesthetic reasons). The new p_i is defined as the rating for Player i.

Sorry if this all a little hard to understand. If you want more mathematical details (and a pretty plot), see here:
http://www.stanford.edu/~blindqui/cgi-b ... theory.php
That link also contains a link to the paper on the LRMC, which explains the basic method in much more detail than I have space for.

Results (What you're all really interested in)

I've written a Python script that reads in SQBS files and does all the fancy calculations. So far, I've read in about 17 tournaments (counting mirrors as multiple tournaments), and you can see the results on this page I've set up:
http://www.stanford.edu/~blindqui/cgi-b ... s_home.php
I've only done calculations for players who've played 16 games or more. Note that the Python script currently does a poor job of parsing player's names (for example, it doesn't realize that "Jonathan from Maryland" and "Jonathan Magin from Maryland" are the same player). So, some players who have actually played in at least 16 games may not currently be in the database (for example, Magin). This will be corrected in the near future. I have to say that I think the results of this ranking system are quite reasonable. Even with a limited amount of data (not limited by the number of tournaments, but rather by the stupidity of my parser), the rankings are generally not too far off from what I think the circuit's subjective ranking would be. There are a few weird rankings, yes, but they generally seem to be for players who have only played a very small number of matches versus other players in the database, so this is almost certainly just random fluctuation due to limited statistics.

Well, let me know what you think of this ranking system. If people like it, I'll work on it some more. Actually, I'll probably work on it some more even if people hate it. If people want to look over my code, I can make the Python script available for download.

Sima Guang Hater · Post by **Sima Guang Hater** » Sun Mar 09, 2008 10:31 pm

EATEN BY THE BEAST FROSH R.I.P

May as well list out "weird" issues that you suggest:
-already pointed out Magin isn't there (which is indeed a travesty)
-Andrew and Rob of Minnesota?
-The Dart A people should be on here somewhere, especially if the Dart B people are.
-Optimus in the MIT 2007 Fall is me.

In general, I really like this ranking system, but incorporating something to counter the shadowing effects of teammates is important (I agree that the neg shadow effect cancels itself out). One way would be to have some corrective index for every game based on how many tossups your teammates either got or negged on - the larger this value, the larger the index.

aestheteboy · Post by **aestheteboy** » Sun Mar 09, 2008 10:32 pm

Jonathan Magin not on the list? The horror! The horror!

magin · Post by **magin** » Sun Mar 09, 2008 10:44 pm

aestheteboy wrote:Jonathan Magin not on the list? The horror! The horror!

If you read Brian's post, he explains that Python is currently unable to realize that "Jonathan" and "Jonathan Magin" are the same person, thus leaving me off.

Also, this looks really, really good. Now all we need is that program that records when people buzz and automatically keeps score; we should offer Jerry's library to the first person to come up with such a useful thing.

Post by **theMoMA** » Mon Mar 10, 2008 1:27 am

Seems like you could add a lot more datapoints if you used teammate-on-teammate matchups too.

Also, it really seems like there should be some kind of correction for teammates. I don't know how this would work, but it seems pretty obvious that this system is going to underrate players who play with good players.

Post by **Auks Ran Ova** » Mon Mar 10, 2008 5:00 am

555 95472 wrote:-Andrew and Rob of Minnesota?

Gautam and I are actually the missing ones.

pray for elves · Post by **pray for elves** » Mon Mar 10, 2008 10:55 am

Strangely enough, it's claimed that Penn Bowl 2008 prelim stats are included, but it only has me for ACF Fall and ACF Regionals -- I guess that's because I'm "Evan Nagler (UG)" in the Penn Bowl stats rather than just Evan.

Basically, I think you'll have to edit names by hand to match them up.

Schweizerkas · Post by **Schweizerkas** » Tue Mar 11, 2008 3:26 am

Due to great demand, I added Jonathan to the rankings. You can see the rankings are still fairly volatile due to the small statistics: the force of Jonathan's massive gravitational field was strong enough to propel Seth Teitler ahead of Mike Sorice into second place (you can see that Seth outscored Jonathan in their head-to-head matchup at CC, while Jonathan outscored Mike in their only head-to-head match). I also added Eric's Optimus alias to the database.

Yeah, Penn Bowl's stats aren't being read properly due to the (UG) markings and similar stuff. I'll try to get this, and the other issues with player's names not being recognized, sorted out in the next few weeks, although I'm not making any promises.

theMoMA wrote:Seems like you could add a lot more datapoints if you used teammate-on-teammate matchups too.

I was originally planning on doing this, but I decided against it. The reason I don't like the idea is that your ranking will then be hugely weighted according to your performance relative to your teammates. I could see this being a problem if you happen to only play with a certain teammate at some tournament that happens to favor that teammate's strengths (say, an NAQT tournament). Then, due to having 12 or so matchups versus that one teammate in that tournament, you're virtually guaranteed to be ranked below him, even if you might be a better player in most other tournament styles/difficulties. I think by not using teammate-on-teammate matchups, you'll get a fairer blend of statistics. I can see the desire for more datapoints, but actually, once I get my parser working properly, I think we'll see that there are plenty of datapoints available.

Sima Guang Hater · Post by **Sima Guang Hater** » Tue Mar 11, 2008 11:10 am

Schweizerkas wrote:The reason I don't like the idea is that your ranking will then be hugely weighted according to your performance relative to your teammates. I could see this being a problem if you happen to only play with a certain teammate at some tournament that happens to favor that teammate's strengths (say, an NAQT tournament).

Shouldn't be using NAQT tournaments in these rankings. In general sticking to tournaments of normal style (ie not TTGT11) and at least baseline competency in writing (ACF events, TIT, CC, what have you) would be best, since performance relative to teammates is fairly constant across them. At least this is true in my experience.

Schweizerkas wrote:Then, due to having 12 or so matchups versus that one teammate in that tournament, you're virtually guaranteed to be ranked below him, even if you might be a better player in most other tournament styles/difficulties. I think by not using teammate-on-teammate matchups, you'll get a fairer blend of statistics. I can see the desire for more datapoints, but actually, once I get my parser working properly, I think we'll see that there are plenty of datapoints available.

Once again, this effect averages out; if you want you could just introduce the correction for teammates that pair up for more than one tournament, this would pretty easily solve the problem you suggest. You could also count teammate-on-teammate match-ups less than match-ups against opponents.

If you have the time, it would be interesting to see the results of both models and compare; there's always the chance that correcting for shadowing just makes everything worse.

cdcarter · Post by **cdcarter** » Sun Apr 06, 2008 12:42 pm

Will the script ever be released?

The Quizbowl Resource Center

New Computer Ranking System

New Computer Ranking System

Re: New Computer Ranking System

Re: New Computer Ranking System

Re: New Computer Ranking System

Re: New Computer Ranking System

Re: New Computer Ranking System

Re: New Computer Ranking System

Re: New Computer Ranking System

Re: New Computer Ranking System

Re: New Computer Ranking System