New Computer Ranking System

 Lulu
 Posts: 83
 Joined: Sat May 12, 2007 1:01 am
 Location: Stanford, CA
New Computer Ranking System
Hi,
I've been working for a little over a month on a computer ranking system that ranks individual players using a Markov chain technique. I'll give a brief description of the technique, and the present results. If you're bored by details, and just want to see the results, scroll down to the bottom of the post.
Overview
This quizbowl rating system is a Markov-chain-based rating, heavily based on the LRMC (Logistic Regression/Markov Chain) model used to rank college basketball teams. It uses head-to-head results between individual players to estimate the probability that Player A is better than Player B, after which these probabilities are fed into a transition matrix. Using standard Markov chain techniques, a vector of steady-state probabilities is then calculated, from which a rating for each player can be derived.
Basic Assumptions
There are several difficulties in trying to rate individual quizbowlers. Since quizbowl is by its nature a team game, we must somehow extract a player's individual performance from that of his team. The major issues are correcting for strength of opponents and the "shadow effect" of one's teammates.
This rating tries to eliminate (or at least decrease) the effects of strength of schedule and the shadow effect by relying only on the results of head-to-head matchups. For example, assume Team 1 (with Players A, B, C, and D) plays Team 2 (with Players E, F, G, and H). Then, to calculate how good Player A is, we can compare how many points he scores to how many points E, F, G, and H score. Each of these 4 comparisons can be treated as a "match", and for each "match", Player A has a certain "points difference", i.e., Points Scored(Player A) - Points Scored(Opponent). My basic assumption in this ranking is that if A has a positive points difference over an opponent, then A is "better" than that opponent. Since we are only looking at head-to-head matchups, A and his opponent are both competing versus the same players for tossups, so this assumption should be approximately correct.
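To make the bookkeeping concrete, here is a minimal Python sketch of how one game's box score could be turned into Player A's four "matches". The rosters and point totals below are invented for illustration; the actual script presumably builds these from parsed SQBS data instead.

```python
# Hedged sketch: turning one game's box score into per-opponent "matches".
# Rosters and point totals are invented for illustration.
team1 = {"A": 55, "B": 20, "C": 10, "D": 0}   # Player A's team
team2 = {"E": 40, "F": 15, "G": 10, "H": 5}   # the opposing team

def matches_for(player, own_team, opposing_team):
    """Return (opponent, points difference) pairs: one "match" per opponent."""
    pts = own_team[player]
    return [(opp, pts - opp_pts) for opp, opp_pts in opposing_team.items()]

print(matches_for("A", team1, team2))
# [('E', 15), ('F', 40), ('G', 45), ('H', 50)]
```

Each positive difference is then taken as evidence that A is "better" than that opponent.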
Of course this assumption is not perfect. For example, if A's teammates neg a lot, these negs will prevent A from scoring on that tossup, but will not prevent A's opponents from scoring. However, this effect should tend to average out, since A's opponents are also harmed by their own teammates negging. But if A has particularly neg-prone teammates, A could be underrated by this ranking scheme. A correction factor for negs may be applied to this ranking in the future, but for now, I expect this correction would be small and should not affect rankings much.
A further problem with this assumption is that A's teammates may "shadow" particular subjects that are A's specialties. For example, if A is a player who is great at history, but his teammates are also great at history, he will be hurt more by this overlap than his opponents (who on average have a wide variety of knowledge). However, I doubt that any computer rating can accurately measure this effect, and even if it could, we would still have to give a (fairly subjective) answer to the question of how much a rating should value specialized knowledge versus general knowledge of the entire canon. In short, I'm not really worried about this issue.
In any case, we should note that both the negging effect and the subject-specific shadow effect would probably tend to cause this rating method to slightly underrate players with good teammates, if anything. However, I would expect this underrating to be very small compared to the case if we were to simply rank players by PPG.
Calculation
In each game, Player A can have as many as four "matches." From the point difference in each match, I calculate a probability that A is better than his opponent (Player B). If you want to see details of how I calculate this probability, see this page:
http://www.stanford.edu/~blindqui/cgib ... theory.php
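For a rough idea of what such a mapping might look like, here is a hedged sketch using a logistic curve, in the spirit of the LRMC model. The scale parameter is purely illustrative; the actual function and constants are the ones described at the link above and in the LRMC paper.

```python
import math

def win_probability(point_diff, scale=25.0):
    """Illustrative logistic mapping from a points difference to the
    probability that the higher-scoring player is the better one.
    The scale value is a made-up placeholder, not the LRMC constant."""
    return 1.0 / (1.0 + math.exp(-point_diff / scale))

# A points difference of 0 gives probability 0.5; large margins approach 1.
```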
Now, if I have data for N different players, I can calculate the head-to-head results for Player i vs. Player j for every i,j < N. From this, I can calculate a probability that Player i is better than Player j for every combination of (i,j). Using this information, I then fill an NxN transition matrix "t" with transition probabilities t_ij (see the LRMC paper for a detailed explanation of how these are calculated).
What is the interpretation of this transition matrix t? Basically, let's imagine we have an ensemble of voters (voting for the best quizbowler), who initially have some arbitrary random choices for who they think is the best quizbowler. Let's say one random voter thinks Player i is the best at time=0. Then, at time=1, with probability t_ij, he will change his vote for best player to Player j. We then iterate this procedure over and over. Of course, an individual voter will just keep on moving his vote around, but the entire ensemble of voters will eventually reach a steady-state distribution, regardless of their initial distribution of votes. (This is a so-called Markov chain, which is explained in many textbooks and lots of places on the Internet.) If the steady-state distribution is a vector denoted p_i, where Sum(p_i)=1, then p_i gives a measure of how good Player i is. I then apply a correction to p_i for how many different opponents Player i has played, and then I renormalize the p_i so that they add up to 100 (just for aesthetic reasons). The new p_i is defined as the rating for Player i.
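The voting process above can be sketched numerically: fill a small transition matrix with made-up probabilities, iterate the ensemble's vote distribution until it stops changing, and renormalize. The 3x3 matrix below is a toy example, not real data, and the opponent-count correction is omitted.

```python
import numpy as np

# Toy 3-player transition matrix t (each row sums to 1); entries are made up.
t = np.array([
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
])

p = np.full(3, 1.0 / 3.0)  # arbitrary initial distribution of votes
for _ in range(1000):      # one "voting round" per iteration
    p = p @ t              # the ensemble redistributes its votes via t

# p is now (approximately) the steady-state vector: p = p @ t, Sum(p_i) = 1.
rating = 100.0 * p / p.sum()  # renormalize so the ratings add up to 100
```

Because the iteration converges to the same steady state from any starting distribution, the initial guess for p doesn't matter.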
Sorry if this is all a little hard to understand. If you want more mathematical details (and a pretty plot), see here:
http://www.stanford.edu/~blindqui/cgib ... theory.php
That link also contains a link to the paper on the LRMC, which explains the basic method in much more detail than I have space for.
Results (What you're all really interested in)
I've written a Python script that reads in SQBS files and does all the fancy calculations. So far, I've read in about 17 tournaments (counting mirrors as multiple tournaments), and you can see the results on this page I've set up:
http://www.stanford.edu/~blindqui/cgib ... s_home.php
I've only done calculations for players who've played 16 games or more. Note that the Python script currently does a poor job of parsing players' names (for example, it doesn't realize that "Jonathan from Maryland" and "Jonathan Magin from Maryland" are the same player). So, some players who have actually played at least 16 games may not currently be in the database (for example, Magin). This will be corrected in the near future. I have to say that I think the results of this ranking system are quite reasonable. Even with a limited amount of data (limited not by the number of tournaments, but rather by the stupidity of my parser), the rankings are generally not too far off from what I think the circuit's subjective ranking would be. There are a few weird rankings, yes, but they generally seem to be for players who have played only a very small number of matches versus other players in the database, so this is almost certainly just random fluctuation due to limited statistics.
Well, let me know what you think of this ranking system. If people like it, I'll work on it some more. Actually, I'll probably work on it some more even if people hate it. If people want to look over my code, I can make the Python script available for download.
Brian
Stanford University
 Sima Guang Hater
 Auron
 Posts: 1879
 Joined: Mon Feb 05, 2007 1:43 pm
 Location: Nashville, TN
Re: New Computer Ranking System
EATEN BY THE BEAST FROSH R.I.P
May as well list out "weird" issues that you suggest:
- Already pointed out: Magin isn't there (which is indeed a travesty)
- Andrew and Rob of Minnesota?
- The Dart A people should be on here somewhere, especially if the Dart B people are.
- Optimus in the MIT 2007 Fall is me.
In general, I really like this ranking system, but incorporating something to counter the shadowing effects of teammates is important (I agree that the neg shadow effect cancels itself out). One way would be to have some corrective index for every game based on how many tossups your teammates either got or negged on; the larger this value, the larger the index.
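As a hedged sketch of what such a corrective index might look like: the function below treats the fraction of a game's tossups consumed by teammates (gets and negs alike) as the index. The equal weighting of gets and negs, and the default tossup count, are invented for illustration, not a settled formula.

```python
def shadow_index(teammate_gets, teammate_negs, tossups_in_game=20):
    """Hypothetical corrective index: the fraction of a game's tossups
    taken out of a player's reach by teammates, counting both gets and
    negs. Weighting and the 20-tossup default are illustrative only."""
    return (teammate_gets + teammate_negs) / tossups_in_game
```

A larger index would then translate into a larger upward correction for the shadowed player's per-game results.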
Last edited by Sima Guang Hater on Mon Mar 10, 2008 12:08 am, edited 2 times in total.
Eric Mukherjee, MD PhD
Brown University, 2009
Perelman School of Medicine at the University of Pennsylvania, 2018
Medicine Intern, Yale-Waterbury, 2018-9
Dermatology Resident, Vanderbilt University Medical Center, 2019
Writer, NAQT, NHBB, IQBT, ACF, PACE
 aestheteboy
 Tidus
 Posts: 570
 Joined: Sat May 13, 2006 5:07 pm
Re: New Computer Ranking System
Jonathan Magin not on the list? The horror! The horror!
Daichi - Walter Johnson; Vanderbilt; U of Chicago.
Daichi's Law of High School Quizbowl: the frequency of posting in the Quizbowl Resource Center is proportional to the likelihood of being overrated.
Re: New Computer Ranking System
aestheteboy wrote: Jonathan Magin not on the list? The horror! The horror!
If you read Brian's post, he explains that the Python script is currently unable to realize that "Jonathan" and "Jonathan Magin" are the same person, thus leaving me off.
Also, this looks really, really good. Now all we need is that program that records when people buzz and automatically keeps score; we should offer Jerry's library to the first person to come up with such a useful thing.
Jonathan Magin
Montgomery Blair HS '04, University of Maryland '08
Editor: ACF
"noted difficulty controller"
Re: New Computer Ranking System
Seems like you could add a lot more datapoints if you used teammate-on-teammate matchups too.
Also, it really seems like there should be some kind of correction for teammates. I don't know how this would work, but it seems pretty obvious that this system is going to underrate players who play with good players.
Andrew Hart
Minnesota alum
 Auks Ran Ova
 Forums Staff: Chief Administrator
 Posts: 4121
 Joined: Sun Apr 30, 2006 10:28 pm
 Location: Minneapolis
 Contact:
Re: New Computer Ranking System
555 95472 wrote: Andrew and Rob of Minnesota?
Gautam and I are actually the missing ones.
Rob Carson
University of Minnesota '11, MCTC '??
Member, ACF
Member, PACE
Writer and Editor, NAQT
 pray for elves
 Auron
 Posts: 1048
 Joined: Thu Aug 24, 2006 5:58 pm
 Location: 20001
Re: New Computer Ranking System
Strangely enough, it's claimed that Penn Bowl 2008 prelim stats are included, but it only has me for ACF Fall and ACF Regionals; I guess that's because I'm "Evan Nagler (UG)" in the Penn Bowl stats rather than just Evan.
Basically, I think you'll have to edit names by hand to match them up.
Evan
Georgetown Law Alum, Brandeis Alum, Oak Ridge High Alum
Ex-PACE, Ex-ACF

 Lulu
 Posts: 83
 Joined: Sat May 12, 2007 1:01 am
 Location: Stanford, CA
Re: New Computer Ranking System
Due to great demand, I added Jonathan to the rankings. You can see the rankings are still fairly volatile due to the small statistics: the force of Jonathan's massive gravitational field was strong enough to propel Seth Teitler ahead of Mike Sorice into second place (you can see that Seth outscored Jonathan in their head-to-head matchup at CC, while Jonathan outscored Mike in their only head-to-head match). I also added Eric's Optimus alias to the database.
Yeah, Penn Bowl's stats aren't being read properly due to the (UG) markings and similar stuff. I'll try to get this, and the other issues with players' names not being recognized, sorted out in the next few weeks, although I'm not making any promises.
theMoMA wrote: Seems like you could add a lot more datapoints if you used teammate-on-teammate matchups too.
I was originally planning on doing this, but I decided against it. The reason I don't like the idea is that your ranking will then be hugely weighted according to your performance relative to your teammates. I could see this being a problem if you happen to only play with a certain teammate at some tournament that happens to favor that teammate's strengths (say, an NAQT tournament). Then, due to having 12 or so matchups versus that one teammate in that tournament, you're virtually guaranteed to be ranked below him, even if you might be a better player in most other tournament styles/difficulties. I think by not using teammate-on-teammate matchups, you'll get a fairer blend of statistics. I can see the desire for more datapoints, but actually, once I get my parser working properly, I think we'll see that there are plenty of datapoints available.
Brian
Stanford University
 Sima Guang Hater
 Auron
 Posts: 1879
 Joined: Mon Feb 05, 2007 1:43 pm
 Location: Nashville, TN
Re: New Computer Ranking System
Schweizerkas wrote: The reason I don't like the idea is that your ranking will then be hugely weighted according to your performance relative to your teammates. I could see this being a problem if you happen to only play with a certain teammate at some tournament that happens to favor that teammate's strengths (say, an NAQT tournament).
You shouldn't be using NAQT tournaments in these rankings. In general, sticking to tournaments of normal style (i.e., not TTGT11) and at least baseline competency in writing (ACF events, TIT, CC, what have you) would be best, since performance relative to teammates is fairly constant across them. At least this is true in my experience.
Schweizerkas wrote: Then, due to having 12 or so matchups versus that one teammate in that tournament, you're virtually guaranteed to be ranked below him, even if you might be a better player in most other tournament styles/difficulties. I think by not using teammate-on-teammate matchups, you'll get a fairer blend of statistics. I can see the desire for more datapoints, but actually, once I get my parser working properly, I think we'll see that there are plenty of datapoints available.
Once again, this effect averages out; if you want, you could just introduce the correction for teammates that pair up for more than one tournament, which would pretty easily solve the problem you suggest. You could also count teammate-on-teammate matchups less than matchups against opponents.
If you have the time, it would be interesting to see the results of both models and compare; there's always the chance that correcting for shadowing just makes everything worse.
Eric Mukherjee, MD PhD
Brown University, 2009
Perelman School of Medicine at the University of Pennsylvania, 2018
Medicine Intern, Yale-Waterbury, 2018-9
Dermatology Resident, Vanderbilt University Medical Center, 2019
Writer, NAQT, NHBB, IQBT, ACF, PACE
Re: New Computer Ranking System
Will the script ever be released?
Christian Carter
Minneapolis South High School '09 - Emerson College '13
PACE Member (retired)