A New, Complicated(!) Stat(?)

cvdwightw
Auron
Posts: 3291
Joined: Tue May 13, 2003 12:46 am
Location: Southern CA

A New, Complicated(!) Stat(?)

Post by cvdwightw »

This is the only fruitful part of a thought process that started with the idea that tossup conversion data is just a "masked" signal of a team's true ability, so all we'd have to do is figure out how much masking (from your negs and the other team's buzzes) is present, and we can transform back to a "threshold" of knowledge.

We compute two data sets as a function of tossup conversion rate. The first data set is whether the tossup was answered in the room with a certain team, and the second is whether the tossup was answered by that team. We can then run these through a binary logistic model and get two equations of the form P = (e^(a+bx))/(1+e^(a+bx)), where P is probability, a and b are coefficients, and x is the tossup conversion rate. By Bayes' theorem, P(T|R) = P(R|T)*P(T)/P(R), where R is the event "answered in room" and T is the event "answered by team." It should be obvious that P(R|T) = 1, so P(T|R) = P(T)/P(R).
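For anyone who wants to try the fitting step without a stats package, here is a minimal sketch in plain Python of a two-parameter binary logistic fit by Newton-Raphson. The function names and the twenty toy observations are made up purely for illustration (they are not real tournament data); x is the tossup conversion rate and y is 1 if the tossup was answered by the team, 0 otherwise.

```python
import math


def logistic(z):
    """Logistic function, algebraically equal to e^z / (1 + e^z)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, iterations=25):
    """Maximum-likelihood fit of P = logistic(a + b*x) by Newton-Raphson.

    Assumes the outcomes are not perfectly separable by x; otherwise the
    estimates diverge and the 2x2 system below becomes ill-conditioned.
    """
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g_a = g_b = 0.0            # gradient of the log-likelihood
        h_aa = h_ab = h_bb = 0.0   # entries of the (negated) Hessian
        for x, y in zip(xs, ys):
            p = logistic(a + b * x)
            w = p * (1.0 - p)
            g_a += y - p
            g_b += (y - p) * x
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 linear system H * delta = g.
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


# Toy data (hypothetical): five tossups at each of four conversion rates.
xs = [0.2] * 5 + [0.4] * 5 + [0.6] * 5 + [0.8] * 5
ys = [1, 0, 0, 0, 0,
      1, 1, 0, 0, 0,
      1, 1, 1, 0, 0,
      1, 1, 1, 1, 0]

a_t, b_t = fit_logistic(xs, ys)
print("a_t =", a_t, "b_t =", b_t)
```

Running the same fit on the "answered in room" outcomes would give the second pair of coefficients, a_r and b_r.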

For any tossup conversion rate x, then, P(T|R) = (e^(a_t+b_t*x)+e^(a_t+a_r+(b_t+b_r)*x))/(e^(a_r+b_r*x)+e^(a_r+a_t+(b_r+b_t)*x)). The amount of overall "masking" is then the probability that the other team got the question, given that someone in the room got it, which is just 1-P(T|R) = (e^(a_r+b_r*x)-e^(a_t+b_t*x))/(e^(a_r+b_r*x)+e^(a_r+a_t+(b_r+b_t)*x)). It would then make sense that your "unmasked" signal would be the probability that someone in the room got the question ("observed signal") minus the probability that given that someone got the question, it was the other team ("known masker signal"); or, P(R) - (1-P(T|R)), which is ridiculously complicated if you try to combine fractions so I'll just leave it as (e^(a_r+b_r*x))/(1+e^(a_r+b_r*x))-(e^(a_r+b_r*x)-e^(a_t+b_t*x))/(e^(a_r+b_r*x)+e^(a_r+a_t+(b_r+b_t)*x)). These "unmasked" values can then be computed for any tossup conversion rate x - that is, given an opposing team of empty chairs, it finds the probability that your team will answer a tossup that is converted at a certain rate.
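A short numerical check of the algebra above, as a sketch only. The coefficient values are hypothetical placeholders (no real fit exists yet), chosen so that P(T) stays below P(R); the function names are mine.

```python
import math

# Hypothetical coefficients, for illustration only.
A_T, B_T = -3.0, 5.0   # "answered by team" curve
A_R, B_R = -1.0, 5.0   # "answered in room" curve


def prob(a, b, x):
    """P = e^(a+bx) / (1 + e^(a+bx))."""
    e = math.exp(a + b * x)
    return e / (1.0 + e)


def p_team_given_room(x):
    """P(T|R) = P(T) / P(R), using P(R|T) = 1."""
    return prob(A_T, B_T, x) / prob(A_R, B_R, x)


def masking(x):
    """1 - P(T|R): the other team got it, given someone in the room did."""
    return 1.0 - p_team_given_room(x)


def unmasked(x):
    """P(R) - (1 - P(T|R)), the proposed 'unmasked' signal."""
    return prob(A_R, B_R, x) - masking(x)


def p_team_given_room_closed_form(x):
    """The combined-fraction expression for P(T|R) from the post."""
    e_t = math.exp(A_T + B_T * x)
    e_r = math.exp(A_R + B_R * x)
    return (e_t + e_t * e_r) / (e_r + e_r * e_t)


def masking_closed_form(x):
    """The combined-fraction expression for 1 - P(T|R) from the post."""
    e_t = math.exp(A_T + B_T * x)
    e_r = math.exp(A_R + B_R * x)
    return (e_r - e_t) / (e_r + e_r * e_t)


for x in (0.1, 0.5, 0.9):
    print(x, p_team_given_room(x), masking(x), unmasked(x))
```

The two closed-form functions agree with the direct ratio to floating-point precision, so the combined fractions are at least algebraically consistent; whether the "unmasked" quantity behaves sensibly on real data is a separate question.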

It obviously doesn't take negs into account (the model can just be applied to tossups that are not negged, although that would result in excess complexity), but I think this has a significantly-greater-than-zero chance of working. Unfortunately, I don't have a complete set of data for any single tournament (I can see if I still have the Gaddis scoresheets), so I haven't had a chance to test this idea out. Obviously, questions/comments/criticism/etc. are welcome.

Apparently this page can perform (binary) logistic regression for you.

I had an even longer post about doing the same thing with bonus conversion, but that would require an ordinal logistic model, which is more complicated (still need a statistics package, e.g. R with the "Design" library, and lots of other assumptions you have to check).
Dwight Wynne
socalquizbowl.org
UC Irvine 2008-2013; UCLA 2004-2007; Capistrano Valley High School 2000-2003

"It's a competition, but it's not a sport. On a scale, if football is a 10, then rowing would be a two. One would be Quiz Bowl." --Matt Birk on rowing, SI On Campus, 10/21/03

"If you were my teammate, I would have tossed your ass out the door so fast you'd be emitting Cerenkov radiation, but I'm not classy like Dwight." --Jerry
Sima Guang Hater
Auron
Posts: 1965
Joined: Mon Feb 05, 2007 1:43 pm
Location: Nashville, TN

Re: A New, Complicated(!) Stat(?)

Post by Sima Guang Hater »

Hey, for once, I'd like to be able to understand your math (and I think I'm capable of it). Can you LaTeX this and send it to me as a pdf?
Eric Mukherjee, MD PhD
Brown 2009, Penn Med 2018
Instructor/Attending Physician/Postdoctoral Fellow, Vanderbilt University Medical Center
Coach, University School of Nashville

“The next generation will always surpass the previous one. It’s one of the never-ending cycles in life.”
Support the Stevens-Johnson Syndrome Foundation
Mechanical Beasts
Banned Cheater
Posts: 5673
Joined: Thu Jun 08, 2006 10:50 pm

Re: A New, Complicated(!) Stat(?)

Post by Mechanical Beasts »

The Quest for the Historical Mukherjesus wrote:Hey, for once, I'd like to be able to understand your math (and I think I'm capable of it). Can you LaTeX this and send it to me as a pdf?
Ditto.
Andrew Watkins
The Friar
Wakka
Posts: 159
Joined: Fri Jul 10, 2009 2:39 pm

Re: A New, Complicated(!) Stat(?)

Post by The Friar »

Arsenoff (2009) has already created a logistic model of team ability.

References:
Arsenoff, Gordon. 2009. "FRIAR: An Exact Model for Ranking Quizbowl Teams using Question-Level Results." Paper presented to hsquizbowl.org forums in response to S-Value Revision call for proposals (viewtopic.php?f=9&t=8142), August 17, 2009. URL: http://128.252.199.22/~friar/090816-FRIAR.pdf
Gordon Arsenoff
Rochester '06
WUStL '14 (really)

Developer of WUStL Updates Statistics Live!

Re: A New, Complicated(!) Stat(?)

Post by cvdwightw »

The Friar wrote:Arsenoff (2009) has already created a logistic model of team ability.

References:
Arsenoff, Gordon. 2009. "FRIAR: An Exact Model for Ranking Quizbowl Teams using Question-Level Results." Paper presented to hsquizbowl.org forums in response to S-Value Revision call for proposals (viewtopic.php?f=9&t=8142), August 17, 2009. URL: http://128.252.199.22/~friar/090816-FRIAR.pdf
Yeah, that threw me for a bit, because you're using Bayesian methods instead of likelihood estimates (which, as I understand, is what logistic regression actually does). Also, as I understand it, you're estimating the difficulty of each tossup and bonus part individually, whereas I'm saying quite simply that the conversion rate is the objective difficulty of the question.

I think that you're using bonus parts individually, whereas I'm looking at whether the team answers 0, 1, 2, or 3 bonus parts. In any case, I recognize that your model is (a) much more advanced right now and (b) quite similar, but I do think that there's some parts that differ quite significantly between the two models (not least in the psychophysical bases of our theory - you're using Item Response Theory, while I started from a modified power-law model of compression in simultaneous masking (Humes and Jesteadt, 1989)).

Re: A New, Complicated(!) Stat(?)

Post by The Friar »

cvdwightw wrote:using Bayesian methods instead of likelihood estimates (which, as I understand, is what logistic regression actually does)
Logistic regression is a probability model that you can estimate using any number of tactics.
cvdwightw wrote:I'm saying quite simply that the conversion rate is the objective difficulty of the question.
These should be identical as long as all players play all questions or if the sample of players that actually hears a question is representative of the whole population; otherwise conversion rate alone may be a biased estimator of difficulty.
cvdwightw wrote:I think that you're using bonus parts individually, whereas I'm looking at whether the team answers 0, 1, 2, or 3 bonus parts.
Negative. I adopted the latter format for bonus data and specifically advocated against using the former since some nonstandard bonus formats would be hard to capture that way.
cvdwightw wrote:your model is (a) much more advanced right now
I disagree! These are on very much the same level of sophistication.
cvdwightw wrote:and (b) quite similar
Citation is all I'm sayin', man.
cvdwightw wrote: there's some parts that differ quite significantly between the two models (not least in the psychophysical bases of our theory - you're using Item Response Theory, while I started from a modified power-law model of compression in simultaneous masking (Humes and Jesteadt, 1989)).
Start selling me on this bit! I don't know what compression in simultaneous masking is just yet, but it sounds wicked cool. I will go look up Humes and Jesteadt (1989) when I have some time. Meanwhile, when you've got the LaTeX requested below, I'd love to request a copy.

Re: A New, Complicated(!) Stat(?)

Post by cvdwightw »

The Friar wrote:
cvdwightw wrote: there's some parts that differ quite significantly between the two models (not least in the psychophysical bases of our theory - you're using Item Response Theory, while I started from a modified power-law model of compression in simultaneous masking (Humes and Jesteadt, 1989)).
Start selling me on this bit! I don't know what compression in simultaneous masking is just yet, but it sounds wicked cool. I will go look up Humes and Jesteadt (1989) when I have some time. Meanwhile, when you've got the LaTeX requested below, I'd love to request a copy.
(I misspoke; reading other papers, e.g. Oxenham and Moore, 1994, shows that Humes and Jesteadt applied nonsimultaneous maskers to simultaneous masking, which isn't completely right, but it's better for what I want to do, since we're actually getting nonsimultaneous masking).

Essentially, the idea is that we have some intensity I_QT which is the threshold in quiet, and some threshold I_MTx which is the threshold with a masker x. These are related by the equation i_x = I_MTx^P - I_QT^P, where i_x is the masking effectiveness of masker x and P is the compression coefficient. Furthermore, the masking effectiveness is additive.

What we have is I_MTx, I_MTy, etc., where x, y, etc. are "maskers" - teams playing against you. We also have lumped data from any combination of 2 or more maskers (which are nonsimultaneous). So we have n+2 variables (I_QT, P, and i_n for n teams) and 2^n - 1 equations, and we should thus be able to estimate I_QT (the quiet threshold). Alternatively, we can calculate total i_all, and use that as a strength of schedule factor.
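To make the bookkeeping concrete, here is a minimal sketch of the additive masking relation i_x = I_MTx^P - I_QT^P and of the unknowns-versus-equations count. All numbers are invented placeholders, and the function names are mine, not from Humes and Jesteadt.

```python
def masking_effectiveness(masked_threshold_value, quiet_threshold, p):
    """i_x = I_MTx^P - I_QT^P."""
    return masked_threshold_value ** p - quiet_threshold ** p


def masked_threshold(quiet_threshold, effectiveness_values, p):
    """Invert the relation, adding the effectiveness of every masker present.

    I_MT = (I_QT^P + sum of i_x) ^ (1/P), which is what additivity of
    masking effectiveness implies for a combination of maskers.
    """
    return (quiet_threshold ** p + sum(effectiveness_values)) ** (1.0 / p)


def count_unknowns_and_equations(n_teams):
    """n + 2 unknowns (I_QT, P, one i per team) vs 2^n - 1 masker subsets."""
    return n_teams + 2, 2 ** n_teams - 1


# Placeholder values, purely illustrative.
I_QT, P = 1.0, 0.3
i_x, i_y = 0.5, 0.8

print(masked_threshold(I_QT, [i_x], P))       # threshold against team x alone
print(masked_threshold(I_QT, [i_x, i_y], P))  # lumped data against x and y
for n in (2, 3, 4):
    print(n, count_unknowns_and_equations(n))
```

By this count, two opponents give 3 equations for 4 unknowns, and from three opponents on there are at least as many equations as unknowns, assuming every combination of opponents actually has lumped data available.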

Well, since I_QT and I_MTx are just thresholds, and we can measure the threshold at any point along the psychometric curve, which is typically logistic, we should be able to run logistic regression to figure out what I_MTx looks like as a function of difficulty and then just pick a point (say, the 50% point) to look at. Since the room is "target+masker" data, that gives us a second psychometric curve, and we should be able to subtract the two. I'll readily admit that neither the psychophysics nor the math is really sound from this paragraph through the rest of the model, but I've gotten stuck on where to go from here (assistance appreciated).
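One possible reading of "pick a point and subtract the two", sketched in Python: solve each fitted logistic curve for the conversion rate at which it crosses a chosen probability, then difference the two crossing points. The coefficients are hypothetical, and treating the difference of 50% points as the quantity of interest is an assumption, not an established result.

```python
import math


def threshold(a, b, p=0.5):
    """Conversion rate x at which e^(a+bx)/(1+e^(a+bx)) equals p.

    Solving a + b*x = log(p / (1 - p)) for x; at p = 0.5 this is -a/b.
    """
    return (math.log(p / (1.0 - p)) - a) / b


# Hypothetical coefficients for the team curve and the room curve.
a_team, b_team = -3.0, 5.0
a_room, b_room = -1.0, 5.0

x_team = threshold(a_team, b_team)   # 50% point of "answered by team"
x_room = threshold(a_room, b_room)   # 50% point of "answered in room"
print(x_team, x_room, x_team - x_room)
```

Picking a different point on the curve only changes the p argument; if the two slopes differ, the difference will depend on which point is picked.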
Sen. Estes Kefauver (D-TN)
Chairman of Anti-Music Mafia Committee
Posts: 5647
Joined: Wed Jul 26, 2006 11:46 pm

Re: A New, Complicated(!) Stat(?)

Post by Sen. Estes Kefauver (D-TN) »

What?
Charlie Dees, North Kansas City HS '08
"I won't say more because I know some of you parse everything I say." - Jeremy Gibbs

"At one TJ tournament the neg prize was the Hampshire College ultimate frisbee team (nude) calender featuring one Evan Silberman. In retrospect that could have been a disaster." - Harry White

Re: A New, Complicated(!) Stat(?)

Post by Mechanical Beasts »

I'm not so sure I like the (OBC+YBC)/YBC approximation for k, as you outline in the paper. The chance that an opponent knows the answer at a point earlier than you do--which is essentially what the steal factor is trying to model, right?--presumably depends on the chance that your opponent knows at least one clue present in the tossup for that answer that is harder than the hardest clue you know for that answer. So doesn't it make sense that the best measure for this scale is something proportional to opponent's BC [generally/in that category/subcategory, depending on desired precision/cumbersome-ness] minus your BC [ditto]? Might be more than a linear proportion, of course, seeing as if your opponent knows that clue, then he's kind of close to guaranteed converting that tossup (it's only not a straight-up guarantee insofar as imperfect play (sitting) and imperfect questions are concerned, plus the factor that my Dylan Thomas knowledge doesn't imply other British lit knowledge (or even other Welsh lit knowledge) and certainly not other lit knowledge).

Anyway! I suppose I don't entirely know what I'm talking about, but this mostly does make sense, right?

Re: A New, Complicated(!) Stat(?)

Post by Mechanical Beasts »

Jeremy Gibbs Free Energy wrote:What?
neeeeeeeeeeeerd

Re: A New, Complicated(!) Stat(?)

Post by cvdwightw »

Norman the Lunatic wrote:I'm not so sure I like the (OBC+YBC)/YBC approximation for k, as you outline in the paper. The chance that an opponent knows the answer at a point earlier than you do--which is essentially what the steal factor is trying to model, right?--presumably depends on the chance that your opponent knows at least one clue present in the tossup for that answer that is harder than the hardest clue you know for that answer. So doesn't it make sense that the best measure for this scale is something proportional to opponent's BC [generally/in that category/subcategory, depending on desired precision/cumbersome-ness] minus your BC [ditto]? Might be more than a linear proportion, of course, seeing as if your opponent knows that clue, then he's kind of close to guaranteed converting that tossup (it's only not a straight-up guarantee insofar as imperfect play (sitting) and imperfect questions are concerned, plus the factor that my Dylan Thomas knowledge doesn't imply other British lit knowledge (or even other Welsh lit knowledge) and certainly not other lit knowledge).

Anyway! I suppose I don't entirely know what I'm talking about, but this mostly does make sense, right?
Well, first off, there are about four people who know what you're talking about, since I haven't uploaded that paper. Second, due to the mathematics of things, k has to be between 0 and 1 (not inclusive, since neither 0 nor 1 makes real-world sense), since a negative steal factor makes no sense (your chances of getting a tossup don't increase if your opponent also knows the answer).

The other thing you're forgetting is that P_whatever represents a probability, and for now, it's an all-subjects-inclusive probability. So knowing one thing doesn't imply knowing another; rather, it's a probability that you know the answer (or answer the question, or whatever). If all you know about literature is Dylan Thomas, then the probability you know the answer is quite low - but still nonzero.

My best idea for the k-factor is that each team has a distribution of bonus scores, with higher scores representing more knowledge. Since clues are discrete quanta, we can discretize the resulting distributions (probably a gamma distribution where the variable is either BC or 30-BC, depending on skew) to approximate a "buzz distribution" and compute the resulting probabilities that your opponent buzzes off an earlier clue than you do or that you both buzz off the same clue. Then the k-factor is P(later) + 0.5*P(same), where P(later) is the probability that your buzz comes later than your opponent's. Obviously this doesn't take into account things like buzzer speed or lateral thinking, and unless we define P_Y and P_O as the probabilities of knowing the answer and not negging, it doesn't account for negs either. Without that data (because here I'm trying to work with ensemble data, not question-level data like in my previous explanations), the best estimate we have is the mean of both distributions; i.e. the team bonus conversion.
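A minimal sketch of the k-factor computation from two discretized buzz distributions, reading P(later) as the chance that your first buzzable clue comes after your opponent's. The two distributions below are invented for illustration (no gamma fit is attempted here), and the two teams' buzz points are assumed independent.

```python
def k_factor(your_buzz_pmf, opponent_buzz_pmf):
    """k = P(you buzz on a later clue) + 0.5 * P(same clue).

    Each argument is a list of probabilities over clue positions
    (index 0 = earliest clue), conditional on that team knowing the answer.
    """
    p_later = 0.0
    p_same = 0.0
    for i, p_you in enumerate(your_buzz_pmf):
        for j, p_opp in enumerate(opponent_buzz_pmf):
            if i > j:
                p_later += p_you * p_opp
            elif i == j:
                p_same += p_you * p_opp
    return p_later + 0.5 * p_same


# Hypothetical three-clue tossup: you tend to buzz late, your opponent early.
you = [0.2, 0.3, 0.5]
opponent = [0.5, 0.3, 0.2]

print(k_factor(you, opponent))
print(k_factor(opponent, you))
```

Two teams with identical distributions get k = 0.5, and the two teams' k-factors against each other sum to 1, which matches the intended "who steals from whom" interpretation.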

Then, perhaps a better k-factor is, as you suggest, along the lines of (0.5 + (OBC-YBC)/30), since that makes some real world sense. But this implies that a team with 20 BC playing a team with 15 BC has an equal k-factor to a team with 10 BC playing a team with 5 BC, which I don't think is right, and it also implies that a team with 25 BC playing a team with 5 BC has a k-factor < 0, which doesn't make real-world sense.
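The two objections to the linear form are easy to check numerically; a tiny sketch (the function name is mine):

```python
def linear_k_factor(your_bc, opponent_bc, max_bc=30.0):
    """k = 0.5 + (OBC - YBC) / 30."""
    return 0.5 + (opponent_bc - your_bc) / max_bc


print(linear_k_factor(20, 15))  # 20 BC team playing a 15 BC team
print(linear_k_factor(10, 5))   # 10 BC team playing a 5 BC team: same value
print(linear_k_factor(25, 5))   # 25 BC team playing a 5 BC team: below zero
```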

Actually, the best thing to do might be to transform from raw BC to yet another probability; this time, the probability is that the team can beat an "average" team with bonus conversion of 15. I like exponential-form distributions, so I'm going to arbitrarily decide it looks somewhat exponential. It's an ugly transformation, but we get that the probability distribution is

P = .5*e^(.5*(e^(0.5)-e^(-0.5))*(X-1)), where X = BC*e^(0.5)/(15*(e-1)). Then the k-factor is 0.5+P(Opponent > 15)-P(You > 15). Then, a team with 20 BC vs a team with 15 BC has a k-factor of ~0.41, 15 BC vs 10 BC has a k-factor of ~0.42, 10 BC vs 5 BC has a k-factor of ~0.43 (remember, higher k-factors are worse; a k-factor > 0.5 means your opponent is stealing more from you than you are stealing from him). For 20 BC vs 5 BC, that's a k-factor of ~0.26; 25 BC vs 5 BC, k-factor of ~0.16. We're also not really stretching the boundaries of reality - k-factors < 0 are essentially a ~29 PPB team playing a ~0 PPB team, and we can always round to 0 if something that weird happens. I think this distribution overestimates the k-factor a little bit, since a team with a bonus conversion of 0 still has about a 30 percent chance of stealing the tossup from a 15 PPB opponent if the team knows the answer, so I may try to tweak it a little more.
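A quick numeric check of the transformation, assuming the exponent is grouped as 0.5*(e^0.5 - e^-0.5)*(X - 1). That grouping is my reading of the formula; it is the one that lands within about 0.01 to 0.02 of the k-factors quoted above. Function names are mine.

```python
import math

# 0.5 * (e^0.5 - e^-0.5), the constant multiplying (X - 1) in the exponent.
HALF_DIFF = 0.5 * (math.exp(0.5) - math.exp(-0.5))


def scaled_bc(bc):
    """X = BC * e^0.5 / (15 * (e - 1))."""
    return bc * math.exp(0.5) / (15.0 * (math.e - 1.0))


def p_beats_average(bc):
    """P = 0.5 * e^(HALF_DIFF * (X - 1)) under the assumed grouping."""
    return 0.5 * math.exp(HALF_DIFF * (scaled_bc(bc) - 1.0))


def exp_k_factor(your_bc, opponent_bc):
    """k = 0.5 + P(opponent) - P(you)."""
    return 0.5 + p_beats_average(opponent_bc) - p_beats_average(your_bc)


for you, opp in [(20, 15), (15, 10), (10, 5), (20, 5), (25, 5), (15, 0)]:
    print(you, "BC vs", opp, "BC:", round(exp_k_factor(you, opp), 3))
```

Under this grouping the k-factor first dips below zero for a team at roughly 29 to 30 BC playing a team at 0 BC, which is consistent with the boundary case described above.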