A New, Complicated(!) Stat(?)

cvdwightw · Post by **cvdwightw** » Tue Sep 01, 2009 9:04 pm

This is the only fruitful part of a thought process that started with the idea that tossup conversion data is just a "masked" signal of a team's true ability, so all we'd have to do is figure out how much masking (from your negs and the other team's buzzes) is present, and we can transform back to a "threshold" of knowledge.

We compute two data sets as a function of tossup conversion rate. The first data set is whether the tossup was answered in the room with a certain team, and the second is whether the tossup was answered by that team. We can then run these through a binary logistic model and get two equations of the form P = (e^(a+bx))/(1+e^(a+bx)), where P is probability, a and b are coefficients, and x is the tossup conversion rate. By Bayes' theorem, P(T|R) = P(R|T)*P(T)/P(R), where R is the event "answered in room" and T is the event "answered by team." It should be obvious that P(R|T) = 1, so P(T|R) = P(T)/P(R).
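A minimal sketch of the fitting step, for anyone without a statistics package at hand. It fits P = e^(a+bx)/(1+e^(a+bx)) by Newton's method on the log-likelihood. The helper name `fit_logistic` and the toy data are made up for illustration; a real analysis would run the fit twice, once on the "answered in room" outcomes and once on the "answered by team" outcomes.

```python
import math

def logistic(a, b, x):
    """P = e^(a+bx) / (1 + e^(a+bx)), written in the equivalent 1/(1+e^-(a+bx)) form."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

def fit_logistic(xs, ys, iters=25):
    """Maximum-likelihood estimate of (a, b) by Newton's method (hypothetical helper)."""
    a = b = 0.0
    for _ in range(iters):
        g_a = g_b = 0.0            # gradient of the log-likelihood
        h_aa = h_ab = h_bb = 0.0   # entries of the negated Hessian
        for x, y in zip(xs, ys):
            p = logistic(a, b, x)
            w = p * (1.0 - p)
            g_a += y - p
            g_b += (y - p) * x
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab * h_ab
        if det == 0.0:
            break
        # Newton update: solve the 2x2 system H * delta = g by hand.
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b

# Made-up data: x is a tossup's conversion rate, y is 1 if this team answered it.
xs = [0.2] * 5 + [0.4] * 5 + [0.6] * 5 + [0.8] * 5
ys = [0, 0, 0, 0, 1,  0, 0, 1, 0, 1,  1, 0, 1, 1, 0,  1, 1, 1, 1, 0]

a_t, b_t = fit_logistic(xs, ys)
print(a_t, b_t)
```

Newton's method can fail to converge if the outcomes are perfectly separated by conversion rate, so real data should be checked for that first.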

For any tossup conversion rate x, then, P(T|R) = (e^(a_t+b_t*x)+e^(a_t+a_r+(b_t+b_r)*x))/(e^(a_r+b_r*x)+e^(a_r+a_t+(b_r+b_t)*x)). The amount of overall "masking" is then the probability that the other team got the question, given that someone in the room got it, which is just 1-P(T|R) = (e^(a_r+b_r*x)-e^(a_t+b_t*x))/(e^(a_r+b_r*x)+e^(a_r+a_t+(b_r+b_t)*x)). It would then make sense that your "unmasked" signal would be the probability that someone in the room got the question ("observed signal") minus the probability that given that someone got the question, it was the other team ("known masker signal"); or, P(R) - (1-P(T|R)), which is ridiculously complicated if you try to combine fractions so I'll just leave it as (e^(a_r+b_r*x))/(1+e^(a_r+b_r*x))-(e^(a_r+b_r*x)-e^(a_t+b_t*x))/(e^(a_r+b_r*x)+e^(a_r+a_t+(b_r+b_t)*x)). These "unmasked" values can then be computed for any tossup conversion rate x - that is, given an opposing team of empty chairs, it finds the probability that your team will answer a tossup that is converted at a certain rate.
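The quantities above can be computed directly from the two fitted curves without combining fractions. This is only a sketch: the coefficient values are hypothetical placeholders, not fitted to any tournament, and the function name `unmasked` is made up.

```python
import math

def logistic(a, b, x):
    """P = e^(a+bx) / (1 + e^(a+bx))."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

def unmasked(a_r, b_r, a_t, b_t, x):
    """Return (P(T|R), masking, unmasked value) at tossup conversion rate x."""
    p_r = logistic(a_r, b_r, x)    # P(R): answered in the room
    p_t = logistic(a_t, b_t, x)    # P(T): answered by the team
    p_t_given_r = p_t / p_r        # Bayes with P(R|T) = 1
    masking = 1.0 - p_t_given_r    # the other team got it, given someone did
    return p_t_given_r, masking, p_r - masking

# Hypothetical coefficients for the "room" and "team" regressions.
a_r, b_r = -1.0, 4.0
a_t, b_t = -2.5, 4.0

for x in (0.3, 0.6, 0.9):
    print(x, unmasked(a_r, b_r, a_t, b_t, x))
```

Nothing in P(R) - (1 - P(T|R)) forces the result to lie between 0 and 1, so it is worth checking the outputs for whatever coefficients are actually fitted.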

It obviously doesn't take negs into account (the model can just be applied to tossups that are not negged, although that would result in excess complexity), but I think this has a significantly-greater-than-zero chance of working. Unfortunately, I don't have a complete set of data for any single tournament (I can see if I still have the Gaddis scoresheets), so I haven't had a chance to test this idea out. Obviously, questions/comments/criticism/etc. are welcome.

Apparently this page can perform (binary) logistic regression for you.

I had an even longer post about doing the same thing with bonus conversion, but that would require an ordinal logistic model, which is more complicated (still need a statistics package, e.g. R with the "Design" library, and lots of other assumptions you have to check).

Sima Guang Hater · Post by **Sima Guang Hater** » Tue Sep 01, 2009 9:06 pm

Hey, for once, I'd like to be able to understand your math (and I think I'm capable of it). Can you LaTeX this and send it to me as a PDF?

Mechanical Beasts · Post by **Mechanical Beasts** » Tue Sep 01, 2009 9:32 pm

The Quest for the Historical Mukherjesus wrote: Hey, for once, I'd like to be able to understand your math (and I think I'm capable of it). Can you LaTeX this and send it to me as a PDF?

Ditto.

The Friar · Post by **The Friar** » Wed Sep 02, 2009 1:53 pm

Arsenoff (2009) has already created a logistic model of team ability.

References:
Arsenoff, Gordon. 2009. "FRIAR: An Exact Model for Ranking Quizbowl Teams using Question-Level Results." Paper presented to hsquizbowl.org forums in response to S-Value Revision call for proposals (viewtopic.php?f=9&t=8142,), August 17, 2009. URL: http://128.252.199.22/~friar/090816-FRIAR.pdf

cvdwightw · Post by **cvdwightw** » Wed Sep 02, 2009 2:57 pm

The Friar wrote: Arsenoff (2009) has already created a logistic model of team ability.

References:
Arsenoff, Gordon. 2009. "FRIAR: An Exact Model for Ranking Quizbowl Teams using Question-Level Results." Paper presented to hsquizbowl.org forums in response to S-Value Revision call for proposals (viewtopic.php?f=9&t=8142,), August 17, 2009. URL: http://128.252.199.22/~friar/090816-FRIAR.pdf

Yeah, that threw me for a bit, because you're using Bayesian methods instead of likelihood estimates (which, as I understand, is what logistic regression actually does). Also, as I understand it, you're estimating the difficulty of each tossup and bonus part individually, whereas I'm saying quite simply that the conversion rate is the objective difficulty of the question.

I think that you're using bonus parts individually, whereas I'm looking at whether the team answers 0, 1, 2, or 3 bonus parts. In any case, I recognize that your model is (a) much more advanced right now and (b) quite similar, but I do think that there's some parts that differ quite significantly between the two models (not least in the psychophysical bases of our theory - you're using Item Response Theory, while I started from a modified power-law model of compression in simultaneous masking (Humes and Jesteadt, 1989)).

The Friar · Post by **The Friar** » Wed Sep 02, 2009 3:13 pm

cvdwightw wrote: using Bayesian methods instead of likelihood estimates (which, as I understand, is what logistic regression actually does)

Logistic regression is a probability model that you can estimate using any number of tactics.

cvdwightw wrote: I'm saying quite simply that the conversion rate is the objective difficulty of the question.

These should be identical as long as all players play all questions or if the sample of players that actually hears a question is representative of the whole population; otherwise conversion rate alone may be a biased estimator of difficulty.

cvdwightw wrote: I think that you're using bonus parts individually, whereas I'm looking at whether the team answers 0, 1, 2, or 3 bonus parts.

Negative. I adopted the latter format for bonus data and specifically advocated against using the former since some nonstandard bonus formats would be hard to capture that way.

cvdwightw wrote: your model is (a) much more advanced right now

I disagree! These are on very much the same level of sophistication.

cvdwightw wrote: and (b) quite similar

Citation is all I'm sayin', man.

cvdwightw wrote: there's some parts that differ quite significantly between the two models (not least in the psychophysical bases of our theory - you're using Item Response Theory, while I started from a modified power-law model of compression in simultaneous masking (Humes and Jesteadt, 1989)).

Start selling me on this bit! I don't know what compression in simultaneous masking is just yet, but it sounds wicked cool. I will go look up Humes and Jesteadt (1989) when I have some time. Meanwhile, when you've got the LaTeX requested below, I'd love to request a copy.

cvdwightw · Post by **cvdwightw** » Wed Sep 02, 2009 9:12 pm

The Friar wrote:
cvdwightw wrote: there's some parts that differ quite significantly between the two models (not least in the psychophysical bases of our theory - you're using Item Response Theory, while I started from a modified power-law model of compression in simultaneous masking (Humes and Jesteadt, 1989)).
Start selling me on this bit! I don't know what compression in simultaneous masking is just yet, but it sounds wicked cool. I will go look up Humes and Jesteadt (1989) when I have some time. Meanwhile, when you've got the LaTeX requested below, I'd love to request a copy.

(I misspoke; reading other papers, e.g. Oxenham and Moore, 1994, shows that Humes and Jesteadt applied nonsimultaneous maskers to simultaneous masking, which isn't completely right, but it's better for what I want to do, since we're actually getting nonsimultaneous masking).

Essentially, the idea is that we have some intensity I_QT which is the threshold in quiet, and some threshold I_MTx which is the threshold with a masker x. These are related by the equation i_x = I_MTx^P - I_QT^P, where i_x is the masking effectiveness of masker x and P is the compression coefficient. Furthermore, the masking effectiveness is additive.

What we have is I_MTx, I_MTy, etc., where x, y, etc. are "maskers" - teams playing against you. We also have lumped data from any combination of 2 or more maskers (which are nonsimultaneous). So we have n+2 variables (I_QT, P, and i_n for n teams) and 2^n - 1 equations, and we should thus be able to estimate I_QT (the quiet threshold). Alternatively, we can calculate total i_all, and use that as a strength of schedule factor.
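A small sketch of the forward model may make the bookkeeping clearer. It assumes only the relation i_x = I_MTx^P - I_QT^P and additivity of masking effectiveness, so that the threshold against several maskers is (I_QT^P + sum of i)^(1/P). All numeric values are hypothetical, as are the function names.

```python
def masking_effectiveness(i_mt, i_qt, p):
    """i_x = I_MTx^P - I_QT^P for a single masker x."""
    return i_mt ** p - i_qt ** p

def masked_threshold(i_qt, p, effects):
    """Threshold in the presence of several maskers, assuming their
    masking effectiveness simply adds."""
    return (i_qt ** p + sum(effects)) ** (1.0 / p)

# Hypothetical quiet threshold and compression coefficient.
i_qt, p = 2.0, 0.3

# Hypothetical thresholds measured against opponents x and y separately.
i_x = masking_effectiveness(5.0, i_qt, p)
i_y = masking_effectiveness(8.0, i_qt, p)

print(masked_threshold(i_qt, p, [i_x]))        # recovers 5.0 up to rounding
print(masked_threshold(i_qt, p, [i_x, i_y]))   # predicted threshold against x and y together
```

Estimating I_QT and P from data would mean running this in reverse: choosing the values that best reproduce the observed single-masker and combined thresholds.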

Well, since I_QT and I_MTx are just thresholds, and we can measure the threshold at any point along the psychometric curve, which is typically logistic, we should be able to run logistic regression to figure out what I_MTx looks like as a function of difficulty and then just pick a point (say, the 50% point) to look at. Since the room is "target+masker" data, that gives us a second psychometric curve, and we should be able to subtract the two. I'll readily admit that neither the psychophysics nor the math is really sound from this paragraph through the rest of the model, but I've gotten stuck on where to go from here (assistance appreciated).

Sen. Estes Kefauver (D-TN) · Post by **Sen. Estes Kefauver (D-TN)** » Wed Sep 02, 2009 9:35 pm

What?

Mechanical Beasts · Post by **Mechanical Beasts** » Wed Sep 02, 2009 9:44 pm

I'm not so sure I like the (OBC+YBC)/YBC approximation for k, as you outline in the paper. The chance that an opponent knows the answer at a point earlier than you do--which is essentially what the steal factor is trying to model, right?--presumably depends on the chance that your opponent knows at least one clue present in the tossup for that answer that is harder than the hardest clue you know for that answer. So doesn't it make sense that the best measure for this scale is something proportional to opponent's BC [generally/in that category/subcategory, depending on desired precision/cumbersome-ness] minus your BC [ditto]? Might be more than a linear proportion, of course, seeing as if your opponent knows that clue, then he's kind of close to guaranteed converting that tossup (it's only not a straight-up guarantee insofar as imperfect play (sitting) and imperfect questions are concerned, plus the factor that my Dylan Thomas knowledge doesn't imply other British lit knowledge (or even other Welsh lit knowledge) and certainly not other lit knowledge).

Anyway! I suppose I don't entirely know what I'm talking about, but this mostly does make sense, right?

Mechanical Beasts · Post by **Mechanical Beasts** » Wed Sep 02, 2009 9:44 pm

Jeremy Gibbs Free Energy wrote: What?

neeeeeeeeeeeerd

cvdwightw · Post by **cvdwightw** » Thu Sep 03, 2009 2:19 pm

Norman the Lunatic wrote: I'm not so sure I like the (OBC+YBC)/YBC approximation for k, as you outline in the paper. The chance that an opponent knows the answer at a point earlier than you do--which is essentially what the steal factor is trying to model, right?--presumably depends on the chance that your opponent knows at least one clue present in the tossup for that answer that is harder than the hardest clue you know for that answer. So doesn't it make sense that the best measure for this scale is something proportional to opponent's BC [generally/in that category/subcategory, depending on desired precision/cumbersome-ness] minus your BC [ditto]? Might be more than a linear proportion, of course, seeing as if your opponent knows that clue, then he's kind of close to guaranteed converting that tossup (it's only not a straight-up guarantee insofar as imperfect play (sitting) and imperfect questions are concerned, plus the factor that my Dylan Thomas knowledge doesn't imply other British lit knowledge (or even other Welsh lit knowledge) and certainly not other lit knowledge).

Anyway! I suppose I don't entirely know what I'm talking about, but this mostly does make sense, right?

Well, first off, there are about four people who know what you're talking about, since I haven't uploaded that paper. Second, due to the mathematics of things, k has to be between 0 and 1 (not inclusive, since neither 0 nor 1 makes real-world sense), since a negative steal factor makes no sense (your chances of getting a tossup don't increase if your opponent also knows the answer).

The other thing you're forgetting is that P_whatever represents a probability, and for now, it's an all-subjects-inclusive probability. So knowing one thing doesn't imply knowing another; rather, it's a probability that you know the answer (or answer the question, or whatever). If all you know about literature is Dylan Thomas, then the probability you know the answer is quite low - but still nonzero.

My best idea for the k-factor is that each team has a distribution of bonus scores, with higher scores representing more knowledge. Since clues are discrete quanta, we can discretize the resulting distributions (probably a gamma distribution where the variable is either BC or 30-BC, depending on skew) to approximate a "buzz distribution" and compute the resulting probabilities that you buzz off a later clue than your opponent or off the same clue. Then the k-factor is P(later) + 0.5*P(same). Obviously this doesn't take into account things like buzzer speed or lateral thinking, and unless we define P_Y and P_O as the probabilities of knowing the answer and not negging, it doesn't account for negs either. Without that data (because here I'm trying to work with ensemble data, not question-level data like in my previous explanations), the best estimate we have is the mean of both distributions; i.e. the team bonus conversion.
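As a rough illustration of P(later) + 0.5*P(same), here is a sketch with two discrete buzz distributions over clue positions. It assumes the two teams' buzz positions are independent and reads "later" as "you buzz after your opponent". The distributions are hand-picked for illustration, not derived from a gamma fit to bonus data.

```python
def k_factor(you, opp):
    """you[i] and opp[i] are the probabilities that each team buzzes on
    clue i (0 = earliest clue), given that the team knows the answer."""
    # Probability that your clue index is strictly later than your opponent's.
    p_later = sum(
        you[i] * opp[j]
        for i in range(len(you))
        for j in range(len(opp))
        if i > j
    )
    # Probability that both teams buzz on the same clue (treated as a coin flip).
    p_same = sum(you[i] * opp[i] for i in range(min(len(you), len(opp))))
    return p_later + 0.5 * p_same

# Hypothetical four-clue tossup: you tend to buzz late, your opponent early.
you = [0.1, 0.2, 0.3, 0.4]
opp = [0.4, 0.3, 0.2, 0.1]

print(k_factor(you, opp))   # above 0.5: the opponent steals more often
print(k_factor(you, you))   # evenly matched teams give 0.5
```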

Then, perhaps a better k-factor is, as you suggest, along the lines of (0.5 + (OBC-YBC)/30), since that makes some real world sense. But this implies that a team with 20 BC playing a team with 15 BC has an equal k-factor to a team with 10 BC playing a team with 5 BC, which I don't think is right, and it also implies that a team with 25 BC playing a team with 5 BC has a k-factor < 0, which doesn't make real-world sense.
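Both objections are easy to check numerically. A minimal sketch, with `linear_k` as a made-up name for the (0.5 + (OBC-YBC)/30) form:

```python
def linear_k(ybc, obc):
    """Linear k-factor: 0.5 + (OBC - YBC) / 30."""
    return 0.5 + (obc - ybc) / 30.0

print(linear_k(20, 15), linear_k(10, 5))   # identical for both matchups
print(linear_k(25, 5))                     # negative
```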

Actually, the best thing to do might be to transform from raw BC to yet another probability; this time, the probability is that the team can beat an "average" team with bonus conversion of 15. I like exponential-form distributions, so I'm going to arbitrarily decide it looks somewhat exponential. It's an ugly transformation, but we get that the probability distribution is

P = 0.5*e^(0.5*((e^(0.5)-e^(-0.5))*X-1)), where X = BC*e^(0.5)/(15*(e-1)). Since (e^(0.5)-e^(-0.5))*e^(0.5) = e-1, this is the same as P = 0.5*e^((BC-15)/30), so a team with 15 BC gets P = 0.5. Then the k-factor is 0.5+P(Opponent > 15)-P(You > 15). Then, a team with 20 BC vs a team with 15 BC has a k-factor of ~0.41, 15 BC vs 10 BC has a k-factor of ~0.42, 10 BC vs 5 BC has a k-factor of ~0.43 (remember, higher k-factors are worse; a k-factor > 0.5 means your opponent is stealing more from you than you are stealing from him). For 20 BC vs 5 BC, that's a k-factor of ~0.27; 25 BC vs 5 BC, k-factor of ~0.16. We're also not really stretching the boundaries of reality - k-factors < 0 are essentially a ~29 PPB team playing a ~0 PPB team, and we can always round to 0 if something that weird happens. I think this distribution overestimates the k-factor a little bit, since a team with a bonus conversion of 0 still has about a 30 percent chance of stealing the tossup from a 15 PPB opponent if the team knows the answer, so I may try to tweak it a little more.
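A short sketch for checking these numbers. It assumes the exponent is grouped as 0.5*((e^(0.5)-e^(-0.5))*X - 1), which is the reading that reproduces the k-factors quoted above and reduces to 0.5*e^((BC-15)/30). The function names are made up.

```python
import math

def p_beats_average(bc):
    """Transformed bonus conversion: chance of beating a 15-BC 'average' team."""
    x = bc * math.exp(0.5) / (15.0 * (math.e - 1.0))
    return 0.5 * math.exp(0.5 * ((math.exp(0.5) - math.exp(-0.5)) * x - 1.0))

def k_factor(ybc, obc):
    """k = 0.5 + P(opponent) - P(you); higher values are worse for you."""
    return 0.5 + p_beats_average(obc) - p_beats_average(ybc)

# (your BC, opponent BC) pairs discussed in the post.
for ybc, obc in [(20, 15), (15, 10), (10, 5), (20, 5), (25, 5), (15, 0)]:
    print(ybc, obc, round(k_factor(ybc, obc), 3))
```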
