Floating two modeling assumptions for FRIAR 2.0

The Friar
Wakka
Posts: 158
Joined: Fri Jul 10, 2009 2:39 pm

Floating two modeling assumptions for FRIAR 2.0

Post by The Friar »

Since I am getting ready to do another big slug of rating system coding, I thought it the appropriate time to ask whether a couple of assumptions I would really like to use (but don't absolutely have to) in the design of FRIAR 2.0 pass the smell test in the community. I'm going to begin by using them, and will alter them only if the reaction here is negative. Altering them will delay the release.

First, would you accept a question-level model in which difficulty parameters were estimated not for every single question, but only for each division (DI only, crossover, DII only)? This assumption has already failed to raise eyebrows in DP-FRIAR, although maybe it has just been drowned out by even more serious issues with that model. I have not yet asked about incorporating it into a model where data points are at the question level rather than the whole-game level.

This modeling strategy would be equivalent to an assumption that each type of difficulty (i.e., of powering, of answering correctly, of not negging) is approximately normally distributed within each division, and works best if the within-division variance in question difficulties is small compared to the between-division variance. The model I am considering contains an appropriate "sink" for the within-division variance that is being considered random error instead of being explicitly modeled -- it should drive down the specificity parameter (the slope of the item response curve) relative to what that parameter would be if individual question difficulties were estimated.
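To make the "sink" concrete, here is a minimal sketch, not FRIAR itself. It assumes a probit link and a single right/wrong outcome, and the names theta (team strength), slope (specificity), mu and sigma (mean and spread of question difficulty within a division) are hypothetical. Under those assumptions, averaging the per-question curve over normally distributed difficulties gives another probit curve with a flatter slope, slope / sqrt(1 + slope^2 * sigma^2), and the code checks that numerically.

```python
import math

def probit(x):
    """Standard normal CDF, written with erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def normal_pdf(x, mu, sigma):
    """Density of Normal(mu, sigma) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def marginal_numeric(theta, slope, mu, sigma, n=4001, width=8.0):
    """Average probit(slope * (theta - b)) over b ~ Normal(mu, sigma), trapezoid rule."""
    lo = mu - width * sigma
    step = 2.0 * width * sigma / (n - 1)
    total = 0.0
    for i in range(n):
        b = lo + i * step
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * probit(slope * (theta - b)) * normal_pdf(b, mu, sigma)
    return total * step

def attenuated_slope(slope, sigma):
    """Slope of the division-level curve once difficulty spread is treated as noise."""
    return slope / math.sqrt(1.0 + (slope * sigma) ** 2)

def marginal_closed_form(theta, slope, mu, sigma):
    """Division-level response curve implied by the per-question probit curve."""
    return probit(attenuated_slope(slope, sigma) * (theta - mu))

if __name__ == "__main__":
    # Hypothetical values, chosen only for illustration.
    print(marginal_numeric(0.7, 1.5, 0.2, 0.8))
    print(marginal_closed_form(0.7, 1.5, 0.2, 0.8))
    print(attenuated_slope(1.5, 0.8))
```

The larger the within-division spread sigma is compared with 1/slope, the more the fitted slope shrinks, which is why the approach works best when that spread is small.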

Not having to estimate difficulty levels for each individual question should reduce the number of parameters estimated by an order of magnitude. Seriously. My best guess is FRIAR or FRIAR 2 with individual question difficulties will have about 3300 parameters for a full SCT of data, while with divisional difficulties it will have about 300 (or however many total teams there are, plus about 13). This will make it possible to optimize the model much faster once coded. Further, it will make it much easier to code the model; with individual question difficulties, it is necessary to check which parameters can actually be estimated for each question and work around those that cannot, because, on several questions (and just one is enough to wreck the optimizer!), not all of the possible outcomes even occurred, owing to the small number of times each one was played. Finally, if the assumptions do indeed hold, estimating fewer parameters means that the ones that are estimated are estimated more accurately (though not necessarily more precisely).
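The per-question bookkeeping described above might look roughly like the following sketch. The record layout (a question id paired with one of four outcome labels) and the function name are assumptions for illustration, not the format FRIAR actually reads.

```python
from collections import defaultdict

# Hypothetical outcome labels for one team's result on one tossup.
OUTCOMES = ("neg", "zero", "ten", "power")

def unobserved_outcomes(records):
    """Map question id -> outcomes (in OUTCOMES order) never seen on that question.

    Questions on which every outcome occurred at least once are omitted.
    """
    seen = defaultdict(set)
    for question_id, outcome in records:
        if outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome!r}")
        seen[question_id].add(outcome)
    return {
        question_id: [o for o in OUTCOMES if o not in outcomes]
        for question_id, outcomes in seen.items()
        if len(outcomes) < len(OUTCOMES)
    }

# Made-up records: tossup_2 was never powered or negged.
records = [
    ("tossup_1", "ten"), ("tossup_1", "power"),
    ("tossup_1", "neg"), ("tossup_1", "zero"),
    ("tossup_2", "ten"), ("tossup_2", "zero"), ("tossup_2", "ten"),
]
problems = unobserved_outcomes(records)
print(problems)
```

Any question that shows up in the result has at least one difficulty parameter the data cannot pin down. Keying the same records by division instead of by question pools many more plays into each group, so empty outcome categories become far less likely.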

Second, would you be comfortable saying that, net of the effect of the other team -- as if it were playing against hypothetical buzzer rocks -- a better team is more likely than a worse one:
  • to score zero rather than neg;
  • to score ten rather than zero; and
  • to power rather than score ten?
Would you agree with the same statement if "better" and "worse" were replaced, say, with "more knowledgeable" and "less knowledgeable"?

If this ordinal relationship holds -- that is, that power is better evidence that you're a strong team than ten points, ten points better evidence than zero, and zero better evidence than a neg -- then I don't have to break out the multinomial logit (or probit -- FRIAR 2.0 may end up with a probit link) setup at any point. (The spoiler: there is a step in which each team's latent or "natural" tossup outcome is drawn simultaneously, though the latent outcome is observed perfectly for only the first team to buzz in most cases, for neither in a couple of cases, and for both only when both latent outcomes are zero.)
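For anyone who wants to see what the ordinal assumption buys, here is a minimal ordered-probit sketch. It is an illustration under stated assumptions, not the FRIAR 2.0 specification: theta, slope, and the three cutpoints are hypothetical, and the opposing team is ignored entirely (the "buzzer rocks" case).

```python
import math

OUTCOMES = ("neg", "zero", "ten", "power")  # ordered from worst to best

def probit(x):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def ordered_probit_probs(theta, cutpoints, slope=1.0):
    """Outcome probabilities for a team of strength theta.

    cutpoints holds three increasing thresholds on the latent scale; the
    latent draw falling below the first is a neg, above the last a power.
    """
    c1, c2, c3 = cutpoints
    if not c1 < c2 < c3:
        raise ValueError("cutpoints must be strictly increasing")
    # Cumulative probability of landing at or below each category.
    cumulative = [probit(slope * (c - theta)) for c in (c1, c2, c3)] + [1.0]
    probs = [cumulative[0]]
    probs += [cumulative[k] - cumulative[k - 1] for k in range(1, 4)]
    return dict(zip(OUTCOMES, probs))

# Hypothetical cutpoints and team strengths.
cuts = (-1.5, 0.5, 1.5)
weak = ordered_probit_probs(-1.0, cuts)
strong = ordered_probit_probs(1.0, cuts)
print(weak)
print(strong)
```

With these example values, the stronger team has a higher zero-to-neg ratio, a higher ten-to-zero ratio, and a higher power-to-ten ratio than the weaker one, which is exactly the three-part statement in the question. One strength parameter and three cutpoints cover all four outcomes, where a multinomial setup would need a separate strength effect for each outcome.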
Gordon Arsenoff
Rochester '06
WUStL '14 (really)

Developer of WUStL Updates Statistics Live!

cvdwightw
Auron
Posts: 3446
Joined: Tue May 13, 2003 12:46 am
Location: Southern CA

Re: Floating two modeling assumptions for FRIAR 2.0

Post by cvdwightw »

The Friar wrote:Second, would you be comfortable saying that, net of the effect of the other team -- as if it were playing against hypothetical buzzer rocks -- a better team is more likely than a worse one:
  • to score zero rather than neg;
I might address some other things later, but this isn't a valid assumption. Teams neg independently of their ability to score points. As an extreme example, Maryland negged more than any other team in the nation at ICT and finished atop the second bracket, and all three of the top-negging teams in D2 finished in the top bracket. I don't know that there's a correlation between the number of negs and a team's ability to score points; it has far more to do with a team's aggression, which can help or hurt a team but probably zeroes out in the end.
Dwight Wynne
socalquizbowl.org
UC Irvine 2008-2013; UCLA 2004-2007; Capistrano Valley High School 2000-2003

"It's a competition, but it's not a sport. On a scale, if football is a 10, then rowing would be a two. One would be Quiz Bowl." --Matt Birk on rowing, SI On Campus, 10/21/03

"If you were my teammate, I would have tossed your ass out the door so fast you'd be emitting Cerenkov radiation, but I'm not classy like Dwight." --Jerry

The Friar
Wakka
Posts: 158
Joined: Fri Jul 10, 2009 2:39 pm

Re: Floating two modeling assumptions for FRIAR 2.0

Post by The Friar »

cvdwightw wrote:I might address some other things later, but this isn't a valid assumption. Teams neg independently of their ability to score points. As an extreme example, Maryland negged more than any other team in the nation at ICT and finished atop the second bracket, and all three of the top-negging teams in D2 finished in the top bracket. I don't know that there's a correlation between the number of negs and a team's ability to score points; it has far more to do with a team's aggression, which can help or hurt a team but probably zeroes out in the end.
So, I agree, the observed neg/zero ratio does go up as teams get better.* However, I'm looking at latent, not observed, buzzes, net of things like which team actually succeeds in buzzing first. My intuition is that just buzzing first, whether with a neg or the right answer, is itself associated with being a stronger team, and that, controlling for that appropriately, we can hope to consider negs (whether observed or masked by the other team buzzing first) as suggestive of a weaker team than a zero on the same question would be.

Nonetheless I can get out my Fall '07 notes and look at breaking the independence of irrelevant alternatives assumption with multinomial probit. It should be much less of a challenge to do that than to work around the other assumption as I would like to.

*This was what brought down my original version of FRIAR with negs and powers. That model converged only when the parameters for the effect of observed tens (given no power) and of observed negs on rating were constrained to have opposite sign -- that is, when stronger teams were observed to neg less often (since, of course, stronger teams are observed to score ten more often given no one having powered). Entering SCT after SCT's worth of scoresheets got it through my head that the opposite was true. I thought I observed a more subtle pattern in those sheets, too: strong teams negged before weak teams had buzzed with anything more often than the other way around.

The King's Flight to the Scots
Auron
Posts: 1495
Joined: Mon Jan 26, 2009 11:11 pm

Re: Floating two modeling assumptions for FRIAR 2.0

Post by The King's Flight to the Scots »

So...in layman's terms, what exactly is this program, anyway?
Matt Bollinger
UVA '14, UVA '15
Communications Officer, ACF

cvdwightw
Auron
Posts: 3446
Joined: Tue May 13, 2003 12:46 am
Location: Southern CA

Re: Floating two modeling assumptions for FRIAR 2.0

Post by cvdwightw »

The Friar wrote:So, I agree, the observed neg/zero ratio does go up as teams get better.* However, I'm looking at latent, not observed, buzzes, net of things like which team actually succeeds in buzzing first. My intuition is that just buzzing first, whether with a neg or the right answer, is itself associated with being a stronger team, and that, controlling for that appropriately, we can hope to consider negs (whether observed or masked by the other team buzzing first) as suggestive of a weaker team than a zero on the same question would be.
I don't think you quite grasp what I'm saying here. I'm saying that the frequency at which a team negs is independent of a team's ability to score points. That of course means that a team that negs at a certain rate will neg at a certain rate regardless of how good that team is. Furthermore, teams tend to change strategies depending on the quality of their opponents (many teams play more passively against lower-level teams, knowing that they can on average "sit" for longer and still get the tossup rather than neg and remove all chance) or difficult-to-model factors (for instance, I often neg much more in the first few rounds of tournaments while I am feeling out the difficulty and my ability to figure things out, then neg much less in the afternoon). Therefore any model that includes negs (observed or latent) needs to model them as a strength-independent decrease in the number of tossups a team has a "chance" of answering, not as an inherent indicator of team strength.

The Friar
Wakka
Posts: 158
Joined: Fri Jul 10, 2009 2:39 pm

Re: Floating two modeling assumptions for FRIAR 2.0

Post by The Friar »

cvdwightw wrote:any model that includes negs (observed or latent) needs to model them as a strength-independent decrease in the number of tossups a team has a "chance" of answering, not as an inherent indicator of team strength.
So, in other words, it would be much more realistic to just treat negs as occasions where a team happened to get minus five points and worry only about power, ten, and zero in the modeling of team strength.
over a year ago the Friar wrote:FRIAR looks at negs as simply instances in which a team happened to lose five points (a datum it ignores) while not correctly answering a tossup.

Charbroil
Auron
Posts: 1145
Joined: Fri Jun 09, 2006 11:52 am
Location: St. Charles, MO

Re: Floating two modeling assumptions for FRIAR 2.0

Post by Charbroil »

Cernel Joson wrote:So...in layman's terms, what exactly is this program, anyway?
It's an evaluator of team strength, along the lines of the new NAQT D-Value (or the old S-Value).
Charles Hang
Francis Howell Central '09
St. Charles Community College '14
Washington University in St. Louis '19 (President, 2017-19)

Owner, Olympia Academic Competition Questions, LLC
Question Writer, National Academic Quiz Tournaments, LLC and National History Bee and Bowl

The Friar
Wakka
Posts: 158
Joined: Fri Jul 10, 2009 2:39 pm

Re: Floating two modeling assumptions for FRIAR 2.0

Post by The Friar »

Cernel Joson wrote:So...in layman's terms, what exactly is this program, anyway?
Matt, I'm so sorry. I missed what you asked. Charles got to it before I did. His answer is correct.

The Friar
Wakka
Posts: 158
Joined: Fri Jul 10, 2009 2:39 pm

Re: Floating two modeling assumptions for FRIAR 2.0

Post by The Friar »

Just realized the first proposed simplification would get rid of the need to constrain FRIAR 1.0 specificity parameters to have opposite sign. If there is only a division-wide tossup difficulty, then getting any tossup right ever means you can't be an infinitely bad team. With difficulty parameters per question, some simulated teams
  • didn't ever get any tossups right for which the ten-point difficulty itself was greater than minus infinity, and also
  • never got any bonus points except for bonus points every team got, and
  • didn't neg anything,
which meant negging had to be bad for teams' ratings unless we wanted some teams to be rated minus infinity, causing the optimization routine to crash (and to do so in ways surprising and not always informative).
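The minus-infinity problem is easy to reproduce in a stripped-down setting. This sketch assumes a single right/wrong outcome, a probit link, and one shared difficulty; it is not the FRIAR 1.0 likelihood, and the function and argument names are hypothetical. A team with zero correct answers has a log-likelihood that keeps rising as its rating falls, so no finite rating maximizes it.

```python
import math

def probit(x):
    """Standard normal CDF, accurate in both tails."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def log_likelihood(theta, n_correct, n_heard, difficulty=0.0):
    """Binomial log-likelihood (up to a constant) of a team's rating theta."""
    p_right = probit(theta - difficulty)
    p_wrong = probit(difficulty - theta)  # 1 - p_right, without cancellation
    n_wrong = n_heard - n_correct
    total = 0.0
    if n_correct:
        total += n_correct * math.log(p_right)
    if n_wrong:
        total += n_wrong * math.log(p_wrong)
    return total

# A team that answered none of 20 tossups: lower rating always fits better.
shutout = [log_likelihood(t, 0, 20) for t in (-1.0, -3.0, -6.0)]
# A team that answered 1 of 20: the likelihood peaks at a finite rating.
one_right = [log_likelihood(t, 1, 20) for t in (-6.0, -1.645, 0.0)]
print(shutout)
print(one_right)
```

A single correct answer against a finite difficulty is enough to give the rating a finite optimum, which is the point about division-wide difficulties above.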

I still think the latent-outcome specification might be more appropriate conceptually, but if others (Dwight? Seth? Jeff?) buy in to the idea that we can live with a lack of modeled intra-division variation in question hardnesses, I can finally put the finishing touches on FRIAR 1.0 (Lite) and ship it in the next week or so, as I promised I would by about last weekend.

The second assumption is just totally not necessary, I've decided. It doesn't make my modeling life any simpler than doing things a more agreeable way: latent zero/ten/power outcomes, still considered ordinal, drawn conditional on not drawing a latent neg, with the neg itself drawn from a separate binomial distribution that depends in a different way on team characteristics. Not that the FRIAR 2 project should be on the front burner presently anyhow.
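A rough sketch of that two-stage setup follows, under loudly stated assumptions: the neg stage uses a logistic function of a hypothetical "aggression" parameter that does not involve strength at all (the extreme version of Dwight's point), and the zero/ten/power stage is an ordered probit in a hypothetical strength theta with two made-up cutpoints. None of these names or functional forms are settled parts of FRIAR 2.

```python
import math

def probit(x):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def logistic(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

def two_stage_probs(theta, aggression, cutpoints, slope=1.0):
    """Outcome probabilities when negs are drawn separately from strength.

    Stage 1: latent neg with probability logistic(aggression).
    Stage 2: given no neg, zero/ten/power from an ordered probit in theta.
    """
    c1, c2 = cutpoints
    if not c1 < c2:
        raise ValueError("cutpoints must be strictly increasing")
    p_neg = logistic(aggression)
    cum_zero = probit(slope * (c1 - theta))
    cum_ten = probit(slope * (c2 - theta))
    conditional = {
        "zero": cum_zero,
        "ten": cum_ten - cum_zero,
        "power": 1.0 - cum_ten,
    }
    probs = {"neg": p_neg}
    for outcome, p in conditional.items():
        probs[outcome] = (1.0 - p_neg) * p
    return probs

# Same hypothetical aggression, different strengths.
weak = two_stage_probs(-1.0, aggression=-2.0, cutpoints=(0.0, 1.5))
strong = two_stage_probs(1.0, aggression=-2.0, cutpoints=(0.0, 1.5))
print(weak)
print(strong)
```

Here the neg rate is identical for both teams, while the stronger team shifts its remaining probability from zero toward ten and power. Letting aggression depend on other team characteristics would only change stage 1.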

I apologize again for jerking everyone around. I wish these things would occur to me in a more timely fashion.

Important Bird Area
Forums Staff: Administrator
Posts: 5609
Joined: Thu Aug 28, 2003 3:33 pm
Location: San Francisco Bay Area

Re: Floating two modeling assumptions for FRIAR 2.0

Post by Important Bird Area »

The Friar wrote:I still think the latent-outcome specification might be more appropriate conceptually, but if others (Dwight? Seth? Jeff?) buy in to the idea that we can live with a lack of modeled intra-division variation in question hardnesses, I can finally put the finishing touches on FRIAR 1.0 (Lite) and ship it in the next week or so, as I promised I would by about last weekend.
The math on this passed me by ages ago. Dwight?
Jeff Hoppes
President, Northern California Quiz Bowl Alliance
former HSQB Chief Admin (2012-13)
VP for Communication and history subject editor, NAQT
Editor emeritus, ACF

"I wish to make some kind of joke about Jeff's love of birds, but I always fear he'll turn them on me Hitchcock-style." -Fred
