First, would you accept a question-level model in which difficulty parameters were estimated not for every single question, but only for each division (DI only, crossover, DII only)? This assumption has already failed to raise eyebrows in DP-FRIAR -- though perhaps it has just been drowned out by that model's even more serious issues -- but I have not asked about incorporating it into a model whose data points are at the question level rather than the whole-game level.
This modeling strategy would be equivalent to assuming that each type of difficulty (i.e., of powering, of answering correctly, of not negging) is approximately normally distributed within division, and it works best if the within-division variance in question difficulties is small compared to the between-division variance. The model I am considering contains an appropriate "sink" for the within-division variance that is treated as random error rather than explicitly modeled: it should drive down the specificity parameter (the slope of the item response curve) relative to what that parameter would be if individual question difficulties were estimated.
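To make the "sink" concrete, here is a minimal sketch of a logistic item response curve that uses a division-level difficulty in place of per-question difficulties. The function name, parameterization, and numbers are purely illustrative -- this is not FRIAR's actual functional form:

```python
import math

def p_correct(ability, division_difficulty, specificity):
    """Probability a team of the given ability converts a tossup, using a
    single division-level difficulty instead of a per-question one.
    The specificity parameter is the slope of the item response curve."""
    return 1.0 / (1.0 + math.exp(-specificity * (ability - division_difficulty)))

# A team one unit above its division's difficulty, with slope 1.5:
print(round(p_correct(1.0, 0.0, 1.5), 3))  # 0.818
```

Any within-division spread in true question difficulties that goes unmodeled flattens the observed response curve, so the fitted specificity comes out smaller than it would under per-question difficulties -- which is exactly the "sink" behavior described above.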
Not having to estimate a difficulty level for each individual question should reduce the number of estimated parameters by an order of magnitude. Seriously. My best guess is that FRIAR or FRIAR 2 with individual question difficulties will have about 3300 parameters for a full SCT of data, while with divisional difficulties it will have about 300 (or however many total teams there are, plus about 13). This will make it possible to optimize the model much faster once it is coded. It will also make the model much easier to code: with individual question difficulties, it is necessary to check which parameters can actually be estimated for each question and work around those that cannot, because on several questions -- and just one is enough to wreck the optimizer! -- not all of the possible outcomes even occurred, owing to the small number of times each one was played. Finally, if the assumptions do indeed hold, estimating fewer parameters means that the ones that are estimated are estimated more accurately (though not necessarily more precisely).
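The order-of-magnitude claim is back-of-the-envelope arithmetic; all of the counts below are my own illustrative guesses (question count, team count, slope count), not official figures from either model:

```python
# Illustrative parameter-count comparison. Every number here is an
# assumption for the sake of the arithmetic, not a count from FRIAR.
n_questions = 1000        # assumed tossups in a full SCT of data
n_teams = 290             # assumed total teams
n_divisions = 3           # DI only, crossover, DII only
n_difficulty_types = 3    # powering, answering correctly, not negging
n_other = 4               # assumed slopes / shared parameters

per_question_model = n_questions * n_difficulty_types + n_teams
divisional_model = n_divisions * n_difficulty_types + n_other + n_teams
print(per_question_model, divisional_model)  # 3290 303
```

Under these guesses the two models differ by roughly a factor of ten, with the divisional model's count dominated by the team parameters, matching the "however many total teams there are, plus about 13" description.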
Second, would you be comfortable saying that, net of the effect of the other team -- as if it were playing against hypothetical buzzer rocks -- a better team is more likely than a worse one:
- to score zero rather than neg;
- to score ten rather than zero; and
- to power rather than score ten?
If this ordinal relationship holds -- that is, if a power is better evidence that you're a strong team than ten points, ten points better evidence than zero, and zero better evidence than a neg -- then I don't have to break out the multinomial logit (or probit -- FRIAR 2.0 may end up with a probit link) setup at any point. (The spoiler: there is a step in which each team's latent or "natural" tossup outcome is drawn simultaneously; the latent outcome is observed perfectly only for the first team to buzz in most cases, for neither team in a couple of cases, and for both teams only when both latent outcomes are zero.)
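For concreteness, here is a sketch of the ordered-logit setup that the ordinal assumption buys (a hypothetical parameterization, not FRIAR's actual code): a single team-strength index plus increasing cutpoints yields probabilities over neg < zero < ten < power, with no multinomial machinery:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def outcome_probs(strength, cutpoints):
    """Ordered-logit probabilities over (neg, zero, ten, power).

    cutpoints must be increasing; the probability of landing above
    cutpoint k is sigmoid(strength - cutpoints[k])."""
    exceed = [sigmoid(strength - c) for c in cutpoints]
    probs = [1.0 - exceed[0]]                                   # P(neg)
    probs += [exceed[i] - exceed[i + 1] for i in range(len(exceed) - 1)]
    probs.append(exceed[-1])                                    # P(power)
    return probs

# Illustrative cutpoints; a stronger team shifts mass toward power.
probs = outcome_probs(0.5, [-2.0, 0.0, 2.0])
print(round(sum(probs), 6))  # 1.0
```

Because one strength parameter drives all four outcome probabilities, each additional outcome category costs only one extra cutpoint rather than a full set of multinomial coefficients -- which is exactly what the ordinal assumption is buying.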