A Different Way to Quantify Buzzing

evilmonkey
Yuna
Posts: 964
Joined: Mon Sep 25, 2006 11:23 am
Location: Durham, NC

A Different Way to Quantify Buzzing

Post by evilmonkey »

I just typed up a longer post explaining what these numbers are and how/why I came up with them, but my internet died while submitting, and that post disappeared into the ether. I don't have time to retype that before Good Friday services, so maybe I'll do it later, but here they are for now. They are based on data from ICT 2012; the middle column is the probability of buzzing first on any given question when playing an average team in the field - the probability of buzzing first on any given question against any other team can be calculated with the Bill James log5 formula (with these numbers acting as the percent of time a team "wins" by buzzing first). The last column is the probability of a neg given that the team buzzed in first.
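The log5 calculation described above is the standard Bill James formula. As a quick sketch (my Python, not the author's code), here it is applied to the Virginia A and Yale numbers from the table below:

```python
def log5(p_i, p_j):
    """Probability that team i 'wins' the buzzer race against team j,
    given each team's probability of buzzing first vs. an average team."""
    return (p_i * (1 - p_j)) / (p_i * (1 - p_j) + p_j * (1 - p_i))

# P[Buzz First] values for Virginia A and Yale from the table below
uva_vs_yale = log5(0.8817422, 0.8189777)
# ≈ 0.622: Virginia A would be expected to buzz first on about 62%
# of tossups in a head-to-head game against Yale
```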

Code: Select all

               Team P[Buzz First] P[Neg|Buzz First]
29       Virginia A     0.8817422        0.15639810
32             Yale     0.8189777        0.09142857
15         Maryland     0.8041403        0.24705882
16       Michigan A     0.7965477        0.15469613
13       Illinois A     0.7932358        0.20858896
6         Chicago A     0.7906004        0.25170068
27     UC San Diego     0.7891636        0.18644068
23             Penn     0.7764012        0.25465839
12          Harvard     0.7538409        0.20408163
3             Brown     0.7054706        0.32061069
18        Minnesota     0.6534550        0.20869565
1           Alabama     0.6359473        0.20689655
19              MIT     0.6186067        0.12380952
9          Columbia     0.5989403        0.20149254
10     Georgia Tech     0.5770207        0.26315789
31            WUSTL     0.5628401        0.31868132
30       Virginia B     0.5363577        0.27966102
21       Ohio St. A     0.5262415        0.28571429
4  Carleton College     0.4967302        0.24509804
7         Chicago B     0.4079407        0.27433628
17       Michigan B     0.3845222        0.35064935
20     Northwestern     0.3745670        0.36363636
28              VCU     0.3666246        0.18556701
26          Toronto     0.3497183        0.43023256
24              RPI     0.3361205        0.19753086
22       Ohio St. B     0.3350613        0.40540541
14       Illinois B     0.3152851        0.35802469
5   Carnegie Mellon     0.3114949        0.27777778
25        Texas A&M     0.2364728        0.50769231
2       Arizona St.     0.1732829        0.51428571
11           Guelph     0.1529954        0.42857143
8         Chicago C     0.0179671        0.83333333


Bryce Durgin
Culver Academies '07
University of Notre Dame '11
Texas A&M '15
evilmonkey
Yuna
Posts: 964
Joined: Mon Sep 25, 2006 11:23 am
Location: Durham, NC

Re: A Different Way to Quantify Buzzing

Post by evilmonkey »

Things I would appreciate responses on:

1) Has anyone done work along this line of thinking?

2) Given the limitations in the data available, I calculated each team's "First Buzzes" as Powers+Negs+max(0,Tens-Opp.Negs). I'm aware that this ignores certain situations. Would anyone care to comment on what they believe the effect would be, and is there a better way to estimate those values?

3) Does anything look particularly out of the ordinary?
Bryce Durgin
Culver Academies '07
University of Notre Dame '11
Texas A&M '15
Sam
Rikku
Posts: 338
Joined: Sat Nov 07, 2009 2:35 am

Re: A Different Way to Quantify Buzzing

Post by Sam »

This looks potentially interesting, but what new information does it provide, exactly? Based on the numbers I would guess UVA has a better chance of beating Chicago C than vice versa, but it seems PPG would show the same thing.
Sam Bailey
Minnesota '21
Chicago '13
bradleykirksey
Wakka
Posts: 187
Joined: Sat Nov 12, 2011 5:09 pm

Re: A Different Way to Quantify Buzzing

Post by bradleykirksey »

This is a total shot in the dark, but here's my guess.

With this information, a team of relatively unknown ability (UCF, for example) would know that, if they want any chance of winning, they really need to get aggressive on the buzzers to beat UVA, but against Chicago C they could probably sit back and let them neg. The reason it provides better information than PPG is that we could play more conservatively against Brown, which negs a third of the time it buzzes in first, than against MIT, which negs only 12% of the time, even though Brown finished higher with presumably more PPG.

Of course, I stopped at college algebra. If some of you that can count beyond 11 want to correct me, I'll gladly recant.
Bradley Kirksey
Mayor of quiz bowl at the University of Central Florida (2010-2015)
The club at Reformed Theology Seminary Orlando (2017 - 2021)
evilmonkey
Yuna
Posts: 964
Joined: Mon Sep 25, 2006 11:23 am
Location: Durham, NC

Re: A Different Way to Quantify Buzzing

Post by evilmonkey »

Sam, you bring up a good point. Is this measure at all informative by itself? I honestly hadn't considered that too closely. My initial motivation was the desire to run more accurate simulations of the 2013 ICT. I realized that before that task could be undertaken, I would have to create a set of statistics to quantify buzzing. When I came up with a measure, I excitedly and impulsively posted it.

So, is the measure useful? I'd actually like the community's help with this. I am a fairly bad quizbowl player, and for this reason I have no frame of reference with which to judge whether this measure is helpful. Translated using the log5 formula, the numbers would indicate that in a game between last year's incarnations of UVA A and Yale, Virginia would be expected to buzz first on 62% of the tossups. Does that seem to jibe with what you have seen? In the next sections of the post, I go a bit more in depth into what I created, and then spitball some thoughts on how it could be useful. Most of the remarks are off-the-cuff attempts to support the measure. However, even if this measure fails to be useful on its own, I think it still works as a stepping stone for simulation.

The Probability of First Buzz was calculated using an optimization function. The objective function that was minimized was -Sum(log(P[x=FB_ki, n=FB_ki+FB_kj, p=log5(p_i,p_j)])). In words: the negative sum of the natural log of the binomial pmf for the calculated total and team i "first buzzes" during game k, with probability of success given as p=log5(p_i,p_j), where p_i and p_j are the respective probabilities that Team i and Team j would buzz first against an "average" team, and log5 is a function that outputs (p_i*(1-p_j))/((p_i*(1-p_j))+(p_j*(1-p_i))). The 32 p_i are the parameters over which the function was optimized. I transformed the parameters so that the optimization ran over the entire real line, with results that could then be mapped back to the (0,1) interval.
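The fitting procedure described above can be sketched in Python (the author's actual code, which appears to have been R, isn't shown; the toy schedule here is made up, and the identifiability of the strengths up to a common scale is ignored for simplicity):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import binom

def log5(p_i, p_j):
    # Probability that team i beats team j to the buzzer (Bill James log5)
    return (p_i * (1 - p_j)) / (p_i * (1 - p_j) + p_j * (1 - p_i))

def neg_log_likelihood(theta, games):
    # theta are unconstrained reals; a logistic maps them into (0, 1)
    p = 1.0 / (1.0 + np.exp(-theta))
    nll = 0.0
    for i, j, fb_i, fb_j in games:
        n = fb_i + fb_j  # contested tossups in this game
        nll -= binom.logpmf(fb_i, n, log5(p[i], p[j]))
    return nll

# Hypothetical toy schedule: (team i, team j, first buzzes for i, for j)
games = [(0, 1, 14, 6), (0, 2, 15, 5), (1, 2, 11, 9)]
res = minimize(neg_log_likelihood, np.zeros(3), args=(games,), method="BFGS")
p_hat = 1.0 / (1.0 + np.exp(-res.x))  # fitted "buzz first vs. average" strengths
```

With a real 32-team ICT field, `games` would hold every game played, and the fitted `p_hat` would be the middle column of the table.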

I think that this measure tells us something different than points per game, although it can be easily translated to that. It is instead a measure of aggression. This informs our understanding of what "average team" means - it isn't the team with an average amount of knowledge, but instead an average amount of aggression.

While I don't know much about different styles of play among great players, one characteristic I have heard is that Chris Ray is very aggressive. During last year's ICT, despite having only 1.5 PPB less and 2 fewer powers per game than Yale, Maryland had 80 fewer PP20TUH. My measure tells us that Maryland buzzes in first on a question just as often as Yale does. In a 20 tossup game against an average ICT team, both teams would expect to buzz in first on 16 of the 20 tossups. The difference between them is that Yale is only wrong about 9% of the time, while Maryland negs 25% of the time. Yale also has a higher power rate (displayed below). Maryland would be expected to go 4/8/4 on their 16 tossups, while Yale would be expected to go 6/8/2. Using their respective PPB, the expected points per twenty tossups against an average team would be 440 for Yale and 340 for Maryland - indicating that the simpler PP20TUH from ICT slightly understates how much better Yale is. It may be beneficial for Chris Ray to be a tad less aggressive.

On my end of the tournament, it yields the result that the bottom teams tend to give the game away through negging. While A&M would be expected to buzz first on 4-5 tossups per 20 TUH against the average team, we also expect to neg 2-3 of those. It is likely that when we end up winning at ICT, it is either because the other team negs itself out of the game (like Illinois B did), or because they simply don't know enough to pick up what we've negged (like Guelph and Chicago C). Trevor Davis, on the other hand, greatly over-achieved by sticking with what he knows and not negging. By points per bonus, he ranked 26th; by aggression, he ranked 28th. He tied for 19th, however, because he was consistent and negged very little. Looking more closely at these stats, however, it is clear that dead tossups play a much larger role at this end than at the top end.

This exercise is a bit discouraging, since I reluctantly would conclude that this measure is not very useful by itself. Even combined with power rate and neg rate, it doesn't tell us an extraordinary amount of information. I think it is too early, however, to completely dismiss it. It is definitely an improvement for the purpose of running simulations, should that ever happen.

Code: Select all

               Team P[Buzz First] P[Neg|Buzz First] P[Correct First Buzz] P[Power|First Buzz]
32             Yale     0.8189777        0.09142857           0.744099716           0.3771429
29       Virginia A     0.8817422        0.15639810           0.743839357           0.4170616
16       Michigan A     0.7965477        0.15469613           0.673324855           0.3480663
27     UC San Diego     0.7891636        0.18644068           0.642031367           0.2316384
13       Illinois A     0.7932358        0.20858896           0.627775571           0.2944785
15         Maryland     0.8041403        0.24705882           0.605470345           0.2705882
12          Harvard     0.7538409        0.20408163           0.599995849           0.3741497
6         Chicago A     0.7906004        0.25170068           0.591605741           0.3537415
23             Penn     0.7764012        0.25465839           0.578684110           0.3478261
19              MIT     0.6186067        0.12380952           0.542017278           0.4380952
18        Minnesota     0.6534550        0.20869565           0.517081802           0.3826087
1           Alabama     0.6359473        0.20689655           0.504371993           0.2241379
3             Brown     0.7054706        0.32061069           0.479289159           0.2977099
9          Columbia     0.5989403        0.20149254           0.478258272           0.1567164
10     Georgia Tech     0.5770207        0.26315789           0.425173136           0.2807018
30       Virginia B     0.5363577        0.27966102           0.386359354           0.2118644
31            WUSTL     0.5628401        0.31868132           0.383473482           0.2197802
21       Ohio St. A     0.5262415        0.28571429           0.375886798           0.2307692
4  Carleton College     0.4967302        0.24509804           0.374982621           0.2843137
28              VCU     0.3666246        0.18556701           0.298591151           0.1649485
7         Chicago B     0.4079407        0.27433628           0.296027760           0.1946903
24              RPI     0.3361205        0.19753086           0.269726295           0.1358025
17       Michigan B     0.3845222        0.35064935           0.249689747           0.1948052
20     Northwestern     0.3745670        0.36363636           0.238360850           0.2500000
5   Carnegie Mellon     0.3114949        0.27777778           0.224968555           0.1944444
14       Illinois B     0.3152851        0.35802469           0.202405259           0.1481481
26          Toronto     0.3497183        0.43023256           0.199258095           0.2325581
22       Ohio St. B     0.3350613        0.40540541           0.199225666           0.2432432
25        Texas A&M     0.2364728        0.50769231           0.116417388           0.2461538
11           Guelph     0.1529954        0.42857143           0.087425962           0.3095238
2       Arizona St.     0.1732829        0.51428571           0.084165957           0.3428571
8         Chicago C     0.0179671        0.83333333           0.002994517           0.1666667
Bryce Durgin
Culver Academies '07
University of Notre Dame '11
Texas A&M '15
Sam
Rikku
Posts: 338
Joined: Sat Nov 07, 2009 2:35 am

Re: A Different Way to Quantify Buzzing

Post by Sam »

evilmonkey wrote: This exercise is a bit discouraging, since I reluctantly would conclude that this measure is not very useful by itself. Even combined with power rate and neg rate, it doesn't tell us an extraordinary amount of information. I think it is too early, however, to completely dismiss it. It is definitely an improvement for the purpose of running simulations, should that ever happen.
I certainly didn't mean to dismiss it, especially if it is just a part of a larger simulation scheme. I just wasn't sure what to do with the data as initially presented. "A measure of aggression" seems a fair way to describe it, though I suspect there would still be problems differentiating between "aggression" and "knowing the answer first."
Sam Bailey
Minnesota '21
Chicago '13
evilmonkey
Yuna
Posts: 964
Joined: Mon Sep 25, 2006 11:23 am
Location: Durham, NC

Re: A Different Way to Quantify Buzzing

Post by evilmonkey »

Did anyone else keep track of their scores by rounds at the 2013 ICT? Would any of you like to count up and PM me your bonus distribution (0s, 10s, 20s, 30s)? It took me about 5-10 minutes to do this for A&M (23-48-25-6). As mentioned previously, I'm interested in bonus distribution as a function of points per bonus.
Bryce Durgin
Culver Academies '07
University of Notre Dame '11
Texas A&M '15
Frater Taciturnus
Auron
Posts: 2463
Joined: Mon Dec 12, 2005 1:26 pm
Location: Richmond, VA

Re: A Different Way to Quantify Buzzing

Post by Frater Taciturnus »

evilmonkey wrote:Did anyone else keep track of their scores by rounds at the 2013 ICT? Would any of you like to count up and PM me your bonus distribution (0s, 10s, 20s, 30s)? It took me about 5-10 minutes to do this for A&M (23-48-25-6). As mentioned previously, I'm interested in bonus distribution as a function of points per bonus.
VCU went 17,53,28,6
Janet Berry
[email protected]
she/they
--------------
J. Sargeant Reynolds CC 2008, 2009, 2014
Virginia Commonwealth 2010, 2011, 2012, 2013,
Douglas Freeman 2005, 2006, 2007
evilmonkey
Yuna
Posts: 964
Joined: Mon Sep 25, 2006 11:23 am
Location: Durham, NC

Re: A Different Way to Quantify Buzzing

Post by evilmonkey »

Howdy!

So, in the absence of bonus distribution data to play with (thank you to VCU and Delaware for sharing yours; anyone else would be cool as well), I set about to see whether I could make something out of the numbers that I had created. After banging my head against the wall some more considering the problem of trying to estimate the proportion of questions that a team knows by the end, I decided to take a stab at my eventual end product: expected points per game against an average team. I calculated this using the 2013 Division I ICT information. I calculated it with:

ExpPP20TUH = 20*(P[Correct First Buzz]*(10+PPB) + 5*P[Power|First Buzz]*P[Buzz First] - 5*P[Neg|First Buzz]*P[Buzz First]).
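As a sanity check, the formula can be coded up directly (a Python sketch, not the author's code) and run against the Yale row from the table below:

```python
def exp_pp20tuh(p_fb, p_neg_fb, p_correct_fb, p_power_fb, ppb):
    """Expected points per 20 tossups against an average team."""
    return 20 * (p_correct_fb * (10 + ppb)   # tossup + bonus on correct first buzzes
                 + 5 * p_power_fb * p_fb     # +5 extra for each power
                 - 5 * p_neg_fb * p_fb)      # -5 for each neg

# Yale's row from the table below:
# P[First Buzz]=0.8612615, P[Neg|FB]=0.1061453,
# P[Correct FB]=0.76984267, P[Power|FB]=0.2960894, PPB=19.24
yale = exp_pp20tuh(0.8612615, 0.1061453, 0.76984267, 0.2960894, 19.24)
# ≈ 466.56, matching the Exp PP20TUH column
```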

For reference, the mean ExpPP20TUH is 204.06. I am uncertain as to whether this number can be interpreted by saying "UVA would beat Penn by 18 points on average" - my gut instinct is to say no. The comparisons might be accurate for MIT and Rice, who are right below the mean.

It should also be noted that, since I have not yet found a way to account for dead tossups, these numbers are a bit high; moreover, the degree of overstatement is highly correlated with expectation: in general, better teams have a lower number of dead tossups that have not been accounted for.


What do you guys think? I think it has to be an awful measure, since it claims that Texas A&M isn't really a third bracket team. (That statement is tongue-in-cheek, obviously) EDIT: Well, it no longer claims that. Shoot. My non-joke is really not funny in the slightest now.

Code: Select all

              Team Exp PP20TUH P[First Buzz] P[Neg|FB] P[Correct FB] P[Power|FB]   PPB
32            Yale   466.56315     0.8612615 0.1061453    0.76984267   0.2960894 19.24
29      Virginia A   448.42339     0.8239365 0.1213873    0.72392109   0.4046243 19.36
23            Penn   430.71585     0.8409878 0.1767956    0.69230487   0.3591160 20.00
10        Illinois   378.99053     0.7915298 0.1666667    0.65960813   0.3397436 17.69
14      Michigan A   376.56188     0.7644029 0.1851852    0.62284684   0.4148148 18.82
17     Minnesota A   329.96662     0.7538020 0.1782946    0.61940322   0.2170543 16.40
20             NYU   326.31126     0.7519802 0.1400000    0.64670294   0.2000000 14.88
11        Maryland   293.19202     0.7616013 0.2587413    0.56454365   0.2657343 15.92
9          Harvard   292.85322     0.7637375 0.2711864    0.55662227   0.2881356 16.19
13            MCTC   265.43743     0.5958966 0.1491228    0.50703481   0.2368421 15.66
4        Chicago A   243.43050     0.6375332 0.2826087    0.45736076   0.3188406 16.36
1          Alabama   241.81836     0.6325986 0.2389381    0.48144669   0.2212389 15.23
21      Ohio State   232.58818     0.5930035 0.2336449    0.45445131   0.2336449 15.59
6         Columbia   231.38623     0.6161853 0.2368421    0.47024669   0.1929825 14.89
5        Chicago B   214.98576     0.5423773 0.2325581    0.41624307   0.3023256 15.37
19             MIT   200.26021     0.5489205 0.2446809    0.41461015   0.2553191 14.08
24            Rice   200.21999     0.5492562 0.2327586    0.42141211   0.1293103 14.43
16    Michigan St.   186.63238     0.4609169 0.2253521    0.35704827   0.3098592 15.59
27     UC Berkeley   184.55544     0.5871718 0.3515625    0.38074421   0.2265625 15.20
30      Virginia B   166.09881     0.4976684 0.3008850    0.34792747   0.3539823 13.49
22          Ottawa   149.17790     0.4802894 0.2909091    0.34056887   0.2272727 12.35
28             VCU   115.76472     0.3577402 0.2352941    0.27356601   0.0882353 12.12
3  Central Florida   108.23132     0.3545036 0.2597403    0.26242471   0.1948052 11.06
25       Texas A&M    83.68603     0.3668786 0.4305556    0.20891700   0.2777778 11.37
31           WUSTL    82.97499     0.3245447 0.3536585    0.20976669   0.2073171 10.91
15      Michigan B    80.73792     0.3245334 0.3815789    0.20069826   0.1842105 11.71
7          Cornell    64.97371     0.3138682 0.4603175    0.16938920   0.1428571 12.12
18     Minnesota B    64.39908     0.2882893 0.3835616    0.17771256   0.1232877 10.23
8         Delaware    33.25007     0.1626171 0.3947368    0.09842617   0.1842105  8.63
26    Truman State    29.48192     0.1050581 0.2500000    0.07879354   0.2812500  8.50
12        Mcmaster    25.10165     0.1186248 0.3783784    0.07373975   0.2702703  7.89
2          Buffalo    21.58816     0.1236810 0.4411765    0.06911588   0.3235294  6.67
Last edited by evilmonkey on Mon Apr 22, 2013 5:29 pm, edited 2 times in total.
Bryce Durgin
Culver Academies '07
University of Notre Dame '11
Texas A&M '15
Fond du lac operon
Wakka
Posts: 228
Joined: Tue Jan 31, 2012 8:02 pm

Re: A Different Way to Quantify Buzzing

Post by Fond du lac operon »

Michigan A is shockingly low, probably because their PPB is listed as under 12 (I'm assuming this is an input error, though). Other than that, it seems to track actual performance pretty closely, and looks solid.
Harrison Brown
Centennial '08, Alabama '13

"No idea what [he's] talking about."
evilmonkey
Yuna
Posts: 964
Joined: Mon Sep 25, 2006 11:23 am
Location: Durham, NC

Re: A Different Way to Quantify Buzzing

Post by evilmonkey »

Fond du lac operon wrote:Michigan A is shockingly low, probably because their PPB is listed as under 12. (I'm assuming this is an input error, though).
Oops, yea, that's wrong. Thanks for the catch. It is fixed above. Funny how alphabetizing fails to simplify data entry when you type Minneapolis CTC in one place, and MCTC in another.
Bryce Durgin
Culver Academies '07
University of Notre Dame '11
Texas A&M '15
theMoMA
Forums Staff: Administrator
Posts: 5993
Joined: Mon Oct 23, 2006 2:00 am

Re: A Different Way to Quantify Buzzing

Post by theMoMA »

This seems heavily schedule-dependent. For example, the top teams in the second bracket will have half a tournament in which they're much more likely to get first correct buzzes, because their schedule will consist of teams less likely to buzz before them. This means that comparing those teams to the lower teams in the top bracket probably isn't entirely fair. (For example, the comparison between NYU and Ohio State is probably affected by the fact that Ohio State played a lot more teams capable of beating them to being the first team to buzz, thus driving down their first-buzz percentage.)
Andrew Hart
Minnesota alum
evilmonkey
Yuna
Posts: 964
Joined: Mon Sep 25, 2006 11:23 am
Location: Durham, NC

Re: A Different Way to Quantify Buzzing

Post by evilmonkey »

theMoMA wrote:This seems heavily schedule-dependent. For example, the top teams in the second bracket will have half a tournament in which they're much more likely to get first correct buzzes, because their schedule will consist of teams less likely to buzz before them. This means that comparing those teams to the lower teams in the top bracket probably isn't entirely fair. (For example, the comparison between NYU and Ohio State is probably affected by the fact that Ohio State played a lot more teams capable of beating them to being the first team to buzz, thus driving down their first-buzz percentage.)
Theoretically, the teams ought to be interconnected enough that the optimization procedure self-corrects for the schedule. I'm going to spend the rest of this post trying to come up with a rationalization of why NYU and Harvard are ranked above Ohio State.

We can't look directly at NYU vs. Ohio State, so we'll get there in two steps. First, we can look at Harvard vs. Ohio State. Despite Ohio State winning both games against Harvard, Harvard is listed 60 points higher. This is due, in part, to Harvard having a much higher first buzz score. Looking over their actual statistics, I have no problem with that. Over the course of their shared prelim bracket, Harvard in general had more buzzes than Ohio State. In fact, Harvard had more buzzes than Ohio State in both games they played, 13-10 and 18-10, even though Ohio State did not neg (meaning that some of OSU's buzzes are second buzzes after negs, while none of Harvard's are).

It is also worth noting that in both of those games, Harvard had an inordinate number of negs, and Ohio State outperformed their average PPB in both games. (Really, if you haven't already, you should go look at how ridiculous the UG championship was. Despite 8 negs from Harvard, and Ohio State having their second best bonus conversion of the day, Ohio State only won by 40). In fact, I think you might conclude that the two teams are fairly similar strength, and had Harvard not played with such reckless abandon in those two games, they would have been the team in the top bracket or the UG champions.

We can also compare Harvard and NYU directly. Looking over the game-by-game stats from the second playoff bracket, it seems like NYU should have a somewhat smaller first buzz percentage, but their low neg rate indicates that Yogesh and Aaron just know when they ought to be buzzing. 5 of NYU's 7 games against second bracket teams were triple-digit wins; 3 of Harvard's were. So I conclude that NYU is a stronger team than Harvard, and therefore likely a stronger team than Ohio State.

I'm not sure I've managed to convince myself that Harvard ought to be higher than Ohio State. I mean, I see the argument now, but I'm also aware that people tend to accept those facts that support their beliefs and reject those that oppose them. If Harvard is in fact better than Ohio State, then this statistic is vindicated, so I'm going to tend to observe those statistics which support my case. The only solution is to have NAQT just write another full ICT (for free), and have Harvard and Ohio State play all 15 packets against each other, so we have a more adequate sample size. :grin:

A few things could also be awry with the numbers: it could be that Ohio State shouldn't be expected to have 3 fewer buzzes per game, and the optimization procedure is being influenced by the differing schedules when teams are in different playoff brackets. There could also be issues with the way I am defining Neg Rate or Power Rate, leading teams to get too many or too few tossups. Or the issue may arise from Exp PP20TUH's failure to account for dead tossups, if for example Ohio State happens to have a wider breadth of knowledge than Harvard.
Bryce Durgin
Culver Academies '07
University of Notre Dame '11
Texas A&M '15
evilmonkey
Yuna
Posts: 964
Joined: Mon Sep 25, 2006 11:23 am
Location: Durham, NC

Re: A Different Way to Quantify Buzzing

Post by evilmonkey »

I also think that there are some places where you can point to the algorithm accounting for schedule differences. If you look at Cal and Ohio State, over the course of the day, Cal had 74 Powers or Negs, and 172 total buzzes in 13 games; Ohio State had 50 Powers or Negs, and 170 total buzzes, in 14 games. Yet the algorithm set Ohio State as the more aggressive team, in part because it is accounting for the schedule. In the six prelim games not against each other, Cal had 33 Powers or Negs and 45 other tossups, while Ohio State had 26 Powers or Negs, and 62 other tossups. I would guess that these teams are fairly similar in aggression, and are distinguished by the fact that Ohio State actually knows more stuff.
Bryce Durgin
Culver Academies '07
University of Notre Dame '11
Texas A&M '15
theMoMA
Forums Staff: Administrator
Posts: 5993
Joined: Mon Oct 23, 2006 2:00 am

Re: A Different Way to Quantify Buzzing

Post by theMoMA »

I'm not sure I understand what you mean by optimization procedure. (I'm not completely familiar with the log5 formula, so perhaps there's some built-in correction that I don't quite understand.)

I also wonder if this stat punishes negs enough. It seems like a neg is the ultimate bad event in quizbowl because it basically guarantees the other team 15 points (your -5 and their 10) plus the other team's bonus conversion, while also locking out all other players on your team from getting a shot at answering the tossup later. Imagine you have two teams that both buzz in first (correctly) 60% of the time; a team that buzzes first 85% but negs 25% will probably lose way more games than a team that buzzes first 60% and negs 0%. (At least it seems that way to me intuitively; I could be wrong though.)
Andrew Hart
Minnesota alum
evilmonkey
Yuna
Posts: 964
Joined: Mon Sep 25, 2006 11:23 am
Location: Durham, NC

Re: A Different Way to Quantify Buzzing

Post by evilmonkey »

So, I'm not sure how much statistical theory you know, so bear with me if you know any of this.

The log5 formula comes from baseball. Let p1 be the probability that a team wins a game, where you could be playing against any team in the league. Let p2 be that probability for another team. The log5 formula simply gives a method for calculating the probability of team 1 winning a game against team two, given those two probabilities.

In this case, the "game" of interest is a single question. I'm using a simplified model of a game, making the assumption that every tossup between a given set of two teams has the same chance of team 1 buzzing in first. That probability is equal to log5(p1,p2). In this case, p1 is the probability that team 1 buzzes first on the tossup, if you don't know anything about the question category or the opposing team. Every game at ICT can be seen as the result of n Bernoulli trials, where n is the number of tossups heard, and the expected number of first buzzes for team 1 in the whole game follows a binomial distribution with parameters n=n and p=log5(p1,p2).

Now, obviously, all of our p_i's are unknown, for i=1,...,32. However, given what we already know, we can use maximum likelihood estimation to figure out values for our p_i's (and, given those values, then be able to determine what the probability of getting a tossup given two specific teams is). Normally, we think of a probability function as the chance of a given set of outcomes happening, given specified parameters. Since each game is independent of every other game (more or less), if we knew our 32 p_i's, then the probability that our specific numbers of first buzzes occurred would be the product of the binomial(n_{ij},log5(p_i,p_j)) probabilities for every (i,j) such that a game between those teams happened.

The likelihood function, then, turns that around. It states that, given our specific outcomes, we can calculate the likelihood that they arose from particular parameter values. Using an optimization algorithm, we can then let the computer figure out what parameter values maximize this likelihood function; i.e., for what parameter values the probability of getting these specific values is highest. This is the most common way of estimating an unknown parameter, and under certain conditions it has very nice properties.

While I haven't sat down and proved that this particular likelihood function fulfills those regularity conditions, I have a feeling that it does. In particular, I do know that the parameter p for a Bernoulli trial has an unbiased Maximum Likelihood Estimate (MLE), so I think we're in the clear. This means that asymptotically, the distribution of sample parameter estimates should follow a normal distribution, with mean equal to the "true" parameter value, and with some standard deviation that could potentially be calculated.

Moreover, as long as every team is well-connected (as the teams at ICT are), I believe that this procedure should be robust against varying schedule strengths. I suppose I can put some time into researching whether that's actually true (though it may be simpler to just simulate a sample of infinite ICTs, under a set of known parameters and varying schedule strength conditions, take those simulations, calculate the parameter estimates using my procedures, and note the error).

Also, delving into your proposed scenario provided a somewhat surprising result. For an 85% first buzzer vs. a 60% first buzzer, the log5 formula indicates that the first team will buzz first on approximately 79% of tossups (so, like, Penn with a single additional neg per game, versus MCTC with fewer negs). In that scenario, Penn would buzz first on 16 TU, negging 4; Rob Carson, besides swooping in to pick up the remains of Eric's imaginary failure, would also have buzzed first on 4 tossups. That still leaves Penn with an expected 12 tossups to Fake MCTC's 8, so Penn would win many of these games (71%, according to a simulation study I ran).

I anticipated that people might argue against the 79% number in the previous paragraph, so I also compared the two against an average team. Team 1 buzzes first on 17 tossups and negs 4; so they still get 13, and lose 20 points to negs, while the opponent (assuming no negs) gets 7. Team 2 buzzes first on 12 tossups and negs 0; so they get 12, and the opponent gets 8. While Team 1 will still win more often, the average margin is not too far off from that of Team 2. It is possible the two would still be close in win percentage; I was too lazy to code up this particular simulation.
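A Monte Carlo version of the head-to-head scenario can be sketched as follows (my simplified model, not necessarily the author's exact simulation: bonuses and powers are ignored, the aggressive team's rebounds are always converted by the opponent, and the team answering more of the 20 tossups wins):

```python
import random

def simulate_game(rng, n_tossups=20, p_first=0.79, p_neg=0.25):
    """One game: team 1 buzzes first w.p. p_first (the log5 result above)
    and negs p_neg of its first buzzes; team 2 never negs and always
    converts rebounds. Returns True if team 1 answers more tossups."""
    t1 = t2 = 0
    for _ in range(n_tossups):
        if rng.random() < p_first:      # team 1 is first to the buzzer
            if rng.random() < p_neg:
                t2 += 1                 # neg; team 2 picks up the rebound
            else:
                t1 += 1
        else:
            t2 += 1
    return t1 > t2

rng = random.Random(42)
n_games = 20000
win_rate = sum(simulate_game(rng) for _ in range(n_games)) / n_games
# win_rate comes out in the low-to-mid 0.7s under these assumptions,
# broadly consistent with the 71% figure quoted above
```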
Bryce Durgin
Culver Academies '07
University of Notre Dame '11
Texas A&M '15
evilmonkey
Yuna
Posts: 964
Joined: Mon Sep 25, 2006 11:23 am
Location: Durham, NC

Re: A Different Way to Quantify Buzzing

Post by evilmonkey »

A full justification of the log5 formula. Sections 1A and 1B are the relevant ones.

Also, regarding the "built-in schedule correction": my entire reason for using the more complex likelihood-based procedure was that my first instinct, when coming up with a measure, was to use FirstBuzzRate = FirstBuzzes/TotalTUH, but I realized that that statistic is in fact very schedule-dependent. I do think the likelihood-based procedure should auto-correct for schedule.
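Here's a toy illustration of that schedule dependence, using the log5 formula from earlier in the thread; the team strengths and schedules are invented numbers:

```python
def log5(pa, pb):
    """P(A buzzes first vs. B), given each team's rate vs. an average team."""
    return pa * (1 - pb) / (pa * (1 - pb) + pb * (1 - pa))

def raw_first_buzz_rate(team_p, opponents):
    """Expected FirstBuzzes/TotalTUH over a schedule (equal TUH per game)."""
    return sum(log5(team_p, opp) for opp in opponents) / len(opponents)

# The same 0.70 team, measured against two different schedules.
weak_schedule = [0.40, 0.45, 0.50]   # hypothetical opponent strengths
tough_schedule = [0.75, 0.80, 0.85]

print(raw_first_buzz_rate(0.70, weak_schedule))   # ~0.74: looks dominant
print(raw_first_buzz_rate(0.70, tough_schedule))  # ~0.37: looks mediocre
```

The same underlying 0.70 team looks like a world-beater on the weak schedule and below average on the tough one, which is exactly the distortion the likelihood-based procedure is meant to wash out.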

Andrew, thank you for posting your comments. My whole impetus for posting was to find things that people questioned (so I could come up with a valid justification), to spot results that didn't seem accurate (and determine why the procedure produced them), and to gauge the usefulness of the statistic. If I've come across as too defensive, I apologize.
Bryce Durgin
Culver Academies '07
University of Notre Dame '11
Texas A&M '15
bradleykirksey
Wakka
Posts: 187
Joined: Sat Nov 12, 2011 5:09 pm

Re: A Different Way to Quantify Buzzing

Post by bradleykirksey »

evilmonkey wrote:Howdy!


For reference, the mean ExpPP20TUH is 204.06. I am uncertain as to whether this number can be interpreted by saying "UVA would beat Penn by 18 points on average" - my gut instinct is to say no. The comparisons might be accurate for MIT and Rice, who are right below the mean.

First off, I'm a big fan of this metric. It's interesting. It's particularly great because it has UCF in the third bracket, and not failing at life. The rankings, at least the ones for the teams I'm familiar with, pass the eye test. Even though we beat Ottawa and they're ahead of us, they had an abysmal PPB that game and we won by 10. They got 11 tossups to our 7. We lost to Minnesota B by 5 even though they're below us, but I had my bell rung at lunch and I single handedly cost us that game. Your metric predicted the rest of our games right, meaning that at least for the bottom bracket, it might actually be a better metric of ability than record.

I can say pretty confidently, I think, that it wouldn't mean UVA would beat Penn by 18 on average. In a hypothetical tournament where UVA and Penn got to play high school teams on high school packets, averaging 850 and 830 PP20TUH respectively, UVA obviously couldn't beat Penn 850-830, since there aren't enough points in the game; and that last 20 PP20TUH probably represents a bigger jump in skill than, say, going from 80 to 100, since it leaves so few answers unclaimed.

I do think you're right that the closer a team is to the mean, the more accurate the comparison will be. I think what you can say is that if the mean ExpPP20TUH is 250 and UCF's is 108, then UCF can expect to lose to that average team by 142. Of course, if Jake or one of those math people wants to say I'm wrong, I'm OK with that.
Bradley Kirksey
Mayor of quiz bowl at the University of Central Florida (2010-2015)
The club at Reformed Theology Seminary Orlando (2017 - 2021)
User avatar
theMoMA
Forums Staff: Administrator
Posts: 5993
Joined: Mon Oct 23, 2006 2:00 am

Re: A Different Way to Quantify Buzzing

Post by theMoMA »

No problem. I appreciate your explanation; I don't have much statistical training (I did take AP stats way back in the day, and did a lot of baseball stat research when I was in high school), so it's nice to know a little more about what I don't understand very well.
Andrew Hart
Minnesota alum
User avatar
The Ununtiable Twine
Auron
Posts: 1058
Joined: Fri Feb 02, 2007 11:09 pm
Location: Lafayette, LA

Re: A Different Way to Quantify Buzzing

Post by The Ununtiable Twine »

evilmonkey wrote:I think it has to be an awful measure.
Your measure has us 12th, which is, at the very worst, a start. If your measure had us anything but 12th, I would have suggested that you throw it away immediately.
Jake Sundberg
Louisiana, Alabama
retired
User avatar
Fond du lac operon
Wakka
Posts: 228
Joined: Tue Jan 31, 2012 8:02 pm

Re: A Different Way to Quantify Buzzing

Post by Fond du lac operon »

So, is this measure reasonably robust when you consider different populations? I'm not sure how to phrase exactly what I mean, so here's an example:

Let's suppose that, somehow, Penn and UVA were invited to play the CCCT field on DI ICT questions. Since Matt and Eric are two of the three best players in the country, Penn and Virginia would demolish everyone else. (Sorry, Paul Kelson). In addition, the P(buzz first) numbers for both teams would dramatically increase, to maybe like 0.95 each. (That part's not robust, but whatever.) But since Penn and UVA are the same teams playing on the same questions as at DI ICT, Penn should have the same probability of winning the match as in the actual tournament. Would the log5 formula give the same estimate? (Taking into consideration the match(es) Penn and UVA play against each other).
Harrison Brown
Centennial '08, Alabama '13

"No idea what [he's] talking about."
evilmonkey
Yuna
Posts: 964
Joined: Mon Sep 25, 2006 11:23 am
Location: Durham, NC

Re: A Different Way to Quantify Buzzing

Post by evilmonkey »

Fond du lac operon wrote:So, is this measure reasonably robust when you consider different populations? I'm not sure how to phrase exactly what I mean, so here's an example:

Let's suppose that, somehow, Penn and UVA were invited to play the CCCT field on DI ICT questions. Since Matt and Eric are two of the three best players in the country, Penn and Virginia would demolish everyone else. (Sorry, Paul Kelson). In addition, the P(buzz first) numbers for both teams would dramatically increase, to maybe like 0.95 each. (That part's not robust, but whatever.) But since Penn and UVA are the same teams playing on the same questions as at DI ICT, Penn should have the same probability of winning the match as in the actual tournament. Would the log5 formula give the same estimate? (Taking into consideration the match(es) Penn and UVA play against each other).

That's actually an excellent question, and one that I hadn't considered. I would have to run some simulations of that scenario to be certain, but I would guess that in that situation (where the relative beatings they gave to the other teams would give little information, other than possibly the Chipola/Valencia matches), the relationship between those two would end up largely decided by the game(s) between them. Since game results have such high variability, that's an issue. It'd still be an unbiased estimate, just one with high variability.

The solution to that, of course, is to extend the measure to incorporate results from every tournament. Of course, other things that this measure is not currently robust against: line-up changes and distributional changes. So I'm trying to figure out a way to account for the former. The latter is solvable pretty much only if we know P[FB|Category], although I suppose, if we have distributional information for each tournament, we could actually use that fact and the change in P[FB] to estimate P[FB|Category] for certain categories.

I'm not a good enough player to be able to speculate whether a shift in difficulty would also result in different probabilities of buzzing in first. I'd like some of the better, more experienced players to consider whether that is something that needs to be accounted for.
Bryce Durgin
Culver Academies '07
University of Notre Dame '11
Texas A&M '15
fett0001
Tidus
Posts: 707
Joined: Wed Feb 27, 2008 11:50 am

Re: A Different Way to Quantify Buzzing

Post by fett0001 »

I'd speculate that questions that are too easy make for more buzzer races.
Mike Hundley
PACE Member
Virginia Tech
Locked