2018 ACF Regs packets, recordings, and detailed stats survey

Old college threads.
User avatar
Fado Alexandrino
Yuna
Posts: 834
Joined: Sat Jun 12, 2010 8:46 pm
Location: Farhaven, Ontario

Re: Packets, recordings, and detailed stats warm-up survey

Post by Fado Alexandrino »

Something on my to-do list is to look at reaction time. In the aforementioned tossup on Islam, where a third of the buzzes occurred after a particular clue, those buzzes all came within three words of the offending clue. Note that the offending clue ended a sentence.

Personally, I tend to reaction-buzz on clues I've heard before, but wait on earlier clues while I'm trying to connect the dots in my brain, especially two-step processes like 2017 Nobel -> CryoEM -> getting frozen
Joe Su, OCT
Lisgar 2012, McGill 2015, McGill 2019, Queen's 2020
User avatar
Auroni
Auron
Posts: 3145
Joined: Thu Nov 15, 2007 6:23 pm

Re: Packets, recordings, and detailed stats warm-up survey

Post by Auroni »

Victor Prieto wrote:I think that eliminating the first line or two in tossups could automatically correct this curve without sacrificing ability to differentiate between lower-tier teams. I urge writers to consider shortening their questions in the future.
I'd just like to point out that shortening questions to 6 lines makes it a lot harder to do the following (significantly reducing player empathy):
Aaron Manby (ironmaster) wrote: Personally, I tend to reaction-buzz on clues I've heard before, but wait on earlier clues while I'm trying to connect the dots in my brain, especially two-step processes like 2017 Nobel -> CryoEM -> getting frozen
Auroni Gupta (she/her)
User avatar
ryanrosenberg
Auron
Posts: 1890
Joined: Thu May 05, 2011 5:48 pm
Location: Palo Alto, California

Re: Packets, recordings, and detailed stats warm-up survey

Post by ryanrosenberg »

I made the above graphs for each major category and put them all in an album here. They're large images, so you might need to right-click and open the image in a new tab to really get up close.
Ryan Rosenberg
North Carolina '16
ACF
User avatar
Fado Alexandrino
Yuna
Posts: 834
Joined: Sat Jun 12, 2010 8:46 pm
Location: Farhaven, Ontario

Re: Packets, recordings, and detailed stats warm-up survey

Post by Fado Alexandrino »

Buzzpoints by team, split by player

EDIT: The area under a player's curve is equal to the number of tossups answered correctly. Y axis is thus in arbitrary units of points*probability.
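The scaling convention described here (area under a player's curve = tossups answered correctly) can be sketched in a few lines of pure Python. This is my own minimal reconstruction, not the code behind the posted graphs; the Gaussian kernel, bandwidth, and grid size are arbitrary choices:

```python
import math

def buzz_curve(buzz_positions, n_correct, bandwidth=0.05, points=101):
    """Smooth each buzz (a fraction of the question read, in [0, 1]) with a
    Gaussian kernel, then rescale so the area under the curve equals the
    number of tossups answered correctly."""
    grid = [i / (points - 1) for i in range(points)]
    density = [
        sum(math.exp(-0.5 * ((g - b) / bandwidth) ** 2) for b in buzz_positions)
        for g in grid
    ]
    # trapezoidal area, then rescale so the area == tossups answered correctly
    step = grid[1] - grid[0]
    area = sum((density[i] + density[i + 1]) * step / 2 for i in range(points - 1))
    scale = n_correct / area
    return grid, [d * scale for d in density]
```

With this scaling, the y-axis carries the "arbitrary units" mentioned above: integrating the curve back over [0, 1] recovers the tossup count.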

Someone should design an interactive website with custom results and graphs, but I unfortunately don't have the skills to be that person.
Joe Su, OCT
Lisgar 2012, McGill 2015, McGill 2019, Queen's 2020
User avatar
Good Goblin Housekeeping
Auron
Posts: 1100
Joined: Sun May 23, 2010 10:03 am

Re: Packets, recordings, and detailed stats warm-up survey

Post by Good Goblin Housekeeping »

Aaron Manby (ironmaster) wrote:Buzzpoints by team, split by player

EDIT: The area under a player's curve is equal to the number of tossups answered correctly. Y axis is thus in arbitrary units of points*probability.

Someone should design an interactive website with custom results and graphs, but I unfortunately don't have the skills to be that person.
:w-hat: happened to matt lehmann
Andrew Wang
Illinois 2016
User avatar
Mahavishnu
Lulu
Posts: 59
Joined: Sun Jul 17, 2016 11:00 pm

Re: Packets, recordings, and detailed stats warm-up survey

Post by Mahavishnu »

Borel hierarchy wrote:
Aaron Manby (ironmaster) wrote:Buzzpoints by team, split by player

EDIT: The area under a player's curve is equal to the number of tossups answered correctly. Y axis is thus in arbitrary units of points*probability.

Someone should design an interactive website with custom results and graphs, but I unfortunately don't have the skills to be that person.
:w-hat: happened to matt lehmann
And on that note, any teams from the UCF site?
Tracy Mirkin
South Fork '17
Florida '22
User avatar
Corry
Rikku
Posts: 331
Joined: Fri Feb 10, 2012 11:54 pm

Re: Packets, recordings, and detailed stats warm-up survey

Post by Corry »

I just want to say that from a writers’ perspective, these are like, god-tier stats. I’ve always personally preferred to write for NAQT because they were traditionally the only ones to provide after-the-fact conversion data - and therefore, the only ones who could offer empirical data to address familiarity bias and difficulty confirmation bias among writers. But this system really takes things to the next level. Thank you Ophir!

As a total aside:
Periplus of the Erythraean Sea wrote:A 15-20% power rate is, to my understanding, what NAQT aims for in its tournaments - not sure what median SCT power rate is, but the median ICT and HSNCT power rates usually fall within that range.
This isn’t exactly accurate. While I’ve periodically heard of NAQT “theoretically” aiming for a 15-20% power rate, in practice, the median HSNCT and SCT power rates tend to cluster around 20-25%. So purely from a powers basis, this set would probably count as marginally harder than SCT. (ICT is more along the lines of 15-20% power rate, although that also fluctuates.)
Corry Wang
Arcadia High School 2013
Amherst College 2017
NAQT Writer and Subject Editor
User avatar
naan/steak-holding toll
Auron
Posts: 2515
Joined: Mon Feb 28, 2011 11:53 pm
Location: New York, NY

Re: Packets, recordings, and detailed stats warm-up survey

Post by naan/steak-holding toll »

Corry wrote: This isn’t exactly accurate. While I’ve periodically heard of NAQT “theoretically” aiming for a 15-20% power rate, in practice, the median HSNCT and SCT power rates tend to cluster around 20-25%. So purely from a powers basis, this set would probably count as marginally harder than SCT. (ICT is more along the lines of 15-20% power rate, although that also fluctuates.)
I guess so. However, this year's SCT looks like it was pretty hard to power from the numbers coming in.
Will Alston
Dartmouth College '16
Columbia Business School '21
User avatar
thebluehawk1
Lulu
Posts: 29
Joined: Mon Nov 16, 2015 2:51 am
Location: College Park

Re: Packets, recordings, and detailed stats warm-up survey

Post by thebluehawk1 »

I am interested in looking at the different ways in which writers ask about lit questions, to see the differences in how they are converted. For example, a common format would be a "this author" tossup, which typically has fewer deep clues about individual works and more basic clues about obscure works. I think these questions will on average be converted earlier than the next type, because it is easier to read a brief Wikipedia summary of several works than to lock down all the clues in a full work. The next type would be a "this work" tossup. It is harder to lock down deep clues for a work you haven't read, and there are a lot of works that are tossup-able at Regionals level. Therefore I think these questions would overall be converted later by the field; but because you can generally get a good buzz on a work you have read, and you are more likely to have read a work that is tossup-able (because it is more famous), these questions will have a higher percentage of first buzzes. I don't really have much of a prediction for how common links will play, but I would be interested to look at that as well.
Justin Hawkins
John Carroll HS '15
University of Maryland '19
Indiana University '24 (at best)
User avatar
ThisIsMyUsername
Auron
Posts: 1005
Joined: Wed Jul 15, 2009 11:36 am
Location: New York, NY

Re: Packets, recordings, and detailed stats warm-up survey

Post by ThisIsMyUsername »

thebluehawk1 wrote:I am interested in looking at the different ways in which writers ask about lit questions, to see the differences in how they are converted. For example, a common format would be a "this author" tossup, which typically has fewer deep clues about individual works and more basic clues about obscure works. I think these questions will on average be converted earlier than the next type, because it is easier to read a brief Wikipedia summary of several works than to lock down all the clues in a full work. The next type would be a "this work" tossup. It is harder to lock down deep clues for a work you haven't read, and there are a lot of works that are tossup-able at Regionals level. Therefore I think these questions would overall be converted later by the field; but because you can generally get a good buzz on a work you have read, and you are more likely to have read a work that is tossup-able (because it is more famous), these questions will have a higher percentage of first buzzes. I don't really have much of a prediction for how common links will play, but I would be interested to look at that as well.
Whether this is "typically" true or not really depends on the writer/editor. Some prefer to include a large proportion of author tossups that interleave clues from works that are themselves tossupable at the same difficulty level; and some prefer to write author tossups that clue from works that are not. Likewise, a tossup on a work could begin by using mainly secondary-source clues, which are sometimes drawn from the same Wikipedia/Google-type sources that you say mostly populate author tossups.

What would be more pertinent (but far more labor-intensive) would be to tag questions according to what type of early clue they use (rather than their answer-line type), and to see how that affects buzzing. I think you may be right that one or the other might have typically earlier buzzpoints. But I think, above all, one would also find that some individual players are better at one type and some at the other (depending on their balance between reading and studying).
John Lawrence
Yale University '12
King's College London '13
University of Chicago '20

“I am not absentminded. It is the presence of mind that makes me unaware of everything else.” - G.K. Chesterton
khannate
Lulu
Posts: 26
Joined: Sat Oct 10, 2015 10:16 pm

Re: Packets, recordings, and detailed stats warm-up survey

Post by khannate »

I spent some time playing with the stats from the UIUC site, and more specifically trying to find some meaningful way to compare and rank teams. What I ended up doing was constructing an estimate of the distribution of buzz points for each team within each category by starting with Will's proposed ideal distribution of buzz points and Bayesian updating based on the actual buzzes a team got within the category. This ends up looking like a weighted average of the ideal distribution and the empirical distribution of a team's buzzes, weighted based on a tuning parameter and the number of buzzes the team got in the category.

Based on these distributions, you can simulate two teams playing a tossup in a category by drawing a buzz from each team's distribution in that category, determining what would have happened, and giving the team that got the tossup their PPB as bonus points. By doing this the right number of times for each category, you can simulate a full game. I did this 100 times for each pair of teams at the UIUC site and plotted the results in the graph attached. The entry at (row, column) is the fraction of simulations in which the row team beat the column team.
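The updating and tossup-simulation steps described above might look something like the following sketch. All names, the bin count, the tuning parameter k, and the tie-breaking rule are my own assumptions, and negs are ignored entirely:

```python
import random

def posterior(prior, buzzes, k=10.0):
    """Blend a prior buzz-point distribution (a histogram over equal-width
    bins of the question) with a team's observed buzzes. The weight on the
    data grows with the number of observed buzzes n, as n / (n + k)."""
    bins = len(prior)
    n = len(buzzes)
    counts = [0] * bins
    for b in buzzes:  # b = fraction of the question read, in [0, 1)
        counts[min(int(b * bins), bins - 1)] += 1
    empirical = [c / n for c in counts] if n else list(prior)
    w = n / (n + k)
    return [w * e + (1 - w) * p for e, p in zip(empirical, prior)]

def simulate_tossup(dist_a, dist_b, ppb_a, ppb_b, rng=random):
    """Draw one buzz bin per team; the earlier buzz wins the tossup plus an
    expected bonus equal to that team's PPB. Ties are a coin flip."""
    a = rng.choices(range(len(dist_a)), weights=dist_a)[0]
    b = rng.choices(range(len(dist_b)), weights=dist_b)[0]
    a_wins = a < b or (a == b and rng.random() < 0.5)
    return (10 + ppb_a, 0) if a_wins else (0, 10 + ppb_b)
```

Running `simulate_tossup` the right number of times per category and summing the scores gives one simulated game; repeating that 100 times per pairing gives the win fractions in the attached graph.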

I think this sort of model can be useful for thinking about the outcomes of tournaments and how surprising or unsurprising they are. For example, (at least at Chicago), there's a perception that Chicago teams upset each other an unusual amount, but the graph suggests this isn't actually the case. At a tournament where all of Chicago A, B, and C play each other, the probability of at least one team losing to a lower-lettered team is about 44%.

This could also be used to do forecasting for Nats by simulating all the matches on the Nats schedule, running through the tournament 1000 times, and looking at the distribution of each team's placing.

If people are interested in seeing this sort of thing for other sites or for say the top 25 teams by ppb, just let me know and I'd be happy to do it.
Attachments
team_comparison.pdf
(35.59 KiB) Downloaded 329 times
Samir Khan
UChicago '19
User avatar
wcheng
Wakka
Posts: 165
Joined: Mon May 26, 2014 12:02 pm
Location: Palo Alto, CA

Re: Packets, recordings, and detailed stats warm-up survey

Post by wcheng »

khannate wrote:I spent some time playing with the stats from the UIUC site, and more specifically trying to find some meaningful way to compare and rank teams. What I ended up doing was constructing an estimate of the distribution of buzz points for each team within each category by starting with Will's proposed ideal distribution of buzz points and Bayesian updating based on the actual buzzes a team got within the category. This ends up looking like a weighted average of the ideal distribution and the empirical distribution of a team's buzzes, weighted based on a tuning parameter and the number of buzzes the team got in the category.

Based on these distributions, you can simulate two teams playing a tossup in a category by drawing a buzz from each team's distribution in that category, determining what would have happened, and giving the team that got the tossup their PPB as bonus points. By doing this the right number of times for each category, you can simulate a full game. I did this 100 times for each pair of teams at the UIUC site and plotted the results in the graph attached. The entry at (row, column) is the fraction of simulations in which the row team beat the column team.

I think this sort of model can be useful for thinking about the outcomes of tournaments and how surprising or unsurprising they are. For example, (at least at Chicago), there's a perception that Chicago teams upset each other an unusual amount, but the graph suggests this isn't actually the case. At a tournament where all of Chicago A, B, and C play each other, the probability of at least one team losing to a lower-lettered team is about 44%.

This could also be used to do forecasting for Nats by simulating all the matches on the Nats schedule, running through the tournament 1000 times, and looking at the distribution of each team's placing.

If people are interested in seeing this sort of thing for other sites or for say the top 25 teams by ppb, just let me know and I'd be happy to do it.
I think it'd be really interesting to see how the top teams by A-Value stack up against each other by this metric!
Weijia Cheng (they/them)
Centennial '15
BS @ Maryland '18 (Fall)
MDiv @ BU '27
A Dim-Witted Saboteur
Yuna
Posts: 973
Joined: Tue Aug 02, 2016 12:31 pm
Location: Indiana

Re: Packets, recordings, and detailed stats warm-up survey

Post by A Dim-Witted Saboteur »

khannate wrote:I spent some time playing with the stats from the UIUC site, and more specifically trying to find some meaningful way to compare and rank teams. What I ended up doing was constructing an estimate of the distribution of buzz points for each team within each category by starting with Will's proposed ideal distribution of buzz points and Bayesian updating based on the actual buzzes a team got within the category. This ends up looking like a weighted average of the ideal distribution and the empirical distribution of a team's buzzes, weighted based on a tuning parameter and the number of buzzes the team got in the category.

Based on these distributions, you can simulate two teams playing a tossup in a category by drawing a buzz from each team's distribution in that category, determining what would have happened, and giving the team that got the tossup their PPB as bonus points. By doing this the right number of times for each category, you can simulate a full game. I did this 100 times for each pair of teams at the UIUC site and plotted the results in the graph attached. The entry at (row, column) is the fraction of simulations in which the row team beat the column team.

I think this sort of model can be useful for thinking about the outcomes of tournaments and how surprising or unsurprising they are. For example, (at least at Chicago), there's a perception that Chicago teams upset each other an unusual amount, but the graph suggests this isn't actually the case. At a tournament where all of Chicago A, B, and C play each other, the probability of at least one team losing to a lower-lettered team is about 44%.

This could also be used to do forecasting for Nats by simulating all the matches on the Nats schedule, running through the tournament 1000 times, and looking at the distribution of each team's placing.

If people are interested in seeing this sort of thing for other sites or for say the top 25 teams by ppb, just let me know and I'd be happy to do it.
Another interesting thing to look at would be what percentage of games that actually took place at Regs were upsets (the actual winner lost the majority of simulated games) or strong upsets (the actual winner lost more than 65? 70? 75? % of simulated games).
Jakob M. (they/them)
Michigan State '21, Indiana '2?
"No one has ever organized a greater effort to get people interested in pretending to play quiz bowl"
-Ankit Aggarwal
User avatar
Maxwell Sniffingwell
Auron
Posts: 2164
Joined: Sun Feb 12, 2006 3:22 pm
Location: Des Moines, IA

Re: Packets, recordings, and detailed stats warm-up survey

Post by Maxwell Sniffingwell »

Stupid question, but are the stats actually posted anywhere?
Greg Peterson

Northwestern University '18
Lawrence University '11
Maine South HS '07

"a decent player" - Mike Cheyne
User avatar
naan/steak-holding toll
Auron
Posts: 2515
Joined: Mon Feb 28, 2011 11:53 pm
Location: New York, NY

Re: Packets, recordings, and detailed stats warm-up survey

Post by naan/steak-holding toll »

cornfused wrote:Stupid question, but are the stats actually posted anywhere?
We have not released them publicly yet - we were hoping to get numerous people from within the community, at many levels of skill, to voice their opinions and ask questions. In the interest of fostering this sort of discussion, we've withheld the stats for a bit so that people don't retreat to their own silos / groupchats and noodle around with things themselves, answer their own questions, and never talk about things in public channels.

Perhaps it's ironic that withholding some information should be necessary to foster public discourse, but I doubt this thread would have taken off if we had simply released the stats immediately, since there was very little discussion about them for this year's EFT or This Tournament is a Crime.
Will Alston
Dartmouth College '16
Columbia Business School '21
User avatar
a bird
Wakka
Posts: 164
Joined: Sun Feb 26, 2012 3:50 pm
Location: College Park, MD

Re: Packets, recordings, and detailed stats warm-up survey

Post by a bird »

Aaron Manby (ironmaster) wrote:Buzzpoints by team, split by player

EDIT: The area under a player's curve is equal to the number of tossups answered correctly. Y axis is thus in arbitrary units of points*probability.

Someone should design an interactive website with custom results and graphs, but I unfortunately don't have the skills to be that person.
These are a very nice way of looking at the buzzpoint data for a given team. Thanks for making them! This (along with the plots Ryan made) got me wondering how the difficulty of different categories could affect these curves (both in general and in the specific case of this tournament). For example, say (hypothetically) the lit had easier early clues than the history. If player A buzzed mostly on lit while their teammate player B buzzed mostly on history, the different shapes of the A and B buzzpoint curves would be influenced by the players' knowledge of their respective categories and the difficulty of those categories. A and B could have buzzed on clues of comparable difficulty, but ended up with different buzzpoint curves due primarily to the cluing in the set.

Do people think this had a substantial effect on the buzzpoint curves, or was it negligible? Of course, most players buzz on multiple categories anyway, so the effect I'm describing might be hard to detect in most cases, even if it did happen. It might also be interesting to make buzzpoint distribution plots for specific categories of interest, or plots that somehow incorporate average performance on a per-subject, or even per-question, basis.

Has anyone analyzed which categories had the most early buzzes? I didn't find any category subjectively harder by a large amount, but I wonder what the data say about the difficulty of early clues in different categories.
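The "most early buzzes" question could be answered cheaply if you have (category, buzz position) pairs for each correct buzz. This record format and the 30% cutoff are hypothetical, chosen just to illustrate the idea:

```python
from collections import defaultdict

def early_buzz_share(buzzes, cutoff=0.30):
    """buzzes: (category, position_fraction) pairs for correct buzzes.
    Returns (category, share-of-buzzes-before-cutoff) pairs, with the
    categories that produced the most early buzzes listed first."""
    totals, early = defaultdict(int), defaultdict(int)
    for cat, pos in buzzes:
        totals[cat] += 1
        early[cat] += pos < cutoff  # bool counts as 0/1
    return sorted(((c, early[c] / totals[c]) for c in totals),
                  key=lambda pair: -pair[1])
```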
Graham R.

Maryland
Tejas
Rikku
Posts: 258
Joined: Sun May 29, 2011 9:51 pm
Location: Chicago

Re: Packets, recordings, and detailed stats warm-up survey

Post by Tejas »

I've gone through the stats and tried testing some of the hypotheses people have posted in this thread that have not yet been answered.
CPiGuy wrote:"Bad" teams (let's say <12PPB) will have a higher percentage of 30'd bonuses than buzzes in the first two lines, and "good" teams (>18PPB) will have a higher percentage of buzzes in the first two lines than 30'd bonuses.
I wasn't sure exactly what was meant by buzzes in the first two lines, so I took it to mean correct buzzes in the first two lines as a percentage of all of that team's buzzes. Based on that definition, I found that in total, "bad" teams 30'd 1.2% of bonuses and buzzed in the first 30% of a tossup (using fraction of question length as a proxy for lines) 0.8% of the time. On the other hand, "good" teams 30'd 23.0% of bonuses and buzzed in the first 25% of the tossup 4.1% of the time.

You probably underestimated the ease with which top teams can 30 regular difficulty bonuses in particular, especially compared to getting early buzzes on a set with somewhat tougher leadins.
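For reference, the two quantities being compared here can be computed from per-team records along these lines. The record format and function name are hypothetical; this just pins down the definitions:

```python
def team_summary(buzzes, bonuses, early_frac=0.30):
    """buzzes: (position_fraction, correct) pairs for a team's buzz attempts;
    bonuses: list of bonus totals (0/10/20/30).
    Returns the share of the team's correct buzzes that came in the first
    `early_frac` of the question, and the share of bonuses that were 30'd."""
    correct = [pos for pos, ok in buzzes if ok]
    early_rate = (sum(pos < early_frac for pos in correct) / len(correct)
                  if correct else 0.0)
    thirty_rate = bonuses.count(30) / len(bonuses) if bonuses else 0.0
    return early_rate, thirty_rate
```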
nsb2 wrote:I would predict that a large majority of music buzzes (maybe up to 90%) were after the third line or so, even more than for other categories.
You were correct that more than 90% of music buzzes came after the third line (I used 40% of the tossup as a proxy). However, this was not especially high compared to other categories.

Code: Select all

 Subcategory                                                   Buzzes After 3rd Line
                                         Philosophy            0.984
                                  Miscellaneous Lit            0.982
                                              Drama            0.972
                                            Biology            0.970
                                    Non-Epic Poetry            0.969
 British, Canadian, Australian, New Zealand History            0.963
                                          Chemistry            0.957
                                     Social Science            0.956
                                              Music            0.949
                                     Other Academic            0.949
                                      Other Science            0.946
                                          Other Art            0.939
                                      Other History            0.939
                                         US History            0.937
                                       Long Fiction            0.925
                                      Short Fiction            0.925
                                            Physics            0.920
                                 Painting/Sculpture            0.915
         Continental European History (post-600 CE)            0.910
   Continental or Near Eastern History (pre-600 CE)            0.905
                                          Geography            0.901
                                           Religion            0.890
                     Historiography and Archaeology            0.885
                                     Current Events            0.853
                                          Mythology            0.846
                                              Trash            0.766
cwest123 wrote:At the Southeast (Georgia Tech) site, the majority of music buzzes were VERY late in the question, generally close or in to the last line. Percentage wise, I'll guess that the average buzz was beyond the 75% point.
This is correct: of all the sites, I found that the Georgia Tech one had the latest mean and median buzzpoints. This is only considering correct buzzes and ignoring vulches.

Code: Select all

  Site             Average Buzz %  Median Buzz %
      Minnesota    0.684           0.720
           UCSD    0.700           0.723
   Kansas State    0.713           0.664
 Oxford Brookes    0.715           0.754
     Penn State    0.723           0.723
           UIUC    0.765           0.813
    Connecticut    0.771           0.791
        Toronto    0.783           0.876
            UCF    0.843           0.887
       Virginia    0.863           0.940
           Rice    0.899           0.924
   Georgia Tech    0.936           0.996
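The filtering behind this table (correct buzzes only, vulches excluded) might look like the following sketch; the tuple record format is my own invention:

```python
from collections import defaultdict
from statistics import mean, median

def buzz_stats_by_site(records):
    """records: (site, position_fraction, correct, is_vulch) tuples.
    Returns {site: (mean, median)} of buzz position over correct,
    non-vulch buzzes only, matching the filtering described above."""
    by_site = defaultdict(list)
    for site, pos, correct, vulch in records:
        if correct and not vulch:
            by_site[site].append(pos)
    return {site: (mean(ps), median(ps)) for site, ps in by_site.items()}
```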
Sima Guang Hater wrote: -Hard parts were significantly harder than middle parts, leading to a "wall effect" around 20 ppb
-Science bonuses were, on average, harder than literature bonuses
I don't know if there's a good way to measure a "wall effect"; overall, middle bonus parts were converted around 50% of the time and hard parts about 15%. This seems about what's expected, and any measure of "significantly harder" would probably require a comparison to other regular-difficulty tournaments. You were correct that science bonuses were harder, although I was surprised to see that science easy parts were actually converted pretty well, while the middle and hard parts were converted the least of any category. I've attached a plot below showing conversion by category and difficulty.

[Image: bonus conversion by category and difficulty]
geremy wrote:I predict that out of all the science subcategories, physics has the latest average buzz point and biology the earliest, but the average PPB will be pretty close.
Physics did not have the latest buzz point, but it did have the lowest PPB.

Code: Select all

 Subcategory   PPB      Average Buzz %  Median Buzz %
       Biology 14.68    0.771           0.814
     Chemistry 14.59    0.755           0.770
 Other Science 14.28    0.727           0.746
       Physics 13.08    0.761           0.814
I'll get back to some of the other ones; PM me if you catch any errors I made here.

EDIT: fixed incorrect statement
Tejas Raje
Cornell '14
User avatar
ErikC
Rikku
Posts: 288
Joined: Sat Sep 24, 2016 12:44 pm

Re: Packets, recordings, and detailed stats warm-up survey

Post by ErikC »

It's interesting that the Toronto site's median and average buzz are quite different.

I'm not surprised science hard parts were the hardest. I think science easy parts were converted well because they are often concepts almost everyone is familiar with (like gravity), even if they don't understand them fundamentally (like Rein Otsason).
Erik Christensen
University of Waterloo - School of Planning Class of '18
Defending VETO top scorer
Locked