2018 ACF Regs packets, recordings, and detailed stats survey

Elaborate on the merits of specific tournaments or have general theoretical discussion here.
User avatar
Benin Rebirth Party
Tidus
Posts: 733
Joined: Sat Jun 12, 2010 8:46 pm
Location: Farhaven, Ontario

Re: Packets, recordings, and detailed stats warm-up survey

Post by Benin Rebirth Party » Fri Feb 02, 2018 3:40 am

Something on my to-do list is to look at reaction time. In the aforementioned tossup on Islam where a third of the buzzes occurred after a particular clue, those buzzes all came within three words of the offending clue. Note that the offending clue ended a sentence.

Personally, I tend to reaction-buzz on clues I've heard before, but wait on earlier clues where I'm trying to connect the dots in my brain, especially two-step processes like 2017 Nobel -> CryoEM -> getting frozen
Joe Su
Lisgar 2012, McGill 2015, McGill 20--

FINALIST -- 2017 ILQBM MEME OF THE YEAR

User avatar
Auroni
Auron
Posts: 2957
Joined: Thu Nov 15, 2007 6:23 pm
Location: ann arbor

Re: Packets, recordings, and detailed stats warm-up survey

Post by Auroni » Fri Feb 02, 2018 12:45 pm

Victor Prieto wrote:I think that eliminating the first line or two in tossups could automatically correct this curve without sacrificing ability to differentiate between lower-tier teams. I urge writers to consider shortening their questions in the future.
I'd just like to point out that shortening questions to 6 lines makes it a lot harder to do the following (significantly reducing player empathy):
Aaron Manby (ironmaster) wrote: Personally, I tend to reaction-buzz on clues I've heard before, but wait on earlier clues where I'm trying to connect the dots in my brain, especially two-step processes like 2017 Nobel -> CryoEM -> getting frozen
Auroni Gupta
Michigan '17
"I love Milf Money" - Will Nediger

User avatar
Geriatric trauma
Auron
Posts: 1130
Joined: Thu May 05, 2011 5:48 pm
Location: Chicago, Illinois

Re: Packets, recordings, and detailed stats warm-up survey

Post by Geriatric trauma » Fri Feb 02, 2018 2:20 pm

I made the above graphs for each major category and put them all in an album here. They're large images, so you might need to right-click and open the image in a new tab to really get up close.
Ryan Rosenberg
North Carolina '16 | Ardsley '12
PACE | ACF

User avatar
Benin Rebirth Party
Tidus
Posts: 733
Joined: Sat Jun 12, 2010 8:46 pm
Location: Farhaven, Ontario

Re: Packets, recordings, and detailed stats warm-up survey

Post by Benin Rebirth Party » Fri Feb 02, 2018 4:00 pm

Buzzpoints by team, split by player

EDIT: The area under a player's curve is equal to the number of tossups answered correctly. Y axis is thus in arbitrary units of points*probability.

Someone should design an interactive website with custom results and graphs, but I unfortunately don't have the skills to be that person.
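For anyone curious how the normalization in the EDIT works: if you scale a histogram of buzz positions so each bar's area equals its raw count, the total area under the curve comes out to the number of correct buzzes. A minimal sketch in Python, with made-up buzz positions expressed as a fraction of the tossup heard:

```python
import numpy as np

def scaled_buzz_density(buzz_fracs, n_bins=20):
    """Histogram of buzz positions (each a 0-1 fraction of the tossup
    heard), scaled so the total area under the curve equals the number
    of correct buzzes."""
    counts, edges = np.histogram(buzz_fracs, bins=n_bins, range=(0.0, 1.0))
    width = edges[1] - edges[0]
    heights = counts / width            # area of each bar = raw count
    centers = (edges[:-1] + edges[1:]) / 2
    return centers, heights

# Made-up player with 7 correct buzzes
buzzes = [0.35, 0.48, 0.52, 0.60, 0.71, 0.83, 0.95]
centers, heights = scaled_buzz_density(buzzes)
area = heights.sum() * (1.0 / 20)       # bar heights times bin width
print(round(area, 6))                   # 7.0 -- the number of correct buzzes
```

Plotting `heights` against `centers` for each player on a team gives curves like the ones linked above, in the same arbitrary units.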
Joe Su
Lisgar 2012, McGill 2015, McGill 20--

FINALIST -- 2017 ILQBM MEME OF THE YEAR

User avatar
Banned Tiny Toon Adventures Episode
Tidus
Posts: 641
Joined: Sun May 23, 2010 10:03 am

Re: Packets, recordings, and detailed stats warm-up survey

Post by Banned Tiny Toon Adventures Episode » Fri Feb 02, 2018 10:05 pm

Aaron Manby (ironmaster) wrote:Buzzpoints by team, split by player

EDIT: The area under a player's curve is equal to the number of tossups answered correctly. Y axis is thus in arbitrary units of points*probability.

Someone should design an interactive website with custom results and graphs, but I unfortunately don't have the skills to be that person.
:w-hat: happened to matt lehmann
Andrew Wang
Illinois 2016

User avatar
Mahavishnu
Lulu
Posts: 17
Joined: Sun Jul 17, 2016 11:00 pm

Re: Packets, recordings, and detailed stats warm-up survey

Post by Mahavishnu » Fri Feb 02, 2018 10:11 pm

Borel hierarchy wrote:
Aaron Manby (ironmaster) wrote:Buzzpoints by team, split by player

EDIT: The area under a player's curve is equal to the number of tossups answered correctly. Y axis is thus in arbitrary units of points*probability.

Someone should design an interactive website with custom results and graphs, but I unfortunately don't have the skills to be that person.
:w-hat: happened to matt lehmann
And on that note, any teams from the UCF site?
Tracy Mirkin
South Fork '17
Florida '21
- 2018-19 Club President

User avatar
Corry
Rikku
Posts: 328
Joined: Fri Feb 10, 2012 11:54 pm

Re: Packets, recordings, and detailed stats warm-up survey

Post by Corry » Sat Feb 03, 2018 10:58 am

I just want to say that from a writer's perspective, these are like, god-tier stats. I've always personally preferred to write for NAQT because they were traditionally the only ones to provide after-the-fact conversion data - and therefore the only ones who could offer empirical evidence to address familiarity bias and difficulty confirmation bias among writers. But this system really takes things to the next level. Thank you, Ophir!

As a total aside:
Periplus of the Erythraean Sea wrote:A 15-20% power rate is, to my understanding, what NAQT aims for in its tournaments - not sure what median SCT power rate is, but the median ICT and HSNCT power rates usually fall within that range.
This isn’t exactly accurate. While I’ve periodically heard of NAQT “theoretically” aiming for a 15-20% power rate, in practice, the median HSNCT and SCT power rates tend to cluster around 20-25%. So purely from a powers basis, this set would probably count as marginally harder than SCT. (ICT is more along the lines of 15-20% power rate, although that also fluctuates.)
Corry Wang
Arcadia High School 2013
Amherst College 2017
NAQT Writer and Subject Editor

User avatar
Periplus of the Erythraean Sea
Auron
Posts: 1665
Joined: Mon Feb 28, 2011 11:53 pm
Location: Falls Church, VA

Re: Packets, recordings, and detailed stats warm-up survey

Post by Periplus of the Erythraean Sea » Sat Feb 03, 2018 7:30 pm

Corry wrote: This isn’t exactly accurate. While I’ve periodically heard of NAQT “theoretically” aiming for a 15-20% power rate, in practice, the median HSNCT and SCT power rates tend to cluster around 20-25%. So purely from a powers basis, this set would probably count as marginally harder than SCT. (ICT is more along the lines of 15-20% power rate, although that also fluctuates.)
I guess so. However, judging from the numbers coming in, this year's SCT looks like it was pretty hard to power.
Will Alston
Bethesda Chevy Chase HS '12, Dartmouth '16
"...should be treated as the non-stakeholding troll he is" -Matt Weiner

User avatar
thebluehawk1
Lulu
Posts: 17
Joined: Mon Nov 16, 2015 2:51 am
Location: College Park

Re: Packets, recordings, and detailed stats warm-up survey

Post by thebluehawk1 » Sun Feb 04, 2018 12:36 pm

I am interested in looking at the different ways in which writers ask about lit, to see how the different formats are converted. For example, a common format is the "this author" tossup, which typically has fewer deep clues about individual works and more basic clues about obscure works. I think these questions will, on average, be converted earlier than the next type, because it is easier to read a brief Wikipedia summary of several works than to lock down all the clues in a full work. The next type is the "this work" tossup. It is harder to lock down deep clues for a work you haven't read, and there are a lot of works that are tossup-able at Regionals level, so I think the field would convert these later overall. But because you can generally get a good buzz on a work you have read, and you are more likely to have read a tossup-able work (because it is more famous), I expect these questions to have a higher percentage of first buzzes. I don't really have much of a prediction for how common links will play, but I would be interested to look at that as well.
Justin Hawkins
John Carroll HS '15
University of Maryland '19

User avatar
ThisIsMyUsername
Yuna
Posts: 775
Joined: Wed Jul 15, 2009 11:36 am
Location: New York, NY

Re: Packets, recordings, and detailed stats warm-up survey

Post by ThisIsMyUsername » Sun Feb 04, 2018 4:35 pm

thebluehawk1 wrote:I am interested in looking at the different ways in which writers ask about lit, to see how the different formats are converted. For example, a common format is the "this author" tossup, which typically has fewer deep clues about individual works and more basic clues about obscure works. I think these questions will, on average, be converted earlier than the next type, because it is easier to read a brief Wikipedia summary of several works than to lock down all the clues in a full work. The next type is the "this work" tossup. It is harder to lock down deep clues for a work you haven't read, and there are a lot of works that are tossup-able at Regionals level, so I think the field would convert these later overall. But because you can generally get a good buzz on a work you have read, and you are more likely to have read a tossup-able work (because it is more famous), I expect these questions to have a higher percentage of first buzzes. I don't really have much of a prediction for how common links will play, but I would be interested to look at that as well.
Whether this is "typically" true or not really depends on the writer/editor. Some prefer to include a large proportion of author tossups that interleave clues from works that are themselves tossupable at the same difficulty level; and some prefer to write author tossups that clue from works that are not. Likewise, a tossup on a work could begin by using mainly secondary-source clues, which are sometimes drawn from the same Wikipedia/Google-type sources that you say mostly populate author tossups.

What would be more pertinent (but far more labor-intensive) would be to tag questions according to what type of early clue they use (rather than their answer-line type), and to see how that affects buzzing. I think you may be right that one or the other might have typically earlier buzzpoints. But I think, above all, one would also find that some individual players are better at one type and some at the other (depending on their balance between reading and studying).
John Lawrence
Yale University '12
King's College London '13
University of Chicago '19

“I am not absentminded. It is the presence of mind that makes me unaware of everything else.” - G.K. Chesterton

khannate
Lulu
Posts: 23
Joined: Sat Oct 10, 2015 10:16 pm

Re: Packets, recordings, and detailed stats warm-up survey

Post by khannate » Sun Feb 04, 2018 4:52 pm

I spent some time playing with the stats from the UIUC site, more specifically trying to find some meaningful way to compare and rank teams. What I ended up doing was constructing an estimate of the distribution of buzz points for each team within each category, by starting with Will's proposed ideal distribution of buzz points and Bayesian updating based on the actual buzzes a team got within the category. This ends up looking like a weighted average of the ideal distribution and the empirical distribution of a team's buzzes, weighted based on a tuning parameter and the number of buzzes the team got in the category.
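One way this "weighted average of prior and data" behavior arises is a Dirichlet-multinomial update over discretized buzz-point bins. The following is a sketch of the idea, not necessarily the actual implementation; the flat prior, bin count, and tuning parameter `alpha` are all placeholders:

```python
import numpy as np

def posterior_buzz_dist(ideal, observed_bins, alpha=10.0):
    """Dirichlet-multinomial update: blend the prior ('ideal')
    distribution over buzz-point bins with the empirical distribution
    of a team's observed buzzes. Larger alpha = trust the prior more."""
    ideal = np.asarray(ideal, dtype=float)
    counts = np.bincount(observed_bins, minlength=len(ideal)).astype(float)
    n = counts.sum()
    empirical = counts / n if n > 0 else ideal.copy()
    weight = n / (n + alpha)            # weight on the observed data
    return (1 - weight) * ideal + weight * empirical

ideal = np.full(10, 0.1)                # flat prior over 10 bins
post = posterior_buzz_dist(ideal, [7, 8, 8, 9, 9, 9], alpha=10.0)
print(post)                             # still sums to 1; shifted toward late bins
```

A team with few buzzes in a category stays close to the ideal distribution; a team with many buzzes is dominated by its own data, which is exactly the weighting described above.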

Based on these distributions, you can simulate two teams playing a tossup in a category by drawing a buzz from each team's distribution in that category, determining what would have happened, and giving the team that got the tossup its PPB as bonus points. By doing this the right number of times for each category, you can simulate a full game. I did this 100 times for each pair of teams at the UIUC site and plotted the results in the attached graph. The entry at (row, column) is the fraction of games in which the row team beat the column team.

I think this sort of model can be useful for thinking about the outcomes of tournaments and how surprising or unsurprising they are. For example, (at least at Chicago), there's a perception that Chicago teams upset each other an unusual amount, but the graph suggests this isn't actually the case. At a tournament where all of Chicago A, B, and C play each other, the probability of at least one team losing to a lower-lettered team is about 44%.

This could also be used to do forecasting for Nats by simulating all the matches from the Nats schedule, running through the tournament 1000 times, and seeing the distribution of each team's placing.

If people are interested in seeing this sort of thing for other sites, or for, say, the top 25 teams by PPB, just let me know and I'd be happy to do it.
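A stripped-down version of the tossup-by-tossup simulation might look like the following. Everything here is illustrative: the two buzz distributions and PPB values are invented, the category split is dropped, and negs and bouncebacks are ignored.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_game(dist_a, dist_b, ppb_a, ppb_b, n_tossups=20):
    """One simulated game: each team draws a buzz-point bin from its
    distribution; the earlier buzz wins the tossup (ties split at
    random) and the winner scores 10 plus its average PPB."""
    bins = np.arange(len(dist_a))
    score_a = score_b = 0.0
    for _ in range(n_tossups):
        a = rng.choice(bins, p=dist_a)
        b = rng.choice(bins, p=dist_b)
        if a < b or (a == b and rng.random() < 0.5):
            score_a += 10 + ppb_a
        else:
            score_b += 10 + ppb_b
    return score_a, score_b

# Invented buzz distributions over 6 bins: team A buzzes earlier
dist_a = np.array([0.00, 0.05, 0.15, 0.30, 0.30, 0.20])
dist_b = np.array([0.00, 0.00, 0.05, 0.20, 0.35, 0.40])
games = [simulate_game(dist_a, dist_b, ppb_a=18.0, ppb_b=15.0)
         for _ in range(200)]
win_frac_a = np.mean([a > b for a, b in games])
print(win_frac_a)       # fraction of simulated games won by team A
```

Repeating this for every pair of teams fills in the (row, column) matrix described above.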
Attachments
team_comparison.pdf
(35.59 KiB) Downloaded 154 times
Samir Khan
UChicago '19

User avatar
wcheng
Wakka
Posts: 113
Joined: Mon May 26, 2014 12:02 pm

Re: Packets, recordings, and detailed stats warm-up survey

Post by wcheng » Sun Feb 04, 2018 5:05 pm

khannate wrote:I spent some time playing with the stats from the UIUC site, more specifically trying to find some meaningful way to compare and rank teams. What I ended up doing was constructing an estimate of the distribution of buzz points for each team within each category, by starting with Will's proposed ideal distribution of buzz points and Bayesian updating based on the actual buzzes a team got within the category. This ends up looking like a weighted average of the ideal distribution and the empirical distribution of a team's buzzes, weighted based on a tuning parameter and the number of buzzes the team got in the category.

Based on these distributions, you can simulate two teams playing a tossup in a category by drawing a buzz from each team's distribution in that category, determining what would have happened, and giving the team that got the tossup its PPB as bonus points. By doing this the right number of times for each category, you can simulate a full game. I did this 100 times for each pair of teams at the UIUC site and plotted the results in the attached graph. The entry at (row, column) is the fraction of games in which the row team beat the column team.

I think this sort of model can be useful for thinking about the outcomes of tournaments and how surprising or unsurprising they are. For example, (at least at Chicago), there's a perception that Chicago teams upset each other an unusual amount, but the graph suggests this isn't actually the case. At a tournament where all of Chicago A, B, and C play each other, the probability of at least one team losing to a lower-lettered team is about 44%.

This could also be used to do forecasting for Nats by simulating all the matches from the Nats schedule, running through the tournament 1000 times, and seeing the distribution of each team's placing.

If people are interested in seeing this sort of thing for other sites, or for, say, the top 25 teams by PPB, just let me know and I'd be happy to do it.
I think it'd be really interesting to see how the top teams by A-Value stack up against each other by this metric!
Weijia Cheng
Centennial '15
Maryland '18 (Fall)

User avatar
An Economic Ignoramus
Memerator
Posts: 523
Joined: Tue Aug 02, 2016 12:31 pm
Location: East Lansing, MI or Naperville, IL

Re: Packets, recordings, and detailed stats warm-up survey

Post by An Economic Ignoramus » Sun Feb 04, 2018 5:23 pm

khannate wrote:I spent some time playing with the stats from the UIUC site, more specifically trying to find some meaningful way to compare and rank teams. What I ended up doing was constructing an estimate of the distribution of buzz points for each team within each category, by starting with Will's proposed ideal distribution of buzz points and Bayesian updating based on the actual buzzes a team got within the category. This ends up looking like a weighted average of the ideal distribution and the empirical distribution of a team's buzzes, weighted based on a tuning parameter and the number of buzzes the team got in the category.

Based on these distributions, you can simulate two teams playing a tossup in a category by drawing a buzz from each team's distribution in that category, determining what would have happened, and giving the team that got the tossup its PPB as bonus points. By doing this the right number of times for each category, you can simulate a full game. I did this 100 times for each pair of teams at the UIUC site and plotted the results in the attached graph. The entry at (row, column) is the fraction of games in which the row team beat the column team.

I think this sort of model can be useful for thinking about the outcomes of tournaments and how surprising or unsurprising they are. For example, (at least at Chicago), there's a perception that Chicago teams upset each other an unusual amount, but the graph suggests this isn't actually the case. At a tournament where all of Chicago A, B, and C play each other, the probability of at least one team losing to a lower-lettered team is about 44%.

This could also be used to do forecasting for Nats by simulating all the matches from the Nats schedule, running through the tournament 1000 times, and seeing the distribution of each team's placing.

If people are interested in seeing this sort of thing for other sites, or for, say, the top 25 teams by PPB, just let me know and I'd be happy to do it.
Another interesting thing to look at would be what percentage of games that actually took place at Regs were upsets (the actual winner lost the majority of simulated games) or strong upsets (the actual winner lost more than 65, 70, or 75% of simulated games).
Jakob Myers
MSU '21, Naperville North '17
"No one has ever organized a greater effort to get people interested in pretending to play quiz bowl"
-Ankit Aggarwal
Member, PACE
Memerator

User avatar
cornfused
Auron
Posts: 2147
Joined: Sun Feb 12, 2006 3:22 pm
Location: Chicago, IL

Re: Packets, recordings, and detailed stats warm-up survey

Post by cornfused » Tue Feb 06, 2018 5:20 pm

Stupid question, but are the stats actually posted anywhere?
Greg Peterson
Northwestern University '18 (co-in-charge in 2017-18)
Lawrence University '11
Maine South HS '07

"a decent player" - Mike Cheyne

User avatar
Periplus of the Erythraean Sea
Auron
Posts: 1665
Joined: Mon Feb 28, 2011 11:53 pm
Location: Falls Church, VA

Re: Packets, recordings, and detailed stats warm-up survey

Post by Periplus of the Erythraean Sea » Tue Feb 06, 2018 5:49 pm

cornfused wrote:Stupid question, but are the stats actually posted anywhere?
We have not released them publicly yet; we were hoping to get people from across the community, at many levels of skill, to voice their opinions and ask questions. In the interest of fostering this sort of discussion, we've withheld the stats for a bit so that people don't retreat to their own silos and group chats, noodle around with things themselves, answer their own questions, and never talk about them in public channels.

Perhaps it's ironic that withholding some information should be necessary to foster public discourse, but I doubt this thread would have taken off if we had simply released the stats immediately, since there was very little discussion of them for this year's EFT or This Tournament is a Crime.
Will Alston
Bethesda Chevy Chase HS '12, Dartmouth '16
"...should be treated as the non-stakeholding troll he is" -Matt Weiner

User avatar
a bird
Wakka
Posts: 105
Joined: Sun Feb 26, 2012 3:50 pm
Location: College Park, MD

Re: Packets, recordings, and detailed stats warm-up survey

Post by a bird » Tue Feb 06, 2018 8:40 pm

Aaron Manby (ironmaster) wrote:Buzzpoints by team, split by player

EDIT: The area under a player's curve is equal to the number of tossups answered correctly. Y axis is thus in arbitrary units of points*probability.

Someone should design an interactive website with custom results and graphs, but I unfortunately don't have the skills to be that person.
These are a very nice way of looking at the buzzpoint data for a given team. Thanks for making them! This (along with the plots Ryan made) got me wondering how the difficulty of different categories could affect these curves (both in general and in the specific case of this tournament). For example, say (hypothetically) the lit had easier early clues than the history. If player A buzzed mostly on lit while their teammate player B buzzed mostly on history, the different shapes of the A and B buzzpoint curves would be influenced by the players' knowledge of their respective categories and the difficulty of those categories. A and B could have buzzed on clues of comparable difficulty, but ended up with different buzzpoint curves due primarily to the cluing in the set.

Do people think this had a substantial effect on the buzzpoint curves, or was it negligible? Of course, most players buzz on multiple categories anyway, so the effect I'm describing might be hard to detect in most cases, even if it did happen. It might also be interesting to make buzzpoint distribution plots for specific categories of interest, or a buzzpoint plot that somehow incorporated average performance on a per-subject or even per-question basis.

Has anyone analyzed which categories had the most early buzzes? I didn't find any category subjectively harder by a large amount, but I wonder what the data say about the difficulty of early clues in different categories.
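Per-category buzzpoint summaries like the ones I'm asking about are a one-liner once the buzz log is in a table. A sketch with invented data and hypothetical column names:

```python
import pandas as pd

# Invented buzz log: one row per correct buzz, with position recorded
# as the fraction of the tossup heard (column names are hypothetical).
buzzes = pd.DataFrame({
    "category":  ["Lit", "Lit", "History", "History", "Science"],
    "buzz_frac": [0.55, 0.70, 0.80, 0.90, 0.65],
})

# Mean/median buzz position per category: a rough proxy for how
# generous each category's early clues were.
by_cat = buzzes.groupby("category")["buzz_frac"].agg(["mean", "median"])
print(by_cat.sort_values("mean"))
```

Categories sorting toward the top (earlier mean buzz) would be the ones with the most buzzable early clues.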
Graham Reid
Kenyon 2017
Maryland Physics 20??

Nice hockey Cote d'Azur
Wakka
Posts: 179
Joined: Sun May 29, 2011 9:51 pm
Location: Chicago

Re: Packets, recordings, and detailed stats warm-up survey

Post by Nice hockey Cote d'Azur » Fri Feb 09, 2018 11:05 pm

I've gone through the stats and tried testing some of the hypotheses people have posted in this thread that have not yet been answered.
CPiGuy wrote:"Bad" teams (let's say <12PPB) will have a higher percentage of 30'd bonuses than buzzes in the first two lines, and "good" teams (>18PPB) will have a higher percentage of buzzes in the first two lines than 30'd bonuses.
I wasn't sure exactly what was meant by buzzes in the first two lines; I took it to mean correct buzzes in the first two lines as a percentage of all of that team's buzzes. Based on that definition, I found that in total, "bad" teams 30'd 1.2% of bonuses and buzzed in the first 30% of a tossup (as a proxy for question length) 0.8% of the time. On the other hand, "good" teams 30'd 23.0% of bonuses and buzzed in the first 25% of the tossup 4.1% of the time.
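For anyone who wants to replicate this, the two rates can be computed along these lines (the data and column names are made up; the real detailed stats are probably shaped differently):

```python
import pandas as pd

# Invented per-team logs with placeholder column names.
bonuses = pd.DataFrame({
    "team":   ["A", "A", "A", "B", "B"],
    "points": [30, 20, 30, 10, 0],
})
buzzes = pd.DataFrame({
    "team":      ["A", "A", "B"],
    "buzz_frac": [0.20, 0.60, 0.85],   # fraction of the tossup heard
    "correct":   [True, True, True],
})

# Share of bonuses 30'd, per team
thirty_rate = bonuses.groupby("team")["points"].apply(lambda p: (p == 30).mean())
# Share of correct buzzes in the first quarter of the tossup, per team
early_rate = (buzzes[buzzes["correct"]]
              .groupby("team")["buzz_frac"]
              .apply(lambda f: (f <= 0.25).mean()))
print(thirty_rate["A"], early_rate["A"])
```

Comparing the two series across the PPB tiers gives exactly the "30 rate vs. early-buzz rate" contrast tested above.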

You probably underestimated the ease with which top teams can 30 regular difficulty bonuses in particular, especially compared to getting early buzzes on a set with somewhat tougher leadins.
nsb2 wrote:I would predict that a large majority of music buzzes (maybe up to 90%) were after the third line or so, even more than for other categories.
You were correct that more than 90% of music buzzes came after the third line (I used 40% of the tossup as a proxy). However, this was not especially high compared to other categories.

Code: Select all

 Subcategory                                                   Buzzes After 3rd Line
                                         Philosophy            0.984
                                  Miscellaneous Lit            0.982
                                              Drama            0.972
                                            Biology            0.970
                                    Non-Epic Poetry            0.969
 British, Canadian, Australian, New Zealand History            0.963
                                          Chemistry            0.957
                                     Social Science            0.956
                                              Music            0.949
                                     Other Academic            0.949
                                      Other Science            0.946
                                          Other Art            0.939
                                      Other History            0.939
                                         US History            0.937
                                       Long Fiction            0.925
                                      Short Fiction            0.925
                                            Physics            0.920
                                 Painting/Sculpture            0.915
         Continental European History (post-600 CE)            0.910
   Continental or Near Eastern History (pre-600 CE)            0.905
                                          Geography            0.901
                                           Religion            0.890
                     Historiography and Archaeology            0.885
                                     Current Events            0.853
                                          Mythology            0.846
                                              Trash            0.766
cwest123 wrote:At the Southeast (Georgia Tech) site, the majority of music buzzes were VERY late in the question, generally close or in to the last line. Percentage wise, I'll guess that the average buzz was beyond the 75% point.
This is correct: of all the sites, I found that Georgia Tech had the latest mean and median buzzpoints. This considers only correct buzzes and ignores vulches.

Code: Select all

  Site             Average Buzz %  Median Buzz %
      Minnesota    0.684           0.720
           UCSD    0.700           0.723
   Kansas State    0.713           0.664
 Oxford Brookes    0.715           0.754
     Penn State    0.723           0.723
           UIUC    0.765           0.813
    Connecticut    0.771           0.791
        Toronto    0.783           0.876
            UCF    0.843           0.887
       Virginia    0.863           0.940
           Rice    0.899           0.924
   Georgia Tech    0.936           0.996
Sima Guang Hater wrote: -Hard parts were significantly harder than middle parts, leading to a "wall effect" around 20 ppb
-Science bonuses were, on average, harder than literature bonuses
I don't know if there's a good way to measure a "wall effect"; overall, middle bonus parts were converted around 50% of the time and hard parts about 15%. This seems about what's expected, and any measure of "significantly harder" would probably require a comparison to other regular difficulty tournaments. You were correct that science bonuses were harder, although I was surprised to see that science easy parts were actually converted pretty well, while the middle and hard parts were converted the least of any category. I've attached a plot below showing conversion by category and difficulty.

[image: bonus conversion by category and part difficulty]
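The conversion-by-category-and-difficulty breakdown in the plot can be reproduced from a per-part log along these lines (illustrative data and column names, not the actual stats):

```python
import pandas as pd

# Invented per-part log: one row per bonus part heard.
parts = pd.DataFrame({
    "category":   ["Science", "Science", "Science", "Lit", "Lit", "Lit"],
    "difficulty": ["easy", "medium", "hard", "easy", "medium", "hard"],
    "converted":  [True, False, False, True, True, False],
})

# Conversion rate for each (category, difficulty) cell, as in the plot.
conv = (parts.groupby(["category", "difficulty"])["converted"]
             .mean()
             .unstack("difficulty"))
print(conv)
```

Each cell of `conv` is the fraction of parts converted, so "science middle and hard parts converted least" would show up as the smallest values in those two columns.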
geremy wrote:I predict that out of all the science subcategories, physics has the latest average buzz point and biology the earliest, but the average PPB will be pretty close.
Physics did not have the latest buzz point, but it did have the lowest PPB.

Code: Select all

 Subcategory   PPB      Average Buzz %  Median Buzz %
       Biology 14.68    0.771           0.814
     Chemistry 14.59    0.755           0.770
 Other Science 14.28    0.727           0.746
       Physics 13.08    0.761           0.814
I'll get back to some of the other ones; PM me if you catch any errors I made here.

EDIT: fixed incorrect statement
Tejas Raje
Cornell '14

User avatar
ErikC
Wakka
Posts: 162
Joined: Sat Sep 24, 2016 12:44 pm

Re: Packets, recordings, and detailed stats warm-up survey

Post by ErikC » Sat Feb 10, 2018 1:48 am

It's interesting that the Toronto site's median and average buzz are quite different.

I'm not surprised science hard parts were the hardest. I think science easy parts are converted well because they're often concepts almost everyone is familiar with (like gravity), even if they don't understand them fundamentally (like Rein Otsason).
Erik Christensen
University of Waterloo - School of Planning Class of '18
I write trash
Defending VETO top scorer

Post Reply