2018 ACF Regionals detailed stats

hftf · Post by **hftf** » Thu Feb 08, 2018 6:20 pm

I am releasing detailed stats for all 12 sites of ACF Regionals 2018. Please read all of the information below before you view the spreadsheets.

Overview

The aim of detailed stats is to provide more data than standard stats, which only collect summary data (player statlines and final team scores) for each game. Once each question is tagged and each complete scoresheet is entered, it becomes possible to compile aggregate data by question, category, author, team, or player. This form of detailed stats, often called conversion stats, has been produced for a few tournaments over the last few years. Besides bringing gratification to players, these stats have allowed editors to make many improvements to packets between mirrors. Recently, a new form of detailed stats (which has no established name yet) has gone even further than conversion stats by tracking exactly where players buzzed on each tossup question.

ACF Regionals 2018 was the most widely played tournament yet to record buzz points, and the first to have detailed stats with so many teams playing on the exact same questions. There were 16854 buzzes (93% of which have a buzz location) and 13015 bonuses recorded in 737 games (using up to 69 rooms across 12 sites). About 12 packets’ worth of questions (240 tossups and 206 bonuses) were heard more than 30 times. Some questions, however, have rather smaller sample sizes, so please carefully consider that the data may not be entirely representative of the question set. For example, a team that might have performed well on a bonus did not get the opportunity to answer it.

Slight changes were made to the question set after the main group of mirrors, some of which were informed by the detailed stats and the private discussion forum.

Spreadsheets

There is a separate detailed stats spreadsheet for each site, as well as an “All sites combined” spreadsheet¹. Compared to previous iterations, there are no major changes to the spreadsheet structure. The Categories sheet lists aggregate tossup and bonus conversion by subcategory. Tossups and Bonuses list conversion for every question. The performance of each team by subcategory is in Teams–Categories. The performance of each player is in Player, which is broken down by category and subcategory in the remaining tabs.

The spreadsheets are mainly intended for casual observation and only contain rudimentary reports and breakdowns.

Raw data

The raw tossup and bonus data have been published (in TSV format and denormalized for easier ad-hoc analysis) in a GitHub repository under an open license as part of the launch of a new open quizbowl data initiative. If you need more data than this for your research, please email me.
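As a starting point for ad-hoc analysis, here is a minimal Python sketch that reads a denormalized TSV of buzzes and averages buzz location by category. The column names (`category`, `tossup`, `buzz_location`) and the sample rows are assumptions made up for illustration, not the actual schema of the published files, so check the real headers before adapting it.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows standing in for the real TSV; column names are hypothetical.
SAMPLE_TSV = (
    "category\ttossup\tbuzz_location\n"
    "Music\t1\t0.95\n"
    "Music\t2\t0.85\n"
    "History\t3\t0.40\n"
    "History\t4\t\n"  # a buzz with no recorded location
)

def mean_buzz_by_category(tsv_file):
    """Return {category: mean buzz_location}, skipping rows with no location."""
    totals = defaultdict(lambda: [0.0, 0])  # category -> [sum, count]
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        if row["buzz_location"] == "":
            continue
        entry = totals[row["category"]]
        entry[0] += float(row["buzz_location"])
        entry[1] += 1
    return {cat: s / n for cat, (s, n) in totals.items()}

result = mean_buzz_by_category(io.StringIO(SAMPLE_TSV))
print(result)
```

Because the function takes any open file object, an updated version of the data can be plugged in by passing `open(path, newline="", encoding="utf-8")` instead of the in-memory sample.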

Solicitation of research

Last week, I posted a warm-up thread as a prelude to releasing the detailed stats. I wanted to prepare some concluding remarks on how I felt about the outcome of that exercise, but lengthy posts like these take quite a while to write (though you are of course welcome to start that discussion yourself). Here I will briefly describe how the “deal” worked out and explain my motive, because it was a frequently asked question and intentionally somewhat veiled (which I have already elaborated on to some people in private).

I wrote: Originally, the people who emailed me were asked to post concrete predictions in the thread in exchange for early access to the draft spreadsheets, then test their hypotheses against the data, and finally post their findings. I thought this was a fair request, but only three people ever ended up emailing me. (I really just needed your email to give you access!) So instead of asking for predictions, I then gave access to anyone who managed to contact me and made any sort of productive contribution to the thread.

My true motive was to promote progress in quizbowl discourse. The incentive was meant to encourage people to explore the purportedly broad possibilities of this long-desired kind of dataset – to do some real science! – and not just scroll passively through a straightforward spreadsheet. I hoped to use early stats access as bait to trick people into investing some effort digging into data, placing value in their own discoveries, and exchanging ideas in a centralized public forum.

For a while I’ve been demoralized by the stagnation and balkanization of quizbowl discourse in general. It seems as if many of the most knowledgeable people rarely participate in mature, intelligent discussion here (not necessarily theorizing). I tend to enjoy quizbowl more when I hear and learn from more of those voices. I don’t really care for daft banter or whatever the bizarre topic du jour is. I do believe it is important to expand quizbowl to a wider audience (can there be a group reading of “The Big Vision” each year?) and that people might stick around longer if the discourse were less off-putting and more analytical, inclusive, and introspective. For an ostensibly intellectually curious activity, I expected there to be more engagement from the quizbowl community, especially during this pivotal opportunity.

Despite my pessimism, I still wanted to try to bring people together and provoke some positive change. I think boosting engagement with detailed stats will advance not just the discourse, but also the activity as a whole. Quizbowl isn’t a lifelong activity; most alumni soon drift apart from it. I can’t be relied on to have this role forever, so if the community wants to be sustainable, it must continue to pick up the technological slack.

So I implore you to explore the data. Harness its potential – these spreadsheets barely scratch the surface of possibilities of the underlying dataset. I don’t particularly care how you interpret the data. Just use it!

And I’m not speaking only to the technically competent: anyone can learn Excel or similar tools. I’m happy that the incentive has motivated a moderate number of people to participate – it may not be the most enthusiastic or enlightening discourse – but I still hope to include as much of the community as possible. I can’t be satisfied until I hear what everyone has to say, even the cagey non-posters like myself. Getting involved in this exercise shouldn’t be risky or embarrassing (as long as you didn’t defeat the whole purpose by looking at the stats before posting predictions!).

Please post your results here, along with any methodology, code, spreadsheet work, or whatever other resources are needed to make your research reproducible. (I recommend using a design that makes it easy to plug in an updated version of the data.) Make sure to fully label your plots (annotations can be very useful) so that readers can easily understand the context. If you have been accumulating tables, graphs, and analyses, or only circulating them in chat rooms or silly social media groups, I invite you to please share them publicly in this thread. I want everyone (including future quizbowl historians) to be able to see what others have investigated, build upon each other’s creativity, and make something new.

Errata

I am certain that the scoresheets have numerous mistakes, most of which involve incomplete and misattributed data. A few rooms did not record any buzz points at all, while others just forgot to paste them in at the end of a match (and could not be reached after the tournament). A small number of moderators logged incorrect tossup answers in their scoresheets as an optional enhancement to help the authors better understand how questions played, but there is a good chance that some wild guesses are assigned to the wrong team.

Please do not report corrections or problems accessing the spreadsheets in this thread. I will accept errata via email only. This is because I expect to receive a large volume of reports and I may need to give you access to the scoresheet. Your report must contain a full description of the erratum (including the site, room, round, packet, question number, team, player, etc.) so that I can easily find it, or link directly to the relevant scoresheet. Referring only to the answerline (“I didn’t get credit for my buzz on Cervantes!”) will not be sufficient. If you have some kind of proof or reliable source (such as a notebook, recording, or witness) to corroborate the report, you may provide that as well. Please try to determine whether a mistake encompasses several questions (thus indicating a misalignment nearby) or only a single question.

Here is a list of errata so far. Some changes have already been made, and there may be more over time.

Thanks

I would like to thank all tournament directors, moderators, and volunteers for staffing and for putting up with an unusual scorekeeping system. My goal is to improve the experience of every player and every moderator – not to make quizbowl, a fun activity, into a burden.

Thanks as well for your patience. I spent a lot of time developing the system for creating these detailed stats and much appreciate your support. Here are the spreadsheets.

Links to spreadsheets

It may take a long time to load these large web pages. They are static “published” views of complicated spreadsheets that may update occasionally, but are not interactive like before.

All sites combined¹ · Connecticut · Georgia Tech · Kansas State · Minnesota · Oxford Brookes · Penn State · Rice · Toronto · UCF · UCSD · UIUC · Virginia

Tips

If you would like to support me creating new quizbowl technology, you can donate to my Patreon. Please don’t feel like you have to give me anything. Thank you to Aseem, Jakob, Rob, Will, and all my patrons for your generous support!

Future

I already wrote a bit about the future of the current scorekeeping system, which I would now like to discontinue. I’m not that proud of these miserable spreadsheets (I detest working with them, honestly), and they obviously can’t become the standard method for quizbowl scorekeeping.

The reason I chose to implement a scorekeeping system in this esoteric, improvised manner was for relative flexibility: it enabled the collection of valuable data without interfering much with established procedures (normal packets but with clickable words, normal scoresheets but with extra columns). If moderators could just modify a scoresheet as usual, I could worry less about business logic or unrepresentable edge cases and making a fully reliable and robust app. Collaboration, infrastructure, real-time updates, browser support/accessibility, and familiarity are pretty compelling existing advantages offered by Google Sheets (not mentioning the many disadvantages) that would have to be rebuilt in whatever successor technology comes along.

Of course, I am not content with the current presentation of the detailed stats. I did plan to make a nice web application for viewing all the varieties of stats you could wish for, but that major project is on hold in part due to a backlog of responsibilities, injury, and travel. Some elements of the existing software will eventually be open source, so follow me on GitHub if interested.

_____
¹ Unlike the others, the “All sites combined” spreadsheet only includes selected sheets due to the constraints of a single Google Sheets document (maximum number of cells, maximum number of simultaneously calculating formulas, maximum number of sheets to import external data from). Be aware that this spreadsheet does aggregate data from the Oxford Brookes site, which played a slightly different version of the set, so its data is not strictly comparable with that of the other sites.

ryanrosenberg · Post by **ryanrosenberg** » Thu Feb 08, 2018 10:28 pm

hftf wrote:I did plan to make a nice web application for viewing all the varieties of stats you could wish for

I made a Shiny app through which you can compare different teams' buzz points in a given category, as well as a tab for players' buzzpoint distribution weighted for PPG (credit to Joe Su for the idea).

edit: I've also been doing a bunch of exploratory data analysis/answering questions with the data in the past couple weeks, and will post some of my results here.

Gen. Winfield Scott Hancock · Thu Feb 08, 2018 11:45 pm

Based on the extremely scientific method of copying this data into an Excel spreadsheet and running the average function on the data, it appears that the average first buzz on tossups in this set came at 51.33 words. Considering that the same function shows that the average number of words per tossup was 139.84, I find it concerning that at this tournament, played by all the top teams and nearly all the top players, tossups were, on average, not buzzed on by anyone until more than a third of the way through.
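The calculation described above can be sketched in Python roughly as follows. The record layout (a mapping from tossup id to buzz positions) is an assumption for illustration, since the spreadsheet columns are not shown in this thread; only the two averages, 51.33 and 139.84, come from the post.

```python
from statistics import mean

def average_first_buzz(tossups):
    """Mean position (in words) of the earliest buzz on each tossup.

    `tossups` maps a tossup id to the word positions of all buzzes on it.
    Tossups with no buzzes are skipped.
    """
    firsts = [min(buzzes) for buzzes in tossups.values() if buzzes]
    return mean(firsts)

# Made-up example data: tossup id -> word positions of buzzes.
example = {"t1": [40, 90, 120], "t2": [65, 130], "t3": []}
example_avg = average_first_buzz(example)  # mean of 40 and 65

# Figures quoted in the post: average first buzz and average tossup length.
fraction = 51.33 / 139.84
print(f"{example_avg:.1f} words; quoted figures give {fraction:.1%} of the tossup")
```

With the quoted figures, the ratio comes out to a little under 37%, which is the "more than a third" in the post.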

I believe that this data supports the idea that the construction of ACF Regionals tossups needs to be changed for the easier. As I see it, this can be accomplished in one of two ways. Either questions can become shorter, as proposed by Victor, or they can perhaps stay the same length but be a bit more forgiving in their clueing. I would generally tend to favor the former model. While I see Auroni's point about shorter questions reducing the time for players to "connect the dots" with clues, as it were, I also believe it is empathetic to players to not sit through line after line of clues that they do not know. In the absence of a D2 Regionals set for newer or less-skilled players and teams, I believe it is imperative that we do not lose sight of the fact that ACF Regionals has a far wider audience than the top teams competing for nationals spots. I spoke with several teams at the Penn State site that were not in that echelon, and by and large, the difficulty of the question set ensured that 2018 ACF Regionals was not an enjoyable experience for them.

In this vein, I also do not feel that ACF Regionals is a good place for more than minimal canon expansion, but that is a topic I hope to address in a future post.

In sum, I think this data, along with other data such as average buzz points, demonstrates that we need to come to a consensus on what ACF Regionals is supposed to be and to whom it is supposed to be marketed. The current situation, of an increasingly difficult set that attracts teams of all skill levels, seems to me to be unsustainable.

armitage · Post by **armitage** » Fri Feb 09, 2018 12:11 am

gettysburg11 wrote:In the absence of a D2 Regionals set

I think this could be more feasible than people realize. In my view, this wouldn't require more than a few hands doing cleanup work to excise some earlier clues - not top editors, just people who are good at paying attention to detail. If there is a groundswell of demand for this, it could well happen. A few voices can carry a lot of weight.

It is cool that the first buzz metric allowed this particular observation to be made empirically for the first time.

theMoMA · Post by **theMoMA** » Fri Feb 09, 2018 12:19 am

The number of teams playing Regionals has gone from 101 to 127 to 133 to 143 over the past four years. 42% growth in four years is pretty good. The audience for people to play ACF questions is smaller than the whole world of college quizbowl, but there is clearly a large and growing audience for this style of competition. I think there are ways to make the tournament less difficult or grueling at the margins, such as stricter length limits and difficulty controls, but I'm not sure that a radical restructuring (if that's what you're suggesting) is warranted.
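For anyone who wants to check the arithmetic, a short Python sketch using only the four team counts quoted above verifies the growth figure and that the counts rise every year:

```python
teams = [101, 127, 133, 143]  # Regionals team counts quoted above

# Overall growth from the first year to the last.
growth = teams[-1] / teams[0] - 1

# True if every year's count is higher than the previous year's.
monotonic = all(a < b for a, b in zip(teams, teams[1:]))

print(f"{growth:.1%} growth, strictly increasing: {monotonic}")
```

The growth works out to about 41.6%, which rounds to the 42% stated.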

Jack · Post by **Jack** » Fri Feb 09, 2018 1:08 am

As someone who is "new" to collegiate quiz bowl and the quiz bowl scene in general, I have to say I find Ryan's post particularly striking. I did not play this year's ACF Regionals (thanks Princeton final exam schedule), but two members of our team managed to make the trek. I heard from them and others that the questions were particularly lengthy, and that rounds were a bit grueling. Now, I see that the average first buzz of a tossup was over a third of the way into the question, which, I assume, means that perhaps the first or second clues did not draw many buzzes from even the best players of quiz bowl.

There were 235 teams at SCT this year, nearly 100 more than ACF Regionals. From my understanding, the last three SCTs have had over 200 total participants. From my perspective, as a "less-experienced quiz bowler," it's easy to see why: tossups that allow for more buzzes throughout the question are more engaging and rewarding, which makes playing tossups that are longer without any meaningful difference (or just thinking about playing them) seem, might I say, dreadful and tedious. Questions rewarding knowledge are, if I'm not mistaken, one of the core components of pyramidality in quiz bowl. I don't think one can confidently attribute a 42% growth to ACF's style outright when SCT has done consistently better. When you also consider that there are really not a lot of options to play at large-scale tournaments such as Regs or SCT, and that NAQT has undertaken efforts to encourage former high schoolers to continue qb into college, I think the discrepancy between SCT and Regs is more meaningful: around 100 teams that are going to one of the major events don't go to the other. If I had to tell a college team of relatively inexperienced players to just go to one event, I'd recommend SCT over Regs.

I think what Ryan and, apparently, Victor are suggesting is not that there's anything wrong with the current style of ACF questions, but that it just makes more sense to have them be shorter, especially since Regs attracts teams of all abilities. If ACF Regs is supposed to have a similar target audience as NAQT's SCT (which it really ought to, in my eyes), it should adopt this change. It makes the tossups more approachable and accessible while not sacrificing, as indicated by the data, any earlier "buzzability" for better teams. If quiz bowl's growth and advocacy is (as it should be) a goal for the community, I think, in light of this analysis, it would make sense to alter how first and second clues are used in Regs tossups, probably just eliminating the harder clue outright -- which, as you suggested, Andrew, may mean shortening the length of tossups. I felt that is what Ryan was getting at, anyway.

touchpack · Post by **touchpack** » Sat Feb 10, 2018 3:24 am

Comparing SCT to Regionals numbers misses the point of what Andrew was saying--if Regionals is indeed too hard for large portions of its audience, the number of teams in attendance wouldn't be monotonically increasing every year. (and there are other reasons why SCT is larger than Regionals which are not directly related to question difficulty) I remain unconvinced that Regionals needs to be made radically easier--although I think reducing question length from 8 back down to 7 lines (not by cutting leadins, but by cutting everything proportionally) would help improve player experience in the lower brackets.

One thing to note about Ryan's stat: it's a little skewed, because no one ever buzzes before say, word 10 of a tossup. Looking at the average first buzz vs the average available buzz space (word 41 out of 130) reduces the %age from 36 to 31. Perhaps this %age should be a little smaller (maybe aiming for like, 25% would be more optimal), but I don't think that it's terribly unreasonable.
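To make the adjustment concrete: if the first 10 words are treated as unbuzzable (the 10-word cutoff is the post's rough figure, not a measured value), they come off both the buzz position and the tossup length. A small Python sketch using the averages quoted earlier in the thread (51.33 and 139.84 words):

```python
def buzz_fraction(first_buzz, length, dead_words=0):
    """Fraction of the buzzable part of a tossup elapsed at the first buzz."""
    return (first_buzz - dead_words) / (length - dead_words)

raw = buzz_fraction(51.33, 139.84)                      # no adjustment
adjusted = buzz_fraction(51.33, 139.84, dead_words=10)  # skip first 10 words
print(f"raw {raw:.1%}, adjusted {adjusted:.1%}")
```

This gives roughly 36.7% unadjusted and 31.8% adjusted, in line with the 36 and 31 mentioned above.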

It's important to remember that tossups do not have the sole purpose of gradating between as many teams of different skill levels as finely as possible. Leadins are often the most intellectually interesting parts of questions--I personally think that uniformly cutting all leadins from a regular difficulty set, while it might improve the resolving power of the set slightly, would be a net negative by virtue of making the set less interesting. Quizbowl is a game, but it's also a learning experience! Even the best teams should have the opportunity to learn something from regular difficulty questions! (Personally, I don't play sets below regular difficulty anymore because I don't really get any enjoyment out of them.) I think the continued growth of Regionals shows that it's possible to do this while still giving new players a fun experience as well. (although again, I think my ideal Regs would be 1 line shorter than this one, but there's room to debate here)

Cheynem · Post by **Cheynem** » Sat Feb 10, 2018 12:48 pm

(caveat: I have not seen any of 2018 ACF Regionals--I am speaking in the abstract)

There's always going to be the question of how long quizbowl questions should be, particularly the beginnings of said questions. Ryan notes that the average buzz point was more than a third of the way in, suggesting that the majority of the lead-ins were not buzzed on. While I agree to some extent the questions could be shorter, I also think there's a certain enjoyment (which Billy is getting at) that these harder lead-ins have.

1. As opposed to NAQT questions, which are short and more fast-paced, these questions have time to saturate, to give you time to think instead of immediately buzzing in. I enjoy that pace.

2. If few people are buzzing on these lead-ins, it is eminently more satisfying when you are one of the people who buzzes there.

Perhaps these are elitist views--I know, obviously, that not everyone would agree with them. I do think questions could be shorter and I'm sympathetic to the idea of a D2 set, but I also am not sure if the set is really that much better if more people are buzzing earlier.

Jack · Post by **Jack** » Sat Feb 10, 2018 7:21 pm

In retrospect, and in response to Mike, I'll say that it might make more sense for a "DII" division to exist. While I certainly respect and support the notion that some early, harder clues can be used as a learning tool for even the best players, it is still empirically apparent that too many teams and sites did not buzz on questions until >~80% of the way through, according to the average and median buzzpoints. I feel like many teams are not going to learn much from listening to the first few lines and having no idea (whereas better teams will learn more from those first few lines). Regardless, all teams can study and review packets after the tournament.

Someone should correct me if I'm wrong, but from my analysis, the median buzzpoint at the Georgia Tech site, which had (according to forum post here) ten teams in attendance, was .996 (!!) (and an average above .90) Over half the players at that site were not buzzing until the end of an 8 line tossup. There is something (positive) to be said about a longer, more "drawn-out" tossup that gives the "connect the dots" type atmosphere, but I can't imagine that very many of the players at GT really had that experience -- not to mention the teams in the fields that didn't buzz until after the 80% point.

I too enjoy both fast and slow-paced tossups at times, but this slow pace is not giving a significant number of teams a worthwhile benefit when contrasted with the dread of slogging through 7 to 8 lines for most tossups. For those teams who aren't buzzing until the last couple of lines, I would think it's those mid level clues that are giving that same "learning experience" that Billy mentioned, whereas those first couple clues are giving that experience to the better teams.

So really, as I sit here and write this, it just makes more and more sense for the existence of a separate DII packet, especially if

armitage wrote: this could be more feasible than people realize. In my view, this wouldn't require more than a few hands doing cleanup work to excise some earlier clues - not top editors, just people who are good at paying attention to detail. If there is a groundswell of demand for this, it could well happen. A few voices can carry a lot of weight.

In conclusion, I think it's totally possible to "keep the set interesting" while still making improvements to its accessibility. It really might be a good idea to have a DII packet in order to have that "connect the dots" feel apply to more teams, since the earlier and more obscure clues are doing that to the good teams, and (I conjecture) the current mid-level clues are doing that with the more inexperienced teams. But I'll let the "Pros" discuss this -- I hope my perspective was refreshing.

Tejas · Post by **Tejas** » Sat Feb 10, 2018 8:40 pm

jacke wrote: Someone should correct me if I'm wrong, but from my analysis, the median buzzpoint at the Georgia Tech site, which had (according to forum post here) ten teams in attendance, was .996 (!!) (and an average above .90)

Just to be clear, this was only referring to music tossups, as Chandler's hypothesis was only about those.

Zealots of Stockholm · Sat Feb 10, 2018 9:23 pm

Marmaduke van Swearingen wrote:
jacke wrote: Someone should correct me if I'm wrong, but from my analysis, the median buzzpoint at the Georgia Tech site, which had (according to forum post here) ten teams in attendance, was .996 (!!) (and an average above .90)
Just to be clear, this was only referring to music tossups, as Chandler's hypothesis was only about those.

Also, buzzpoints from the Tech site should probably be taken with a grain of salt, as 2/5 rooms weren't recording them.

TheInventor · Post by **TheInventor** » Sat Feb 10, 2018 9:39 pm

I am also a freshman new to collegiate quizbowl, and I would like to make more qualitative complaints. When I went to regs, my team and I, which only consisted of freshmen, did not have a good experience. In virtually every single game we played (with the exception of being destroyed by Penn A), an overwhelming majority of the tossups consisted of us listening to ridiculous clues we have never heard of before and struggling to pay attention until the tossup became a high school tossup, at which point everyone woke up and a massive buzzer race occurred. This continued until the tournament was over, at which point I, for the first time in my career, thought that I wanted to stop doing quizbowl. Of course, that was irrational because I was beyond exhausted from listening to those 8 line tossups only to just wait for the cliff during the last third of the question. My teammates also shared this feeling of disgust, and honestly, I am not sure that I ever want to attend another ACF event (except maybe fall, but that's a big maybe) unless something is done to address this. As far as stats, all I will say is that I can find only 2 tossups that have average buzz points < 50%.

I would love to play if a DII set was made or the tossups were made with better pyramidality, lower difficulty, and stricter length limits. Honestly, the fact that we call this "regular difficulty" and that only very few of the top teams in the country were able to even breach 20 PPB is kind of ridiculous. Further, the average buzz points for anyone who has not been a consistent ACF nats contender or been playing quizbowl for an amount of time somewhat comparable to the amount of time I have been alive serve as strong evidence. If anyone who has neither been to ACF nats nor finished above T-10 at HSNCT nor been to ICT thinks the difficulty of this set was fine, I would like to hear why. This isn't even mentioning how exhausting and debilitating I can imagine this was for moderators, who had to read most of a ~135 word tossup almost every single time. Over the course of a 13 round tournament, that means every moderator ends up reading over 35,000 words (even still, excluding bonuses), which is longer than the entirety of Hamlet, with the added twist of parsing pronunciations of obscure clues that almost everyone neither understands nor has heard of. No wonder there seems to be difficulty getting moderators to travel an extremely long distance and read even longer and more exhaustive questions for nats.
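The word-count estimate above can be reproduced with a few lines of Python. It assumes 20 tossups per round (consistent with the 240 tossups per 12 packets mentioned in the opening post) and that every tossup is read to the end, so it is an upper bound rather than a measured figure.

```python
words_per_tossup = 135   # approximate figure from the post
tossups_per_round = 20   # assumption: one 20-tossup packet per round
rounds = 13

# Upper bound: every tossup read in full, bonuses excluded.
total_words = words_per_tossup * tossups_per_round * rounds
print(total_words)
```

Under those assumptions the total is 35,100 words, matching the "over 35,000" in the post.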

I honestly hope something is done. I don't think monotonically increasing participation implies the set is the appropriate difficulty, but instead I see it as a reflection of the recent growth of quizbowl as a whole.

Good Goblin Housekeeping · Sat Feb 10, 2018 10:11 pm

yo I get that playing long questions tougher than you're used to is hard but please explain what you mean by "better pyramidality"

additionally some people in fact like playing questions that have clues that they haven't heard before. I'm not sure about your team but certainly when I read for teams at the Illinois site nobody seemed to think the clues were "ridiculous" and in fact some people were interested in learning more about them

ErikC · Post by **ErikC** » Sat Feb 10, 2018 10:19 pm

TheInventor wrote: I honestly hope something is done. I don't think monotonically increasing participation implies the set is the appropriate difficulty, but instead I see it as a reflection of the recent growth of quizbowl as a whole.

If you're not practicing at the difficulty of a tournament before playing it, you're going to have a bad time.

Jack · Post by **Jack** » Sat Feb 10, 2018 10:23 pm

Borel hierarchy wrote:yo I get that playing long questions tougher than you're used to is hard but please explain what you mean by "better pyramidality"

I assume he's probably thinking something like "all the clues at the beginning of tossups are equally too difficult, followed by a difficulty cliff, which is not pyramidal to me." That's what I read, at least ¯\_(ツ)_/¯

Sima Guang Hater · Post by **Sima Guang Hater** » Sat Feb 10, 2018 10:23 pm

Given the experiences of Sebastian and other new-to-quizbowl people on ACF regionals, and given the fact that first buzzes seem to be occurring at around 1/3 of the way through the question...I actually see no problem with driving down "regular" difficulty to something just above where EFT was this year.

More specifically I'd be fine with a regular difficulty in which the best teams are hitting 23-24 ppb and there are more buzzes in the first half of the question (maybe something like 30% of buzzes in the first half?). There's sufficient space there to grade the best teams against each other AND ensure that middling and new teams are still engaged and being distinguished from one another. Plus, the top teams still have ACF Nationals to fight it out anyway.

Good Goblin Housekeeping · Sat Feb 10, 2018 10:26 pm

jacke wrote:
Borel hierarchy wrote:yo I get that playing long questions tougher than you're used to is hard but please explain what you mean by "better pyramidality"
I assume he's probably thinking something like "all the clues at the beginning of tossups are equally too difficult, followed by a difficulty cliff, which is not pyramidal to me." That's what I read, at least ¯\_(ツ)_/¯

am i supposed to read the thing about people who have played quizbowl for comparable amounts of time to the time he's been alive to say that he's younger than 10 years old then?

Jack · Post by **Jack** » Sat Feb 10, 2018 10:41 pm

Borel hierarchy wrote:am i supposed to read the thing about people who have played quizbowl for comparable amounts of time to the time he's been alive to say that he's younger than 10 years old then?

I'd wager he's just taking the opportunity to voice his opinion, not to troll or anything. I say good for him since it says this is his only forum post (evidently the mods approved it so they must think he's serious). He clearly has a strong opinion on this and I think it's good that people be made aware of one of the responses to the event.

Borrowing 100,000 Arrows · Sat Feb 10, 2018 11:56 pm

Borel hierarchy wrote:yo I get that playing long questions tougher than you're used to is hard but please explain what you mean by "better pyramidality"

additionally some people in fact like playing questions that have clues that they haven't heard before. I'm not sure about your team but certainly when I read for teams at the Illinois site nobody seemed to think the clues were "ridiculous" and in fact some people were interested in learning more about them

Yeah, this tournament was a little on the difficult side, but it seemed like pretty much a normal regular difficulty tournament. To all the freshmen in this thread, I think it's important to remember that, just like any other competitive collegiate-level activity, college quizbowl isn't easy. When I was a freshman, we got destroyed by Michigan and Chicago at our first tournament. Afterwards, I was pretty demoralized. I felt like JL, Auroni, and Will knew more than I'd ever know. However, instead of quitting or complaining, I studied and got better (though sadly JL, Auroni, and Will still probably know more than I'll ever know).

touchpack · Post by **touchpack** » Sun Feb 11, 2018 12:18 am

Sima Guang Hater wrote:Given the experiences of Sebastian and other new-to-quizbowl people on ACF regionals, and given the fact that first buzzes seem to be occurring at around 1/3 of the way through the question...I actually see no problem with driving down "regular" difficulty to something just above where EFT was this year.

More specifically I'd be fine with a regular difficulty in which the best teams are hitting 23-24 ppb and there are more buzzes in the first half of the question (maybe something like 30% of buzzes in the first half?). There's sufficient space there to grade the best teams against each other AND ensure that middling and new teams are still engaged and being distinguished from one another. Plus, the top teams still have ACF Nationals to fight it out anyway.

...but the best teams already hit 23-25 PPB quite often on regular difficulty!

This year's poor stats on ACF regionals and SCT are particularly anomalous in that the best teams are weaker than the best teams have been in past years, which was compounded by the fact that the best team (Yale) didn't play either set this year. I agree that editors need to be aware of difficulty creep* (see below) and agree that regular difficulty could stand to be slightly easier. (look at the stats I just posted above from Terrapin 2016, which I edited, to see what my vision of regular difficulty is) But I don't think this requires a major philosophical overhaul on what regular should look like--just a little bit of trimming at the edges.

Also, while anecdotes are important and good to hear, the plural of "anecdote" is not "anecdata." (I could tell anecdotes about how I personally did not mind getting my ass kicked by hard questions in 2010-2011, but that wouldn't be relevant/productive to the discussion) I think ACF should consider sending out a survey (either this year, or with next year's regs if logistics prohibit doing it this year) to see what the teams actually think, since generally, the people with the strongest opinions either way are gonna be the ones drafting long forum posts. Before making any radical changes (like reducing regular difficulty to EFT level), I think it's important to see if people actually desire those changes.

*Difficulty creep is a phenomenon where writers/editors tend to assume that because topic X has come up with some frequency, that makes it OK to ask about at lower difficulty levels. When left unchecked, this can produce terrible, terrible questions that subject teams that aren't canon-savvy to an awful playing experience. A classic example of this would be the tossup on Rutherford scattering at ACF Fall 2016. If something is getting too trendy to ask about at a particular difficulty level, the solution is NOT to ask about it at a lower difficulty level, the solution is to stop asking about it until it isn't trendy anymore.

EDIT: I fully endorse Caleb's post.

Wartortullian · Post by **Wartortullian** » Sun Feb 11, 2018 12:25 am

TheInventor wrote:When I went to regs, my team and I, which only consisted of freshmen, did not have a good experience. In virtually every single game we played (with the exception of being destroyed by Penn A), an overwhelming majority of the tossups consisted of us listening to ridiculous clues we have never heard of before and struggling to pay attention until the tossup became a high school tossup, at which point everyone woke up and a massive buzzer race occurred. This continued until the tournament was over, at which point I, for the first time in my career, thought that I wanted to stop doing quizbowl. Of course, that was irrational because I was beyond exhausted from listening to those 8 line tossups only to just wait for the cliff during the last third of the question. My teammates also shared this feeling of disgust, and honestly, I am not sure that I ever want to attend another ACF event (except maybe fall, but that's a big maybe) unless something is done to address this.

No offense, but what did you expect? I'm also on a relatively weak team, but we ended up getting a lot out of the experience. Sure, it was exhausting. Sure, we got our asses kicked by stronger teams. Sure, we had to slog through hard lead-ins. However, that's the whole point of playing above your difficulty level. I agree that this was a relatively hard Regionals, but I walked away from it with a list of new things to read about (including a couple Borges stories I hadn't read--there's no such thing as too much Borges). In my opinion, being confronted with interesting stuff that you've never heard of is the main appeal of quiz bowl.

It sounds like you were overwhelmed by the size of the collegiate canon. I totally understand how demoralizing that can be, but there are plenty of sub-regionals tournaments each year (especially in a place like the mid-Atlantic, where the nearest tournament isn't 6 hours away).

armitage · Post by **armitage** » Sun Feb 11, 2018 12:39 am

I only meant to assert in my post that creating a D2 set seems logistically straightforward, unless the Regionals set poses unique challenges I wouldn't know about. If creating a D2 set would necessitate, say, reworking the Nationals set, or field and qualification system, or something else I'm not thinking of, then I can understand the concern, but otherwise it wouldn't strike me as a very radical change. I guess I'm interested in what the specific limitations are.

edit: And yeah I don't really have any comment on the philosophical questions here, but I felt like it was (probably still is) worth hearing views from people who want a "flagship" midseason set with shorter questions. Also I think a survey with Regionals is an excellent idea and it would be worth hashing out the parameters for one.

Sima Guang Hater · Post by **Sima Guang Hater** » Sun Feb 11, 2018 12:46 am

touchpack wrote: ...but the best teams already hit 23-25 PPB quite often on regular difficulty!

You forgot this one, but UVA 2014 also got 25 ppb on Penn-ance. Your point is taken; I usually think of Penn Bowl as somewhere just above regular, but if that's the case Regionals this year was even harder.

touchpack wrote: This year's poor stats on ACF regionals and SCT are particularly anomalous in that the best teams are weaker than the best teams have been in past years, which was compounded by the fact that the best team (Yale) didn't play either set this year. I agree that editors need to be aware of difficulty creep* (see below) and agree that regular difficulty could stand to be slightly easier. (look at the stats I just posted above from Terrapin 2016, which I edited, to see what my vision of regular difficulty is) But I don't think this requires a major philosophical overhaul on what regular should look like--just a little bit of trimming at the edges.

This is also acceptable; note I said a notch ABOVE EFT this year, where Penn A (me/J^2/Aidan) cleared 25 ppb. That would settle us somewhere around the tournaments in the links you posted.

touchpack wrote:I think ACF should consider sending out a survey (either this year, or with next year's regs if logistics prohibit doing it this year) to see what the teams actually think, since generally, the people with the strongest opinions either way are gonna be the ones drafting long forum posts.

This is a good idea.

Monstruos de Bolsillo · Sun Feb 11, 2018 12:50 am

matt2718 wrote:
No offense, but what did you expect? I'm also on a relatively weak team, but we ended up getting a lot out of the experience. Sure, it was exhausting. Sure, we got our asses kicked by stronger teams. Sure, we had to slog through hard lead-ins. However, that's the whole point of playing above your difficulty level. I agree that this was a relatively hard Regionals, but I walked away from it with a list of new things to read about.

The first thought I had when reading this, and one of the main points from this thread, is that, if this is "above your difficulty level," isn't that right there a problem? Especially if this was a relatively hard Regionals. If Regs is as accessible as people want (or at least as people are making it out to be), then the difficulty level wouldn't really be a problem. This thread has demonstrated that some people think it's a huge problem, while others have brushed it aside. I understand ACF Fall has a purpose, but ACF Regionals, while being a Nats qualifier, still exists to meet many of the same goals, and that includes being accessible to a wide range of teams and skill sets.

touchpack · Post by **touchpack** » Sun Feb 11, 2018 12:51 am

Sima Guang Hater wrote:
You forgot this one, but UVA 2014 also got 25 ppb on Penn-ance. Your point is taken; I usually think of Penn Bowl as somewhere just above regular, but if that's the case Regionals this year was even harder.

Dope! I remembered that 25 PPB performance, but forgot that it was Penn-ance, not Penn Bowl, and thus couldn't find it on the db.

Monstruos de Bolsillo · Sun Feb 11, 2018 1:05 am

touchpack wrote:I think ACF should consider sending out a survey (either this year, or with next year's regs if logistics prohibit doing it this year) to see what the teams actually think, since generally, the people with the strongest opinions either way are gonna be the ones drafting long forum posts. Before making any radical changes (like reducing regular difficulty to EFT level), I think it's important to see if people actually desire those changes.

This is a very good idea, and it will lend itself to the opinions of all of those who played, not just those who frequent these boards, which is undeniably more of the in-crowd who don't see a need for change.

touchpack · Post by **touchpack** » Sun Feb 11, 2018 1:05 am

armitage wrote:I only meant to assert in my post that creating a D2 set seems logistically straightforward, unless the Regionals set poses unique challenges I wouldn't know about. If creating a D2 set would necessitate, say, reworking the Nationals set, or field and qualification system, or something else I'm not thinking of, then I can understand the concern, but otherwise it wouldn't strike me as a very radical change. I guess I'm interested in what the specific limitations are.

edit: And yeah I don't really have any comment on the philosophical questions here, but I felt like it was (probably still is) worth hearing views from people who want a "flagship" midseason set with shorter questions. Also I think a survey with Regionals is an excellent idea and it would be worth hashing out the parameters for one.

Having done D2 work for NAQT, I can assure you that a proper D2 conversion will take longer than you think it will. Every bonus has to be ~half rewritten to match the new difficulty. Many, many tossups need to be thrown out entirely, or rewritten to be on easier answerlines. Even for tossups with easy answerlines, usually cutting a couple early clues is insufficient--new late clues need to be added to avoid difficulty cliffs. It would require a dedicated head editor overseeing the project and a few converters to do properly. It's not completely out of the question, but it'd be a lot of work, and only should be considered if there's a lot of demand for it.

Jack · Post by **Jack** » Sun Feb 11, 2018 1:07 am

Steph Curry-Dwight Howard Isomorphism wrote:When I was a freshman, we got destroyed by Michigan and Chicago at our first tournament. Afterwards, I was pretty demoralized. I felt like JL, Auroni, and Will knew more than I'd ever know. However, instead of quitting or complaining, I studied and got better.

matt2718 wrote: Sure, it was exhausting. Sure, we got our asses kicked by stronger teams. Sure, we had to slog through hard lead-ins. However, that's the whole point of playing above your difficulty level. I agree that this was a relatively hard Regionals, but I walked away from it with a list of new things to read about (including a couple Borges stories I hadn't read--there's no such thing as too much Borges). In my opinion, being confronted with interesting stuff that you've never heard of is the main appeal of quiz bowl.

(sorry about quote formatting, I can't figure out how to fix it)
Fixed. - Mgmt.

I think this plays into what I am trying to suggest. Certainly, it is not uncommon for many teams and players to feel bogged down by the style of questions at ACF Regionals, or even initially dislike them. In your cases, you (very admirably) strove to improve, and did.

My argument is that it doesn't have to be this way. Would it not be better if players of most backgrounds were able to participate in a "Regular" event and enjoy it to a reasonable degree? If there's something that can be changed about ACF Regionals, whether it be difficulty, TU length, DII, etc., and it has a greater benefit to quiz bowl than its downsides, why shouldn't we implement it? If players really are feeling "demoralized" or "exhaust[ed]," that's certainly a bad thing. Frankly, a set can be challenging, intellectually stimulating, and reasonably accommodating without making people feel like giving up or demoralized. I'm not going to claim that everyone feels this way or anything (that would be hard to support), but based on those buzz point stats, I think it's reasonable to assume too many teams are going through that "slog."

Like I said before, there is nothing (I think) terribly wrong with the questions now. By Billy's own admission though, there is disagreement on what constitutes the appropriate difficulty for the set. (For the record, I'd like to say the idea of sending a survey to gauge satisfaction with the set after the tournaments is a fantastic idea. This should totally be a thing people do routinely).

I found that the data this year (though I did use some of it from the wrong source earlier as some pointed out, but I found the actual data still supported my argument regardless) indicated that ACF Regionals isn't the best it could be. Clearly, (unless my interpretation is entirely flawed) there must exist some set of changes that can improve ACF Regionals' enjoyment for many teams without sacrificing the quality that makes it unique.

armitage · Post by **armitage** » Sun Feb 11, 2018 1:15 am

touchpack wrote:new late clues need to be added to avoid difficulty cliffs

Wow, what the heck, I completely didn't think of this part. I see now that it would need at least like half the resources of a full set.

Cheynem · Post by **Cheynem** » Sun Feb 11, 2018 1:17 am

I haven't seen the stats and SCT is still unclear, but how would people who played both tournaments compare the difficulty between ACF Regionals and (DI) SCT, which I think ideally should be around similar difficulty?

Gen. Winfield Scott Hancock · Sun Feb 11, 2018 2:57 am

We had a discussion in the Discord tonight in which we analyzed tossups from a packet (E. Delaware A + Kentucky A + Cambridge A) to try to put together qualitative thoughts about their clues with the buzzpoint histograms. There were a few major conclusions at which we seemed to arrive.

1. Probably a majority of these tossups could likely be converted to D2. Some would require more work than others, to be sure, but if people are willing to put in the effort it doesn't seem out of the question.

2. Some tossup answerlines seemed to be examples of taking a reasonable answerline and stretching it out too far. As an example, the tossup on "valedictions" was probably too much; are we really at the point where a tossup on "A Valediction: Forbidding Mourning" is too stale?

3. The perception that the tossups were too hard is at least somewhat backed up by the fact that the lead-ins seemed in some cases excessive. Examples included the tossup on Mendelssohn, which included 5 score clues in the first several lines that were barely buzzed on, the tossup on Oscar Wilde, which included several lines of criticism, non-fiction, and obscure stage directions before getting to the actual plot points of Wilde's works, and the tossup on the Department of Agriculture, whose difficult first sentences were followed by several "old CE"-type clues that caused many negs. These lead-ins received very few correct buzzes, raising the question of what their purpose is.

This was one of the major questions that the group present determined merits further discussion: what is a lead-in supposed to do? Is it supposed to be buzzable by only one or two people as a measure of who is the very best at the highest levels of various categories? Should they always introduce new clues? Should at least 3-4 people across all sites be able to score a first-line buzz? Are they supposed to serve as context for the later clues? (In the last of these cases, the lead-ins seem to have failed to do so, as several seem to have not been related to the later clues thematically or have just been relative throwaway "person x said thing y about the answerline" clues). I don't know that there's a single right or wrong answer to that question, but it seems to be one about which we should have a discussion.

Another possible solution to the lead-in question is to cut them down entirely and use more middle clues. The estimable Wang Anshi proposed the term "fanning" to describe this, in which more middle-difficulty clues relating to the later clues are used to build the tossup from the bottom up, thereby excising these seemingly-unhelpful lead-in sentences. This sort of building-up structure would also function to reduce difficulty cliffs while still gradating between teams.

We didn't come to any single answer to these questions, because it can vary among different editing teams. Articulation of what lead-ins should do and are expected to be, though, could likely help in changing the perception of ACF Regionals tossups as too difficult, whether that comes through reducing the size of lead-ins or of tossups altogether. It seems like these topics would be worth further discussion, which is why I thought it important to sum up these points and put them out for comment from the wider quizbowl world.

AGoodMan · Post by **AGoodMan** » Sun Feb 11, 2018 3:27 am

Cheynem wrote:I haven't seen the stats and SCT is still unclear, but how would people who played both tournaments compare the difficulty between ACF Regionals and (DI) SCT, which I think ideally should be around similar difficulty?

In my opinion, Regs was a bit harder than SCT D1 in terms of tossup difficulty.

1.82 · Post by **1.82** » Sun Feb 11, 2018 3:35 am

One issue that we sometimes see in quizbowl discourse (and in many other places, really, but this is the quizbowl forum so the context here is quizbowl discourse) is people taking their personal experiences and universalizing them even when that's unjustified or unwarranted. If you're a college freshman and were an active high school quizbowl player, particularly if you played a lot of NAQT questions, then ACF Regionals questions (being college quizbowl questions) may bemuse you, what with their additional length and their clues that you don't already know. You may even find other freshmen who agree with you, because the sort of freshman who posts their opinion of a question set online tends to have been active in the high school quizbowl scene. Unfortunately, that doesn't say anything about whether your opinion is worth assigning weight to.

I played ACF Regionals for the first time three years ago, and it was the third quizbowl tournament that I ever played. Neither I nor my only teammate had ever played any sort of quizbowl at any level prior to that year. We lost a lot of games because we didn't know much, and in the bottom bracket we waited for a lot of questions to go to the end to find a clue that one of us knew. At no point did I complain that the questions were too long, because my idea of what a college quizbowl question should look like wasn't shaped by playing IS sets but rather by playing ACF Fall (which, incidentally, is exactly the "ACF Regionals but easier" tournament that several people have expressed a wish for in this thread). Instead of whining about all the clues I didn't know, I took my newfound exposure to those clues as an opportunity to learn new things, and my quizbowl experience was significantly improved as a result.

People who aren't willing to play questions that are longer than what they were accustomed to in high school are welcome to play SCT every year and nothing else. I'm not sure why the rest of collegiate quizbowl would have to suit their whims by ridding itself of something that people enjoy.

Sima Guang Hater · Post by **Sima Guang Hater** » Sun Feb 11, 2018 3:59 am

Naveed wrote:Tells people not to generalize based on their experience

Also Naveed wrote:Generalizes based on his experience

Not everyone shares your outlook, and not everyone is galvanized into getting better by trudging through a set with 8 lines of unbuzzable (to them) clues. There's a significant number of people who are telling us their concerns, and I think it's totally reasonable to hear them out.

I also find it kind of bizarre to say that we shouldn't assign weight to people who played NAQT in high school and are "used" to short questions; they're a significant portion of the quizbowl-playing population nowadays (given the penetration of NAQT sets throughout the country). Other posters have offered several ways to shorten questions and make them more buzzable throughout without destroying the learning experience or the ability for the set to differentiate teams. There's clearly room to accommodate everyone's preferences here.

Sun Feb 11, 2018 4:06 am

gettysburg11 wrote:We had a discussion in the Discord tonight in which we analyzed tossups from a packet (E. Delaware A + Kentucky A + Cambridge A) to try to put together qualitative thoughts about their clues with the buzzpoint histograms. There were a few major conclusions at which we seemed to arrive.

1. Probably a majority of these tossups could likely be converted to D2. Some would require more work than others, to be sure, but if people are willing to put in the effort it doesn't seem out of the question.

2. Some tossup answerlines seemed to be examples of taking a reasonable answerline and stretching it out too far. As an example, the tossup on "valedictions" was probably too much; are we really at the point where a tossup on "A Valediction: Forbidding Mourning" is too stale?

3. The perception that the tossups were too hard is at least somewhat backed up by the fact that the lead-ins seemed in some cases excessive. Examples included the tossup on Mendelssohn, which included 5 score clues in the first several lines that were barely buzzed on, the tossup on Oscar Wilde, which included several lines of criticism, non-fiction, and obscure stage directions before getting to the actual plot points of Wilde's works, and the tossup on the Department of Agriculture, whose difficult first sentences were followed by several "old CE"-type clues that caused many negs. These lead-ins received very few correct buzzes, raising the question of what their purpose is.

This was one of the major questions that the group present determined merits further discussion: what is a lead-in supposed to do? Is it supposed to be buzzable by only one or two people as a measure of who is the very best at the highest levels of various categories? Should they always introduce new clues? Should at least 3-4 people across all sites be able to score a first-line buzz? Are they supposed to serve as context for the later clues? (In the last of these cases, the lead-ins seem to have failed to do so, as several seem to have not been related to the later clues thematically or have just been relative throwaway "person x said thing y about the answerline" clues). I don't know that there's a single right or wrong answer to that question, but it seems to be one about which we should have a discussion.

Another possible solution to the lead-in question is to cut them down entirely and use more middle clues. The estimable Wang Anshi proposed the term "fanning" to describe this, in which more middle-difficulty clues relating to the later clues are used to build the tossup from the bottom up, thereby excising these seemingly-unhelpful lead-in sentences. This sort of building-up structure would also function to reduce difficulty cliffs while still gradating between teams.

We didn't come to any single answer to these questions, because it can vary among different editing teams. Articulation of what lead-ins should do and are expected to be, though, could likely help in changing the perception of ACF Regionals tossups as too difficult, whether that comes through reducing the size of lead-ins or of tossups altogether. It seems like these topics would be worth further discussion, which is why I thought it important to sum up these points and put them out for comment from the wider quizbowl world.

This is perhaps tangential to the current discussion, but is there any chance we can get a log of this posted somewhere? I'd love to see a series of "close readings" of the Regs packets in line with the advanced stats we have access to (the inner nerd in me wants a podcast but that's prob too much work). It's wishful thinking but I think it could be fun to do if people have the time for it!

Cheynem · Post by **Cheynem** » Sun Feb 11, 2018 10:30 am

I sincerely believe ACF Regionals should be a "tournament for all." It is ACF's flagship regular difficulty tournament and as the phrase "regular difficulty" indicates, this is the kind of set that should be the "mean" (regular). It's one thing for some teams not to do particularly well (I don't think anyone is arguing that the teams that got stomped at Regionals would suddenly win all their games on an easier set). It's another thing if the experience seems muddled in pure frustration. We shouldn't knee jerk accept anyone's anecdote about what needs to be done, but I don't think we should dismiss all the concerns as "simply wanting high school level questions."

I'd like to narrow down the points of frustration, though. Typically, complaints about "too hard questions" mean:

a. the questions were too long, i.e. all of the early clues and lead-ins were unanswerable
b. the answerlines (including bonuses) were too hard and could not be converted
c. some combination of both (the dreaded too long question on too hard thing)

Based on some of the posts, it seems like a lot of the frustration over Regionals was over point a. (and maybe c), more so than point b. Certainly the post Ryan made suggests that at least in that packet, that was the issue. Oscar Wilde and Secretary of Agriculture aren't too hard (I don't know what the giveaway was for Ag Sec), but if you overstack the lead-ins, maybe they become too hard. There is an argument to perhaps cut such clues down. Easier said than done, of course.

Regarding podcasts and discussion, when I was a child in 1950, there was a show called "Author Meets the Critics," in which the author of a book went on TV and met critics, many of whom disliked his book, and they discussed things for 15-30 minutes. I wonder if something like that is repeatable today. I find a lot of collegiate discussions over tournaments on the forums in which the "author" is involved unfortunately become stilted--the author sets specific requirements for discussion and may become unnecessarily defensive, leading to some silly back and forths or a positive feedback loop. The "critics" have grievances regarding a question they didn't get or a neg they thought was unfair and take the discussion down rabbit holes. Some sort of moderated chat or podcast in which the editors or authors of questions as well as a few critics who have prepared thoughts and represent different skills and interests might be fascinating.

Victor Prieto · Post by **Victor Prieto** » Sun Feb 11, 2018 12:59 pm

What is this bizarre infatuation with the educational power of lead-ins? Like, at regular collegiate difficulty, there is plenty of new, interesting information in middle clues and bonuses. Clearly, the middle clues are full of new information because most of the field isn't buzzing on them! From a competitive standpoint, it makes more sense anyway to prioritize the middle clues rather than the lead-ins, unless you're a top 10 team making a run at a championship. Personally, I've never been so hugely intrigued by lead-ins to go look into them more, except for like the one or two that are in my specific field.

As far as I can tell, very few people in this thread are defending 8-line tossups, and several people are defending the difficulty of ACF Regionals (excluding tossups on Middletown and Rebecca Solnit) while suggesting they be shortened to 6-7 lines. I would place myself in this camp.

Cheynem wrote:I sincerely believe ACF Regionals should be a "tournament for all." It is ACF's flagship regular difficulty tournament and as the phrase "regular difficulty" indicates, this is the kind of set that should be the "mean" (regular). It's one thing for some teams not to do particularly well (I don't think anyone is arguing that the teams that got stomped at Regionals would suddenly win all their games on an easier set). It's another thing if the experience seems muddled in pure frustration. We shouldn't knee jerk accept anyone's anecdote about what needs to be done, but I don't think we should dismiss all the concerns as "simply wanting high school level questions."

During practices, I've gotten a lot more resistance to reading ACF Regionals packets from the last couple years versus DI SCT, DII ICT, or a straight up easier set. I think it's because people get frustrated by listening to most of a tossup before being able to buzz in, and I do think that it partly stems from being conditioned by the length of high school questions (but not entirely). I still think future editors of Regionals and other sets should consider mitigating the increase in tossup length from high school to college, to probably 6-7 lines. I think that will be more effective than radically restructuring what the difficulty of Regionals looks like.

Based on some of the posts, it seems like a lot of the frustration over Regionals was over point a. (and maybe c), more so than point b. Certainly the post Ryan made suggests that at least in that packet, that was the issue.

I agree with this point: difficulty is not as much of an issue as having too many hard clues at the beginning of tossups. Here's a question: if you take the mean buzz point for all teams (e.g. Penn A 39%, Johns Hopkins B 57%, Carnegie Mellon C 87%), then remove the top 15 teams, then find the mean of the remaining teams, how far down the question does the mean buzz point move? I picked 15 teams because that seems to me like the echelon to whom regular difficulty should not apply.

Here's another question: between bottom bracket and middle bracket teams, what were the mean buzz points? If, out of 20 tossups, 5 are going dead and the remaining 15 don't get buzzed on until the 95% mark (partway through the giveaway), then it's going to be really frustrating for teams to sit through 8-line tossups. I'm fine with 5 tossups going dead, because lower-tier teams are definitely going to have gaps in their knowledge, but teams will feel frustrated if they get their asses handed to them on the parts of the distribution they do know.

I actually can't access this information on my own because the spreadsheet can't load or just crashes when I try to use it.
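
For anyone who can get the numbers out of the spreadsheet, the first calculation above could be sketched roughly like this. This is only an illustration under stated assumptions: the only data are the three example teams already mentioned, "top" is taken to mean the earliest mean buzz point (not standings), and the result is an unweighted mean of team means. The names `team_buzz_points` and `mean_buzz_point` are made up for the example.

```python
from statistics import mean

# Per-team mean buzz points, as the fraction of the tossup heard before buzzing.
# Only these three example values appear in the thread; a real run would need
# every team's figure from the detailed stats spreadsheet.
team_buzz_points = {
    "Penn A": 0.39,
    "Johns Hopkins B": 0.57,
    "Carnegie Mellon C": 0.87,
}


def mean_buzz_point(points, drop_top=0):
    """Unweighted mean of team buzz points after dropping the earliest buzzers.

    Assumption: the "top" teams are the ones with the lowest (earliest)
    mean buzz point, which may not match the final standings.
    """
    ranked = sorted(points.values())  # earliest-buzzing teams first
    remaining = ranked[drop_top:]
    if not remaining:
        raise ValueError("drop_top removes every team")
    return mean(remaining)


overall = mean_buzz_point(team_buzz_points)
# The post proposes dropping 15 teams; with only three example teams, drop 1.
trimmed = mean_buzz_point(team_buzz_points, drop_top=1)
shift = trimmed - overall  # positive means the mean buzz moves later in the tossup

print(f"all teams: {overall:.0%}, without top team: {trimmed:.0%}")
```

With just these three teams, dropping Penn A moves the mean from 61% to 72%. The same function with `drop_top=15` on the full team list would answer the question as posed, and running it separately on the bottom and middle bracket teams would cover the second one.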

Cheynem · Post by **Cheynem** » Sun Feb 11, 2018 1:04 pm

I'm fine with a 7 line cap for questions at the Regionals level.

How I think of a lead-in at regular difficulty is something that 2/3 of the field wouldn't buzz on, but would hopefully still find interesting and at least provide context. In a 7-line tossup, that's probably one clue (and ideally the best 1/3 of teams are buzzing on it). Obviously that's not perfect, but hopefully these buzz point stats can help us get there (I'm aware that not everyone will share my 1/3 breakdown). I do agree middle clues are very important and that's something we could all do better on.

Jack · Post by **Jack** » Sun Feb 11, 2018 1:11 pm

Eric and Mike are hitting the nail on the head with their posts. I was following the discussion of the packet that was analyzed in the quiz bowl discord, and many of the questions had extremely skewed buzzpoints (and some even had major difficulty cliffs, which are now represented as actual cliffs!). To Mike's three points, the analysis of that one packet showed that some of the answerlines were probably too hard or too cheeky, and some of the answerlines had too many obscure clues loaded in the front before getting to more "common knowledge" to be acceptable for (what people discussing considered) an appropriate difficulty for ACF Regionals.

To Naveed's point, I think your post misses my point: the data we now have strongly implies that many people are experiencing the phenomena that Mike described. Contrary to your assertion, I was not really active in the high school quiz bowl scene. I would hope that my argument based on analysis of data isn't discredited because I've only ever managed 10ppg at a college tournament. That being said, I understand where you're coming from, and certainly an inexperienced player isn't in as good a position to comment on the theory behind solutions to quiz bowl. Rather, I'm trying to focus more on the analysis of the issues at hand.

Multiple people have described their poor experiences at an ACF Regionals, yet some of those same people have said something along the lines of "just git gud" to solve your problems. Certainly, not everyone who sucks their first game is going to eventually become some ACF Nationals all star, even if they improve. I agree with Mike (and I'd hope most people would, too) that a difficulty of "Regular" would mean that it is applicable to all skill levels, but still adequately rewards the best teams. Just because ACF (probably) does not meet that standard, doesn't mean it ought to be that way. Teams that get destroyed should walk away with a feeling of optimism, and with a feeling that they have much to learn. "Whining about clues" is probably warranted when too many (which, granted, is subjective) clues at a "regular" event do not actually get "regularly" used in buzzing. Where the line is to determine better clue distribution is debatable, but I think how it stands now is not the best way.

There's a difference between playing a set of six or six and a half lines and buzzing on the last clue and playing a set of eight/nine lines and buzzing on the last clue. It's easy to look at Sebastian's post and just discredit him as some salty frosh, but his post still echoes the concept of "nobody cared during the first few lines of the tossup" until closer to the end (which we can see happening in the stats). As Eric said, there's probably a way to make ACF Regionals more accessible and buzzable without ruining it for better teams. ACF Fall and ACF Regionals are different, but changing Regionals does not necessarily create unacceptable overlap between the two.

I think finding a way to disseminate that discussion on the QB Discord is essential. Looking at the frequency distributions of buzzpoints for each specific tossup is quite telling of how many of the tossups can be perceived as Mike's points a or b. Thanks to Ryan, also, for taking the time to share some of that analysis.

vinteuil · Post by **vinteuil** » Sun Feb 11, 2018 4:55 pm

Cheynem wrote:I'm fine with a 7 line cap for questions at the Regionals level.

How I think of a lead-in at regular difficulty is something that 2/3 of the field wouldn't buzz on, but would hopefully still find interesting and at least provide context. In a 7-line tossup, that's probably one clue (and ideally the best 1/3 teams are buzzing on it). Obviously that's not perfect, but hopefully these buzz point stats can help us get there (I'm aware that not everyone will share my 1/3 breakdown). I do agree middle clues are very important and that's something we could all do better on.

What's the point in having 7 lines if the first one doesn't differentiate between the top 1/3 of teams?

FWIW, I (as someone who's written many 9-line regular-difficulty tossups) agree that 7 lines is a good place for regular difficulty to go. I try to think about those lines as:

1. Someone with a reasonably high level (varies depending on the target audience!) of expertise on this specific answerline will get it
2. Ditto, for this subdistribution
3. Ditto, for this 1/1 of the distribution
4. Ditto, for this larger category

and onward to differentiate between various levels of general knowledge.

In my experience, this is a heck of a lot more practical than trying to normalize tossup structures to some idea of "how many teams can buzz on this." That's partly because we decide to emphasize certain subdistributions because of what we think is their intrinsic importance, despite knowing that many fewer teams are likely to have high levels of expertise in them.

In essence I agree that clues that don't have an "audience" in mind (a hypothetical or real population of people that the writer is pretty sure will get it) are probably going to be very frustrating to play, particularly if they aren't funny, cool, or memorable in some other way. (Rereading my Rite of Spring TU from CO 2014 from this perspective is...something.) But I don't agree that we should even try to make tossups fit a template in terms of how many teams buzz by each line.

theMoMA · Post by **theMoMA** » Sun Feb 11, 2018 4:58 pm

The graphs that Tejas posted here seem to me to illustrate that this year's Regionals bonuses did a fairly exceptional job at hitting difficulty targets and maintaining cross-category consistency. Easy parts were converted around 85-90% of the time, middle parts were converted around 45-50% of the time, and hard parts were converted 15-20% of the time, regardless of category.
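As a rough check on what those conversion ranges imply for points per bonus, here is a small sketch. It assumes the usual 10 points per bonus part and treats the quoted percentages as field-wide averages; the function name is made up for illustration.

```python
def expected_ppb(easy, middle, hard, points_per_part=10):
    """Expected points per bonus given conversion rates (0.0 to 1.0)
    for the easy, middle, and hard parts."""
    return points_per_part * (easy + middle + hard)

# Low and high ends of the conversion ranges quoted above.
low_end = expected_ppb(easy=0.85, middle=0.45, hard=0.15)
high_end = expected_ppb(easy=0.90, middle=0.50, hard=0.20)

print(f"expected PPB between {low_end:.1f} and {high_end:.1f}")
```

Under those assumptions the field-wide average lands at roughly 14.5 to 16 points per bonus, i.e. about half of the 30 available.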

There is a similar "three-part structure," producing similar results, to tossups. In my view, tossups generally consist of the lead-in/hard clues (typically the first 2 or so lines, consisting of 1-2 sentences), the middle clues (typically the next 2-3 lines, consisting of 2-3 sentences), and the giveaway (typically the last 2-3 lines, consisting of 1-2 sentences). I'll address the purpose of these sections of the tossup from the top down.

The purpose of the lead-in/hard clues is to introduce the player to the topic and present 1-2 clues that really test the players' knowledge bases (and are ideally highly interesting). The lead-in/hard clues should clearly state what kind of answer is being asked for with non-misleading words pointing to the answer, such as "this country" or "this process" or "this author" (although these are often called "pronouns," that's not actually what they are in a strict sense, and you shouldn't use actual pronouns such as "he," "she," or "it" until the question has clearly stated what kind of answer is being asked for with terms like "this action" or "this battle").

The lead-in/hard clues serve a slightly different function than the hard parts of bonuses. Whereas a bonus part is pointless to write if literally no team stands a chance to get it, the lead-in/hard clues do play a major role in the tossup, even if no one buzzes on them. These clues set the table for the tossup: they introduce the player to the topic and give a knowledgeable player a chance to buzz on difficult clues, but even if no player knows the clues, the savvier players can still separate themselves from the less savvy by contextualizing the lead-in to narrow down the possible range of answers, leading to a confident buzz later in the question. It's also important to remember another key distinction between lead-ins/hard clues and hard parts of bonuses, namely that the penalty for guessing at a bonus part is nothing, while the penalty for guessing on a lead-in/hard clue is severe. I think it stands to reason that lots more early buzzes would occur if players got a free guess on the first two lines, and that, given quizbowl players' well-honed skills at contextualizing information, many of those buzzes would be successful, much as many good teams' guesses at hard parts succeed.

The middle clues are where the rubber meets the road, and the good teams typically compete with each other or pull away from the less-talented teams. Whereas the purpose of the lead-ins/hard clues is to introduce the topic with a clear reference to the answer and present very difficult information, the middle clues are where you start seeing information that is solidly "canonical." By "canonical," I mean that this information has likely come up before, or in a non-packet-studying sense, is something that is important enough to the understanding of the topic at hand that, if you expect that topic to come up, you should expect to be tested on this information. Players with mastery over the topic at hand separate themselves from generalists in the middle clues by putting to use their superior knowledge bases in conjunction with the context provided by the lead-ins/hard clues to generate early-middle-clue buzzes, while the generalists lurk, waiting for a question to fall through the cracks so they can put their broad knowledge bases and contextual knowledge skills into practice in the later-middle clues or the beginning of the giveaway.

Like middle parts, the middle clues are designed to allow passage to a player/team with sufficiently deep knowledge, while blocking the way of any player/team with a shallower understanding of the topic. And like middle parts, middle clues are where conversion rises from a trickle to a steady flow. If the conversion doesn't quite reach 50% in the middle clues, recall again that tossups incentivize patience while bonuses incentivize guessing.

The giveaway, in my view, doesn't just consist of the final line containing "for 10 points," because the purpose of the giveaway is to descend gradually to the easiest clue (although not always the easiest possible clue, depending on the difficulty level of the tournament); instead, I think of the giveaway as consisting of the last twoish lines, typically composed of two sentences. The penultimate sentence will sometimes present a second-easiest clue that is unrelated to the easiest clue ("This author argued that a Jewish prophet was actually born an Egyptian follower of Akhenaten in Moses and Monotheism. For 10 points, name this Austrian psychoanalyst who wrote The Interpretation of Dreams."), or sometimes the second-easiest clue will consist of a description of the easiest clue ("This author included the episode "Irma's Injection" in an 1899 book arguing that the title phenomena are forms of 'wish fulfillment.' For 10 points, name this Austrian psychoanalyst who wrote The Interpretation of Dreams."). Sometimes, the sentences will be combined so as to straddle the FTP ("An episode called "Irma's Injection" was described by, for 10 points, what psychoanalyst, who called the title phenomena a form of 'wish fulfillment' in his book The Interpretation of Dreams?").

Regardless of the form that the giveaway takes, I think it's analogous to the easy part of a bonus, and that somewhere in these lines, conversion is expected to reach the 85-90% mark, on average. Thus, teams that fall in the bottom part of the skill distribution will largely be battling over these clues, much as those teams' bonus conversion is largely a function of their converting the easy parts.
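To compare a tossup against those targets, one could compute its cumulative conversion at a few points in the question. The sketch below assumes buzz locations are stored as fractions of the way through the question for correct buzzes only; the function name and the sample data are hypothetical and not taken from the spreadsheets.

```python
def cumulative_conversion(correct_buzz_positions, times_heard, checkpoints):
    """Fraction of rooms in which the tossup had been answered correctly
    by each checkpoint (positions are fractions from 0.0 to 1.0)."""
    if times_heard <= 0:
        raise ValueError("times_heard must be positive")
    return [
        sum(1 for pos in correct_buzz_positions if pos <= checkpoint) / times_heard
        for checkpoint in checkpoints
    ]

# Hypothetical tossup heard in 10 rooms: 9 correct buzzes, 1 room where it went dead.
buzzes = [0.25, 0.5, 0.5, 0.75, 0.75, 0.75, 1.0, 1.0, 1.0]
checkpoints = [0.25, 0.5, 0.75, 1.0]  # e.g. end of lead-in, mid-question, start of giveaway, end

curve = cumulative_conversion(buzzes, times_heard=10, checkpoints=checkpoints)
for checkpoint, converted in zip(checkpoints, curve):
    print(f"by {checkpoint:.0%} of the question: {converted:.0%} converted")
```

In this made-up example conversion is a trickle through the lead-in, climbs through the middle clues, and reaches 90% only at the end of the giveaway, which is the shape the three-part structure predicts.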

Here's how the three-part structure of tossups maps to the three-part structure of bonuses. An easy part to a bonus is designed to test a very basic fact about the topic or theme at hand to assess whether a team has any knowledge of the topic at all. The giveaway to a tossup does the same, except that it should be as pyramidally shaped as possible, given that bonus parts are all or nothing, while tossups are often still contested right down to the end.

A middle part to a bonus is designed to test a more difficult fact about the topic or theme at hand to assess whether a team has good "canonical" coverage of that topic. There is a wide range of possible middle parts to any topic or theme, and the main difference in the skill of bonus conversion is how many possible middle parts the team is prepared to answer across the spectrum of topics and categories. The middle clues serve a similar function--they are the place in the tossup where competitive teams are battling, and they are composed of canonical information about a topic--but with a few key differences: even if no one buzzes on the middle clues, they still provide additional context to inform a later buzz; there are usually 2-3 middle clues, arranged in descending order of difficulty, on which the competitive teams typically are competing; and middle clues are less likely to be converted than middle parts of bonuses because tossups incentivize risk aversion while bonuses incentivize guessing.

Finally, the hard part to a bonus is designed to stretch a team's knowledge base and see whether they have truly mastered the possible answer space for a particular theme or topic while introducing the teams to interesting clues and connections. There are so many possible hard parts for any given topic or theme that no team can be guaranteed to convert the hard part, so converting them requires a lot of skill and a bit of luck. The lead-ins/hard clues of tossups operate similarly--they are designed to stretch a team's knowledge and introduce interesting clues and connections. Like with middle clues above, there are some slight differences: lead-ins don't need to be converted to serve their function, and in fact are less likely to be converted than hard parts because teams have a strong incentive to be risk averse when buzzing on a tossup.

The effect of making lead-ins more buzzable would be similar to the effect of removing hard parts from the game. It would flatten the playing field, reduce the advantages of the most knowledgeable players, and cut down on the space for introducing new and interesting material. I think it would also be very difficult to do this in a way that didn't frequently err on the too-easy side; I see the lead-ins/hard clues as a courtesy to the most knowledgeable players, allowing them to get one or two pieces of information that the writer/editor judges to be quite difficult before the beginning of the material that's come up before and/or that players are effectively "on notice" that they should know. If you start with that material right away, you've got a tournament like ACF Fall, which is both quantitatively and qualitatively different to play. A tournament like Fall still tends to result in the best team winning, but the marginal space in the tossups and bonuses that separate the best from the rest is much smaller, and thus, the resolving power of those questions is lower.

Let me clarify that I'm not arguing for extremely long tossups. NAQT generally manages to keep the three-part structure of tossups intact despite being significantly shorter than circuit-style questions. But I think it's important to recognize the key role that lead-ins play in making the tournament a fair test for the teams in contention, even if the lead-ins aren't resulting in buzzes.

Cheynem · Post by **Cheynem** » Sun Feb 11, 2018 5:04 pm

1/3 was probably too high on my part, but I think even if 1/4 (let's say) of the best teams are buzzing on the first clues, that doesn't mean you can't gradate between the top teams. Not all of the top teams are buzzing on that line, the first/second line can have a few different clues, etc. I think your breakdown of lines is fine, so I'm probably just not expressing myself well.

Muriel Axon · Post by **Muriel Axon** » Sun Feb 11, 2018 5:22 pm

I think we all agree that a lead-in can have multiple functions: (1) Including clues that players very knowledgeable about the answer line can plausibly buzz on; (2) giving context for later clues and ruling out particular classes of answers; and (3) introducing knowledge that fascinates listeners and makes them want to learn more. (This list is not meant to be exhaustive; perhaps just being amusing could be a virtue in some contexts, just not at the expense of these more important characteristics.)

I think we agree that (1) and (2) are the most important functions of a lead-in, but the plausibility criterion of (1) is hard to operationalize, which makes it tempting to push the limits. Without commenting on whether the lead-ins in this set are too hard, or why, I think there are a few reasons why even skilled writers may make lead-ins too hard. For example, I have the habit -- which I try to moderate -- of building tossups around particular clues that I find exciting, even if those clues may be too hard for the field.

Of course, one may also systematically overestimate (or just not think about) the number of people who have had the specific kind of experience one needs to get a certain clue. A particularly hard tossup on Ebola in one iteration of MUT arose from my (in retrospect, deluded) notion that, since two or three high-profile papers on Ebola had come out in Science recently, there would be a good chance that a few players may have seen and read those papers. I've seen lead-ins to science tossups at regular difficulty that refer to papers with 50 citations and offer very little other helpful context. If there are only 200 people in the world who have read a particular paper, chances are that none of them are quiz bowl players!* Recently, I've tried to keep in mind that -- for example -- when writing ecology questions, a lead-in should still be something that an ABD grad student in ecology could very likely get -- and so on, for other fields.

To me, it seems like the main solution is for writers to try consciously to empathize with players, and to temper their desire to stack clues that are quite hard just because the writer finds them cool. (Temper rather than eliminate, since I think a lot of writers get joy out of introducing fresh and interesting clues.) But it's a hard problem, because the distinction between a clue that 2-4 players can buzz on and a clue that nobody can buzz on, while often substantial, is hard to discern, which makes it easy to trick oneself into thinking that a clue that one particularly likes is among the former.

*One exception that was delightful to me, and perhaps only me: The lead-in to an ACF Nationals TU on "tar pits" referred to a paper by "Gerhart et al." whose senior author gave a guest lecture on her tar pit research to the plant ecophysiology class I was TAing.

Auroni · Post by **Auroni** » Sun Feb 11, 2018 7:11 pm

I may post a more detailed self-reflection as head editor but for now, I wanted to quickly address this:

2. Some tossup answerlines seemed to be examples of taking a reasonable answerline and stretching it out too far. As an example, the tossup on "valedictions" was probably too much; are we really at the point where a tossup on "A Valediction: Forbidding Mourning" is too stale?

This question (whose conceit was unexpectedly matched by a TAMU submission) was written because "A Valediction: Forbidding Mourning" is a little too short to write a tossup on with just lines from the poem, and because the other two poems mentioned early in the questions appear often in anthologies but have almost never come up in past questions.

Gen. Winfield Scott Hancock · Post by **Gen. Winfield Scott Hancock** » Sun Feb 11, 2018 7:17 pm

Robert Williams Avenger wrote: This is perhaps tangential to the current discussion, but is there any chance we can get a log of this posted somewhere? I'd love to see a series of "close readings" of the Regs packets in line with the advanced stats we have access to (the inner nerd in me wants a podcast but that's prob too much work). It's wishful thinking but I think it could be fun to do if people have the time for it!

Here it is, apologies in advance as it's rather long. There are also some messages that will appear as blank; those cases were pastes of images, either the cumulative buzz diagrams from me or exact buzzpoint examples from Ryan Rosenberg (with a couple of others from Ophir, e.g.).

https://pastebin.com/MGmsj6N2

Fado Alexandrino · Post by **Fado Alexandrino** » Sun Feb 11, 2018 7:22 pm

I generally agree with Naveed, Mike and Billy, and as a player who only really started to play lots of quizbowl seriously in University, thought that "regular difficulty" was just what quizbowl is like. That being said, this Regionals set could have been easier in ways already mentioned above in terms of answerline. Generally, I think a rule of thumb of removing the first line and adding another middle or pre-FTP clue would have engaged the teams in the lower percentiles. The median buzzpoint for good teams is around 65% right now, and I think this could be reduced to at most 50%, where the top 5-10 teams or whatever "power" half the tossups they get.

A calculation I really liked was Samir Khan's simulations of rematches of games at the UIUC site. Another related simulation I would find interesting is a clue-by-clue simulation of how a tossup would play, both at the 2018 regionals level and at different proposed easier levels. I've never played a tournament easier than regular on full McGill A since DII ICT in 2014, so I'm not sure how winning probabilities differ at different difficulties. Intuitively I feel that a problem with easier questions is that the advantage a good team has over a weak team is most prominent in the first several clues of a tossup.

Cheynem wrote:I sincerely believe ACF Regionals should be a "tournament for all." It is ACF's flagship regular difficulty tournament and as the phrase "regular difficulty" indicates, this is the kind of set that should be the "mean" (regular).

I think a lot of the discourse on this topic, or at least my opinions on it, needs to be based on the goal that ACF has for its regionals set. I personally can get behind this idea, but I can also get behind the idea that the current definition of regular difficulty is too hard: all regular season tournaments should be MUT/EFT level, but regionals is a difficulty level that aspiring teams can strive for if they want to qualify for nationals.

ErikC · Post by **ErikC** » Mon Feb 12, 2018 11:45 am

One function of a lead-in that no one has mentioned yet is that it introduces material that could be tossed up at a harder tournament. Someone who plays ACF Fall might look up a bunch of interesting things they heard at the beginning of questions (who is this Mansa Musa guy and why did he have so much money? why do so many people know about this Achebe guy?). When ACF has three consecutive tournaments with a notable increase in difficulty, playing each one introduces new material that can be learned for the next tournaments, and the lead-ins act like a guide for getting better at the next level of difficulty.

ryanrosenberg · Post by **ryanrosenberg** » Tue Feb 13, 2018 5:43 pm

I put together graphs of buzzpoint distribution by subcategory. Overall, the categories with the highest distribution of early buzzes were trash, geography, and current events, and the categories with the lowest distribution of early buzzes were philosophy, physics, and early Continental/Near Eastern history.

I also split the distributions into parts by team PPB percentiles, so you can see, for instance, that music is relatively harder for bad teams than for good teams.