Introducing BPA, a new evaluation metric using detailed stats

Post by ryanrosenberg » Wed Oct 17, 2018 3:52 pm

What is BPA?

BPA stands for Buzz Point AUC (area under the curve). It is the total area under the curve of [% of tossups gotten successfully] against [% of question elapsed].

The theoretical maximum is 100 (i.e., every tossup answered near-instantly). In practice, top players will generally score somewhere between 10 and 15 at regular difficulty, which corresponds to preventing about 15% of the total words in the tournament's tossups from being read by buzzing correctly; top teams will generally score around 20-25. As an illustration, below is Chris Ray's buzz point graph from 2018 ACF Regionals.

[Image: Chris Ray's buzz point graph from 2018 ACF Regionals]

BPA can be calculated for any tournament that records buzz points.
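Restated in symbols (one way to write it, using the 0.01-step binning from the calculation section below):

\[
\mathrm{BPA} = \sum_{t \in \{0.01,\, 0.02,\, \ldots,\, 1.00\}} \frac{\#\{\text{tossups converted at or before } t\}}{\#\{\text{tossups heard}\}}
\]

Each of the 100 terms is at most 1, which is where the theoretical maximum of 100 comes from.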

How do I calculate BPA?

BPA is actually pretty easy to calculate, especially for an individual player. The screenshot below shows an example of calculating the conversion percentage at each buzz point ("max gets" is the number of tossups the player could have gotten, i.e., games played times 20), and BPA is simply the sum of column F over all buzz points from 0.01 to 1.

[Image: spreadsheet computing the conversion percentage at each buzz point; BPA is the sum of column F]
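Here's a minimal code sketch of the same calculation, assuming you have each correct buzz recorded as a fraction of the question elapsed; the function and variable names are illustrative, not from the spreadsheet.

Code:

# Sketch of BPA for one player: sum the conversion fraction at each
# buzz point 0.01, 0.02, ..., 1.00, matching the spreadsheet above.
def bpa(correct_buzz_points, tossups_heard, bins=100):
    total = 0.0
    for step in range(1, bins + 1):
        threshold = step / bins
        # fraction of tossups heard that were converted at or before this point
        converted = sum(1 for b in correct_buzz_points if b <= threshold)
        total += converted / tossups_heard
    return total * (100 / bins)  # scales the theoretical maximum to 100

# Example: 2 games (40 tossups heard) with eight correct buzzes
print(round(bpa([0.25, 0.40, 0.55, 0.55, 0.70, 0.90, 0.95, 1.00], 40), 2))  # 6.95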

What are the advantages of BPA over other quizbowl stats?

BPA is the first metric to take advantage of buzz point tracking, providing a more detailed view of how early players are getting questions. This reveals player skill that may be masked by traditional stats.

For example, let's look at the top two scorers from the Minnesota site of CMST: Shan Kothari and Auroni Gupta. Shan outscored Auroni by about a tossup per game and recorded seven powers to Auroni's four. However, Auroni had a 6.9 BPA while Shan came in at 6.47, since Auroni buzzed earlier on a higher percentage of tossups, particularly in the late-middle clues, before Shan overtook him during giveaways. BPA ranking Auroni over Shan is in line with subjective appraisals of the two players (the player poll had Auroni as a top-5 player in grad school, and Shan in the 10-15 range), but neither of the traditional stats (PPG and powers) captures this difference in skill.

[Image: buzz point graphs for Shan Kothari and Auroni Gupta at the Minnesota site of CMST]

What are BPA's shortcomings?

BPA is still, like PPG, a heavily context-dependent stat, and is not directly comparable across fields of different strength (or even across different schedules in the same field). Teammate effects are also fairly strong; BPA does not incorporate the PATH adjustment for shadow effects, since I believe that adjustment introduces more false positives than the false negatives it corrects.

Who does BPA say is good at quizbowl?

The top 10 players at CMST were Jordan Brownstein (18.04), Jacob Reed (11.36), Stephen Liu (10.51), Neil Gurram (10.21), Eric Mukherjee (9.05), John Lawrence (8.37), Rafael Krichevsky (7.99), Matt Bollinger (7.95), Will Alston (7.36), and Auroni Gupta (6.9).
The top 5 teams were Brownstein et al. (23.86), Yale (23.13), BHSU A (20.45), Bloq Mayus (18.95), and Chicago A (18.13).

The top 10 players at 2018 Regionals were Eric Mukherjee (17.15), Jakob Myers (15.68), Aseem Keyal (14.33), Evan Lynch (12.89), Rafael Krichevsky (12.84), Eric Wolfsberg (12.72), Adam Silverman (12.56), Chris Ray (12.18), John Lawrence (11.82), and Derek So (11.64).
The top 5 teams were Penn A (28.32), Berkeley A (27.6), Chicago A (25.35), Columbia A (25.05), and Maryland A (25.03).

There's also category-specific BPA! Here are overall and category-specific rankings for 2018 Regionals and CMST.
Ryan Rosenberg
North Carolina '16 | Ardsley '12
PACE | ACF

Post by vinteuil » Wed Oct 17, 2018 4:09 pm

I think this might be the most precise (and intuitively useful) non-PATH-like stat we've ever had—thanks to Ryan for the computations and visualizations!
Jacob Reed
Chicago ~'25
Yale '17, '19
East Chapel Hill '13
"...distant bayings from...the musicological mafia"―Denis Stevens

Post by Periplus of the Erythraean Sea » Wed Oct 17, 2018 4:17 pm

Auroni Gupta (6.9)
nice
Will Alston
Bethesda Chevy Chase HS '12, Dartmouth '16, Columbia Business School '21
NAQT Writer and Subject Editor

Post by t-bar » Wed Oct 17, 2018 6:47 pm

This is awesome! Thanks for putting the work into coming up with this.

This is also an interesting statistic to look at on a game-by-game basis, though you have to take the results with a grain of salt. Here are the top 10 games from 2018 ACF Regionals by total BPA:

Code:

Winner		Loser		Score		Winner BPA	Loser BPA	Total BPA
Berkeley A	UC San Diego B	500-80		37.915		6.7		44.615
Cambridge B	Oxford B	315-290		24.515		19.83		44.345
Penn A		Villanova	490-50		37.735		6.595		44.33
Penn A		Johns Hopkins A	375-240		31.585		12.31		43.895
Northwestern A	MSU A		320-285		15.855		26.835		42.69
Columbia A	Amherst		355-200		26.67		15.785		42.455
McGill A	McGill B	315-175		26.845		15.42		42.265
Penn A		Delaware	490-115		31.745		10.04		41.785
Northwestern A	Ohio State A	385-215		22.425		19.155		41.58
Columbia A	Harvard A	375-170		26.98		14.565		41.545
Note that in the fifth game, Northwestern A beat MSU A despite a significantly lower BPA. This is partly because Northwestern waited until the end of the question on all three of MSU's negs while not negging at all themselves. However, even on the 7 live tossups they converted, Northwestern had an average buzz location of 0.547, substantially later than MSU's average of 0.463.

Here are the five games with the closest margin of BPA, selected from among games with a total BPA of at least 30:

Code:

Winner		Loser		Score		Winner BPA	Loser BPA	Total BPA
Ohio State A	Chicago B	325-240		15.1		15.26		30.36
Harvard A	Yale A		310-245		15.9		14.855		30.755
McGill A	Toronto A	270-240		14.95		16.84		31.79
MSU A		Chicago A	310-260		17.33		19.585		36.915
Berkeley B	Stanford	305-230		15.215		17.56		32.775
In all but one of these games, the winner had the lower BPA. However, only some of them can be chalked up to a negstorm by the losing team. For example, in the McGill-Toronto game, McGill went 9/4 to Toronto's 10/2 and won on the strength of their bonus conversion.

Interesting questions for future BPA analysis: what fraction of games are won by the team with the lower BPA? In these situations, can we discriminate between occurrences of (a) one team waiting to the end on a bunch of negs, (b) one team out-bonusing the other, (c) one team having a large advantage in certain categories and being able to sit on those questions, (d) something else? Perhaps it's fruitful to only consider tossups that were not negged, in order to restrict the analysis to situations in which both teams are playing each tossup live. This requires a bit more careful work to determine the number of tossups heard, but it's certainly possible with the data we have.
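As a starting point, here's a minimal sketch of the first question, using the five games tabulated above as stand-in data (in practice you'd feed in every game from the tournament):

Code:

# What fraction of games were won by the team with the lower BPA?
# Tuples are (winner, loser, winner_bpa, loser_bpa) from the table above.
games = [
    ("Ohio State A", "Chicago B", 15.1, 15.26),
    ("Harvard A", "Yale A", 15.9, 14.855),
    ("McGill A", "Toronto A", 14.95, 16.84),
    ("MSU A", "Chicago A", 17.33, 19.585),
    ("Berkeley B", "Stanford", 15.215, 17.56),
]
upsets = sum(1 for _, _, w, l in games if w < l)
print(f"{upsets}/{len(games)} games won by the lower-BPA team")  # 4/5 here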
Stephen Eltinge
TJHSST 2011 | MIT 2015 | Yale 20??
ACF member | PACE member | NAQT writer

Post by AGoodMan » Wed Oct 17, 2018 11:58 pm

This is super cool! Is there any chance we can see similar metrics for EFT?
Jon Suh
Wheaton Warrenville South High School '16
Harvard '20 (Co-President)
PACE

Post by ryanrosenberg » Thu Oct 18, 2018 8:07 am

AGoodMan wrote:
Wed Oct 17, 2018 11:58 pm
This is super cool! Is there any chance we can see similar metrics for EFT?
Yes, I'll post EFT BPA later today.
Ryan Rosenberg
North Carolina '16 | Ardsley '12
PACE | ACF

Post by ryanrosenberg » Mon Oct 22, 2018 10:02 pm

Here's a public link to code used to generate overall BPA for last year's Regionals.
Ryan Rosenberg
North Carolina '16 | Ardsley '12
PACE | ACF

Post by ProfessorIanDuncan » Wed Oct 24, 2018 1:40 pm

Does this metric factor in negs? Would that be a useful feature? It seems that subtracting, for each neg, the distance from the neg point to the earlier of the end of the question and the correct-answer buzz point could shed some insight on how negs affect how much of the tournament is heard. I suppose this would fail to take into account teams waiting until the end of the question to convert, so maybe it's not that useful an addition.
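Concretely, the penalty for a single neg might look something like this sketch (fractional buzz points as in BPA; the names and any scaling into BPA units are illustrative, not a worked-out proposal):

Code:

# Penalty for one neg: the stretch of the question read after the neg,
# i.e., min(question end, correct-answer buzz point) minus the neg point.
def neg_penalty(neg_point, opponent_buzz_point=None):
    end = 1.0 if opponent_buzz_point is None else min(1.0, opponent_buzz_point)
    return max(0.0, end - neg_point)

# A neg at 30% of the question, converted by the other team at 80%:
print(round(neg_penalty(0.30, 0.80), 2))  # 0.5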
Alec Vulfson
Irvington High School '13
