A Close Look into Dwight Wynne's PANTS

cvdwightw
Auron
Posts: 3291
Joined: Tue May 13, 2003 12:46 am
Location: Southern CA

A Close Look into Dwight Wynne's PANTS

Post by cvdwightw »

PANTS, or Points Against Normalized Team Statistic, is a relatively easy-to-calculate method for comparing teams playing on the same packet set at multiple sites.

Steps in PANTS:

1. Collect the number of powers (if applicable), 10-point tossups, and negs for each team. Also collect the number of games played or tossups heard for each team and each team's bonus conversion.

2. Obtain a "strength of schedule (SOS) factor." This can include:
  • Average Tossup Points per Tossup Heard (TPTH) of a site, divided by the average TPTH of all teams (or equivalently tossups per game, but I'm generalizing here).
  • Average Bonus Conversion of a site, divided by the average BC of all teams.
  • An average of opponents' TPTH (calculated either including or excluding games played against that team), weighted by the number of times a team plays each opponent, divided by the average opponents' TPTH of all teams.
  • Weighted average of opponents' Powers per Tossup Heard, Tens per Tossup Heard, and Negs per Tossup Heard as calculated above (3 different SOS factors)
  • Some other statistic that you like using that accurately categorizes how strong each site's field is.
3. Multiply each team's powers, tens, and negs by (strength of schedule factor/games played). This assumes that a team's knowledge level (as measured by power/ten ratio) and aggression (as measured by negs) are independent of opponent. Alternatively, if you have different factors for powers/tens/negs, multiply the relevant statistic by the relevant strength of schedule factor, then divide by games played. For timed NAQT tournaments, multiply by 20/TUH instead of dividing by games played.

4. Compute the Points Per Game (or PP20H) against an average "normalized" team using the formula:

Adj_Powers*15+Adj_Tens*10+Adj_Negs*(-5)+(Adj_Powers+Adj_Tens)*BC

This (the "PANTS") is a measure of how many points a team would be expected to score in a game against a totally average team in the field.
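As a rough illustration of steps 3 and 4, here is a minimal Python sketch. The function and argument names (pants, games_from_tuh, sos, sos_power, etc.) are hypothetical, BC is assumed to mean points per bonus heard, and the SOS factor is assumed to have already been computed by whichever step-2 method you prefer.

```python
from typing import Optional


def games_from_tuh(tuh: float) -> float:
    """Timed NAQT events: dividing by TUH/20 is the same as multiplying by 20/TUH."""
    return tuh / 20.0


def pants(powers: float, tens: float, negs: float, games: float,
          bonus_conv: float, sos: float = 1.0,
          sos_power: Optional[float] = None,
          sos_ten: Optional[float] = None,
          sos_neg: Optional[float] = None) -> float:
    """Expected points per game against an average 'normalized' team.

    bonus_conv is points per bonus heard. Separate power/ten/neg factors,
    if given, override the single `sos` factor for that statistic.
    """
    f_power = sos if sos_power is None else sos_power
    f_ten = sos if sos_ten is None else sos_ten
    f_neg = sos if sos_neg is None else sos_neg
    # Step 3: multiply each count by its SOS factor, then divide by games played.
    adj_powers = powers * f_power / games
    adj_tens = tens * f_ten / games
    adj_negs = negs * f_neg / games
    # Step 4: tossup points, plus one bonus per converted tossup at rate BC.
    return (adj_powers * 15 + adj_tens * 10 + adj_negs * (-5)
            + (adj_powers + adj_tens) * bonus_conv)


# Hypothetical team: 10 powers, 40 tens, 8 negs in 5 games, 15 PPB, SOS factor 1.2
print(round(pants(10, 40, 8, 5, 15.0, sos=1.2), 1))  # 302.4
```

The same team at a perfectly average site (SOS factor 1.0) would come out at 252 points per game, so the factor scales the whole result linearly when a single factor is used.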

ADVANTAGES OF PANTS:
  • Clearly and unambiguously ranks teams across different sites; furthermore, gives a reasonable assessment of each team's strength expressed in units that make intuitive sense (points per game)
  • Takes into account the exact way in which tossups and bonuses affect a team's score
  • Works with any number of sites/teams and on any non-terrible format (may work for terrible formats too)
  • Depending on what you choose for your strength-of-schedule factor, can be quickly and easily computed with an Excel spreadsheet
SUBJECTIVE EVALUATION OF SOME FORM OF PANTS AS AN S-VALUE:
  • PANTS was developed while thinking about S-values and first computed with 2009 SCT D1 statistics, but it is stressed that PANTS is not intended as an S-value for this or future years. Several critical flaws preclude PANTS as it is proposed; however, it may be possible to modify PANTS to lessen or circumvent these flaws.
  • In PANTS, it is always better to answer a question than to not do so. I have not yet looked into forfeits, but one simple idea (stolen from previous ramblings in the S-value thread) is that teams that do not show up receive 0 points in 20 tossups (and do not count that opponent in any weighted SOS factors).
  • It is as-yet unknown to what degree intentional gaming of the system to drop to a lower playoff bracket (and thus artificially inflate adjusted tossups) affects the system. It is hypothesized that the SOS factor, if chosen correctly, can account for a team playing in a much weaker field, but does not completely account for a team playing in a lower bracket within the same field.
  • It is hypothesized that the average conversion ratio (D2 conversion/D1 conversion) on common tossups/bonuses should yield a measure of how the "D2 average opponent" would compare to the "D1 average opponent." Therefore multiplying the D2 SOS factor by the conversion ratio for tossups and the D2 BC by the conversion ratio for bonuses should adequately scale D2 vs D1 fields/packet sets. Similarly, a conversion ratio should yield a measure of CCCT vs D2 "average performance" on tossups; thus, multiplying the CCCT SOS factor by the conversion ratio for tossups should allow for easy insertion of CC schools into the D2 ICT list (alternatively, one could recalculate everything with the CCCT as an additional site; it is unknown how this will affect PANTS).
  • It is unknown whether PANTS "almost always" invites a higher-finishing team over a lower-finishing team with better statistics. Plotting each team's within-sectional reported finish against each team's within-sectional PANTS ranking yields an R^2 of 0.9739, but it is unknown whether this holds for other years/divisions or how the addition of D1 teams playing on D2 sets will affect it.
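The D2/D1 conversion-ratio idea above is only a hypothesis, but a sketch may make it concrete. This assumes the ratio is the mean D2 conversion divided by the mean D1 conversion over the common questions (one of several reasonable readings of "average conversion ratio"); the function names and example numbers are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def conversion_ratio(d2_rates: Sequence[float], d1_rates: Sequence[float]) -> float:
    """Assumed definition: mean D2 conversion / mean D1 conversion on common questions."""
    if len(d2_rates) != len(d1_rates) or len(d1_rates) == 0:
        raise ValueError("need matching, non-empty lists of common-question rates")
    return mean(d2_rates) / mean(d1_rates)


def scale_d2_site(d2_sos: float, d2_bc: float,
                  tossup_ratio: float, bonus_ratio: float) -> Tuple[float, float]:
    """Put a D2 site on the D1 scale: SOS factor x tossup ratio, BC x bonus ratio."""
    return d2_sos * tossup_ratio, d2_bc * bonus_ratio


# Hypothetical conversion data on three common tossups and two common bonuses
tu_ratio = conversion_ratio([0.6, 0.5, 0.4], [0.8, 0.7, 0.6])  # fraction of rooms converting
bonus_ratio = conversion_ratio([12.0, 10.0], [16.0, 14.0])     # points per bonus
print(scale_d2_site(1.1, 14.0, tu_ratio, bonus_ratio))
```

Using a ratio of averages rather than an average of per-question ratios avoids dividing by zero on a tossup that no D1 room converted; whether it is the better choice is an open question.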
FUTURE WORK ON PANTS
  • A comparison of different SOS factors may be necessary.
  • I have "proof-of-concept" with 2009 D1 SCT data (largely because I had already calculated most of what I needed back when I was looking at crazier things) and need to verify its usefulness on ACF-style tournaments.
  • PANTS can be converted into an expected winning percentage by the formula 1/(1+(T2/T1)^EXP), where T1 and T2 are the PANTS for teams 1 and 2. This formula is similar to that for "Pythagorean" win-loss except that it does not use points against/scored but a team's expected points against an average team. It is yet to be determined what exponent EXP works best. Attempts to convert this to a Bradley-Terry model, which appears to be the more statistically correct thing to do, have so far failed due to improper scaling factors. I'm not sure what the purpose of this would be, but it looks cool.
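The win-expectancy formula is easy to evaluate directly. In this Python sketch the function name is hypothetical, and EXP = 2 is a placeholder borrowed from the usual Pythagorean exponent, not a tested choice, since the best exponent is still undetermined.

```python
def expected_win_pct(t1: float, t2: float, exp: float) -> float:
    """Expected chance that team 1 beats team 2: 1 / (1 + (T2/T1)^EXP)."""
    if t1 <= 0 or t2 <= 0:
        raise ValueError("PANTS values must be positive for this formula")
    return 1.0 / (1.0 + (t2 / t1) ** exp)


# EXP = 2 is a placeholder only.
print(round(expected_win_pct(300.0, 200.0, exp=2.0), 3))  # 0.692
```

With EXP = 2, a 300-PANTS team is roughly a 69% favorite over a 200-PANTS team; larger exponents push the probabilities toward 0 and 1, and the two teams' probabilities always sum to 1.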
Comments/questions/criticisms welcome.
Dwight Wynne
socalquizbowl.org
UC Irvine 2008-2013; UCLA 2004-2007; Capistrano Valley High School 2000-2003

"It's a competition, but it's not a sport. On a scale, if football is a 10, then rowing would be a two. One would be Quiz Bowl." --Matt Birk on rowing, SI On Campus, 10/21/03

"If you were my teammate, I would have tossed your ass out the door so fast you'd be emitting Cerenkov radiation, but I'm not classy like Dwight." --Jerry
theMoMA
Forums Staff: Administrator
Posts: 5993
Joined: Mon Oct 23, 2006 2:00 am

Re: A Close Look into Dwight Wynne's PANTS

Post by theMoMA »

I would question the idea that power-to-ten ratio remains consistent in varying field strengths. In my experience, this is simply false. Teams buzz differently against different fields, and at a tournament like SCT where the questions aren't particularly tough, there are going to be good players from different teams outbuzzing each other to powers on a regular basis.
Andrew Hart
Minnesota alum
theMoMA
Forums Staff: Administrator
Posts: 5993
Joined: Mon Oct 23, 2006 2:00 am

Re: A Close Look into Dwight Wynne's PANTS

Post by theMoMA »

Reading again, I can see how adjusting 15s, 10s, and -5s by themselves is a fine way to go; you could certainly find some kind of linear weights to them based on field strength, but that doesn't really seem necessary. I think this stat is a fine way to combine a strength of schedule factor and bonus conversion.
Down and out in Quintana Roo
Auron
Posts: 2907
Joined: Wed Apr 09, 2008 7:25 am
Location: Camden, DE

Re: A Close Look into Dwight Wynne's PANTS

Post by Down and out in Quintana Roo »

cvdwightw wrote:
  • Average Tossup Points per Tossup Heard (TPTH) of a site, divided by the average TPTH of all teams (or equivalently tossups per game, but I'm generalizing here).
  • Average Bonus Conversion of a site, divided by the average BC of all teams.
  • An average of opponents' TPTH (calculated either including or excluding games played against that team), weighted by the number of times a team plays each opponent, divided by the average opponents' TPTH of all teams.
  • Weighted average of opponents' Powers per Tossup Heard, Tens per Tossup Heard, and Negs per Tossup Heard as calculated above (3 different SOS factors)
  • Some other statistic that you like using that accurately categorizes how strong each site's field is.
What are the fastest ways to compute these sorts of statistics? The time factor alone is what would make this difficult, but the idea is really interesting.
Mr. Andrew Chrzanowski
Caesar Rodney High School
Camden, Delaware
CRHS '97-'01
University of Delaware '01-'05
CRHS quizbowl coach '06-'12
http://crquizbowl.edublogs.org
Mechanical Beasts
Banned Cheater
Posts: 5673
Joined: Thu Jun 08, 2006 10:50 pm

Re: A Close Look into Dwight Wynne's PANTS

Post by Mechanical Beasts »

Five minutes in Excel, a lifetime of profit. I exaggerate, but this wouldn't be too hard or computationally intensive.
Andrew Watkins
cvdwightw
Auron
Posts: 3291
Joined: Tue May 13, 2003 12:46 am
Location: Southern CA

Re: A Close Look into Dwight Wynne's PANTS

Post by cvdwightw »

theMoMA wrote:Reading again, I can see how adjusting 15s, 10s, and -5s by themselves is a fine way to go; you could certainly find some kind of linear weights to them based on field strength, but that doesn't really seem necessary. I think this stat is a fine way to combine a strength of schedule factor and bonus conversion.
If I understand you correctly, you're suggesting weighting powers vs. tens by the power/ten ratio of the entire field. That's an interesting idea - fields with a higher power/ten ratio should probably steal a higher percentage of powers than fields with a lower power/ten ratio. I'm not sure how this would differ from a straight linear scaling of 15s, 10s, and -5s independently, and it doesn't matter of course with pure ACF data since there are no 15s - you get beat, you get beat.
Dr. Isaac Yankem, DDS wrote:What are the fastest ways to compute these sorts of statistics? The time factor alone is what would make this difficult, but the idea is really interesting.
The degree of difficulty varies. Something like average bonus conversion is ridiculously easy - put all the bonus conversions in an Excel file, find the average, find the average of each site - the only time-consuming part is putting in the data; once you have that it takes maybe five minutes to do everything else. Similarly for TPTH/PPG - these are extremely easy to calculate. Weighted averages are slightly more computationally intensive but I'm guessing are more accurate - here you just weight each opponent's TPTH/Powers/Tens/Negs by how many times a team played that opponent, then you take the average of that; we can do this since most stats now have game-by-game data. The most computationally intensive one (the only one that takes more than about a few minutes of Excel computation) is to compute each team's opponents' statistics using only games that they didn't play against that team. The only reason this is computationally intensive is that you actually have to work game-by-game and subtract out the tossup points earned against a team. Also, when you're dealing with small sample sizes (such as teams that only get like 2-3 powers the whole tournament), it can really skew the results.
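The weighted-average SOS factor described above, including the version that drops head-to-head games, can be scripted instead of done by hand in Excel. This Python sketch assumes game-by-game records of the form (team_a, team_b, tossup_points_a, tossup_points_b, tossups_heard); that record layout and all function names are hypothetical, not the format of any particular stats program.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical game record: (team_a, team_b, tossup_pts_a, tossup_pts_b, tossups_heard)
Game = Tuple[str, str, float, float, float]


def opponent_tpth(games: Iterable[Game], exclude_head_to_head: bool = True) -> Dict[str, float]:
    """Average opponents' TPTH per team, weighted by how often each opponent was played."""
    pts: Dict[str, float] = defaultdict(float)
    tuh: Dict[str, float] = defaultdict(float)
    h2h_pts: Dict[Tuple[str, str], float] = defaultdict(float)
    h2h_tuh: Dict[Tuple[str, str], float] = defaultdict(float)
    schedule: Dict[str, List[str]] = defaultdict(list)  # one entry per game played

    for a, b, pa, pb, heard in games:
        for team, opp, p in ((a, b, pa), (b, a, pb)):
            pts[team] += p
            tuh[team] += heard
            h2h_pts[(team, opp)] += p
            h2h_tuh[(team, opp)] += heard
            schedule[team].append(opp)

    result: Dict[str, float] = {}
    for team, opps in schedule.items():
        values = []
        for opp in opps:  # an opponent played twice appears twice, giving the weighting
            p, h = pts[opp], tuh[opp]
            if exclude_head_to_head:
                # Subtract the opponent's tossup points and TUH from games against this team.
                p -= h2h_pts[(opp, team)]
                h -= h2h_tuh[(opp, team)]
            if h > 0:
                values.append(p / h)
        if values:
            result[team] = sum(values) / len(values)
    return result


def sos_factors(opp_tpth: Dict[str, float]) -> Dict[str, float]:
    """Divide each team's opponents' TPTH by the average across all teams."""
    overall = sum(opp_tpth.values()) / len(opp_tpth)
    return {team: value / overall for team, value in opp_tpth.items()}


games = [("A", "B", 200, 100, 20), ("A", "C", 150, 50, 20), ("B", "C", 120, 80, 20)]
print(opponent_tpth(games))  # {'A': 5.0, 'B': 5.0, 'C': 7.5}
```

In this toy three-team round robin, dropping head-to-head games moves A's figure from 4.375 to 5.0, a small illustration of how much these numbers can swing on small samples.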
Avram
Lulu
Posts: 90
Joined: Thu Oct 09, 2008 5:45 pm

Re: A Close Look into Dwight Wynne's PANTS

Post by Avram »

cvdwightw wrote:the only time-consuming part is putting in the data; once you have that it takes maybe five minutes to do everything else.
One of my back-burner projects is results integration more substantial than SQBS HTML output hosting. In our discussions over how to make QBTPS a good system, Ray and I have talked about possible solutions, and we hope to show prototypes of new results formats and results integration in the coming weeks or months. The only way to make this easy is for all hosts to share tournament data in computer-readable formats and for quizbowlers to develop software that analyzes such data effectively. We can do both of these things, I think.
Avram Lyon
Kazan Federal University '11
UCLA '14 (or so)
Grinnell '06
cvdwightw
Auron
Posts: 3291
Joined: Tue May 13, 2003 12:46 am
Location: Southern CA

Re: A Close Look into Dwight Wynne's PANTS

Post by cvdwightw »

Note to enterprising quizbowl statisticians: do not invent statistics in which Minnesota buzzes on 23 out of 20 tossups against an average opponent. PANTS is under revision to fix this issue.
theMoMA
Forums Staff: Administrator
Posts: 5993
Joined: Mon Oct 23, 2006 2:00 am

Re: A Close Look into Dwight Wynne's PANTS

Post by theMoMA »

I don't know if there's anything wrong with that. It might not allow you to say that "this is how many points that Team X will actually get playing an average opponent over all sites of the tournament." But I don't know why you'd want to do that, anyway, and it should still rank the teams in the right order.

Let's imagine a tournament with two sites, both of which feature the same number of teams, and one team at each site gets every tossup (for simplicity's sake). Let's just say that the average team at Site A gets 5 points per tossup heard (PPTUH), and the average team at Site B gets 15. You obviously don't want the team at Site B to have the same value in your stat as the one at Site A, since getting all of the tossups (impossible though it is) is much more of a feat against the field at Site B (we're setting it to 1.5 times as hard, if we're using 15/((15+5)/2) as the "park factor"). So yeah, your stat might say that that team is answering 30 tossups per game, but that actually kind of represents what they're doing, relative to average.

You could slide the numbers back so that the top team only "buzzes on" a maximum of 20 tossups (or whatever you want for SCT), but you're just dividing all the results by a number, so you might as well not do it.
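The two-site example can be checked numerically. This Python sketch uses the numbers from the post; the function names are hypothetical, and averaging the two site averages equals averaging over all teams only because both sites are assumed to have the same number of teams. The second function shows that pulling results back under a 20-tossup ceiling is just division by one constant.

```python
from typing import Dict


def park_factors(site_avg: Dict[str, float]) -> Dict[str, float]:
    """Site average divided by the overall average (sites assumed equal in size)."""
    overall = sum(site_avg.values()) / len(site_avg)
    return {site: avg / overall for site, avg in site_avg.items()}


def rescale(values: Dict[str, float], cap: float = 20.0) -> Dict[str, float]:
    """Divide everything by one constant so the top value equals `cap`."""
    top = max(values.values())
    return {k: v * cap / top for k, v in values.items()}


factors = park_factors({"Site A": 5.0, "Site B": 15.0})
print(factors)  # {'Site A': 0.5, 'Site B': 1.5}

# Each site's top team answers all 20 tossups per game.
adjusted = {"Top team, A": 20 * factors["Site A"], "Top team, B": 20 * factors["Site B"]}
print(adjusted)  # {'Top team, A': 10.0, 'Top team, B': 30.0}
print(rescale(adjusted))
```

After rescaling, the Site B team sits at exactly 20 and the Site A team stays below it, so the ordering is unchanged.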
Kwang the Ninja
Rikku
Posts: 480
Joined: Thu Apr 23, 2009 3:25 pm

Re: A Close Look into Dwight Wynne's PANTS

Post by Kwang the Ninja »

I did a spreadsheet calculating this for ACF Fall. Link.
This looks very accurate. Awesome job, Dwight.
Dallin Kelson
Chipola '11, UF '13
marnold
Tidus
Posts: 706
Joined: Wed Jan 17, 2007 12:32 pm
Location: NY

Re: A Close Look into Dwight Wynne's PANTS

Post by marnold »

I have a hard time believing Mike Sorice performed worse on this set than 4 high school teams. The conventional stats certainly don't imply that.
Michael Arnold
Chicago 2010
Columbia Law 2013

2009 ACF Nats Champion
2010 ICT Champion
2010 CULT Champion
Member of Mike Cheyne's Quizbowl All-Heel Team

Fundamental Theorem of Quizbowl (Revised): Almost no one is actually good at quizbowl.
Pilgrim
Tidus
Posts: 647
Joined: Mon Oct 08, 2007 12:20 pm
Location: Edmonton

Re: A Close Look into Dwight Wynne's PANTS

Post by Pilgrim »

marnold wrote:I have a hard time believing Mike Sorice performed worse on this set than 4 high school teams. The conventional stats certainly don't imply that.
I think this is largely caused by the hilariously terrible bracketing at the Chicago site. If, for example, you were using average PPB as a park factor, the average PPB of the whole Chicago field was 15.56, but the average PPB in Illinois A's bracket (wherein all of their games from the stats were played) was 17.79. Changing Illinois A's PANTS to account for this puts them ahead of everyone except Maggie Walker and State College, which seems reasonable to me for ACF Fall.
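The bracket adjustment amounts to swapping the site's average PPB for the bracket's average PPB in the numerator of a PPB-based SOS factor. A quick Python check of the size of that change, assuming a single PPB-based factor is applied to powers, tens, and negs alike (variable names are illustrative):

```python
site_ppb = 15.56     # average PPB of the whole Chicago field
bracket_ppb = 17.79  # average PPB in Illinois A's bracket

# The all-teams average in the denominator is unchanged, so the SOS factor
# (and every adjusted power, ten, and neg) is multiplied by this ratio.
bump = bracket_ppb / site_ppb
print(round(bump, 3))  # 1.143
```

With one factor, every term of the PANTS formula is proportional to the adjusted counts, so Illinois A's PANTS would rise by the same factor of roughly 1.14.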

I think this also brings up an important issue with the strength of schedule factor - it doesn't account for the format of the tournament. Depending on rebracketing and such, a team at one site might have a lot more games against the top x% of teams at that site than some team at a different site.

Edit: On re-reading the original post, some of the proposed SOS factors account for this. I don't know what Dallin used, but I'm guessing it's one that just looks at the average of a site, since those are way easier to calculate.
Trevor Davis
University of Alberta
CMU '11
Auks Ran Ova
Forums Staff: Chief Administrator
Posts: 4295
Joined: Sun Apr 30, 2006 10:28 pm
Location: Minneapolis

Re: A Close Look into Dwight Wynne's PANTS

Post by Auks Ran Ova »

Pilgrim wrote:Changing Illinois A's PANTS to account for this
hee hee hee hee
Rob Carson
University of Minnesota '11, MCTC '??, BHSU forever
Member, ACF
Member emeritus, PACE
Writer and Editor, NAQT
Avram
Lulu
Posts: 90
Joined: Thu Oct 09, 2008 5:45 pm

Re: A Close Look into Dwight Wynne's PANTS

Post by Avram »

cvdwightw wrote:Weighted averages are slightly more computationally intensive but I'm guessing are more accurate - here you just weight each opponent's TPTH/Powers/Tens/Negs by how many times a team played that opponent, then you take the average of that; we can do this since most stats now have game-by-game data. The most computationally intensive one (the only one that takes more than about a few minutes of Excel computation) is to compute each team's opponents' statistics using only games that they didn't play against that team. The only reason this is computationally intensive is that you actually have to work game-by-game and subtract out the tossup points earned against a team. Also, when you're dealing with small sample sizes (such as teams that only get like 2-3 powers the whole tournament), it can really skew the results.
I've added a preliminary SOS statistic to the testing installation of my stats system, using the weighted TPTH, excluding games against a given team (see http://quizbowl.gimranov.com/qbsql/index.php?t=mopants). I implemented it as an additional column in the Team Stats table. If you would like to run this on your data, you would need to add your data to the test installation of QBSQL, which can import SQBS data files (not HTML), using http://quizbowl.gimranov.com/qbsql/tour ... modify.php. The Master Username is "login" and the Master Password is "password".

In order to turn this into an SOS statistic as Dwight described it, you will still need to find the average across all sites. For convenience, I provide the average of the SQBS tournament's SOS statistic at the bottom of the table.

In order to run this on any given set played across the country, you can create a tournament for it in QBSQL, then import all of the SQBS files from each site. The team stats page should then give correct SOS data for all teams. If someone does this, I can also make changes to the script to calculate all of the proposed SOS statistics and matching PANTS.

Edit: I would be happy to upload and implement the various metrics if someone sends me the SQBS datafiles for the sites of any tournament of interest. Send files to [email protected] and I'll try on the various PANTS.