A-value SOS adjustments are overtuned

Elaborate on the merits of specific tournaments or have general theoretical discussion here.
CPiGuy
Auron
Posts: 1128
Joined: Wed Nov 16, 2016 8:19 pm
Location: Ames, Iowa

A-value SOS adjustments are overtuned

Post by CPiGuy »

See title.

I've been meaning to make this post for close to a year and realized that ACF Regionals is next weekend so I really ought to do so now.

There has been much virtual ink spilled over whether A-value formulas appropriately adjust for strength of schedule; much of this discussion is largely vibes-based because it is, in fact, really hard to compare performances at different sites against different opposition. However, I think there is a data point from last year that proves fairly conclusively that the SOS adjustment is overtuned.

Our 2024 ACF Regs site was very close to a triple round-robin. We finished with the highest PPG, the best tossup stats, and the highest PPB (and all of these things were true before the 10th round). Despite leading the field in every statistical category (at least every category that matters for A-value -- wins do not), and our only game against a non-common opponent being a 530-(-10) win, which surely should not have decreased our A-value, we ended up with the third-best raw A-value due to the SOS adjustment.

The only explanation I can find for this is that the SOS adjustment is so strong that our opponents received more of a boost to their A-values for having to play a team with our stats three times than we received for... being the team with those stats. (Either that, or scoring 530 points and going 19/0 against Minnesota C was still such a hit to our SOS that we'd have been better off not playing that game, which is also a ridiculous proposition.)

I think the SOS adjustment should be tuned down until it would result in us having had the best raw A-value; it's just plainly unreasonable that SOS appears to have a stronger effect on A-value than actual statistics in this situation.

(Side note: I am also confused why Minnesota C had a lower SOS than ASU or Minnesota B, since they didn't have to play themselves at all; they should have had one of the highest SOS adjustments in the country! Is there some element to the SOS computation I'm missing, or was there actually an error in computing their SOS?)
Conor Thompson (he/it)
Bangor High School '16
University of Michigan '20
Iowa State University '25
Tournament Format Database
CPiGuy

Re: A-value SOS adjustments are overtuned

Post by CPiGuy »

At Kevin's suggestion (on Discord), I did some "alternate history" to explore various feasible scenarios.

Had the tenth game not been played, our A-value would have been 1.6 higher. This would have been enough for us to go up one spot. This means that our A-value went *down* for playing a game in which we went 19/0 and scored 530 points. This seems extremely suboptimal -- there was basically no way that playing this game would have helped us.

I also changed the outcome of every game we lost by flipping the outcome of one cycle. This resulted in a situation in which we went 10-0 and had the best stats in every statistical category that A-value measures, but were still more than 10 points of A-value behind ASU because of the SOS adjustments.

Another thing I discovered is that, when computing SOS, stats from games against your team are thrown out. This makes sense for tossup stats but not for points per bonus, since bonuses are opponent-independent. This ended up significantly hurting us: Minnesota B had a very low PPB in two of their games against Arizona State, which (rightfully) lowered our SOS but didn't affect ASU's SOS at all. This is another source of unnecessary variance in the SOS adjustment that makes it difficult to justify using it as a large factor.
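To make the PPB point concrete, here is a minimal sketch with made-up numbers (not the actual Regionals stats) of how excluding an opponent's games against your own team changes the PPB figure that feeds into each team's SOS:

```python
# Hypothetical illustration: an opponent's PPB looks different depending on
# whether the games they played against you are excluded. All numbers here
# are invented for illustration, not taken from real stats.

def ppb(bonus_points, bonuses_heard):
    """Points per bonus."""
    return bonus_points / bonuses_heard

# Suppose "Opponent B" heard 20 bonuses at 10 PPB in two bad games against
# "Team A", and 40 bonuses at 16 PPB in its other games.
vs_team_a = (200, 20)  # (bonus points, bonuses heard) in games vs. Team A
elsewhere = (640, 40)

# Team A's view of Opponent B (games vs. Team A thrown out):
print(ppb(*elsewhere))  # 16.0 PPB

# Everyone else's view (all of Opponent B's games included):
all_games = (vs_team_a[0] + elsewhere[0], vs_team_a[1] + elsewhere[1])
print(ppb(*all_games))  # 14.0 PPB
```

The 2 PPB gap between the two views is exactly the kind of asymmetry described above: the team that inflicted the bad games never sees them in its own SOS.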
Santa Claus
Rikku
Posts: 299
Joined: Fri Aug 23, 2013 10:58 pm

Re: A-value SOS adjustments are overtuned

Post by Santa Claus »

As I have been named, I'll give my two cents.

I agree that it looks bad that a team with a strong statistical performance would have a worse A-value than others in its field due to SOS, but even with this data point I don't think it's definitive that the adjustment is "overtuned". The point about playing weaker teams lowering A-value seems unpersuasive as well: ideally a strong performance would cancel out the decrease in SOS from playing a below-median team, but it's not that surprising that this is not always true.

The ultimate goal of A-value is to determine which teams qualify for nationals. Basically any change to SOS will change which ones do and in what order; this is inevitable. Iowa State missed qualifying in the first 48 by an A-value difference of 4.1. If SOS were calculated differently, it's possible that would have made up the distance, but without doing the math it's unclear: there are plenty of other teams just above and below the cutoff in similar situations, and different metrics will benefit some more than others.

It is also worth noting that Iowa State did, in fact, finish third at their mirror. It's true that the same trends in SOS would be present if they had won more games, but A-value averaging means they likely would have been in the first 48 to qualify had they won even one more game. Minnesota B was also very close to not qualifying at that threshold; if SOS were calculated differently, they may have failed to do so despite going 2-1 and 1-2 against statistically superior teams.

I would imagine that the only practical changes to SOS are ones that don't do too much and have at least some justification. For instance, I assume ACF is more likely to drop bonus conversion from the SOS calculation than to, say, scale SOS by a flat factor of 0.9. This is the sort of thing we'd want to run more complete simulations of (and which I assume have been run internally by ACF).

edit: very stupidly forgot that Iowa State ultimately qualified for nats; adjusted language to account for the actual thresholds I was using
Kevin Wang
Arcadia High School 2015
Amherst College 2019

2018 PACE NSC Champion
2019 PACE NSC Champion
Fado Alexandrino
Yuna
Posts: 842
Joined: Sat Jun 12, 2010 8:46 pm
Location: Farhaven, Ontario

Re: A-value SOS adjustments are overtuned

Post by Fado Alexandrino »

The problem with the A/D value SOS calculation (from what I can tell) is that if your schedule is weak enough, there literally aren't enough points you could score in a game to balance out the fact that your opponents suck.
Joe Su, OCT
Lisgar 2012, McGill 2015, McGill 2019, Queen's 2020
thedoge
Lulu
Posts: 95
Joined: Thu Jul 16, 2020 10:45 am

Re: A-value SOS adjustments are overtuned

Post by thedoge »

CPiGuy wrote: Sun Jan 26, 2025 7:09 pm
Another thing I discovered is that, when computing SOS, stats from games against your team are thrown out.
Could someone at ACF add this to the A-values page on the website? https://acf-quizbowl.com/nationals-qual ... /#a-values
Geoffrey Wu
NNHS '21 | Columbia '25
creator of qbreader.org
theMoMA
Forums Staff: Administrator
Posts: 6086
Joined: Mon Oct 23, 2006 2:00 am

Re: A-value SOS adjustments are overtuned

Post by theMoMA »

The SOS measure is sensitive to small field effects. In this case, the issue is that there were three teams that scored roughly 300 points per game and had roughly 16-17 points per bonus, and one team that scored roughly 50 points per game with roughly 9 points per bonus.

At the end of a triple round robin, the teams evidently played a 1 v 2 and a 3 v 4 game to reach ten games. Accordingly, the 1 and 2 seeds (not Iowa State) played 3 games against the 50 ppg team, and Iowa State played 4 games against the 50 ppg team. As a result, Iowa State's SOS was 0.95, meaning that Iowa State's final A value was 5% lower than its non-SOS-adjusted A value. The 1 and 2 seeds each had an SOS near 1.2, meaning their SOS-adjusted scores were about 20% higher than their non-SOS-adjusted scores.
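Those percentages suggest the adjustment is simply multiplicative (adjusted A value = raw A value times SOS); that is my reading of the figures above, not a confirmed spec. If so, the size of the swing is easy to see:

```python
# Illustration only: if the SOS adjustment is a straight multiplier, two
# teams with identical raw A values of, say, 200 end up 50 points apart
# after adjustment at the SOS values quoted above.

raw = 200.0
low_sos_adjusted = raw * 0.95   # the 0.95-SOS team: 190.0
high_sos_adjusted = raw * 1.2   # a 1.2-SOS team: 240.0
print(high_sos_adjusted - low_sos_adjusted)  # 50.0
```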

I suspect it's not appropriate to judge the SOS measure based on such small field sizes, with the added dimension that one of the teams was extremely weak compared to the national average, so that playing that team just one more time lowered the SOS a lot. The conditions that create this unexpected result are very rare, and expecting SOS to be perfectly resilient to them feels almost like expecting a theory about the macro universe to work perfectly at the quantum level. I know, as of my research a few years ago, that the SOS as calculated makes NAQT D values correlate much better with ICT finishes (with the same rosters) than excluding it does. I suspect that any issues with SOS are these "quantum breakdown" issues at small field sizes with unexpectedly wonky team-strength configurations, and are likely not indicative of problems in more "normal" fields.

I can't find documentation of whether ACF has this, but NAQT's analogous D values have an SOS floor: SOS cannot go below 0.75, and lower values are replaced by 0.75. This was based on my research on the performance of teams that qualified from extremely low SOS fields; I found that, below 0.75, field strength effects seem to give way to team strength effects.

I wonder if that may be true on the level of individual teams. For instance, I'm not sure it makes sense to consider a 50 A value team twice as easy to play against as a 100 A value team; in fact, it may be roughly as easy to score against either. A ~50 A value team is probably scoring about 3 tossups per game with about 8 points per bonus. A ~100 A value team is probably scoring about 5 tossups per game with about 10 points per bonus. A median team playing a 100 A value team would expect to score perhaps 15 tossups, vs. 17 against the 50 A value opponent, meaning the difference in the median team's scores should be expected to be roughly 10% rather than 50%. (If a median team is roughly 15 ppb, 15 tossups * (10 + 15) = 375 points, and 17 tossups * (10 + 15) = 425 points, about a 12% difference.)
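The back-of-envelope numbers above can be checked with a quick sketch; the tossup counts and the 15 PPB figure are the paragraph's assumptions, not measured values:

```python
# Rough expected-score model from the paragraph above: a median team
# converts at ~15 points per bonus, tossups are worth 10 points, and it
# takes ~15 tossups against a ~100 A value opponent vs. ~17 against a
# ~50 A value opponent. All inputs are assumptions, not real stats.

def expected_score(tossups_won, ppb=15, tossup_value=10):
    # each tossup won is worth the tossup itself plus the expected bonus
    return tossups_won * (tossup_value + ppb)

vs_100 = expected_score(15)  # 375 points
vs_50 = expected_score(17)   # 425 points

# Relative gap between the two expected scores:
print((vs_50 - vs_100) / vs_50)  # ~0.12, i.e. on the order of 10%, not 50%
```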

I don't have the time or inclination to research this at the moment, but I suspect that placing a floor on individual teams' components for the purpose of SOS calculations, perhaps somewhere around the equivalent of a 100 A value floor (although SOS is not actually based on A value), would fix much of the issue Conor has identified.

It's somewhat annoying to actually calculate SOS, so let's imagine a toy model in which three teams had an A value for SOS purposes of 300 and one team had an A value of 50. Note that I'm calculating SOS by A value rather than by tossup and bonus components for simplicity's sake, to demonstrate the effect I'm talking about.

Team 300-1 (plays 300 teams 7 times and 50 team 3 times): average opponent A value = 225
Team 300-2 (plays 300 teams 7 times and 50 team 3 times): average opponent A value = 225
Team 300-3 (plays 300 teams 6 times and 50 team 4 times): average opponent A value = 200

If the median A value is 210 (which is about right), then Teams 300-1 and -2 will have an SOS of 1.07, while Team 300-3 will have an SOS of 0.95.

If we placed an A value floor for purposes of the SOS calculation, and the 50 team were adjusted to 100, then Teams 300-1 and -2 would have an SOS of about 1.14, and Team 300-3 an SOS of about 1.05. All three teams would be shifted upward, and the difference between the SOS values would decrease from about 0.12 to about 0.10. A value would still find Team 300-3 the least strong, because it scored the same number of points despite having an additional game against the weakest team.
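The toy model above is easy to script. This sketch reproduces its numbers, treating SOS as (average opponent A value) / (median A value), which is a simplification of the real component-based calculation:

```python
# Toy reproduction of the model above: SOS as average opponent A value
# divided by the median A value. This is a simplification for illustration;
# the real SOS is computed from tossup and bonus components, not A value.

MEDIAN_A = 210  # assumed median A value, per the post

def toy_sos(opponent_a_values, floor=None):
    """Average opponent A value (optionally floored) over the median."""
    if floor is not None:
        opponent_a_values = [max(a, floor) for a in opponent_a_values]
    return sum(opponent_a_values) / len(opponent_a_values) / MEDIAN_A

# Teams 300-1 and 300-2 play 300-level teams 7 times and the 50 team 3 times;
# Team 300-3 plays 300-level teams 6 times and the 50 team 4 times.
sched_12 = [300] * 7 + [50] * 3
sched_3 = [300] * 6 + [50] * 4

print(round(toy_sos(sched_12), 2))             # 1.07
print(round(toy_sos(sched_3), 2))              # 0.95
print(round(toy_sos(sched_12, floor=100), 2))  # 1.14
print(round(toy_sos(sched_3, floor=100), 2))   # 1.05 (1.0476...)
```

With the floor, the gap between the two schedules shrinks from about 0.12 to about 0.10 of SOS, matching the effect described above.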

I haven't thought through whether placing a floor in the SOS calculation would have other effects, and I haven't investigated how it would work, but I think it is theoretically sound in that it accounts for the fact that, below a certain point, opponent strength is not really operative to how many points a team is expected to score, because the expected tossup points taken away by the opponent differ only by a small percentage. I welcome anyone interested to research this further.
Andrew Hart
Minnesota alum
Stinkweed Imp
Wakka
Posts: 105
Joined: Mon Feb 23, 2015 11:33 pm

Re: A-value SOS adjustments are overtuned

Post by Stinkweed Imp »

CPiGuy wrote: Sun Jan 26, 2025 7:09 pm
At Kevin's suggestion (on Discord), I did some "alternate history" to explore various feasible scenarios.

Had the tenth game not been played, our A-value would have been 1.6 higher. This would have been enough for us to go up one spot. This means that our A-value went *down* for playing a game in which we went 19/0 and scored 530 points. This seems extremely suboptimal -- there was basically no way that playing this game would have helped us.
While a cap would help with extreme values, it doesn't seem unintuitive to me that winning a game could cause your adjusted A-value to go down: the purpose of strength-of-schedule adjustments is to capture the fact that points scored against good teams are worth more. If a game improves your stats, it makes sense for it to improve the adjusted stats of the teams that beat you as well.
Vivian Malouf
La Jolla '17
UC Berkeley