Database Hits Do Not Determine Difficulty

Adventure Temple Trail
Auron
Posts: 2751
Joined: Tue Jul 15, 2008 9:52 pm

Database Hits Do Not Determine Difficulty

Post by Adventure Temple Trail »

Over the past few months, especially since the (much-welcomed) revival of Quinterest, I have seen a particular type of flawed reasoning rear its head repeatedly in quizbowl discussions. In set discussion contexts, the form of the argument goes roughly as follows:

(1) I perceive a problem with a particular question, topic, or clue.
(2) I searched a packet database and got n hits.
(3) n is a problematic number.
(4) Therefore, sufficient evidence exists that the problem is as I say it is.

To make it slightly more concrete, I have seen a rise in claims such as the following:
  • “I heard a tossup on an appropriate answer line that used <CLUE> towards the middle last weekend. I have only seen one instance of <CLUE> ever being used before, and it was earlier, at a harder tournament. Therefore, that clue seems really hard and probably should have been earlier.”
  • “I believe that <TOPIC> is important and needs to come up more. I searched Quinterest and only saw it come up a single-digit number of times. Therefore, it is clear that <TOPIC> doesn’t come up enough relative to how important I believe it is.”
  • “This question used a leadin clue from a text which has never been asked about in quizbowl before. That fact alone makes the current leadin too hard for this difficulty level.”
  • “Eh, <TOPIC> isn’t actually too hard for this difficulty level. I searched a packet archive and it has come up n times before, so competent teams had a chance to learn it and it's probably fine.”
All of these types of claims are flawed, and I think they are having a harmful effect on quizbowl discussion. Here are two reasons why, followed by some more fruitful ways of assessing new questions that can be used instead in future editing or set-discussion conversations.

Database searches do not determine a topic’s difficulty

One can get a rough idea of how previous editors treated difficulty by searching a database, but it’s important to bring in outside reasoning and real-world exposure to a judgment of how hard something actually is. Some examples of cases where a database-only search fails to provide an accurate impression:
  • A topic has been used twelve times before in high school quizbowl and it was too difficult every time. Reusing that topic in the same way would propagate error forward and result in low conversion rates.
  • There are still unasked or underasked topics which are genuinely easy despite not having come up much before, if at all, e.g. a literary bestseller that just broke onto the scene, or a scientific breakthrough that has just hit the news. It would be impossible for a clue from Donna Tartt’s The Goldfinch to have come up in packets before the 2013-14 season, since that book was only released in October 2013. Nonetheless, such a clue was used in the 2014 NSC (and converted) due to the book’s rapid sales and apparent literary merit.
  • A question writer who has actual experience with a field deliberately picks a never-before-used clue based on its repeated use and importance in classes they’ve taken, scholarship they’ve read, lab work they've done, a show they've seen, etc. Attempts to test outside-quizbowl learning through fresh early clues will appear artificially difficult if one’s only reference point is past packets, even in cases where such clues are very recognizable to those who know them.
  • A topic was “trending” / in the midst of a “canon bubble” several years ago, coming up a lot in a short window, but has since faded from popularity and has appeared a lot less in sets from the past year or two. This makes it less likely that newer players will have internalized that topic from quizbowl exposure, even if a prior generation of players all had to know that topic cold to stay competitive.
Looking at past packets gives only a reactive sense of how questions have been, rather than a predictive sense of how they ought to be. What’s more, sending the message that “difficulty goes down as database frequency goes up” is harmful to current and future question quality. It implies that question writing should do little beyond recycling previously-used clues in the same order as their previous uses, and endorses the attitude that quizbowl is merely a circular test of who has memorized the words from past quizbowl. Instead of uncritically accepting past frequency as a determinant of difficulty, it’s important to determine why previous question writers thought it was good to repeatedly ask about a topic and ascertain independently whether they were correct in their judgment.

As of now, database searches can’t accurately determine a topic’s frequency

As Joelle alluded to in the thread about GLBTQ topics in quizbowl (in which I agree with Vasa, Casey, Alex, and Colin, as I hope some of my past question writing shows), the current options for searchable online databases just aren’t very big. A huge number of sets aren’t on Quinterest yet, and there’s hardly any topic that appears more than ten times within that database. What's more, searches can often be thrown off by simple mishaps. Example: If a topic has multiple spellings, or is given in some packets in the original language but in others in English translation, that can easily alter the result numbers, even if one is careful.
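
One way to guard against the spelling problem, sketched here in Python under stated assumptions: strip accents and case, then count a question as a hit if it matches any known variant of the topic. The (year, text) pair format, the function names, and the sample questions are all made up for illustration; this is not how Quinterest or any other existing database works, and a count produced this way is still subject to every other caveat in this post.

```python
import re
import unicodedata
from collections import Counter


def normalize(text):
    """Strip accents, collapse whitespace, and lowercase so variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"\s+", " ", no_accents).lower()


def count_hits_by_year(questions, variants):
    """Count questions that mention any variant of a topic, grouped by year.

    questions: iterable of (year, question_text) pairs (hypothetical format).
    variants: alternate spellings or translated titles for a single topic.
    """
    patterns = [
        re.compile(r"\b" + re.escape(normalize(v)) + r"\b") for v in variants
    ]
    hits = Counter()
    for year, text in questions:
        clean = normalize(text)
        # A question counts once, no matter how many variants it matches.
        if any(p.search(clean) for p in patterns):
            hits[year] += 1
    return hits


# Made-up placeholder questions, not taken from any real packet.
sample_questions = [
    (2010, "Name this Russian author of Crime and Punishment, Fyodor Dostoevsky."),
    (2013, "Name this author of The Idiot, Fyodor Dostoyevsky."),
    (2013, "Name this author of À la recherche du temps perdu."),
]

one_spelling = count_hits_by_year(sample_questions, ["Dostoevsky"])
both_spellings = count_hits_by_year(sample_questions, ["Dostoevsky", "Dostoyevsky"])
proust = count_hits_by_year(
    sample_questions, ["In Search of Lost Time", "A la recherche du temps perdu"]
)
print(dict(one_spelling), dict(both_spellings), dict(proust))
```

Searching one spelling misses the question that uses the other, which is exactly the kind of mishap described above. Grouping by year also keeps old hits and recent hits from being lumped into a single total.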

When new data becomes available from tools such as Quinterest, it’s important to ensure that we don’t draw large mistaken conclusions from small unrepresentative samples, and it’s even more important to ensure that our perceptions of appropriate difficulty, or adequate representation, or current trends in “the canon” don’t get warped through the overuse of a skewed sample.

What’s more, even assuming a searchable database of every publicly-accessible question set were to exist, such a database would not include a single NAQT set. At least in high school quizbowl, NAQT is by far the dominant producer of questions, and has an outsize role in setting the difficulty and frequency of question topics through its gathering of conversion data. As long as NAQT sets remain for sale rather than publicly available, any sort of attempt to search ALL the things has to be marked with a small asterisk.

Ways to develop a keener sense for difficulty than just database search
(or: how to reinforce those database claims which are worthwhile and correct)

This post is not so secretly a roundabout way of saying the following: If a person wants to make useful constructive points in tournament discussion, they need to develop a sense of difficulty along multiple axes, not just from looking back at old question sets. I have a couple of suggestions of things I’ve done towards that end.

I will go ahead and note, just for the sake of full disclosure: I do think that looking through old sets to see how clues were used or worded in past tournaments is one useful method that editors can use to make their own questions better. But until the time when databases are actually more comprehensive, I strongly recommend doing this by maintaining a large packet archive of every available set on one’s own hard drive, making sure to update it as new sets are publicly released.

One other heuristic I use when I need to get quick-n’-dirty data on ordering clues is to do a comparison of Google hits at large (or a keyphrase search in Google Books), rather than a search just in packets. (e.g.: “Hmm, looks like ‘Freya of the Seven Isles’ has about 75,000 hits and ‘Nostromo’ has a million...let’s filter that down to ‘Nostromo Conrad’ to get rid of the random pop culture...still 280,000 -- I guess that’s easier since more sites exist about it.”) It's also possible to do this for single clues. This method is itself very flawed, and it takes some context to use well, but when you need a quick snap judgment of whether one title is easier than another early in a question, and don’t have much real-world exposure, it can sometimes work. --Retracted per a lower post. Someone has got to tell the people...

Another good method for improving one’s judgments of difficulty is to take opportunities to moderate for real teams as a staffer at tournaments. Staffing at tournaments, and paying attention to where mid-level and weak teams buzz as you go, is a great way to get a sense of how well teams do on questions if you’re able to do so often. This is particularly important if you have written for or edited a set and you have ultimate responsibility for how stuff actually plays out.

Lastly, it’s always a good idea to draw on genuine exposure to and investigation of academic material if one has it, or try to gain some such exposure if one doesn’t have it.



To conclude: The next time a question feels off to you, try to do more beyond just a quick search of old packets; it'll provide a more compelling argument and do more to help you develop a sense of appropriate difficulty for your own writing projects. Happy discussing, everyone.
Last edited by Adventure Temple Trail on Thu Nov 06, 2014 12:17 am, edited 2 times in total.
Matt Jackson
University of Chicago '24
Yale '14, Georgetown Day School '10
member emeritus, ACF
vinteuil
Auron
Posts: 1454
Joined: Sun Oct 23, 2011 12:31 pm

Re: Database Hits Do Not Determine Difficulty

Post by vinteuil »

To sort-of build off what Matt has posted (all of which is of course fantastic and with which I totally agree), I would very much welcome any developers who would be able to integrate basic data from the internet at large (Google hits, Wikipedia stuff, etc.) into Quinterest or some Quinterest-like entity, or at the very least to integrate googling site:quizbowlpackets.com "CLUE" into its search capabilities.
Jacob R., ex-Chicago
Auks Ran Ova
Forums Staff: Chief Administrator
Posts: 4295
Joined: Sun Apr 30, 2006 10:28 pm
Location: Minneapolis

Re: Database Hits Do Not Determine Difficulty

Post by Auks Ran Ova »

This is a really good post and I endorse it fully.
Rob Carson
University of Minnesota '11, MCTC '??, BHSU forever
Member, ACF
Member emeritus, PACE
Writer and Editor, NAQT
Excelsior (smack)
Rikku
Posts: 386
Joined: Sun Jan 25, 2009 12:20 am
Location: Madison, WI

Re: Database Hits Do Not Determine Difficulty

Post by Excelsior (smack) »

Adventure Temple Trail wrote:
One other heuristic I use when I need to get quick-n’-dirty data on ordering clues is to do a comparison of Google hits at large
While we're debunking myths, here is another myth: "Google does not lie about the number of results it has for a given query". If you click through the results for "Freya of the Seven Isles" (quoted), you will find that the results peter out around page 19, at 184 results or so. A far cry from 75,000! Worse still, there is no obvious relationship between the actual number of results for a query and the number of results Google claims exist for a query. Ordering is not even preserved - Google may report that A has more results than B, when, in actuality, B has more results than A.

I doubt that this heuristic is a major part of anyone's question-writing process, but the informed writer should nonetheless be informed that it is, by and large, worthless.
Ashvin Srivatsa
Corporate drone '?? | Yale University '14 | Sycamore High School (OH) '10
Cody
2008-09 Male Athlete of the Year
Posts: 2891
Joined: Sun Nov 15, 2009 12:57 am

Re: Database Hits Do Not Determine Difficulty

Post by Cody »

There are also plenty of other sites where you can get an idea of what is read / widely known. For example, Evan showed me that Goodreads is a nice way to get a rough idea as to which books by an author are most widely read by the general populace, if you're in doubt.
Cody Voight, VCU ’14.
Gautam
Auron
Posts: 1413
Joined: Sun Feb 11, 2007 7:28 pm
Location: Zone of Avoidance

Re: Database Hits Do Not Determine Difficulty

Post by Gautam »

Excelsior (smack) wrote:
One other heuristic I use when I need to get quick-n’-dirty data on ordering clues is to do a comparison of Google hits at large
While we're debunking myths, here is another myth: "Google does not lie about the number of results it has for a given query". If you click through the results for "Freya of the Seven Isles" (quoted), you will find that the results peter out around page 19, at 184 results or so. A far cry from 75,000! Worse still, there is no obvious relationship between the actual number of results for a query and the number of results Google claims exists for a query. Ordering is not even preserved - Google may report that A has more results than B, when, in actuality, B has more results than A.

I doubt that this heuristic is a major part of anyone's question-writing process, but the informed writer should nonetheless be informed that it is, by and large, worthless.
I've been meaning to make this post for ages! Thanks for doing this. That heuristic has led me to make some stupid decisions before. Don't rely on that large number to mean anything.
Gautam - ACF
Currently tending to the 'quizbowl hobo' persuasion.
Mark Wolfsberg
Lulu
Posts: 87
Joined: Wed Mar 06, 2013 11:20 am
Location: Bethlehem, NY

Re: Database Hits Do Not Determine Difficulty

Post by Mark Wolfsberg »

I am extremely impressed by the level of thoughtfulness you put into question writing and "hardness ranking."

Thank you to everyone who puts in the effort.
Mark Wolfsberg
Parent / Driver - certainly not a coach
Bethlehem History Bowl Club
Pushkin's Beard
Lulu
Posts: 90
Joined: Sat Dec 17, 2011 12:29 am

Re: Database Hits Do Not Determine Difficulty

Post by Pushkin's Beard »

Excelsior (smack) wrote:
One other heuristic I use when I need to get quick-n’-dirty data on ordering clues is to do a comparison of Google hits at large
While we're debunking myths, here is another myth: "Google does not lie about the number of results it has for a given query". If you click through the results for "Freya of the Seven Isles" (quoted), you will find that the results peter out around page 19, at 184 results or so. A far cry from 75,000! Worse still, there is no obvious relationship between the actual number of results for a query and the number of results Google claims exists for a query. Ordering is not even preserved - Google may report that A has more results than B, when, in actuality, B has more results than A.

I doubt that this heuristic is a major part of anyone's question-writing process, but the informed writer should nonetheless be informed that it is, by and large, worthless.
I agree that looking at "number of results in google" to gauge difficulty is a very bad idea. However, Google is not lying to you. Instead, Google omits results with too much overlap/similarity to pages already returned. If you change your settings, you can see all 75,000 results although next to none of them will give you pages more relevant to your search than results in the previous 19 pages. Sorry for being nitpicky and I do want to reiterate that I completely agree with your point that quantity of Google results should not be used in the question-writing process.
Noah Cowan
Georgetown Day School '15
Excelsior (smack)
Rikku
Posts: 386
Joined: Sun Jan 25, 2009 12:20 am
Location: Madison, WI

Re: Database Hits Do Not Determine Difficulty

Post by Excelsior (smack) »

Pushkin's Beard wrote:
I agree that looking at "number of results in google" to gauge difficulty is a very bad idea. However, Google is not lying to you. Instead, Google omits results with too much overlap/similarity to pages already returned. If you change your settings, you can see all 75,000 results although next to none of them will give you pages more relevant to your search than results in the previous 19 pages. Sorry for being nitpicky and I do want to reiterate that I completely agree with your point that quantity of Google results should not be used in the question-writing process.
Well, no, that's not really the case. If you do choose to include the omitted results, you will have more results, but still not necessarily as many as Google claims. In the "Freya of the Seven Isles" case, this gives me about 500 results total (the exact number will vary over time and depend on who is performing the search, but the point is that it's not as many as Google claims). The ~75,000 reported by Google is an estimate, not a number computed by actually examining all pages in their index and counting those which contain the term "Freya of the Seven Isles".
Ashvin Srivatsa
Corporate drone '?? | Yale University '14 | Sycamore High School (OH) '10
Pushkin's Beard
Lulu
Posts: 90
Joined: Sat Dec 17, 2011 12:29 am

Re: Database Hits Do Not Determine Difficulty

Post by Pushkin's Beard »

Excelsior (smack) wrote:
Pushkin's Beard wrote:
I agree that looking at "number of results in google" to gauge difficulty is a very bad idea. However, Google is not lying to you. Instead, Google omits results with too much overlap/similarity to pages already returned. If you change your settings, you can see all 75,000 results although next to none of them will give you pages more relevant to your search than results in the previous 19 pages. Sorry for being nitpicky and I do want to reiterate that I completely agree with your point that quantity of Google results should not be used in the question-writing process.
Well, no, that's not really the case. If you do choose to include the omitted results, you will have more results, but still not necessarily as many as Google claims. In the "Freya of the Seven Isles" case, this gives me about 500 results total (the exact number will vary over time and depend on who is performing the search, but the point is that it's not as many as Google claims). The ~75,000 reported by Google is an estimate, not a number computed by actually examining all pages in their index and counting those which contain the term "Freya of the Seven Isles".
I said change settings, not click the "include omitted results" button. The results number gives you the number of times the search term is used in the over 60 trillion webpages through which Google "sifts." It is POSSIBLE to get all the results Google says there are, but do you really want to wait for it to collect, much less LOAD, the millions of pages it finds with that search term?
Noah Cowan
Georgetown Day School '15