Torrey Pines Quizbowl Database

Posted: Mon Apr 18, 2011 10:00 pm
by sharadmv
Hello everyone,

I don't know exactly where to post this, but here we go:

Last summer, I started writing a quizbowl question parser, which has since evolved into a database with a web interface.

The parser takes packets in either .pdf, .doc, or .rtf format and extracts the tossups from them. It has worked for most of the recent-ish tournaments on quizbowlpackets.com, notably the HSAPQ tournaments. The parser doesn't require ACF packet formatting, but it works best when packets are formatted as such.

The website is located at http://quizbowl.tpclubs.com/?page=db.

Here are some features of the website:
1. It contains over 24000 tossups (not bonuses) and I will be adding more when the tournaments are released online. You can search the database by year, difficulty, tournament, category, and location in tossup.
2. Every question is assigned a broad category (e.g. Literature, History; for the complete list, go to the website and click on the category dropdown box). This was done by a program I wrote that classifies quizbowl questions by word frequencies. Note that it still has bugs and categorizes some questions incorrectly (especially Trash questions). You can help fix these errors by clicking on the category of an incorrectly categorized question and selecting a new one from the dropdown. Future packets will be categorized using the questions in the database, so fixing the current errors will improve categorization in the future.
3. It has a random question generator. It contains the same filters that you can search through the database with, and you can generate however many questions you want. It also hides the answer, in case you want to answer it yourself.
4. It has a browse function. This will allow you to get the tossups from whichever round in whichever tournament you desire.
5. The newest feature is a question reader. It is very similar to the one on the UMD site a while ago. However, it will allow you to filter the questions into whatever category you want. Here's how you use it: click Generate to load a question (after you select your filters). Type in "buzz" and press enter in the lower box to buzz in, and then type in the answer and press enter (you will have 10 seconds.) As of now, the way the program knows you typed in the answer is that it checks whether the answer string contains the input string (meaning you could type in nothing, press enter, and always get it right.) This is a temporary measure and shouldn't really matter if you're trying to learn. The speed of reading can be manipulated by a slider bar (10 being slowest, 100 being fastest).
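
For readers curious about item 2, here is a minimal sketch of what classifying questions by word frequencies could look like. It is not the site's actual code, which has not been posted. It assumes a multinomial naive Bayes model with add-one smoothing, and the function names and toy training data are made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(labeled_questions):
    """Count word frequencies per category from (category, text) pairs."""
    word_counts = defaultdict(Counter)
    question_counts = Counter()
    for category, text in labeled_questions:
        word_counts[category].update(tokenize(text))
        question_counts[category] += 1
    return word_counts, question_counts


def classify(text, word_counts, question_counts):
    """Return the category with the highest naive Bayes log-score."""
    vocabulary = {w for counts in word_counts.values() for w in counts}
    total_questions = sum(question_counts.values())
    best_category, best_score = None, -math.inf
    for category, counts in word_counts.items():
        total_words = sum(counts.values())
        # Log prior: how common this category is among the labeled questions.
        score = math.log(question_counts[category] / total_questions)
        for word in tokenize(text):
            # Add-one smoothing so unseen words do not zero out a category.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical toy training data; the real database has thousands of tossups.
training = [
    ("Literature", "this author wrote a novel about a dentist on Polk Street"),
    ("Literature", "name this poet and author of a novel in verse"),
    ("Science", "this process uses an enzyme to copy the genome"),
    ("Science", "name this protein that binds the lagging strand"),
]
word_counts, question_counts = train(training)
print(classify("name this author of a novel", word_counts, question_counts))
print(classify("this enzyme copies the genome", word_counts, question_counts))
```

In a model like this, every corrected category changes the counts used for later packets, which is why fixing existing errors should improve future categorization.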

Please post feedback or email me at [email protected]

Thanks!

Re: Torrey Pines Quizbowl Database

Posted: Mon Apr 18, 2011 10:07 pm
by i never see pigeons in wheeling
You. Are. Amazing. That is all.
I tried it early on in its development, and I knew that this would become my new default packet database when it was ready. And now that it is ready, it's exceeded my expectations.

Re: Torrey Pines Quizbowl Database

Posted: Mon Apr 18, 2011 10:14 pm
by Windmill Tump
Yeah, this is...ridiculous.
I've spent only a few minutes on this, and I'm already amazed. Great job!

Re: Torrey Pines Quizbowl Database

Posted: Mon Apr 18, 2011 10:37 pm
by davud363000
drno wrote:You. Are. Amazing. That is all.

Re: Torrey Pines Quizbowl Database

Posted: Mon Apr 18, 2011 10:37 pm
by Masked Canadian History Bandit
This is really useful. Very good UI on the question reader (e.g. no need to remove certain categories one by one to only play one category).

Re: Torrey Pines Quizbowl Database

Posted: Mon Apr 18, 2011 10:40 pm
by Down and out in Quintana Roo
Holy crap dude. This is pretty wild stuff. Great work.

Re: Torrey Pines Quizbowl Database

Posted: Mon Apr 18, 2011 10:48 pm
by Harpie's Feather Duster
Seriously, this is totally awesome. I'm excited to just skim through this a little bit later.

Re: Torrey Pines Quizbowl Database

Posted: Mon Apr 18, 2011 10:52 pm
by PennySalem
Dang... this is like ACFDB on a new level.
Now I have something "simple" to give people who want to become better quickly :)

Edit: Crap! Now people will be good at Geography!

Re: Torrey Pines Quizbowl Database

Posted: Mon Apr 18, 2011 10:57 pm
by AKKOLADE
Stickying this, because heck yeah~

Re: Torrey Pines Quizbowl Database

Posted: Mon Apr 18, 2011 11:00 pm
by AKKOLADE
Any chance a checkbox could be added to the "read to me!" option to hide the category? If not, no big deal!

Re: Torrey Pines Quizbowl Database

Posted: Mon Apr 18, 2011 11:13 pm
by AKKOLADE
ALSO could it be done so that you can play straight through a packet and/or play bonuses?

Re: Torrey Pines Quizbowl Database

Posted: Mon Apr 18, 2011 11:20 pm
by sharadmv
Fred wrote:ALSO could it be done so that you can play straight through a packet and/or play bonuses?
Bonuses are going to be the next big thing I'm going to work on, but that won't probably come for a while.
The checkbox thing and playing through a packet I can get to sooner.

Just a quick note:

There's an option on the website to search for a question through the URL, so in Chrome, you can set up a custom search engine by going to Settings->Options->Basics->Search->Manage->Add and putting in "http://quizbowl.tpclubs.com/?page=db&search=%s" for the URL. Or, you can search for things manually by putting in http://quizbowl.tpclubs.com/?page=db&search=dickens or whatever else you want.
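
If you would rather build these search URLs in a script, a minimal sketch follows. The `page` and `search` parameters come from the URLs above; the helper name `search_url` is made up, and it is an assumption that the site treats "+" in the query string as a space, as most web servers do.

```python
from urllib.parse import urlencode

BASE_URL = "http://quizbowl.tpclubs.com/"


def search_url(query):
    """Build a database search URL for the given query string."""
    # urlencode escapes spaces and punctuation so multi-word queries are safe.
    return BASE_URL + "?" + urlencode({"page": "db", "search": query})


print(search_url("dickens"))
print(search_url("great expectations"))
```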

Re: Torrey Pines Quizbowl Database

Posted: Mon Apr 18, 2011 11:59 pm
by AlphaQuizBowler
This is awesome stuff. Maybe I'm not doing this right, but it seems like I can only do one-word searches. Is it possible to search using multiple words for things like a book title?

Re: Torrey Pines Quizbowl Database

Posted: Tue Apr 19, 2011 12:22 am
by sharadmv
Multiple words should definitely be working. Can you tell me what's happening?

Re: Torrey Pines Quizbowl Database

Posted: Tue Apr 19, 2011 12:31 am
by Rufous-capped Thornbill
This is great, but it seems that every single answer is being ruled incorrect?

Re: Torrey Pines Quizbowl Database

Posted: Tue Apr 19, 2011 12:36 am
by Wackford Squeers
This is great, but it would be cool if some of the features of UMD's quizbowl tester could be implemented, like accepting alternate answers and, if not tracking where specific people buzzed, marking the average spot where people do so. Also, having to type buzz instead of just hitting the space bar is sort of weird but not a huge deal. The tabbed interface is very smooth, and I'm incredibly impressed by the work put into this site. Bravo.

Re: Torrey Pines Quizbowl Database

Posted: Tue Apr 19, 2011 1:13 am
by sharadmv
Many answers are being ruled incorrect because, right now, answer checking is based solely on whether the input string is contained in the original tossup answer string. This causes one-character differences to be ruled incorrect and will also cause prompts to be accepted. For now, until I make improvements, imagine that if you were supposed to get it correct, you got it correct, and if you weren't, you didn't. Think of it as a learning experience, rather than a competitive one, for now.
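
To make that behavior concrete, here is a small sketch of the containment check described above, next to one possible stricter variant. Both functions are illustrative, not the site's code (the site is written in Java and PHP). The case-sensitive comparison in the first function and everything about the second are assumptions.

```python
def is_correct_substring(user_input, answer_line):
    """The check described above: is the input contained in the answer line?"""
    return user_input in answer_line


def is_correct_stricter(user_input, answer_line):
    """Hypothetical tweak: ignore case and outer whitespace, reject empty input."""
    cleaned = user_input.strip().lower()
    return bool(cleaned) and cleaned in answer_line.lower()


answer = "Samuel Jones Tilden"
print(is_correct_substring("", answer))         # True: empty input always matches
print(is_correct_substring("Tilden", answer))   # True
print(is_correct_substring("Jones", answer))    # True: a partial answer is accepted
print(is_correct_substring("Tildon", answer))   # False: one character off
print(is_correct_stricter("", answer))          # False
print(is_correct_stricter(" tilden ", answer))  # True
```

Neither version handles alternate answers or near-miss spellings; those would need something beyond substring matching.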

I'll add alternate answers and tracking where people buzz to my to-do list.

Thanks for all the feedback!

Re: Torrey Pines Quizbowl Database

Posted: Tue Apr 19, 2011 11:51 am
by Skepticism and Animal Feed
This doesn't seem to work from my work computer (IE on a Windows machine); I guess I'll have to try it at home.

Re: Torrey Pines Quizbowl Database

Posted: Tue Apr 19, 2011 11:57 am
by JamesIV
This is a remarkable achievement, I am absolutely flabbergasted. Quality, total quality...

Re: Torrey Pines Quizbowl Database

Posted: Tue Apr 19, 2011 12:35 pm
by Broad-tailed Grassbird
Looks really good. It is my new time waster.

You may need to work on what is categorized as trash though.

Trash

This son of the “Oracle of New Lebanon” was turned out of one of his earliest offices by James Harper and gained his fortune by managing the merger of the Galena & Chicago Union and Northwestern Railroads.  Although he received partial vindication through the Potter Committee, he was hurt by a series of telegrams obtained from Senator Oliver P. Morton that alleged his staff offered bribes for votes; those were the Cipher Dispatches.  He would later fail to directly challenge a body including Joseph Philo Bradley, who was appointed because the independent David Davis had resigned his position on the Supreme Court.  Earlier, his “Figures That Could Not Lie” report, combined with the efforts of William Wickham, helped bring down a certain “Grand Sachem.”  For 10 points, name this governor of New York who busted the Tweed Ring and went on to lose to Rutherford B. Hayes in 1876.

ANSWER: Samuel Jones Tilden

2010 ACF Nationals - Editor's Round (PO 4).docx (Question #20)

Re: Torrey Pines Quizbowl Database

Posted: Tue Apr 19, 2011 12:36 pm
by Marble-faced Bristle Tyrant
Yeah, I saw a Joseph McCarthy one that was labeled trash, too.

Re: Torrey Pines Quizbowl Database

Posted: Tue Apr 19, 2011 1:53 pm
by Rompimientos del Centauro
Personal favorites are "Great Rift Valley" under Literature and "George Carlin" under Fine Arts.
Bookmarked - this is pretty fantastic.

Re: Torrey Pines Quizbowl Database

Posted: Tue Apr 19, 2011 2:54 pm
by AKKOLADE
Acts of the Apostles in the 2010 PACE NSC as trash.

Would it be helpful if you get emails with corrections for things like this? Would you like to wait on it?

Re: Torrey Pines Quizbowl Database

Posted: Tue Apr 19, 2011 3:45 pm
by ryandillon
Fred wrote:Acts of the Apostles in the 2010 PACE NSC as trash.

Would it be helpful if you get emails with corrections for things like this? Would you like to wait on it?
If things like this are coming up repeatedly and the Torrey Pines people are eager to fix them, would it be possible to have a centralized thread or something where the problems could be easily navigated rather than a bunch of emails?

Re: Torrey Pines Quizbowl Database

Posted: Tue Apr 19, 2011 3:47 pm
by sharadmv
The interface allows you to fix the category if it's incorrect. In every tab but the Read To Me! tab, you can manipulate the result by clicking on the category (which is underlined and in curly brackets) and a dropdown will appear, and selecting a new category will change it. Further uploaded packets will be categorized based on questions in the database, so the more accurate the database is, the more accurate future categorization will be.

If you notice an error while using the Read To Me! section, there is no straightforward way to fix the tossup category. What you can do is search for questions with that answer, and then look for the question, and then change it in the Search tab. If you don't want to do this, it's fine, and I'll be working to add in a feature that allows you to change the category in the Read To Me! tab too.

And yes, trash seems to be the most inconsistent, probably because trash questions are the least normal.

Re: Torrey Pines Quizbowl Database

Posted: Tue Apr 19, 2011 6:34 pm
by Edward Elric
Morraine Man wrote:This doesn't seem to work from my work computer (IE on a Windows machine); I guess I'll have to try it at home.
I had similar issues also, but that may just be my computer.

Re: Torrey Pines Quizbowl Database

Posted: Tue Apr 19, 2011 7:35 pm
by cvdwightw
I checked this out and thought it to be very well-done. Two ideas for improvement:

1. A category of "miscellaneous academic" should be added for things that don't easily fit into one of the other areas. The classifier would have to judge this based on something like "doesn't match any other category with cutoff-level confidence." Also, "Current Events" could be its own category, so people aren't getting 2008 current events when they look at history questions from 2008.

2. Stratification of difficulty levels. A random 20-tossup sample of college-level science questions produced everything from Delta Burke questions on polymers and Alan Turing to a Gaddis II tossup on Andreev reflection. Similarly, a random 20-tossup sample of high school-level literature tossups contained both an FNT tossup on Thomas Hardy and an NSC tossup on The Good Soldier. My recommendation would be to divide this into seven levels: middle school, high school novice (e.g. FNT), high school regular (e.g. HSAPQ), high school hard (e.g. NSC), and the corresponding novice-regular-hard at the college level (e.g., respectively, ACF Fall, Regionals, and Nationals).
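
The "cutoff-level confidence" rule from point 1 could be sketched roughly as below. This assumes the classifier can produce a confidence between 0 and 1 for each category, which may not match how the site's classifier works. The function name, the 0.5 cutoff, and the sample scores are all made up for illustration.

```python
def pick_category(confidences, cutoff=0.5, fallback="Miscellaneous Academic"):
    """Return the top category, or the fallback if nothing reaches the cutoff."""
    best = max(confidences, key=confidences.get)
    return best if confidences[best] >= cutoff else fallback


print(pick_category({"History": 0.81, "Trash": 0.12, "Literature": 0.07}))
print(pick_category({"History": 0.35, "Trash": 0.33, "Literature": 0.32}))
```

The first call has a clear winner and keeps it; the second has no category at or above the cutoff, so it falls back to the catch-all.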

I'm also a bit curious as to why the RMP category was divided up into secondary categories, while nothing else was (e.g. Fine Arts into Auditory and Visual Arts, Science into Biology, Chemistry, Physics, Math, and Other Science), but I recognize that doing that might make the category classification even more difficult and inaccurate than it is now.

EDIT: since people are posting their favorite mis-classifications, this literature tossup on hydrogen bonding is pretty cool.

Re: Torrey Pines Quizbowl Database

Posted: Tue Apr 19, 2011 9:09 pm
by Lightly Seared on the Reality Grill
This definitely has potential. The first question I generated was a 2009 CO Lit question that seemed cut off. Sure enough, only the part after the (*) power mark was displayed. The punctuation probably threw the parser off.

Re: Torrey Pines Quizbowl Database

Posted: Wed Apr 20, 2011 3:37 pm
by grapesmoker
What's the backend and how are you doing your parsing?

Re: Torrey Pines Quizbowl Database

Posted: Wed Apr 20, 2011 9:35 pm
by sharadmv
It's a Java parser that extracts questions from .pdf, .doc, .docx, and .rtf packets that are close to ACF packet guidelines and populates a MySQL database, which is queried with PHP.

Also, I just recently fixed a big bug that actually automatically marked you wrong if there were any capital letters in your answer. Sorry about that, you should be getting the answer right more often now.

Re: Torrey Pines Quizbowl Database

Posted: Thu Apr 21, 2011 11:01 am
by grapesmoker
sharadmv wrote:It's a Java parser that extracts questions from .pdf, .doc, .docx, and .rtf packets that are close to ACF packet guidelines and populates a MySQL database, which is queried with PHP.

Also, I just recently fixed a big bug that actually automatically marked you wrong if there were any capital letters in your answer. Sorry about that, you should be getting the answer right more often now.
I like your UI a lot. Are you using jQuery for this or something similar?

Re: Torrey Pines Quizbowl Database

Posted: Thu Apr 21, 2011 12:29 pm
by sharadmv
I'm using GWT for the interface.

Re: Torrey Pines Quizbowl Database

Posted: Thu Apr 21, 2011 5:16 pm
by grapesmoker
sharadmv wrote:I'm using GWT for the interface.
Ah, GWT, my nemesis. I mean, the tool I work with every day. Cool.

Re: Torrey Pines Quizbowl Database

Posted: Thu Apr 21, 2011 6:12 pm
by Kahloon
Wow, this is amazing. Major props.

Re: Torrey Pines Quizbowl Database

Posted: Fri Apr 22, 2011 9:44 am
by MahoningQuizBowler
I noticed a new tab today - Score Tracker - and created an account. What features will it have? I know people had mentioned tracking buzz location and similar things...

Re: Torrey Pines Quizbowl Database

Posted: Fri Apr 22, 2011 10:52 am
by Papa's in the House
I just used this for the first time to look for some myth to quiz myself on. I got the following tossups labeled as "myth":

2010 Penn Bowl - Illinois A.doc (Question #7) [College]

{Mythology}
Geminin sequesters a protein responsible for initiating this process. SSB proteins are responsible for holding the substrate in this process, which in prokaryotes is terminated by Tus proteins binding to Ter sites. In Eukaryotes it's initiated by the recruitment of MCM proteins following the recruitment and phosphorylation of cdc6, which in turn leads to the ORC complex falling apart. The Klenow fragment is a cleaved version of an enzyme used by E. Coli to undergo this process, which in Eukaryotes uses Pol alpha, delta, and epsilon. The nature of this process was determined in an experiment that grew E. Coli on N-15 media, and in eukaryotes it requires the formation of Okazaki fragments on the lagging strand. For 10 points, name this process that creates a namesake fork, in which an organism's genome is copied.

2008 Chicago Open Literature - Packet_8.doc (Question #2) [College]

{Mythology}
This character courts his wife by taking her family to see the Lamont Sisters and the “Society Contralto” at the theater, as well as accompanying them to Schuetzen Park, where August's boat sinks in the pond. He dreams of owning a small house, but is deterred when he's tricked into paying a month's rent on a house with a foot of standing water in the basement. Able to play six mournful airs on the concertina, he teams up with Cribbens after escaping a mine, but becomes easy to track because he carries a canary in a cage from his Polk Street office. This character's wife becomes obsessed with gold coins after winning five thousand dollars in a lottery, which causes the jealousy of Marcus Schouler, to whom he ends up handcuffed in Death Valley. For 10 points, name this dentist created by Frank Norris.

The second time I generated myth tossups, it worked. Good job on the database.

Re: Torrey Pines Quizbowl Database

Posted: Fri Apr 22, 2011 1:10 pm
by sharadmv
MahoningQuizBowler wrote:I noticed a new tab today - Score Tracker - and created an account. What features will it have? I know people had mentioned tracking buzz location and similar things...
Whoops, didn't mean to upload that yet. It's not working, but I'm working on it.
I plan to add a system of logging in, tracking where you buzz, and a way of comparing your buzz to other people's buzzes.

Re: Torrey Pines Quizbowl Database

Posted: Fri Apr 22, 2011 1:21 pm
by Sniper, No Sniping!
sharadmv wrote:
MahoningQuizBowler wrote:I noticed a new tab today - Score Tracker - and created an account. What features will it have? I know people had mentioned tracking buzz location and similar things...
Whoops, didn't mean to upload that yet. It's not working, but I'm working on it.
I plan to add a system of logging in, tracking where you buzz, and a way of comparing your buzz to other people's buzzes.
The difference I notice between what you made (which I think is awesome and thank you so much for making it, I love this resource) and what UMD made is that when I want to type in an answer early, it doesn't pause the question; it just keeps reading.

Re: Torrey Pines Quizbowl Database

Posted: Fri Apr 22, 2011 1:46 pm
by at your pleasure
CavsFan2k10 wrote:
sharadmv wrote:
MahoningQuizBowler wrote:I noticed a new tab today - Score Tracker - and created an account. What features will it have? I know people had mentioned tracking buzz location and similar things...
Whoops, didn't mean to upload that yet. It's not working, but I'm working on it.
I plan to add a system of logging in, tracking where you buzz, and a way of comparing your buzz to other people's buzzes.
The difference I notice between what you made (which I think is awesome and thank you so much for making it, I love this resource) and what UMD made is that when I want to type in an answer early, it doesn't pause the question; it just keeps reading.
Typing in "buzz" should pause it, and then you can type the answer in.

Re: Torrey Pines Quizbowl Database

Posted: Sat Apr 23, 2011 9:05 pm
by Coldblueberry
Yay Sharad.

Did someone mention the fact that the "All" box remains checked even when you select a specific option? It's just a small detail.

Re: Torrey Pines Quizbowl Database

Posted: Sat Apr 23, 2011 11:48 pm
by Sniper, No Sniping!
Coldblueberry wrote:Yay Sharad.

Did someone mention the fact that the "All" box remains checked even when you select a specific option? It's just a small detail.
De-check it.

Re: Torrey Pines Quizbowl Database

Posted: Sat Apr 23, 2011 11:54 pm
by AKKOLADE
I believe his point is that it'd be reasonable to expect that box to automatically uncheck itself when you check another box.

Re: Torrey Pines Quizbowl Database

Posted: Sun Apr 24, 2011 10:50 am
by ryandillon
sharadmv wrote:
MahoningQuizBowler wrote:I noticed a new tab today - Score Tracker - and created an account. What features will it have? I know people had mentioned tracking buzz location and similar things...
Whoops, didn't mean to upload that yet. It's not working, but I'm working on it.
I plan to add a system of logging in, tracking where you buzz, and a way of comparing your buzz to other people's buzzes.
If this came about, would there be a way to link up with someone else and play against them?

Re: Torrey Pines Quizbowl Database

Posted: Sun Apr 24, 2011 10:44 pm
by LucasBrown
1.
sharadmv wrote:The interface allows you to fix the category if it's incorrect. In every tab but the Read To Me! tab, you can manipulate the result by clicking on the category (which is underlined and in curly brackets) and a dropdown will appear, and selecting a new category will change it.
It might be helpful to put a paragraph like this on the webpage.

2. You mentioned that the parser is in Java. Would it be possible to obtain the source code of that parser?

3. This would be even more awesome if "Read To Me!" had a speech option--generate an MP3 file from the text of the question, send it to the user's computer, and have it be played from within the browser.

Re: Torrey Pines Quizbowl Database

Posted: Mon Apr 25, 2011 10:46 am
by Broad-tailed Grassbird
LucasBrown wrote:1.
sharadmv wrote:The interface allows you to fix the category if it's incorrect. In every tab but the Read To Me! tab, you can manipulate the result by clicking on the category (which is underlined and in curly brackets) and a dropdown will appear, and selecting a new category will change it.
It might be helpful to put a paragraph like this on the webpage.

2. You mentioned that the parser is in Java. Would it be possible to obtain the source code of that parser?

3. This would be even more awesome if "Read To Me!" had a speech option--generate an MP3 file from the text of the question, send it to the user's computer, and have it be played from within the browser.
you gonna read them all?

Re: Torrey Pines Quizbowl Database

Posted: Mon Apr 25, 2011 4:39 pm
by LucasBrown
nalin wrote:
LucasBrown wrote:3. This would be even more awesome if "Read To Me!" had a speech option--generate an MP3 file from the text of the question, send it to the user's computer, and have it be played from within the browser.
you gonna read them all?
No--text-to-speech converters (even the free ones) are getting quite good these days.
Of course, it would be preferable to have the tossups read by an actual human, but I don't think anybody's going to sit down and read tossups for a total of eight days (24,000 tossups at 30 sec/tossup).
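
For anyone checking the estimate, the arithmetic works out as below, using the 24,000 tossups and 30 seconds per tossup quoted above and assuming nonstop reading with no breaks.

```python
TOSSUPS = 24_000
SECONDS_PER_TOSSUP = 30

total_seconds = TOSSUPS * SECONDS_PER_TOSSUP  # 720,000 seconds
total_hours = total_seconds / 3600            # 200 hours
total_days = total_hours / 24                 # a little over 8 days

print(total_seconds, total_hours, round(total_days, 1))
```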

Re: Torrey Pines Quizbowl Database

Posted: Mon Apr 25, 2011 6:53 pm
by alkrav112
LucasBrown wrote:
nalin wrote:
LucasBrown wrote:3. This would be even more awesome if "Read To Me!" had a speech option--generate an MP3 file from the text of the question, send it to the user's computer, and have it be played from within the browser.
you gonna read them all?
No--text-to-speech converters (even the free ones) are getting quite good these days.
Of course, it would be preferable to have the tossups read by an actual human, but I don't think anybody's going to sit down and read tossups for a total of eight days (24,000 tossups at 30 sec/tossup).
However, this seems like a task that, when divvied up between the wealth of people who are willing and able to competently read, say, 20 tossups and send the resultant audio files to the appropriate parties, would become manageable. As long as tossups were assigned and tracked assiduously so as to prevent two people from doing the same work, I would think that such a database could be assembled relatively quickly. Of course, there would be challenges associated with the sheer size of such a collection of audio files, in addition to challenges in organizing said files and turning them into something people can "play quizbowl" with (the latter concern being ancillary, I suppose, if the purpose of having the files is simply obviating the need for a physical moderator). Also, readers would have to be comfortable with their voices being used for such a public enterprise.

This might be a topic for another thread, but is an audio compendium of questions something the quizbowl community would value (and, perhaps more importantly, would value enough to take the necessary steps to achieve it)? I.e., would having such a thing greatly improve the way we study for and play quizbowl over the several excellent databases of written questions we currently have? If so, who has the desire and expertise to coordinate such a project, and what steps are necessary in order to move forward with it?

Re: Torrey Pines Quizbowl Database

Posted: Tue Apr 26, 2011 2:04 am
by Hieronymus
We are getting hosting on the cheap from an alumnus of Torrey Pines. Sticking that many audio files in a database would probably incur his wrath. . . which is not that wrathy, but what-have-you. An audio compendium of questions would be a good idea, though.

Re: Torrey Pines Quizbowl Database

Posted: Tue Apr 26, 2011 3:21 pm
by Skepticism and Animal Feed
Personally I would like a feature that makes it sound like T-Pain is reading the questions to me.

Re: Torrey Pines Quizbowl Database

Posted: Tue Apr 26, 2011 5:07 pm
by alkrav112
Morraine Man wrote:Personally I would like a feature that makes it sound like T-Pain is reading the questions to me.
Rappa Ternt Moderata, featuring the hit single "I'm 'n Luv (Wit a Buzzer)"