Some new question database
It can be found here.
I had no plans for the summer and I thought I'd learn some computer programming, so I decided to take on the idea of a question database. I know it's been done before, but I wanted to address three particular issues that intrigued me:
Full Unicode support*: This means that any and all diacritical or special characters that were originally in the packet are preserved. This might not seem like a big deal, but I've always appreciated these embellishments. What's more, you don't need to specify diacritics in your search; just typing in "bela bartok" will catch instances of "béla bartók" as well.
A large source of questions: For college tournaments, this means ACF and many mACF tournaments. Not all of them, but at least the ones that were generally well-received. In addition, there are plenty of freely available high school tournaments, so I can't see why they shouldn't be a part of this as well. As of now, PACE and other house-written sets are up. Trash tournaments are part of this too, since there are plenty of well-written and interesting trash questions out there. Currently, only tournaments from 2008 have been added to the database, though I plan to add older tournaments too.
Very flexible search options: By default, searches will only look through collegiate, academic tournaments, as I feel this will be the most popular source of questions people would want to look through. However, this can be easily changed under "Search Options", as well as looking at only answer choices vs. clues, tossups vs. bonuses, or limiting results by year. Under "Subject Options", you can choose which subjects to include and whether to have them included with an "and" or an "or". It's the difference between retrieving questions tagged with "American Literature" and "Poetry", or questions tagged with "American Literature" or "Poetry". To retrieve all questions within a certain subject, select the appropriate checkboxes and leave the search field empty. If no subjects are selected, all subjects will be searched; this combined with an empty search field, however, will not retrieve all the questions in the database.
Subject tagging in particular has been made extremely accessible. Anyone can categorize a question as they see fit. Moreover, if a question is miscategorized, it can be fixed instantly, instead of having the webmaster see to it. I'm leaving this feature of the site open, for now. Hopefully there won't be any abuse of the system (e.g. evolution tossups being categorized as religion).
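A minimal sketch of the "and" versus "or" subject filter described above. This is not the site's actual code: the function name `matches_subjects` and the record layout are assumptions made up for illustration.

```python
def matches_subjects(question_tags, selected, mode="or"):
    """Return True if a question's tags satisfy the selected subjects."""
    tags = set(question_tags)
    chosen = set(selected)
    if not chosen:
        # No subjects selected: every subject is searched.
        return True
    if mode == "and":
        # Every selected subject must be present on the question.
        return chosen <= tags
    # "or": at least one selected subject must be present.
    return bool(chosen & tags)


# Hypothetical records, only to show the difference between the two modes.
questions = [
    {"id": 1, "tags": ["American Literature", "Poetry"]},
    {"id": 2, "tags": ["American Literature"]},
    {"id": 3, "tags": ["Poetry"]},
    {"id": 4, "tags": ["Physics"]},
]
selected = ["American Literature", "Poetry"]

and_ids = [q["id"] for q in questions if matches_subjects(q["tags"], selected, "and")]
or_ids = [q["id"] for q in questions if matches_subjects(q["tags"], selected, "or")]
print(and_ids)  # [1]
print(or_ids)   # [1, 2, 3]
```

With "and", only the question carrying both tags comes back; with "or", any question carrying either tag does.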
There's a bit more to the site, but I'd rather you try it out. Let me know what you guys think of it.
* Technically untrue, as MySQL doesn't support all Unicode tables, but as long as quiz bowlers don't start writing Amaterasu in hiragana, the current scheme should cover pretty much everything that comes up.
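For the curious, diacritic-insensitive matching of the kind described above can be sketched in a few lines of Python. This is only an illustration under the assumption that folding happens in application code; a database collation can do the same job on the server side, and nothing here is taken from the site itself. The helper names `fold` and `matches` are made up.

```python
import unicodedata


def fold(text):
    """Strip combining marks and case so 'Béla Bartók' compares equal to 'bela bartok'."""
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep everything else.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def matches(query, stored):
    """True if the folded query occurs inside the folded stored text."""
    return fold(query) in fold(stored)


print(fold("Béla Bartók"))                                # bela bartok
print(matches("bela bartok", "a quartet by Béla Bartók"))  # True
```

The stored text keeps its original characters; only the comparison is done on the folded form.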
Arnav // Stanford University
Re: Some new question database
I loooooove the title!
Gautam - ACF
Currently tending to the 'quizbowl hobo' persuasion.
- Frater Taciturnus
- Auron
- Posts: 2463
- Joined: Mon Dec 12, 2005 1:26 pm
- Location: Richmond, VA
Re: Some new question database
Eärendil wrote: * [Technically untrue, as MySQL doesn't support all Unicode tables, but as long as quiz bowlers don't start writing Amaterasu in hiragana, the current scheme should cover pretty much everything that comes up.
Damn
Janet Berry
[email protected]
she/they
--------------
J. Sargeant Reynolds CC 2008, 2009, 2014
Virginia Commonwealth 2010, 2011, 2012, 2013,
Douglas Freeman 2005, 2006, 2007
- Maxwell Sniffingwell
- Auron
- Posts: 2164
- Joined: Sun Feb 12, 2006 3:22 pm
- Location: Des Moines, IA
Re: Some new question database
Eärendil wrote: * [Technically untrue, as MySQL doesn't support all Unicode tables, but as long as quiz bowlers don't start writing Amaterasu in hiragana, the current scheme should cover pretty much everything that comes up.
http://www.hsquizbowl.org/acf/formatting-sample.doc wrote: ANSWER: Six Records of a Floating Life [accept 大中华文库—浮生六记]
I've always wondered about that one.
Greg Peterson
Northwestern University '18
Lawrence University '11
Maine South HS '07
"a decent player" - Mike Cheyne
- grapesmoker
- Sin
- Posts: 6345
- Joined: Sat Oct 25, 2003 5:23 pm
- Location: NYC
- Contact:
Re: Some new question database
How are you doing your importing?
Jerry Vinokurov
ex-LJHS, ex-Berkeley, ex-Brown, sorta-ex-CMU
presently: John Jay College Economics
code ape, loud voice, general nuissance
Re: Some new question database
I first convert everything (docs, PDFs, RTFs, etc.) to a UTF-8 text file. Then I clean up some commonly-found patterns with regular expression matching. At this point, I look over the text file and manually make edits to clean up the questions, so they're easier to detect as either a tossup or a bonus. Then, regular expressions match tossups, categorize them as tossups, and insert them into the database. Repeat, except with bonuses.
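A rough sketch of the regex-matching step for tossups, assuming the text file has already been cleaned so that each tossup is one numbered line followed by an `ANSWER:` line. The pattern `TOSSUP_RE`, the function `parse_tossups`, and the two sample questions are made up for illustration; the real import scripts are not shown in this thread.

```python
import re

# One numbered line of question text, then one "ANSWER:" line.
TOSSUP_RE = re.compile(
    r"^(?P<number>\d+)\.[ \t]+(?P<text>[^\n]+)\n"
    r"ANSWER:[ \t]*(?P<answer>[^\n]+)$",
    re.MULTILINE,
)


def parse_tossups(cleaned_text):
    """Return a list of dicts, one per tossup found in the cleaned text."""
    return [
        {
            "number": int(m.group("number")),
            "text": m.group("text").strip(),
            "answer": m.group("answer").strip(),
            "type": "tossup",
        }
        for m in TOSSUP_RE.finditer(cleaned_text)
    ]


# Made-up sample input in the assumed cleaned format.
sample = (
    "1. This composer wrote Mikrokosmos. For 10 points, name this Hungarian composer.\n"
    "ANSWER: Béla Bartók\n"
    "2. This tariff was passed in 1909. For 10 points, name it.\n"
    "ANSWER: Payne-Aldrich Tariff\n"
)

tossups = parse_tossups(sample)
print([t["answer"] for t in tossups])  # ['Béla Bartók', 'Payne-Aldrich Tariff']
```

Each dict could then be inserted into the database as a row; that step is left out because the schema is not described here.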
My biggest complaint so far has been inconsistent bonus stylings. For what it's worth, I think that the point value of every part should be at the beginning of the clue, enclosed in [ ], as in the following example:
23. He was the first Congressman to serve for over 40 years, but his reputation was diminished by his pugnacious manner. For the stated points:
[10] For 10, identify this ‘explosive’ speaker of the House of Representatives, who held the post from 1903-1911.
ANSWER: Joseph Gurney Cannon
[10] For 10, Cannon used his power to help pass this 1909 tariff, which may have been the key to President Taft’s undoing in the 1912 election.
ANSWER: Payne-Aldrich Tariff
[5]/[5] For 5 points each, Taft was also hurt by the controversy between these two men: one the chief of the Forest Service, the other a Secretary of the Interior who opened public lands in Alaska for private development.
ANSWER: Gifford Pinchot and Richard Ballinger [either order, 5 points for each correct answer]
I hate having to deal with "A. ... ANSWER: ... B. ... ANSWER: ... C. ... ANSWER: ..." or some other scheme. To the algorithm, these look too much like tossups and screw everything up.
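To show why the bracketed point values help, here is a minimal sketch of matching bonus parts written in that style, including the `[5]/[5]` form. The pattern `PART_RE` and the function `parse_bonus_parts` are assumptions for illustration, and the clue text in the sample is shortened from the example above.

```python
import re

# A part starts with one or more bracketed point values, e.g. "[10]" or "[5]/[5]".
PART_RE = re.compile(
    r"^\[(?P<points>\d+(?:\]/\[\d+)*)\][ \t]+(?P<text>[^\n]+)\n"
    r"ANSWER:[ \t]*(?P<answer>[^\n]+)$",
    re.MULTILINE,
)


def parse_bonus_parts(cleaned_text):
    """Return one dict per bonus part, with its point values as a list of ints."""
    parts = []
    for m in PART_RE.finditer(cleaned_text):
        parts.append(
            {
                "points": [int(p) for p in re.findall(r"\d+", m.group("points"))],
                "text": m.group("text").strip(),
                "answer": m.group("answer").strip(),
            }
        )
    return parts


# Shortened version of the bonus above.
bonus = (
    "[10] For 10, identify this speaker of the House of Representatives.\n"
    "ANSWER: Joseph Gurney Cannon\n"
    "[10] For 10, Cannon used his power to help pass this 1909 tariff.\n"
    "ANSWER: Payne-Aldrich Tariff\n"
    "[5]/[5] For 5 points each, Taft was also hurt by the controversy between these two men.\n"
    "ANSWER: Gifford Pinchot and Richard Ballinger [either order, 5 points for each correct answer]\n"
)

parts = parse_bonus_parts(bonus)
total = sum(sum(p["points"]) for p in parts)
print([p["points"] for p in parts])  # [[10], [10], [5, 5]]
print(total)                         # 30
```

Because every part begins with a bracket rather than a number or letter, the pattern cannot be confused with a numbered tossup.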
Also, I forgot to mention this earlier, but since this might be of interest to high schoolers and trash enthusiasts, could the mods please cross-post this appropriately? Much thanks.
EDIT: I can't spell.
Arnav // Stanford University
-
- Wakka
- Posts: 248
- Joined: Mon Sep 12, 2005 10:49 am
- Contact:
Re: Some new question database
Eärendil wrote: I decided to take on the idea of a question database. I know it's been done before,
This looks cool, and I like that you've stressed the Unicode thing here. But pretty soon we're going to have to declare a moratorium on new question databases!
Carlo Angiuli, Indiana University
Director, Aegis Questions, Inc.
- Auks Ran Ova
- Forums Staff: Chief Administrator
- Posts: 4295
- Joined: Sun Apr 30, 2006 10:28 pm
- Location: Minneapolis
- Contact:
Re: Some new question database
Eärendil wrote: I decided to take on the idea of a question database. I know it's been done before,
leapfrog314 wrote: This looks cool, and I like that you've stressed the Unicode thing here. But pretty soon we're going to have to declare a moratorium on new question databases!
Obviously the best solution is to create some sort of searchable database for them.
Rob Carson
University of Minnesota '11, MCTC '??, BHSU forever
Member, ACF
Member emeritus, PACE
Writer and Editor, NAQT
- grapesmoker
- Sin
- Posts: 6345
- Joined: Sat Oct 25, 2003 5:23 pm
- Location: NYC
- Contact:
Re: Some new question database
I think the issues involved in parsing packets are common to all the various attempts at having a database. Here's my proposal: the CS-minded among us should pool our resources and work on one database. It doesn't have to be mine, but I think it should be similar in design, with other capabilities built in (categories, quality rankings, difficulty, etc.).
Jerry Vinokurov
ex-LJHS, ex-Berkeley, ex-Brown, sorta-ex-CMU
presently: John Jay College Economics
code ape, loud voice, general nuissance
- Skepticism and Animal Feed
- Auron
- Posts: 3238
- Joined: Sat Oct 30, 2004 11:47 pm
- Location: Arlington, VA
Re: Some new question database
I think ACFDB and Gyaankosh are fine; I haven't encountered a single bug while using them. Is the hard part just re-formatting packets so they can be used? If quizbowl players don't have time to do that, could we just hire some company (perhaps in the third world) to do it for us? I'd be willing to donate a portion of my proceeds from RMPfest and other stuff I might write in the future to do it (and I'm sure other writers and/or individuals would too) because I just find the databases so useful.
Bruce
Harvard '10 / UChicago '07 / Roycemore School '04
ACF Member emeritus
My guide to using Wikipedia as a question source
- grapesmoker
- Sin
- Posts: 6345
- Joined: Sat Oct 25, 2003 5:23 pm
- Location: NYC
- Contact:
Re: Some new question database
Victor Eremita wrote: I think ACFDB and Gyaankosh are fine; I haven't encountered a single bug while using them. Is the hard part just re-formatting packets so they can be used? If quizbowl players don't have time to do that, could we just hire some company (perhaps in the third world) to do it for us? I'd be willing to donate a portion of my proceeds from RMPfest and other stuff I might write in the future to do it (and I'm sure other writers and/or individuals would too) because I just find the databases so useful.
Format your packets properly to begin with and we won't have this problem.
Jerry Vinokurov
ex-LJHS, ex-Berkeley, ex-Brown, sorta-ex-CMU
presently: John Jay College Economics
code ape, loud voice, general nuissance
- Skepticism and Animal Feed
- Auron
- Posts: 3238
- Joined: Sat Oct 30, 2004 11:47 pm
- Location: Arlington, VA
Re: Some new question database
Right, so a different solution to the problem would be a time machine to allow Jerry to yell at people who wrote packets before the modern formatting standards were adopted (which was what, last year?).
Bruce
Harvard '10 / UChicago '07 / Roycemore School '04
ACF Member emeritus
My guide to using Wikipedia as a question source
- grapesmoker
- Sin
- Posts: 6345
- Joined: Sat Oct 25, 2003 5:23 pm
- Location: NYC
- Contact:
Re: Some new question database
Victor Eremita wrote: Right, so a different solution to the problem would be a time machine to allow Jerry to yell at people who wrote packets before the modern formatting standards were adopted (which was what, last year?).
Bruce, do you have anything to contribute to this discussion or are you just being contrarian?
Jerry Vinokurov
ex-LJHS, ex-Berkeley, ex-Brown, sorta-ex-CMU
presently: John Jay College Economics
code ape, loud voice, general nuissance
- Skepticism and Animal Feed
- Auron
- Posts: 3238
- Joined: Sat Oct 30, 2004 11:47 pm
- Location: Arlington, VA
Re: Some new question database
The question I asked was whether or not the problem with having a comprehensive database is just an issue of having to re-format packets. You responded by telling people to properly format their packets. I pointed out that I was referring to old packets whose authors did not know the current standard.
I suspect that some people here don't actually read my posts, they just say "ah, I've disagreed with this guy in the past, therefore he must be saying something terrible right here."
Bruce
Harvard '10 / UChicago '07 / Roycemore School '04
ACF Member emeritus
My guide to using Wikipedia as a question source
- grapesmoker
- Sin
- Posts: 6345
- Joined: Sat Oct 25, 2003 5:23 pm
- Location: NYC
- Contact:
Re: Some new question database
Victor Eremita wrote: The question I asked was whether or not the problem with having a comprehensive database is just an issue of having to re-format packets. You responded by telling people to properly format their packets. I pointed out that I was referring to old packets whose authors did not know the current standard.
I suspect that some people here don't actually read my posts, they just say "ah, I've disagreed with this guy in the past, therefore he must be saying something terrible right here."
Yes, the issue is formatting the packets. This has been obvious from day one and anyone who's taken a crack at a question database understands this. Old packets will have to be reformatted to current standards if we want them to be entered into a searchable database. However, the notion of outsourcing this job is laughable; instead, what needs to happen is that a bunch of people take on the distributed task of formatting these packets.
Jerry Vinokurov
ex-LJHS, ex-Berkeley, ex-Brown, sorta-ex-CMU
presently: John Jay College Economics
code ape, loud voice, general nuissance
- Skepticism and Animal Feed
- Auron
- Posts: 3238
- Joined: Sat Oct 30, 2004 11:47 pm
- Location: Arlington, VA
Re: Some new question database
Pretty much everyone in quizbowl has an obvious interest in those packets getting formatted and submitted to a database. The original thread from QBDB has dozens of people volunteering to format packets in their spare time. These seem to be just as on track as the Greg Peterson MO packet.
There's some kind of failure here, and I'm wondering what needs to happen before people actually start reformatting packets. Does the community have to settle on a single database? Do people need to be given financial incentives? Etc.
Bruce
Harvard '10 / UChicago '07 / Roycemore School '04
ACF Member emeritus
My guide to using Wikipedia as a question source
- No Rules Westbrook
- Auron
- Posts: 1238
- Joined: Mon Nov 22, 2004 1:04 pm
Re: Some new question database
Victor Eremita wrote: I suspect that some people here don't actually read my posts, they just say "ah, I've disagreed with this guy in the past, therefore he must be saying something terrible right here."
Been there, brother speaks the truth.
But, anyways, comprehensive question databases (searchable or even non-searchable) are just grand. I think they may be the single most important development toward the goal of making new entrants into good college players and writers.
Ryan Westbrook, no affiliation whatsoever.
I am pure energy...and as ancient as the cosmos. Feeble creatures, GO!
Left here since birth...forgotten in the river of time...I've had an eternity to...ponder the meaning of things...and now I have an answer!
- grapesmoker
- Sin
- Posts: 6345
- Joined: Sat Oct 25, 2003 5:23 pm
- Location: NYC
- Contact:
Re: Some new question database
Victor Eremita wrote: Pretty much everyone in quizbowl has an obvious interest in those packets getting formatted and submitted to a database. The original thread from QBDB has dozens of people volunteering to format packets in their spare time. These seem to be just as on track as the Greg Peterson MO packet.
There's some kind of failure here, and I'm wondering what needs to happen before people actually start reformatting packets. Does the community have to settle on a single database? Do people need to be given financial incentives? Etc.
Hey, how about this for consideration:
I'm really fucking busy, all the goddamn time. I just finished writing a quarter of a tournament, and I'm about to start putting together ACF Winter; before that, I was living outside of my home for more than half a year because I had to work on my project, which I am currently in the process of quitting; which means that I have to find a new advisor to work with and I still have a ton of loose ends to wrap up before I finally leave this group in January (by which time I'll be on the hook for ACF Regionals and Penn Bowl packets).
I'm sorry that my peripatetic lifestyle and the myriad of things I have to do in order to live and support myself and make progress towards my degree are getting in the way of working on something that no one is obligated to provide. If I had a couple months off from my work, I could probably do many of these things, but I don't have a couple months. As a result, things that are lower in priority tend to fall by the wayside and things that are higher in priority, like, you know, producing tournaments and stuff get done first. But at least, unlike you, I have taken a non-negligible amount of time and effort to advance a project I would like to see completed; unlike you, I don't come into these threads with rhetorical questions about what can be done and bemoaning the alleged failure of an undertaking towards which I've contributed nothing.
You don't have to be a genius to understand what needs to happen for this to work: someone needs to sit down and parcel out the workload to the various people who are interested in contributing. Arguably I should have done that, but I didn't. If you want to help by taking on this task, then be my guest, but this bullshit about financial incentives (what financial incentives?) and musings about how many databases we need is completely unhelpful.
Jerry Vinokurov
ex-LJHS, ex-Berkeley, ex-Brown, sorta-ex-CMU
presently: John Jay College Economics
code ape, loud voice, general nuissance
- Maxwell Sniffingwell
- Auron
- Posts: 2164
- Joined: Sun Feb 12, 2006 3:22 pm
- Location: Des Moines, IA
Re: Some new question database
Victor Eremita wrote: These seem to be just as on track as the Greg Peterson MO packet.
?
Greg Peterson
Northwestern University '18
Lawrence University '11
Maine South HS '07
"a decent player" - Mike Cheyne
- BuzzerZen
- Auron
- Posts: 1517
- Joined: Thu Nov 18, 2004 11:01 pm
- Location: Arlington, VA/Hampshire College
Re: Some new question database
Victor Eremita wrote: These seem to be just as on track as the Greg Peterson MO packet.
cornfused wrote: ?
See, it's a joke at your expense.
Evan Silberman
Hampshire College 07F
How are you actually reading one of my posts?
- Mechanical Beasts
- Banned Cheater
- Posts: 5673
- Joined: Thu Jun 08, 2006 10:50 pm
Re: Some new question database
Victor Eremita wrote: These seem to be just as on track as the Greg Peterson MO packet.
cornfused wrote: ?
The meaning here is "not."
Jerry, I don't think that Bruce is trying to be inflammatory, and I don't think that he means to insinuate that you, yourself, are personally at fault for not taking control of this operation. He wants to get this thing done, and wants to know what it would take to get people in charge of assigning work and then doing that work. Moreover, he wants to suppose that maybe financial incentives--which obviously haven't been tried or even really considered--could be helpful.
Andrew Watkins
- No Rules Westbrook
- Auron
- Posts: 1238
- Joined: Mon Nov 22, 2004 1:04 pm
Re: Some new question database
Yeah, plus, let's remember that Bruce views all individuals as POLITIES which interact with each other according to Realist norms, so it's only natural that he'd suggest something like financial incentive.
Ryan Westbrook, no affiliation whatsoever.
I am pure energy...and as ancient as the cosmos. Feeble creatures, GO!
Left here since birth...forgotten in the river of time...I've had an eternity to...ponder the meaning of things...and now I have an answer!
- Matt Weiner
- Sin
- Posts: 8148
- Joined: Fri Apr 11, 2003 8:34 pm
- Location: Richmond, VA
Re: Some new question database
Hey, an easy way to simplify the task of reformatting old packets is to write a parser that strips out everything besides text, including paragraph returns and such, and replaces it with a space. Then, the human labor only has to go through and insert returns between blocks of question text, answer lines, the next question, etc. That seems like it would be much easier than asking people to get rid of all the indents and stuff on their own.
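A minimal sketch of that flattening step in Python, assuming the packet has already been converted to plain text. The function name `flatten` is made up, and the sample string is invented to show the effect.

```python
import re


def flatten(raw):
    """Replace non-text characters and all line breaks/indents with single spaces."""
    # Turn anything that is neither printable nor whitespace (stray control
    # characters left over from conversion) into a space.
    cleaned = "".join(ch if ch.isprintable() or ch.isspace() else " " for ch in raw)
    # Collapse every run of whitespace, including tabs and paragraph returns.
    return re.sub(r"\s+", " ", cleaned).strip()


messy = "1.\tThis  author\r\n   wrote...\x0c\nANSWER:  Foo"
print(flatten(messy))  # 1. This author wrote... ANSWER: Foo
```

After this, the only manual work left is inserting returns between question text, answer lines, and the next question.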
Matt Weiner
Advisor to Quizbowl at Virginia Commonwealth University / Founder of hsquizbowl.org
- grapesmoker
- Sin
- Posts: 6345
- Joined: Sat Oct 25, 2003 5:23 pm
- Location: NYC
- Contact:
Re: Some new question database
everyday847 wrote: Jerry, I don't think that Bruce is trying to be inflammatory, and I don't think that he means to insinuate that you, yourself, are personally at fault for not taking control of this operation. He wants to get this thing done, and wants to know what it would take to get people in charge of assigning work and then doing that work. Moreover, he wants to suppose that maybe financial incentives--which obviously haven't been tried or even really considered--could be helpful.
It's standard Bruce operating procedure to suggest unworkable solutions to nonexistent problems and then get defensive when people point out these simple facts. Whatever, this thread is about neither my feelings nor Bruce. I will repeat that the notion of financial incentives is absurd on its face; we can barely get people to format their packets correctly and we already offer financial incentives for that. Not to mention that no one in quizbowl has any money, so what's going to be funding these incentives is unclear.
I'll take a step forward in this: if you are interested in adding to QBDB's content, you can help out by formatting packets. I've set up a spreadsheet on Google Docs to keep track of the task of formatting packets. Basically, the way it works is like this: if you want to help, send me an email at [email protected] with "[qbdb]" in the subject line and I will invite you to collaborate on the spreadsheet. Then, grab a tournament off the Stanford archive, format it according to the ACF guidelines (including renaming the packets with the correct scheme), and send me a zip file with the set. When you decide to work a tournament, enter the information into the columns provided so that no one else tries to do the same thing. When I've uploaded the set, I will note that in the spreadsheet and update the thread.
Last time I tried soliciting help, I had people send me all sorts of packets which were formatted wrong. I hope it doesn't happen again, and I'm glad to post the guidelines again if people need to see them. As I've said before, if you want to see this happen, pitch in. It's too much work for me to handle as one person, but if a bunch of people take an hour out of their week to format a single set, we're going to have this done very fast.
I certainly don't want to undermine anyone else's efforts to create a packet database. I do think that my program is farther along than the others just because I've worked on it longer, and I've gotten most of the bugs out (although I still haven't figured out how to handle Unicode because I'm using XML as an intermediary; suggestions are welcome). Obviously, features like categories and difficulty/quality rankings would be great too, and I do plan to add them, but the infrastructure for a useful database is there and people could make use of it effectively if more data was entered.
Jerry Vinokurov
ex-LJHS, ex-Berkeley, ex-Brown, sorta-ex-CMU
presently: John Jay College Economics
code ape, loud voice, general nuissance
Re: Some new question database
Matt Weiner wrote: Hey, an easy way to simplify the task of reformatting old packets is to write a parser that strips out everything besides text, including paragraph returns and such, and replaces it with a space. Then, the human labor only has to go through and insert returns between blocks of question text, answer lines, the next question, etc. That seems like it would be much easier than asking people to get rid of all the indents and stuff on their own.
This is almost exactly the way my parser works, except that it saves time and inserts line feeds whenever a packet is formatted correctly. The worst are PDFs, which usually come out as a continuous block of text and need to be manually scanned through, but it's still not abominable.
grapesmoker wrote: Obviously, features like [...] difficulty/quality rankings would be great too
I had thought of putting a difficulty/quality ranking system into Gyaankosh, but scrapped it when I realized it probably wouldn't be ready by early September. I'll look into it as a future addition. The other major feature I'd like to put in is preservation of italics, bolds, and underlines, but I don't think I'll have time to introduce anything new until January. That said, people do seem to find this useful, so I'll continue to add older tournaments. On the other hand, as far as usability is concerned, what other features would be useful? Once enough questions have been placed into subjects, we can work out a simple frequency list or concordance-like feature, but I'm interested in hearing what else would be worthwhile.
Whig's Boson wrote: Does the community have to settle on a single database?
I don't think so. I think Carlo's, Jerry's, and my databases have slightly different ideological approaches and different functionalities, which some people may like and others may not. It also helps to have multiple redundancies, in case one of them goes down or something.
grapesmoker wrote: Here's my proposal: the CS-minded among us should pool our resources and work on one database. It doesn't have to be mine, but I think it should be similar in design, with other capabilities built in (categories, quality rankings, difficulty, etc.).
Would people be interested in this? I'd be glad to offer my database as a starting point, but I think we'll run into problems with different programming experiences. Personally, I'm very comfortable with Python, but I have very little knowledge of PHP, which I suspect many people are going to use as their scripting language of choice. On the other hand, I do know that I wouldn't mind having another admin or two to help me run Gyaankosh and import packets, if people are interested.
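As a rough idea of what the simple frequency list could look like once questions carry subject tags, here is a sketch using `collections.Counter`. The function `answer_frequencies`, the record layout, and the sample records are all hypothetical; Gyaankosh's real schema is not described in this thread.

```python
from collections import Counter


def answer_frequencies(questions, subject):
    """Count how often each answer line appears among questions tagged with a subject."""
    counts = Counter(q["answer"] for q in questions if subject in q["subjects"])
    # most_common() sorts from the most frequent answer to the least frequent.
    return counts.most_common()


# Hypothetical records, only to show the shape of the output.
questions = [
    {"answer": "Béla Bartók", "subjects": ["Music"]},
    {"answer": "Payne-Aldrich Tariff", "subjects": ["American History"]},
    {"answer": "Béla Bartók", "subjects": ["Music"]},
    {"answer": "Joseph Gurney Cannon", "subjects": ["American History"]},
    {"answer": "Payne-Aldrich Tariff", "subjects": ["American History"]},
]

history = answer_frequencies(questions, "American History")
print(history)  # [('Payne-Aldrich Tariff', 2), ('Joseph Gurney Cannon', 1)]
```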
Arnav // Stanford University
- Blackboard Monitor Vimes
- Auron
- Posts: 2362
- Joined: Sat Aug 18, 2007 5:40 pm
- Location: Richmond, VA
Re: Some new question database
A database that preserves underlining would be awesome. I've yet to come across one, and I'm frequently curious as to how much of some titles are underlined.
Sam L,
Maggie L. Walker Governor's School 2010 / UVA 2014 / VCU School of Education 2016
PACE
- grapesmoker
- Sin
- Posts: 6345
- Joined: Sat Oct 25, 2003 5:23 pm
- Location: NYC
- Contact:
Re: Some new question database
MLWGS-Gir wrote: A database that preserves underlining would be awesome. I've yet to come across one, and I'm frequently curious as to how much of some titles are underlined.
I'm pretty sure my import scheme preserves underlining. In fact, it certainly does, since you can see it in the answer formatting. It's just that most titles in the question text are either italicized or in quotes.
Jerry Vinokurov
ex-LJHS, ex-Berkeley, ex-Brown, sorta-ex-CMU
presently: John Jay College Economics
code ape, loud voice, general nuissance
- Blackboard Monitor Vimes
- Auron
- Posts: 2362
- Joined: Sat Aug 18, 2007 5:40 pm
- Location: Richmond, VA
Re: Some new question database
MLWGS-Gir wrote: A database that preserves underlining would be awesome. I've yet to come across one, and I'm frequently curious as to how much of some titles are underlined.
grapesmoker wrote: I'm pretty sure my import scheme preserves underlining. In fact, it certainly does, since you can see it in the answer formatting. It's just that most titles in the question text are either italicized or in quotes.
I had forgotten about that, actually, as I haven't used your database since early last summer. Now that school's started and I have no time for anything, I've directed most of my attention to stuff I can download to my laptop for use at practice, as I can't get internet at school, but a quick glance at what we read Friday reveals that that has underlining as well. I think it might actually just have been Carlo's that I was thinking of, actually. My memory notably fails at life.
Sam L,
Maggie L. Walker Governor's School 2010 / UVA 2014 / VCU School of Education 2016
PACE
- AlphaQuizBowler
- Tidus
- Posts: 695
- Joined: Mon Dec 03, 2007 6:31 pm
- Location: Alpharetta, GA
Re: Some new question database
Is there a reason why I can't categorize tossups? The uncategorized tossups are there, and you can click add subject, but when you do they remain in the uncategorized section.
William
Alpharetta High School '11
Harvard '15
Re: Some new question database
That's because the server caches all the uncategorized questions. If you categorize a question, then immediately come back to the uncategorized question page, chances are you'll still be looking at what's in the cache, as opposed to what's actually in the database. The cache should refresh every 5 minutes or so, though, so it'll eventually remove itself from the uncategorized page. This is a small nuisance I'd like to iron out but haven't quite figured out how to do it yet.
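One possible way to iron this out, sketched in Python: keep the 5-minute cache, but drop a question from it the moment it gets categorized. This is only an illustration under assumed names (`UncategorizedCache`, `mark_categorized`, the `db` dict standing in for the real database); it is not how the site currently works.

```python
import time


class UncategorizedCache:
    """Cache of uncategorized question ids with a time-to-live and explicit invalidation."""

    def __init__(self, loader, ttl_seconds=300, clock=time.monotonic):
        self._loader = loader      # callable returning the current uncategorized ids
        self._ttl = ttl_seconds    # 300 seconds matches the "every 5 minutes or so" refresh
        self._clock = clock
        self._ids = None
        self._loaded_at = 0.0

    def get(self):
        """Return cached ids, reloading from the source once the TTL has expired."""
        now = self._clock()
        if self._ids is None or now - self._loaded_at >= self._ttl:
            self._ids = list(self._loader())
            self._loaded_at = now
        return list(self._ids)

    def mark_categorized(self, question_id):
        """Remove a question from the cached list as soon as someone tags it."""
        if self._ids is not None and question_id in self._ids:
            self._ids.remove(question_id)


# Stand-in for the database: question id -> list of subjects.
db = {1: [], 2: [], 3: []}


def load_uncategorized():
    return [qid for qid, subjects in db.items() if not subjects]


cache = UncategorizedCache(load_uncategorized)
before = cache.get()
db[2].append("Poetry")      # a user categorizes question 2
cache.mark_categorized(2)   # the cache is updated in the same request
after = cache.get()
print(before)  # [1, 2, 3]
print(after)   # [1, 3]
```

The periodic refresh still runs as a safety net, so a missed invalidation only lingers until the next reload.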
Arnav // Stanford University