On Monday, someone did the seemingly impossible: dethroned “Jeopardy!” legend James Holzhauer, the professional sports bettor from Vegas whose gargantuan run on the show broke brains and shattered records. The kingslayer? Chicago librarian Emma Boettcher, who wrote her master’s thesis on text mining and “Jeopardy!”
And we found it.
Boettcher’s 2016 paper, titled “Predicting the Difficulty of Trivia Questions Using Text Features,” was her master’s thesis for the School of Information and Library Science at UNC Chapel Hill. On “Jeopardy!”, the show’s writers rate the difficulty of their clues in dollar-value increments: $200, $400, $600, and so on. In her paper, Boettcher set out to see whether a machine-learning model could rate that difficulty automatically. To do so, she evaluated two factors:
1. Readability. By which she means: How easy or difficult is a given “Jeopardy!” clue to understand, or parse? As you know if you’re a sentient human, “Jeopardy!” clues are written as answers, and contestants have to provide the questions. This makes them different from, and more difficult than, clues on normal trivia game shows, where contestants would be asked something like “What’s the greatest emergent technology website on the Internet?” and give a simple answer (“Futurism dot com”). On “Jeopardy!,” they’re given a category (“WEBSITES”) and a corresponding clue (“This Brooklyn-based upstart publication might not cover fascist Italian art movements, but would be a favorite of a DeLorean-driving Doc Brown.”) which they need to respond to with a question (“What is Futurism dot com?”). This makes these clues more sophisticated (and thus better to test against machine learning) than the average trivia question.
2. Information need. Pretty simple: Do you know the answer to the question? On “Jeopardy!,” the higher the dollar value assigned by the show’s writers, ostensibly, the more obscure the answer will be. But one group of people’s idea of obscure knowledge might be different from another’s. For example, a lot of great “Jeopardy!” champs might be able to rattle off the entire Periodic Table of Elements by heart, but when they face a clue about Miley Cyrus, or hit THE DREADED OPERA CATEGORY, they freeze up. Even better, an example from a recent show: A clue about the alter ego of “Disney Channel teenager Miley Stewart” (Answer: “What is Hannah Montana?”) was rated $400, while a clue about the alter ego of Ziggy Stardust (Answer: “Who is David Bowie?”) was rated $2,000. We could argue all day about whether it should be easier to know David Bowie or Hannah Montana, but Boettcher’s paper sets out to let a machine settle that argument for us.
So, Boettcher wrote a text-mining program to evaluate these two factors on a five-point scale, setting out to answer the following research questions:
What parts of a clue’s text make a “Jeopardy!” clue generally difficult?
What parts of a clue’s readability make a “Jeopardy!” clue difficult?
What topics contribute to “Jeopardy!” clue difficulty?
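Before any of those questions can be answered, the dollar values have to become labels a model can predict. As a rough illustration (the paper’s exact mapping isn’t spelled out here, so this is an assumption), the five rows of the board translate naturally onto a five-point scale, with Double Jeopardy! values halved to land on the same rows:

```python
# Hypothetical sketch: turning "Jeopardy!" dollar values into a five-point
# difficulty label for a classifier. The exact mapping Boettcher used is
# not reproduced here -- treat this as an illustration of the idea.

def difficulty_level(value: int, double_jeopardy: bool = False) -> int:
    """Map a clue's dollar value to a 1-5 difficulty rating.

    Regular-round values run $200-$1000; Double Jeopardy! doubles them,
    so we halve those first to land on the same 1-5 scale.
    """
    if double_jeopardy:
        value //= 2
    levels = {200: 1, 400: 2, 600: 3, 800: 4, 1000: 5}
    if value not in levels:
        raise ValueError(f"unexpected clue value: {value}")
    return levels[value]

print(difficulty_level(600))                         # row 3 of the regular round
print(difficulty_level(2000, double_jeopardy=True))  # bottom row of Double Jeopardy!
```

That $2,000 David Bowie clue, for instance, lands at the maximum difficulty of 5, while the $400 Hannah Montana clue lands at 2.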
And how did she do it? This is where it gets really cool:
Using former “Jeopardy!” champion Ken Jennings’ taxonomy of trivia difficulty, she scraped two entire seasons’ worth of “Jeopardy!” clues from the J-Archive (which collects every single clue from every single episode of the show), and filtered out non-standard games (like the high schoolers’ and kids’ “Jeopardy!” tournaments, because those are just the dumbest).
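The filtering step she describes can be sketched in a few lines. The record fields below (“tournament”, “clue”, “value”) are hypothetical, not the J-Archive’s actual markup or Boettcher’s exact filters:

```python
# A minimal sketch of filtering scraped clues down to regular-play games.
# The field names and the list of special games are our own assumptions.

SPECIAL_GAMES = {"Teen Tournament", "Kids Week", "Celebrity Jeopardy!"}

def regular_play_only(clues):
    """Drop clues from non-standard games (teen, kids, celebrity shows)."""
    return [c for c in clues if c.get("tournament") not in SPECIAL_GAMES]

scraped = [
    {"clue": "This Thin White Duke was also Ziggy Stardust", "value": 2000,
     "tournament": None},
    {"clue": "Miley Stewart's pop-star alter ego", "value": 400,
     "tournament": "Kids Week"},
]
print(len(regular_play_only(scraped)))  # 1 -- the Kids Week clue is dropped
```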
She then took that data and rated each clue on length (how many words in the clue), media (whether there was video or audio in the clue, which was stored as metadata), phrasing (how convoluted the wording of the clue was), topic (how obscure the knowledge behind the clue might be), and unigrams (the individual words in the clue). For the phrasing, she mined the data using Python’s NLTK, a suite of language-processing tools. And for the unigrams, she generated features using a piece of text-mining software called LightSide. The unigrams were binned by frequency: whether they appeared in nearly all the clues, some of the clues, or very few of the clues.
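The shape of that per-clue feature extraction looks something like the sketch below. Boettcher used NLTK and LightSide for the real thing; this standard-library-only version, with field names of our own invention, just illustrates the feature vector:

```python
# A rough, stdlib-only sketch of per-clue feature extraction. Boettcher used
# NLTK (phrasing) and LightSide (unigrams); this only shows the general shape.
import re
from collections import Counter

def extract_features(clue_text: str, has_media: bool) -> dict:
    # Crude tokenizer standing in for NLTK's: lowercase words and apostrophes.
    tokens = re.findall(r"[a-z']+", clue_text.lower())
    return {
        "length": len(tokens),        # word count of the clue
        "media": has_media,           # audio/video present (metadata only)
        "unigrams": Counter(tokens),  # single-word counts for the classifier
    }

feats = extract_features(
    "This Brooklyn-based publication would be a favorite of Doc Brown", False)
print(feats["length"])  # 11 -- "Brooklyn-based" splits into two tokens
```

Note that the media feature is just a flag: the program knows a clue *has* audio or video without knowing anything about what the media contains, which matters for the conclusion below.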
And what did she find?
Though unigrams and topic membership were not shown to be significant features for predicting the difficulty of trivia clues, features relating to media, length and phrases all had significant impact on difficulty.
In other words, the more information a “Jeopardy!” clue contains, the easier it tends to be. Which sounds… pretty obvious, right? The more forthcoming a “Jeopardy!” clue is, and the more it gives you to work with, the more likely you are to get it right, no matter the topic or your familiarity with any other element of the clue.
What kind of impact could Boettcher’s research have outside of trivia game shows? Per her conclusion:
This finding may be useful for those studying Tweets or other documents with constricted forms. Similarly, this research has shown that knowing the form of media materials linked to by a text is significant without knowing what the media itself contains, suggesting that for similar projects in text mining, gathering or creating exhaustive descriptions of peripheral media files for similar projects may not be necessary.
Basically, in text mining, it might be useful to forget about the specifics of the data (thus skipping a massive and onerous step) and focus on its structure and form instead. And we should mention: Boettcher isn’t the first “Jeopardy!” contestant to use data mining in preparation for the show. Roger Craig, the guy who held the single-day winnings record for “Jeopardy!” before James Holzhauer came along and obliterated it, also used data mining in grad school to prepare for the show, which he claims helped him win.
As for whether or not her work helped Boettcher win? Per this conversation between Boettcher and host Alex Trebek on her blockbuster episode of the show:
TREBEK: Emma Boettcher! This young lady is a librarian from Chicago who did her master’s paper on our show?
BOETTCHER: That’s right, Alex. I ran a series of text-mining experiments to see if a computer could predict how difficult a clue was, based on things like how long it was, what words were used, what the syntax was, [and] whether it had any audio-visual elements in it.
TREBEK: And what did you discover?
BOETTCHER: That it is very hard to do (laughs).
Just facts: You still gotta be quick on the buzzer, and know the answers. Which all the data mining in the world won’t help you do. Good thing for Emma Boettcher: She didn’t need it. Forget the machines, and notch one for the ingenuity of abstract human endeavor — at least until Watson starts talking shit again.
With additional research/reporting by Dan Robitzski
READ MORE: Predicting the Difficulty of Trivia Questions Using Text Features [UNC.edu]