Using machines to grade standardized tests is nothing new — we’ve all heard the spiel about making sure we fill in the bubble on our answer sheet completely so the Scantron machine won’t misread it.
But a new Motherboard investigation found that 21 states have quietly started using AI-powered systems to score the essay portions of standardized tests — and the systems are doing a terrible job at it.
The problem with using AIs to grade essays is twofold, according to Motherboard’s investigation.
For one, the systems suffer from the same bias problem as other AIs — if the human graders whose scores are used to train a system consistently rate essays from one demographic of student higher or lower than others, the AI will replicate that bias.
The second problem is that the AIs tend to grade essays on surface metrics such as sentence length, vocabulary, and spelling — even when the writing amounts to smart-looking gibberish.
Not So Much
Motherboard demonstrated the latter problem by feeding the widely deployed E-rater an essay generated by an algorithm designed to spit out nonsensical writing. That drivel essay — the first line of which read “Invention for precincts has not, and presumably never will be undeniable in the extent to which we inspect the reprover” — scored a 4 out of 6.
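To see why surface metrics can be fooled, here is a toy sketch of a scorer that rewards long sentences and a varied vocabulary while ignoring meaning entirely. This is a hypothetical illustration, not the actual E-rater algorithm — the function, weights, and 6-point clamp are all assumptions made for the example.

```python
import re

def surface_score(essay: str) -> float:
    """Toy surface-feature scorer (NOT the real E-rater): rewards longer
    sentences and a richer vocabulary, with no notion of meaning."""
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    words = re.findall(r"[a-zA-Z']+", essay.lower())
    if not sentences or not words:
        return 0.0
    avg_sentence_len = len(words) / len(sentences)  # longer sentences -> higher score
    vocab_richness = len(set(words)) / len(words)   # more distinct words -> higher score
    # Arbitrary weights, clamped to a 0-6 scale loosely mirroring a 6-point rubric
    return round(min(6.0, avg_sentence_len * 0.2 + vocab_richness * 4.0), 1)

# The nonsense line from Motherboard's test essay outscores a plain, coherent passage
gibberish = ("Invention for precincts has not, and presumably never will be "
             "undeniable in the extent to which we inspect the reprover.")
simple = "I like dogs. Dogs are fun. I walk my dog."

print(surface_score(gibberish))  # the long, word-rich nonsense scores higher
print(surface_score(simple))
```

Run on these two inputs, the gibberish sentence beats the coherent one — the same failure mode Motherboard observed, in miniature.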
“Knowledge of conventions is simply one part of a student’s ability to write,” Norbert Elliot, the editor of the Journal of Writing Analytics, told Motherboard, later adding that “there may be a way that a student is particularly keen and insightful, and a human rater is going to value that. Not so with a machine.”
READ MORE: Flawed Algorithms Are Grading Millions of Students’ Essays [Motherboard]
More on AI in education: The Solution to Our Education Crisis Might be AI