The folks behind IELTS recently published a white paper encouraging institutions to think carefully about the language tests they accept. The paper seems, in part, like an effort to push back against the use of AI and automated scoring in language tests.
“In order to effectively address each element of the socio-cognitive framework and to ensure all aspects of a language skill are elicited, it is vital to move beyond simple multiple-choice or fill-in-the-blank questions to incorporate more diverse tasks that activate the skills and abilities expected of students by higher education institutions.”
Regarding the use of AI and “algorithmic scoring” in language testing, the authors note:
“…unlike algorithmic scoring… the IELTS Speaking test cannot be ‘hacked’ using gaming techniques that can trick mechanical evaluators into mistakenly evaluating speech as high quality when it is not.”
The paper also notes that algorithmic scoring “requires students to generate predictable patterns of speech in response to fixed tasks,” unlike the IELTS speaking section, which “gives the student the best opportunity to be assessed on their communicative proficiency.”
Of writing assessment, the paper notes:
“Given the nature of writing and its importance to learning new knowledge and communicating ideas, there are few shortcuts that can provide the same level of evaluation as an expert trained in writing assessment.”
The paper includes a side-by-side comparison of the two IELTS writing tasks and five “algorithmic scoring” tasks. Oddly, the authors don’t name the test containing those five tasks.
Also included is an infographic about claimed shortcomings of AI-generated reading tasks and a note about the challenge of assessing reading skills “in a truncated period of time.”
The paper has some stuff about listening too, but I think you get the point. Beyond singing the praises of IELTS, it really seems like BC and IDP are pushing back against recent trends in the testing industry – and against their competitors.
The closing remarks (which are highly recommended reading) include this:
“While there may be assessments on the market that promise quicker results, more entertaining formats, or easier pathways, the question institutions and students alike must ask is: at what cost?”
There are also a few words about “inherent duty.”
I’m not informed enough to know whether the above criticisms are valid, but it is good when testing companies justify their existence and their products. It is also good for tests to be quite different from one another. The last thing we need is a blob of samey tests used for all possible purposes.
Will this make a difference? Well, I haven’t seen any evidence that institutions actually read this sort of thing. University leaders seem to pay scant attention to the details of the tests they accept – it’s hard enough to get them to adjust scores to match new concordance tables, or to stop “accepting” tests that ceased to exist years ago. But things could change.