Check out this wonderful new article in TESOL Quarterly by Yasin Karatay and Jing Xu. It explores the possibility of using an AI-powered spoken dialog system (SDS) to simulate an IELTS examiner. Specifically, the researchers used a self-developed SDS powered by GPT-4o to simulate the third part of the IELTS speaking test.
It is important to note that in this research the AI served only as the interlocutor. Ratings were carried out by trained humans who reviewed recordings of the interactions.
The authors concluded that “the SDS consistently elicited some key interactional competence features, as seen in face-to-face oral proficiency interviews, and that such features were useful in distinguishing between higher- and lower-proficiency test takers.” They also highlighted areas for improvement around non-verbal cues and some unnatural interactions.
There is a lot more to it, of course. So take a moment to read the article.
Though the IELTS partnership takes a “cautiously curious” approach to the use of AI in testing, this may be an area worth exploring in more depth. Some observers have raised concerns about the ability of the partnership to maintain consistent standards across four million annual speaking tests, each carried out by an individual examiner. It is possible that at least some of these administrations are affected by examiner bias, fatigue or differences in skill. It’s no secret that IELTS test takers frequently travel to test centers which they feel are more conducive to a higher speaking score. The IELTS partnership is quick to disabuse test takers of this notion, but it isn’t inconceivable that one examiner might be better at their job than another, or that one might possess certain biases which another does not, and that such differences could affect how the speaking test unfolds.
On top of that, this sort of change could lead to immense cost savings for the organizations that administer the IELTS (even if human raters are retained). Some of those savings could be passed on to test takers, if only to make the IELTS more competitive in an increasingly crowded market of tests. That’s a touchy subject so I’ll leave it for another day, but I think most readers can imagine the possibilities.
Finally, it is worth noting that this research was done by researchers at Cambridge University Press & Assessment, one of the IELTS partners. So maybe this could lead to something.
You may also appreciate this similar research funded by ETS back in 2021. Based on that work, I’m still optimistic that an AI-powered SDS will find its way into some future version of the TOEFL.