At the 2019 TOEFL iBT Seminar in Seoul on September 5, ETS announced details of the new “Enhanced Speaking Scoring” for the TOEFL, which has actually been in place since August 1, 2019.
In the past, speaking responses were graded by two human graders. Now, however, speaking responses are graded by one human grader along with the SpeechRater software. This software is a sort of AI that can evaluate human speech, and has been used by ETS for various tasks since about 2008. Most notably, it provided score estimates for the “TOEFL Practice Online” tests they sell to students.
According to ETS:
“From August 1, 2019, all TOEFL iBT Speaking responses are rated by both a human rater and the SpeechRater scoring engine.”
They also note:
“Human raters evaluate content, meaning, and language in a holistic manner. Automated scoring by the SpeechRater service evaluates linguistic features in an analytic manner.”
To elaborate (and this is not a quote), ETS indicated that the human scorer will check for meaning, content and language use, while the SpeechRater will check pronunciation, accent and intonation.
It is presently unknown how the human and computer scores will be combined into a single overall score, but the speaking rubric provides a few hints. Note that in the past the human raters assessed three categories of equal weight: delivery, language use, and topic development. If the above information is accurate, the SpeechRater now assesses delivery, while the human now assesses language use and topic development. It is possible, then, that the SpeechRater provides 1/3 of the score, and that the human rater provides the other 2/3.
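To make the speculation above concrete, here is a minimal sketch of how such a weighting might work. Everything here is hypothetical: the function name, the assumption that each rubric category is scored on the same scale, and the equal 1/3 weighting are my own illustration, not a confirmed ETS formula.

```python
# Hypothetical combination of SpeechRater and human rubric scores.
# Assumes three equally weighted categories, which would give the
# SpeechRater 1/3 of the total and the human rater 2/3.
# This is speculation; ETS has not published an actual formula.

def combined_speaking_score(speechrater_delivery: float,
                            human_language_use: float,
                            human_topic_development: float) -> float:
    """Average three equally weighted rubric categories."""
    return (speechrater_delivery
            + human_language_use
            + human_topic_development) / 3

# Example: delivery 3.0 (SpeechRater), language use 3.5 and
# topic development 4.0 (human) -> (3.0 + 3.5 + 4.0) / 3 = 3.5
print(combined_speaking_score(3.0, 3.5, 4.0))
```

Under this assumption, a weak SpeechRater delivery score could pull down an otherwise strong human-rated response by at most a third of the total, which is one way to read the "1/3 vs. 2/3" split.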
I will provide more information as I get it. In the meantime, check out the following video for more news and speculation.