The Educational Testing Service (ETS) just published a document called “Reimagining Educational Assessments: AI Innovations for Enhancing Test Taker Experience.”

The document says, seemingly in reference to scoring of the TOEFL iBT, that:

“ETS combines the efficiency of AI with essential human oversight. While AI manages most of the scoring, human raters review a sample of the machine-scored responses.”

That appears to be a departure from how the TOEFL has traditionally been scored.  Traditionally, every response (not just “a sample”) has been graded by both a human rater and AI.  Until now, it has never been accurate to say that AI “manages most of the scoring.”

That said, the phrasing used in the document is somewhat vague.  Maybe I’ve misunderstood it.  Perhaps someone from ETS can confirm what it means.

UPDATE: I have been informed by reliable sources that there has been no change to the scoring process.

This comes a few months after ETS began the process of offshoring human scoring of TOEFL test taker responses to facilities in India.

In a recent interview with the Free Press Journal, ETS India head Sachin Jain said that ETS aims to provide TOEFL score reports in just 2 days.  That’s good news for test takers.  A typical TOEFL test taker in India is someone who is also planning to take the GRE. Consequently, the TOEFL sometimes gets shunted aside and left to the last minute.  This sort of test taker will appreciate faster results quite a lot. It will also be good for business, as the other tests in this category have provided two-day results for quite some time.

Jain also mentioned that sometime in Q1 of 2025, ETS India will start providing a free “intermediate guide to the TOEFL” to test takers.  That’s also good news.  I really like the beginner’s guide which is currently provided… but, as I noted in my review, it is a somewhat skimpy offering compared to what people taking the IELTS in India currently receive at no cost.

According to reports that rolled in last week, the Educational Testing Service (ETS) has begun training individuals from outside the USA to score TOEFL test taker responses and to serve as scoring leaders.

This seems to represent something of a shift as far as the TOEFL scoring process goes.  To date, responses have been scored solely by individuals physically located in the USA (and in possession of a degree from an American university).  It is unclear at this time which countries the new raters will be located in.

Update:  For a little more confirmation, head over to the ETS Glassdoor page.

At the 2019 TOEFL iBT Seminar in Seoul on September 5, ETS announced details of the new “Enhanced Speaking Scoring” for the TOEFL, which has actually been in place since August 1, 2019.

In the past, speaking responses were graded by two human graders. Now, however, speaking responses are graded by one human grader along with the SpeechRater software. This software is a sort of AI that can evaluate human speech, and has been used by ETS for various tasks since about 2008. Most notably, it provided score estimates for the “TOEFL Practice Online” tests they sell to students.

According to ETS:

“From August 1, 2019, all TOEFL iBT Speaking responses are rated by both a human rater and the SpeechRater scoring engine.”

They also note:

“Human raters evaluate content, meaning, and language in a holistic manner. Automated scoring by the SpeechRater service evaluates linguistic features in an analytic manner.”

To elaborate (and this is not a quote), ETS indicated that the human scorer will check for meaning, content and language use, while the SpeechRater will check pronunciation, accent and intonation.

It is presently unknown how the human and computer scores will be combined to create a single overall score, but looking at the speaking rubric could provide a few hints. Note that in the past the human raters would assess three categories of equal weight: delivery, language use, and topic development. If the above information is accurate, the SpeechRater now assesses delivery, while the human now assesses language use and topic development. It is possible, then, that the SpeechRater provides 1/3 of the score, and that the human rater provides the other 2/3.
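
To make that guess concrete, here is a minimal sketch in Python of what a 1/3 and 2/3 split would look like. This is only speculation: ETS has not said how the two scores are combined. The function name, the weights, and the example scores (on an assumed 0 to 4 rubric scale) are all hypothetical and chosen purely for illustration.

```python
from fractions import Fraction


def combined_speaking_score(human_score, speechrater_score,
                            machine_weight=Fraction(1, 3)):
    """Blend a human score and a SpeechRater score for one response.

    Hypothetical weighting, NOT a published ETS formula: the machine
    score stands in for "delivery" (1 of 3 rubric categories), and the
    human score stands in for "language use" and "topic development"
    (the other 2 of 3 categories).
    """
    human_weight = 1 - machine_weight  # 2/3 under the default guess
    return human_weight * human_score + machine_weight * speechrater_score


# Made-up example: the human rater gives 3, the SpeechRater gives 4.
blended = combined_speaking_score(3, 4)
print(blended)  # exact fraction: (2/3 * 3) + (1/3 * 4)
```

Under this guess, the SpeechRater could pull a response's score up or down, but the human rater's judgment would still carry twice as much weight.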

I will provide more information as I get it. In the meantime, check out the following video for more news and speculation.