This week I was lucky enough to attend another workshop hosted by ETS for TOEFL teachers. Here is a quick summary of some of the questions attendees asked. Note that the answers are not direct quotes unless indicated.

 

Q:  Are scores adjusted statistically for difficulty each time the test is given?

A: Yes. This means that there is no direct conversion from raw to scaled scores in the reading and listening sections. The conversion depends on the performance of all students that week.

 

Q: Do all the individual reading and listening questions have equal weight?

A: Yes.

 

Q:  When will new editions of the Official Guide and Official iBT Test books be published?

A:  There is no timeline.

 

Q:  Are accents from outside of North America now used when the question directions are given on the test?

A: Yes.

 

Q:  How are the scores from the human raters and the SpeechRater combined?

A:  “Human scores and machines scores are optimally weighted to produce raw scores.”  This means ETS isn’t really going to answer this question.

 

Q: Can the human rater override the SpeechRater if he disagrees with its score?

A: Yes.

 

Q:  How many different human raters will judge a single student’s speaking section?

A:  Each question will be judged by a different human.

 

Q:  Will students get a penalty for using the same templates as many other students?

A:   Templates “are not a problem at all.”

 

Q: Why were the question-specific levels removed from the score reports?

A: That information was deemed unnecessary.

 

Q:  Is there a “maximum” word count  in the writing section?

A:  No.

 

Q:  Is it always okay to pick more than one choice in multiple choice writing prompts?

A:  Yes.

I was able to ask a few more questions at an ETS webinar. Here’s what I learned (the answers are not direct quotes):

Q: Will results come back in six calendar days or six business days now?
A: Six calendar days.

Q: How significant are pauses when students are answering questions in the speaking section?
A: They can be very significant and can affect the score a lot.

Q: Could the same human grader score all four speaking responses?
A: No.

Q: Will a new Official Guide be published in 2019?
A: No. That has not been prioritized.

Q: Could students get only NINE reading questions with a specific reading passage?
A: Yes. This will happen if a fill-in-a-table question is given.

Q: Is it okay to mention the reading first in integrated essay body paragraphs?
A: The order “does not matter.” The scoring rubric is “not that structured.”

At the 2019 TOEFL iBT Seminar in Seoul on September 5, ETS announced details of the new “Enhanced Speaking Scoring” for the TOEFL, which has actually been in place since August 1, 2019.

In the past, speaking responses were graded by two human graders. Now, however, speaking responses are graded by one human grader along with the SpeechRater software. This software is a sort of AI that can evaluate human speech and has been used by ETS for various tasks since about 2008. Most notably, it provided score estimates for the “TOEFL Practice Online” tests that ETS sells to students.

According to ETS:

“From August 1, 2019, all TOEFL iBT Speaking responses are rated by both a human rater and the SpeechRater scoring engine.”

They also note:

“Human raters evaluate content, meaning, and language in a holistic manner. Automated scoring by the SpeechRater service evaluates linguistic features in an analytic manner.”

To elaborate (and this is not a quote), ETS indicated that the human scorer will check for meaning, content and language use, while the SpeechRater will check pronunciation, accent and intonation.

It is presently unknown how the human and computer scores will be combined to create a single overall score, but looking at the speaking rubric could provide a few hints. Note that in the past the human raters would assess three categories of equal weight: delivery, language use, and topic development. If the above information is accurate, the SpeechRater now assesses delivery, while the human now assesses language use and topic development. It is possible, then, that the SpeechRater provides 1/3 of the score, and that the human rater provides the other 2/3.
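To make that guess concrete, here is a minimal Python sketch of what a 1/3 and 2/3 weighting would look like. ETS has not published the actual formula, so the function name, the weights, and the sample ratings are all assumptions for illustration only.

```python
def combine_speaking_scores(human_score, speechrater_score,
                            human_weight=2 / 3, machine_weight=1 / 3):
    """Weighted average of a human rating and a SpeechRater rating.

    Hypothetical: the default weights reflect the speculation in this
    post, not a formula published by ETS.
    """
    return human_weight * human_score + machine_weight * speechrater_score


# Invented ratings on the 0-4 rubric scale, for illustration only.
combined = combine_speaking_scores(human_score=3.0, speechrater_score=4.0)
print(round(combined, 2))
```

Under these assumed weights, a human rating of 3 and a SpeechRater rating of 4 would combine to roughly 3.33, so the human rating would pull the result twice as hard as the machine rating.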

I will provide more information as I get it.

Schools Accepting TOEFL MyBest Scores

The following institutions have stated publicly that they will accept TOEFL MyBest Scores. Note that this list could be out of date. It is best to contact the school you are interested in directly.

Yale Graduate School of Arts and Sciences. Source: “If you wish to send us ‘MyBest Scores’, we will accept them. All TOEFL scores we receive will be made available to the program reviewing your application.”

Miami University. Source: “We accept MyBest scores for the TOEFL. This means that the highest scores for each section from different TOEFL exams will determine a combined highest sum score.”

Carnegie Mellon School of Design. Source: “the School of Design also accepts MyBest scores for TOEFL iBT.”

Shoreline Community College. Source: “MyBest scores are accepted.”

University of British Columbia College of Graduate Studies. Source: “The College of Graduate Studies accepts MyBest Scores.”
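
Miami University's description above suggests how a MyBest total is put together: take the highest score in each section across different test dates, then add them up. Here is a minimal Python sketch of that idea. The function name `my_best_scores`, the `attempts` structure, and the sample results are hypothetical, and the sketch does not model any eligibility rules ETS or a school may apply.

```python
SECTIONS = ("reading", "listening", "speaking", "writing")


def my_best_scores(attempts):
    """Return the highest score per section and the sum of those scores.

    `attempts` is a list of dicts, one per test date, mapping each
    section name to the score earned on that date (hypothetical format).
    """
    best = {section: max(a[section] for a in attempts) for section in SECTIONS}
    return best, sum(best.values())


# Two invented test dates for one student.
attempts = [
    {"reading": 25, "listening": 22, "speaking": 20, "writing": 24},
    {"reading": 23, "listening": 26, "speaking": 22, "writing": 21},
]

best, total = my_best_scores(attempts)
print(best, total)
```

In this made-up case the two test dates total 91 and 92 on their own, but the combined best sections add up to 97.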