Students often ask me something like “I got 22 in the speaking section, but I need 26.  How can I get 26?”  That sounds like a dumb question, because the TOEFL Speaking Rubrics explain exactly how to get a score of 26.  But it isn’t really a dumb question.  Students struggle with two things:

  • The speaking rubrics are unclear
  • The SpeechRater software is mysterious

In this article, I hope to explain what the rubrics mean and how the SpeechRater affects your score.

I also want to illustrate why it is challenging for teachers to answer that hypothetical question:  so many different factors affect the scores.

First, Some Math

In the old days, your final speaking score was totally based on the rubrics.  Each answer got a rubric score from 0 to 4, and simple math determined your final score out of 30.  The formula looked something like this:

((4 + 3 + 3 + 4) / 16) * 30 = 26.25

Easy, right?  Yes, I know there were six speaking questions back then, but I’m trying to make a point.
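If you like to see arithmetic as code, here is a minimal Python sketch of that old-style calculation.  The function name and its parameters are my own invention for illustration, not anything published by ETS, and it assumes every task is weighted equally.

```python
def scaled_speaking_score(rubric_scores, max_per_task=4, scale_max=30):
    """Convert per-task rubric scores (0 to 4) into a score out of 30.

    Assumes every task carries equal weight, as in the old formula.
    """
    if not rubric_scores:
        raise ValueError("need at least one rubric score")
    total = sum(rubric_scores)
    maximum = max_per_task * len(rubric_scores)
    return total / maximum * scale_max


print(scaled_speaking_score([4, 3, 3, 4]))  # 26.25
```

Notice that changing a single rubric score from 3 to 4 moves the result by almost two points, which is why one weak answer mattered so much under this kind of formula.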

But now it isn’t so easy.  You get the rubric scores plus a SpeechRater score for each answer. But the SpeechRater scores are on a mystery scale, and the weight they are given is also a mystery.  So the formula looks like this:

((4 + ? + 3 + ? + 3 + ? + 4 + ?) / ?) * 30 = 26.25

Or something like that.  Basically, the rubrics are only part of the formula now.  We know the final score (because it shows up on the score report), but we don’t know exactly how it is determined.

But forget about this.  I mention this stuff only to illustrate that students are confused sometimes, and that their confusion is legitimate. 

Delivery

The first rubric category is “delivery.”  The independent speaking question rubrics are on the left, the integrated speaking question rubrics are on the right.  There are some minor differences, but to me they are pretty much the same.  The text on the top describes a score of 4, while the text on the bottom describes a score of 1.

Note the repeated use of “listener effort” in the descriptions.  This is how the human rater will determine your score.  They will listen to your answer once or twice or three times and then give you a score of 1, 2, 3 or 4 based on how much “effort” it took to understand you.  It is totally okay to have an accent, but if your accent forces the rater to use “extra effort” to understand you, your score will go down.  Same for your intonation.  Same for your rhythm.  And so on.  ETS calls this “holistic” scoring.  The human rater isn’t counting up all the mistakes you make.  They don’t keep track of how many times you say something wrong.  They just listen to the whole thing and give your answer a score based on how they feel.

The SpeechRater also judges your delivery.  But it is not holistic.  It actually does count up mistakes.  That’s important to keep in mind.  We don’t know exactly how the SpeechRater works, but we know that it judges you based on a few categories.  They include:

  • Pause Frequency.  This is the number of times you pause inappropriately while speaking.  If you pause in the middle of a sentence to think of what to say next, that will hurt you.   The number of pauses in your answer is counted, and you get a deduction based on the count.  You must practice delivering sentences without pauses.
  • Repetitions.  This is the number of times you repeat words inappropriately.  You will lose points if you say:  “I think, I think studying alone is better.”  Or if you say:  “When I was a student… at student… at high school I worked on a major product.”
  • Rhythm.  This is sometimes referred to as “intonation.”  If you stress the right syllables when you speak, you will get a higher score here.  Sadly, I can’t really summarize rhythm in the English language in a sentence, but I will link to some resources if I can find them.
  • Speaking Rate.  This is your words per minute.  More words per minute will result in a higher score.  But if you speak TOO quickly you might mess up your rhythm score, and you might require the human rater to use “extra effort” to understand you.  
  • Sustained Speech.  This is similar to pause frequency.  It measures how much you speak without pausing, or using something like “um” or “ah.”  This is easy to practice.  Just try to avoid saying “um” more than once or twice in your answer.
  • Vowel Length.  Are you pronouncing vowels properly?  Do you know the difference between a short vowel and a long vowel?  Can you say “mate” and “mat” properly?  This is important.

There are more categories, but these seem to be the main ones.  The weighting of the categories is unclear.  To learn more about how this aspect of the SpeechRater tech works, I recommend reading Automated Speaking Assessment, by Klaus Zechner and Keelan Evanini, from ETS.
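To make the idea of “counting” concrete, here is a rough Python sketch that measures three of the categories above from a written transcript of your own practice answer:  speaking rate, filler words and repetitions.  This is only a self-practice tool built on my own assumptions.  It is not how the SpeechRater works, the filler list is made up, and it cannot detect silent pauses, rhythm or vowel length, since those require the audio itself.

```python
import re

# Assumed list of filler words, for illustration only.
FILLERS = {"um", "uh", "ah", "er"}


def tokenize(transcript):
    """Lowercase the transcript and split it into word tokens."""
    return re.findall(r"[a-z']+", transcript.lower())


def speaking_rate(transcript, seconds):
    """Words per minute, not counting filler words."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    words = [w for w in tokenize(transcript) if w not in FILLERS]
    return len(words) / seconds * 60


def count_fillers(transcript):
    """Number of filler words such as "um" or "ah"."""
    return sum(1 for w in tokenize(transcript) if w in FILLERS)


def count_repetitions(transcript, max_phrase_len=2):
    """Count words or short phrases that are immediately repeated."""
    words = tokenize(transcript)
    count = 0
    for n in range(1, max_phrase_len + 1):
        for i in range(len(words) - 2 * n + 1):
            # Compare a run of n words with the n words that follow it.
            if words[i:i + n] == words[i + n:i + 2 * n]:
                count += 1
    return count


answer = "I think, I think studying alone is, um, better."
print(count_repetitions(answer))  # 1
print(count_fillers(answer))      # 1
```

Record yourself, type out exactly what you said (including every “um”), and run the numbers.  The repetition check is crude, so it will also flag a legitimate doubled word like “had had.”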

Language Use

Again, here are the rubrics (independent on the left, and integrated on the right).

The human rater listens to your answer a few times and makes a snap judgement based on what they have heard. But what are they listening for?  Well, they are listening for grammatically correct English, of course.  They are also listening for some amount of sophistication.  This is defined as “good control of basic and complex structures.”  The rubrics also refer to a “range of structures.”  ETS does not say what this means, but to me it means a mix of simple, compound and complex sentences.  You can read more about sentence structures over here.  The rubric also mentions “effective use of… vocabulary.”  Again, it doesn’t say what “effective” means, but to me that means high-level vocabulary and a lack of repetition. 

As for the SpeechRater, it checks for a few grammatical things.  It creates a “transcript” of your answer and judges it based on:

  • POS-based features.  This means that it looks at the grammatical expressions used in your answer and compares them to a database of those listed in a corpus of other answers, all of which have been assigned high scores in the past.  Good answers sound like other good answers, right?
  • Clause-based features.  I don’t quite grasp this concept, but it seems that the SpeechRater is specifically looking for the presence of dependent clauses and dependent infinitives.  This makes sense, as those are used to create the sort of structures mentioned above.
  • Phrase-based features.  The SpeechRater is looking for noun phrases, verb phrases, prepositional phrases and complex nominals.  It is also looking for coordinating phrases, which are also used to create the sort of structures mentioned above.

This all sounds needlessly complicated, but it is stuff you can practice.  You can practice using compound and complex sentences with the right conjunctions.  You can practice using verb phrases.  You can study noun phrases.  These are pretty basic things.
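If you want a quick way to check whether your practice answers mix compound and complex sentences, here is a very rough Python sketch that counts conjunctions in a transcript.  The two word lists are short and chosen by me for illustration.  The real SpeechRater features rely on proper grammatical analysis that I am not attempting to reproduce here, so treat the counts as a hint and nothing more.

```python
import re

# Assumed, deliberately short word lists for illustration only.
COORDINATORS = {"and", "but", "or", "so"}
SUBORDINATORS = {
    "because", "although", "though", "while", "when",
    "if", "since", "unless", "whereas",
}


def conjunction_profile(transcript):
    """Count coordinating and subordinating conjunctions in a transcript."""
    words = re.findall(r"[a-z']+", transcript.lower())
    return {
        "coordinating": sum(1 for w in words if w in COORDINATORS),
        "subordinating": sum(1 for w in words if w in SUBORDINATORS),
    }


answer = "I study alone because it is quiet, and I finish faster when I focus."
print(conjunction_profile(answer))
```

If both counts are zero for a 45-second answer, you are probably speaking only in simple sentences.  Keep in mind that a word like “so” is not always a conjunction, so the count can be slightly off.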

Meanwhile, the SpeechRater examines a few aspects of your vocabulary. They are:

  • Lexical Diversity.  The variety of words used in your answer.  Don’t repeat yourself.  You can practice this.
  • Average Word Difficulty.  You will get points for using more “difficult” words.  But don’t go crazy and use totally obscure words.  ETS uses its own corpus (called The TOEFL Academic Language Corpus… no, I haven’t seen it) to determine the difficulty level of words.  I suspect the corpus is based on words you ought to be using, rather than just weird words that exist in a dictionary somewhere.
  • Lexical Frequency Profile.  This is described as “the proportion of low frequency and high frequency words in the response.”  Frequency refers to how often they appear in the aforementioned corpus.  I don’t quite grasp the difference between this and “difficulty.”  Sorry.

Again, though, these are things you can practice. You just need to grow your vocabulary and ensure that you are using words that native speakers actually use.  Read a book.  Listen to a podcast.  Do something.
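Here is a small Python sketch of two of these ideas that you can run on your own transcripts.  The first function is a type-token ratio, a common textbook measure of lexical diversity.  The second estimates the proportion of less common words using a tiny word list that I made up.  The ETS corpus is not public, so both the list and the exact measures are my assumptions, not the SpeechRater's actual method.

```python
import re

# Hypothetical stand-in for a list of high-frequency words.
# A real list would come from a large reference corpus.
HIGH_FREQUENCY_WORDS = {
    "i", "the", "a", "is", "to", "and", "of", "in", "it",
    "because", "think", "better", "good", "very",
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(text):
    """Unique words divided by total words (1.0 means no word repeats)."""
    words = tokenize(text)
    if not words:
        return 0.0
    return len(set(words)) / len(words)


def low_frequency_proportion(text, common_words=HIGH_FREQUENCY_WORDS):
    """Share of words that are NOT in the high-frequency list."""
    words = tokenize(text)
    if not words:
        return 0.0
    rare = [w for w in words if w not in common_words]
    return len(rare) / len(words)


answer = "I think studying alone is better because studying alone is quiet"
print(round(type_token_ratio(answer), 2))
print(round(low_frequency_proportion(answer), 2))
```

A type-token ratio naturally drops as an answer gets longer, so only compare answers of similar length.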

The above details are based on the content of Automated Speaking Assessment, by Klaus Zechner and Keelan Evanini.

Topic Development

Finally, you are scored based on your topic development.  Here are the rubrics:

I don’t believe that the SpeechRater checks your content development.  I think this is scored entirely by the human rater, but I could be wrong.  In the independent task, the human rater wants to hear an answer that is on-topic and addresses the given question.  The details given in the answer should support the main argument made in the answer.

In the integrated tasks, the details given should match those included in the sources.  The rater will examine the reading and listening and determine the key details on their own, or they might be given a chart summarizing the key details presented in the sources.  Your job is to mention as many of those details as possible.  You don’t have to mention them all to get a perfect score, but you should mention as many as possible.

Note how the rubric says your answer should have a “clear progression of ideas” and clear “relationships between ideas.”  Indeed, the integrated rubric really emphasizes this at all score levels.  To me, this sounds like a request for transitional words like “as a result” and “therefore,” which are used explicitly to connect ideas.  Don’t just mention details (especially in the integrated task).  Mention how they relate to each other.

Conclusion

Okay.  That’s it.  A quick look at the SpeechRater and the rubrics.  If you are the hypothetical student mentioned at the beginning who needs four more points, you should figure out which of the above things you are doing wrong, and then STOP DOING THEM WRONG.  Good luck to you.