Hey, so I’ve reduced the price of my TOEFL ebook to 99 cents on Amazon.  The book covers the TOEFL iBT writing section and contains some of the best practice questions and answers I’ve created over the years.  It also collects a bunch of the grammar articles that appeared on this blog before August of last year.  If that seems like the sort of thing that might interest you, do pick up a copy on Amazon.  The book is no longer part of the Kindle Unlimited program, so even Prime members can buy a copy.  It is also available in paperback, but obviously that’s a bit more expensive.  Buying a copy will help me fulfill my dream of having the best-selling TOEFL book on Amazon, where I’m currently #6.

Anyone preparing for the TOEFL (or helping people prepare for it) ought to know that John Healy has integrated the e-rater into his My Speaking Score platform.  Anyone who has purchased some amount of SpeechRater credits can now submit an unlimited number of essays and receive an e-rater score.  This service is currently in beta, but is still pretty useful.  I’ve already received a couple of “are you seeing this?!” messages.  I’ve written quite a lot about the move from pricey TOEFL tutoring to affordable self-guided prep, and I suppose this will accelerate the move.

As I said, you can gain access by purchasing any package of SpeechRater credits.  Try the coupon code TESTRESOURCES to save 10% on that purchase.

Below is a screenshot!

New this month from John Norris and Larry Davis is a detailed comparison of the old TOEFL Independent Writing Task and the new TOEFL Writing for an Academic Discussion Task.

It notes that among test-takers who completed both tasks in operational settings, 50% received the same score (from 0 to 5) on each from a single human rater. 47% received a score that was +/- one point on the same scale.

Furthermore, the article notes: “We saw no difference in terms of the measures of cohesion that we evaluated, and overall very few differences in terms of specific measures of syntactic complexity, grammaticality and mechanics, or word use.”

Some differences were noted, though. According to the article:

“A few linguistic measures differed across tasks in a manner that may suggest a slightly greater orientation toward academic register in the IND writing task. These measures included slightly greater use of academic vocabulary, as well as somewhat longer noun phrases and clauses, features typical of academic writing. On the other hand, responses to the WAD task showed marginally higher lexical density (relative frequency of content words) and somewhat fewer word usage errors, both of which may be associated with shorter responses.”

Also, the Writing for an Academic Discussion task elicited more writing per unit of time.

It is worth noting that e-rater scores were not available for most of the WAD responses studied in this report, as automated writing scoring was only added to the TOEFL Essentials Test in late 2022. I’d like to see more research into how human scores for WAD tasks compare to e-rater scores for the same. Also: given the change to the test, it may be a good time for a follow-up to the research done by Brent Bridgeman, Catherine Trapani and Yigal Attali in 2012 about the possibility that machine scores can differ (in terms of their closeness to human scores) for certain gender, ethnic and country groups.

It has been argued that because the new “Writing for an Academic Discussion” task requires test takers to write fewer words overall, every grammar or language use mistake they make will have a greater impact on their score.  I figured I would run a few tests in the e-rater to see if this is the case.

First up, this response got a score of 5.0:

“I like the ideas noted by Claire and Kelly, but I feel that the only way to truly solve this problem is to build better schools.  Parents want their children to be educated at the best possible places. It is almost impossible to find great schools in the country. The government currently has money for good things like science laboratories and nice libraries in cities so they should not ignore rural areas all the time. When rural schools like the one I attended lack even basic educational supplies like computers and sports equipment, parents who are concerned about their kids go to bigger places.”

If I submit the same response with all of the commas removed I get a score of 4.0.

If I submit the original response but with ONE spelling mistake (truely) I get a score of 4.0.

So does a small number of individual errors have the potential to impact the score of a test taker?  It would seem so.

However, it isn’t as clear cut as it seems.  While the human raters give only whole number scores, it seems like the e-rater is using decimals under the hood.  I imagine that my original response is a “low 5” answer (maybe just a 4.5, rounded up).

Here is an answer where I retain the spelling mistake, but beef up my vocabulary usage:

“I respect the ideas noted by Claire and Kelly, but I feel that the only way to truely solve this problem is to construct better schools.  Parents want their children to be educated at the best possible facilities. It is almost impossible to find great schools in the countryside. The government currently provides funding for beneficial things like science laboratories and lavish libraries in cities, so they should not ignore rural areas all the time. When rural schools like the one I attended lack even basic educational supplies like computers and sports equipment, parents who are concerned about their kids depart for bigger places.”

This time, my score goes back to 5.0.  I can use the same technique to overcome the penalty for not using commas.  Actually, I can use the same technique to overcome the problem of not using conjunctions, which I have written about elsewhere.  I bet it works both ways.

Decimals, right?  I gain a few tenths of a point for improving my vocabulary and that compensates for the few tenths of a point I lost for the spelling mistake.  We’ve seen a similar thing in some implementations of the SpeechRater AI.
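Here’s a rough sketch of what I imagine is going on.  To be clear, this is my guess and not ETS’s actual algorithm.  The starting score of 4.5, the 0.3 penalty, the 0.3 bonus and the function name reported_score are all made up for illustration.  Scores are stored in tenths of a point (as whole numbers) to avoid floating-point rounding surprises.

```python
# Hypothetical illustration only: the e-rater's real internals are not public.

def reported_score(tenths: int) -> int:
    """Round a hidden score (stored in tenths of a point) to the nearest
    whole number, with halves rounding up, and clamp it to the 0-5 scale."""
    rounded = (tenths + 5) // 10
    return max(0, min(5, rounded))

ORIGINAL = 45          # assumed hidden score of 4.5 (a "low 5")
SPELLING_PENALTY = 3   # assumed cost of one spelling mistake (0.3)
VOCAB_BONUS = 3        # assumed gain from stronger vocabulary (0.3)

with_typo = ORIGINAL - SPELLING_PENALTY
with_typo_and_vocab = with_typo + VOCAB_BONUS

print(reported_score(ORIGINAL))             # 5
print(reported_score(with_typo))            # 4
print(reported_score(with_typo_and_vocab))  # 5
```

If the real numbers are anything like these, a response sitting right on a rounding boundary can flip a whole point because of one tiny change, while a response in the middle of a band would not budge.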

Anyway.  This might be useful for people preparing for the test.  Test-takers ignore the quirks of automated scoring at their own peril.

I’ve recently uploaded a bunch of videos about the new writing task.  Just in case you’ve missed ’em, here they are:

 

 

What do test-takers need to perform well on the new TOEFL “writing for an academic discussion” task?

Conjunctions, I guess.

The scoring rubric for this task hints at this when it says that a high-scoring answer contains “effective use of a variety of syntactic structures.”

I started today’s experiments by submitting a response consisting of only simple sentences. It certainly lacked syntactic variety, and received a score of 4.0 from the AI scoring engine:

I like the ideas noted by Claire and Kelly. I feel that the only way to truly solve this problem is to build better schools.  Parents want their children to be educated at the best possible places. It is almost impossible to find great schools in the country. The government currently has money for good things like science laboratories and nice libraries in cities. They ignore rural areas all the time. Rural schools like the one I attended lack even basic educational supplies like computers and sports equipment.  Parents who are concerned about their kids go to bigger places.

I tried to beef it up with some impressive vocabulary but it still scored 4.0:

I respect the ideas noted by Claire and Kelly. I feel that the sole way to truly resolve this dilemma is to construct better schools.  Parents want their children to be educated at the best possible facilities. It is almost impossible to find impressive schools in the country. The government currently has money for valuable things like science laboratories and fantastic libraries in cities. They ignore rural areas all the time. Rural schools like the one I attended lack even basic educational supplies like computers and sports equipment.  Parents who are concerned about their kids go to bigger places.

I wrote a really long version (197 words) but still scored 4.0:

I like the ideas noted by Claire and Kelly. I feel that the only way to truly solve this problem is to build better schools. Schools are the bedrock of all academic systems.  Parents want their children to be educated at the best possible places. It is almost impossible to find great schools in the country. The country only has a limited range of places to learn. The government currently has money for good things like science laboratories and nice libraries in cities. These facilities give adolescents there an advantage. It is clear to see that they learn much more. Their test scores on university entrance exams are the highest in the entire nation. This means they can attend the best universities. The government ignores rural areas all the time. Rural schools like the one I attended lack even basic educational supplies like computers and sports equipment. Students at those schools perform poorly. They struggle to learn advanced concepts. I think they have poor reading comprehension as well. They are unable to go to good universities. It is likely that they don’t enjoy impressive careers either.  Parents who are concerned about their kids go to bigger places.

I’ll spare you the pain of reading it, but a long version with better vocabulary also scored 4.0.

In an effort to insert some syntactic variety, I wrote a version of the original answer with two different coordinating conjunctions (but, but, so). It also got 4.0:

I like the ideas noted by Claire and Kelly, but I feel that the only way to truly solve this problem is to build better schools.  Parents want their children to be educated at the best possible places, but it is almost impossible to find great schools in the country. The government currently has money for good things like science laboratories and nice libraries in cities. They ignore rural areas all the time. Rural schools like the one I attended lack even basic educational supplies like computers and sports equipment, so parents who are concerned about their kids go to bigger places.

And when I used three different coordinating conjunctions (but, yet, so)? It still scored 4.0:

I like the ideas noted by Claire and Kelly, but I feel that the only way to truly solve this problem is to build better schools.  Parents want their children to be educated at the best possible places. It is almost impossible to find great schools in the country. The government currently has money for good things like science laboratories and nice libraries in cities, yet they ignore rural areas all the time. Rural schools like the one I attended lack even basic educational supplies like computers and sports equipment, so parents who are concerned about their kids go to bigger places.

Next, I wrote a version with two different coordinating conjunctions (but, so) and one subordinating conjunction (while) and it scored 5.0:

While I like the ideas noted by Claire and Kelly, I feel that the only way to truly solve this problem is to build better schools.  Parents want their children to be educated at the best possible places, but it is almost impossible to find great schools in the country. The government currently has money for good things like science laboratories and nice libraries in cities. They ignore rural areas all the time. Rural schools like the one I attended lack even basic educational supplies like computers and sports equipment, so parents are concerned about their kids and go to bigger places.

Finally!

It still scored 5.0 when I moved the conjunctions around a bit:

I like the ideas noted by Claire and Kelly, but I feel that the only way to truly solve this problem is to build better schools.  Parents want their children to be educated at the best possible places. It is almost impossible to find great schools in the country. While the government currently has money for good things like science laboratories and nice libraries in cities, they ignore rural areas all the time. Rural schools like the one I attended lack even basic educational supplies like computers and sports equipment, so parents who are concerned about their kids go to bigger places.

And it still scored 5.0 when I used a different subordinating conjunction (when):

I like the ideas noted by Claire and Kelly, but I feel that the only way to truly solve this problem is to build better schools.  Parents want their children to be educated at the best possible places. It is almost impossible to find great schools in the country. The government currently has money for good things like science laboratories and nice libraries in cities, so they should not ignore rural areas all the time. When rural schools like the one I attended lack even basic educational supplies like computers and sports equipment, parents who are concerned about their kids go to bigger places.

So there ya go. If you are planning to take the new TOEFL after July 26, make your writing a bit more sophisticated by including both types of conjunctions. That could certainly show off your “effective use of a variety of syntactic structures.”

One takeaway of this research is that a possible “magic formula” for the new writing task is: 100+ words, one subordinating conjunction, two coordinating conjunctions.
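If you want to check a practice response against that formula, something like the following could work.  This is a crude sketch, not how the e-rater works.  The word lists, the “conjunction after a comma” rule and the function name check_formula are all my own assumptions, and the script will miscount in plenty of cases (for instance, “while” used as a noun).

```python
import re

# Crude heuristics, not a real grammar parser. Both patterns are assumptions.
# A coordinating conjunction is only counted when it follows a comma,
# which is a rough sign that it joins two clauses.
COORDINATING = r",\s+(?:and|but|or|nor|for|yet|so)\b"
SUBORDINATING = r"\b(?:while|when|although|because|if|since|unless|though|whereas)\b"

def check_formula(response: str, min_words: int = 100) -> dict:
    """Count words and conjunctions, then compare them to the 'magic formula':
    min_words or more words, two coordinating and one subordinating conjunction."""
    words = len(response.split())
    coordinating = len(re.findall(COORDINATING, response, flags=re.IGNORECASE))
    subordinating = len(re.findall(SUBORDINATING, response, flags=re.IGNORECASE))
    return {
        "words": words,
        "coordinating": coordinating,
        "subordinating": subordinating,
        "meets_formula": words >= min_words
        and coordinating >= 2
        and subordinating >= 1,
    }

# A short demo sentence (not a real response), checked with a lower word limit.
demo = "While I like your ideas, we need better schools, but money is short, so parents leave."
print(check_formula(demo, min_words=10))
```

The demo sentence has 16 words, two coordinating conjunctions (but, so) and one subordinating conjunction (while), so it passes with the lowered limit and fails with the default of 100 words.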

You can find all of my submissions on this page.

To learn more about the impact of word count on automated scoring of “writing for academic discussion” prompts that will be included on the TOEFL on July 26, I spent some time answering the sample questions provided by ETS. This sort of experimentation is mildly important, I think, as many test-takers (and tutors) hold the idea that only really long responses get high scores on the current TOEFL. This sometimes results in the creation of monster-sized TOEFL essays.

What did I learn?

Happily, the automated scoring system (e-rater) gave me a perfect score of 5.0 for the following 102-word response to sample question one (about how to repopulate the countryside):

“While I appreciate the solutions presented by Claire and Kelly, I feel that the only way to truly solve this problem is to construct better schools.  Parents want their children to be educated at the best possible facilities, but it is almost impossible to find impressive schools in the countryside. Although the government currently provides funding for amenities like science laboratories and lavish libraries in cities, they neglect rural areas. Rural schools, like the one I attended, lack even basic educational supplies like computers and sports equipment.  Consequently, parents who are concerned about their kids head for greener pastures, so to speak.”

The question prompt recommends writing about 100 words, so I’m happy. Students can confidently follow the given instructions, I guess.  

I fiddled about with a series of shorter (but similar) answers and was able to get a score of 4.0 for the following 49-word response:

“While I appreciate your ideas, I think we need better schools.  Parents want their children to utilize excellent facilities, but it’s impossible to find impressive schools in the countryside. Although the government currently funds amenities like libraries in cities, they neglect rural areas.  Consequently, parents head for greener pastures.”

That’s as low as I could go and still get a score of 4.0.

But I could get a score of 3.0 for the following 32-word response:

“While I appreciate your suggestions, we need better schools.  Children need excellent facilities, but the countryside lacks them. Although the government funds academics in cities, they neglect rural areas.  Consequently, parents leave.”

Anything lower than that resulted in a score of 2.0 or less.
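For anyone who wants to repeat this experiment, here is a minimal way to count words before submitting a response.  I am assuming that a “word” is anything separated by whitespace.  I don’t know exactly how the e-rater or the on-screen counter tokenizes text, so treat the numbers as approximate.

```python
def count_words(response: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of 'word')."""
    return len(response.split())

# The shortest response from my tests that still scored 3.0.
short_response = (
    "While I appreciate your suggestions, we need better schools.  "
    "Children need excellent facilities, but the countryside lacks them. "
    "Although the government funds academics in cities, they neglect rural areas.  "
    "Consequently, parents leave."
)

print(count_words(short_response))  # 32
```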

A few things are worth mentioning:

  1. The automated score will be combined with a human score on the real test.
  2. Obviously word count correlates with other features like range of vocabulary and number of grammatical features.
  3. The above scores encompass a whole range of scores once they are scaled up. A score of 4.0 from the e-rater could scale up to anything from 21 to 26. My sample would likely be on the low end of this range. It would be nice to get decimals from the ETS website.

You can find a record of everything I submitted along with the e-rater scores over here.

ETS now provides twenty-two samples of the “writing for an academic discussion” questions that will appear on the TOEFL Test starting July 26. Each one includes AI scoring using the same e-rater that is used to (partially) score the real test. More questions will be released in the weeks ahead.

Closer to that day I’ll examine the questions and gather some data related to item design and frequent topics. I’ll also experiment a bit with ETS’s AI scoring.

One thing stands out already, though. Most of these questions are about more challenging topics than appear on the current Independent Writing task (which will be removed July 26). They seem to require more thoughtfulness. Or even some amount of erudition.

Consider Sample 11:

“This week, we will be discussing a shortage of affordable housing that exists in many countries. In these places, housing – both apartments (flats) and houses – are expensive, because populations are growing faster than new housing is being built. Now, think about places in your country that have a housing shortage. In your post, I would like you to indicate the most effective way for the government to address a housing shortage in your country. Please explain why you think so.”

Or Sample 1:

“Let’s think about population Trends in urban and rural areas (villages). Living in urban areas can be expensive; nonetheless, when they have a choice of where to live, people in some countries do not wish to live in rural areas even if the cost of living there is lower. If governments of some countries want to attract more people to live in rural areas or villages, what is the best strategy or approach that governments can use? Why?”

Test-takers might be asked how to solve the housing crisis, or how to repopulate the countryside. I like this approach to item design, but I know some people might struggle to come up with ideas.

Compare these to this Independent Writing task from Test 1 in the Official Guide to the TOEFL:

“Some young adults want independence from their parents as soon as possible. Other young adults prefer to live with their families for a longer time. Which of these situations do you think is better?”

Or from Test 2:

“Do you agree or disagree with the following statement? Young people enjoy life more than older people do.”

I know that some of the new sample questions are easier than the ones I quoted above. And some of the independent tasks on the current version of the test are harder than what appears in the Official Guide. That said, if you are preparing for the new TOEFL, you should be prepared to THINK. As always, remember that this is not a test of your English skills. It is a test of your ability to use your English skills in an academic context.

I watched a webinar provided by ETS Global called “TOEFL Writing Without Secrets.”

It was, overall, a very useful webinar.  It contained insights that would be useful to both test-takers and test-prep people.  But look at this reading passage used to illustrate the first TOEFL writing task (the integrated essay):

Eagle-eyed readers will recognize that as the question used to illustrate this task in the Official Guide to the TOEFL.  Long-time readers will know that this isn’t what the TOEFL integrated essay actually looks like.  I am too tired to once again explain how the integrated question is designed, but basically the reading has four paragraphs (not two).  It has an introductory paragraph and it has three supporting body paragraphs, each with a concise argument.

This faulty question has been in the Official Guide since it was first published back in 2005. One imagines that the writers of the book were working from prototype versions of the TOEFL and didn’t have access to samples of actual test forms (’cause the test hadn’t actually been given at that point).  The original practice test in the book has a similarly faulty question, by the way.  It has also appeared in every new edition of the book.

What really gets my goat is that this has been sort of catastrophic for TOEFL test prep.  Countless third-party books have been published that also include crappy practice questions, seemingly influenced by the above content.  Overall, this makes the TOEFL a less attractive test than the IELTS.  IELTS test-takers have a crystal clear picture of what that test is like, partially because of all the amazing and accurate official test collections that have been published.

I feel that had the Official Guide been a bit more accurate the third-party books would also be more accurate.  And now the faulty question has even influenced the quality of an actual ETS webinar!  Oh the humanity.

Apparently an updated edition of the Official Guide will soon be published (to match the new version of the TOEFL).  If anyone from ETS is reading this, I implore you to touch up these sections.

ETS has published ten new sample “writing for an academic discussion” questions on their website.  That’s really wonderful news for people who plan to take the TOEFL when it is revised on July 26.

But that’s not all!  Users can submit their responses to the questions and get an AI score from 0 to 5!  That will help everyone predict how they will do on the real test.

The folks at ETS have indicated on LinkedIn that the collection of questions will be expanded in the future.  And, as I have already reported here, ETS plans to release a test-prep app which includes new questions of every type along with AI grading for both the writing and speaking sections.

When preparing for standardized tests, people are often forced to struggle with bad study tools.  Books and courses often contain inaccurate questions.  People teaching those tests also struggle with this problem, obviously.

To nudge publishers and course designers in the right direction as they update their books for the revised TOEFL test, I recently teamed up with Jaimie Miller to analyze existing samples of the new TOEFL writing question and produce some guidelines for good question creation.  The results of our work follow.

How to Write “Authentic” Academic Discussion Prompts for the new TOEFL iBT Writing Test

If you are producing sample activities for TOEFL iBT test-takers who need to prepare for the Academic Discussion portion of the new TOEFL iBT Writing test (added to the test starting July 26, 2023) we strongly encourage you to base your activities on the analysis that we have done of the sample activities that ETS has made available. By doing so, you’re creating material that test-takers are likely to recommend to their friends. 

Test-takers have this view of the information when it is time to write their Academic Discussion response:

(You may also want to download this side-by-side comparison and breakdown of the 3 samples that ETS has released so far as a reference guide.)

Section One: Instructions

Section 1 contains the instructions, which are always the same.  Only the academic department (sociology, business, political science, etc) changes.  Subjects in the liberal arts seem most common, but anything could be used as long as the question is accessible. The questions are unlikely to require any specific technical knowledge.  For instance, there are likely to be questions about the use of social media (which most people are broadly familiar with) but questions about something like genetically modified plants (which some people aren’t familiar with) are unlikely. There is an image of the professor.

Section Two: The Question

Section 2 includes the actual question, and some context. In the sample questions now available, the context ranges from 39 to 56 words.  It establishes the general theme of the question and provides background information to activate schemata and give the test-taker time to adjust. There is commonly a reference to “the discussion board” to make it look realistic.

After a clear line break, the professor presents 1 or 2 questions that are visually set apart in a block. The questions are academic in tone, but not challenging.  Just a slight step up from the banality of the Independent writing task.  Nothing technical, nothing complicated, nothing culturally or demographically inaccessible.

When they ask an open-ended question  (“What do you think is the most significant effect…?” or “Which issue would you argue is more important…?”), they follow with a simple “Why?” question that encourages the test-taker to dig into reasons and examples. 

When they ask a YES/NO question (“Is advertising just a way of manipulating people…?”), they follow with a second YES/NO question that takes an opposing perspective (“… or is it an important source of information…?”).  

It’s common to find a comparative or superlative adjective in the question. For examples, review ETS’s 3 samples here.

The total word count for the questions ranges from 19 to 30 words in the samples currently available. 

The total word count for Section 2 (context plus questions) probably ranges from 69 to 75 words in the samples currently available.

Section Three: The First Response

In Section 3, the first student responds to the question.  Responses in the currently available samples are 39 to 59 words. In samples currently published by ETS, the first student’s responses seem to have the following characteristics:

✅ contractions (“don’t” as opposed to “do not”) occur with limited frequency

✅ “I know / I don’t think / I think”

limited use of personal examples or personal stories

❌ abbreviations (“television” not “TV”)

✅ 1 example with a generalization that is based on EITHER:

blending plausible 2nd Person generalizations with 3rd Person generalizations (“When you are watching television, you are not moving around or exercising. This is especially true for children. When children spend a lot of time watching television, they have a greater tendency to be overweight”)

OR plausible statistics with specific details and multiple numbers (“I read that in just one year, from 2018 to 2019, the number of computers, tablets and mobile phones using ad blockers increased from 142 million to 615 million”)

✅ realistic use of capital letters for emphasis in strategic places  (“I think the REAL question is…”)

(The above features may or may not appear in other items of this type)

Section Four: The Second Response

In Section 4, the second student responds to the question in 53 to 59 words.  In open-ended questions (“What do you think is the most significant effect…” or “Which issue would you argue is more important…?”), they mention a new idea that Student #1 didn’t mention. For YES/NO questions (“Is advertising just manipulation… or is it a source of information?”), Student #2 argues against whatever Student #1 said.

In samples currently published by ETS, student #2’s responses seem to have the following characteristics:

✅ “I think / I disagree with…” 

✅ contractions (“I’m” and “There’s” as opposed to “I am” and “There is”) occur with limited frequency

❌ abbreviations (“television” not “TV”)

✅ occasional use of adverbs (“actually”)

✅ 1 example with a generalization that is based on EITHER:

blending plausible 2nd Person generalizations with 1st Person personal stories (“Think of all the different places in the world you can experience through television! Last night, I watched a program about life in Antarctica, and it was fascinating!”)

OR plausible statistics with specific details and multiple numbers (“People can find out a lot about products from advertising. There’s plenty of evidence that people usually begin the process of making a big purchase by looking at ads and reviews… I’m going to post later about an advertisement that gave me a lot of useful information.”)

(Again, note that the above features appear in the materials currently published.  They may or may not appear in future items of this type.)

The total number of words that test-takers are exposed to is probably in the range of 165 to 193.

Section Five: Participants

Section 5 is simply to note that one student is male and one student is female. Images of each student are presented along with their responses.

Section Six: The Test-Taker’s Response

The test-taker’s response is typed in Section 6. A word count is displayed on the screen.  

If you use these guidelines to create your own content, I will be happy to link to it here.  So far, decent questions can be found at:

 

I think the most frequently asked question about the new TOEFL writing task (“writing for an academic discussion”) is about the timing.  Test-takers want to know if the ten minutes provided for the task is for both reading and writing.  The answer to that comes from a Tweet by ETS:

The second task requires a limited amount of reading to provide a context for writing and help the writer form a response. The reading is part of the task, so it is included in the 10-minute response time.

So there ya go.  You’ve got ten minutes to read the question, read the student responses and write your own response.  That will require effective time-management.  I recommend quickly skimming the responses so that you can spend most of your time writing your own answer.

I do hope that ETS recognizes that a lot of people want to know this information, and that a Tweet (a reply no less) is probably not the best way to disseminate it!

 

I spent a few hours fixing up my sample questions for the new TOEFL writing task.  I somewhat expanded the discussions and added pictures to each speaker, just like the real test will have.  I also revised the recommended template a bit so that it begins with a clear thesis statement.  I am also thinking about using short paragraph breaks for fun.  

More samples to come.  Many more!