The folks at Duolingo have published an article (and matching blog post) about how much time test takers should be given to complete writing tasks. Their research suggests that shorter tasks are just as useful as longer tasks in terms of reliability and validity.

This is a controversial topic among people who take the time to look at the sorts of tasks that are included on tests of English proficiency. It has generated some discussion.

Nowadays, both test makers and test takers seem to favor test forms that are shorter (in duration) than those used in the past. Since long essay tasks require a significant amount of time to complete, they are less popular than they used to be. Recall that last year the 300(ish) word “Independent Essay” task was dropped from the TOEFL, in favor of a 100(ish) word “Academic Discussion” response, meant to simulate a message board interaction. Research provided by ETS indicates that the shorter task is just as useful as the longer one it replaced.

A separate (but related) controversy concerns how closely test items should resemble real-world tasks carried out by students in the course of their future studies. The move to shorter writing tasks means that newer tests often include items that do not simulate real academic work. Some people find this problematic. Some do not. Yet others argue that the tasks on more traditional tests never actually simulated this sort of thing in the first place.

In recent days, many people have reported seeing “not available” in the academic discussion response section of their TOEFL score report.

This means that AI-generated information about your response is not available.  It does not mean that your answer was not scored.  Don’t worry… your answer was scored.

Why does this happen?  It happens because your response could not be scored by ETS’s e-rater AI and was scored only by human raters.

Why could the AI not score your response? That is unclear. It sometimes happens if a response is extremely long, but it also happens for other (unknown) reasons. Leave a comment below if you have seen this in your score report.

No clients to write for at the moment, so I finally have a moment to produce my own material. Here’s a new TOEFL integrated writing practice question.

It’s about banning CFL light bulbs. The topic represents an effort to make questions that are more tedious and obscure than what test takers usually find in practice materials. As it is a first draft, let me know if you spot any typos or unclear phrases.

I realize now that I must stop writing so many lecture “paragraphs” that begin with some variation on “as for the argument about xxxxx.”

According to the pile of post-it notes on my desk, next up is a question about whether or not a certain ring belonged to Caligula. Apparently I read an article about that topic earlier this year.

Earlier this year I helped a student prepare for the ALP Essay Exam. I couldn’t find much information about the test online, so I thought I would write a few notes here. I might revise this post in the future, so stop by again for updates. If you need tutoring for the ALP Essay Exam, you can contact me.

What is the ALP Essay Exam?

The ALP Essay Exam is used by Columbia University to assess the writing skills of students.  It is often used to determine if students have the language skills necessary to take classes at the university. It can also be used to determine if students should take supplementary writing classes (in addition to their regular schedule of classes). Test-takers have 105 minutes to write a standard (four or five paragraph) argumentative essay about a specific topic.  The essay must be based on the contents of two short academic articles.

You can read about it over here.

What Does the ALP Essay Exam Look Like?

You’ll get a question about a serious topic.  Don’t expect something basic and simple like the IELTS.  Instead, expect something that might actually be studied in a first-year university class.  You might get something about gentrification, affirmative action, the use of standardized testing… that sort of thing.  The question might look like this:

“Please read the two passages below.  The authors have differing opinions about the topic of gentrification in the United States. Which author do you agree with, and to what extent?  In your essay you should support your opinion, and challenge the opinions of the author you disagree with.  You have 105 minutes to complete your essay.”

The passages should be fairly short. Maybe just a paragraph or two, excerpted from a longer article. They will have opposing opinions on the same topic. The author of each one will be credited.

If the topic is gentrification, they might look like this:

“One of the most significant benefits of gentrification is the improvement of housing. Ordinarily, housing presents enormous challenges in the management of urban centers. Therefore, gentrification seems to solve this challenge because it favors the improvement of housing within the gentrified community. In addition, it is believed to stabilize declining areas. In most cities, suburban areas are known to experience degradation leading to the emergence of slums. This phenomenon is caused by the increased strain on urban infrastructure and services. Therefore, gentrification addresses an array of urban management challenges by reducing suburban sprawl and strain on the existing infrastructure.

Another positive effect of gentrification is the increase in property values. As a result, property owners reap high income from real estate investment, and this serves as a means of attraction for potential businesses. It is also suggested that gentrification leads to a significant increase of local fiscal revenues. Moreover, gentrification has led to the rehabilitation of property with little state sponsorship. Therefore, an increase in property values and local fiscal revenues promote economic development of gentrified areas. Economic development is also enhanced by an increase in purchasing power in the centralized economy, although it is uneven.

It is also believed that gentrification leads to increased social mix and reduction in crime rates. This phenomenon has been evidenced in gentrified cities such as London, Atlanta and Washington, DC.”

-Caroline Mutuku

and:

Gentrification usually leads to negative impacts such as forced displacement, a fostering of discriminatory behavior by people in power, and a focus on spaces that exclude low-income individuals and people of color.

During gentrification, poorer communities are commonly converted to high-end neighborhoods with expensive housing options such as high-rises and condominiums. As property prices increase, the original residents of the neighborhood are forced out in a variety of ways. First, with an increase in the prices of buildings, the gap between the price of the building and the income that the landlord gets from renting the building grows bigger; landlords thus increase rent prices, which forces out the low-income residents. As building prices continue to increase, the problem exacerbates because it becomes even more profitable to convert these apartment buildings into non-residential areas. Additionally, since investors can earn more money from selling buildings, real-estate dealers have less incentive to improve the buildings. The real estate dealers instead sell the buildings at higher prices. This cycle of rising building prices continues until only large and well-financed investors are able to continue.

Displacement… is disproportionately borne by low-income individuals of color, many of whom are elderly individuals.  Physical frailty makes it more challenging for elderly individuals to resist the actions that landlords take to remove tenants. Researchers have also found that elderly people are more intensively affected by social changes around them; for example, many older adults cited loss of friendships or community networks as a reason to move. 

-Emily Chong

How to Structure the Essay

The structure is fairly easy. Write an introduction that provides some background on the topic and a clear thesis statement that states your opinion on the topic. Then write two or three body paragraphs. Each one should focus on a specific argument in support of your thesis or on the rebuttal of a specific point in the article you don’t agree with. Finally, write a conclusion that sums up what you’ve just written. Aim for 400 to 600 words in total. Easy, right?

How to Get a Good Score

Getting a good score isn’t so easy.  To award you a high score, the rater needs to see an argument, but they also need to see the use of fairly sophisticated writing techniques.  The list below is drawn from the official ALP website, and a few other sources used in ALP classes at Columbia.

Remember that your essay must also quote from the sources when appropriate.

Remember, also, that in addition to this advanced stuff, your essay needs to show mastery of basic stuff.  That means basic transitions (therefore, however, in addition) and a mix of all three sentence types (simple, compound, complex).  You also need nearly perfect grammar to get a high score.

Sample Paragraphs

I can’t teach you the basic stuff here, but I can show you examples of the advanced concepts mentioned above.

Here’s a sample paragraph from an essay I wrote about mental health.  I’ve underlined parts that use the above techniques.  In order, they are: parallel structure, using the article, appositive, noun clause in subject position, inversion. 

Young people are able to discuss their mental health challenges with others, and are willing to reach out for help when necessary. As the article by Smith indicates, 62% of millennials are comfortable with this. Proof is easy to find. Many organizations have taken up the suggestion of the Center for Workplace Mental Health and created departments which help workers cope with issues as they arise. In addition, employee benefits now include financial support for outside counseling and psychological care.  Even more indicative of this trend  is the recent emergence of businesses which profit from the desire that young people have to discuss their mental health. Several new smartphone apps, services jokingly referred to as “Uber for Counseling,” have made a lot of money connecting people with therapists. With just a few clicks, we can be connected with a therapist and receive their assistance via voice or text. The benefits are clear; when people are willing to talk about issues that challenge them, and there are people willing to listen to them, they can be given strategies that mitigate the negative effects or perhaps eliminate the issues altogether. Rarely do people today find themselves in an environment where they have absolutely no one to turn to.  This is quite a shift from even just a few decades ago, when sufferers of mental illness often felt lost at sea.

Next is part of a paragraph about reparations.  I’ve underlined an example of fronting, and an example of an appositive.  Note the extensive quotes from the article, which are integrated into my own sentences.

While long-term solutions to today’s problems must certainly involve political and economic changes, the political and economic systems are slow to change. With great enthusiasm, conservative journalist Frank Williamson says that “the political interests of African Americans… are best served by equality under the law.” Williamson, an experienced political writer, knows that politicians have been working towards “equality under the law” for decades, and are still far from achieving it.

Here is an introductory paragraph from an essay about inclusive language.  Note how I’ve underlined a parallel structure, fronting, and another parallel structure.  Note that I ended with a clear thesis statement.

They say that people change over time, and that language changes along with them. Nowadays, thanks to the spread of the Internet, language seems to be changing at a more rapid pace than ever before. Rarely do we go a week without reading an article or seeing a social media post that uses a term or phrase that is totally new to us. Many of us want to be supportive of marginalized groups, and we want to express our opinions clearly without being lost in a sea of jargon. Personally, I feel that our choice of words is very important, but we must be careful to avoid being overly judgmental of people who can’t keep up with the newest words.

Wrapping Up

Okay, so that’s a broad look at what the ALP test looks like and what you need to do.  For more help, or tutoring, feel free to contact me. To keep up with the latest changes to this test, contact Columbia University.

 

Amazon is now shipping copies of the new Official Guide to the TOEFL. As noted a few days ago, the guide no longer contains certain long-running inaccuracies, so it’s probably a good time to record the Saga of the Altruism Question.

In late 2005 the first edition of “The Official Guide to the New TOEFL iBT” was published. It contained numerous inaccuracies. One can’t really blame the writers, as they compiled the book before the test launched. It brings to mind those early Star Trek: The Next Generation paperbacks where Troi calls Riker “Bill” and Tasha Yar has long hair.

The most notable errors were two depictions of the integrated writing task. One about group work (contained in the chapter about the writing section), and one about altruism (found in the practice set). I can go into details in the comments if you like, but basically this question has a very specific form and neither of the samples followed it.

Sadly, these two questions also appeared in the second edition, published in 2006.

By this time, third-party publishers were releasing their own TOEFL prep books. And here’s the thing: they naturally based their books on the contents of the Official Guide. As a result, every single one of them contained terrible integrated writing questions.

For the most part, major publishers are averse to spending money, so these errors remained in the books for ages. Kaplan included terrible integrated writing questions in their famous purple books right to the day they discontinued them. Princeton Review added a horrific new integrated writing question to the 2024 edition of their TOEFL book. If you squint at it long enough you’ll notice that it was inspired by the Official Guide.

Had the original book contained proper questions, this problem could have been avoided.

Anyway, the bad questions remained in the third edition, which was published in 2009.

By this time I was teaching TOEFL. At least twice a week someone would send me a practice essay based on the famous altruism question and ask me to grade it. Every time I’d politely explain that even though the question came from the Official Guide, it wasn’t accurate and it would be a waste of their time and money to have me check it. Fifteen years later, I still have to explain that a few times a month.

The questions remained in the fourth edition, published in 2012. By this time ETS had licensed dozens of retired tests to New Oriental, so the proper format was widely known.

The questions remained in the fifth edition, published in 2017.

Teachers were hopeful that the sixth edition, published in 2021, would not contain these faulty questions, given that the book required radical revisions to match the 2019 changes to the test. Sadly… they appeared once more.

But hey.  It’s 2024 now.  Nineteen years have passed.  The bad questions have finally been removed from the book.

Hey, so I’ve reduced the price of my TOEFL ebook to 99 cents on Amazon. The book covers the TOEFL iBT writing section and contains some of the best practice questions and answers I’ve created over the years. It also collects a bunch of the grammar articles that appeared on this blog before August of last year. If that seems like the sort of thing that might interest you, do pick up a copy on Amazon. The book is no longer part of the Kindle Unlimited program, so even Prime members can buy a copy. It is also available in paperback, but obviously that’s a bit more expensive. Buying a copy will help me fulfill my dream of having the best-selling TOEFL book on Amazon, where I’m currently #6.

Anyone preparing for the TOEFL (or helping people prepare for it) ought to know that John Healy has integrated the e-rater into his My Speaking Score platform.  Anyone who has purchased some amount of SpeechRater credits can now submit an unlimited number of essays and receive an e-rater score.  This service is currently in beta, but is still pretty useful.  I’ve already received a couple of “are you seeing this?!” messages.  I’ve written quite a lot about the move from pricey TOEFL tutoring to affordable self-guided prep, and I suppose this will accelerate the move.

As I said, you can gain access by purchasing any package of SpeechRater credits.  Try the coupon code TESTRESOURCES to save 10% on that purchase.


New this month from John Norris and Larry Davis is a detailed comparison of the old TOEFL Independent Writing Task and the new TOEFL Writing for an Academic Discussion Task.

It notes that among test-takers who completed both tasks in operational settings, 50% received the same score (from 0 to 5) on each from a single human rater. 47% received a score that was +/- one point on the same scale.

Furthermore, the article notes: “We saw no difference in terms of the measures of cohesion that we evaluated, and overall very few differences in terms of specific measures of syntactic complexity, grammaticality and mechanics, or word use.”

Some differences were noted, though. According to the article:

“A few linguistic measures differed across tasks in a manner that may suggest a slightly greater orientation toward academic register in the IND writing task. These measures included slightly greater use of academic vocabulary, as well as somewhat longer noun phrases and clauses, features typical of academic writing. On the other hand, responses to the WAD task showed marginally higher lexical density (relative frequency of content words) and somewhat fewer word usage errors, both of which may be associated with shorter responses.”

Also, the Writing for an Academic Discussion task elicited more writing per unit of time.

It is worth noting that e-rater scores were not available for most of the WAD responses studied in this report, as automated writing scoring was only added to the TOEFL Essentials Test in late 2022. I’d like to see more research into how human scores for WAD tasks compare to e-rater scores for the same. Also: given the change to the test, it may be a good time for a follow-up to the research done by Brent Bridgeman, Catherine Trapani and Yigal Attali in 2012 about the possibility that machine scores can differ (in terms of their closeness to human scores) for certain gender, ethnic and country groups.

It has been argued that because the new “Writing for an Academic Discussion” task requires test takers to write fewer words overall, every grammar or language use mistake they make will have a greater impact on their score. I figured I would run a few tests in the e-rater to see if this is the case.

First up, this response got a score of 5.0:

“I like the ideas noted by Claire and Kelly, but I feel that the only way to truly solve this problem is to build better schools.  Parents want their children to be educated at the best possible places. It is almost impossible to find great schools in the country. The government currently has money for good things like science laboratories and nice libraries in cities so they should not ignore rural areas all the time. When rural schools like the one I attended lack even basic educational supplies like computers and sports equipment, parents who are concerned about their kids go to bigger places.”

If I submit the same response with all of the commas removed I get a score of 4.0.

If I submit the original response but with ONE spelling mistake (truely) I get a score of 4.0.

So does a small number of individual errors have the potential to impact the score of a test taker?  It would seem so.

However, it isn’t as clear cut as it seems. While the human raters give only whole number scores, it seems like the e-rater is using decimals under the hood. I imagine that my original response is a “low 5” answer (maybe just a 4.5, rounded up).

Here is an answer where I retain the spelling mistake, but beef up my vocabulary usage:

“I respect the ideas noted by Claire and Kelly, but I feel that the only way to truely solve this problem is to construct better schools.  Parents want their children to be educated at the best possible facilities. It is almost impossible to find great schools in the countryside. The government currently provides funding for beneficial things like science laboratories and lavish libraries in cities, so they should not ignore rural areas all the time. When rural schools like the one I attended lack even basic educational supplies like computers and sports equipment, parents who are concerned about their kids depart for bigger places.”

This time, my score goes back to 5.0.  I can use the same technique to overcome the penalty for not using commas.  Actually, I can use the same technique to overcome the problem of not using conjunctions, which I have written about elsewhere.  I bet it works both ways.

Decimals, right?  I gain a few tenths of a point for improving my vocabulary and that compensates for the few tenths of a point I lost for the spelling mistake.  We’ve seen a similar thing in some implementations of the SpeechRater AI.
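To make the rounding idea concrete, here is a tiny sketch of how a hidden decimal score could produce the whole-number results I saw. Everything in it is an assumption on my part: ETS does not publish how the e-rater weighs features, so the base score, the spelling penalty and the vocabulary bonus are invented numbers, and `round_half_up` is just a helper I made up for the illustration.

```python
# Hypothetical illustration only. The e-rater's internals are not public,
# and every number below is invented to show how rounding could behave.

def round_half_up(tenths: int) -> int:
    """Round a score stored in tenths of a point (45 means 4.5) to a whole number."""
    return (tenths + 5) // 10

BASE = 45              # assumed hidden score of the original response (4.5)
SPELLING_PENALTY = 3   # assumed cost of one misspelled word (0.3)
VOCAB_BONUS = 3        # assumed gain from stronger vocabulary (0.3)

original = round_half_up(BASE)
with_typo = round_half_up(BASE - SPELLING_PENALTY)
typo_plus_vocab = round_half_up(BASE - SPELLING_PENALTY + VOCAB_BONUS)

print(original, with_typo, typo_plus_vocab)  # prints: 5 4 5
```

Scores are kept in tenths as integers so that the rounding is exact. If the real engine works anything like this, a response sitting just above a rounding boundary can drop a whole point after one small error, while a response with more room to spare would not.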

Anyway.  This might be useful for people preparing for the test.  Test-takers ignore the quirks of automated scoring at their own peril.

What do test-takers need to perform well on the new TOEFL “writing for an academic discussion” task?

Conjunctions, I guess.

The scoring rubric for this task hints at this when it says that a high-scoring answer contains “effective use of a variety of syntactic structures.”

I started today’s experiments by submitting a response consisting of only simple sentences. It certainly lacked syntactic variety, and received a score of 4.0 from the AI scoring engine:

I like the ideas noted by Claire and Kelly. I feel that the only way to truly solve this problem is to build better schools.  Parents want their children to be educated at the best possible places. It is almost impossible to find great schools in the country. The government currently has money for good things like science laboratories and nice libraries in cities. They ignore rural areas all the time. Rural schools like the one I attended lack even basic educational supplies like computers and sports equipment.  Parents who are concerned about their kids go to bigger places.

I tried to beef it up with some impressive vocabulary but it still scored 4.0:

I respect the ideas noted by Claire and Kelly. I feel that the sole way to truly resolve this dilemma is to construct better schools.  Parents want their children to be educated at the best possible facilities. It is almost impossible to find impressive schools in the country. The government currently has money for valuable things like science laboratories and fantastic libraries in cities. They ignore rural areas all the time. Rural schools like the one I attended lack even basic educational supplies like computers and sports equipment.  Parents who are concerned about their kids go to bigger places.

I wrote a really long version (197 words) but still scored 4.0:

I like the ideas noted by Claire and Kelly. I feel that the only way to truly solve this problem is to build better schools. Schools are the bedrock of all academic systems.  Parents want their children to be educated at the best possible places. It is almost impossible to find great schools in the country. The country only has a limited range of places to learn. The government currently has money for good things like science laboratories and nice libraries in cities. These facilities give adolescents there an advantage. It is clear to see that they learn much more. Their test scores on university entrance exams are the highest in the entire nation. This means they can attend the best universities. The government ignores rural areas all the time. Rural schools like the one I attended lack even basic educational supplies like computers and sports equipment. Students at those schools perform poorly. They struggle to learn advanced concepts. I think they have poor reading comprehension as well. They are unable to go to good universities. It is likely that they don’t enjoy impressive careers either.  Parents who are concerned about their kids go to bigger places.

I’ll spare you the pain of reading it, but a long version with better vocabulary also scored 4.0.

In an effort to insert some syntactic variety, I wrote a version of the original answer with two different coordinating conjunctions (but, but, so). It also got 4.0:

I like the ideas noted by Claire and Kelly, but I feel that the only way to truly solve this problem is to build better schools.  Parents want their children to be educated at the best possible places, but it is almost impossible to find great schools in the country. The government currently has money for good things like science laboratories and nice libraries in cities. They ignore rural areas all the time. Rural schools like the one I attended lack even basic educational supplies like computers and sports equipment, so parents who are concerned about their kids go to bigger places.

And when I used three different coordinating conjunctions (but, yet, so)? It still scored 4.0:

I like the ideas noted by Claire and Kelly, but I feel that the only way to truly solve this problem is to build better schools.  Parents want their children to be educated at the best possible places. It is almost impossible to find great schools in the country. The government currently has money for good things like science laboratories and nice libraries in cities, yet they ignore rural areas all the time. Rural schools like the one I attended lack even basic educational supplies like computers and sports equipment, so parents who are concerned about their kids go to bigger places.

Next, I wrote a version with two different coordinating conjunctions (but, so) and one subordinating conjunction (while) and it scored 5.0:

While I like the ideas noted by Claire and Kelly, I feel that the only way to truly solve this problem is to build better schools.  Parents want their children to be educated at the best possible places, but it is almost impossible to find great schools in the country. The government currently has money for good things like science laboratories and nice libraries in cities. They ignore rural areas all the time. Rural schools like the one I attended lack even basic educational supplies like computers and sports equipment, so parents are concerned about their kids and go to bigger places.

Finally!

It still scored 5.0 when I moved the conjunctions around a bit:

I like the ideas noted by Claire and Kelly, but I feel that the only way to truly solve this problem is to build better schools.  Parents want their children to be educated at the best possible places. It is almost impossible to find great schools in the country. While the government currently has money for good things like science laboratories and nice libraries in cities, they ignore rural areas all the time. Rural schools like the one I attended lack even basic educational supplies like computers and sports equipment, so parents who are concerned about their kids go to bigger places.

And it still scored 5.0 when I used a different subordinating conjunction (when):

I like the ideas noted by Claire and Kelly, but I feel that the only way to truly solve this problem is to build better schools.  Parents want their children to be educated at the best possible places. It is almost impossible to find great schools in the country. The government currently has money for good things like science laboratories and nice libraries in cities, so they should not ignore rural areas all the time. When rural schools like the one I attended lack even basic educational supplies like computers and sports equipment, parents who are concerned about their kids go to bigger places.

So there ya go. If you are planning to take the new TOEFL after July 26, make your writing a bit more sophisticated by including both types of conjunctions. That could certainly show off your ability to make “effective use of a variety of syntactic structures.”

One takeaway of this research is that a possible “magic formula” for the new writing task is: 100+ words, one subordinating conjunction, two coordinating conjunctions.
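If you want to check a practice response against that formula, a rough sketch like the one below might help. It is not how the e-rater works. The word lists are my own and incomplete, the function name `check_formula` is made up, and I am assuming that "one" and "two" mean "at least one" and "at least two." To avoid counting phrase-level uses such as "computers and sports equipment," it only counts a coordinating conjunction when it follows a comma, which is a crude stand-in for joining two clauses.

```python
import re

# Heuristic patterns. Neither list comes from ETS, and both are incomplete.
COORDINATING = r",\s+(?:for|and|nor|but|or|yet|so)\b"
SUBORDINATING = r"\b(?:although|because|since|unless|whereas|while|when|though|if)\b"

def check_formula(text: str, min_words: int = 100) -> dict:
    """Count words and conjunctions, then compare them with the 'magic formula'."""
    words = len(text.split())
    coordinating = len(re.findall(COORDINATING, text, flags=re.IGNORECASE))
    subordinating = len(re.findall(SUBORDINATING, text, flags=re.IGNORECASE))
    return {
        "words": words,
        "coordinating": coordinating,
        "subordinating": subordinating,
        # Assumed reading of the formula: at least 100 words,
        # at least one subordinating and two coordinating conjunctions.
        "meets_formula": words >= min_words
        and subordinating >= 1
        and coordinating >= 2,
    }

sample = "While I like the idea, I disagree, but schools matter, so parents move."
print(check_formula(sample))
```

The short sample has the right mix of conjunctions but far too few words, so `meets_formula` comes back `False`. Passing the check says nothing about grammar, vocabulary or relevance, so treat it as a quick self-check rather than a score predictor.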

You can find all of my submissions on this page.

To learn more about the impact of word count on automated scoring of “writing for academic discussion” prompts that will be included on the TOEFL on July 26, I spent some time answering the sample questions provided by ETS. This sort of experimentation is mildly important, I think, as many test-takers (and tutors) hold the idea that only really long responses get high scores on the current TOEFL. This sometimes results in the creation of monster-sized TOEFL essays.

What did I learn?

Happily, the automated scoring system (e-rater) gave me a perfect score of 5.0 for the following 102-word response to sample question one (about how to repopulate the countryside):

“While I appreciate the solutions presented by Claire and Kelly, I feel that the only way to truly solve this problem is to construct better schools.  Parents want their children to be educated at the best possible facilities, but it is almost impossible to find impressive schools in the countryside. Although the government currently provides funding for amenities like science laboratories and lavish libraries in cities, they neglect rural areas. Rural schools, like the one I attended, lack even basic educational supplies like computers and sports equipment.  Consequently, parents who are concerned about their kids head for greener pastures, so to speak.”

The question prompt recommends writing about 100 words, so I’m happy. Students can confidently follow the given instructions, I guess.  

I fiddled about with a series of shorter (but similar) answers and was able to get a score of 4.0 for the following 49-word response:

“While I appreciate your ideas, I think we need better schools.  Parents want their children to utilize excellent facilities, but it’s impossible to find impressive schools in the countryside. Although the government currently funds amenities like libraries in cities, they neglect rural areas.  Consequently, parents head for greener pastures.”

That’s as low as I could go and still get a score of 4.0.

But I could get a score of 3.0 for the following 32-word response:

“While I appreciate your suggestions, we need better schools.  Children need excellent facilities, but the countryside lacks them. Although the government funds academics in cities, they neglect rural areas.  Consequently, parents leave.”

Anything lower than that resulted in a score of 2.0 or less.

A few things are worth mentioning:

  1. The automated score will be combined with a human score on the real test.
  2. Obviously word count correlates with other features like range of vocabulary and number of grammatical features.
  3. The above scores encompass a whole range of scores once they are scaled up. A score of 4.0 from the e-rater could scale up to anything from 21 to 26. My sample would likely be on the low end of this range. It would be nice to get decimals from the ETS website.

You can find a record of everything I submitted along with the e-rater scores over here.