I just uploaded the 2025 versions of my TOEFL writing templates to YouTube!  Below is the video.  Scroll down for the templates themselves and some explanations.

So I did things a bit differently in 2025.  For starters, I produced unique templates for all three of the integrated writing “styles” – casting doubt, problems and solutions, solutions and problems.  Basically, all three styles have the same body paragraphs, but slightly different introductions.

TOEFL Integrated Essay

Here’s the full template for the “casting doubt” style integrated essay:

  • The reading and the lecture are both about ______________.
  • While the author of the article argues that _________, the lecturer disputes the claims presented in the article.
  • His position is that ____________. 

 

  • First, the author argues that ______________.
  • The article states that ______.
  • The lecturer’s response is that _________.
  • Moreover, ____________. 

 

  • Second, the author mentions that ______________.
  • According to the article, ______.
  • In contrast, the lecturer argues that ____________.
  • He notes that _______. 

 

  • Finally, the article notes that ______________.
  • The author suggests that ______.
  • However, the lecturer points out that ______.
  • He says ____________.

I don’t think you need a conclusion.

If you get a problem/solution style question you can swap in this introduction:

  • The reading and the lecture are both about ______________.
  • While the article describes three associated problems, the lecturer suggests possible solutions to each of them.
  • His position is that ____________.

And if you get a solution/problem style question you can use this one:

  • The reading and the lecture are both about ______________.
  • While the article describes three solutions to this problem, the lecturer explains why they aren’t effective.
  • His position is that ____________.

TOEFL Academic Discussion Question

Next, for the academic discussion question I decided to greatly simplify things by producing a single scaled-down template for both the preference and open-ended styles.  Here ya go:

  • While I appreciate the points made by both ______ and ____, I strongly believe that _____.
  • This is because _____.
  • For example, ______.
  • Some people may argue that _____, but they overlook the fact that ______.

Pearson’s Sarah Hughes gave a great webinar yesterday called “The Role of Automated, AI, and Human Scoring” (link).  She discussed the ways in which Pearson spots what they call “Topic Templates” in the writing section of the PTE and what is done after they are detected. It was my first time hearing the term “topic templates” (as opposed to just “templates”), which is meant to distinguish between the memorization of typical discourse phrases and the memorization of a whole lot of generic junk into which a few topic-related phrases are plugged.

Pearson’s primary line of defense against topic templates seems to be a database of such templates created by a bot that crawls the web now and then.  When signs of a template are detected in a student response, that response is sent to a human rater (or raters?) to determine if the answer is acceptable or should be given a score of zero.  Note that most of the time human raters are not used to score the PTE writing section, which is handled entirely by AI.

I know I sound like a broken record every time this comes up, but I’d love to hear more about the detection of templates that aren’t widely circulated online.  I’ve linked to it a million times, but Sugene Kim’s article about preparing for the TOEFL test “Gangnam Style” is required reading for anyone interested in this topic.

The article describes how, here in Korea, students prepping for a test hire a hotshot tutor – one mentioned in Kim’s article is nicknamed “The Writing Sniper” – to craft a handful of bespoke templates just for them.  The students don’t get just the templates, of course, but also receive weeks of lessons about how to use them most effectively.  Needless to say, the templates don’t show up in any databases possessed by the testing firms.

When Kim’s article was published I had some fun creating my own topic templates for this task.  They fooled a lot of people with experience in the industry. You can read about my fun in a four-part series of blog posts starting over here.

The Gangnam approach somewhat suited the old TOEFL independent writing task, which had a lot of pretty formulaic questions like:

“Do you agree or disagree with the following statement? It is better to use printed materials such as books and articles to do research than it is to use the internet. Use specific reasons and examples to support your answer.”

It’s a bit less suitable for the IELTS, where the questions are sometimes (but not always) a bit more intricate.  Like:

“In spite of the advances made in agriculture, many people around the world still go hungry.  Why is this the case?  What can be done about this problem?”

The PTE also has somewhat intricate questions like:

“Tobacco,  mainly in the form of cigarettes, is one of the most widely-used drugs in the world.  Over a billion adults legally smoke tobacco every day.  The long term health costs are high – for smokers themselves, and for the wider community in terms of health care costs and lost productivity.  Do governments have a legitimate role to legislate to protect citizens from the harmful effects of their own decisions to smoke, or are such decisions up to the individual.”

Obviously the effectiveness of a topic template is blunted by the various aspects that PTE and IELTS prompts require the test taker to touch on.  The PTE adds a firm word count limit to the mix, which could further complicate things.

But the tests are still vulnerable.  The hotshots are good at what they do.  As they should be, considering the sky-high fees they sometimes command.

With that said, one is left wondering how the test makers detect templates that are not widely shared online and which are not reused by multiple students across numerous test administrations.  Without a relevant database and without human raters who might notice their stilted nature, Pearson likely requires an AI solution specifically designed to spot them.  Is that part of the mix?

The continued use of human raters to score the IELTS might be an advantage, but I haven’t heard much from the IELTS partnership on this topic, even when it comes to widely circulated templates.  Do they supplement the expertise of their humans with a database of templates each essay is compared to?  Or are they entirely dependent on humans?  Humans are good… but are they good enough to beat The Writing Sniper?  That’s unclear.

I think it is also worth mentioning that such templates are best used by students with intermediate language skills who want to pass themselves off as advanced students. They have the ability to “fill in the blanks” of the templates not just with a topic keyword, but with decent clauses or sentences. This makes detection trickier than you might expect.

Questions of equity linger in the back of my mind, as well.  Do current detection methods focus on low-hanging fruit?  Are they good at detecting low or no-budget test preppers who use stuff they find on social media but poor at stopping the techniques favored by preppers with a few thousand bucks to spend before taking a test?

Do share your own thoughts, if you have a moment.  More on this in the future.

Express scoring for the TOEFL is available again.  Test takers who pay a fee of $149 will receive their scores within 24 hours of taking the test.  Otherwise, scores are reported in 4-8 days. This option first appeared near the end of 2024, but was quickly withdrawn. At that time the fee was $99.

Interestingly, this option only appears when the test is to be taken at a test center. I don’t see it when attempting to book an at-home test.

Having this option is better than not having it.  But note that IELTS scores are now delivered in 1-2 days without an extra fee and that Pearson promises to deliver PTE scores in 2 days without asking for any additional payment either.

There is a wonderful new article in Language Testing Journal by Emma Bruce, Karen Dunn and Tony Clark which explores test score validity periods for high-stakes tests.  It isn’t in open access, though, so you’ll need institutional access or a healthy billfold to read it.

As most readers know, institutions and regulatory bodies generally won’t accept scores from tests taken more than two years ago.  This is based on research and advice from test makers, though the authors note that:

“While the role of test providers and language testing researchers is not to set the policy for test score use, it is becoming apparent that the messaging surrounding validity periods may benefit from consideration through a contemporary lens. While it is certain that test developers have a responsibility to communicate the idea that the fidelity of a test score in reflecting test-takers’ language proficiency may change over time depending on the circumstances of the test-taker in the period between taking the test and using the score, it is of equal import to communicate–especially to policymakers–the possibility of adapting the 2-year requirement according to risk or need in any given setting.”

Unmentioned is the fact that even if institutions desire to accept scores that are older than two years, it can be exceptionally difficult to actually receive those scores.  Correct me if I’m wrong, but I believe that none of the big four tests (TOEFL, IELTS, PTE and DET) allow test takers to send scores to recipients more than two years after a test date. In this way, it seems like the test makers are semi-enforcing a two-year validity period. I can’t even view the scores from my 2022 attempt at the TOEFL within my account on the ETS website.

After I return from my holiday, I will probably take the Duolingo English Test. Let me know if there is anything I should keep an eye out for. I’ve taken this test in the past, but not since the secondary camera requirement was introduced. I’m curious to see how that feels. I haven’t experienced the latest round (several rounds?) of item revisions either.

I’d like to take the TOEFL Essentials Test. A few days ago I was caught flat-footed when someone asked me to help them prep for it. And I fear that one day the test will disappear and I’ll miss my chance.

I’m curious about how Pearson does at-home testing.

Leave a comment if there are any other tests I should try to check out. In 2024, I took the following tests:

  • Password Plus
  • Skills for English SELT
  • TOEFL iBT
  • PTE Core
  • PTE Academic
  • EnglishScore (all 3)
  • MET
  • LANGUAGECERT Academic

IDP Education has joined the discussion on “templated responses.” Australasia/Japan head Michael James noted in an article shared to LinkedIn that:

“AI’s role in high-stakes language testing has gained attention recently, particularly after a computer-marked test revised its scoring process to include human evaluators. This change has ignited a debate on this platform about a computer’s ability to identify templated responses.”

James points out that:

“The importance of human marking in high-stakes English language assessment cannot be overstated. IELTS examiners are highly trained language experts who bring a nuanced understanding and contextual awareness that AI systems lack. They can discern not only the grammatical correctness and structural integrity of a response, but also the underlying intent, creativity, and coherence of the content. This real-time, human-centred approach aims to reveal a student’s true abilities and potential.”

His article refers to the “cautiously curious approach,” a phrase the IELTS partnership has used in the past to describe its stance on AI.

There is more worth quoting here, but it is probably best to check it out yourself at the link above.

Moving forward, I would love to hear more about the humans who do this sort of work. Not just the humans who rate IELTS responses, but those who rate responses in all sorts of tests. Who are they? What makes them “highly trained experts”? How do they discern X, Y, Z? Are they under pressure to work quickly? These are questions asked not only by score users, but (more importantly and more frequently) by test takers themselves.

Wrapping up this series on “templated responses” I want to share a few paragraphs recently added to the ETS website (via the new FAQ for TOEFL):

Some test takers use templates in the speaking or writing sections of the TOEFL iBT test. It can be considered a kind of “blueprint” that helps test takers organize their thoughts and write systematically within the limited time when responding to the speaking section or composing an essay.

However, there are risks to using templates. They can be helpful to establish a general structure for your response, but if they do more than that, you’re probably violating ETS testing policies. The test rules are very strict that you must be using your own words in your responses, and not those from others, or passages that you have previously memorized.

ETS uses detection software to identify writing passages that are similar to outside sources or other test takers’ responses. If you use a template, there is a high probability of providing responses similar to those of other test-takers. So we strongly recommend that you produce your own responses during the test itself.

This is significant, as it is perhaps the first time ETS has directly referenced “templates” in a communication to test takers.  The TOEFL Bulletin has long contained references to “memorized content,” but that’s not quite the same thing.

The verbiage may be tricky for some to fully grasp.  Templates are “helpful to establish a general structure” but test takers “must be using [their] own words” in their responses.  When does a template cross the line from being helpful to being a violation of ETS testing policies?  That’s not immediately clear.

However, John Healy reminded me to check out “Challenges and Innovations in Speaking Assessment,” recently published by ETS via Routledge.  In an insightful chapter, Xiaoming Xi, Pam Mollaun and Larry Davis categorize potential uses of “formulaic responses” in speaking answers and how they should be viewed by raters.  Those categories are:

  1. Practiced lexical and grammatical chunks
  2. Practiced generic discourse markers
  3. Practiced task type-specific organizational frames
  4. Rehearsed generic response for a task type
  5. Heavily rehearsed content
  6. Rehearsed response

I think these categories are self-explanatory, but here are a few quick notes about what they mean:

  1. Formulaic expressions (chunks of sentences) stored in the test taker’s memory.
  2. Stuff like “in conclusion” and “another point worth mentioning…”
  3. Stuff like “The university will ___ because ____ and ____.  The man disagrees because ___ and ___.”
  4. Like category three, but without blanks to be filled in.
  5. Like number 1, but the content “[differs] from formulaic expressions present in natural language use.”  This content is “produced with little adaptation to suit the real task demands.”
  6. A response that is “identical or nearly identical to a known-source text.”

Dive into the chapter for more detailed descriptions.

It is recommended that categories 2 and 3 be scored on merit, that category 1 be scored on merit if the chunks do not match known templates, that spontaneous content in categories 4 and 5 be scored on merit (if any exists), and that category 6 be scored as a zero.
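For readers who think in code, here is a minimal sketch of how I read those recommendations. It is only my paraphrase of the summary above, not anything ETS publishes or runs. The names `FormulaicCategory` and `scoring_guidance` and the two boolean flags are all made up for illustration. Note that my summary doesn't say what happens when category 1 chunks do match a known template, so the sketch leaves that case open rather than guessing.

```python
from enum import IntEnum


class FormulaicCategory(IntEnum):
    """The six categories listed above (names shortened; hypothetical)."""
    LEXICAL_GRAMMATICAL_CHUNKS = 1
    GENERIC_DISCOURSE_MARKERS = 2
    ORGANIZATIONAL_FRAMES = 3
    REHEARSED_GENERIC_RESPONSE = 4
    HEAVILY_REHEARSED_CONTENT = 5
    REHEARSED_RESPONSE = 6


def scoring_guidance(category, matches_known_template=False,
                     has_spontaneous_content=False):
    """Return a plain-English note on how a response might be handled.

    This mirrors my reading of the recommendations, nothing more.
    """
    category = FormulaicCategory(category)
    if category in (FormulaicCategory.GENERIC_DISCOURSE_MARKERS,
                    FormulaicCategory.ORGANIZATIONAL_FRAMES):
        return "score on merit"
    if category == FormulaicCategory.LEXICAL_GRAMMATICAL_CHUNKS:
        if matches_known_template:
            # Assumption: left open because the summary does not cover it.
            return "not covered by the summary"
        return "score on merit"
    if category in (FormulaicCategory.REHEARSED_GENERIC_RESPONSE,
                    FormulaicCategory.HEAVILY_REHEARSED_CONTENT):
        if has_spontaneous_content:
            return "score only the spontaneous content on merit"
        return "no spontaneous content to score on merit"
    return "score as zero"


if __name__ == "__main__":
    for cat in FormulaicCategory:
        print(cat.value, cat.name, "->", scoring_guidance(cat))
```

Writing it out this way shows where the hard part is. The decision rules are simple, but every branch depends on a rater (or a tool) first deciding which category a response belongs to.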

Very sensible guidelines.  But putting them into use?  The chapter notes:

“Unfortunately, the guidelines did not remove the challenge of detecting what particular language has been prepared in advance. Certain types of delivery features may be suggestive of memorized content, such as pausing or other fluency features, or use of content that appears ill-fitting or otherwise inappropriate. Nonetheless, it can be quite challenging to detect formulaic responses, especially if the response references a previously unknown source text.”

According to the chapter, ETS experimented with supplying raters with examples of memorized content to refer to while making decisions, but that didn’t work out for a variety of reasons. Raters became overly sensitive to memorized content. The rating process became too slow.

The authors wrap up the chapter by making a few suggestions for future study, including redesigned item types and AI tools.

To me, AI tools are a must in 2024, both in terms of correctly identifying overly gamed responses and avoiding false positives.  A quick glance at Glassdoor reviews suggests that response scorers (of a variety of tests) are often low-paid workers who sometimes feel pressure to work quickly.  Tools that help them work more efficiently, accurately and swiftly seem like a good idea.

It is worth sharing a few notes from a long article published by Pearson in the summer.  It provides more information about the topics discussed in Jarrad Merlo’s webinar about the introduction of human raters to the PTE tests.

The article describes how a “gaming detection system” has been developed to aid in the evaluation of two of the speaking questions and one of the writing questions on the PTE. This system gives each response a numerical score from 0 to 1, with 0 indicating the complete absence of gaming and 1 indicating significant evidence of gaming.

This numerical approach seems wise, as the lines between acceptable and unacceptable use of templated language in a response are sometimes blurred.  In a future post, I’ll summarize some research from ETS that discusses this topic.

Meanwhile, Pearson’s article notes that:

“While more rudimentary systems may rely on a simple count of words matching known templates, PTE Academic’s gaming detection system has been designed to consider a number of feature measurements that quantify the similarity of the response to known templates, the amount of authentic content present, the density of templated content, and the coherence of the response”
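To make the idea of “density of templated content” a bit more concrete, here is a toy sketch. To be clear, this is not Pearson’s system, whose details aren’t public as far as I know. It is just one simple way such a feature could be approximated: the share of a response’s three-word sequences that also appear in a list of known templates. The function names (`tokenize`, `trigram_list`, `templated_density`) and the sample sentences are my own inventions. A real system would presumably combine this with the other measurements the quote mentions, like authentic content and coherence.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def trigram_list(tokens):
    """Return every run of three consecutive tokens, in order."""
    return [tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)]


def templated_density(response, known_templates):
    """Share of the response's trigrams found in any known template.

    Returns a value from 0.0 (no overlap) to 1.0 (every trigram matches).
    This is a toy measure, not the scoring used by any real test.
    """
    response_trigrams = trigram_list(tokenize(response))
    if not response_trigrams:
        return 0.0
    known = set()
    for template in known_templates:
        known.update(trigram_list(tokenize(template)))
    hits = sum(1 for gram in response_trigrams if gram in known)
    return hits / len(response_trigrams)


if __name__ == "__main__":
    # Hypothetical "database" containing a single template sentence.
    templates = [
        "I strongly believe that this viewpoint is outdated "
        "and quite useless in today's society"
    ]
    response = ("I strongly believe that this viewpoint is outdated "
                "because farmers earn more every year")
    print(round(templated_density(response, templates), 2))
```

Even this crude measure shows the weakness discussed above: it can only flag phrases that are already in the database. A bespoke template that has never been posted online would score zero.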

The article goes on to describe how the results of the AI checks are passed along to human raters, to aid in their decision-making regarding the content of responses.  It notes that the newly-implemented system:

“enables raters to make better informed content scoring decisions by leveraging the Phase I gaming detection systems to provide them with information about the about the [sic] extent of gaming behaviours detected in the response.”

That’s fascinating.  I’m not aware of another system where human raters can make use of AI-generated data when making decisions about test taker responses.

The article notes that human raters will not check all written responses for templated content.  Checks of most responses will be done entirely by AI that has been trained on a regularly-updated database of templates discovered via crawls of the web and social media.

A challenge with this approach that goes unmentioned is the difficulty of detecting templates that don’t show up on the public web.  In my neck of the woods, students pay high fees to “celebrity” test prep experts who create personalized templates that are neither shared publicly nor repeated by future test takers.  This came up in an article by Sugene Kim which I’ll share in the comments.

Perhaps Pearson should go whole hog and bring in human raters for some or all responses in the writing section as well.

More on this in the days ahead.

I have updated my TOEFL writing templates for 2021. In the attached video, you’ll find templates for both the independent and integrated essays.  I’ve adjusted them only slightly for this year… but I think they are a bit better than the 2020 versions.  I’ll probably make a video containing all of the 2021 speaking templates as well, so keep an eye on the channel.

Over the next few days I will adjust all of the static webpage articles so that they include the new templates.

Ha ha.  I am a TOEFL essay machine now.  This took about three minutes to create using my fake essay template, and I think it looks pretty decent.

The prompt is:

Do you agree or disagree with the following statement? It is better for children to grow up in the countryside than in a large city. Use specific reasons and examples to develop your essay.

The essay is:

A lot of people today think that we should live in the city.  However, I strongly believe that it is much better for kids to live in the country for two reasons.  First, it leads to a lot of great job opportunities.  Second, it vastly improves our health and wellbeing, which a lot of people are struggling with nowadays.  To be fair, a lot of older people have the traditional view that cities are the best place for young people to live.  That said, I think this viewpoint is outdated and quite useless in today’s society.

First, life in the countryside can improve our range of job opportunities in the future.  As I implied above, people my parent’s age (and older) think that living in the countryside is actually quite dangerous.  When I was young and they had a lot of influence over my world view, I actually had the same opinion.  At that time, I thought the lack of businesses in the country would actually make it harder for me to get a job, and so I was hostile toward it.  However, after I entered college and my social network broadened, I realized the unique benefits of rural life.  Now I realize that the presence of agriculture can help us find employment in high paying fields.  For example, my young cousin makes a lot of money because he works in a field related to growing organic crops.  His experience changed my perspective, and now I am focusing on farming at university in the hope of achieving the same thing.

Second, life in the countryside has a noticeable effect on our physical health and maybe even our mental health.  I actually read a story about this in the Village Voice Newspaper a few months ago.  It pointed out that if we properly use hiking trails we can avoid the poor health that a lot of people are dealing with nowadays.  The article claimed that 75% of Americans think that the best way of staying fit is making use of rural sports.  Medical experts who reviewed the study results agreed, and suggested that rural lifestyles will have an even greater impact in the future because of the clean air in the countryside.  Consequently, I strongly feel that benefiting from life away from crowded cities is a fantastic way to stay healthy.

In conclusion, I think that it is best for young people to live in the countryside.  This is because it can lead to gainful employment, and because it has a positive impact on our minds and bodies.

(you can also read parts one, two,  and four of this series!)

Okay, I’m having fun with the Gangnam style TOEFL template I generated yesterday.  This time I tackled the second prompt in my collection.  Obviously it has a lot of overlap, since both deal with the Internet.  Next time I think I will delete the final sentence from the introduction. It lays the template on a bit too thick.  I’ll replace it with nothing, and just jump to the body after the thesis statement.

The prompt is:

Do you agree or disagree with the following statement? It is better to use printed materials such as books and articles to do research than it is to use the internet. Use specific reasons and examples to support your answer.

The “fake essay” is:

A lot of people today think that using online materials for research is a bad idea.  However, I strongly believe that using the Internet for research is wise for two reasons.  First, it leads to a lot of great job opportunities.  Second, it vastly improves our health and wellbeing, which a lot of people are struggling with nowadays.  To be fair, a lot of older people have the traditional view that websites are unreliable.  That said, I think this viewpoint is outdated and quite useless in today’s society.

First, using the Internet for researching topics can improve our range of job opportunities in the future.  As I implied above, people my parent’s age (and older) think that the web is actually quite dangerous.  When I was young and they had a lot of influence over my world view, I actually had the same opinion.  At that time, I thought relying on unreliable online sources would actually make it harder for me to get a job, and so I was hostile toward it.  However, after I entered college and my social network broadened, I realized the unique benefits of cutting edge research that is published online.  Now I realize that learning about the latest academic developments online can help us find employment in high paying fields.  For example, my young cousin makes a lot of money because he works in a field related to crypto-currency.  His experience changed my perspective, and now I am focusing on emerging web-based technologies at university in the hope of achieving the same thing.

Second, medical websites have a noticeable effect on our physical health and maybe even our mental health.  I actually read a story about this in the Village Voice Newspaper a few months ago.  It pointed out that if we properly use websites that report on health trends we can avoid the poor health that a lot of people are dealing with nowadays.  The article claimed that 75% of Americans think that the best way of staying fit is making use of the Internet.  Medical experts who reviewed the study results agreed, and suggested that websites will have an even greater impact in the future because of the number of doctors who are online.  Consequently, I strongly feel that benefiting from online research is a fantastic way to stay healthy.

In conclusion, I think that researching online is beneficial.  This is because it can lead to gainful employment, and because it has a positive impact on our minds and bodies.

(you can also read parts one,  three and four of this series!)