The second “Office Hours” chat about the upcoming TOEFL changes was a big success. About 75 interested test-prep folks showed up, and most stuck around for the whole hour. I’ll try my best to host a third one once the practice tests have been published. Probably July 16.

A few notes from the community:

  1. Many attendees expressed a desire for more clarity on how adaptive testing will work on the TOEFL. I sense that people want assurances that this change will be fair to students and comprehensible to people preparing them for the test. ETS should spend some time on this issue as the launch date approaches.
  2. Many people are still wondering if the revised TOEFL will contain integrated speaking and writing tasks. In some ways, integrated tasks are the TOEFL’s bread-and-butter, setting it apart from competing products. But on the other hand, eliminating them (or at least their current incarnations) could create a faster, cheaper and more streamlined test form.
  3. My friend Pamela Sharpe, who has been preparing students for the TOEFL since 1970, was in attendance. I asked her what she thinks the biggest revision to the TOEFL has been to date. Her response? Maybe this one.
  4. Most people are pretty enthusiastic about the new score scale. Few believe there is a meaningful difference between a kid with a score of 102/120 and a kid with a score of 104/120.
  5. Many people are wondering if the free practice tests from ETS will be adaptive. For what it’s worth, while the TOEFL Essentials test is adaptive, the practice tests ETS provides for it are not.
  6. Some concerns regarding equity in test prep were raised. Right now, no-budget test takers can wander down to their local library and get copies of the official books to prepare for the test. That will no longer be the case in January, when test takers will be more dependent on paid prep products, which can be costly.
  7. Likewise, there were a few concerns about the accuracy of the practice tests set to be released next month. Older teachers remember how flawed some material was when the TOEFL iBT launched in 2005. This could impact our plans to prepare for the revisions in a timely manner.
  8. A few attendees expressed hope that the test will include global English accents.
  9. We talked about whether the TOEFL might become a regularly adjusted test, like the DET. Some attendees figure this would be great for students. Others were not so enthusiastic.
  10. We talked a lot about TOEFL Essentials and TOEIC and how early reports suggest that the revised TOEFL will share item types with those tests. Like it or not, people are starting to believe that this is a glammed-up TOEFL Essentials test, mostly because of the way it was presented to test prep firms in China a few weeks ago. If ETS views this sentiment as problematic, they ought to nip it in the bud ASAP.

Check out this wonderful new article in TESOL Quarterly by Yasin Karatay and Jing Xu.  It explores the possibility of using an AI-powered spoken dialog system to simulate an IELTS examiner.  Specifically, the researchers used a self-developed SDS powered by GPT-4o to simulate the third part of the IELTS speaking test.

It is important to note that in this research the AI served only as the interlocutor.  Ratings were carried out by trained humans who reviewed recordings of the interactions.

The authors concluded that “the SDS consistently elicited some key interactional competence features, as seen in face-to-face oral proficiency interviews, and that such features were useful in distinguishing between higher- and lower-proficiency test takers.”  They also highlighted areas for improvement around non-verbal clues and some unnatural interactions.

There is a lot more to it, of course.  So take a moment to read the article.

Though the IELTS partnership takes a “cautiously curious” approach to the use of AI in testing, this may be an area worth exploring in more depth.  Some observers have raised concerns about the ability of the partnership to maintain consistent standards across four million annual speaking tests, each carried out by an individual examiner.  It is possible that at least some of these administrations are impacted by things like bias, fatigue and ability in the examiner population.  It’s no secret that IELTS test takers frequently travel to test centers which they feel are more conducive to a higher speaking score.  The IELTS partnership is quick to disabuse test takers of this notion, but it isn’t inconceivable that one examiner might be better at their job than another.  Or that one might possess certain biases which another does not, and that such differences could impact how the speaking test unfolds.

On top of that, this sort of change could lead to immense cost savings for the organizations that administer the IELTS (even if human raters are retained).  Some of those savings could be passed on to test takers, if only to make the IELTS more competitive in an increasingly crowded market of tests.  That’s a touchy subject so I’ll leave it for another day, but I think most readers can imagine the possibilities.

Finally, it is worth noting that this research was carried out by researchers at Cambridge University Press & Assessment.  So maybe this could lead to something.

You may also appreciate this similar research funded by ETS back in 2021.  Based on that work, I’m still optimistic that an AI-powered SDS system will find its way into some future version of the TOEFL.

The Educational Testing Service (ETS) has published a few charts illustrating how the new 1-6 TOEFL scoring scale will work and how to convert between old (0-120) TOEFL scores and new (1-6) TOEFL scores. As illustrated below, the scale will increase in increments of 0.5. It has been confirmed that the overall band score will be calculated by averaging the four section scores and rounding to the nearest 0.5.
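For the numerically inclined, that averaging rule is easy to sketch. The snippet below is my own illustration of the calculation as described, not ETS code, and note that ETS has not (as far as I know) said how an average landing exactly between two half-bands will be rounded.

```python
def overall_band(section_scores):
    """Average the four section scores (each on the 1-6 scale in
    0.5 steps) and round the result to the nearest 0.5 band.
    Illustrative only; tie-breaking behavior is an assumption."""
    avg = sum(section_scores) / len(section_scores)
    return round(avg * 2) / 2

# Four hypothetical section scores averaging 4.625 round to 4.5:
print(overall_band([4.0, 4.5, 5.0, 5.0]))  # 4.5
```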

Notably, ETS has also confirmed that the current 0-120 scale will be eliminated entirely starting in 2028.

A pleasant feature of the new scale is that it will provide more consistency when linking section scores to the CEFR. Currently that is somewhat confusing. For instance, a score of 22 in the TOEFL listening section represents C1 fluency, while a score of 22 in the reading section represents only B2 fluency. Come January, this will no longer be an issue.

I would be remiss if I didn’t link to Nicholas Cuthbert’s video from DETcon London, since it generated quite a lot of spirited discussion.  Notably, Nicholas describes Duolingo as “ahead of the game.”  And also says:  “Duolingo are winning.”

Hyperbole?  Perhaps.  Duolingo is certainly on a winning trajectory in terms of acceptance in key receiving markets, market share, brand awareness, test taker engagement, use of technology, and baffling social media campaigns.  But it is important to note that IELTS still does more test administrations than everyone else combined.  There are still people paying $530 to take an IELTS test.  IELTS will be the market leader for many years to come.  Accordingly, there is still plenty of time for the IELTS partnership to develop a “next-gen” IELTS that eats Duolingo’s lunch.

Heck, Pearson is hoping to do just that in about four months.

Recent messaging from the British Council and Cambridge University Press & Assessment suggests that the IELTS partners plan to double down on their more traditional approach to assessment.  It seems like they don’t plan to change the way they assess students, but will instead encourage score users to more carefully consider which tests they choose to accept.

Regardless, I’m convinced that Cambridge has got top people working on… something.  Check out yesterday’s article in TESOL Quarterly, for instance.

A well-informed industry watcher recently expressed some incredulity that LLMs haven’t totally disrupted the high-stakes language testing sector.  He was shocked that people still pay hundreds of dollars to take a test.  My response was that this stuff takes time.  Everyone knows that university governance is a slow process.  Immigration regulations are even slower.  But things might finally be coming to a head – coincidentally both ETS and Pearson announced the existence of their “next-gen” (my term) tests at NAFSA a few weeks ago.  Things are moving a tiny bit faster now.

By the way, you must get to one of these DETcon events if you get the chance.  They are a charming combination of research presentations, community building and Duolingo’s trademark irreverence.  I understand that at the most recent Pittsburgh-based event, Duolingo CEO Luis von Ahn was subjected to an unannounced Yinzer Test.  Not sure what that is, but I suspect it is similar to a Voight-Kampff Test. In any case, the results have not been shared publicly.

I read a real grab bag of stuff this month.  It was a good month.

First up, I read the June 2025 issue of National Geographic.  It contained an excellent article called “Could Beavers be the Secret to Winning the Fight Against Wildfires?”.  It explores some positive impacts of beaver dams.  There is also some good stuff here about the history of beavers in the USA.  This is great and accessible academic reading practice.  I read this while sitting in the courtyard of the Doksan Public Library (in Geumcheon).  A nice place to chill for an hour, if you are ever in the area.

Next, I read the May 12, 2025 issue of Time Magazine, which included a great article called “The Return of the Dire Wolf.”  This one is about the science and ethics of resurrecting extinct species.  This topic would make a perfect integrated writing question… or an even better academic discussion question.  I have already added it to my “to-write” list.

I also read the February 22, 2024 issue of the London Review of Books.  It included a great long article about Linnaeus, his life and his classification system.  Perfect TOEFL reading practice.

Moving along, I read Kate Chopin’s novel “The Awakening,” as part of my journey through the Norton Library Podcast.  This one is a bit more accessible than most books covered by the podcast, so check it out if you are interested in reading some classic American literature.  The podcast episode is here.  And you can find a cheap copy of the book on Amazon.

Finally, I engaged my particular brand of madness and read “Cambridge English Exams – The First Hundred Years.”  Yeah… this is a detailed history of the first 100 years of the Cambridge English Exams.  Pretty cool if you are into that sort of thing.  If so, you can read it for free via Cambridge.

That’s all for now, but check back for some more silliness next month.


I often get questions about how timers work on the TOEFL test.  So here’s a quick summary.  These details will be accurate until the TOEFL test changes on January 21, 2026.

Reading

  • There is one 36-minute timer for the whole reading section.  You will have 36 minutes to read both of the articles and answer all of the questions.

Listening

  • There are two separate timers in the listening section.
  • One of the timers is 10 minutes.  You will have 10 minutes to answer 17 questions about two lectures and one conversation.  The timer only counts down when you are answering questions.  It does not move while you are listening to the lectures and conversation.
  • The other timer is 6.5 minutes.  You will have 6.5 minutes to answer 11 questions about one lecture and one conversation.  The timer only counts down when you are answering questions.  It does not move when you are listening to the lecture and conversation.

Speaking

  • Question One:  After you hear the question you will have 15 seconds to prepare and 45 seconds to speak.
  • Question Two:  You will have 45 or 50 seconds to read the announcement.  Then you will listen to a conversation.  Then you will have 30 seconds to prepare and 60 seconds to speak.
  • Question Three:  You will have 45 or 50 seconds to read the article.  Then you will listen to a lecture.  Then you will have 30 seconds to prepare and 60 seconds to speak.
  • Question Four:  You will first listen to a lecture.  Then you will have 20 seconds to prepare and 60 seconds to speak.

Writing

  • Question One: First, you will have 3 minutes to read the article.  Then you will listen to a lecture.  Then you will have 20 minutes to write your response.  The article will be visible as you write.
  • Question Two:  You will have 10 minutes to read everything and write your response.

In all sections, timers only start after instructions have been given.  There are no breaks.

The cost of taking the TOEFL Essentials Test was recently increased to $199 USD. That’s about a 100% increase. In a few markets, the TOEFL Essentials is now more expensive than the TOEFL iBT.

It’s a curious move. I can’t really figure it out. Perhaps someone with a bigger brain than mine can explain it.

The TOEFL Essentials test was launched in May of 2021 as a cheaper, shorter and wholly at-home alternative to the TOEFL iBT. At the time, most people assumed its development was a response to the growing prominence of the Duolingo English Test. It never really took off, though. That’s partly because the number of accepting institutions was low and also because it was still twice as expensive (and twice as long) as the DET.

One of the new items developed for the test (the “writing for an academic discussion” task) was folded into the TOEFL iBT in 2023. It appears that a few more of its items (some also shared with the TOEIC) will be added to the iBT in January.

Authorities in Japan recently busted a 27-year-old university student for allegedly cheating on the TOEIC (story here and here).  The Japanese police, acting on a tip, sent undercover agents to a local testing center and nabbed the suspected cheater. Apparently he was planning to use a microphone to communicate answers to others in the room. As he was being dragged away by the cops, 30% of the test takers in the center decided to just go home. 

A few thoughts come to mind:

  1. Obviously, test-center administrations are not automatically better than at-home administrations (or vice versa).  Both approaches have potential weaknesses.
  2. When a particular test center is deemed to have poor security, unscrupulous test takers will flood into that test center to take advantage of it.  Some of these test takers will be domestic, and others will be from abroad.  Frequent audits are necessary to ensure that test centers maintain rigorous standards.  Mystery test takers should be utilized as well.
  3. Test makers care a lot about security at test centers.  However, not all test centers are self-operated.  Test makers depend on their partners to uphold their standards.  Frequent audits are necessary to ensure that they do so.
  4. Paper tests create extra challenges since many people (sometimes everyone) in the room have the same test form.  Computer-based delivery can eliminate this concern.
  5. Recent moves by the IELTS partnership to limit use of the paper-based IELTS may be due to some of the aforementioned points, but I suppose they’ll never say.

The Higher Education Policy Institute has published an article written by a managing director at Cambridge University Press & Assessment about what is described as “the shift to remote language testing that removes substantial human supervision from the process.”

Notes the author:

“While some may be excited by the prospect of an “AI-first” model of testing, we should pursue the best of both worlds – human oversight prioritised and empowered by AI. This means, for instance, human-proctored tests delivered in test centres that use tried and proven tech tools.”

And:

“Cambridge has been using and experimenting with AI for decades. We know in some circumstances that AI can be transformative in improving users’ experience. For the highest stakes assessments, innovation alone is no alternative to real human teaching, learning and understanding. And the higher the stakes, the more important human oversight becomes.”

Cambridge has been pushing back against newer tests with a bit more forcefulness in recent months (see also the “The impact of English language test choices for UK HE” report).

To my eye, the debate between at-home and on-site testing is over, with supporters of at-home testing scoring a decisive victory.  Indeed, Cambridge’s own at-home IELTS is widely accepted at schools across key receiving markets.  But more importantly, it seems that most test takers really like the idea of at-home testing.  Many of those who forgo it in favor of an on-site test do so out of fears that the maker of their chosen test stinks at delivering a seamless at-home product, not because of some love of the test center experience.  As test makers get better at doing at-home testing, more test takers will pile into that option.

There might still be room for a robust debate about the merits of synchronous online proctoring (that is, a proctor watches as you take the test) vs asynchronous online proctoring (a proctor watches a video of your test after the fact).  But maybe that debate will soon reach a conclusion as well.  Note that Pearson seems to be going the async route in their new PEE Test, and that ETS will offer an async option in the new TOEIC Link Test (which is being pitched to higher-ed as an admissions test).  These developments suggest that the writing is on the wall for live proctors.  Indeed, I was a little surprised to learn that ETS will maintain them as part of the revised TOEFL set to launch in early 2026.

Here’s something I’ve been meaning to post since forever.  It’s a score report from the TEPS Test in Korea.  This report is actually from a test taken by Mrs. Goodine, though I’ve replaced her photo with a picture of the comic strip hero Sally Forth.

The TEPS test has been offered in Korea since 1999, though nowadays it’s more interesting as a conversation piece than as a testing option.  Basically, it was developed in the 1990s partly in response to concerns that the national craze for TOEFL and TOEIC scores was resulting in a significant outflow of currency to the USA (a concern exacerbated by the 1997 Asian Financial Crisis).   The test was a hit in its early years.  According to some reports, it was taken more than half a million times a year by the late aughts.  But it has declined in popularity since then – Naver tells me that it was taken just about 83,000 times in the final pre-pandemic year.  The decline is likely a result of the wide range of low and medium stakes exams now on the market. In recent years, for instance, the G-TELP has exploded in popularity.

What makes the test fun is that it bears quite a close resemblance to the TOEFL of the 1990s (and, to a lesser extent, the TOEIC). This is an English test with multiple choice grammar and vocabulary questions… but no speaking and no writing. The test format was revised (and shortened) in 2018, but otherwise this looks a lot like an old-school 1990s English test.  That could have something to do with the decline as well, now that I think of it.

Homegrown tests are probably a good idea as foreign tests can result in significant amounts of cash taking a one-way trip overseas.  But such tests probably need a bit more TLC than this one has gotten over the years. Speaking of this topic, I’d be happy to hear about the successes and failures of the VSTEP test out of Vietnam if anyone feels like sharing.

Anyway.  There ya go.  A quick look at the TEPS.  If anyone should have any questions (why?!) I’ll be happy to answer them.

I read that IELTS UKVI will go fully computer-based in Bangladesh starting July 27. After that date, the paper-based version will no longer be offered. According to this article, “[t]he decision to shift the IELTS for UKVI entirely to computer-based format has been taken internationally.”

Some other countries still allow test takers to book paper-based tests beyond that date, but perhaps they are on different schedules.

I’m curious if the upcoming HOELT test will include a paper-based version. Paper-based testing makes tests more accessible and equitable… but some have raised concerns related to test security.

Dan Isbell has written a guest post about washback in English test preparation for the Duolingo English Test blog.  It discusses preparation for the DET in particular, and for English tests in general.  Isbell divides test prep into three types:  activities that improve your English in general, activities that help you perform better on a particular test (test familiarization), and activities that help you game a particular test (templates and guessing strategies, for instance).

I understand that a full report on this topic is forthcoming.  I will add a link when it is available.

It’s an interesting thing to explore.  Test preparation always includes at least some good washback.  No matter what test they are preparing for, most test takers complete at least a few practice tests.  As a result, they will spend time consuming stuff in English and producing stuff in English.  This is good. But does it have a really meaningful impact on their fluency in the language?  I don’t know.

The TOEFL iBT contains two 800-word articles which are excerpted from actual textbooks.  Students here in Korea take their preparation pretty seriously and might complete 20 or 30 practice tests before test day (or between several test days).  That means they spend a lot of time reading some pretty dense material in English.  Does that improve their fluency?  Of course.  Does it improve their fluency a lot?  I don’t know.

My test prep niche is writing.*  It gives me great joy to know that my students walk away from their lessons with a noticeably stronger command of English grammar and language use conventions.  But, needless to say, there are faster and more economical ways to learn about sentence fragments and collocations.

Does all of this test prep mean that students spend less time on more useful and effective language acquisition approaches?  Maybe.

Is it the job of a test maker to give a darn?  Or is their only job to accurately measure language fluency? I don’t know.

A few stray thoughts come to mind:

  1. I’m interested to know how the age of a test impacts the way that students prepare for it.  As a test ages, people working in test prep become more and more familiar with the design of that test and can use that knowledge to develop better and more granular type 2 strategies.  Elderly readers might recall that in the early years of the TOEFL iBT we had just one official book (badly written) and a handful of books from third-party publishers (even worse) to go by.  We didn’t know very much about the design specifications of test items, nor about how speaking and writing items were scored.  Things are obviously much different now.  We know almost everything there is to know, so much that it might be malpractice not to spend quite a lot of time on test familiarization strategies.  Should tests be meaningfully refreshed on a regular basis to mitigate the impact of this factor?
  2. I love reading about the early history of the Princeton Review.  That firm emerged in the early 1980s when the SAT was long in the tooth and probably at its peak terribleness.  Princeton Review taught students how to eliminate answer choices without actually reading questions.  They also taught students how to recognize unscored sections so they could enjoy a refreshing nap part way through the test.
  3. In 2019 Malcolm Gladwell and his assistant both took the LSAT for an episode of his “Revisionist History” podcast.  They got coaching from the one and only John Katzman beforehand.  The point of the episode is that time management (type 2) is the most important thing when it comes to getting a good score on this test. It made LSAT tutors really cranky.

*See also:  “The Whale,” 2022.

Prometric and the IELTS partners have just published a concordance study comparing the CELPIP and IELTS-General tests.

It is a very nice study.  I just want to mention that of the 1089 participants, seemingly not a single one earned an IELTS writing score of 9.0.  Two participants earned a score of 8.5.  This is the fourth concordance study in a row involving IELTS in which not a single person reported a perfect writing score.  I don’t know if that’s meaningful,  but it amuses me.