I saw that Pearson recently wrapped up the first “PTE Global Nursing Pathway Expo” in the Philippines.  I stole the picture below from LinkedIn.

One of the strengths of Pearson is its ability to identify and foster niche(ish) use cases where the PTE has room to grow (and quickly). 

They do this, partially, by going to the people.  Meeting them where they are, and all that.  Other testing firms could follow their lead.

As I’ve written a few times here, from 2022 to 2024 the percentage of nurses submitting PTE scores to the CGFNS (which does visa screening for nurses who wish to head to the USA) increased from 7% to 50%. The percentage submitting IELTS scores decreased from 84% to 35% in the same period.

The long, long, long awaited Cambridge study about test use in the UK is now available.

For all the heavy lifting the preliminary results have been doing in IELTS marketing over the past year, there is surprisingly little in here about the value (or lack thereof) of specific new tests. That’s not meant to be a criticism, of course. The authors of the study seem to have had a higher purpose.

That’s not to say it is totally bereft of that sort of thing. The study contains statements like this:

“There is a notable divergence in the perceived value of various tests among different groups within institutions. While tests like Cambridge Qualifications are praised for their ability to prepare students for academic study, others, such as the Oxford International Education Group’s (OIEG) ELLT and Duolingo, are viewed with scepticism. Specific concerns were raised about the validity, security, and overall suitability of these newer, more efficient or less established tests. One survey respondent expressed dissatisfaction with ‘the recent decision to accept OEIG’s online ELLT for the China market only (in order to boost recruitment)’ due to its lack of credibility and associated security concerns.”

And this:

“For instance, one respondent noted that ‘students who came with the Duolingo award were not in practice equipped to deal with HE life and study’, echoing concerns found in studies about the adequacy of such tests.”

And this:

“One of the most consistent findings is that IELTS is widely regarded as the international standard or ‘common currency’.”

One imagines that the IELTS partners will continue to lean on this research study when crafting marketing materials in the years ahead. Score users might be wise to keep in mind that the criticisms mentioned in the study are anecdotal and not presently supported by comparative data about actual student outcomes. Often, the statements seem to be based on the perspectives of very small numbers of individuals.

Apart from the above, most of the study highlights concerns about English fluency on campus (quite separate from the use of particular tests) and provides recommendations for how to properly assess the worth of new tests.

ETS has announced the new format for the TOEFL iBT. Below is a detailed rundown of what the test will contain starting January 21, 2026. Interestingly, the test will no longer contain integrated questions. Nor will it contain an essay task. As has been noted, this format is extremely similar to the existing TOEFL Essentials Test. For all the details, start reading here. Update: I made a video!


Reading Tasks (18-27 minutes)

  1. Complete the Words. This is a “fill in the missing letters” task, like on the Duolingo English Test.
  2. Read in Daily Life. Test takers read a non-academic text of between 15 and 150 words, such as a poster, a menu, or an invoice. Then they answer multiple-choice questions about it.
  3. Read an Academic Text. This is a roughly 200-word academic text followed by five multiple-choice questions.


Listening Tasks (18-27 minutes)

  1. Listen and Choose a Response. Test takers hear a single sentence and choose the correct response from among four choices.
  2. Listen to a Conversation. Test takers hear a short conversation (ten turns in the sample) and answer multiple-choice questions about it. Topics include everyday life situations.
  3. Listen to an Announcement. Test takers listen to a campus or classroom announcement and answer multiple-choice questions about it.
  4. Listen to an Academic Talk. Test takers listen to a short lecture (100 to 250 words) and answer multiple-choice questions about it.


Writing Tasks (23 minutes)

  1. Build a Sentence. Test takers unscramble a mixed-up sentence. The sentence is part of an exchange between students.
  2. Write an Email. Test takers have seven minutes to write an email regarding a specific scenario.
  3. Writing for an Academic Discussion. Same as on the current TOEFL.


Speaking Tasks (8 minutes)

  1. Listen and Repeat. Test takers listen to a sentence and repeat it (seven sentences in total).
  2. Take an Interview. Test takers will be asked four questions about a given topic. They will have 45 seconds to answer each one. No preparation time is provided.


The whole test will take between 67 and 85 minutes to complete.

ETS is being a little cagey with phrasing, but it appears that the revised test will be wholly scored by AI (which has been trained on human ratings). They note:

“The Speaking and Writing responses will be scored by the ETS proprietary AI scoring engine according to the criteria outlined in the scoring guides. These engines integrate the most advanced natural language processing (NLP) techniques, combining cutting edge research with extensive operational expertise for enhanced performance.”

And:

“Human rating remains a critical component of the overall scoring process of TOEFL’s Writing and Speaking tasks because the automated scoring engines are trained on human ratings. Human ratings not only set the standard for machine learning but also provide oversight to ensure the accuracy and reliability of our scoring.”

The UK Home Office has posted a fourth request for information regarding the HOELT.  As always, Polly Nash has written up all the key details in The PIE.  

Interestingly, according to the Home Office, the updated request is being undertaken to “understand the viability of transitioning to a digital service model for English Language Testing” and more specifically “to gather market insights on newly available and emerging technology in relation to remote testing.”

That’s a bit of a shocker.  The original tender did not mention remote testing (nor did any of the earlier updates).

But even if this approach is deemed viable, the HOELT is unlikely to be wholly remote, as the tender also mentions “that there are 268 test centres operating across 142 countries globally.”

Which, by the way, is an oddly specific pair of numbers and a curious verb tense.  But maybe I’m missing something.

A new post on the British Council’s “Future of English” blog titled “Beyond the Score” conveys a now very familiar message.  Stuff like:

“…it’s important to recognise that the focus and robustness of assessment can vary significantly between different tests. Not all tests may evaluate the specific skills and language competencies needed for academic success in the same rigorous ways, while established tests such as IELTS (which the British Council helps deliver) continue to set the benchmark for trust and transparency.”

And:

“Furthermore, the concept of washback is crucial. This refers to the influence of the test on teaching and learning. If tests focus on a narrow range of skills or utilise formats that don’t reflect academic tasks (such as those often seen in shorter, computer-marked tests), this can lead to a narrowing of the curriculum and teaching practices, potentially disadvantaging students by not adequately preparing them for the full spectrum of academic language use.”

Like all of the other posts on this topic, the blog refers to the results of a single as-yet-unpublished survey of score users in the UK.

I share this mostly to emphasize how hard the IELTS partners are leaning into this message right now. There is an absolute ton of this sort of stuff coming out of Cambridge, the BC and IELTS HQ. That’s worth noting.

One might assume that this missive (and others before it) is a response to the growing influence of the Duolingo English Test. But it doesn’t seem like the IELTS partners are quite ready to specifically single out that test for criticism.

I really admire the folks at Cambridge and the British Council for having convictions and sticking by them. They truly believe that so-called “AI-First” tests are potentially harmful for learners and institutions. This isn’t just a marketing thing.

That said, this campaign isn’t going to work. The idea that persuasion and gravitas and one unpublished study can stand as a bulwark against AI-First testing is unconvincing.  We might ask all of the people who used to work at ETS how that approach fared in the USA.

Score users in the two biggest receiving markets (USA and Canada) are all-in on the Duolingo Test. The whole Ivy League accepts DET scores now.  All 15 of the biggest Canadian universities accept it as well. The DET team will keep hammering away at institutions in the UK (and marketing the test to UK-bound students) until they win in that country.

If Cambridge and the BC genuinely believe that the DET and similar tests are detrimental to schools and students, the onus is on them to create an alternative test that is equally attractive to students and score users but lacks the perceived weaknesses of the DET. If the problems they’ve described time and again truly exist, this is the only way to prevent them from becoming an even bigger concern.

And, needless to say, the current iteration of the IELTS is not such an alternative.

The Higher Education Policy Institute has published an article written by a managing director at Cambridge University Press & Assessment about what is described as “the shift to remote language testing that removes substantial human supervision from the process.”

Notes the author:

“While some may be excited by the prospect of an ‘AI-first’ model of testing, we should pursue the best of both worlds – human oversight prioritised and empowered by AI. This means, for instance, human-proctored tests delivered in test centres that use tried and proven tech tools.”

And:

“Cambridge has been using and experimenting with AI for decades. We know in some circumstances that AI can be transformative in improving users’ experience. For the highest stakes assessments, innovation alone is no alternative to real human teaching, learning and understanding. And the higher the stakes, the more important human oversight becomes.”

Cambridge has been pushing back against newer tests with a bit more forcefulness in recent months (see also the report “The impact of English language test choices for UK HE”).

To my eye, the debate between at-home and on-site testing is over, with supporters of at-home testing scoring a decisive victory. Indeed, Cambridge’s own at-home IELTS is widely accepted at schools across key receiving markets. But more importantly, it seems that most test takers really like the idea of at-home testing. Many of those who forgo it in favor of an on-site test do so out of fears that the maker of their chosen test stinks at delivering a seamless at-home product – not because of some love of the test center experience. As test makers get better at at-home testing, more test takers will pile into that option.

There might still be room for a robust debate about the merits of synchronous online proctoring (that is, a proctor watches as you take the test) vs asynchronous online proctoring (a proctor watches a video of your test after the fact). But maybe that debate will soon reach a conclusion as well. Note that Pearson seems to be going the async route in their new PEE Test, and that ETS will offer an async option in the new TOEIC Link Test (which is being pitched to higher-ed as an admissions test). These developments suggest that the writing is on the wall for live proctors. Indeed, I was a little surprised to learn that ETS will maintain them as part of the revised TOEFL set to launch in early 2026.

ETS has now published a comprehensive guide to the upcoming TOEFL score scale, focused on institutions.  

The guide contains the charts I shared yesterday, as well as a chart to convert 0-30 TOEFL section scores to 1-6 scores. There is also a chart to convert IELTS scores to 1-6 TOEFL scores. And some performance descriptors.

A few other things are mentioned in the guide:

  1. It notes that “We have not conducted a score concordance study with the Duolingo English Test (DET). To determine the best TOEFL score, we recommend you select the TOEFL score based on your desired CEFR level, rather than a direct comparison to DET.”
  2. There is a long section addressing a hypothetical score user who currently values the perceived precision of the 0-120 scale and is worried that the new scale lacks it. It is worth reading.
  3. “Digital guidebooks” for test takers will be provided in July.
  4. Again, it is confirmed that starting in January 2028, score reports will only contain the 1-6 score scale.
  5. Starting January 2026, paper score reports will be retired. The guide specifies that institutions will no longer receive them. I assume that test takers will not receive them either, but that is not stated.

To be honest, I think that getting 13,000 institutions to update their score requirements will be a Herculean task.  I look forward to watching it happen.

The second “Office Hours” chat about the upcoming TOEFL changes was a big success. About 75 interested test-prep folks showed up, and most stuck around for the whole hour. I’ll try my best to host a third one once the practice tests have been published. Probably July 16.

A few notes from the community:

  1. Many attendees expressed a desire for more clarity on how adaptive testing will work on the TOEFL. I sense that people want assurances that this change will be fair to students and comprehensible to people preparing them for the test. ETS should spend some time on this issue as the launch date approaches.
  2. Many people are still wondering if the revised TOEFL will contain integrated speaking and writing tasks. In some ways, integrated tasks are the TOEFL’s bread-and-butter, setting it apart from competing products. But on the other hand, eliminating them (or at least their current incarnations) could create a faster, cheaper and more streamlined test form.
  3. My friend Pamela Sharpe, who has been preparing students for the TOEFL since 1970, was in attendance. I asked her what she thinks the biggest revision to the TOEFL has been to date. Her response? Maybe this one.
  4. Most people are pretty enthusiastic about the new score scale. Few believe there is a meaningful difference between a kid with a score of 102/120 and a kid with a score of 104/120.
  5. Many people are wondering if the free practice tests from ETS will be adaptive. While the TOEFL Essentials test is adaptive, the practice tests provided by ETS are not.
  6. Some concerns regarding equity in test prep were raised. Right now, no-budget test takers can wander down to their local library and get copies of the official books to prepare for the test. That will no longer be the case in January, when test takers will be more dependent on paid prep products, which can be costly.
  7. Likewise, there were a few concerns about the accuracy of the practice tests set to be released next month. Older teachers remember how flawed some material was when the TOEFL iBT launched in 2005. This could impact our plans to prepare for the revisions in a timely manner.
  8. A few attendees expressed hope that the test will include global English accents.
  9. We talked about whether the TOEFL might become a regularly adjusted test, like the DET. Some attendees figure this would be great for students. Others were not so enthusiastic.
  10. We talked a lot about TOEFL Essentials and TOEIC, and about early reports suggesting that the revised TOEFL will share item types with those tests. Like it or not, people are starting to believe that this is a glammed-up TOEFL Essentials test, mostly because of the way it was presented to test prep firms in China a few weeks ago. If ETS views this sentiment as problematic, they ought to nip it in the bud ASAP.

Check out this wonderful new article in TESOL Quarterly by Yasin Karatay and Jing Xu.  It explores the possibility of using an AI-powered spoken dialog system to simulate an IELTS examiner.  Specifically, the researchers used a self-developed SDS powered by GPT-4o to simulate the third part of the IELTS speaking test.

It is important to note that in this research the AI served only as the interlocutor.  Ratings were carried out by trained humans who reviewed recordings of the interactions.
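For the curious, here is a minimal, hypothetical sketch of what the interlocutor half of such a design can look like in code. To be clear, this is not the researchers’ system (theirs handled actual speech and far more sophisticated dialog management); the prompt, function name and turn count are my own illustration, and it assumes the official OpenAI Python client with an API key in the environment.

```python
# Hypothetical sketch: an LLM as interlocutor, humans as raters.
# Assumes the official OpenAI Python SDK (pip install openai) and
# an OPENAI_API_KEY in the environment. Illustrative only.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You are an IELTS Speaking Part 3 examiner. Ask one abstract, "
    "discussion-style question at a time on the topic of advertising. "
    "Keep your turns short and never evaluate or correct the candidate."
)

def run_interview(turns: int = 5) -> list:
    """Run a short text-based mock Part 3 exchange and return the
    transcript, which would go to trained human raters afterwards
    (the model itself never assigns a score)."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    for _ in range(turns):
        reply = client.chat.completions.create(model="gpt-4o", messages=messages)
        question = reply.choices[0].message.content
        print(f"Examiner: {question}")
        messages.append({"role": "assistant", "content": question})
        answer = input("Candidate: ")  # stand-in for speech recognition
        messages.append({"role": "user", "content": answer})
    return messages

if __name__ == "__main__":
    transcript = run_interview()
```

Even in this toy form, the division of labor the study describes is visible: the model’s only job is to keep the conversation going, and the transcript (or recording) is what gets rated.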

The authors concluded that “the SDS consistently elicited some key interactional competence features, as seen in face-to-face oral proficiency interviews, and that such features were useful in distinguishing between higher- and lower-proficiency test takers.” They also highlighted areas for improvement around non-verbal cues and some unnatural interactions.

There is a lot more to it, of course.  So take a moment to read the article.

Though the IELTS partnership takes a “cautiously curious” approach to the use of AI in testing, this may be an area worth exploring in more depth. Some observers have raised concerns about the ability of the partnership to maintain consistent standards across four million annual speaking tests, each carried out by an individual examiner. It is possible that at least some of these administrations are affected by bias, fatigue and variation in ability across the examiner population. It’s no secret that IELTS test takers frequently travel to test centers that they feel are more conducive to a higher speaking score. The IELTS partnership is quick to disabuse test takers of this notion, but it isn’t inconceivable that one examiner might be better at their job than another, or that one might possess certain biases which another does not, and that such differences could impact how the speaking test unfolds.

On top of that, this sort of change could lead to immense cost savings for the organizations that administer the IELTS (even if human raters are retained).  Some of those savings could be passed on to test takers, if only to make the IELTS more competitive in an increasingly crowded market of tests.  That’s a touchy subject so I’ll leave it for another day, but I think most readers can imagine the possibilities.

Finally, it is worth noting that this work was carried out by researchers at Cambridge University Press & Assessment. So maybe this could lead to something.

You may also appreciate this similar research funded by ETS back in 2021. Based on that work, I’m still optimistic that an AI-powered SDS will find its way into some future version of the TOEFL.

The Educational Testing Service (ETS) has published a few charts illustrating how the new 1-6 TOEFL scoring scale will work and how to convert between old (0-120) TOEFL scores and new (1-6) TOEFL scores. As illustrated below, the scale will increase in increments of 0.5. It has been confirmed that the overall band score will be calculated by averaging the four section scores and rounding to the nearest 0.5.
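Since the rule itself is simple, a quick sketch makes it concrete. The function below is my own illustration, not anything ETS has published; note in particular that ETS has not said how averages landing exactly between two half-points (e.g. 4.75) are broken, and Python’s built-in round() sends such ties to the even choice.

```python
# Sketch of the stated rule: average the four 1-6 section scores,
# then round to the nearest 0.5. Function name and tie-breaking are
# my own assumptions; ETS has only published the rule itself.
def overall_band(reading, listening, speaking, writing):
    mean = (reading + listening + speaking + writing) / 4
    return round(mean * 2) / 2  # Python rounds .25/.75 ties to even

print(overall_band(4.5, 5.0, 5.0, 5.5))  # 20.0 / 4 = 5.0   -> 5.0
print(overall_band(4.5, 4.5, 5.0, 5.5))  # 19.5 / 4 = 4.875 -> 5.0
```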

Notably, ETS has also confirmed that the current 0-120 scale will be eliminated entirely starting in 2028.

A pleasant feature of the new scale is that it will provide more consistency when linking section scores to the CEFR. Currently that is somewhat confusing. For instance, a score of 22 in the TOEFL listening section represents C1 fluency, while a score of 22 in the reading section represents only B2 fluency. Come January, this will no longer be an issue.

I would be remiss if I didn’t link to Nicholas Cuthbert’s video from DETcon London, since it generated quite a lot of spirited discussion. Notably, Nicholas describes Duolingo as “ahead of the game.” And also says: “Duolingo are winning.”

Hyperbole?  Perhaps.  Duolingo is certainly on a winning trajectory in terms of acceptance in key receiving markets, market share, brand awareness, test taker engagement, use of technology, and baffling social media campaigns.  But it is important to note that IELTS still does more test administrations than everyone else combined.  There are still people paying $530 to take an IELTS test.  IELTS will be the market leader for many years to come.  Accordingly, there is still plenty of time for the IELTS partnership to develop a “next-gen” IELTS that eats Duolingo’s lunch.

Heck, Pearson is hoping to do just that in about four months.

Recent messaging from the British Council and Cambridge University Press & Assessment suggests that the IELTS partners plan to double down on their more traditional approach to assessment. It seems they don’t plan to change the way they assess students, but will instead encourage score users to more carefully consider which tests they choose to accept.

Regardless, I’m convinced that Cambridge has got top people working on… something.  Check out yesterday’s article in TESOL Quarterly, for instance.

A well-informed industry watcher recently expressed some incredulity that LLMs haven’t totally disrupted the high-stakes language testing sector.  He was shocked that people still pay hundreds of dollars to take a test.  My response was that this stuff takes time.  Everyone knows that university governance is a slow process.  Immigration regulations are even slower.  But things might finally be coming to a head – coincidentally both ETS and Pearson announced the existence of their “next-gen” (my term) tests at NAFSA a few weeks ago.  Things are moving a tiny bit faster now.

By the way, you must get to one of these DETcon events if you get the chance.  They are a charming combination of research presentations, community building and Duolingo’s trademark irreverence.  I understand that at the most recent Pittsburgh-based event, Duolingo CEO Luis von Ahn was subjected to an unannounced Yinzer Test.  Not sure what that is, but I suspect it is similar to a Voight-Kampff Test. In any case, the results have not been shared publicly.