Beyond Pipplet: what companies learned by switching to AI-powered language assessment
On 20 December 2025, Pipplet shut down for good. For the hundreds of companies and training organizations that relied on it to assess the language skills of their candidates and employees, the news raised one simple, urgent question: what do we use instead?
Five months on, the market has answered. And the most interesting part is not that one tool replaced another. It is that many companies used the transition to discover a fundamentally different approach to language assessment: a test driven by artificial intelligence, calibrated to the candidate's actual job, evaluating speaking as well as writing, with no human grader and no delay.
This article looks at what changed, and answers the questions HR and training leaders are actually asking.
Why traditional tests were hitting their limits
Pipplet had genuine strengths. It relied on expert linguists and delivered reports within 24 hours, aligned with the Common European Framework of Reference for Languages (CEFR). It was a good solution for its time.
But the model carried two structural constraints. First, the human grader: every test had to be reviewed by an evaluator, which meant a delay, a cost, and a degree of subjectivity from one grader to the next. Second, standardized scenarios: a generic test measures theoretical language level, not a person's ability to run a meeting in their field, handle an upset customer on the phone, or write a technical report.
Yet that is exactly what companies need to know. An export salesperson, a contact-center agent, and an engineer on an international assignment do not need the same English. Assessing out of context means measuring the wrong thing.
What a job-contextualized test changes
The new generation of assessment, which FlashLevel is part of, rests on a different principle: place the candidate in a situation close to their real work, and let the AI adapt the test to their answers in real time.
In practice, the test does not ask you to "conjugate this verb." It says: "here is an unhappy client, respond to them." Scenarios are calibrated to the industry and the role: customer relations for a contact center, multi-site coordination for construction, an English report for an audit firm. The candidate is assessed in context, not in theory.
Field feedback is unambiguous here. Across hundreds of completed tests, the same observation comes up candidate after candidate: <em>"The test was highly relevant and matched real situations."</em> <em>"The questions were contextualized to my role."</em> <em>"Adapted to my work environment."</em> That perceived relevance is not a marketing detail: it is what makes the result reliable, because the candidate is assessed on what they will actually do.
The questions HR teams are asking
Does it really assess speaking?
Yes, and this is an often-overlooked point. Many online tests are limited to multiple choice and writing, because speaking is expensive to grade when it requires a human evaluator. FlashLevel includes spoken expression directly in the test, analyzed by AI. As one employee at a major audit firm put it: <em>"Interesting test, because it includes a speaking part, which is missing from most tests of this kind."</em>
Is a test without a human grader reliable?
This is the most legitimate question. The answer lies in the numbers and the feedback. Across all administered tests, more than 80% of candidates say the result accurately reflects their real level, and more than 85% find the difficulty level appropriate. Removing the human grader does not mean losing reliability: it removes the variation from one grader to the next. The same standard applies to everyone, everywhere, at any time.
How long does it take, and what do you get?
Thirty minutes. At the end of the test, the result is immediate: a score on the CEFR scale (A1 to C2), with a detailed report by skill (comprehension and expression, both spoken and written), exportable for HR in one click. No 24- or 48-hour delay, no follow-up, no dependency on an evaluator's availability.
Is the result interpretable and exportable?
Yes. Because the score is expressed on the CEFR scale, the common European reference framework, it remains interpretable in any context and can be integrated into existing HR or training tracking tools. You are not locked into a proprietary format.
What does it cost?
This is the argument that speaks most directly to HR. An outsourced bilingual assessment interview typically costs between €150 and €300 per candidate. A job-contextualized AI test costs a fraction of that, starting from a few dozen euros, with an objective, documented criterion for every decision. The return on investment is immediate, especially at recruitment volume.
What real deployments show
Between April 2025 and May 2026, FlashLevel was deployed across more than 50 client accounts, with 926 tests administered in three contexts: international recruitment, pre-training placement, and team language mapping. The average level achieved is B2. Customer satisfaction (CSAT) reached 70 out of 100 in the first half of 2026, up sharply.
A few concrete examples, by use case.
In high-volume recruitment, the Armatis group (BPO, more than 10,000 employees) integrated FlashLevel as a filtering step after the first interview, across its sites in France, Portugal and Tunisia. Nearly 700 tests were administered, calibrated to contact-center scenarios: handling an upset client, rephrasing a procedure, writing follow-up emails. In a sector where a poorly calibrated hire can cost between €15,000 and €30,000, having an objective criterion before hiring is a game changer.
For technical profiles with an international dimension, Eiffage Énergie Systèmes validates the English level of its engineers before assigning them to international missions, through construction scenarios: reporting, coordinating with a foreign partner, multi-site project briefs. The challenge: technical staff often describe their English as "professional", which says little about how comfortable they really are in a meeting.
In pre-training placement, Forvis Mazars uses FlashLevel ahead of its programs to precisely calibrate each course: exact starting level, realistic objectives, adjusted duration. Every funded hour is based on a validated real level, not an estimate.
In team mapping, groups such as Vinci and Pierre & Vacances use the test to get a reliable picture of real levels before building a training plan, avoiding spend on unnecessary or poorly targeted courses.
The takeaway
Pipplet's closure accelerated a shift that was already underway. Assessing a working language is no longer about measuring abstract grammar: it is about verifying that a person can do their job in that language. That is what a job-contextualized, AI-driven test delivers, with speaking included, an immediate CEFR result, and a controlled cost.
If your organization used Pipplet, or if you still assess languages through interviews or self-declaration, now is a good time to see what a new-generation assessment can bring to your recruitment and training decisions.
Want to assess the language skills of your teams or candidates? Book a FlashLevel demo. We run an assessment on a real case of your choice and walk through the report together. Deployment is possible within 24 hours.