Tether’s Paolo Ardoino Makes Case for Small On-Device Translation Models
Tether CEO Paolo Ardoino has turned the spotlight on a very different corner of artificial intelligence: translation that happens entirely on-device, without sending sensitive text to the cloud.
In a recent post, Ardoino framed the issue around privacy, speed, and practicality. His point was simple enough, but it touches a problem that millions of users encounter every day. When someone translates a medical note, a private message, a legal contract, or even a personal journal entry through a cloud service, that text leaves the device and enters someone else’s infrastructure.
In many cases, users do not know where the data goes, how long it is retained, or who can access it. Ardoino argued that this is more than a theoretical concern, especially in use cases where confidentiality matters.
According to Ardoino, the answer is not to rely on larger and larger general-purpose AI models. Instead, he argued that translation is one of those jobs where small, dedicated models can beat “Goliath.”
In his view, if the task is translating one language into another, there is no need to use a massive model that can also write poems, summarize articles, and perform a dozen unrelated tasks. For translation, a specialized model built for one purpose can be smaller, faster, and more reliable.
Small Models Outperform Larger LLMs
Ardoino pointed to the limits of general-purpose language models on edge devices such as phones and laptops. Even relatively small models can consume significant storage, take a long time to load, and still perform too slowly for a smooth user experience.
By contrast, dedicated neural machine translation models can be dramatically lighter, often only a few dozen megabytes in size, while loading in milliseconds and producing translations far more quickly. In Ardoino’s telling, this difference is not just technical trivia. It changes what is possible for real users on real devices.
That privacy-first argument sits at the center of the approach being pushed through QVAC, the project he discussed in the post. The idea is to make translation fully local, so that the entire process happens on the user’s phone, laptop, or embedded hardware. No cloud request is needed.
No third party needs to see the text. For users and developers concerned about compliance, that can also mean fewer regulatory worries.

QVAC is not limiting itself to one kind of translation engine. While dedicated NMT models are the long-term goal, the system can also support LLM-based translation in the meantime.
Practical Bridge Strategy
Ardoino described that as a practical bridge strategy. If a new language pair needs to be shipped quickly, a larger model can be deployed first, while a dedicated translation model is trained in parallel. That way, users get immediate support, and the experience can improve over time as the smaller model replaces the temporary fallback.
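That fallback pattern can be sketched as a simple router: prefer a dedicated NMT model when one exists for the language pair, otherwise fall back to a general LLM. This is a minimal illustration, not QVAC's actual interface; the class, method names, and registry are assumptions.

```python
class TranslationRouter:
    """Prefer a small dedicated NMT model; fall back to an LLM so a new
    language pair can ship immediately. Illustrative sketch only."""

    def __init__(self, llm_translate):
        self.nmt_models = {}          # (src, tgt) -> translation callable
        self.llm_translate = llm_translate

    def register_nmt(self, src, tgt, model):
        # Called once a dedicated model finishes training for the pair.
        self.nmt_models[(src, tgt)] = model

    def translate(self, text, src, tgt):
        model = self.nmt_models.get((src, tgt))
        if model is not None:
            return model(text)
        return self.llm_translate(text, src, tgt)

# Ship the LLM fallback first, then swap in the small model later.
router = TranslationRouter(lambda t, s, d: f"[llm {s}->{d}] {t}")
print(router.translate("hola", "es", "en"))   # served by the LLM fallback
router.register_nmt("es", "en", lambda t: f"[nmt es->en] {t}")
print(router.translate("hola", "es", "en"))   # now served by the dedicated model
```

The swap is invisible to callers, which is what makes the bridge "practical": the API stays the same while the engine underneath improves.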
Another theme in the post was batch translation. Ardoino said this became important once the team moved beyond demos and started thinking about production use cases such as documents, chat histories, and multi-sentence inputs.
Translating one sentence at a time may be fine for a simple interface, but batch processing makes a huge difference in real applications. The team said the result was around 2.5 times faster throughput at scale, with noticeable latency improvements per sentence.
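The gain comes from amortizing fixed per-call overhead (model invocation, setup) across many sentences. A toy cost model makes the shape of the effect visible; the millisecond figures here are invented for illustration and are not QVAC's measurements.

```python
def sequential_cost_ms(n_sentences, overhead_ms=40.0, per_sentence_ms=25.0):
    # One model call per sentence: the fixed overhead is paid every time.
    return n_sentences * (overhead_ms + per_sentence_ms)

def batch_cost_ms(n_sentences, overhead_ms=40.0, per_sentence_ms=25.0):
    # One call for the whole batch: the fixed overhead is paid once.
    return overhead_ms + n_sentences * per_sentence_ms

n = 100
speedup = sequential_cost_ms(n) / batch_cost_ms(n)
print(round(speedup, 2))  # roughly 2.5x with these toy numbers
```

With any fixed overhead, the speedup grows with batch size and approaches `1 + overhead / per_sentence` in the limit, which is why batching matters for documents and chat histories rather than single-sentence demos.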
The most ambitious part of the proposal is coverage. Instead of trying to build a separate model for every possible language pair, QVAC uses English as a pivot. That means a translation path, such as Spanish to Italian, can be handled by chaining Spanish-to-English and English-to-Italian models together.
In practical terms, this reduces the number of models needed from an enormous number to something much more manageable. Ardoino suggested that supporting 26 languages could require roughly 50 models instead of 650, making a broad on-device translation system far more realistic.
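The arithmetic behind that reduction checks out directly: with N languages, direct pair-specific models require N×(N−1) directional models, while pivoting through English requires only 2×(N−1). The `pivot_translate` chaining below is an illustrative sketch, not QVAC's actual routing code.

```python
def direct_model_count(n_languages: int) -> int:
    # One model per ordered pair (es->it and it->es count separately).
    return n_languages * (n_languages - 1)

def pivot_model_count(n_languages: int) -> int:
    # Each non-English language needs one model into English and one out.
    return 2 * (n_languages - 1)

def pivot_translate(text, src, tgt, models):
    # 'models' maps (src, tgt) tuples to translation callables.
    # Pairs not involving English are chained through it.
    if src == "en" or tgt == "en":
        return models[(src, tgt)](text)
    return models[("en", tgt)](models[(src, "en")](text))

print(direct_model_count(26))  # 650 direct pair models
print(pivot_model_count(26))   # 50 models with English as the pivot
```

The trade-off, not discussed in the post, is that pivoting runs two models per request for non-English pairs and can compound errors through the intermediate English text.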
He also shared benchmark numbers showing why the approach matters on real hardware. On a Linux laptop, the Bergamot English-to-Italian model reportedly loaded in just over 100 milliseconds and delivered high translation quality.
On a Pixel 10 Pro XL running directly on-device, the model loaded in under 80 milliseconds and performed especially well in batch mode. Ardoino said the mobile results showed a clear advantage over sequential translation, with batch processing producing a much more responsive experience.
Looking ahead, the team said it is expanding into Indic languages through IndicTrans and into more African language coverage through AfriqueGemma, while also exploring streaming translation for live chat and subtitle generation. The broader message of the post was that local AI does not have to be a compromise. In translation, at least, Ardoino argued that smaller models may not only be enough, but better.