Tether’s Paolo Ardoino Makes Case for Small On-Device Translation Models
Tether CEO Paolo Ardoino has turned the spotlight on a very different corner of artificial intelligence: translation that happens entirely on-device, without sending sensitive text to the cloud.
In a recent post, Ardoino framed the issue around privacy, speed, and practicality. His point was simple enough, but it touches a problem that millions of users encounter every day. When someone translates a medical note, a private message, a legal contract, or even a personal journal entry through a cloud service, that text leaves the device and enters someone else’s infrastructure.
In many cases, users do not know where the data goes, how long it is retained, or who can access it. Ardoino argued that this is more than a theoretical concern, especially in use cases where confidentiality matters.
According to Ardoino, the answer is not to rely on larger and larger general-purpose AI models. Instead, he argued that translation is one of those jobs where small, dedicated models can beat “Goliath.”
In his view, if the task is translating one language into another, there is no need to use a massive model that can also write poems, summarize articles, and perform a dozen unrelated tasks. For translation, a specialized model built for one purpose can be smaller, faster, and more reliable.
Small Models Outperform Larger LLMs
Ardoino pointed to the limits of general-purpose language models on edge devices such as phones and laptops. Even relatively small models can consume significant storage, take a long time to load, and still perform too slowly for a smooth user experience.
By contrast, dedicated neural machine translation models can be dramatically lighter, often only a few dozen megabytes in size, while loading in milliseconds and producing translations far more quickly. In Ardoino’s telling, this difference is not just technical trivia. It changes what is possible for real users on real devices.
That privacy-first argument sits at the center of the approach being pushed through QVAC, the project he discussed in the post. The idea is to make translation fully local, so that the entire process happens on the user’s phone, laptop, or embedded hardware. No cloud request is needed.
No third party needs to see the text. For users and developers concerned about compliance, that can also mean fewer regulatory worries.

QVAC is not limiting itself to one kind of translation engine. While dedicated NMT models are the long-term goal, the system can also support LLM-based translation in the meantime.
Practical Bridge Strategy
Ardoino described that as a practical bridge strategy. If a new language pair needs to be shipped quickly, a larger model can be deployed first, while a dedicated translation model is trained in parallel. That way, users get immediate support, and the experience can improve over time as the smaller model replaces the temporary fallback.
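That fallback pattern can be sketched as a simple router: prefer a dedicated NMT model when one exists for the language pair, otherwise fall back to a general LLM. This is a minimal illustration, not QVAC's actual interface; the class, method names, and registry are assumptions.

```python
class TranslationRouter:
    """Prefer a small dedicated NMT model; fall back to an LLM so a new
    language pair can ship immediately. Illustrative sketch only."""

    def __init__(self, llm_translate):
        self.nmt_models = {}          # (src, tgt) -> translation callable
        self.llm_translate = llm_translate

    def register_nmt(self, src, tgt, model):
        # Called once a dedicated model finishes training for the pair.
        self.nmt_models[(src, tgt)] = model

    def translate(self, text, src, tgt):
        model = self.nmt_models.get((src, tgt))
        if model is not None:
            return model(text)
        return self.llm_translate(text, src, tgt)

# Ship the LLM fallback first, then swap in the small model later.
router = TranslationRouter(lambda t, s, d: f"[llm {s}->{d}] {t}")
print(router.translate("hola", "es", "en"))   # served by the LLM fallback
router.register_nmt("es", "en", lambda t: f"[nmt es->en] {t}")
print(router.translate("hola", "es", "en"))   # now served by the dedicated model
```

The swap is invisible to callers, which is what makes the bridge "practical": the API stays the same while the engine underneath improves.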
Another theme in the post was batch translation. Ardoino said this became important once the team moved beyond demos and started thinking about production use cases such as documents, chat histories, and multi-sentence inputs.
Translating one sentence at a time may be fine for a simple interface, but batch processing makes a huge difference in real applications. The team said the result was around 2.5 times faster throughput at scale, with noticeable latency improvements per sentence.
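The gain comes from amortizing fixed per-call overhead (model invocation, setup) across many sentences. A toy cost model makes the shape of the effect visible; the millisecond figures here are invented for illustration and are not QVAC's measurements.

```python
def sequential_cost_ms(n_sentences, overhead_ms=40.0, per_sentence_ms=25.0):
    # One model call per sentence: the fixed overhead is paid every time.
    return n_sentences * (overhead_ms + per_sentence_ms)

def batch_cost_ms(n_sentences, overhead_ms=40.0, per_sentence_ms=25.0):
    # One call for the whole batch: the fixed overhead is paid once.
    return overhead_ms + n_sentences * per_sentence_ms

n = 100
speedup = sequential_cost_ms(n) / batch_cost_ms(n)
print(round(speedup, 2))  # roughly 2.5x with these toy numbers
```

With any fixed overhead, the speedup grows with batch size and approaches `1 + overhead / per_sentence` in the limit, which is why batching matters for documents and chat histories rather than single-sentence demos.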
The most ambitious part of the proposal is coverage. Instead of trying to build a separate model for every possible language pair, QVAC uses English as a pivot. That means a translation path, such as Spanish to Italian, can be handled by chaining Spanish-to-English and English-to-Italian models together.
In practical terms, this reduces the number of models needed from an enormous number to something much more manageable. Ardoino suggested that supporting 26 languages could require roughly 50 models instead of 650, making a broad on-device translation system far more realistic.
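The arithmetic behind that reduction checks out directly: with N languages, direct pair-specific models require N×(N−1) directional models, while pivoting through English requires only 2×(N−1). The `pivot_translate` chaining below is an illustrative sketch, not QVAC's actual routing code.

```python
def direct_model_count(n_languages: int) -> int:
    # One model per ordered pair (es->it and it->es count separately).
    return n_languages * (n_languages - 1)

def pivot_model_count(n_languages: int) -> int:
    # Each non-English language needs one model into English and one out.
    return 2 * (n_languages - 1)

def pivot_translate(text, src, tgt, models):
    # 'models' maps (src, tgt) tuples to translation callables.
    # Pairs not involving English are chained through it.
    if src == "en" or tgt == "en":
        return models[(src, tgt)](text)
    return models[("en", tgt)](models[(src, "en")](text))

print(direct_model_count(26))  # 650 direct pair models
print(pivot_model_count(26))   # 50 models with English as the pivot
```

The trade-off, not discussed in the post, is that pivoting runs two models per request for non-English pairs and can compound errors through the intermediate English text.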
He also shared benchmark numbers showing why the approach matters on real hardware. On a Linux laptop, the Bergamot English-to-Italian model reportedly loaded in just over 100 milliseconds and delivered high translation quality.
On a Pixel 10 Pro XL running directly on-device, the model loaded in under 80 milliseconds and performed especially well in batch mode. Ardoino said the mobile results showed a clear advantage over sequential translation, with batch processing producing a much more responsive experience.
Looking ahead, the team said it is expanding into Indic languages through IndicTrans and into more African language coverage through AfriqueGemma, while also exploring streaming translation for live chat and subtitle generation. The broader message of the post was that local AI does not have to be a compromise. In translation, at least, Ardoino argued that smaller models may not only be enough, but better.