Augmenting Text To Increase Translation Difficulty

In the fall of 2025, I started the Deep Learning course at ETH, which includes a large, open-ended group research project. Several of the TAs suggested potential topics, and one of them, Vilém, proposed a topic that our group found interesting: find a way to synthetically generate hard-to-translate sequences. Vilém had also just released a paper on Sentinel, a model that estimates how difficult a text is to translate. At the time I was researching GCG-style (Greedy Coordinate Gradient) adversarial attacks on language models, and I had a simple idea: what if we ran GCG-style optimization over a sequence, with the objective of maximizing Sentinel's predicted difficulty instead of a traditional adversarial objective?

The result is this paper, accepted as a conference paper at EAMT 2026.
