Post-Editors REALLY Hate Poor MT Output
Translator anger over being asked to post-edit bad machine translation is at the boiling point. The hottest topic of discussion in 2011 on LinkedIn’s Automated Language Translation Group was translator anger as related to MT.
This is a real issue, and not just for translators*. At last year’s GALA event I also heard a lot of venting from translation company owners who felt that they had to do an unfair amount of mopping up when their clients sent them MT output to post-edit.
At Lexcelera, we never have angry post-editors. We’ve never had someone on our team throw up his or her hands and say, “I don’t want to do this.” Our translators don’t complain they have too much clean-up to do. In fact, last week a post-editor was so pleasantly surprised by how fast she was able to work, SHE wrote US a testimonial letter.
That’s when you know your MT process is working. But so often that isn’t the case. And when MT under-performs, translators are not wrong to be angry. It’s unfair for them to have to mop up bad MT.
So today I want to talk about how to make translators happy about post-editing MT.
I don’t mean to be controversial, but here’s what we have discovered over five years of intensive MT work:
If you want happy post-editors, use a rules-based machine translation system, and train it well.
First of all, let me say that I am not against statistically based machine translation (SMT). In fact, Lexcelera is part of a group (along with Symantec, Acrolinx and the universities of Edinburgh and Geneva) which has won a 1.8 million euro EU grant to work with SMT on user-generated and community content.
In our experience, SMT is better than RBMT for what I call the “wild west” content that can come out of communities. That is, content that is unstructured and highly variable. Patent translations too can be a bit wild west in that they cover so many domains that it’s hard to train a high-performing MT system. Here again, SMT is a good fit.
When deciding whether to use an SMT engine (such as Google, Language Weaver or Moses) or a rules-based one (such as Systran, Reverso or Promt), I am in agreement with Fred Hollowood’s narrow and broad domain distinction. As a general, and perhaps oversimplified, rule: if the domain is narrow enough that you can set the vocabulary for it (as with virtually any software localization, for example), then RBMT performs better. If you can’t set the domain terminology anyway, then SMT is a better fit.
But what accounts for so much translator anger over MT? Here’s what our experience tells us.
An MT engine that is not properly trained on your material and your terminology, whether rules-based or statistical, is going to waste post-editors’ time as they correct errors that ought not to be there in the first place.
But statistical output is particularly unsatisfying to work with because the translators are not empowered to make lasting changes and because it typically takes until the next training cycle to see any improvement at all. Perhaps most disturbing with SMT, though, is its lack of predictability, which means that translators waste a lot of time verifying terminology that ought already to be automatically verified. And therein lies the rub.
Being experienced with both systems, we conclude that RBMT systems are more translator-friendly for three very important reasons.
1) Post-editors can affect the quality of the output. Directly. They see a translation they don’t agree with, they signal it to us. And we change it. So in our experience they don’t have that frustration that SMT post-editors have when faced again and again with the same issue they’ve already fixed.
2) The improvement cycles of RBMT are considerably shorter. It’s so easy to customize RBMT that you don’t have to wait for enough new data to do a whole new training cycle, as with SMT, which could be six months or a year in the future. You may not want to retrain your RBMT engine every day, but in our process we make all the improvements before we output another batch of content.
This makes post-editors happy because not only can they see that their corrections have been taken into account, but they also get the benefit of everyone else’s corrections on the same text.
3) RBMT is faster to post-edit, and improves rapidly over time. If the terminology has been hard-coded, and in a well-trained engine it is, then post-editors know it’s the right terminology. They can see that a term came directly from the client-, domain- and even product-level glossaries. SMT post-editors, on the other hand, complain about what a ‘time sink’ it is when they have to constantly verify that the terms employed are the right ones.
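To make the terminology point concrete, here is a minimal sketch in Python of the kind of check a post-editor otherwise ends up doing by hand: for each glossary term found in the source sentence, confirm that the approved translation appears in the MT output. This is only an illustration, not how any particular MT system works. The function name `find_term_violations`, the two-entry English-French glossary and the sample sentences are all hypothetical, and simple case-insensitive whole-word matching is an assumption that ignores inflected forms.

```python
import re


def find_term_violations(source, target, glossary):
    """Return (source_term, approved_target_term) pairs where the source term
    occurs in `source` but the approved translation is missing from `target`.

    Assumption: case-insensitive whole-word matching is good enough for a demo;
    real terminology checks would also need to handle inflected forms.
    """
    violations = []
    for src_term, tgt_term in glossary.items():
        src_found = re.search(r"\b" + re.escape(src_term) + r"\b", source, re.IGNORECASE)
        tgt_found = re.search(r"\b" + re.escape(tgt_term) + r"\b", target, re.IGNORECASE)
        if src_found and not tgt_found:
            violations.append((src_term, tgt_term))
    return violations


# Hypothetical glossary and sentences, for illustration only.
glossary = {"firewall": "pare-feu", "backup": "sauvegarde"}
source = "Enable the firewall before running a backup."
consistent_output = "Activez le pare-feu avant de lancer une sauvegarde."
inconsistent_output = "Activez le coupe-feu avant de lancer une sauvegarde."

print(find_term_violations(source, consistent_output, glossary))
print(find_term_violations(source, inconsistent_output, glossary))
```

With the consistent output the list is empty; with the inconsistent one, the "firewall" entry is flagged. When the engine applies the glossary itself, this check always passes and the post-editor never has to run it mentally.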
If you ask me, translators are going to be angry and stay angry until their time is respected by:
- using the right MT system for the right task,
- properly training that system on the relevant terminology,
- giving translators the opportunity to improve the system,
- implementing their improvements in as close to real time as possible.
As a pioneering MT services firm and one of the very few experienced in both SMT and RBMT in a wide range of European and Asian languages, we would say without hesitation that for localization projects, RBMT is faster and more satisfying to post-edit to publication quality.
* I use post-editors and translators interchangeably because post-editors are frequently translators trained to do post-editing tasks.