This article compares two kinds of machine translation: rule-based (RBMT) and statistical (SMT). Each system has advantages and disadvantages and will perform better in certain situations (depending, for example, on the language pair, the domain and the availability of a corpus for training).
Here’s what you need to know as a business decision maker. Machine translation is not a tool. Machine translation is a process. The translation method matters, but so do many other factors, such as the language pair, the content and the data available for training. Our process methodology is built on a deep understanding of the different use cases. Below are some of the main differences between SMT and RBMT.
Rules-Based vs. Statistical Machine Translation
- SMT is better for user-generated content and broad-domain material such as patents
- RBMT is better for documentation and even software
- RBMT protects software tags
- SMT may translate software tags
- RBMT is better suited to post-editing and durable changes
- SMT is better suited to on-the-fly translations of short-shelf-life content
- RBMT retains corrections to terminology (and applies the correct grammar)
- SMT will use the most likely term, but not necessarily the one you wanted
- RBMT is predictable: the sentences may not be pretty, but you know what you will get, and you get the same result every time
- SMT is unpredictable, but its sentences are more fluent
- RBMT is faster to update and maintain (updates can be made daily or more frequently)
- SMT has longer updating cycles (once or twice a year is typical)
- RBMT is expensive to license
- SMT can be free and open source
- RBMT is heavy on linguistic resources
- SMT is heavy on processing resources
- SMT produces more fluent sentences
- RBMT produces less fluent sentences
- SMT can handle bad grammar and doesn’t improve much with controlled authoring
- RBMT does significantly better when controlled authoring is in place
- SMT is the only choice for minority languages
- SMT and RBMT are matched for languages like French and Spanish
- RBMT performs better for Japanese, German, Russian, Korean
- SMT can handle over 50 languages out of the box (Google and Bing/Microsoft Translator)
- RBMT can handle around 20 target languages out of the box
- RBMT is ready to use off the shelf but needs customization for your domain and preferred terminology
- SMT may need millions of bilingual and monolingual segments for training, but engines may be pre-trained for a particular domain
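The tag-handling difference in the list above (RBMT protects software tags, SMT may translate them) can be worked around by masking tags before the text reaches the engine and restoring them afterwards. The sketch below is only an illustration, not a description of any particular engine: the tag syntax (`<...>` and `{0}`-style placeholders), the `__TAGn__` token format and all function names are assumptions, and it presumes the engine passes the tokens through unchanged.

```python
import re

# Matches XML/HTML-style tags and {0}-style placeholders (an assumed tag syntax).
TAG_PATTERN = re.compile(r"<[^<>]+>|\{\d+\}")


def protect_tags(text):
    """Replace each tag with a numbered token; return the masked text and the tags."""
    tags = []

    def _mask(match):
        tags.append(match.group(0))
        return f"__TAG{len(tags) - 1}__"

    return TAG_PATTERN.sub(_mask, text), tags


def restore_tags(text, tags):
    """Put the original tags back in place of the numbered tokens."""
    return re.sub(r"__TAG(\d+)__", lambda m: tags[int(m.group(1))], text)


def translate_with_protected_tags(text, translate):
    """Mask tags, call the engine (any callable str -> str), then restore the tags."""
    masked, tags = protect_tags(text)
    return restore_tags(translate(masked), tags)
```

Here `translate` stands in for whichever engine is in use. Because the tokens are restored by number, the engine is free to reorder them within the sentence.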
In the final analysis, any MT engine will get you there. It’s just a question of how fast and whether the wheels will come off and have to be put back on. When the wrong engine is used, the end customer will still get a good translation. But the post-editors pay in their own sweat and tears for the wrong approach.
That’s why we believe in choosing the right engine for the right languages and the right content. Most of the time we take a hybrid approach, combining the best of rule-based and statistical machine translation in a single engine that we customize and train against your content. But not always: we test our assumptions continually to make sure that we are using the highest-performing engine, then we make sure that it is properly trained so that our post-editors are working on the best output possible.
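As a rough illustration of what testing engines against each other can look like, the sketch below scores each engine by how many word-level edits a post-editor had to make to its output, then picks the engine with the lowest score. The metric (a plain word-level edit distance divided by the length of the post-edited text), the function names and the engine labels are all assumptions made for this example; the actual evaluation method is not specified here.

```python
def word_edit_distance(hypothesis, reference):
    """Levenshtein distance between two sentences, counted in words."""
    hyp, ref = hypothesis.split(), reference.split()
    previous = list(range(len(ref) + 1))
    for i, hyp_word in enumerate(hyp, start=1):
        current = [i]
        for j, ref_word in enumerate(ref, start=1):
            cost = 0 if hyp_word == ref_word else 1
            current.append(min(previous[j] + 1,          # delete a word
                               current[j - 1] + 1,       # insert a word
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def post_edit_effort(outputs, post_edited):
    """Total word edits divided by total post-edited words (lower is better)."""
    edits = sum(word_edit_distance(o, p) for o, p in zip(outputs, post_edited))
    words = sum(len(p.split()) for p in post_edited)
    return edits / words if words else 0.0


def pick_engine(engine_outputs, post_edited):
    """Return the engine name with the lowest post-edit effort, plus all scores."""
    scores = {name: post_edit_effort(outs, post_edited)
              for name, outs in engine_outputs.items()}
    return min(scores, key=scores.get), scores


# Hypothetical sample: one post-edited sentence and two engines' raw output.
post_edited = ["the dog sat down"]
engine_outputs = {"rbmt": ["the dog sat"], "smt": ["a cat sat"]}
best, scores = pick_engine(engine_outputs, post_edited)
```

In practice the comparison would be run separately for each language pair and content type, since the list above suggests the better engine differs between them.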