Our language consulting service helps companies identify where their translation and localization processes can be optimized. We help our customers:
- Reach international markets more quickly
- Support international customers better
- Institute authoring best practices
- Integrate the newest technologies
- Choose a new TM tool or MT engine
- Improve reuse of legacy translated content
- Measure and improve translation quality
- Ensure better terminological consistency
- Integrate with a new CMS
- Improve processes
- Add more languages
- Reduce budgets
RBMT, SMT or Hybrid?
The choice of MT engine typically follows the identification of the project goals, content and languages. The aim is to determine which engine will give the best results: Rules-Based, Statistical, Hybrid, or a combination of approaches. The answer depends on factors such as language pairs, content, quality needed, the technology to integrate with, and the human, technical and data resources available.
LexWorks is technology agnostic because we know there are cases when a Rules-Based approach works best, cases where a Statistical approach works best, and cases where a Hybrid works best. The trick is knowing which engine to use in which situation.
The process typically involves some or all of the following:
- Deciding what content might be suitable
- Consulting with your internal staff about what expertise and what resources you have in-house
- Determining what systems the TM tool would have to integrate with and what functionalities are most important
- Identifying internal and external resources
- Piloting initial solutions, measuring and comparing results
- Establishing ROI expectations and a roadmap
Engine Choice and Language Pairs
Language is one of the most important determinants of engine performance. Some languages, like French and Spanish, tend to work well in most SMT and RBMT engines. Other languages, like Japanese and German, in our experience perform best with a rules-based approach. But by far the greatest number of language pairs, once you leave the dominant languages, do not exist in an off-the-shelf RBMT engine, so in this case the choice is easy: train an SMT engine from scratch.
On the other hand, if you do not have enough data (we’re talking millions of in-domain bilingual and monolingual segments) to train a new engine in the desired language combinations, a pre-trained engine might be the best choice: today there are pre-trained RBMT, Hybrid and SMT engines that can be used “out of the box” with only fine-tuning customizations.
Engine Choice and Domain
As a rule of thumb, if the terminology is fixed in a narrow domain such as automotive or software documentation, RBMT or a Hybrid is generally the best choice. This is because the rules component protects terminology better.
Content where the terminology comes from a number of domains, such as patents, works better with SMT. On the other hand, if the content contains metadata tags, RBMT or Hybrid technology will save you some headaches, because SMT doesn’t preserve tags well.
The source of the content is also important: SMT is better suited to user-generated content such as forums, whereas RBMT is better suited to documentation that needs to be post-edited to human quality.
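The rules of thumb in this section can be sketched as a simple shortlist helper. This is only an illustration: the function name `suggest_engines`, its three yes/no parameters and the one-vote-per-rule scheme are assumptions made for the example, not a LexWorks tool, and treating any content that is not user generated as documentation for post-editing is a simplification.

```python
from collections import Counter

def suggest_engines(narrow_domain, has_tags, user_generated):
    """Shortlist engine types to pilot first, one vote per rule of thumb.

    Hypothetical helper: the equal-weight voting is an assumption.
    """
    votes = Counter()
    # Fixed terminology in a narrow domain favors RBMT or Hybrid;
    # terminology drawn from many domains favors SMT.
    if narrow_domain:
        votes.update(["RBMT", "Hybrid"])
    else:
        votes.update(["SMT"])
    # Metadata tags favor RBMT or Hybrid.
    if has_tags:
        votes.update(["RBMT", "Hybrid"])
    # User-generated content favors SMT; otherwise we assume documentation
    # to be post-edited, which favors RBMT.
    if user_generated:
        votes.update(["SMT"])
    else:
        votes.update(["RBMT"])
    top = max(votes.values())
    return sorted(name for name, count in votes.items() if count == top)

# Tagged software documentation in a narrow domain
print(suggest_engines(narrow_domain=True, has_tags=True, user_generated=False))
```

A tie in the shortlist simply means the rules of thumb do not settle the question and a pilot comparison is needed.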
Engine Choice and Process
Another question in the consultation phase is how you want to manage MT. Do you want an internal engine that you manage with your own staff? In that case, you may only need a customization service to provide you with the initial engine, as well as peripheral services such as staff training and maintenance/updates.
If you are managing your MT process internally, the engine choice will be dictated by the resources you have available: SMT will require more processing power and strong engineering skills; RBMT managed internally will require more language resources.
Or do you plan to outsource all MT activities, from customizing and processing to post-editing and maintaining? Then the service that will fit you best is a turnkey solution that delivers the translated texts to you exactly as in your traditional process, only faster and at lower cost.
For an all-in-one solution, the engine choice will be determined solely by performance.
If it seems a bit complicated to know which engine to use, our best practice is to test. Being engine agnostic means that before starting any project we test our assumptions by running the content through an SMT engine, an RBMT engine and a Hybrid. Expectations based on rules of thumb are sometimes wrong, so testing is the best way to check them and to validate the engine choice.
The engine that reliably delivers the best quality output is the one we choose.
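As a minimal sketch of that final comparison, assume each engine's pilot output has been given a quality score per test sample, by human review or an automatic metric. The function name `pick_engine`, the scoring scale and the numbers below are placeholders invented for the illustration, not real measurements.

```python
from statistics import mean

def pick_engine(scores_by_engine):
    """Return the engine with the highest mean score, plus all the means.

    Hypothetical helper: using the mean as the selection criterion is an
    assumption; a real evaluation might also look at consistency.
    """
    if not scores_by_engine:
        raise ValueError("no engines to compare")
    averages = {name: mean(scores) for name, scores in scores_by_engine.items()}
    best = max(averages, key=averages.get)
    return best, averages

# Placeholder scores for three test samples per engine (not real data)
pilot_scores = {
    "RBMT": [3.0, 4.0, 3.5],
    "SMT": [4.0, 2.0, 3.0],
    "Hybrid": [4.0, 4.0, 3.5],
}

best, averages = pick_engine(pilot_scores)
print(best, averages)
```

Averaging over several samples, rather than judging a single document, is one simple way to reflect the "reliably" in the selection criterion.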