What does being technology agnostic look like in practice? There are general rules of thumb, for example that SMT works better with user-generated content, but LexWorks has found it valuable to test these assumptions extensively. We often test all three approaches before launching a major project, whether a customer support site in 9 languages or documentation in 17.
To give real-world examples of our findings, here are some case studies showing why we chose one engine over another. Being neutral about what technology to use is important to us. As John Papaioannou, the CEO of Lexcelera-LexWorks, says: “Being technology agnostic means using the very best technology for the task, without being bound by a supplier monopoly.”
Machine Translation in the Real World: SMT vs RBMT vs Hybrid
Factory, Novocherkassk, Russia
Challenge: To translate between 2 and 200 pages from English to Russian every day over a three-year project. The content was mainly technical specifications and contracts.
Constraints: There was no bilingual data available at project start to train engines.
Solution: RBMT. Without data to train an SMT engine, a rules-based engine was the default choice. In any case, we often pair a rules-based or hybrid engine with Russian, as it is a morphologically complex language.
Challenge: To translate 30,000 pages, mostly emails, technical reports and meeting minutes from Japanese to English in order to identify information that could be considered a smoking gun.
Constraints: Content was written with little attention to grammar or spelling, and was highly colloquial.
Solution: Hybrid. We chose a hybrid engine because the SMT part works best with grammatically incorrect and colloquial sentences, while the RBMT part tends to perform best in the Japanese-English pair.
Response to a 3400-Page Technical RFP in one week
Challenge: To translate 3400 pages in one week from French & English to Brazilian Portuguese for a response to a Request for Proposals (RFP).
Constraints: Limited data at project start, and limited time for training. The content came in many different files, and there were multiple passes on each file as the customer rewrote while the translation process was going on.
Solution: Hybrid. The SMT component of the Hybrid was helpful in allowing us to input TMs as training material and also adapt to changing source text. The RBMT component allowed us to enter key terminology and to save on post-editing time.
Online Customer Support Site
Challenge: To make dynamic content available on a customer support website in 9 languages in order to solve customer issues before they became a call to the help desk.
Constraints: Extremely colloquial user-generated content with little attention to correct grammar and spelling, extensive use of abbreviations, and content unlike that found in product documentation. The server needed 24/7 uptime.
Solution: Online SMT. To ensure that the system was trained with both in-domain and out-of-domain material (the latter including sentence constructs not found in user documentation) and also available online 24/7, we chose the Microsoft Translator Hub widget. The widget was customized with product names, Do Not Translates, and the results of post-editing spot checks.
Self-Service MT Server for 200,000 Employees
Challenge: One of the top five banks in the world needed an MT system behind their firewall so that their employees would not send sensitive information out to Google for translation.
Constraints: This customer has many different business units such as investment, insurance, construction and automotive leasing, with sometimes competing terminology.
Solution: Hybrid. To manage the highly domain-specific terms, we needed an RBMT component to organize and rank terminology by business unit. A large volume of bilingual corpora was available to train the statistical component, so we could choose a hybrid engine, which offers both. The hybrid server now sits on the client’s premises, and we maintain and update it remotely.
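The reasoning that recurs across these case studies can be sketched as a small rule-of-thumb function. This is only an illustrative summary, not LexWorks' actual selection process: the `Project` fields, the function name `suggest_first_candidate`, and the order of the checks are all assumptions made for the example, and in practice the outcome would still be confirmed by testing the engines.

```python
from dataclasses import dataclass


@dataclass
class Project:
    """Hypothetical project profile; field names are illustrative only."""
    has_bilingual_data: bool                # TMs or corpora exist for training
    colloquial_source: bool                 # user-generated or ungrammatical text
    needs_strict_terminology: bool = False  # key terms must be enforced or ranked
    pair_favors_rules: bool = False         # language pair where rules tend to do well


def suggest_first_candidate(p: Project) -> str:
    """Return a starting hypothesis to test, not a final decision."""
    if not p.has_bilingual_data:
        # Nothing to train a statistical component on.
        return "RBMT"
    if p.needs_strict_terminology or p.pair_favors_rules:
        # Data for the statistical part, rules for terminology or grammar.
        return "Hybrid"
    if p.colloquial_source:
        # Statistical engines cope better with informal, error-prone text.
        return "SMT"
    return "test all three"


# Simplified profiles loosely modelled on three of the case studies above.
factory = Project(has_bilingual_data=False, colloquial_source=False)
support_site = Project(has_bilingual_data=True, colloquial_source=True)
bank = Project(has_bilingual_data=True, colloquial_source=False,
               needs_strict_terminology=True)

for name, project in [("factory", factory),
                      ("support site", support_site),
                      ("bank", bank)]:
    print(name, "->", suggest_first_candidate(project))
```

A function like this only captures the starting assumption for a project; the case studies show that constraints such as uptime, time available for training, and changing source text also weigh on the final choice.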