Adventures in biomechanical translation

Machine translation (MT) is the bugbear of the professional translator. Machine-assisted translation (MAT) is a more devious, and perhaps more pernicious, bugbear. Machine translation takes the translator out of the process entirely; machine-assisted translation uses the translator’s expertise to build up patterns of source/target sentence pairs, and attempts to extrapolate these patterns through the rest of the source text. Translation agencies then use the “match rate” as a way to chisel the translator on payments.

Most of the work that I do is not very amenable to MAT (if I used it at all)–my guesstimate is that most of my jobs would have less than a 10% match rate overall. But the job I’m doing right now would be highly amenable to MAT: it’s a programming document where a given sentence may be repeated 50 times, with minor variations in predictable spots.

The job was sent to me as a series of MS Word files, which I manually concatenated into one. Word’s search/replace tools are relatively limited, but BBEdit has a powerful implementation of GREP. So, after much gnashing of teeth, I managed to export a usable HTML file from Word, and cleaned it up. This in itself could be the subject of an even-more-tiresomely long post, which I will spare everyone from reading, and myself from reliving.

Once I got the file whipped into a shape I could stand looking at, I started working out GREP patterns. Some of these were highly productive–one pass would translate 40 or so sentences. Some would only do the one I was looking at. So I’ve been manually reproducing the MAT process, and getting pretty good at GREP syntax to boot. But as I work on it, there’s always a nagging feeling that if I understood that syntax better, I could produce more generalized patterns that would capture more sentences. The ultimate, of course, would be the hideously convoluted pattern that would be required to translate the entire document in one pass–which starts getting into Chomsky territory.
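A rough sketch of the idea, in Python rather than BBEdit: each rule pairs a pattern for a recurring source sentence with a target-language template, and whatever varies in the predictable spot is captured and carried across. The Japanese sentences and both rules here are invented for illustration, not taken from the actual job, and BBEdit's GREP syntax differs in its details.

```python
import re

# Hypothetical rules, made up for illustration: (source pattern, target template).
# The capture group holds the part of the sentence that varies.
RULES = [
    (re.compile(r"([A-Za-z]+)プロパティの値を返します。"),
     r"Returns the value of the \1 property."),
    (re.compile(r"([A-Za-z]+)プロパティの値を設定します。"),
     r"Sets the value of the \1 property."),
]


def translate(text):
    """Apply every rule to the text; return the result and a hit count per rule."""
    counts = []
    for pattern, template in RULES:
        text, hits = pattern.subn(template, text)
        counts.append(hits)
    return text, counts
```

Calling translate() on the text of a file returns the rewritten text along with the number of sentences each rule caught, which is a quick way to see which patterns are the productive ones and which only handle a single sentence.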

Postscript: I finished that job. What started out as 28 Word files weighing in at a total of 1.2 MB wound up–when I finished concatenating, exporting to HTML, cleaning up, translating, and compressing with Gzip–as a 17.1 KB file. Amazing.

New Green Goddess coming

Kenkyusha is readying a new version (link in Japanese) of its New Japanese-English Dictionary. Despite its many faults, it is considered the standard J-E dictionary, and is referred to by Japanese translators as the Green Goddess, or GG for short. It is interesting that the product announcement explicitly refers to this. On the Honyaku mailing list, Tom Gally writes:

The chief in-house editor of the dictionary
at Kenkyusha first learned that the dictionary has this nickname from
Mayumi Nishioka, a long-time Honyaku contributor, when she contacted
him in 1995 about a term that had been discussed on Honyaku.

Purchasers of this dictionary may note other connections between
Honyaku and the new GG, including terms and translations that have
been discussed on this list in the past and several familiar names
among the contributors.

It is a minor sport among J-E translators to point out bizarre entries in this dictionary, many of which seem preserved from the first edition in 1918–it still contains entries like 鉄道馬車 tetsudo basha or horse-drawn railway car. The fourth edition was released in 1974, and one gets the impression that new glosses were tacked on to the end of existing ones, so one often must skip to the end of the entry to get the most helpful definition. The example sentences also have an antique quality, like “give a wrench at the doorknob.” Anyhow, it’ll be interesting to see how new the fifth edition is.

Consultant, de-jargonize thyself

The NY Times reports that Deloitte Consulting has come up with a Word macro, aptly named Bullfighter, that removes or simplifies annoying consultant jargon. Words like “extensible” and “scalable” are simply eliminated; “ecosystem” becomes “system” (which isn’t much better, frankly).
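The general approach is easy to sketch, though this is emphatically not Deloitte's macro: it is a few lines of Python, not Word's macro language, and the only rules in it are the three mentioned above. The function name and the whitespace cleanup are assumptions for the sake of the example.

```python
import re

# The three rules named in the post: two words dropped, one simplified.
REPLACEMENTS = {"extensible": "", "scalable": "", "ecosystem": "system"}

# Match any listed word as a whole word, regardless of case.
_JARGON = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in REPLACEMENTS) + r")\b",
    re.IGNORECASE,
)


def dejargonize(text):
    """Replace or delete listed jargon, then tidy the leftover spaces."""
    replaced = _JARGON.sub(lambda m: REPLACEMENTS[m.group(1).lower()], text)
    # Deleting a word leaves a double space behind; collapse it.
    return re.sub(r" {2,}", " ", replaced).strip()
```

Simply deleting a word can leave stray punctuation or a sentence that no longer parses, so a real tool would need to do more than this.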

All specialized professions protect their turf through the use of inscrutable jargon. Once people figure out what the hell Deloitte’s wonks are telling them, they’ll stop hiring them.

They hate translation, translation hates them

The things I miss by not being a literary translator.

Apparently the Complete Review (which I’ve never heard of) published a review of a book about translation and some Rilke in Translation, where, among other things, the author writes “We at the complete review hate translation.” This provoked a bit of outrage here, here, and here. The original author responded to this criticism, saying:

Translations may be well and good but they are not the originals. They are something different and what we’re interested in is the original. We want to read the author’s work, not the translator’s work. But being illiterate in languages X,Y, Z, etc. we are unable to read the originals and so have to rely on the translations — which in some ways resemble the originals but are still — arguably entirely and fundamentally — different. Reading a translation makes us feel we are blind and merely listening to someone describe the sights around us

This is silly. This is like blaming a banana for not being an orange. If the author doesn’t want to be reminded of his own shortcomings, he should do one of the following:

  • Learn the desired foreign language fluently
  • Abjure all contact with other languages, even through translation
  • Shut up


Memo to those guys out there who come up with names for high-tech widgets. You do nothing to endear yourselves to me or anyone else required to actually type the names of your products when you use intercapping, punctuation marks in the middle of the word, gratuitous exclamation points, and intellectual-property warning signs all over the place (as if anyone would want to embarrass themselves by infringing your use of names like FlabiNatör 2000!™). I am taking time out of my busy, important schedule, in the midst of a super-exciting press-release translation (which the media will hungrily gobble up and regurgitate to an equally eager public in its original form, miraculously unmolested by the editorial digestive processes) to tell all of you to cut it out already.

Language-recognition algorithm

This is fascinating. Italian researchers have found a way to identify the source language of a text just based on how that text has been treated by a compression algorithm. It gets better:

The scientists performed a further test of their technique by analyzing a single text that has been translated into many different languages — in this case the Universal Declaration of Human Rights. The researchers used their method to measure the linguistic “distance” between more than 50 translations of this document. From these distances, they constructed a family tree of languages that is virtually identical to the one constructed by linguists.
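The underlying trick can be sketched in a few lines of Python. This is only a toy version under simplifying assumptions, not the researchers' actual procedure: it uses zlib as the compressor, two tiny hand-typed reference texts, and picks the language whose reference text lets the unknown sample compress into the fewest extra bytes.

```python
import zlib

# Tiny illustrative reference texts; a serious attempt would use far longer ones.
REFERENCES = {
    "English": (
        "All human beings are born free and equal in dignity and rights. "
        "They are endowed with reason and conscience and should act towards "
        "one another in a spirit of brotherhood."
    ),
    "Italian": (
        "Tutti gli esseri umani nascono liberi ed eguali in dignità e diritti. "
        "Essi sono dotati di ragione e di coscienza e devono agire gli uni "
        "verso gli altri in spirito di fratellanza."
    ),
}


def compressed_size(text):
    """Size in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def extra_bytes(reference, sample):
    """Bytes the sample adds when it is compressed right after the reference."""
    return compressed_size(reference + " " + sample) - compressed_size(reference)


def guess_language(sample, references=REFERENCES):
    """Return the reference language after which the sample compresses best."""
    return min(references, key=lambda name: extra_bytes(references[name], sample))
```

The compressor has already seen the reference text, so a sample in the same language can be encoded largely as back-references to it and adds few bytes, while a sample in another language has to be spelled out. With references this short the guess is only trustworthy when the sample shares phrasing with one of them.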

If you have an interest in the Japanese language, this is a startling, fascinating website. It takes other websites (or whatever plain text you feed it) and displays them with a translation layer, so that when you point at a word in Japanese, the translation appears.

Unfortunately it’s a bit buggy, and it seems to work better with Navigator than IE, but I’m still very impressed.

Translatorese tidbit

This Audi press release contains a fun mistranslation: “Bearing in mind its performance figures and ample interior space, with room for five occupants, this is an unbeatably low value.”

Usually these press releases are pretty good, the typically German problem of logorrhea and marketing claptrap notwithstanding.