July 2003


Blizg is another one of these blog-affinity finders. It looks for “ICBM” location data (as popularized by GeoURL) and keywords in your page headers, and finds nearby and topically related blogs for you. Nice. It would be more powerful, though, if it could extract keywords from the stuff you actually post about, rather than the stuff you mention in your metadata.

Adventures in biomechanical translation

Machine translation (MT) is the bugbear of the professional translator. Machine-assisted translation (MAT) is a more devious, and perhaps more pernicious bugbear. Machine translation takes the translator out of the process entirely; machine-assisted translation makes use of the translator’s expertise to create patterns of source/target sentence pairs, and attempts to extrapolate these patterns through the source text. Translation agencies then use the “match rate” as a way to chisel the translator on payments.
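To make the “match rate” idea concrete, here is a minimal sketch of how such a number could be computed. It is not how any particular MAT product works: the stored sentence pair, the sample sentences, the 0.75 threshold, and the function names are all invented for illustration, and Python's `difflib` similarity ratio stands in for whatever scoring a real tool uses.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: source sentence -> target sentence.
# The language pair and the sentences are made up for this sketch.
memory = {
    "Devuelve el ancho de la ventana.": "Returns the width of the window.",
}

def best_match(sentence, memory):
    """Return (score, source, target) for the closest stored source sentence."""
    best = (0.0, None, None)
    for source, target in memory.items():
        # ratio() is 1.0 for identical strings and falls toward 0.0 as they diverge.
        score = SequenceMatcher(None, sentence, source).ratio()
        if score > best[0]:
            best = (score, source, target)
    return best

def match_rate(sentences, memory, threshold=0.75):
    """Fraction of sentences whose best match meets the (assumed) threshold."""
    if not sentences:
        return 0.0
    hits = sum(1 for s in sentences if best_match(s, memory)[0] >= threshold)
    return hits / len(sentences)

document = [
    "Devuelve el ancho de la ventana.",        # exact repeat of a stored sentence
    "Devuelve el alto de la ventana.",         # near repeat, one word differs
    "Este capítulo describe la instalación.",  # unrelated sentence
]
rate = match_rate(document, memory)
print(f"match rate: {rate:.0%}")
```

The higher that fraction, the more of the document the agency can argue was “already translated.”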

Most of the work that I do is not very amenable to MAT (not that I use it at all)–my guesstimate is that most of my jobs would have less than a 10% match rate overall. But the job I’m doing right now would be highly amenable to MAT: it’s a programming document where a given sentence may be repeated 50 times, with minor variations in predictable spots.

The job was sent to me as a series of MS Word files, which I manually concatenated into one. Word’s search/replace tools are relatively limited, but BBEdit has a powerful implementation of GREP. So, after much gnashing of teeth, I managed to export a usable HTML file from Word and cleaned it up. This in itself could be the subject of an even-more-tiresomely long post, which I will spare everyone from reading, and myself from reliving.

Once I got the file whipped into a shape I could stand looking at, I started working out GREP patterns. Some of these were highly productive–one pass would translate 40 or so sentences. Some would only do the one I was looking at. So I’ve been manually reproducing the MAT process, and getting pretty good at GREP syntax to boot. But as I work on it, there’s always a nagging feeling that if I understood that syntax better, I could produce more generalized patterns that would capture more sentences. The ultimate, of course, would be the hideously convoluted pattern that would be required to translate the entire document in one pass–which starts getting into Chomsky territory.
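For anyone who hasn't tried this, the pattern-per-sentence approach looks roughly like the sketch below. It uses Python's `re` module rather than BBEdit's GREP dialog, and the two rules, the language pair, and the sample sentences are invented for illustration; they are not the patterns from the actual job. The idea is the same, though: capture the part of the sentence that varies, and drop it into a fixed target template.

```python
import re

# Hypothetical rules: (source pattern, target template).
# Each capture group is a "predictable spot" that varies between sentences.
RULES = [
    (re.compile(r"^Devuelve el valor de (\w+)\.$"),
     r"Returns the value of \1."),
    (re.compile(r"^Establece el valor de (\w+) en (\w+)\.$"),
     r"Sets the value of \1 to \2."),
]

def translate_line(line):
    """Apply the first rule that matches; return (text, whether a rule matched)."""
    for pattern, template in RULES:
        new_line, count = pattern.subn(template, line)
        if count:
            return new_line, True
    return line, False

def translate_document(text):
    """Translate line by line and count how many lines a rule caught."""
    out, hits = [], 0
    for line in text.splitlines():
        new_line, matched = translate_line(line)
        out.append(new_line)
        hits += matched
    return "\n".join(out), hits

source = (
    "Devuelve el valor de width.\n"
    "Establece el valor de height en 10.\n"
    "Texto libre."
)
translated, hits = translate_document(source)
print(translated)
print(f"{hits} of {len(source.splitlines())} lines matched a rule")
```

Lines that no rule catches pass through unchanged, which is where the manual translating comes back in. Making a rule more general means capturing more of the sentence in groups, and that is exactly where the patterns start to get convoluted.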

Postscript: I finished that job. What started out as 28 Word files weighing in at a total of 1.2 MB wound up–when I finished concatenating, exporting to HTML, cleaning up, translating, and compressing with Gzip–as a 17.1 KB file. Amazing.


I’ve written before about why I blog, but you probably knew that was a load of hooey. The real reason I keep a blog is because I hope to achieve wealth and fame through it.

So far, my success on the fame part has been limited, and the wealth part hasn’t been working out at all. Until now: I’ve received an offer for a “complimentary review copy” of what I am promised is an “entrancing novel.”

I’ve got a few reviews on Epinions and All Consuming, but my guess is that I got this because of my blog–the book has a Japanese angle, and it would be too difficult to find reviewers on those sites with an interest in Japan (my profiles there mention nothing about Japan). But contacting would-be reviewers on the basis of their blogs wouldn’t be a first.

Will I take them up on it? I haven’t decided, but I’m not inclined to. I prefer to choose my own reading.