{"id":2038,"date":"2008-12-30T21:58:37","date_gmt":"2008-12-31T03:58:37","guid":{"rendered":"http:\/\/8stars.org\/a\/?p=2038"},"modified":"2008-12-30T21:58:37","modified_gmt":"2008-12-31T03:58:37","slug":"my-gripes-about-translation-memory","status":"publish","type":"post","link":"https:\/\/8stars.org\/a\/2008\/12\/30\/my-gripes-about-translation-memory\/","title":{"rendered":"My gripes about translation memory"},"content":{"rendered":"<p>I recently tweeted that I was experimenting with <a href=\"http:\/\/www.omegat.org\/\">OmegaT<\/a>, a translation-memory tool. When asked by one of its proponents how I liked it, I <a href=\"http:\/\/twitter.com\/adamrice\/statuses\/1084014527\">responded<\/a><\/p>\n<blockquote><p><a href=\"https:\/\/twitter.com\/brandelune\">@brandelune<\/a> do not like omegaT. really only works with plain text. ugly. burdened w\/ typical java on mac shortcomings. not customizable.<\/p><\/blockquote>\n<p>That barely begins to cover what I don&#8217;t like about OmegaT. I&#8217;ve been thinking about what I <em>would<\/em> like in a translation tool for a while now. My desires break down into two categories: the translation-memory engine, and the environment presented to the translator.<br \/>\n<!--more--><\/p>\n<h4>The TM engine<\/h4>\n<p>Translation memory is based on the concept of the <i>segment<\/i>. Typically, a segment is one sentence, and the TM tool can pre-segment the document at sentence breaks, but there&#8217;s no firm rule that all segments must be sentences.<\/p>\n<p>A sentence is a logical unit for segmentation, but it&#8217;s a Procrustean rule that doesn&#8217;t always apply comfortably. 
I suspect that many texts that would benefit from TM would get more benefit from a different level of segmentation.<\/p>\n<p>A friend of mine is working on a TM tool that allows for nested segments, and I believe that&#8217;s an important piece of the puzzle, but it&#8217;s only one piece.<\/p>\n<p>The problem is that with the current state of the art, these different segments (nested or not) would need to be created manually, which defeats the purpose. The tool is supposed to make you work more efficiently, not work more. I believe that smarter segmentation logic could automatically identify phrases or clauses. Another piece of the puzzle (and one I confess I haven&#8217;t quite figured out) is comfortably specifying which segment one is translating at a given moment when working with nested segments.<\/p>\n<p>I believe this could be accomplished, at least in part, by searching for frequency, which would have certain knock-on benefits. If a phrase or clause is repeated in a document (or across multiple documents), it should be marked as a segment. If not, it&#8217;s less important to treat it as one. Just knowing that a segment is going to be repeated later in a document could be useful to the translator, and a CAT tool that could <em>show<\/em> the other contexts in which that segment is used could be very helpful. I don&#8217;t believe any CAT tool gives a prospective view of segments like this. This kind of thing is tougher with, say, a Japanese source text, because there are no spaces as word delimiters, so there needs to be some lexical analysis (which could be tripped up by imperfect grammar or spelling). Still, it could be done.<\/p>\n<h4>The environment<\/h4>\n<p>Translation memory would be only one aspect of the translator&#8217;s work environment, even if it is central. 
Even just considering that, I don&#8217;t much care for the ones I&#8217;ve seen.<\/p>\n<p>There seem to be two approaches to presenting the TM interface: self-contained apps and floating windows. OmegaT is a self-contained app that reads in the source document and lets you iterate through it, one segment at a time, so that you wind up with a document interleaving source segments with target segments.<\/p>\n<p>Some other computer-assisted translation tools let you work inside Word (or whatever), viewing the source document in situ and providing a floating window for entering translated text that hovers over the main document like a remora, and overwrites the source as you go.<\/p>\n<p>I don&#8217;t particularly care for either view. I like to be able to look at the source document in its entirety, likewise my target document, as it gives me a sense of the flow between sentences. Interleaving source and target or simply replacing source with target makes this difficult. Ideally, I&#8217;d like a large pane showing the source document unmolested and as it was meant to be read. Perhaps this is my old-fashioned paper-oriented mentality showing through.<\/p>\n<p>I like the idea of a self-contained environment, but I recognize that a good one has several hurdles to overcome that the parasitic environment would not, most notably file-format support.<\/p>\n<p>One of OmegaT&#8217;s biggest shortcomings, in my opinion, is that it is effectively limited to plain text. It <em>can<\/em> open HTML, docx, and some other file formats, but any format with, well, formatting is presented with tags surrounding the formatted text. 
In the case of a (seemingly) lightly formatted document that I tried opening in OmegaT, a chapter heading reading &#8220;\u7b2c2\u7ae0\u3000\u9ad8\u5ea6\u6210\u9577\u4e0b\u306e\u4e8b\u696d\u62e1\u5927\uff081960\uff5e1966\uff09&#8221; with no apparent formatting at all was rendered as<\/p>\n<p><tt>&lt;w0&gt; &lt;w1&gt; &lt;w2\/&gt; &lt;w3\/&gt; &lt;w4\/&gt; &lt;w5\/&gt; &lt;\/w1&gt; &lt;w6&gt; \u7b2c&lt;\/w6&gt; &lt;\/w0&gt; &lt;w7&gt; &lt;w8&gt; &lt;w9\/&gt; &lt;w10\/&gt; &lt;w11\/&gt; &lt;w12\/&gt; &lt;\/w8&gt; &lt;w13&gt; 2&lt;\/w13&gt; &lt;\/w7&gt; &lt;w14&gt; &lt;w15&gt; &lt;w16\/&gt; &lt;w17\/&gt; &lt;w18\/&gt; &lt;w19\/&gt; &lt;\/w15&gt; &lt;w20&gt; \u7ae0 \u9ad8\u5ea6\u6210\u9577\u4e0b\u306e\u4e8b\u696d\u62e1\u5927\uff08&lt;\/w20&gt; &lt;\/w14&gt; &lt;w21&gt; &lt;w22&gt; &lt;w23\/&gt; &lt;w24\/&gt; &lt;w25\/&gt; &lt;w26\/&gt; &lt;\/w22&gt; &lt;w27&gt; 1960&lt;\/w27&gt; &lt;\/w21&gt; &lt;w28&gt; &lt;w29&gt; &lt;w30\/&gt; &lt;w31\/&gt; &lt;w32\/&gt; &lt;w33\/&gt; &lt;\/w29&gt; &lt;w34&gt; \uff5e&lt;\/w34&gt; &lt;\/w28&gt; &lt;w35&gt; &lt;w36&gt; &lt;w37\/&gt; &lt;w38\/&gt; &lt;w39\/&gt; &lt;w40\/&gt; &lt;\/w36&gt; &lt;w41&gt; 1966&lt;\/w41&gt; &lt;\/w35&gt; &lt;w42&gt; &lt;w43&gt; &lt;w44\/&gt; &lt;w45\/&gt; &lt;w46\/&gt; &lt;w47\/&gt; &lt;\/w43&gt; &lt;w48&gt; \uff09&lt;\/w48&gt; &lt;\/w42&gt; <\/tt><\/p>\n<p>I went no further with OmegaT.<\/p>\n<p>Other big hurdles are what would make a translation environment more than just a TM tool: the inclusion of other tools, and the decision of what other tools to include.<\/p>\n<p>One obvious tool is a dictionary. 
OmegaT does have a facility for job-related glossaries, which is nice as far as it goes, and I believe that other CAT tools can support job, client, and general glossaries, but none of these tie in with general-purpose dictionaries that use the EDICT or EPWING formats, or Chinese-character dictionaries (which are their own very special ball of yarn).<\/p>\n<p>Another obvious tool would be a web browser, or a specialized Wikipedia browser.<\/p>\n<p>I have done a series of jobs where, in addition to a text transcript, I also had video files. It might seem a bit much to integrate video playback into a translation tool, but it would have been fantastically useful for that work.<\/p>\n<p>Something that may be peculiar to my style of translation is the need for some scratch space. Working in between the opening and closing tags for a segment imposes a subtle psychological confinement. I need room to spread out. When I encounter a long, knotty sentence, I&#8217;ll work out the individual clauses on separate lines, and then compose the whole thing into a sentence that hangs together. A self-contained CAT tool would need to offer that.<\/p>\n<p>All in all, I think there&#8217;s a huge amount of untapped potential in CAT tools. Translation is a very narrow market, but the commercial tools out there sell for $350 and up. It&#8217;s also unfortunate that the only ones out there are either Windows-based or Java-based. There&#8217;s an <a href=\"http:\/\/appletrans.blogspot.com\/\">unsupported, primitive, and inscrutable TM tool from Apple<\/a>, and that&#8217;s it on the Mac side. There&#8217;s no reason to think that a TM tool could only succeed on the majority platform: it&#8217;s a specialized enough market that, given a breakthrough product, translators will buy the platform that runs the software, not the other way around. 
My layman&#8217;s understanding is that the development environment on the Mac would provide a programmer with enough tools to get a decent head-start over a Windows (or multi-platform) alternative.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>What I want from a translation tool<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[23,25],"tags":[104,122,436,442],"class_list":["post-2038","post","type-post","status-publish","format-standard","hentry","category-technology","category-translation","tag-cat","tag-computer-assisted-translation","tag-tm","tag-translation-memory"],"_links":{"self":[{"href":"https:\/\/8stars.org\/a\/wp-json\/wp\/v2\/posts\/2038","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/8stars.org\/a\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/8stars.org\/a\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/8stars.org\/a\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/8stars.org\/a\/wp-json\/wp\/v2\/comments?post=2038"}],"version-history":[{"count":0,"href":"https:\/\/8stars.org\/a\/wp-json\/wp\/v2\/posts\/2038\/revisions"}],"wp:attachment":[{"href":"https:\/\/8stars.org\/a\/wp-json\/wp\/v2\/media?parent=2038"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/8stars.org\/a\/wp-json\/wp\/v2\/categories?post=2038"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/8stars.org\/a\/wp-json\/wp\/v2\/tags?post=2038"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}