April 30, 2002

Language-recognition algorithm

This is fascinating. Italian researchers have found a way to identify the source language of a text based solely on how well that text compresses when run through a compression algorithm. It gets better:

The scientists performed a further test of their technique by analyzing a single text that has been translated into many different languages — in this case the Universal Declaration of Human Rights. The researchers used their method to measure the linguistic “distance” between more than 50 translations of this document. From these distances, they constructed a family tree of languages that is virtually identical to the one constructed by linguists.
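
For the curious, here is a rough sketch of the general idea in Python. This is my own illustration, not the researchers' code: it assumes zlib as the compressor, and the names `compressed_size`, `extra_bytes`, and `guess_language` are made up for the example. The trick is that a compressor which has already seen a long text in some language needs very few extra bytes to encode a short sample in that same language, and more bytes for a sample in a different one.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the data after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


def extra_bytes(reference: str, sample: str) -> int:
    """How many more compressed bytes the sample costs when appended to the reference.

    Assumption: a smaller number means the sample shares more
    vocabulary and spelling patterns with the reference text.
    """
    ref = reference.encode("utf-8")
    both = ref + sample.encode("utf-8")
    return compressed_size(both) - compressed_size(ref)


def guess_language(sample: str, references: dict) -> str:
    """Return the name of the reference text that the sample is cheapest to append to.

    `references` maps a language name to a reference text in that language.
    """
    return min(references, key=lambda name: extra_bytes(references[name], sample))
```

To try it, build a dictionary such as `{"english": ..., "italian": ...}` with a long passage in each language and call `guess_language("some short sample", references)`. The longer the reference passages, the more reliable the guess should be; with only a sentence or two per language, results can be erratic. The same `extra_bytes` numbers, computed between every pair of texts, could serve as the "distances" for building a tree like the one described above.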

Blogging & hate speech

The kind of conundrum only a blogger could face: Blogdex recently removed a hate-speech website from its index. A discussion ensued, in the course of which the person who originally complained about the website accidentally outed himself. Here’s my problem: I really want to dislike this guy for being such a cringing pansy that he promotes censorship to protect his delicate sensibilities, but he’s got a good blog.

Ultimate bad date…?

Now this is what I call a bad date. Not to be cruel, but the clue-phone was ringing for a long time before this woman got around to picking it up. I’m signed up on nerve.com too, and I’ve gone on some bad dates (including a comically bad one this past Wednesday), but I haven’t had any experiences remotely like this one.