This is fascinating. Italian researchers have found a way to identify the language a text is written in, based solely on how that text behaves when it is run through a compression algorithm. It gets better:
The scientists performed a further test of their technique by analyzing a single text that has been translated into many different languages — in this case the Universal Declaration of Human Rights. The researchers used their method to measure the linguistic “distance” between more than 50 translations of this document. From these distances, they constructed a family tree of languages that is virtually identical to the one constructed by linguists.
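To get a feel for how a compressor can measure "distance" between texts, here is a rough sketch in Python. It uses the normalized compression distance (NCD) with zlib, which is a stand-in technique chosen for illustration; the excerpt does not spell out the researchers' exact procedure, so this should not be read as their method. The function names (`compressed_size`, `ncd`, `guess_language`, `distance_matrix`) are hypothetical. The intuition is that a compressor that has just seen text in one language has less new material to encode when more text in the same language follows.

```python
import zlib
from typing import Dict, Mapping, Tuple


def compressed_size(data: bytes) -> int:
    """Number of bytes zlib needs for `data` at maximum compression."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two texts.

    Values near 0 suggest the texts share a lot of structure;
    values near 1 suggest they share very little.
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)  # compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


def guess_language(unknown: str, references: Mapping[str, str]) -> str:
    """Return the label of the reference text closest to `unknown`."""
    return min(references, key=lambda lang: ncd(unknown, references[lang]))


def distance_matrix(texts: Mapping[str, str]) -> Dict[Tuple[str, str], float]:
    """Pairwise distances, the raw input a tree-building step would need."""
    return {(a, b): ncd(texts[a], texts[b]) for a in texts for b in texts}
```

To use it, you would pass `guess_language` a snippet plus a dictionary of labeled reference texts, one per language. Very short texts give noisy results, and zlib only looks back 32 KB, so longer references do not help beyond that window. Building the actual family tree would take a clustering step on top of `distance_matrix`, which is left out here.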