The web giant's translation service might serve up the odd batch of nonsense, but it's still one of the smartest communication tools of all time, as David Bellos explains
Find the right words: human translators behave in a similar way to Google Translate
Using software originally developed in the 1980s by researchers at IBM, Google has created an automatic translation tool that is unlike all others. It is not based on the intellectual presuppositions of early machine translation efforts – it isn't an algorithm designed only to extract the meaning of an expression from its syntax and vocabulary.
In fact, at bottom, it doesn't deal with meaning at all. Instead of taking a linguistic expression as something that requires decoding, Google Translate (GT) takes it as something that has probably been said before.
It uses vast computing power to scour the internet in the blink of an eye, looking for the expression in some text that exists alongside its paired translation.
The corpus it can scan includes all the paper put out since 1957 by the EU in two dozen languages, everything the UN and its agencies have ever done in writing in six official languages, and huge amounts of other material, from the records of international tribunals to company reports and all the articles and books in bilingual form that have been put up on the web by individuals, libraries, booksellers, authors and academic departments.
Drawing on the already established patterns of matches between these millions of paired documents, Google Translate uses statistical methods to pick out the most probable acceptable version of what's been submitted to it.
Much of the time, it works. It's quite stunning. And it is largely responsible for the new mood of optimism about the prospects for "fully automated high-quality machine translation".
Google Translate could not work without a very large pre-existing corpus of translations. It is built upon the millions of hours of labour of human translators who produced the texts that GT scours.
Read more at The Independent