Steve Arnold pointed out to me that "Languages that are synthetic do not fare well in automatic systems unless the source documents are highly technical."

Which makes sense, once you understand what a synthetic language is. A synthetic language combines many small meaningful pieces (morphemes) into really long words. For example, in Mohawk: Washakotya'tawitsherahetkvhta'se = "He ruined her dress" (strictly, "He made the thing that one puts on one's body ugly for her"). One word expresses what other languages need multiple words, or a whole sentence, for. You can see how that can trip up automated systems, which typically treat each word as a single unit.
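To make that concrete, here is a minimal sketch of what a naive system sees. It assumes the system splits text on whitespace, which is only an illustration and not how any particular product works; the function name `whitespace_tokens` is made up for this example.

```python
def whitespace_tokens(text: str) -> list[str]:
    """Split text on whitespace, the way a naive indexer might (an assumption)."""
    return text.split()


# The Mohawk word and its English translation, both taken from the example above.
mohawk = "Washakotya'tawitsherahetkvhta'se"
english = "He ruined her dress"

mohawk_tokens = whitespace_tokens(mohawk)
english_tokens = whitespace_tokens(english)

print(len(mohawk_tokens))         # 1
print(len(english_tokens))        # 4
print("dress" in english_tokens)  # True
```

The English sentence gives the system four separate terms to index and match, including "dress". The Mohawk sentence gives it one opaque token, so the "dress" inside it is invisible unless the system knows how to take the word apart.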

Languages are not simply synthetic or isolating (isolating being the opposite end, where words are mostly single pieces). They fall along a spectrum: some languages are just more synthetic than others. Common languages toward the synthetic end include German, Russian, Turkish, Finnish, Japanese, Korean, and many more.

# Jan 17, 2005