Identification and Disambiguation of Cognates, False Friends, and Partial Cognates Using Machine Learning Techniques
Cognates are words in different languages that have similar spelling and meaning. They can help a second-language learner with the tasks of vocabulary expansion and reading comprehension. The learner needs to pay attention to pairs of words that appear similar but are in fact false friends, that is, words that have different meanings. Partial cognates are pairs of words in two languages that have the same meaning in some but not all contexts. Detecting the actual meaning of a partial cognate in context can be useful for Machine Translation tools and for Computer-Assisted Language Learning tools. In this article we present a method to automatically classify a pair of words as cognates or false friends. We use several measures of orthographic similarity as features for classification. We study the impact of selecting different features, averaging them, and combining them through machine learning techniques. We also present a supervised and a semi-supervised method to disambiguate partial cognates between two languages. The methods applied to the partial cognate disambiguation task use only automatically labeled data; therefore, they can be applied to other pairs of languages as well. We also show that our methods perform well when using corpora from different domains. We applied all our methods to French and English.
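To make the idea of orthographic similarity features concrete, the following is a minimal sketch in Python. The abstract does not list the measures used, so the three shown here (longest common subsequence ratio, normalized edit-distance similarity, and a Dice coefficient over character bigrams) are assumptions chosen for illustration, as are all function names. The sketch only computes a feature vector and its plain average; it does not reproduce the classifiers or thresholds of the article.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr(a: str, b: str) -> float:
    """Longest common subsequence ratio: LCS length over the longer word."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Edit distance normalized by the longer word, turned into a similarity."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def bigram_dice(a: str, b: str) -> float:
    """Dice coefficient over the sets of character bigrams."""
    grams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    grams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def features(a: str, b: str) -> list[float]:
    """Feature vector for one word pair (illustrative choice of measures)."""
    return [lcsr(a, b), edit_similarity(a, b), bigram_dice(a, b)]


def average_score(a: str, b: str) -> float:
    """Unweighted average of the features, as a single similarity score."""
    values = features(a, b)
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical French-English word pairs, used only as input examples.
    for french, english in [("nature", "nature"), ("nuit", "night")]:
        print(french, english, features(french, english))
```

A feature vector of this kind could be passed to any standard classifier, or its average could be compared against a tuned threshold. Note that spelling similarity alone cannot tell a cognate from a false friend, since both look alike by definition.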
This work is licensed under a Creative Commons Attribution 3.0 License.
Copyright © Macrothink Institute ISSN 1948-5425