Case Study: Automatic Identification of Romanian Suffixes
Journal Title: Romanian Journal of Human-Computer Interaction, Year 2011, Vol 4, Issue 2
Abstract
With a view to automatically identifying derived words and their bases in the Romanian wordnet, and to enriching it with derivational relations and the semantic labels associated with them, this article presents the results of a case study that aimed to automatically identify the suffixes by means of which new words are created in Romanian. The paper begins with a brief overview of derivation in Romanian, which also anticipates the challenges of the study. To identify suffixes automatically, we used generalized suffix trees to represent the lemmas of an electronic Romanian lexicon. We then applied a series of filters, identified through observation, which improved the results. To evaluate the identified suffixes, we used gold standard lists of suffixes grouped according to the part of speech of the derived words. The evaluation was performed in three stages, comparing the results with the gold standard list of suffixes, with the gold standard list of suffixes and suffixoids, and with the unified gold standard list of suffixes and suffixoids for the parts of speech displaying homonymy in Romanian. We show how the precision and accuracy of the algorithm change with the threshold imposed on the productivity of suffixes. The study reveals the knowledge necessary for recognizing the morphological structure of words and the difficulties encountered in this process, as well as the importance of such work for linguistic research and for Artificial Intelligence, in tasks such as information retrieval, summarization, question answering and other tasks relying on natural language processing.
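The interplay between a productivity threshold and precision described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: instead of a generalized suffix tree it uses plain dictionary counting, it treats an ending as a candidate suffix only when the remaining base is itself a lemma, and it ignores stem alternations. The names `candidate_suffixes` and `precision`, the toy lemma list and the toy gold list are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy lexicon; the real study used an electronic Romanian lexicon.
TOY_LEMMAS = [
    "copil", "copilaș", "fluier", "fluieraș", "înger", "îngeraș",
    "pom", "pomișor", "urs", "ursuleț",
    "car", "carte",  # noise pair: "te" is not a derivational suffix here
]

# Hypothetical toy gold standard, standing in for the paper's gold lists.
TOY_GOLD = {"aș", "ișor", "uleț"}


def candidate_suffixes(lemmas, threshold):
    """Return endings attached to at least `threshold` bases found in the lexicon.

    Simplification: productivity is the number of (base, derived) lemma pairs
    in which the derived lemma equals the base plus the ending.
    """
    lexicon = set(lemmas)
    counts = Counter()
    for lemma in lexicon:
        for cut in range(1, len(lemma)):
            base, ending = lemma[:cut], lemma[cut:]
            if base in lexicon:
                counts[ending] += 1
    return {ending for ending, n in counts.items() if n >= threshold}


def precision(found, gold):
    """Share of proposed suffixes that appear in the gold list."""
    return len(found & gold) / len(found) if found else 0.0


if __name__ == "__main__":
    for threshold in (1, 2):
        found = candidate_suffixes(TOY_LEMMAS, threshold)
        print(threshold, sorted(found), precision(found, TOY_GOLD))
```

On this toy data, raising the threshold discards the spurious ending from the noise pair but also drops genuine suffixes that occur only once, which is the kind of trade-off the evaluation in the paper examines on real data.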
Authors and Affiliations
Verginica Barbu Mititelu