Case Study: Automatic Identification of Romanian Suffixes

Journal Title: Revista Romana de Interactiune Om-Calculator - Year 2011, Vol 4, Issue 2

Abstract

With a view to automatically identifying derived words and their bases in the Romanian wordnet, and thereby enriching it with derivational relations and their associated semantic labels, this article presents the results of a case study that aimed to automatically identify the suffixes by which new words are formed in Romanian. The paper opens with a brief overview of derivation in Romanian, which also anticipates the challenges of the study. To identify suffixes automatically, we used generalized suffix trees to represent the lemmas of an electronic Romanian lexicon. We applied a series of filters, identified through observation, which improved the results. To evaluate the identified suffixes, we used gold standard lists of suffixes grouped by the part of speech of the derived words. The evaluation was performed in three stages, comparing the results with the gold standard list of suffixes, with the gold standard list of suffixes and suffixoids, and with the unified gold standard list of suffixes and suffixoids for the parts of speech that display homonymy in Romanian. We show how the precision and accuracy of the algorithm change with the threshold imposed on suffix productivity. The study identifies the knowledge needed to recognize the morphological structure of words and the difficulties encountered in this process, and shows the importance of such work for linguistic research and for Artificial Intelligence, in tasks such as information retrieval, summarization, question answering, and other tasks that rely on natural language processing.
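The interplay between a productivity threshold and precision against a gold standard list can be illustrated with a small sketch. This is not the article's method: instead of a generalized suffix tree it uses a plain counter of word endings, and the function names, length limits, toy lemma list, and toy gold list are all assumptions made for illustration only.

```python
from collections import Counter


def count_endings(lemmas, min_len=2, max_len=4, min_stem=2):
    """Count, for each candidate ending, how many distinct lemmas end in it.

    Simplified stand-in for a generalized suffix tree (assumption): every
    word-final string of length min_len..max_len is a candidate, as long as
    at least min_stem characters remain in front of it.
    """
    counts = Counter()
    for lemma in set(lemmas):
        for n in range(min_len, max_len + 1):
            if len(lemma) - n >= min_stem:
                counts[lemma[-n:]] += 1
    return counts


def productive(counts, threshold):
    """Keep the candidates attested in at least `threshold` lemmas."""
    return {ending for ending, c in counts.items() if c >= threshold}


def precision(found, gold):
    """Share of the retained candidates that appear in the gold list."""
    return len(found & gold) / len(found) if found else 0.0


# Toy data, hypothetical and far smaller than any real lexicon.
lemmas = ["muncitor", "cititor", "scriitor", "pescar", "morar"]
gold = {"tor", "ar"}

counts = count_endings(lemmas)
for threshold in (1, 2, 3):
    found = productive(counts, threshold)
    print(threshold, sorted(found), round(precision(found, gold), 2))
```

On this toy data, raising the threshold first discards one-off endings and then also discards a genuine but rarer suffix, so precision does not simply rise with the threshold. The sketch also retains overlapping endings such as a suffix together with its longer and shorter variants, which is the kind of noise that additional filters would need to address.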

Authors and Affiliations

Verginica Barbu Mititelu

How To Cite

Verginica Barbu Mititelu (2011). Case Study: Automatic Identification of Romanian Suffixes. Revista Romana de Interactiune Om-Calculator, 4(2), 109-130. https://europub.co.uk./articles/-A-150501