Recognizing named entities, quotes and events in news and social media items in Romanian
Journal Title: Romanian Journal of Human-Computer Interaction, Year 2013, Vol 6, Issue 2
Abstract
At the border between natural language processing and information retrieval, named entity recognition is one of the most important research problems of the two domains, and it has not yet been solved perfectly even for English texts. Furthermore, named entity recognition has opened the path to solving other problems that build on these linguistic constructs, such as the identification of quotes and declarations made by persons, companies or other types of organizations, and the extraction of events from texts. The problem of named entity identification and classification arose from the need to report occurrences of names of persons, organizations and other types of named entities relevant to various domains within written documents. In this article we present a solution to these three problems for texts written in Romanian from various sources, such as news items, blog articles or comments from social networks. The paper starts with a short overview of the theoretical underpinnings used for solving these problems, then presents the methods actually used in the solution designed for Romanian, which combines machine learning algorithms with heuristics based on text patterns and regular expressions. Finally, we report the accuracy of the various methods used for each task, together with a comparison of the results obtained by each method.
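To make the idea of "heuristics based on text patterns and regular expressions" concrete, the following is a minimal sketch of a pattern-based quote extractor. It is not the authors' implementation: the reporting-verb list, the capitalized-name heuristic, the function name `extract_quotes` and the sample sentence are all assumptions introduced here for illustration.

```python
import re

# Hypothetical, deliberately tiny list of Romanian reporting verbs.
REPORTING_VERBS = r"(?:a declarat|a spus|a afirmat)"

# Naive stand-in for a named entity recognizer: one or more capitalized words.
NAME = r"[A-ZĂÂÎȘȚ][\w-]+(?: [A-ZĂÂÎȘȚ][\w-]+)*"

# A quotation in Romanian-style („ ... ”) or straight (" ... ") quotation marks,
# followed by a reporting verb and then the speaker's name.
QUOTE_PATTERN = re.compile(
    r'[„"](?P<quote>[^„”"]+)[”"],?\s+'
    + REPORTING_VERBS
    + r"\s+(?P<speaker>" + NAME + r")"
)


def extract_quotes(text):
    """Return a list of (speaker, quote) pairs matched by QUOTE_PATTERN."""
    return [
        (m.group("speaker"), m.group("quote").strip())
        for m in QUOTE_PATTERN.finditer(text)
    ]


# Illustrative sentence, invented for this sketch.
sample = "„Vom continua investițiile”, a declarat Ion Popescu. Piața a crescut."
print(extract_quotes(sample))
```

A pattern like this only covers the "quote, verb, speaker" word order. In a combined approach such as the one the abstract describes, the speaker would presumably come from the named entity recognizer rather than from a capitalization heuristic.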
Authors and Affiliations
Adrian-Nicolae Zamfirescu, Traian Eugen Rebedea
The Analysis of Imaginary in Texts
The paper presents an approach and an implemented system for analyzing the imaginary in texts. This problem is very important because its resolution allows the identification of connotations in texts, with major implicat...
Distributed Multimedia System for Human Computer Interaction
The aim of the paper is to provide some software components developed for acquisition, controlling and management of multimedia streams, of multimedia devices and for human computer interaction. Implemented software comp...
MDL – Computer Assisted Lesson Development Module
In this article we present a new approach to teaching computer science - the evaluation and visual modeling of algorithms based on metaphorical forms - applied within the core of a virtual education system, the developme...
POS tagger based on second-order HMM
Part-of-speech tagging (POS tagging) is the process of grammatical labelling of each word in a sentence, phrase or paragraph with the corresponding part of speech. This process is a component of other modules of natural...
Medical Assistance Through the Internet for Persons with Mobility Impairments and for Persons Residing in Rural and Medically Under-Served Areas
This paper presents an IT system (MeDist), which offers an operative and user-friendly patient-medical system interface. The system will be a useful instrument for healthcare services for persons in rural areas and for p...