DBpedia based Ontological Concepts Driven Information Extraction from Unstructured Text

Abstract

This paper presents a named entity recognition (NER) approach driven by knowledge base concepts. The technique extracts information from news articles and links it to background concepts in a knowledge base, focusing specifically on extracting entity mentions from unstructured articles. Extraction is based on existing concepts from the DBpedia ontology, which represents the knowledge associated with concepts in the Wikipedia knowledge base. A collection of Wikipedia concepts has been extracted and developed through the structured DBpedia ontology. To process unstructured text, Dawn news articles were scraped and preprocessed to build a corpus. Given an article, the proposed knowledge base driven system identifies the entity mentions in the text and automatically links them to the corresponding concepts, which represent their respective pages on Wikipedia. The system is evaluated on three test collections of news articles from the politics, sports and entertainment domains. Experimental results for entity mentions are reported as precision, recall and F-measure; precision on the extracted entity mentions yields the best results, while recall and F-measure vary slightly. Additionally, facts associated with the extracted entity mentions are presented both as sentences and as Resource Description Framework (RDF) triples, to enhance the user's understanding of the related facts presented in the article.
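The pipeline outlined in the abstract (match entity mentions against known concepts, link them to DBpedia resources, then score the output) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the gazetteer entries, the sample sentence, the gold annotations, and the function names `extract_mentions` and `precision_recall_f1` are all hypothetical, and simple whole-word dictionary matching stands in for whatever matching strategy the paper actually uses.

```python
import re

# Hypothetical gazetteer mapping lowercase surface forms to DBpedia resource URIs.
# A real system would build this from the DBpedia ontology and labels.
GAZETTEER = {
    "imran khan": "http://dbpedia.org/resource/Imran_Khan",
    "lahore": "http://dbpedia.org/resource/Lahore",
    "pakistan": "http://dbpedia.org/resource/Pakistan",
}


def extract_mentions(text, gazetteer):
    """Return (surface form, URI) pairs for gazetteer labels found in the text."""
    lowered = text.lower()
    found = []
    for label, uri in gazetteer.items():
        # Whole-word match so that a label is not matched inside a longer word.
        if re.search(r"\b" + re.escape(label) + r"\b", lowered):
            found.append((label, uri))
    return found


def precision_recall_f1(predicted, gold):
    """Compute precision, recall and F-measure over two sets of URIs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Illustrative sentence and gold annotations (invented for this sketch).
article = "Imran Khan addressed a rally in Lahore."
gold_uris = {
    "http://dbpedia.org/resource/Imran_Khan",
    "http://dbpedia.org/resource/Lahore",
    "http://dbpedia.org/resource/Pakistan",
}

mentions = extract_mentions(article, GAZETTEER)
predicted_uris = {uri for _, uri in mentions}
precision, recall, f1 = precision_recall_f1(predicted_uris, gold_uris)
```

In this toy case every predicted link is correct but one gold entity is missed, so precision is higher than recall, which mirrors the kind of gap between the two measures that the abstract describes.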

Authors and Affiliations

Adeel Ahmed, Syed Saif ur Rahman

  • EP ID EP261397
  • DOI 10.14569/IJACSA.2017.080954

How To Cite

Adeel Ahmed, Syed Saif ur Rahman (2017). DBpedia based Ontological Concepts Driven Information Extraction from Unstructured Text. International Journal of Advanced Computer Science & Applications, 8(9), 411-418. https://europub.co.uk./articles/-A-261397