Comparing PMI-based to Cluster-based Arabic Single Document Summarization Approaches

Journal Title: INTERNATIONAL JOURNAL OF ENGINEERING TRENDS AND TECHNOLOGY - Year 2014, Vol 11, Issue 8

Abstract

In this paper, two extractive techniques are applied to the Arabic Single Document Summarization (SDS) problem: the first uses K-Means clustering, and the second uses mutual information (MI), which is widely used in text mining to measure the co-occurrence of two words. A successful Arabic document summarization algorithm should identify the noteworthy sentences in a document as accurately as possible. The terms used in the document (its distinct words) represent the document's identity, and a Term-Sentence Matrix (TSM) is used instead of a Bag of Words (BoW). In the first approach, the text themes are extracted with K-Means, and one sentence per cluster is then chosen for the summary using TF-IDF weights. In the second approach, pointwise mutual information (PMI) is used to assign a weight to each cell of the TSM, and the resulting weighted matrix is used to extract the summary. Experiments show that the cluster-based approach performs slightly better than the PMI-based one, but if the end user can tune the summary percentage to an appropriate level, the PMI-based approach performs slightly better.
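The abstract does not spell out the scoring details, so the following is only a minimal Python sketch of the two pipelines under loudly stated assumptions: whitespace tokenisation with no Arabic stemming or stop-word removal, a sentence score equal to the sum of its positive PMI cell weights, a deterministic farthest-first K-Means initialisation, and, within each cluster, the sentence with the largest total TF-IDF weight as the representative. All function names (`term_sentence_matrix`, `pmi_summary`, `tfidf_vectors`, `kmeans`, `cluster_summary`) are hypothetical and are not the author's code.

```python
import math


def term_sentence_matrix(sentences):
    """Term-Sentence Matrix (TSM): tsm[term][j] = occurrences of term in sentence j."""
    # Assumption: plain whitespace tokenisation; no Arabic stemming or stop-word removal.
    tokenized = [s.split() for s in sentences]
    terms = sorted({t for toks in tokenized for t in toks})
    return {t: [toks.count(t) for toks in tokenized] for t in terms}


def pmi_summary(sentences, ratio=0.3):
    """PMI-based sketch: weight every TSM cell by positive PMI, keep top-scoring sentences."""
    tsm = term_sentence_matrix(sentences)
    n = len(sentences)
    total = sum(sum(row) for row in tsm.values())
    term_tot = {t: sum(row) for t, row in tsm.items()}
    sent_tot = [sum(row[j] for row in tsm.values()) for j in range(n)]
    scores = [0.0] * n
    for t, row in tsm.items():
        for j, c in enumerate(row):
            if c:
                # PMI(t, s_j) = log(P(t, s_j) / (P(t) * P(s_j))), clipped at 0.
                pmi = math.log(c * total / (term_tot[t] * sent_tot[j]))
                scores[j] += max(pmi, 0.0)
    # `ratio` stands in for the user-tunable summary percentage.
    k = max(1, round(ratio * n))
    top = sorted(range(n), key=lambda j: scores[j], reverse=True)[:k]
    return [sentences[j] for j in sorted(top)]  # keep original sentence order


def tfidf_vectors(sentences):
    """One TF-IDF vector per sentence; IDF treats each sentence as a 'document'."""
    tsm = term_sentence_matrix(sentences)
    n = len(sentences)
    idf = {t: math.log(n / sum(1 for c in row if c)) for t, row in tsm.items()}
    return [[row[j] * idf[t] for t, row in tsm.items()] for j in range(n)]


def kmeans(vectors, k, iters=20):
    """Plain Euclidean K-Means returning one cluster label per vector."""
    def dist2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    # Assumption: deterministic farthest-first initialisation instead of random seeding.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(dist2(v, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(v, centroids[c])) for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_summary(sentences, k=3):
    """Cluster-based sketch: K-Means over TF-IDF sentence vectors, one sentence per cluster."""
    if not sentences:
        return []
    k = min(k, len(sentences))
    vectors = tfidf_vectors(sentences)
    labels = kmeans(vectors, k)
    chosen = []
    for c in range(k):
        members = [j for j, lab in enumerate(labels) if lab == c]
        if members:
            # Assumption: representative = member with the largest total TF-IDF weight.
            chosen.append(max(members, key=lambda j: sum(vectors[j])))
    return [sentences[j] for j in sorted(chosen)]
```

For example, `cluster_summary(sentences, k=3)` returns at most three sentences in their original order, one per theme, while `pmi_summary(sentences, ratio=0.3)` exposes the summary percentage that, according to the abstract, the end user would tune.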

Authors and Affiliations

Madeeh Nayer El-Gedawy

Related Articles

A Study on Power Saving and Secure WSN

Wireless sensor networks (WSNs) have a promising future in many everyday applications in society. WSN applications are countless. Their main benefit is that they can be deployed in most everyday tasks. Security and power a...

Low Power Design and Simulation of 7T SRAM Cell using various Circuit Techniques

Low-power memory with high stability is a top priority today. Power is the most important factor in today's technology, so reducing the power of a single cell plays a vital role in memory design techniques. In...

Effect Of Location Of Lateral Force Resisting System On Seismic Behaviour Of RC Building

In this study, the influence of the location of lateral force resisting systems on the response reduction factor (R), ductility, and plastic hinge status at the performance point of RC buildings was studied. The present paper...

STONE WASTE AS A GROUNDBREAKING CONCEPTION FOR THE LOW COST CONCRETE

Stone waste is generated during the cutting and polishing process. The stone industry produces large amounts of stone waste, which causes environmental problems. To produce low cost concrete by replacing...

Impact of Aquaculture on Physico-Chemical Characteristics of Water and Soils in the Coastal Tracts of East and West Godavari Districts, Andhra Pradesh, India

The physico-chemical characteristics of water and soil in aquaculture areas were investigated. Water samples were analysed with respect to pH, Phosphates, Sulphates, Total Alkalinity, Total Hardness, Total Dissolved Soli...

  • EP ID EP105260

How To Cite

Madeeh Nayer El-Gedawy (2014). Comparing PMI-based to Cluster-based Arabic Single Document Summarization Approaches. INTERNATIONAL JOURNAL OF ENGINEERING TRENDS AND TECHNOLOGY, 11(8), 379-383. https://europub.co.uk./articles/-A-105260