A Review on Identifying the Main Content From Web Pages

Journal Title: UNKNOWN - Year 2015, Vol 4, Issue 4

Abstract

A web page is a web document that can hold a huge amount of information, and with the rapid growth of the World Wide Web, users can easily access web pages from any place through the internet. A web page contains noisy information, such as menus, footers, unnecessary links, and logos, alongside its main content. Most users are interested only in the main content. Because a web page is a mixture of noisy and main content, the noise has a considerable performance impact on applications such as web summarization, question answering systems, and information retrieval. We therefore propose an extraction process for identifying the main content of web pages. The extraction process consists of automatic extraction techniques and hand-crafted rules. In the automatic extraction techniques, the first step segments the web page into blocks, and the second step differentiates the main content from irrelevant or noisy content. The hand-crafted rule process extracts the main content from web pages by applying rules that have already been generated.
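
The two automatic steps described above (segment the page into blocks, then separate main content from noisy blocks) can be illustrated with a minimal Python sketch. It is not the method of the reviewed techniques: the block-level tag set, the word-count threshold, and the link-density threshold are all hypothetical choices made for illustration, and a simple heuristic stands in for whatever classifier a real system would use.

```python
from html.parser import HTMLParser

# Hypothetical choices for this sketch: tags that start/end a block, tags to skip.
BLOCK_TAGS = {"div", "p", "td", "li", "section", "article",
              "nav", "header", "footer", "h1", "h2", "h3"}
SKIP_TAGS = {"script", "style"}


class BlockSegmenter(HTMLParser):
    """Step 1: split a page into text blocks at block-level tag boundaries."""

    def __init__(self):
        super().__init__()
        self.blocks = []      # finished blocks: {"text": str, "link_chars": int}
        self._pieces = []     # text pieces of the block being built
        self._link_chars = 0  # characters of the current block that sit inside <a>
        self._in_link = 0
        self._skip = 0

    def flush_block(self):
        """Close the block being built and store it if it has any text."""
        text = " ".join(self._pieces)
        if text:
            self.blocks.append({"text": text, "link_chars": self._link_chars})
        self._pieces, self._link_chars = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag == "a":
            self._in_link += 1
        elif tag in BLOCK_TAGS:
            self.flush_block()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)
        elif tag in BLOCK_TAGS:
            self.flush_block()

    def handle_data(self, data):
        if self._skip:
            return
        piece = " ".join(data.split())  # normalize whitespace
        if not piece:
            return
        self._pieces.append(piece)
        if self._in_link:
            self._link_chars += len(piece)


def is_main_content(block, min_words=10, max_link_density=0.3):
    """Step 2: keep blocks with enough words and few link characters.

    Both thresholds are assumed values, not taken from the reviewed paper.
    """
    words = len(block["text"].split())
    link_density = block["link_chars"] / len(block["text"])
    return words >= min_words and link_density <= max_link_density


def extract_main_content(html):
    """Run both steps and return the text of blocks judged to be main content."""
    segmenter = BlockSegmenter()
    segmenter.feed(html)
    segmenter.close()
    segmenter.flush_block()  # store any trailing text after the last block tag
    return [b["text"] for b in segmenter.blocks if is_main_content(b)]
```

In this sketch a menu made up almost entirely of links is rejected by the link-density test, and a short footer is rejected by the word-count test, while a full paragraph of prose passes both. A hand-crafted rule approach would instead rely on page-specific rules prepared in advance, which this sketch does not attempt to model.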

How To Cite

(2015). A Review on Identifying the Main Content From Web Pages. UNKNOWN, 4(4), -. https://europub.co.uk./articles/-A-367551