Extraction of Information from Web Page Using Content Mining Approach

Abstract

Today, human life has become dependent on the Internet: almost anything can be searched for online. The World Wide Web has grown tremendously in recent years. With the large amount of information on the Internet, web pages have become a potential source of data for information retrieval and data mining technologies such as commercial search engines and web mining applications. However, a web page, the main source of data, consists of many parts that are not equally important. Besides the main content, a web page also contains noisy parts that can degrade the performance of information retrieval applications. Cleaning web pages before mining is therefore critical for improving mining results. In our work, we focus on identifying and removing local noise in web pages to improve mining performance. The information contained in non-content blocks can distract the user and also harm web mining, so it is important to separate the informative primary content blocks from the non-informative ones. We therefore propose a system that removes various noise patterns from any web page. Web informative content extraction requires two steps: web page segmentation and informative content extraction. We analyze the web page and, using these methods and algorithms, extract the topic information requested by the user.
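The abstract does not say which segmentation or extraction algorithms the system uses. As a minimal sketch of the two steps, assuming that block-level HTML tags mark segment boundaries and that a link-density heuristic identifies noisy blocks (a common technique, not necessarily the authors' method), the Python code below uses only the standard-library `html.parser` module. The tag sets, the `min_chars` and `max_link_density` thresholds, and all class and function names are hypothetical.

```python
from dataclasses import dataclass
from html.parser import HTMLParser

# Hypothetical choice of tags that delimit a "block" (segment) of the page.
BLOCK_TAGS = {"div", "p", "td", "li", "section", "article", "header",
              "footer", "nav", "aside", "table", "ul", "ol", "h1", "h2", "h3"}
# Tags whose text is never shown to the user and is therefore ignored.
SKIP_TAGS = {"script", "style"}


@dataclass
class Block:
    text: str        # whitespace-normalised text of the block
    link_chars: int  # how many of those characters sit inside <a> elements

    @property
    def link_density(self) -> float:
        # Fraction of the block's text that is anchor text.
        return self.link_chars / len(self.text)


class Segmenter(HTMLParser):
    """Step 1 (sketch): split a page into flat text blocks at block-level tags."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[Block] = []
        self._parts: list[str] = []
        self._link_chars = 0
        self._in_link = 0
        self._skip = 0

    def _flush(self) -> None:
        # Close the current block; whitespace-only blocks are discarded.
        text = " ".join(" ".join(self._parts).split())
        if text:
            self.blocks.append(Block(text, self._link_chars))
        self._parts, self._link_chars = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()
        elif tag == "a":
            self._in_link += 1
        elif tag in SKIP_TAGS:
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()
        elif tag == "a" and self._in_link:
            self._in_link -= 1
        elif tag in SKIP_TAGS and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._skip:
            return
        self._parts.append(data)
        if self._in_link:
            # Count normalised characters so link_chars never exceeds len(text).
            self._link_chars += len(" ".join(data.split()))

    def close(self) -> None:
        super().close()
        self._flush()  # emit any trailing text after the last tag


def extract_main_content(html: str, min_chars: int = 40,
                         max_link_density: float = 0.5) -> list[str]:
    """Step 2 (sketch): keep blocks that look informative under assumed thresholds."""
    seg = Segmenter()
    seg.feed(html)
    seg.close()
    return [b.text for b in seg.blocks
            if len(b.text) >= min_chars and b.link_density <= max_link_density]


SAMPLE_PAGE = """
<html><body>
  <div id="nav"><a href="/">Home</a> | <a href="/news">News</a> | <a href="/about">About</a></div>
  <div id="main"><p>Web pages contain a main content region surrounded by navigation links,
  advertisements and other local noise that can mislead mining tools.</p></div>
  <div id="footer">Copyright 2015</div>
  <script>var x = 1;</script>
</body></html>
"""

if __name__ == "__main__":
    for block in extract_main_content(SAMPLE_PAGE):
        print(block)
```

On the sample page, the navigation bar is dropped because most of its text is anchor text, the footer is dropped as too short, and the script body is never treated as content, so only the paragraph inside `main` survives. In practice the thresholds would need tuning per site, and a real system might combine link density with other block features.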

Authors and Affiliations

Pranali Gatfane, Rani Tanpure, Anjali Masodkar, Vrushali Patil

How To Cite

Pranali Gatfane, Rani Tanpure, Anjali Masodkar, Vrushali Patil (2015). Extraction of Information from Web Page Using Content Mining Approach. International Journal for Research in Applied Science and Engineering Technology (IJRASET), 3(2), -. https://europub.co.uk./articles/-A-19491