Hidden Web Data Extraction Using Dynamic Rule Generation
Journal Title: International Journal on Computer Science and Engineering - Year 2011, Vol 3, Issue 8
Abstract
The World Wide Web is a global information medium of interlinked hypertext documents accessed via computers connected to the Internet. Most users rely on traditional search engines to find information on the Web. These search engines cover the Surface Web, the set of Web pages directly reachable through hyperlinks, and ignore a large part of the Web called the Hidden Web, which present-day search engines cannot reach. The Hidden Web lies behind search forms and comprises a very large number of sources that provide high-quality information stored in specialized databases. A large part of the Hidden Web is structured, i.e., hidden websites present their information in the form of lists and tables. However, visiting dozens of these sites and analyzing the results is very time-consuming for the user. Hence, it is desirable to build a prototype that minimizes the user's effort and delivers high-quality information in an integrated form. This paper proposes a novel method that uses dynamic rule generation to extract data records from the lists and tables of various hidden websites in the same domain and stores them in a repository that is used for later searching. By searching this repository, the user finds the desired data in one place, which reduces the effort of examining the result pages of different hidden websites.
Authors and Affiliations
Anuradha, A. K. Sharma