Crawler Using Inverted WAH Bitmap Index and Searching User Defined Document Fields

Abstract

Crawler is a web crawler that searches the World Wide Web and retrieves pages related to a specific topic. It relies on a set of algorithms to select web pages relevant to a predefined set of topics. Its main features are a user interest specification module, which mediates between users and search engines to identify the target examples and keywords that together specify the topic of interest, and a URL ordering strategy that combines features of several previous approaches and achieves significant improvement. It also provides a graphical user interface in which users can evaluate and visualize the crawling results, and these results can serve as feedback to reconfigure the crawler. Such a web crawler may interact with millions of hosts over a period of weeks or months, so robustness, flexibility, and manageability are of major importance. The crawler retrieves the web pages at the URLs in its queue, parses the HTML files, and adds newly discovered URLs to the queue. The user then provides feedback, which helps the baseline classifier to be induced progressively using active learning techniques. Once the classifier is in place, the crawler can be started on its task of resource discovery.
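
The retrieve, parse, and enqueue cycle described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's URL ordering strategy or classifier: the keyword-overlap `relevance` score, the `crawl` function, the `LinkExtractor` class, and the in-memory `PAGES` site are all assumptions made for illustration, and the frontier is simply a priority queue in which each new URL inherits the score of the page that linked to it.

```python
import heapq
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects href values of <a> tags and the text content of a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text.append(data)


def relevance(text, keywords):
    """Hypothetical score: fraction of topic keywords that occur in the text."""
    words = set(text.lower().split())
    return sum(1 for k in keywords if k in words) / len(keywords)


def crawl(seeds, keywords, fetch, max_pages=10):
    """Best-first crawl: pop the most promising URL, fetch, parse, enqueue links."""
    # heapq is a min-heap, so priorities are stored negated.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    results = []
    while frontier and len(results) < max_pages:
        _, url = heapq.heappop(frontier)
        html = fetch(url)
        if html is None:
            continue
        parser = LinkExtractor()
        parser.feed(html)
        score = relevance(" ".join(parser.text), keywords)
        results.append((url, score))
        for href in parser.links:
            link = urljoin(url, href)
            if link not in seen:
                seen.add(link)
                # Assumption: a new URL inherits the score of the linking page.
                heapq.heappush(frontier, (-score, link))
    return results


# Stand-in for the web: a tiny in-memory site instead of real HTTP requests.
PAGES = {
    "http://example.test/": '<a href="/a">a</a> <a href="/b">b</a> index page',
    "http://example.test/a": 'bitmap index compression <a href="/c">c</a>',
    "http://example.test/b": 'cooking recipes <a href="/d">d</a>',
    "http://example.test/c": "bitmap index search",
    "http://example.test/d": "more recipes",
}

results = crawl(["http://example.test/"], ["bitmap", "index"], PAGES.get)
```

Passing `fetch` as a parameter keeps the loop independent of any HTTP library. A real deployment would also need politeness delays, robots.txt handling, and error recovery, none of which are shown here.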

Authors and Affiliations

Mr. Sanjay Kumar Singh

How To Cite

Mr. Sanjay Kumar Singh (2012). Crawler Using Inverted WAH Bitmap Index and Searching User Defined Document Fields. International Journal of P2P Network Trends and Technology (IJPTT), 2(3), 56-59. https://europub.co.uk./articles/-A-109273