New Machine Learning Crawling Algorithm For Web Forums
Journal Title: International Journal for Research in Applied Science and Engineering Technology (IJRASET) - Year 2015, Vol 3, Issue 8
Abstract
In this paper, we present FoCUS (Forum Crawler Under Supervision), a supervised web-scale forum crawler. The goal of FoCUS is to crawl only relevant forum content from the web with minimal overhead. Forum threads contain the information content that forum crawlers target. Although forums differ in layout and style and are powered by different forum software packages, they always have similar implicit navigation paths, connected by specific URL types, that lead users from entry pages to thread pages. Based on this observation, we reduce the web forum crawling problem to a URL type recognition problem. We show how to learn accurate and effective regular expression patterns of implicit navigation paths from an automatically created training set, using aggregated results from weak page type classifiers. Robust page type classifiers can be trained from as few as 5 annotated forums and applied to a large set of unseen forums. Our test results show that FoCUS achieved over 98% effectiveness and 97% coverage on a large set of test forums powered by over 150 different forum software packages.
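To make the idea of URL type recognition concrete, the following is a minimal sketch, not the FoCUS algorithm itself. It assumes a small set of example URLs has already been grouped by type, generalizes each group into one regular expression by keeping the tokens that never change and widening the ones that do, and then labels unseen URLs with the first pattern that matches. The helper names (`tokenize`, `generalize`, `classify`), the type labels `thread` and `index`, and the `forum.example.com` URLs are all hypothetical and chosen only for illustration.

```python
import re

ALNUM = r"[A-Za-z0-9]+"   # a run of ASCII letters and digits
DIGITS = r"[0-9]+"        # a run of ASCII digits


def tokenize(url):
    """Split a URL into alphanumeric runs and single delimiter characters."""
    return re.findall(ALNUM + r"|[^A-Za-z0-9]", url)


def generalize(urls):
    """Build one regex that matches every URL in `urls`.

    Assumption of this sketch: all example URLs share the same token layout.
    """
    rows = [tokenize(u) for u in urls]
    if len({len(row) for row in rows}) != 1:
        raise ValueError("URLs do not share a token layout")
    parts = []
    for column in zip(*rows):
        values = set(column)
        if len(values) == 1:
            # Token is constant across the examples: keep it literally.
            parts.append(re.escape(column[0]))
        elif all(re.fullmatch(DIGITS, v) for v in values):
            # Token varies but is always numeric (for example a thread ID).
            parts.append(DIGITS)
        elif all(re.fullmatch(ALNUM, v) for v in values):
            parts.append(ALNUM)
        else:
            raise ValueError("cannot generalize over differing delimiters")
    return re.compile("".join(parts))


def classify(url, patterns):
    """Return the first URL type whose pattern matches the whole URL."""
    for url_type, pattern in patterns.items():
        if pattern.fullmatch(url):
            return url_type
    return "other"


# Hypothetical training URLs, already grouped by type.
patterns = {
    "thread": generalize([
        "http://forum.example.com/viewtopic.php?t=101",
        "http://forum.example.com/viewtopic.php?t=2045",
    ]),
    "index": generalize([
        "http://forum.example.com/viewforum.php?f=3",
        "http://forum.example.com/viewforum.php?f=12",
    ]),
}

print(classify("http://forum.example.com/viewtopic.php?t=7", patterns))
print(classify("http://forum.example.com/login.php", patterns))
```

In this sketch a crawler would follow only URLs labeled `index` or `thread` and skip everything labeled `other`, which is one way to keep crawling overhead low. The training groups here are written by hand, whereas the abstract describes building the training set automatically from page type classifier output.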
Authors and Affiliations
M. Arjun, B. Bharath Kumar