Enhanced Approach on Web Page Classification Using Machine Learning Technique  

Abstract

The data set contains web pages collected from the computer science departments of various universities in January 1997 by the World Wide Knowledge Base project of the CMU text learning group. The 8,282 pages were manually classified into seven classes: 1) student, 2) faculty, 3) staff, 4) department, 5) course, 6) project and 7) other. The data set contains, for each class, pages from four universities (Cornell, Texas, Washington and Wisconsin), as well as 4,120 miscellaneous pages from other universities. The files are organized into a directory structure with one directory per class. Each of these seven directories contains five subdirectories, one for each of the four universities and one for the miscellaneous pages, and these subdirectories in turn contain the web pages. The proposed work first preprocesses the data to clean the data set and transform it into a form suitable for classification. Feature extraction then selects a minimal set of representative features (terms) from each web page, so that the entire page need not be used. Finally, a classification step based on the FP-Growth algorithm assigns each page to one of the seven classes. The proposed approach is compared with an existing approach based on the Apriori algorithm.
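As a rough illustration of the FP-Growth step, the sketch below mines frequent term sets from pages that have already been reduced to sets of terms. It is a minimal, generic FP-Growth written for this illustration and is not the authors' implementation. The function names (`build_tree`, `fp_growth`, `mine_frequent_term_sets`), the toy pages and the support threshold are all assumptions. The abstract does not say how the mined patterns are turned into class predictions, so that step is left out.

```python
from collections import Counter, defaultdict


class Node:
    """One node of an FP-tree: an item, its count, and a link to its parent."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs.

    Returns a header table (item -> list of tree nodes) and the
    support of every item that meets min_support.
    """
    freq = Counter()
    for items, count in transactions:
        for item in set(items):
            freq[item] += count
    freq = {i: c for i, c in freq.items() if c >= min_support}

    root = Node(None, None)
    header = defaultdict(list)
    for items, count in transactions:
        # Keep frequent items only, most frequent first (ties broken by name).
        ordered = sorted(
            (i for i in set(items) if i in freq),
            key=lambda i: (-freq[i], i),
        )
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, freq


def fp_growth(transactions, min_support, suffix=frozenset()):
    """Return {frozenset(items): support} for every frequent item set."""
    header, freq = build_tree(transactions, min_support)
    patterns = {}
    for item, support in freq.items():
        pattern = suffix | {item}
        patterns[pattern] = support
        # Conditional pattern base: the prefix path above each occurrence of item.
        conditional = []
        for node in header[item]:
            path = []
            parent = node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        patterns.update(fp_growth(conditional, min_support, pattern))
    return patterns


def mine_frequent_term_sets(pages, min_support):
    """Treat each page (a set of terms) as one transaction."""
    return fp_growth([(terms, 1) for terms in pages], min_support)


# Hypothetical toy input: four pages already reduced to representative terms.
pages = [
    {"course", "syllabus"},
    {"syllabus", "exam"},
    {"course", "syllabus", "exam"},
    {"course", "syllabus"},
]
patterns = mine_frequent_term_sets(pages, min_support=2)
for term_set, support in sorted(patterns.items(), key=lambda kv: -kv[1]):
    print(sorted(term_set), support)
```

Unlike Apriori, this procedure does not generate and test candidate item sets level by level. It compresses the transactions into a prefix tree once and then recurses on conditional pattern bases.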

Authors and Affiliations

S. Gowri Shanthi, Dr. Antony Selvadoss Thanamani

Keywords

How To Cite

S. Gowri Shanthi, Dr. Antony Selvadoss Thanamani (2012). Enhanced Approach on Web Page Classification Using Machine Learning Technique. International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), 1(7), 278-282. https://europub.co.uk./articles/-A-98924