A Review of Various Techniques of Web Content Mining For HTML and XML Contents

Abstract

The World Wide Web is the largest source of information. Most of the data on the web is dynamic and unstructured, which makes it increasingly difficult to retrieve relevant data. Data mining is the field of computer science concerned with extracting knowledge from very large amounts of data. Web mining is an application of data mining: it applies data mining techniques to extract useful knowledge from web data. This paper presents an overview of the techniques that have been used for web content mining, covering images, audio, video, and semi-structured content such as HTML and XML. HTML has several limitations: it has a limited set of tags, its tag names are not case-sensitive, and it was designed only to display data. As a result, web developers have started to build web pages with emerging web technologies such as XML and Flash. XML was designed to describe data and to focus on what the data is. XML also serves as a metalanguage that lets document authors create customized markup languages for an unlimited variety of document types, which has made it a standard format for online data exchange.
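
To make the HTML/XML contrast above concrete, the following is a minimal sketch using only Python's standard library (`html.parser` and `xml.etree.ElementTree`). It is not the authors' method. The sample catalog, the `CellCollector` class, and the `mine_html`/`mine_xml` functions are hypothetical and exist only for illustration. The HTML sample deliberately mixes `<tr>` and `<TR>` to show that HTML tag names are case-insensitive. Because HTML tags only describe presentation, the HTML extractor must assume a fixed column layout. The XML extractor selects fields by tag name. In XML, a mismatched pair such as `<title>...</Title>` would be rejected as malformed.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample data: the same two records, once as presentation-only
# HTML (with mixed-case tags) and once as self-describing XML.
HTML_DOC = """
<table>
  <tr><td>Data Mining Basics</td><td>450</td></tr>
  <TR><TD>Web Mining</TD><TD>520</TD></TR>
</table>
"""

XML_DOC = """
<catalog>
  <book><title>Data Mining Basics</title><price>450</price></book>
  <book><title>Web Mining</title><price>520</price></book>
</catalog>
"""


class CellCollector(HTMLParser):
    """Collects the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so <TR> and <tr> are treated alike.
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def mine_html(doc):
    # HTML tags say nothing about meaning, so we must assume a column layout:
    # column 0 holds the title and column 1 holds the price.
    parser = CellCollector()
    parser.feed(doc)
    parser.close()
    return [{"title": row[0], "price": int(row[1])} for row in parser.rows]


def mine_xml(doc):
    # XML tags name the data, so fields are selected by tag name, not position.
    root = ET.fromstring(doc.strip())
    return [
        {"title": book.findtext("title"), "price": int(book.findtext("price"))}
        for book in root.iter("book")
    ]


print(mine_html(HTML_DOC))
print(mine_xml(XML_DOC))
```

Both functions return the same two records, but they depend on different things. `mine_html` breaks if a column is added or reordered. `mine_xml` keeps working as long as the `<title>` and `<price>` elements are present, because the tags themselves describe the data.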

Authors and Affiliations

Rupinder Kaur, Kamaljit Kaur

How To Cite

Rupinder Kaur, Kamaljit Kaur (2014). A Review of Various Techniques of Web Content Mining For HTML and XML Contents. International Journal of Research in Computer and Communication Technology, 3(6), -. https://europub.co.uk./articles/-A-27937