Content Evocation Using Web Scraping and Semantic Illustration

Journal Title: IOSR Journals (IOSR Journal of Computer Engineering) - Year 2014, Vol 16, Issue 3

Abstract

  Web scraping is the process of automatically collecting information from the World Wide Web. It is a field under active development that shares a common goal with the semantic web vision, an ambitious initiative that still requires breakthroughs in text processing, artificial intelligence and human-computer interaction. This work concerns the extraction of content from different web pages using web scraping and semantic illustration. Web scraping is the process of extracting content from HTML pages and is related to web indexing. A commonly used measure of tree similarity is the tree edit distance, which can easily be extended to measure how well a pattern can be matched in a tree. An obstacle to this approach is its time complexity, so we consider whether faster algorithms for constrained tree edit distances are usable for web scraping, and how to reduce the size of the tree representing the web page. Web scraping is applied in the current market under names such as web data extraction, data collection and screen scraping. Many different algorithms are used for web scraping, such as “tree pattern matching”, “tree mapping” and “approximate tree matching”, but in general the “tree edit distance” algorithm is used. With this algorithm, however, many issues of data incorrectness, low efficiency and high time complexity have been observed. In this research I focus on improving the performance of the tree edit distance algorithm, and on the upper bound of its time complexity.
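To make the tree edit distance mentioned in the abstract concrete, the following is a minimal sketch of the textbook recursive formulation for ordered, labeled trees with unit costs (delete a node, insert a node, relabel a node). It is not the paper's algorithm and not one of the faster constrained variants; the helper names `node`, `size`, `forest_distance` and `tree_edit_distance`, the nested-tuple tree representation and the sample trees are all assumptions made for illustration.

```python
from functools import lru_cache

# Assumed representation: a tree is (label, children), where children is a
# tuple of trees. Nested tuples are hashable, so results can be memoized.
def node(label, *children):
    return (label, tuple(children))

def size(forest):
    # Total number of nodes in a forest (a tuple of trees).
    return sum(1 + size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    # Edit distance between two ordered forests, unit cost per operation.
    if not f1:
        return size(f2)  # insert every remaining node of f2
    if not f2:
        return size(f1)  # delete every remaining node of f1
    (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
    rest1, rest2 = f1[:-1], f2[:-1]
    # Delete the rightmost root of f1; its children take its place.
    delete = forest_distance(rest1 + kids1, f2) + 1
    # Insert the rightmost root of f2; its children take its place.
    insert = forest_distance(f1, rest2 + kids2) + 1
    # Map the two rightmost roots onto each other, relabeling if needed.
    match = (forest_distance(kids1, kids2)
             + forest_distance(rest1, rest2)
             + (0 if label1 == label2 else 1))
    return min(delete, insert, match)

def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))

# Hypothetical page skeletons, reduced to tag names only.
page_a = node("body", node("div", node("p")), node("span"))
page_b = node("body", node("div", node("p"), node("a")), node("span"))
page_c = node("body", node("p"), node("span"))

print(tree_edit_distance(page_a, page_b))  # 1: insert the "a" node
print(tree_edit_distance(page_a, page_c))  # 1: delete "div", lifting "p"
```

The memoized recursion is convenient for small trees, but the number of subforest pairs it visits grows quickly with page size, which is the time-complexity obstacle the abstract refers to and the reason for considering constrained distances or a smaller tree.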

Authors and Affiliations

Vasani Krunal A

  • EP ID EP116095
  • DOI 10.9790/0661-16395460

How To Cite

Vasani Krunal A (2014). Content Evocation Using Web Scraping and Semantic Illustration. IOSR Journals (IOSR Journal of Computer Engineering), 16(3), 54-60. https://europub.co.uk./articles/-A-116095