Friday, 17 May 2013

Web Database Scraping

Web data scraping is a software technique for transferring data between programs by reading human-readable output rather than a structured, machine-oriented format. A scraper simulates human exploration of the World Wide Web, often by implementing low-level Hypertext Transfer Protocol (HTTP) requests directly. The data it collects is typically intended for display to end users rather than for input to another program. Web data scraping is also known as web data harvesting or web data extraction.
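As a rough sketch of what "implementing low-level HTTP" can look like, the Python snippet below builds and sends a plain GET request with the standard library. The function names, the user-agent string and the example.com address are assumptions made for illustration, not part of any particular scraping tool.

```python
import urllib.request

def build_request(url, user_agent="example-scraper/0.1"):
    """Build a GET request object; nothing is sent yet."""
    # The user-agent value is a made-up placeholder.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

def fetch(url, timeout=10.0):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")

# Only the request object is built here; calling fetch() would contact the server.
req = build_request("http://example.com/")
print(req.get_method(), req.full_url)
```

Calling fetch("http://example.com/") would return the page markup as a string, ready to be handed to a parser.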
Data scraping usually ignores binary data such as images and other multimedia. It can be used to interface with a third-party system when no more convenient API is available. It can also serve as an interface to a legacy system that offers no other mechanism compatible with current hardware.
Web pages are built with text-based markup languages such as HTML and XHTML and are delivered over HTTP. They contain a wealth of information in text form, but most pages are designed to be read by people, not processed by automated tools. A tool used for data scraping is called a data scraper.
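To make the idea of a data scraper concrete, here is a minimal sketch that pulls link targets and link text out of a markup fragment using Python's standard html.parser module. The sample markup, the LinkScraper class and the scrape_links function are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser

# Hypothetical sample page; a real scraper would receive this text over HTTP.
SAMPLE_HTML = """
<html><body>
  <h1>Products</h1>
  <a href="/item/1">Red kettle</a>
  <a href="/item/2">Blue teapot</a>
</body></html>
"""

class LinkScraper(HTMLParser):
    """Collects (href, link text) pairs from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

def scrape_links(html):
    scraper = LinkScraper()
    scraper.feed(html)
    return scraper.links

print(scrape_links(SAMPLE_HTML))
```

The page was written for a human reader, yet the scraper turns it into a list of pairs that another program can consume.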
Uses of web data scraping include research, web data integration, price monitoring, weather forecasting, website change detection and web mashups. Scraping may violate the terms of use of some websites. In practice it favors practical solutions based on existing techniques, which provide different levels of automation, including:

    Human copy and paste: Even on websites that set up barriers against automation, information can still be copied and pasted by hand for further examination.
    HTTP programming: Pages are retrieved by sending HTTP requests to the web server.
    Data mining: Programs detect templates shared by pages that present the same kind of data and use them to extract it.
    Web-scraping software: Tools that extract content and convert it into another form.
    Vertical aggregation
    Recognizing semantic annotations
    HTML parsers
    DOM parsing: Programs parse a page into a document tree and extract the required parts.
    Text grepping: Information is extracted by pattern matching, in the manner of the UNIX grep command.
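Two of the techniques in this list, text grepping and DOM parsing, can be compared on the same input. The sketch below is only an illustration: the XHTML fragment and the function names are made up, and the standard xml.etree.ElementTree module stands in for a full DOM parser, which works here only because the fragment is well-formed XML.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical well-formed XHTML fragment standing in for a downloaded page.
PAGE = """<html><body>
<table id="prices">
  <tr><td class="name">Red kettle</td><td class="price">$19.99</td></tr>
  <tr><td class="name">Blue teapot</td><td class="price">$24.50</td></tr>
</table>
</body></html>"""

def grep_prices(text):
    """Text grepping: match dollar amounts with a regular expression."""
    return re.findall(r"\$\d+\.\d{2}", text)

def dom_prices(text):
    """Tree parsing: select table cells by structure and attribute."""
    root = ET.fromstring(text)
    result = {}
    for row in root.iter("tr"):
        name = row.find("td[@class='name']").text
        price = row.find("td[@class='price']").text
        result[name] = price
    return result

print(grep_prices(PAGE))
print(dom_prices(PAGE))
```

Grepping is quick to write but knows nothing about page structure, so it returns bare prices. The tree-based version keeps each price tied to its product name, at the cost of requiring markup that parses cleanly.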

Data scraping is generally considered a last resort, to be used only when no other mechanism can deliver the data.

Source: http://thewebscraping.com/web-database-scraping/
