Monday, February 11, 2008

Wikipedia defines screen scraping as a "...computerized parsing of the HTML text in web pages. In all cases, the screen scraper has to be programmed to not only process the text data of interest, but also to recognize and discard unwanted data, images, and display formatting."
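
To make that definition a little more concrete, here is a minimal sketch of a scraper in Python. It uses only the standard library's html.parser module. The names TextScraper and scrape_text and the sample page are made up for illustration, and a real scraper would also have to fetch pages and cope with much messier markup.

```python
from html.parser import HTMLParser


class TextScraper(HTMLParser):
    """Collects visible text and discards script, style, and tag markup."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped element
        self.chunks = []      # text fragments of interest, in page order

    def handle_starttag(self, tag, attrs):
        # Images and formatting tags carry no text, so they are simply ignored.
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def scrape_text(html):
    """Return the visible text fragments found in an HTML string."""
    parser = TextScraper()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical page: one piece of data surrounded by things to discard.
SAMPLE_PAGE = """
<html><head><style>p { color: red; }</style></head>
<body>
  <img src="banner.gif" alt="ad">
  <p>Price: <b>$19.99</b></p>
  <script>trackVisitor();</script>
</body></html>
"""

print(scrape_text(SAMPLE_PAGE))
```

Notice that the banner image and the tracking script, the parts that earn the site owner money, are exactly the parts the scraper throws away.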

The first thing that came to mind during our class discussion of screen scraping was -- what does the web master get in return for someone using his or her data in an unauthorized way? The harsh answer is nothing. Screen scraping allows the "scraper" to pull whatever information he or she wants from any particular web site and use that information however he or she pleases. Furthermore, it allows the scraper to use the information in ways it was never intended to be used.

That's not so bad, right? The author posted that information on the internet, so it should be fair game to all. Well, most web sites have more in mind than providing information to all their users. Sure, the stated goal of most web sites is to provide the "best", "correct", or "most up-to-date" information available, but the truth is, they're all in it for the money. All those banner ads, annoying sound ads, and "click here to meet whoever" ads are huge revenue earners for web sites that get a fair amount of traffic.

I own a fair share of web sites and have advertising on most of them. It's in my best interest for the users of my sites to visit my advertisers or, better yet, buy something from them under my referral id. If someone is screen scraping my information and posting it to a different web site, they're ultimately stealing information that I either gathered myself or paid someone to gather for me. The owner of that new site could be making money off his or her own ads, and I could be losing out on what could have been ad revenue.

To make matters worse, a huge part of driving large amounts of traffic to a web site is premium placement on Google. One of the only ways to achieve this is to have "original" content. The almighty Google knows which web sites copy information from other sites, and places them in a "supplemental index". If a web site is placed in that index, there's a good chance it won't receive any traffic through search engines.

Ultimately, even though many things on the internet may seem "unfair", I think it just makes everything fair game. If someone wants to screen scrape my site, I have to design it in a way that keeps them from doing so. It's all part of the back-and-forth battle of making money online. Game on.

posted on 2/11/2008 4:02:12 PM (Central Standard Time, UTC-06:00)
