Data loss from a Website can come through multiple avenues. An attacker could breach a database and steal its content outright, or an automated bot could scrape the site, harvesting data that is already out in the open. Startup ScrapeDefender aims to tackle the challenge of automated Web-content scraping.
"Websites have all sorts of different types of content that is available, free to the public, and the creators of those sites intend for that content to be consumed by people to use," Robert Kane, CEO of ScrapeDefender, told eWEEK. "What has happened is there is now a whole industry of scraping with bots that harvest mass amounts of data from sites."
Those scraping bots could, for example, harvest pricing information from retail or travel sites and then repurpose the data in ways the original content creator did not intend. ScrapeDefender is now launching a cloud-based anti-scraping and real-time monitoring service designed to track scraping activity and limit the risk it poses.
Kane is no stranger to the world of security. In 1992, he founded a company called Intrusion Detection, which he sold to RSA in 1998. He also has experience in the financial services market as the founder of Bondview, a municipal bond information site. His experience at Bondview helped him identify the need for an anti-scraping service.
"We discovered at Bondview that we were being scraped," Kane said. "There were no tools to help us, and that was the spark for ScrapeDefender."
How It Works
"We receive a copy of the Website activity and analyze the activity on our servers and pass it through a whole bunch of metrics," Kane said.
Those metrics comprise 25 parameters that help determine whether traffic is legitimate or indicates scraping activity. Among them are checks for excessive page views from a single address and for direct visits to a page that visitors would normally reach only by clicking through from a referring page.
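To make the two example parameters concrete, the sketch below flags client addresses in a batch of request records. It is purely illustrative and does not reflect ScrapeDefender's actual implementation, which the article does not describe. The `Request` record, the `flag_suspicious_ips` function, the 100-view threshold and the `REFERRAL_ONLY_PATHS` set are all hypothetical, and the batch is assumed to cover a single time window.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set


@dataclass(frozen=True)
class Request:
    """One entry from a (hypothetical) copy of the site's traffic log."""
    ip: str                  # client address
    path: str                # page requested
    referrer: Optional[str]  # Referer header, or None if absent


# Hypothetical values chosen only for illustration.
MAX_VIEWS_PER_IP = 100  # page views allowed per address within one window
REFERRAL_ONLY_PATHS = {"/bonds/detail"}  # pages normally reached via a click


def flag_suspicious_ips(
    requests: Iterable[Request],
    max_views: int = MAX_VIEWS_PER_IP,
    referral_only: Set[str] = REFERRAL_ONLY_PATHS,
) -> Dict[str, List[str]]:
    """Map each flagged IP to a sorted list of the signals it triggered.

    Assumes `requests` covers a single time window, so a raw count per
    address stands in for a page-view rate.
    """
    requests = list(requests)
    flags: Dict[str, Set[str]] = {}

    # Signal 1: excessive page views from a single address.
    views = Counter(r.ip for r in requests)
    for ip, count in views.items():
        if count > max_views:
            flags.setdefault(ip, set()).add("excessive_page_views")

    # Signal 2: a referral-only page requested with no referrer at all.
    for r in requests:
        if r.path in referral_only and not r.referrer:
            flags.setdefault(r.ip, set()).add("direct_deep_link")

    return {ip: sorted(signals) for ip, signals in flags.items()}


if __name__ == "__main__":
    log = (
        [Request("10.0.0.1", "/bonds/detail", None)]
        + [Request("10.0.0.2", "/", None)] * 150
        + [Request("10.0.0.3", "/bonds/detail", "https://example.com/search")]
    )
    print(flag_suspicious_ips(log))
```

In this toy log, the address with 150 views trips the volume check, the address that lands on the detail page with no referrer trips the deep-link check, and the visitor who arrived from a search page is not flagged. A production system would presumably weigh many such signals together rather than flag on any single one.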