Legal Battles Erupt Over Hyperlocal Data Mining

March 4, 2013 by Brian Dengler

Hyperlocal social directories and listings sites like Yelp and Craig’s List can be rich resources of user-generated local information — but that data may not always be free for the taking. Frustrated by data scraping by competitors, both companies have recently thrown down the legal gauntlet in an effort to lay exclusive claim to their data.

A fierce battle between Craig’s List and start-up 3Taps is heading for a hearing on April 12, 2013, and the decision could be a landmark for determining who owns user-generated content and other types of local directory data. In its lawsuit, Craig’s List has accused 3Taps of “unlawfully and unabashedly mass-harvesting and redistributing postings entrusted by Craig’s List users to their local Craig’s List sites.” The suit adds that 3Taps “boasts” that it “mass-copies tens of millions of postings” and then makes the data available for profit to other customers through an API. Craig’s List claims 3Taps violated its copyrights, breached its terms of service, made unauthorized access to its computers, and is competing unfairly.

In seeking a dismissal of the complaint, 3Taps has asserted that Craigslist does not have a legitimate copyright claim because the postings are owned by the users and because raw facts, available on the Web through searches, are not protected by copyright law. Meanwhile, in a bold move, 3Taps lobbed an antitrust counterclaim against Craig’s List, saying the site has monopolized the online classified business in three markets: on-boarding ads, indexing them, and searching that index. 3Taps adds that Craig’s List uses its terms of service and what 3Taps calls bogus copyright claims to restrict competition.

On 3Taps’ home page, CEO Greg Kidd declares “The basis of our antitrust counterclaim and defense against Craig’s List’s baseless lawsuit is simple: public facts are public property — open and equally available to businesses and consumers.”

Meanwhile, last year Yelp sued 80legs, a web crawler that automatically visits Yelp, gathers reviews, and then resells the content to third parties for reposting. In court filings, Yelp alleged that 80legs was selling content scraped and copied from the Yelp site as “Yelp Crawl Packages.” Yelp claimed that 80legs violated Yelp’s terms of use, which prohibit users from crawling or scraping its site. Yelp also alleged that 80legs infringed Yelp’s copyright by taking content from Yelp’s site, violated state and federal computer fraud and abuse acts by gaining unauthorized access to Yelp’s computer systems, violated Yelp’s trademarks, and engaged in unfair competition.

Yelp and Datafiniti, the operator of 80legs, settled their dispute, and the court dismissed the case on January 14, 2013. The parties kept the nature of the settlement confidential.

Sabira Arefin, CEO of LocalBlox, suggests hyperlocal directories must decide for themselves how they want to control their data. LocalBlox connects more than 112,000 neighborhoods in the United States and has 23 million comprehensive business profiles. “It is up to the individual sites and system to determine the terms and conditions and then enforce any security mechanism in place if they want to prevent scraping,” said Arefin.

The issue of data mining is contentious and risky. On the one hand, pure facts, such as the address and telephone number of a business posted in a public area on the Internet, are not entitled to legal protection. On the other hand, the way such facts are expressed and organized may deserve copyright protection. For example, the expression used to create a company profile, the compilation of recommendations, and the way all of this information is packaged and presented to the user may be subject to copyright protection. Therefore, the wholesale scraping and copying of this collective work could result in infringement.

Data miners also must consider the risk of violating a site’s terms of use and running afoul of computer fraud and abuse acts if they engage in wholesale scraping. Sites like Yelp expressly prohibit users from scraping their sites and repurposing their content. Such mass harvesting would breach the terms and give the site a breach of contract claim.

Such mass harvesting also exposes a data miner to claims that it violated state and federal computer abuse acts. Hitting servers to scrape a site exceeds the authority that a user typically would have to access a site. The unauthorized attempt to engage in mass harvesting of data imposes burdens on a site’s computer systems and places the data miner in the same bad light as a hacker.

Hyperlocal sites desiring to protect their databases should consider the following steps:

Call out a set of terms alerting the user that you prohibit screen scraping and database mining.
Register your database with the copyright office.
Consider mechanisms to detect unusual spikes in traffic and capture any IP addresses that are related to any spike in traffic.
Consider blocking IP addresses of abusive visitors, although this may create problems with other sites that regularly drive traffic to your sites.
Consider establishing robot exclusion protocols to tell robots not to visit certain portions of your sites.
Consider using a CAPTCHA test for premium areas of your service.
Create a strong login process and limit the number of logins by each visitor.
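
As a rough illustration of the traffic-monitoring step above, the following Python sketch counts requests per IP address inside a sliding time window and flags any address that exceeds a threshold. The class name, the 60-second window, and the 120-request limit are all hypothetical choices for this example and are not drawn from any site mentioned in this article; a real deployment would tune them against its own traffic and would likely build on its existing server logs.

```python
from collections import defaultdict, deque


class SpikeDetector:
    """Flags IP addresses whose request rate exceeds a threshold.

    Illustrative sketch only: the window and limit are assumed values.
    """

    def __init__(self, window_seconds=60.0, max_requests=120):
        self.window_seconds = window_seconds
        self.max_requests = max_requests
        # Maps each IP address to the timestamps of its recent requests.
        self._hits = defaultdict(deque)

    def record(self, ip, timestamp):
        """Record one request; return True if this IP is now over the limit."""
        hits = self._hits[ip]
        hits.append(timestamp)
        # Discard requests that have aged out of the sliding window.
        cutoff = timestamp - self.window_seconds
        while hits and hits[0] <= cutoff:
            hits.popleft()
        return len(hits) > self.max_requests

    def flagged(self):
        """Return the IPs over the limit as of their most recent request."""
        return sorted(
            ip for ip, hits in self._hits.items()
            if len(hits) > self.max_requests
        )


if __name__ == "__main__":
    detector = SpikeDetector(window_seconds=10.0, max_requests=3)
    # 203.0.113.5 is an address from a range reserved for documentation.
    for second in range(5):
        if detector.record("203.0.113.5", float(second)):
            print("possible scraper:", "203.0.113.5", "at t =", second)
```

A captured address is only a starting point. As the blocking step in the list cautions, an address that trips the threshold may belong to a site that legitimately drives traffic, so a flag would normally prompt review rather than an automatic block.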

This article does not give legal advice. Hyperlocal publishers should check with counsel regarding strategies for data use and protection.

Brian Dengler is an attorney and journalist who covers legal and business issues in media and information technology law. He is a former vice president of AOL, a former newspaperman, and an Emmy-winning TV journalist. He teaches media management, media law, and journalism ethics at Kent State University. He recently was honored by being included among the Best Lawyers in America for 2013 for information technology.