Clients often ask us how to assemble information from the Internet effectively and, at the same time, how to separate the noise from the meaningful data. Although there’s no substitute for experience and wisdom, a large quantity of data analyzed properly will often do in a pinch, as long as you can get the data lawfully and with respect for privacy rights. But how do you do that? It’s not exactly efficient to run masses of searches and enter the results by hand.

For some companies, the answer is to use “scraping” tools to pull publicly available data. Scraping is a method of employing bots to access websites, scan them for certain kinds of information, and pull that information into a central repository for your own use. It is, in fact, a key method for search engines: Google, for instance, deploys its legion of bots to “read” data on your website and then index the results for analysis and, potentially, access by third parties using Google Search. Scraping is how many businesses scan for data and, indeed, how many businesses operate altogether, which is why there are literally dozens of scraping tools available as SaaS products.
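
For readers who want to see what “sending a bot” looks like in practice, here is a minimal sketch in Python. Everything in it is a hypothetical illustration: the sample page, the robots.txt rules, the bot name, and the class names (“profile,” “name,” “title”) are our own assumptions, not the markup of any real site and not the method hiQ or anyone else actually uses. It checks a site’s robots.txt rules, then scans a page for two kinds of information and pulls them into one list, our stand-in for the central repository.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt and page, inlined so the sketch runs offline.
SAMPLE_ROBOTS = """User-agent: *
Disallow: /private/
"""

SAMPLE_PAGE = """
<html><body>
  <div class="profile"><span class="name">Tim Example</span>
    <span class="title">Accountant</span></div>
  <div class="profile"><span class="name">Ana Sample</span>
    <span class="title">Data Scientist</span></div>
</body></html>
"""


class ProfileParser(HTMLParser):
    """Collects the text of <span class="name"> and <span class="title">."""

    FIELDS = {"name", "title"}  # assumed class names, for illustration only

    def __init__(self):
        super().__init__()
        self.records = []   # the "central repository": one dict per profile
        self._field = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "div" and cls == "profile":
            self.records.append({})  # start a new profile record
        elif tag == "span" and cls in self.FIELDS:
            self._field = cls

    def handle_data(self, data):
        # Only keep text that sits inside a field we are looking for.
        if self._field and self.records:
            self.records[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def allowed(robots_txt, user_agent, url):
    """Return True if the robots.txt rules permit this bot to fetch the URL."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)


def scrape(html):
    """Scan one page of HTML and return the profile records found in it."""
    parser = ProfileParser()
    parser.feed(html)
    return parser.records


if __name__ == "__main__":
    url = "https://example.com/profiles/tim"  # placeholder URL
    if allowed(SAMPLE_ROBOTS, "example-bot", url):
        for record in scrape(SAMPLE_PAGE):
            print(record)
```

A real scraper would fetch pages over the network and run this across thousands of URLs, but the shape is the same: check the rules, read the page, keep only the fields you care about.
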

Sounds very “internet-y,” right? Something’s on the Web, you send a bot to find it, then you use it for your own purposes. Such is the thinking of many companies that use scraping to collect feed/training data for their services, including San Francisco-based hiQ Labs, which offers data science-driven insights about employees and industries gleaned from scraping profile data from LinkedIn, Microsoft’s social media site/endless notification email generator. Essentially, hiQ pulls all the freely accessible information from public LinkedIn profiles, analyzes the data for trends, patterns, and anomalies, and then sells its analysis to customers who want to know more about their employees (everything from “what skills should my team develop” to “is Tim in accounting getting ready to quit?”).

Of course, in order to provide those insights, hiQ needs access to those public profiles on LinkedIn, and that’s where the legal difficulties began a few years ago. After years of happily collaborating, LinkedIn informed hiQ that scraping public profiles was a violation of the site’s Terms and Conditions and issued a cease and desist letter. In effect, LinkedIn’s position was that, although the information on its site was available for free to anyone who visited the website, hiQ could not simply appropriate all of that data to run its own business. Oh, also: LinkedIn had just created a product that would directly compete with hiQ, so, there’s that.

hiQ’s entire business model depends on access to that data, and so it not only refused to comply with the cease and desist letter, but immediately sought an injunction against LinkedIn’s threatened lawsuit. Injunctions force a party to take (or not to take) an action when monetary damages won’t suffice, and hiQ’s argument was essentially that, unless LinkedIn withdrew its cease and desist letter, hiQ’s entire business would be destroyed, something that money really can’t fix. The federal court agreed and ordered LinkedIn to back down. LinkedIn promptly declined to do so and instead appealed to the Ninth Circuit Court of Appeals.

The appeal raised two critical questions for business operations online: 1) is scraping lawful, and 2) can a business deny third parties the use of hosted data that the business itself did not create? The implications of those questions are larger than you might initially think. LinkedIn contended that hiQ had violated the Computer Fraud and Abuse Act (“CFAA”), the federal anti-hacking statute enacted in 1986. Effectively, LinkedIn was saying that, by scraping data from the website, hiQ was engaged in a quasi-criminal act, misappropriating property. (That’s also why LinkedIn raised a common law trespass claim, which is the sort of thing that lawyers will understand and every other normal human will find extremely strange.)

hiQ responded by noting that the publicly available data wasn’t LinkedIn’s at all, but that it instead belonged to the users who create public profiles. Scraping data off of a website that does not erect any boundaries to access or use cannot, by definition, be a violation of the CFAA because no fraudulent or unlawful access to data took place. To that end, hiQ sought its own relief, claiming that LinkedIn was knowingly interfering in hiQ’s ability to conduct its affairs and carry out its contracts with third parties like Capital One and GoDaddy. A number of high-profile voices filed amicus briefs in support of hiQ’s arguments, including the Electronic Frontier Foundation (EFF) and companies like DuckDuckGo. (The Electronic Privacy Information Center (EPIC) also weighed in, but on LinkedIn’s side, citing the privacy interests of LinkedIn users.)

Ultimately, the Ninth Circuit agreed with hiQ. It determined that hiQ had met its burden of establishing a facial claim against LinkedIn for intentional interference with contracts. hiQ, the Court reasoned, scraped data that was merely presented, but not owned, by LinkedIn. “LinkedIn has only a non-exclusive license to the data shared on its platform, not an ownership interest. Its core business model—providing a platform to share professional information—does not require prohibiting hiQ’s use of that information, as evidenced by the fact that hiQ used LinkedIn data for some time before LinkedIn sent its cease-and-desist letter.” LinkedIn’s actions would cause hiQ to go out of business based on a flimsy assertion of a right to control data and so, on the record before it, the Court of Appeals concluded that hiQ was entitled to its injunction.

Next, and perhaps more importantly, the Court concluded that LinkedIn’s assertion of a claim under the CFAA was misplaced. The Act makes it a crime to intentionally access a computer “without authorization” or to exceed authorized access, which, on its face, could apply to hiQ’s conduct. After all, LinkedIn issued a cease and desist letter formally stating that hiQ lacked authorization to scrape the data on the site. Not so, the Court concluded. “CFAA is best understood as an anti-intrusion statute and not as a ‘misappropriation statute,’ [and we reject] the contract-based interpretation of the CFAA’s ‘without authorization’ provision adopted by some of our sister circuits.” For the Ninth Circuit, then, the CFAA is about preventing unlawful access to a computer or system in the sense of breaking in, rather than about enforcing contractual restrictions on use.

What does this mean? Two important points. First, scraping seems safe, at least in the Ninth Circuit and at least for now. While it would be a mistake to say that all scraping is permissible (for instance, scraping proprietary data is still not ok), hiQ v. LinkedIn certainly takes some of the pressure off. Second, we should get ready for an appeal to the Supreme Court. When the Ninth Circuit panel said that it rejects the contract-based reading of the “without authorization” provision “adopted by some of our sister circuits,” it meant that it disagrees with rulings from the First and Eleventh Circuits. The only way to resolve those differences is by going to the Supreme Court. And, given the importance of the issue and the need for national uniformity in applying the CFAA (you can’t have scraping legal in Fresno but illegal in Fort Lauderdale), we anticipate that the Supreme Court will take up the issue. In other words, the hiQ saga isn’t over quite yet.