Data is Not the “New X.” Data is You.

If we’ve had this happen once, it’s happened several dozen times. We’ll be discussing our book, or our practice, or how we approach privacy issues, and someone will say something like “Yes, well, you know, data is the new x.” And x can be anything: oil, money, gold, papayas, whatever. The analogy, essentially, is that because data is frequently mentioned in the news (or on blogs), and because data is the source of a great deal of money in business transactions, data must be a commodity and, like other commodities, roughly interchangeable within a certain context. This idea is not only misguided; it also leads to an unhealthy approach to data collection and a lowest-common-denominator mindset about data privacy.

Your “privacy laws” are harshing my buzz.

Snowclones are the New Tropes

Worn phrases like “X is the new Y” have a specific name: snowclones. The term derives from a well-worn example: “if the Inuit have 100 words for snow, X must have 200 words for Z.” Of course, the Inuit don’t have 100 words for snow; the entire premise of the phrase is reductive, an intellectual shortcut. Snowclones are linguistic power tools, meme-like templates that let you swap in new terms to communicate an idea while wrapping it in a sense of familiarity. That’s why, when we hear “x is the new y,” the phrase itself feels timeworn, and it seems to lend weight to whatever banal statement follows. In reality, “x is the new y” is less than sixty years old, despite its seeming ubiquity. That’s part of the problem. We assume that prevalence somehow equates to either inevitability or pedigree, and we accept the thoughts conveyed without a critical eye.

That’s a serious mistake, largely because snowclones are, almost by definition, vehicles for repurposing one trite, clichéd thought as another. At a certain level of abstraction and generality, that might even make sense. If you’re saying that “X is the new Y” because Y made money in the past and you believe X will make money now, then yes, there is at least some relationship between the two ideas. Jim Cramer, responding to an increase in futures prices, once said that “corn was the new oil,” presumably while smashing his keyboard with an ear of buttergold.

Jim in a contemplative mood

Statements at that level of generality rarely move beyond the status of truism, if even that. And, typically, the first utterance of a phrase aims at something slightly different from what its repetitive successors convey. Consider “data is the new oil,” for instance. Clive Humby, widely credited with coining that phrase in 2006, was trying to convey a more complex idea: namely, that data is like oil because its value comes only after refinement. Just as oil is valuable only when refined into a usable product, raw data has more potential value than actual value. That statement, at least in 2006, had merit and enough intellectual heft to be worth considering in context. But now, endless repetition has stripped the phrase of any meaning.

Wait, Isn’t It Like Oil?

None of this is to say that data lacks value, that it can’t be refined, or that analogies in general are useless; far from it. There are plenty of helpful ways to analogize how data works in an industry, but data is ultimately sui generis, something that can only be understood on its own terms. So, while it shares some characteristics with oil, data is also like electricity, water, your labor force, your office building, and so on.

Let us explain. Oil, as a commodity, is a finite resource that requires labor, technology, materials, and capital to extract, refine, ship, and sell. Once consumed, oil can never be used again, and burning one gallon of gasoline has absolutely no effect on future uses of oil. Data isn’t like that. Data is an infinite resource, in that humans, businesses, and the world around us generate a ceaseless supply of data at all times, in everything they do. Data is eminently reusable: indeed, licensing the same data to multiple parties is a proven method for deriving value. And when you consume some data (say, a customer’s spending habits), that data readily shapes future data, both through the advertisements that customer sees and through the way they respond to those advertisements.

Now that is a data metaphor.

That last point makes the distinction between data and oil even clearer: oil is a resource from which we refine and derive petroleum products for sale and use, but you are the resource that produces data for sale and use. Human beings generate the data that drives companies like Facebook or Google to near-trillion-dollar valuations. We sometimes describe it as an ouroboros, the dragon that consumes its own tail, because the endless feedback and consumption loop our activities create is both paradoxical and self-sustaining. The central point is that we are the source of data, and we are the ones who consume it.

A Distinction with a Difference

What’s the problem with saying that data is the new oil? Why can’t we just think of assets as, generally, similar to one another? It’s a fair question, because very often when people dismiss the data-as-oil idea, they do so without any meaningful explanation. As we set out above, there are plenty of foundational reasons for differentiating data from other commodities, the most important being that data comes from human beings or, at minimum, from human-related activity.

But there are second-order reasons why thinking in snowclones is dangerous. Doing so ignores the distinct needs of managing data and protecting rights, whether your own or someone else’s. A physical commodity may be stolen, but it can’t be re-stolen. Data, like intellectual property, is endlessly licensable. I can sell you a copy of my book, or the rights to its profits, or the printing rights, or the TV rights, and I still remain the author, free to write another book that would be mine alone. That means IP is endlessly stealable, too. A thief can steal a copy of the book, copy the text, sell it unlawfully, or make a knock-off TV show, and even if we caught the thief, another criminal could do the very same thing the very next day. So it goes with data: the harms from data breaches and theft ripple far beyond the initial loss.

Worse, when you think of data like oil, it is far simpler to adopt what we call the “dehumanized approach” to data, which decouples data from its human-generated context and treats the human sources of data like oil wells. Contrast that approach with what the GDPR, the CCPA, or Canada’s PIPEDA requires, and you’ll find that dehumanizing data is the short route to regulatory scrutiny. Even if the data in question isn’t directly personal data, the reasoning remains the same, because decoupling data from its context obscures the need for security, transparency, and consistency.

That decontextualization is the central flaw in the “data as oil” and “dehumanized” views of information. Without context, data seems like a self-generated item, a single point of information that appears, without past or creator, simply waiting to be monetized. And sometimes, for practical purposes, it is: plenty of data partnerships allow unfettered access to data flows and derivative rights without cost or condition, an informational smorgasbord. But those are the rare cases. Far more commonly, data has a direct tie to the person or the business that created it; as we’ve noted before, even the word “data” means “something given.” Our data comes from somewhere, and that somewhere is us.

Christopher Guest hates snowclones.


