Putting a Value on Derivative Data

You may have heard from us (once or twice) that the phrase “data is the new oil” is not a particular favorite.  Ok, more bluntly: we hate it.  Not only does the saying ignore the reality of what data is and how it is created, it also mischaracterizes the nature of personal data collection, how data can and should be deployed, and what needs to be done to protect it.  On top of that, oil, as a non-renewable physical commodity, is about the least analogous substance we can imagine as a comparator for personal data.  It’s like saying personal data is the new butterscotch sauce: yes, data and butterscotch sauce are both things, and yes, you end up paying for both of them one way or another.  That does not make it a good analogy.  So everyone needs to just stop saying that.  Stop it.  Seriously.  /end rant/

Loki doesn’t like it either.

Okay, calming down a little, the question is really, “why do people say that data is the new oil?”  Because they’re trying to explain that data, as an asset, has value that can be extracted, but often it must be refined first.  (Hey, just like oil!)  The central thrust of the saying is that data is valuable, and we just have to figure out what that value is.  There have been many efforts in this regard.  Some rely on cash flow analysis; others use Forrester’s Total Economic Impact tool; still others embrace a specialized modification of the typical asset valuation models from other sectors.  There is some benefit to all of these approaches, to be sure, and we routinely incorporate discounted cash flow models into our valuation approach.  But the problem remains that most businesses use valuation models that still treat data like oil, and that’s precisely what you don’t want to do.

“My BV model used the GPCM, but forgot to factor in LEAPS and now I’m SOL!”

Our approach is a little different, because it not only incorporates traditional models and tools, but also specifically data-oriented analytics and valuation methods.  To be clear, there is simply no perfect way to capture the value of a data asset because data assets always exist in relation to other data, proposed use cases, and the regulatory environment: you can’t pin down a guaranteed value under the circumstances.  That’s largely why our book Data Leverage suggests that you place data assets into one of four buckets ($0, $10,000, $100,000, and $1,000,000) when conducting your internal analysis of your data.

Those buckets are helpful, but how exactly do you decide what dataset goes into which?  As mentioned, our approach is a multidisciplinary one, but there are a number of factors we use to help clients reach valuation decisions.  We’ll mention one of the factors we consider here, not only because it’s helpful to think about just as a general matter, but also because it is frequently overlooked: derivative uses of data and data flows.

Derivative Datasets

When you create a new dataset from related data sources, to serve a new purpose or create new value, you have created a derivative-use dataset (or just a “derivative dataset”). This process takes any source data, whether yours or a third party’s, and enhances it to serve a new purpose or to add value by combining, performing calculations on, reducing, expanding, or in some other way altering the data.

When building data partnerships, many firms forget that the most important task in creating derivative works is to ensure that your company has the legal rights to the derived data. For example, imagine that you want to analyze the average dietary intake of pasta by Americans on a monthly basis.  You could go to Yelp, TripAdvisor, and other review platforms to purchase, or create a partnership for, access to a feed of all of their review data about restaurants across the desired geography. From here, using a simple natural language processing (NLP) approach, you could search for terms for pasta in any form (spaghetti, ziti, and mac & cheese, for example) and extract from the millions of textual reviews of restaurants a compilation of the interest in or mentions of these different pasta types. You could also time-stamp these mentions with the date of the review to build a dynamic time-series view of all pasta types, their popularity, and their growth or decline in the United States.
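
To make the pasta example concrete, here is a minimal sketch of what that “simple NLP approach” might look like. Everything in it is an assumption made for illustration: the review format (a `date` and a `text` field), the three-term vocabulary, and the function name `monthly_pasta_mentions` are all hypothetical, and plain keyword matching stands in for whatever a real project would use.

```python
import re
from collections import Counter

# Hypothetical vocabulary: each pasta type maps to a pattern that tolerates
# common spelling variants (e.g. "mac&cheese", "mac and cheese").
PATTERNS = {
    "spaghetti": re.compile(r"\bspaghetti\b", re.IGNORECASE),
    "ziti": re.compile(r"\bziti\b", re.IGNORECASE),
    "mac & cheese": re.compile(r"\bmac\s*(?:&|and|n)\s*cheese\b", re.IGNORECASE),
}


def monthly_pasta_mentions(reviews):
    """Count reviews mentioning each pasta type, keyed by (YYYY-MM, pasta)."""
    counts = Counter()
    for review in reviews:
        month = review["date"][:7]  # assumes ISO dates such as "2019-03-14"
        for pasta, pattern in PATTERNS.items():
            if pattern.search(review["text"]):
                counts[(month, pasta)] += 1
    return counts


# Made-up sample reviews, not real platform data.
sample_reviews = [
    {"date": "2019-03-14", "text": "Best spaghetti in town."},
    {"date": "2019-03-20", "text": "The mac&cheese was fine, the ziti was better."},
    {"date": "2019-04-02", "text": "Great burgers, no complaints."},
]

counts = monthly_pasta_mentions(sample_reviews)
for (month, pasta), n in sorted(counts.items()):
    print(month, pasta, n)
```

The resulting counts, keyed by month, are the time series. They are also exactly the kind of derived data whose ownership depends on the terms under which you obtained the reviews.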

Statistically speaking, you’re probably eating pasta right now.

Unfortunately, everything you just did to create the dataset for pasta tracking in the United States is a derivative work based on the reviews dataset, which is wholly the property of the source from which you got it.  Some data partners will allow you to create and even sell derivative works, but only so long as you have a current data partnership with their firm as a source of the data. This means that once the contract terminates or expires and your partnership ends, those data partners expect you to remove their data from all of your derivative works. This can prove to be exceedingly difficult, especially if you are combining, as in this example, review data from several platforms.  And it means you’d need to find a new pasta source.
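
One way to keep that removal tractable is to record where each derived record came from at the moment you create it. The sketch below is only an illustration under that assumption: the `Mention` record, the `source` field, and the partner names are hypothetical, and it only works if provenance is kept at the record level before anything is aggregated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    month: str   # e.g. "2019-03"
    pasta: str   # e.g. "ziti"
    source: str  # hypothetical label for the partner feed the review came from


def remove_source(mentions, terminated_source):
    """Return only the mentions that did not come from the terminated partner."""
    return [m for m in mentions if m.source != terminated_source]


# Made-up records from two hypothetical partners.
mentions = [
    Mention("2019-03", "spaghetti", "partner_a"),
    Mention("2019-03", "ziti", "partner_b"),
    Mention("2019-04", "ziti", "partner_a"),
]

remaining = remove_source(mentions, "partner_a")
print(remaining)
```

If the sources had been blended into monthly totals without that tag, there would be nothing to filter on, which is why untangling several platforms after the fact is so hard.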

I couldn’t resist.

While negotiating derivative data rights may take considerable effort, there are definite advantages to doing it up front. Unfortunately, many companies, especially in their infancy, skip this process and begin building their datasets without the necessary rights to the data. Case in point: Yelp routinely fights companies that, through data crawlers and scrapers, have been copying reviews and content from Yelp in violation of Yelp’s terms and conditions. These companies then sell this data or tools around the data, essentially creating derivative works and services based on data they don’t own. These types of legal battles are fierce and costly: discovery costs alone frequently run into the hundreds of thousands of dollars, even in a small case.

Don’t even ask what Apple v. Samsung cost.

Don’t Spoil for a Fight

Disregarding partnerships in favor of solo scraping is a very short-sighted approach by smaller companies that are building out the data aspect of their business. Forward-thinking companies are quite happy to allow smaller data companies and start-ups access to their data assets with all appropriate rights, and the reason for this is simple: for large organizations, there is only so much time to find new value in their own data. Executives who recognize the opportunity to outsource value-seeking research at no cost are, rightly, open to allowing smaller companies to identify, extract, and help monetize derivative works. By doing so, the larger organization ensures that their data, which is the source of the new valuable dataset, increases in its attractiveness and usefulness to the market.

IBM, Yelp, Google, Thomson Reuters, First Data, and Equifax are just some of the large companies that have specifically set up programs to foster the use of their data by smaller companies to create valuable new derivatives. Naturally, there are some restrictions, usually including the data supplier’s right to acquire the company creating the derivative data business, or a right of first refusal regarding such acquisitions. But for many companies building derivative data products, these constraints are not necessarily showstoppers. It may sometimes be easier to ask forgiveness than to get permission, but that’s a poor strategy that, ultimately, makes you seem like a selfish (and therefore unattractive) data partner.

We call it “Veruca Salt Syndrome.”

Coming back to valuation, it’s clearer now why derivative datasets and your rights to them are meaningful.  Imagine trying to value the pasta derivative data asset without knowing whether your rights could terminate at any moment: it would be an exercise in pure speculation, to say nothing of the potential for future costs incurred in the midst of a lawsuit of the sort that Yelp brings.  In the same way, think about what it would mean to assign value to a dataset once you have 1) locked down a license agreement, 2) secured exclusive rights to the derivative products you create, and 3) identified how you intend to use the derivative asset in conjunction with other derivative products.  Obviously, the valuation is not only going to be higher, it’ll be clearer, more reliable, and replicable in the future.
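
To see why uncertain rights drag a valuation down, consider a rough discounted cash flow sketch. This is not our valuation method, just an illustration under loud assumptions: the cash flow, the discount rate, and the yearly chance of losing your rights are all made-up numbers, and the function `expected_present_value` is hypothetical.

```python
def expected_present_value(annual_cash_flow, years, discount_rate,
                           annual_termination_prob):
    """Discounted cash flow where each year's cash arrives only if the
    data rights have survived every year up to and including that one."""
    value = 0.0
    survival = 1.0  # probability the rights are still in force
    for year in range(1, years + 1):
        survival *= 1.0 - annual_termination_prob
        value += survival * annual_cash_flow / (1.0 + discount_rate) ** year
    return value


# Hypothetical inputs: $100,000 per year for five years, 10% discount rate.
secured = expected_present_value(100_000, 5, 0.10, annual_termination_prob=0.0)
at_risk = expected_present_value(100_000, 5, 0.10, annual_termination_prob=0.30)

print(f"Rights locked down: ${secured:,.0f}")
print(f"Rights at risk:     ${at_risk:,.0f}")
```

The harder problem in practice is that, without a license agreement, nobody knows what termination probability to plug in, which is where the “pure speculation” comes from.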

As we said at the outset, valuing data assets is a complicated process, and it requires real expertise to do with any degree of confidence.  When we value assets with clients, it’s a collaborative, multidisciplinary effort.  But the results, when factoring in the relevant details, are always worth the effort, particularly when it means identifying new use cases or potential new partners.  It’s also an essential part of due diligence with partners, vendors, and potential acquirers — if you can meaningfully explain what your data is worth, it immediately becomes more saleable, which is exactly what you want.  Recognizing how to protect and value derivative data is an essential component of this process, and giving the due amount of attention to securing future rights to constituent data is a key to making sure the well doesn’t run dry.

Slick oil pun, eh?
