E20: Artificial Intelligence vs Data Minimization & GDPR (Podcast)

The opportunities for Artificial Intelligence to transform humanity are enormous. We are seriously excited. However, there are issues with amassing the amount of data necessary for these machine-learning-based solutions. Becoming "intelligent" (whether artificially or not) requires immense data and knowledge, and the ability to recall that knowledge. Data Minimization, as a concept, has some very particular issues with the "recall" of data and knowledge, and could present a serious impediment to AI development. How will AI evolve while still protecting the privacy and rights of citizens?



Jay: “Are you DataSmart?” A weekly podcast on data security, information management, and all things related to the data you have, how to protect it and maximize its value. I’m Jay Ward.

Christian: And I’m Christian Ward. Jay, today I wanna really tackle a fun, scary, beautiful, glorious, all-those-things topic, which is artificial intelligence, and how it relates to data privacy. And to some degree, well, we can rephrase it, you know, AI versus Data Minimization or AI versus Article 22 of the GDPR. There’s a lot of ways to think about this. But in my world, AI represents so many great opportunities to improve people’s lives. Now, there was an article earlier this week about how AI is being used to track how cancer develops in patients and how it spreads and, you know, you’re talking about just such personal private data. But at the same time, you’re talking about such a greater potential good here by leveraging this type of technology.

At the same time, there was a great video that showed a new robotic arm that's been enabled with AI to find Waldo in the "Where's Waldo?" series in half a second and actually point to him. It's kind of funny. It's interesting to watch. That may not be the world-saving, cancer-stopping use of this data or this technology that one envisions, but it shows that with all new technologies, there's the capability to apply them in almost any creative way possible. And when we think about that, we have to compare it with data privacy laws and regulations. That's an area the GDPR really does contemplate, but we haven't yet seen it play out in many real-world situations.

Jay: I think that's because the GDPR is, despite being a mammoth piece of legislation, fairly forward-looking in this area. I mean, it contemplates the development and growth of AI, and of machine learning as a larger component of our day-to-day lives than it is right now. You know, there's going to come a point in the not-too-distant future where AI is ubiquitous and it's just a part of everything that we do. It isn't there now, but when it is, you know, it'll be interesting to see how we counterbalance the benefits it provides with the very real risks to personal security and privacy. And I think that's the aim of the GDPR generally. I mean, Article 22 in particular, but it's in the recitals as well, Recital 71, about sort of peeking behind the curtain and getting an explanation of what is happening. In any event, the GDPR is really the only law I know of that talks about and tries to grapple with AI in a meaningful way.

Christian: Yeah. So, Article 22, just to remind everyone, we've talked about it a little bit in the past. The title is "Automated individual decision-making, including profiling." And it states, "The data subject," meaning the human, the human being, you, myself…"The data subject shall have the right not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her." It's very broad, but what it's trying to protect against is this concept of: who gets credit card offers? Who gets approved for loans and mortgages? Who gets allowed to enter certain events versus not? Who gets pulled out of line at security? There's a mammoth number of ways we could be applying these types of technologies. But the article gives the data subject the right not to have the outcome determined solely by an automated process, without some sort of recourse.

What's fascinating about it is when you think of how there are five tribes of thought in machine learning, and in how machine learning translates into artificial intelligence. One of them is the genetic view, the biological view, of allowing data sets to mesh and merge so that only the fittest survive. These genetic algorithms take data and combine it in various ways to hit a fitness function, a goal. Now, when you're doing that, you could delete the data of the candidate subjects that don't survive on toward the goal, though I don't think that's really what's happening. And generally, the concern is: if I'm running a massive study of who I will give a mortgage to, based on an incredible AI layer that uses that type of machine learning, how do I keep that data? Or if I do keep it, how do I make it available to the end subject to explain, "Hey, this is why you were denied a mortgage by my company"? And one of the real concerns I have is that we're talking about data and decision processes at such a scale that it's getting harder and harder for humans to understand how the AI arrived at the decision it did, and to step through the decision process. That's a real concern.
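[Editor's note: the genetic view Christian describes can be sketched in a few lines. This is a toy illustration only, evolving bitstrings toward a made-up fitness function; the population, weights, and goal are all invented, not anyone's real lending model. The point it shows is his concern: each generation, the less-fit candidates are simply discarded, so the path to the final answer is not retained.]

```python
import random

def fitness(candidate):
    # Toy fitness function: count of 1-bits (a stand-in for "how well
    # this candidate rule set meets the goal")
    return sum(candidate)

def evolve(pop_size=20, genome_len=16, generations=50, seed=42):
    rng = random.Random(seed)
    # Random initial population of bitstring "candidates"
    pop = [[rng.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: only the fittest half survives; the rest are
        # discarded outright, which is the "deleted subjects" problem
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]
        # Crossover plus occasional mutation refills the population
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, genome_len)
            child = a[:cut] + b[cut:]
            if rng.random() < 0.1:  # bit-flip mutation
                i = rng.randrange(genome_len)
                child[i] ^= 1
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
print(fitness(best))
```

Nothing in the loop records *why* the surviving candidate won, only that it did, which is exactly the explainability gap being discussed.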

Jay: Yeah, the AI doesn't have to show its work, in a sense, right? Because it'll eventually learn to analyze these issues in predictive ways that diverge from the way we would have approached the subject. I do wanna carve out an important distinction here. In the U.S., the Fair Credit Reporting Act has for a very long time required that people get insight into credit decisions made about them. And although those decisions are made by entering data into a program, and the program calculates risk and comes up with the decision, that's not the same thing we're talking about.

Christian: No. In fact, that’s a linear process.

Jay: That is a linear process, right? So that's, you know, I've put in these following inputs, and then the program has reached, you know, this likelihood-of-risk score, and then you're denied. When that's done, there's a list of factors that you'll be given. Some of them are probably correct, and some of them are incorrect. But you're given a list of factors, and that's it. Don't think of AI, and don't think of the disclosure requirements set in the GDPR, in the way we think about the Fair Credit Reporting Act. That's a law from the 1970s; AI was just a matter of science fiction then. What we're talking about now is truly independent thinking by these algorithms, reaching a decision that, by law now under GDPR Articles 21 and 22, a human being has to be able to provide some insight into. Now, the GDPR doesn't go so far as to say, "Thou shalt provide an explanation of what happens."
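[Editor's note: the linear, FCRA-style process Jay contrasts with AI looks roughly like this. The factor names, weights, and applicant values here are entirely made up for illustration; real scorecards are far more elaborate, but the structure is the same: every factor's contribution is visible and additive, so reason codes fall straight out of the arithmetic.]

```python
# Hypothetical scorecard weights, purely illustrative
WEIGHTS = {"income": 0.4, "years_of_history": 0.3, "missed_payments": -0.5}

def linear_score(applicant):
    """Linear scorecard: each factor's contribution is explicit."""
    contributions = {k: WEIGHTS[k] * applicant[k] for k in WEIGHTS}
    score = sum(contributions.values())
    # FCRA-style "reason codes": the factors that hurt the score most
    reasons = sorted(contributions, key=contributions.get)[:2]
    return score, reasons

applicant = {"income": 52.0, "years_of_history": 6, "missed_payments": 3}
score, reasons = linear_score(applicant)
print(round(score, 1), reasons)  # 21.1 ['missed_payments', 'years_of_history']
```

Because the model is a weighted sum, "why was I denied?" has a mechanical answer. A learned model with millions of interacting parameters offers no such decomposition, which is the gap the GDPR discussion is about.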

Christian: Which I’ve seen, there are companies that are very concerned about that, because the amount of money they put into developing these tools…they don’t wanna walk in and show every competitor the exact thought process that they built into the system. Quite frankly, that’s really not fair to them. But there has to be some level of transparency.

Jay: Right. And so the GDPR is a little bit unclear about this, because the recitals in the GDPR are sort of the explanation bit at the front, where they say: this is what we're out to do, this is the important stuff. The recitals are really important. If you haven't read them, it's hard to really understand the GDPR, so you need to look at the recitals. But they're not binding. So Recital 71 says that individual data subjects have the right to an understanding, and potentially even an explanation, of how these decisions were made. But that's not what Article 21 or 22 says. Those articles mean data subjects get the right to talk to a human about the decision, not to demand the full technical details, and potentially to object to it, to say, "I don't wanna be processed in an automated fashion." So there is protection for, you know, the trade secrets or the intellectual property of the company using these algorithms.

But that doesn't mean there isn't still a real tension between the ability to object to automated processing and the operations of businesses that are gonna increasingly rely on it. And there are tensions elsewhere, too. I mean, you were just talking about deleting the data sets of the non-fittest subjects, where only those who make it to the next level get to proceed. Well, what about minimizing the length of time that we keep the data? If you input data into a machine learning platform or an AI platform, in many instances it processes the data by incorporating each datum and factoring it into its thinking. It learns by accumulating and processing massive amounts of data. And the more data it gets, the more identification points it can achieve.

Christian: And quite frankly, it's not just more data. It's not necessarily the breadth, it's the length of time. Time has always been one of the key relational points to build upon and weight upon. Time is really the human frame of reference: it's how we also explain our own decisions. So it really is critical, but data minimization, again…

Jay: How long are you keeping this data for? Even if you create a system where the data are entered into a database and we humans can no longer access them, the data are still incorporated into the AI's algorithm, processed and understood, and we can't touch them anymore. That doesn't mean the data are not still being processed. They're being processed in a way we can't access, which in some respects is worse, more dangerous, right? We can't give the data subject access to it anymore. We can't heed a request for a halt to the processing anymore. So how do you create systems that safeguard data subject rights while at the same time allowing for the way pretty much all AI operates at this point? There is no GDPR-compliant approach to AI, because AI wasn't developed with the GDPR in mind.

Christian: Yeah. You know, we've talked about how I was very impressed by the concept of Article 22, because, like you said, it is forward-looking, but it all comes down to how it's ultimately interpreted. The ability to tell someone, "Hey, here's why you were denied" might be enough for them to understand, but not necessarily enough to fully unpack and explain. But then what starts to happen is you take those results and someone has to check: okay, these people were denied, and here's the explanation they were given… Because the only way to identify some sort of programmatic bias, or the, you know, mistreatment of citizens, is to look across a larger set of the data, which is in some ways completely antithetical to data minimization. Because if I can't store and analyze that data, even a well-meaning company, which I believe most are, can't understand its own biases, the ones coming from, perhaps, a certain data set that's included in the AI. I always flash back to Microsoft's Tay AI chatbot, which became racist after just 24 hours of exposure to a Twitter feed. We know that content is there, and it completely biased and destroyed this thing. And that's gonna happen a lot, because we're always gonna be experimenting with new data sets.
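[Editor's note: the audit Christian describes, looking across a larger set of decisions to spot programmatic bias, presupposes a retained decision log. The sketch below uses an invented log and invented group labels; it is not any real audit methodology, just the minimal shape of one. The tension with data minimization is visible immediately: delete the log and the disparity becomes uncomputable.]

```python
from collections import defaultdict

# Hypothetical retained decision log; auditing it is only possible
# because these per-decision records were kept
decisions = [
    {"group": "A", "denied": True},  {"group": "A", "denied": False},
    {"group": "A", "denied": False}, {"group": "A", "denied": False},
    {"group": "B", "denied": True},  {"group": "B", "denied": True},
    {"group": "B", "denied": True},  {"group": "B", "denied": False},
]

def denial_rates(log):
    """Denial rate per group: a first-pass disparate-impact check."""
    totals, denials = defaultdict(int), defaultdict(int)
    for rec in log:
        totals[rec["group"]] += 1
        denials[rec["group"]] += rec["denied"]
    return {g: denials[g] / totals[g] for g in totals}

rates = denial_rates(decisions)
print(rates)  # {'A': 0.25, 'B': 0.75}: group B denied three times as often
```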

But we're limited in the amount of time we can have the data, and limited in the amount of time it can be used… Like, for example, there was another article about how Google had partnered up with a couple of hospitals to analyze mortality rates of inpatients in the NHS system. And I think they found that, using DeepMind's AI on the data, Google could predict with 93% to 95% accuracy in each hospital, respectively, which patients were not gonna make it. And that's kind of amazing, because if we're talking about that level, right? Literally 9.5 times out of 10, they know this person is really in danger compared to another. I'm not saying the staff doesn't understand that, I'm sure they do, but to get to that level there's clearly an understanding in the data. Now, the outrage was that this showed Google had access to the health records of 1.6 million people in the NHS. This is, again, the whole balance: you're not gonna have outrage without knowledge, and the outrage has to be weighed against the greater good. AI presents so many great opportunities, but they're gonna have to be balanced against that.

I'm really hoping that as companies recognize the opportunities in AI, they're at the same time recognizing the need to come up with creative solutions, maybe even new businesses, quite frankly: lockbox storage concepts that allow for the ongoing storage of the data that went into decisions made five years ago. Because otherwise this is gonna backfire. If we minimize the data, then we don't have the information to go back, retrace, and understand the mistakes that were made. And that's concerning, particularly with any decision that's made against or for a person.

Jay: I think that’s absolutely right. And, you know, Google does seem to get something of a free pass about stuff like this.

Christian: Sometimes.

Jay: You know, there's a story that broke yesterday that Google had a secret deal with MasterCard, where they were tracking offline sales habits, and neither one of them disclosed it, even though it's post-GDPR. But you're not Google. And so if you're doing stuff like that, you're gonna have a bad time. This is not gonna be good. So what you have to think about is how to balance the benefits of using AI in-house against the potential risks for you. And I think that balancing test is really what the GDPR is all about.

Christian: Yeah. Well, we'll have to see. You know, I don't know of any business that we work with today that isn't excited about and focused on bringing machine learning and AI into their business. There's a new book called "Unscaled," and it's all about how the last, you know, half decade was all about how do I scale my operation. It was all about efficiencies, so that I could provide the same product to more and more people with the same level of service and ensure a consistent product. AI and machine learning are flipping that on its head, because they can personalize things to such an incredible level. It's literally unscaling everything down to: what do I wanna read today? Where do I wanna go today? Which hotel would I like to stay in? Which would I not?

That type of understanding of a human being, by a business or corporation that has enough data to make those decisions, is really a great opportunity. And look, in terms of healthcare alone, the opportunity is so life-saving it's worth pursuing. But people have to understand, and regulators have to understand, that this requires enormous amounts of data. Not because you'll always have to keep it, but you do need to test and learn with it, and you may have to jettison it because you can't keep it… And, by the way, this whole idea of "Oh, it could be anonymized"? It really can't in these cases. This is specifically and purposely [crosstalk 00:15:56].

Jay: The point is that it’s not.

Christian: The point is to make sure it only tracks to you. But once again, I see business opportunities there. Is there a business that allows people to store all of their personal data? With a key, they can grant a company access to run its models, back-testing how they would have fared, and get a report: "Here's what we would have done for you as your travel-agent AI for the last six years, and here are the savings you would have had versus the purchases you made. Would you like to move forward?" If you do, you continue to consent and grant access to your data, but it's a personal lockbox. There is some version of this that has to happen, yes.

Jay: There are ways, but the challenge is that the GDPR operates now. If you have any automated decision-making process, it needs to be explainable now. So if you wanna know whether you're ready to deal with that, go to the people who manage the automated systems for your company and say, "Can you explain to me how this decision was made?" And if the first time they explain it, it does not make perfect sense, you probably have some work to do.

Christian: Yeah. Well, look, I think that's the balance here. A lot of times, when we talk about data as lifeblood, or as currency, or as, you know, where the money really is, regulation oftentimes follows where the money goes; we've mentioned this before. In this case, regulation is trying to get ahead of it. It is forward-thinking, and I think it's very bright. The problem is, it really is that tricky, because we really don't know what's gonna happen with it. This is another situation where, as the findings occur, they will guide companies on how they must prepare. But the benefits of artificial intelligence and machine learning…it is the next, you know, wave for humanity, if we can get this right.

I'm glad to see that we're stopping, at least pausing for a moment, to say that the ethics that apply to this from a legal standpoint have to be thought about before we continue down this path, because this path is almost impossible to unpack if we don't think about it upfront. Systems are many times built with the best of intentions, but in these cases, we won't even know how we got to the decisions we made. So it's exciting that they're thinking about it. It'll be very interesting to see how it develops. And ultimately, you know, if we have enough data to identify who Waldo is in the picture, then we've gotta figure out how to explain to Waldo that we could find him anywhere in the world.

Jay: I mean, we were promised hover boards and rocket cars, when we get…when we find Waldo. I’m disappointed in the future.

Christian: Yes, I’m sorry to hear that. Well, thank you everyone, for listening to this episode of “Are you DataSmart?” We’ll see you next week. Thanks again.
