AI the LAW & YOU Newsletter

Release Date:

May 21, 2024

Episode Transcription:

Shannon Lietz: For companies that are starting to adopt things like AI, and Copilot, and ChatGPT, and Llama, and whatever other LLM is out there, are they evaluating their policies with respect to how data gets used? My perspective is, if you’re going to bring in public data, or you’re going to bring in copyrighted materials, note that, because it could be a concern. It could end up in something that does get flagged for future lawsuits.

Mark Miller: In today’s episode, Joel, Shannon, and I discuss a case where an employee at AWS blew the whistle, saying the company is ignoring its own policies when it comes to the consumption of data for its AI engine. Is being told to ignore company policy illegal? You might be surprised by how our trio comes to grips with this concept.

Joel, you found a case this week. A whistleblower at Amazon has said that Amazon is ignoring its own internal policies and breaking copyright law with what it’s doing with AI. Give us a little foundation on what’s going on.

Joel MacMull: Yeah, sure. This woman, who I think was a senior scientist at Amazon, contends that when she was working on something, and I think this is after she returned from being out on pregnancy leave, she was told by her direct supervisor to overlook certain Amazon copyright policies and, I think the complaint says, applicable laws.

Now, the complaint doesn’t identify specifically what those policies were, or specifically what those laws were that she was asked to essentially ignore. But it does give rise, I think, to an important observation, which is: to what extent, in the race for AI supremacy, are companies, both big and small, perhaps giving short shrift to their own internal policies? Now, again, we don’t know what specifically those policies were. In Amazon’s case, it could be that maybe it has an anti-scraping policy in terms of scraping content; I don’t know what that is.

It could also be, as I think is true of YouTube and most social media companies, that the copyright remains vested in the user. But obviously she was asked to sort of harness data, and that was in violation of one or more policies. Again, I don’t know what that is.

Mark Miller: That is interesting. Is it against the law for a company to ignore its own policies? I say no.

Joel MacMull: No, it’s not. And I think the case law is pretty clear on this, so long as it’s not a violation of what I’ll call an external law. You come across this stuff all the time with Facebook, and I’m constantly fielding emails from potential clients where they say, “Facebook basically deplatformed me,” for what they say is a violation of its policy, and either “I don’t know what it is,” or “that which I did is not a violation of their policy.” And their terms of service are basically a private contract. All of these companies, Instagram, Facebook, whatever, they can deplatform you for any reason or no reason, and you have zero, something approaching zero, recourse in a court of law.

Mark Miller: What it does is it opens up Pandora’s box for the entire industry. The defense from AWS internally, it sounds like, was, “Hey, everybody’s doing it.” Where does that leave the external law, as Joel calls it, when it comes to ignoring these types of laws and policies?

Shannon Lietz: A lot of policy, corporate policy, tends to get set by standards, norms, and best practice. What’s interesting about this case for me is really the policy conversation itself.

When I look at the claims and all of the back and forth, my perspective is: where were the lawyers at Amazon? Generally, when you have a policy concern, you would go to the legal business partner for that concern. You might go to the AI policy owner. And you would have the conversation to make sure that you have clarity if you have concerns about it.

I don’t know if that happened here. I do see somewhere in here that someone stated the person should go against policy, and that’s not common; that would be an interference with that person’s job responsibilities. But I don’t know what happened, because there’s not enough information in the filing to really get a sense of it.

I would say, if it was me and I was going through something like this, I would have a conversation with legal. I would have a conversation with the policy owner to understand what the implications are and to get clarity. So I don’t know if that happened. I also think that if we have management in organizations that are stating, “well, everybody else is doing it,” they also have an obligation to go get the policy corrected if there are concerns coming up from their engineering folks.

There’s something that doesn’t quite smell right with this case, the way I look at it. But if I had takeaways from this, my takeaways are: as a technologist, policy is really helpful. It’s the constraints of the system. And if everyone else at Amazon was following this policy, and it wasn’t being broken across the company,

then I think that there’s, again, something wrong with this situation. And so I’d have a whole bunch of questions related to this from that perspective. If I look across AI policies across most organizations, I am seeing some organizations that are saying, don’t pull copyrighted material into the AI engine, or make sure that there is a license for the data.

I have so many questions in this case. I’ll just say, for me: was the data licensed? I don’t know that an engineer is going to know that, by the way. I’m not even certain it’s their job to go figure that out. I think when I’m building something, yeah, I’ve got lots of questions.

But it’s not always on that particular engineer to solve every question they have towards getting to an outcome. So I don’t know. I don’t know what this person’s role was. I don’t necessarily know all the factors that go into it. Now, if there is a legitimate claim about a policy violation leading to the termination of this employee, that is definitely concerning from my perspective, because we have policies in organizations to create governance, to create structure, to create similarity and dependability within those systems. And so that’s where I land.

Mark Miller: You bring up a really interesting point. Joel, I want you to tackle this. As you implicitly stated, where are the lines of responsibility?

If it’s not the engineer’s responsibility to verify the data, whose responsibility upstream was it to verify?

Joel MacMull: When you say “verify the data,” I think what you’re using that as a synonym for is, basically, having guidelines with respect to the use of the data, right? That’s really what you’re asking?

Mark Miller: More like: how did copyrighted data enter the system, who made the choice to pull in copyrighted data, and whose responsibility is it before it gets to the engineer? Because the engineer, as Shannon says, is not going to know to say that we can’t use this.

Joel MacMull: It may be because I’m a lawyer, but I think in many cases the buck stops with the lawyer.

There’s a reason why, in the wake of, for example, the New York Times litigation against Microsoft and OpenAI, what we’ve seen is a rash of licensing agreements from certain publishers allowing these models to use their product. And that is in direct response, I think, to these copyright litigations.

Now, we don’t know how that case ultimately is going to go, but I certainly think it has legs. And my memory is, wasn’t there a motion to dismiss? No, forgive me, I got that confused with the Sarah Silverman case, where that litigation was trimmed. But in answer to your question, I think it’s legal’s responsibility.

They have to know what’s in the pot.

Mark Miller: That’s where I was going to go with it. Really, as a lawyer, does every piece of content, does everything that’s going into the system, have to be “approved” by a lawyer? And I just air quoted that.

Joel MacMull: I think there’s got to be guidance.

And that may be, we’re making an educated guess here, but that may be what the internal policy was: that conceivably, if the works are not licensed, maybe we’re not going to scrape them, because we don’t want to risk a litigation like the one I just described. Then it becomes the lawyer’s responsibility to say, okay, these are the licensed publications we can use.

Mark Miller: I can see that going forward. I don’t see how you do that in retrospect.

Joel MacMull: Oh, oh, I agree with you. I don’t think it can be done in retrospect for reasons we’ve discussed, because once it’s in the pot, it’s in the pot, and there’s no way to remove it from the pot, by all accounts.

Mark Miller: What we’re looking at, too, is, the Open Policy Group is working on this stuff with the government. Is there something that the government can do to say, here’s the playing field, and here’s how you’re going to be judged by what you’re pulling in? Is that the government’s responsibility?

If we’ve got, let’s just for a round number say, five major AI companies, shouldn’t they all be playing by the same rules as far as copyright, data protection, everything that’s involved?

We’re going to the macro level now.

Shannon Lietz: Oh, I see where you’re going. But I think that’s what we’ve been talking about, what I’ve been talking about for probably the length of our podcast so far: I really do think that we’re missing some of the constraints that help people rationalize some of these questions.

I think this person was working on Alexa; they had a significant layoff in November, as an example. So something’s clearly not quite going right with that product, if you ask me. And I see that as an aspect. But if I look across the industry and think about what the government could do for this, there’s plenty that the government can do.

Is copyright the right vehicle for what we’re talking about? I don’t know the answer to that. Training data sets: how does that work? How do creators and makers get compensated for this new capability that’s out there? Folks are saying that copyright isn’t the right protection, potentially.

And so I think there’s a lot within the mechanics of this. And I think this is where, when you’re inside of an organization and the top-down tone is “get it all done,” I can understand why there’s friction in the engineering layer about all of these issues, because engineers want to do a great job.

They don’t want to do anything wrong. They’re not trying to search out and destroy. They’re looking for what the policies of the organization are, how to do the right thing within the constructs of what they have, because they need those constraints to make creative decisions. And if it was as easy as, hey, I’m working inside of Amazon and I want to go scrape and pull in any data I want to, then I think there’d be even more challenges in the world.

So, my perspective: is the government doing a good job? No, I don’t think they’re doing a great job yet on AI. I think there’s a lot to be had. When I look at the industry norms, I still see behaviors I wouldn’t have expected from amazing large technology companies. I would never have expected to see, you know, toddler brawls in the basement with blow-up bats.

At this level, I really think we just haven’t seen the leadership we need in AI yet across the industry.

Mark Miller: I haven’t seen toddler brawls in the basement.

Shannon Lietz: I’m seeing toddler brawls in the basement right now. We’ve got a whole bunch of top-level leaders trying to figure out how to slug it out to get to the front of the line, to get their dollars and cents to add to their revenue. Super revenue-driven, and the industry is, I think, in need of figuring out what the right things to do are, what’s ethical.

All of the questions that you would see a mature adult human being try to rationalize and get to some semblance of understanding. And the fact that we’re seeing this kind of case come up in the first place tells me that these organizations aren’t solving these problems; their engineers need to have constraints.

Mark Miller: I can’t assume that large companies like this are going to play to ethical standards. They’re not going to hold themselves accountable. That’s where the government comes in, and where lawyers…

Shannon Lietz: That’s actually where our dollars and cents come in. Honestly, if you don’t think that a business is being ethical, you can pull back your dollars and cents. Not that I’m seeing a whole lot of people doing that, but that is actually a way to force ethics into the equation: folks stop buying it.

Joel MacMull: I agree with you, but from a user’s perspective, I don’t take the view that even if these big five companies are violating copyright, that is necessarily going to engender a kind of disengagement from users.

Shannon Lietz: I don’t think it will either. I’m actually saying people are using this stuff like it’s hot. And it is hot, let’s be honest. You wake up in the morning, you get to work with a Copilot or an AI or a ChatGPT, and most people are reinvigorated at this point. But you’re seeing a whole lot of creators stomped on by technology and AI.

And honestly, Joel, going back to some of the things you’ve said about how the market will solve this problem, I actually lean into your “the market will solve this problem,” but with hopes that these top-level leaders will actually act as leaders, because it’s hard to call them leaders when we see these kinds of behaviors.

Mark Miller: They’re leaders of their shareholders.

Shannon Lietz: I understand, but honestly, if you’re going to try and create a market and you want to do the right things: the companies that have done the right things in the past have been rewarded for it. The ones that haven’t quite done the right things, they haven’t.

Mark Miller: Who’s done the right thing?

Shannon Lietz: Yeah, in the technology space, that’s interesting.

Joel MacMull: I was thinking the same thing.

Shannon Lietz: I’ll say privacy and security. I’m actually an Apple fan. I am, I’ll be out there. Think about it: they’ve taken all the slugs. There’s absolutely some friction between how they run security and privacy for their end users versus some of the monopolistic, again, toddler brawls that are going on behind the closed doors of legislative policy.

There’s a lot of stuff in the mix that I don’t think the average user gets to see.

Joel MacMull: I agree with you that Apple’s security and privacy may be top notch, but I don’t believe that’s why they’re a market leader. They’re a market leader because they design sexy devices that are intuitive.

Shannon Lietz: I’m in agreement they do sexy devices. But I do not think they would actually be a market leader without their privacy and security stance. I think there are people buying their products because they’re ahead in that area.

Joel MacMull: I think that’s a fraction of those people. People are buying it because they believe their technology is sexy and because it’s superior.

We can disagree. And evidently we do. And there’s plenty of people who buy Androids because they’re sexy.

Shannon Lietz: But I can tell you, when I talk to people about why they have an Apple device, most of the time, even among end users that are not technology-savvy, they’re like, “they care about my privacy.”

Mark Miller: I’ve never heard that, but I’ll throw that out on this one. So what it comes down to now is, and I like the word you’re using, Shannon, constraints. My question is: who applies the constraints? The lawyers internally, the government externally, or a combination of both?

Joel MacMull: I was just going to say, I don’t think it’s mutually exclusive.

I think there is a role for lawyers to play internally in terms of providing good governance to their client. I think that remains, and always will remain, an important function. You talk about whose responsibility it is? I’m not sure it’s necessarily the government’s responsibility.

But I do think the government is best positioned, to use your phrase, to provide a level playing field. And that, of course, is why copyright law is in such flux when it comes to these models. Because if copyright law in its current iteration doesn’t change to accommodate this new technology, then you’re going to have a tremendous amount of liability and, as a consequence, I think, the stagnation of development as it relates to these models, because it means the New York Timeses and the Sarah Silvermans and the John Grishams of the world are right.

And there is rampant copyright infringement.

Mark Miller: As we’re rounding out the discussion here: does this have the potential to be a foundational case that calls out the industry? Or is this such a small detail internally at AWS that it’s just going to disappear?

Joel MacMull: I think it provides maybe a talking point in a broader discussion, but this case is not going to be a bellwether in terms of moving the needle any with respect to Amazon, or anyone similarly positioned, and their failure to adhere to their own internal policies. And I say that in part because this case, as we’ve all understood, is not about copyright infringement. It’s a larger retaliation case, and the copyright issue, what the woman was instructed to do vis-à-vis internal policies, is tangential at best.

Shannon Lietz: I still think it’s an interesting part of the case, and probably the thing that fascinated me most was the conversation around AI policies existing at all. And we’re talking about the copyright policy, which, by the way, I think is probably stuck in an acceptable use policy. I don’t think they have a copyright policy.

I think they have an acceptable use policy. And I think what’s really fascinating about it is: when AI got big in the last year and a half, two years, did companies look at their acceptable use policies? Are there other concerns brewing out there related to this type of activity and behavior?

And then also, for companies that are starting to adopt things like AI, and Copilot, and ChatGPT, and Llama, and whatever other LLM is out there, are they evaluating their policies with respect to how data gets used? Because I’ve looked at and built and worked on so many data security policies, data handling policies, data classification policies.

And one of the things that I’ve seen in the last few months has been companies really looking at and trying to re-evaluate their data handling policies. And that includes bringing data from the outside into your organization to be leveraged as part of one of these experiences built into your daily productivity tools.

My perspective is, if you’re going to bring in public data, or you’re going to bring in copyrighted materials, note that, because it could be a concern. It could end up in something that does get flagged for future lawsuits. So I guess data traceability is a concern in my mind.
