The RIAA’s lawsuit against generative music startups will be the bloodbath AI needs

June 25, 2024 ndowd

Like many AI companies, Udio and Suno relied on large-scale theft to create their generative AI models. This they have as much as admitted, even before the music industry’s new lawsuits against them have gone before a judge. If the case goes before a jury, the trial could be both a damaging exposé and a highly useful precedent for similarly unethical AI companies facing certain legal peril.

The lawsuits were filed Monday with great fanfare by the Recording Industry Association of America, putting us all in the uncomfortable position of rooting for the RIAA, which for decades has been the bogeyman of digital media. I myself have received nastygrams from them! The case is simply that clear.

The gist of the two lawsuits, which are extremely similar in content, is that Suno and Udio (strictly speaking, Uncharted Labs doing business as Udio) indiscriminately pillaged more or less the entire history of recorded music to form datasets, which they then used to train a music-generating AI.

And here let us quickly note that these AIs don’t “generate” so much as match the user’s prompt to patterns from their training data and then attempt to complete that pattern. In a way, all these models do is perform covers or mashups of the songs they ingested.

That Suno and Udio did ingest said data is, for all intents and purposes (including legal ones), unquestionably the case. The companies’ leadership and investors have been unwisely loose-lipped about the copyright challenges of the space.

They have admitted that the only way to create a good music generation model is to ingest a large amount of high-quality music, much of which will be copyrighted. It is very simply a necessary step for creating machine learning models of this type.

Then they admitted that they did so without the copyright owners’ permission. Investor Antonio Rodriguez told Rolling Stone’s Brian Hiatt just a few months ago:

Honestly, if we had deals with labels when this company got started, I probably wouldn’t have invested in it. I think that they needed to make this product without the constraints.

Tell me you stole a century of music without telling me you stole a century of music, got it. To be clear, by “constraints,” he is referring to copyright law.

Last, the companies told the RIAA’s lawyers that they believe swiping all this media falls under fair use doctrine — which fundamentally only comes into play in unauthorized use of a work. Now, fair use is admittedly a complex and hazy concept in idea and execution. But a company with $100 million in its pockets stealing every song ever made so it can replicate them in great detail and sell the results: I’m not a lawyer, but that does appear to stray somewhat outside the intended safe harbor of, say, a seventh-grader using a Pearl Jam song in the background of their video on global warming.

To be blunt, it looks like these companies’ goose is cooked. They clearly hoped that they could take a page from OpenAI’s playbook, secretly using copyrighted works, then using evasive language and misdirection to stall their less deep-pocketed critics, like authors and journalists. If by the time the AI companies’ skulduggery is revealed, they’re the only option for distribution, it no longer matters.

In other words: Deny, deflect, delay. Ideally you can spin it out until the tables turn and you make deals with your critics — for LLMs, it’s news outlets and the like, and in this case it would be record labels, which the music generators clearly hoped to eventually come to from a position of power. “Sure, we stole your stuff, but now it’s a big business; wouldn’t you rather play with us than against us?” It’s a common strategy in Silicon Valley and a winning one, since it mainly just costs money.

But it’s harder to pull off when there’s a smoking gun in your hand. And unfortunately for Udio and Suno, the RIAA included a few thousand smoking guns in the lawsuit: songs it owns that are clearly being regurgitated by the music models. Jackson 5 or Maroon 5, the “generated” songs are just lightly garbled versions of the originals — something that would be impossible if the original were not included in the training data.

The nature of LLMs — specifically, their tendency to hallucinate and lose the plot the more they write — precludes regurgitation of, for example, entire books. This has likely mooted a lawsuit by authors against OpenAI, since the latter can plausibly claim the snippets its model does quote were grabbed from reviews, first pages available online and so on. (The latest goalpost move is that they did use copyrighted works early on but have since stopped, which is funny because it’s like saying you only juiced the orange once but have since stopped.)

What you can’t do is plausibly claim that your music generator only heard a few bars of “Great Balls of Fire” and somehow managed to spit out the rest word for word and chord for chord. Any judge or jury would laugh in your face, and with luck a court artist will have their chance at illustrating that.

This is not only intuitively obvious but legally consequential as well, since it’s clear that the models are re-creating entire works — poorly sometimes, to be sure, but full songs. This lets the RIAA claim that Udio and Suno are doing real and major harm to the business of the copyright holders and artists being regurgitated — which lets them ask the judge to shut down the AI companies’ whole operation at the outset of the trial with an injunction.

Opening paragraphs of your book coming out of an LLM? That’s an intellectual issue to be discussed at length. Dollar-store “Call Me Maybe” generated on demand? Shut it down. I’m not saying it’s right, but it’s likely.

The predictable response from the companies has been that the system is not intended to replicate copyrighted works: a desperate, naked attempt to offload liability onto users under the DMCA’s Section 512 safe harbor. That is, the same way Instagram isn’t liable if you use a copyrighted song to back your Reel. Here, the argument seems unlikely to gain traction, partly because of the aforementioned admissions that the companies themselves ignored copyright to begin with.

What will be the consequence of these lawsuits? As with all things AI, it’s quite impossible to say ahead of time, since there is little in the way of precedent or applicable, settled doctrine.

My prediction, again lacking any real expertise here, is that the companies will be forced to expose their training data and methods, these things being of clear evidentiary interest. Seeing these and their obvious misuse of copyrighted material, along with (it is likely) communications indicating knowledge that they were breaking the law, will probably precipitate an attempt to settle or avoid trial, and/or a speedy judgment against Udio and Suno. They will also be forced to stop any operations that rely on the theft-based models. At least one of the two will attempt to continue business using legal (or at least legal-adjacent) sources of music, but the resulting model will be a huge step down in quality, and the users will flee.

Investors? Ideally, they’ll lose their shirts, having placed their bets on something that was obviously and provably illegal and unethical, and not just in the eyes of nebbish author associations but according to the legal minds at the infamously and ruthlessly litigious RIAA. Whether the damages amount to the cash on hand or promised funding is anyone’s guess.

The consequences may be far-reaching: If investors in a hot new generative media startup see a hundred million dollars vaporized due to the fundamental nature of generative media, suddenly a different level of diligence seems appropriate. Companies will learn from the trial (if there is one) or settlement documents and so on what could have been said, or perhaps more importantly, what should not have been said, to avoid liability and keep copyright holders guessing.

Though this particular suit seems almost a foregone conclusion, not every AI company leaves its fingerprints around the crime scene quite so liberally. It will not be a playbook for prosecuting or squeezing settlements out of other generative AI companies, but an object lesson in hubris. It’s good to have one of those every once in a while, even if the teacher happens to be the RIAA.
