An investigation by The Atlantic has shed light on the vast scale of copyrighted music used to train generative AI models. The publication, as reported by Engadget, released four searchable databases that catalogue songs fed into AI training systems. The scope is staggering: one database contains 12 million tracks, another 9 million, and two additional databases each hold about 100,000 songs.
The accompanying article by staff writer Alex Reisner explains how much copyrighted material was used. According to Engadget, the databases include hit tracks from artists like Taylor Swift and Bad Bunny. The investigation points to legal cases already underway against generative AI music platforms such as Suno and Udio, which have often claimed fair use as a defense for scraping copyright-protected content to power their services.
Legal Precedents and Ongoing Cases
Engadget notes that in a similar case in book publishing, the copyright infringement claims did not persuade the judge, but the piracy allegations proved more compelling. The initial settlement in that suit was $1.5 billion, though the final outcome and payout are still pending. According to the report, the databases from The Atlantic could help parties in the music industry pursue similar lawsuits in the future.
Industry Response and Challenges
Many music streaming services have taken steps to prevent, identify, or label generative AI creations, with varying degrees of success. Engadget reports that these measures have not stopped scammers from creating AI imitations of existing bands and attempting to profit from them.
The investigation underscores the ongoing tension between AI developers and content creators, with significant legal and financial implications for the technology industry.