
The bill would reveal what is inside the training data

The Verge: https://www.theverge.com/2024/4/10/24126382/copyright-ai-bill-congress-schiff-training-data

A New Bill Would Require Companies to Disclose the Datasets Used to Train Generative AI

Artists, authors, and other creators have been complaining since the rise of generative artificial intelligence, which is often trained on copyrighted material without permission. The intersection of copyright and artificial intelligence has always been tricky, since the question of how much these systems transform or merely mimic protected content has not been settled. Artists and authors have sued AI companies to protect their rights.

Under the bill, companies would have to report the training datasets used to build their generative AI models no later than 30 days before those models are released. The bill would not affect existing platforms unless they make changes to their training datasets after it becomes law.

The bill received support from industry groups like the Writers Guild of America, the Recording Industry Association of America, the Directors Guild of America, and SAG-AFTRA (the Screen Actors Guild and American Federation of Television and Radio Artists). Notably absent from the list of supporters is the Motion Picture Association (MPA), which normally backs moves to protect copyrighted work from piracy. (Disclosure: The Verge’s editorial staff is unionized with the Writers Guild of America, East.)

How to Stop Your Data From Being Scraped, Purchased, or Used to Train AI

If you’ve ever posted something to the internet—a pithy tweet, a 2009 blog post, a scornful review, or a selfie on Instagram—it has most likely been slurped up and used to help train the current wave of generative AI. Reams of data power large language models. And even if your data isn’t powering a chatbot, it can be used for other machine-learning features.

Before we get to how you can opt out, it’s worth setting some expectations. A lot of companies building artificial intelligence already have systems that crawl the web. Companies are also secretive about what they have actually scraped, purchased, or used to train their systems. “We honestly don’t know that much,” says Niloofar Mireshghallah, a researcher who focuses on AI privacy at the University of Washington. “In general, everything is very black-box.”

Some AI systems offer ways to have your data removed, but there is little information about how those processes actually work. The options that do exist can be labor-intensive and buried in settings menus, and getting posts removed from data that has already been collected is likely to be difficult. Even when companies allow opting out of future data sharing, they usually enroll users by default rather than asking them to opt in.

Source: How to Stop Your Data From Being Used to Train AI

Why Do Companies Make Opting Out So Difficult? A Comment from Thorin Klosowski, a Privacy and Security Activist at the Electronic Frontier Foundation

“Most companies add the friction because they know that people aren’t going to go looking for it,” says Thorin Klosowski, a security and privacy activist at the Electronic Frontier Foundation. “Opt-in would be a purposeful action, as opposed to opting out, where you have to know it’s there.”
