newsweekshowcase.com

Anthropic Used Thousands of Swiped YouTube Videos to Train Its Artificial Intelligence

Wired: https://www.wired.com/story/youtube-training-data-apple-nvidia-anthropic/

Investigating the YouTube Subtitles Dataset: A Lookup Tool for Creators to See Whether Their Videos Were Used

A Proof News investigation found that Apple and Anthropic used subtitles from nearly 187,000 YouTube videos, which were siphoned from more than 48,000 channels.

Videos from popular creators like MrBeast and Marques Brownlee appear in the dataset, as do clips from news outlets like ABC News, the BBC, and The New York Times. More than 100 videos from The Verge appear in the dataset, along with many other videos from Vox.

As part of its investigation, Proof News also released an interactive lookup tool. If you use its search feature, you can see if your videos appear in the dataset.

“I’m not going to go into the details of the data that was used, but it was publicly available or licensed data,” Murati told The Wall Street Journal at the time. When pressed by the Journal about YouTube content specifically, she said she “wasn’t sure about that.”

“We have terms and conditions, and we would expect people to abide by those terms and conditions when you build a product, so that’s how I felt about it,” Pichai said.

The David Pakman Show: When AI Companies Take a Creator’s Videos for Training Without Asking

Among the megastars whose material Proof News found were MrBeast, with 289 million subscribers and two videos taken for training, and Jacksepticeye, with 31 million subscribers and 377 videos taken. Some of the material used to train AI also promoted conspiracies such as the “flat-earth theory.”

“No one came to me and said, ‘We would like to use this,’” said David Pakman, host of The David Pakman Show, a left-leaning politics channel with more than 2 million subscribers and more than 2 billion views. Nearly 160 of his videos were swept up into the YouTube Subtitles training dataset.

Four people work full time on Pakman’s enterprise, which posts multiple videos each day in addition to producing a podcast, TikTok videos, and material for other platforms. If AI companies are paid, Pakman said, they should pay for the data they use. He pointed out that some media companies have signed agreements to be paid for the use of their work to train artificial intelligence.

“This is my livelihood, and I put time, resources, money, and staff time into creating this content,” Pakman said, adding that there is no shortage of work.
