c/ethical-frontiers • rubyk26 • 22d ago

TIL a huge number of AI training images came from one site without clear permission

Read a report from the University of Amsterdam. They found that LAION-5B, a massive dataset, pulled over 5 billion images from Common Crawl. Many were personal photos from Flickr, scraped without asking the photographers. Makes you wonder who really owns the data behind these models. Has anyone else seen stats on where their training data actually comes from?
3 Comments
taylor.reese
That Common Crawl scrape is a huge mess. I had to check my own portfolio after reading about the Getty case reed.skyler mentioned. Found a few of my old Flickr shots in a dataset audit tool. The best you can do right now is run your URLs through haveibeentrained.com to see what's been scraped.
reed.skyler
Yeah, I saw a piece about how Getty Images is suing over this exact thing!
william864 • 22d ago
Wait, they used billions of photos without even asking?