27
TIL a $300 lesson about training a custom AI model on the wrong data
I spent about three weeks trying to build a simple image classifier for my garden project, using a dataset I scraped from random websites. The model kept giving me weird results, like calling a tomato a 'small red car'. Turns out the dataset had a bunch of mislabeled junk mixed in. I wasted around $300 on cloud compute credits running those bad training cycles. I should have cleaned and verified my data first, or just used a smaller, verified set. Has anyone else gotten burned by a bad dataset, and how do you vet yours now?
3 comments
spencer_owens582d ago
I used to skip data cleaning, but a mistake like that would totally change my mind.
7
dakotab932d ago
Man, that's rough. I read a blog post from a guy who trained a model to spot manufacturing defects, but his training photos had a specific timestamp in the corner. The AI just learned to look for that timestamp, not the actual cracks. It's crazy how it picks up on the wrong stuff. I'm paranoid about my data now and do a manual check on a random sample before any training run.
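For anyone who wants to try that kind of spot check, here's a rough sketch in Python. It's only an illustration: the function name `sample_for_review`, the one-subfolder-per-label layout, and the extension list are all assumptions, not anything from the original post or the blog.

```python
import random
from pathlib import Path

# Assumed set of image extensions; adjust to whatever your dataset contains.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def sample_for_review(data_dir, sample_size=50, seed=0):
    """Return a reproducible random sample of (label, path) pairs to check by hand.

    Assumes a hypothetical layout with one subfolder per class label,
    e.g. data_dir/tomato/img001.jpg.
    """
    root = Path(data_dir)
    # Sort first so the same seed always yields the same sample.
    items = sorted(
        (path.parent.name, path)
        for path in root.glob("*/*")
        if path.suffix.lower() in IMAGE_EXTENSIONS
    )
    rng = random.Random(seed)
    # Never ask for more items than the dataset holds.
    return rng.sample(items, min(sample_size, len(items)))


# Example use (the directory name is hypothetical):
# for label, path in sample_for_review("garden_dataset", sample_size=25):
#     print(label, path)
```

Open each sampled file and compare it to its label. If a noticeable share of the sample is wrong, or every image shares something irrelevant like a corner timestamp, that's a sign to clean the data before paying for a training run.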
5
susanb342d ago
So the AI basically became a super expensive timestamp detector? Did the guy at least get a refund on all that compute time he wasted?
2