14
c/ai-innovations • the_iris • 2mo ago • Prolific Poster

I was training a model on my own data for months before I saw the problem

I was building a tool to sort support tickets and used all our old emails from the last two years. The results were okay, but not great. Last week, I looked at the data again and saw the issue: we had a huge customer service change 18 months ago, so half the data was from an old system we don't use anymore. I was basically training it on outdated info. I had to go back and only use data from after the switch. Has anyone else run into this kind of time-based data trap?
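For anyone who wants a starting point, the "only use data from after the switch" step can be sketched roughly like this. Everything here is an assumption for illustration: the `SWITCH_DATE` value, the `received` / `text` / `label` field names, and the sample tickets are all made up, and real data would come from the email export.

```python
from datetime import date

# Hypothetical cutoff: the day the new support system went live.
SWITCH_DATE = date(2023, 1, 15)

# Made-up ticket records, for illustration only.
tickets = [
    {"received": date(2022, 11, 3), "text": "Old portal login fails", "label": "legacy_portal"},
    {"received": date(2023, 1, 15), "text": "Cannot reset password", "label": "account"},
    {"received": date(2023, 6, 9), "text": "Refund not received", "label": "billing"},
]


def split_by_cutoff(records, cutoff):
    """Return (before, on_or_after) lists based on each record's 'received' date."""
    before = [r for r in records if r["received"] < cutoff]
    after = [r for r in records if r["received"] >= cutoff]
    return before, after


old_tickets, current_tickets = split_by_cutoff(tickets, SWITCH_DATE)
print(len(old_tickets), "dropped,", len(current_tickets), "kept for training")
```

Keeping the pre-switch records in their own list instead of deleting them makes it easy to go back and test whether they help or hurt.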
4 comments

grays13 • 2mo ago
Yeah I used to just throw all my data in without checking dates. @victor779 is right, you gotta check for those big changes first. Saved me a ton of time later.
3
ruby_henderson36
Oh man, you're totally right, and I hate to admit it, but I used to think the opposite. I thought older data was always valuable because more data equals better models, right? But then I spent about a week building a model that was acting weird, and I couldn't figure out why. Turned out I was feeding it data from before the company changed their return policy, and it was dragging everything down. So yeah, I was stubborn about it, but now I always check dates first and slice out those old chunks. It feels wrong at first, but the model actually makes way more sense after you clean that stuff out.
7
victor779 • 2mo ago
But what if the old data is actually useful? Sometimes those "big changes" are just noise and filtering them out loses real trends. I've seen models get worse from being too picky with dates.
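One rough way to tell a real change from noise before cutting anything is to compare how the ticket categories are distributed on each side of the suspected change. This is only a sketch under stated assumptions: the label names and both sample lists are made up, and a large shift in label share is just a hint that the old data describes a different process, not proof.

```python
from collections import Counter


def label_shares(labels):
    """Map each label to its fraction of the list."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


def share_shift(old_labels, new_labels):
    """Absolute change in share for every label seen in either period."""
    old, new = label_shares(old_labels), label_shares(new_labels)
    return {
        label: abs(new.get(label, 0.0) - old.get(label, 0.0))
        for label in set(old) | set(new)
    }


# Made-up labels for illustration only.
before_switch = ["legacy_portal", "legacy_portal", "billing", "account"]
after_switch = ["billing", "account", "account", "billing"]

shift = share_shift(before_switch, after_switch)
for label, delta in sorted(shift.items(), key=lambda kv: -kv[1]):
    print(f"{label}: {delta:.2f}")
```

If the shares barely move, the old data may still be worth keeping. If a category vanishes or appears after the change, that is a sign the older slice is worth testing separately.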
3
victor779 • 2mo ago
Man that's such a classic gotcha lol. I've been burned by shifting business rules before too. Always slice your data by date first and check for major policy or system changes. It's a pain to redo the work but cleaning that old data out made your model way more useful, right?
2