
My AI model hit 87% accuracy on a weird test set and I think we're focusing on the wrong metrics

Everyone in my project group was celebrating because our image classifier finally got over 90% on the standard benchmarks. But when I ran it on a set of 500 real-world, blurry photos from security cameras, it only got 87%. That drop felt huge and real to me. It shows we're tuning for clean lab data, not the messy stuff it will actually see. Has anyone else found a big gap between their polished test scores and a more practical check?
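For anyone wanting to quantify this, a minimal sketch of the kind of check described above: score the same model's predictions on the clean benchmark and on a rougher "field" set, then report the gap. All names and the toy numbers here are made up for illustration; in practice the prediction lists would come from running your classifier on each set.

```python
def accuracy(preds, labels):
    """Fraction of predictions that match the ground-truth labels."""
    assert len(preds) == len(labels)
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Toy predictions/labels standing in for real model outputs.
clean_preds  = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
clean_labels = [1, 0, 1, 1, 0, 1, 1, 0, 1, 0]
field_preds  = [1, 0, 0, 1, 0, 1, 0, 0, 1, 1]
field_labels = [1, 0, 1, 1, 0, 1, 1, 0, 1, 0]

clean_acc = accuracy(clean_preds, clean_labels)
field_acc = accuracy(field_preds, field_labels)

# The gap between the two numbers is the real headline metric.
print(f"clean: {clean_acc:.0%}  field: {field_acc:.0%}  "
      f"gap: {clean_acc - field_acc:.0%}")
```

Tracking the gap as its own number (rather than just the benchmark score) makes regressions on messy data visible in CI instead of in production.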
2 comments

betty_walker
My buddy's team had a 99% lab score that crashed to 72% on actual factory floor images. It was a real wake-up call for them.
2
jessicad98
That 99% lab score dropping to 72% is a HUGE gap, @betty_walker. It shows how different a clean lab is from a messy real world. Teams really need to test in the actual environment from the start.
5