On November 30, 2022, ChatGPT launched. It was revolutionary: AI could understand context and hold conversations in a way we’d never seen before.
Every single day, humans produce billions of gigabytes of data in the form of text, videos, images, and audio.
That’s just a single day of human-generated knowledge. The total volume accumulated since the internet began is almost unimaginable. ChatGPT and similar models are trained on this ocean of data, which is what gives them their intelligence.
But things changed the moment ChatGPT launched. More and more of the new content on the internet is AI-generated: blog posts, YouTube descriptions, Instagram captions, tweets. The web is slowly filling with synthetic data. And if future AI systems are trained mostly on AI-made data, the quality of their reasoning and creativity will inevitably degrade. Garbage In, Garbage Out. Researchers call this model collapse: AI systems trained on synthetic data gradually lose the ability to generate diverse, high-quality outputs.
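To make that feedback loop concrete, here is a toy sketch (my own illustration, not how real models are trained): the "model" is nothing more than a Gaussian fitted to data, and each new generation is fitted only to samples produced by the previous one. Estimation noise compounds from one generation to the next, so the distribution drifts away from the original human data, and over enough generations its spread tends to shrink, which is the essence of model collapse.

```python
# Toy illustration of model collapse (not a real training pipeline):
# each "generation" of a model is just a Gaussian fitted to samples
# drawn from the previous generation instead of from real human data.
import random
import statistics

random.seed(42)

# Generation 0: "human data" drawn from a standard normal distribution.
data = [random.gauss(0, 1) for _ in range(200)]

for generation in range(15):
    mu = statistics.fmean(data)     # "train" the model: estimate the mean
    sigma = statistics.stdev(data)  # and the standard deviation
    print(f"gen {generation:2d}: mean = {mu:+.3f}, std = {sigma:.3f}")
    # The next generation sees only this generation's synthetic output.
    data = [random.gauss(mu, sigma) for _ in range(200)]
```

Run it and you’ll see the estimated mean and spread wander further from the original 0 and 1 with every generation. Real models are vastly more complex, but the compounding error is the same basic story.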
Image credit: Fabio Duarte
This brings me back to the main thought of this blog. As the image shows, data generation is growing rapidly, because the number of internet users keeps increasing day by day. Imagine if ChatGPT had arrived a few years later: by then, the internet would have held even more purely human-created data, more books, more blogs, more authentic conversations. Training AI on that larger pool of human data could have produced even sharper, more reliable intelligence than what we have today.
The question is not just what AI can do for us today. The real question is: what kind of data are we leaving behind for the next generation of models? Will the future of intelligence be built on the wisdom of humans? Or will it be built on AI learning endlessly from itself, in a feedback loop that eventually collapses?
This is a 3am thought, so just chill!