AI firms have already run out of fresh data. Here's what it means
Why it's already too late to talk about copyright.
As Singapore writers protest the NLB’s use of GenAI, AI firms have already run out of fresh data. Here's what it means for us all.
This UnfilteredFriday, I want to talk about two seemingly unrelated matters:
- The protest against NLB’s use of GenAI.
- How AI firms have run out of data.
NLB using GenAI
For those unfamiliar with the story so far, S'pore's National Library Board (NLB) has adopted generative AI for some library programmes.
This includes StoryGen, which runs on Amazon Bedrock and lets library users add a twist to 6 well-known stories using AI.
This week, 82 S’pore writers expressed concern over NLB's "uncritical endorsement of generative AI" and how NLB's lack of caution could “risk permanently damaging Singapore’s literary landscape”.
The statement received extensive coverage in the local media and spurred discussions on social media.
All out of fresh data
Separately, Elon Musk was quoted this week about how AI firms have "exhausted basically the cumulative sum of human knowledge … in AI training."
I wrote about this last year: https://lnkd.in/g--jgjDf
By Musk's account, this happened in 2024, and other AI experts have said much the same thing.
Since then, AI firms have turned to synthetic data to fine-tune their models, and have gone after fresh data from:
- News publications.
- Social media platforms.
- Online discussion platforms.
The use of copyrighted material
What does that mean?
- Reduced dependence on copyrighted data.
- Using the 'sum of human knowledge' to train AI is already the norm.
Two further observations:
- I've been told that the Singapore Copyright Act 2021 introduced a Text and Data Mining (TDM) Exception for data mining and AI training, including for commercial use, with no limits.
- AI is moving too fast, at something like 10x the usual speed. You know that colleague who always sits at the same hot-desk spot? Now imagine trying to take that seat away from them not one year later, but ten years later.
Lest anyone think I have no skin in the game, Muck Rack says I have over 2,000 bylines. Conservatively, that means I have at least a million words published online.
I'm personally undecided about how I should feel. I do know this: while I cannot change the past, the onus is on me to forge my future.
And love it or hate it, AI isn't going back into the box.