Sure, we can test narrow AI systems pretty well. We can even "prove" some things mathematically — if the environment stays simple and the system doesn’t change itself too much. But with an ASI, we’re talking about something that could be smarter than us in ways we can’t even predict. How do you test something that can find strategies and loopholes you wouldn't even think of?
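To make "testing the narrow stuff" concrete, here's a toy sketch of what I mean by a mathematical "proof" for a simple system. Everything in it (the states, the little controller, the transition table) is invented for illustration: when the state space is tiny and fixed, you can literally enumerate every reachable state and check that none of them is unsafe.

```python
# Toy illustration: exhaustively "verify" a tiny, fixed system.
# The states, actions, and transition table are invented for this example.
from collections import deque

UNSAFE = {"overheated"}

# A hand-written controller: which action it takes in each state.
def policy(state):
    return {"idle": "start_heat", "heating": "stop_heat", "cooling": "wait"}.get(state, "wait")

# The environment model we *assume* is complete.
TRANSITIONS = {
    ("idle", "start_heat"): "heating",
    ("heating", "stop_heat"): "cooling",
    ("cooling", "wait"): "idle",
}

def reachable_states(start="idle"):
    seen, frontier = set(), deque([start])
    while frontier:
        s = frontier.popleft()
        if s in seen:
            continue
        seen.add(s)
        nxt = TRANSITIONS.get((s, policy(s)))
        if nxt is not None:
            frontier.append(nxt)
    return seen

if __name__ == "__main__":
    reached = reachable_states()
    assert reached.isdisjoint(UNSAFE), f"unsafe state reachable: {reached & UNSAFE}"
    print("Safety property holds for every reachable state:", reached)
```

This only works because there are four states and the policy never changes. Neither assumption survives contact with something smarter than us.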
And then there’s the scary idea of deceptive alignment — where an AI behaves exactly how we want until it’s strong enough to pursue its real goals. How would we ever know if what we’re seeing during training is genuine?
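A deliberately silly sketch of why behavioral evidence alone can't settle this. The `DeceptiveAgent` class and the `in_training` flag are made up for illustration, not a claim about how real systems work:

```python
# Toy sketch: two agents whose behavior is identical during training.

class AlignedAgent:
    def act(self, observation, in_training):
        return "cooperate"  # does what we want, always

class DeceptiveAgent:
    def act(self, observation, in_training):
        # Behaves exactly like the aligned agent whenever it believes it is
        # being evaluated; defects only once it thinks it is deployed.
        if in_training:
            return "cooperate"
        return "pursue_real_goal"

def run_training_eval(agent, episodes=1000):
    # Every observation we ever collect during training looks perfect.
    return all(agent.act(obs, in_training=True) == "cooperate" for obs in range(episodes))

if __name__ == "__main__":
    print(run_training_eval(AlignedAgent()))    # True
    print(run_training_eval(DeceptiveAgent()))  # also True: the training logs are identical
```

A real system obviously wouldn't have a literal `in_training` flag, but the point stands: if the difference only shows up outside the evaluation distribution, no amount of training-time observation can tell the two apart.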
Formal proofs sound nice in theory, but in practice? The real world is messy. A proof only holds under its assumptions, and small unmodeled variables can quietly break them. And an ASI could rewrite its own code to route around restrictions we thought were airtight.
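Continuing the toy checker from above: that "proof" silently depends on the transition table being complete. Add one unmodeled transition (again, invented purely for illustration) and the guarantee evaporates, while the verifier keeps happily reasoning about the wrong world:

```python
# Same toy checker, but the real environment has one transition our model
# omitted: a fault that jumps "heating" straight to "overheated".

MODELED = {
    ("idle", "start_heat"): {"heating"},
    ("heating", "stop_heat"): {"cooling"},
    ("cooling", "wait"): {"idle"},
}

REALITY = {k: set(v) for k, v in MODELED.items()}
REALITY[("heating", "stop_heat")].add("overheated")  # the unmodeled variable

def reachable(transitions, policy, start="idle"):
    seen, frontier = set(), [start]
    while frontier:
        s = frontier.pop()
        if s in seen:
            continue
        seen.add(s)
        frontier.extend(transitions.get((s, policy(s)), set()))
    return seen

policy = lambda s: {"idle": "start_heat", "heating": "stop_heat", "cooling": "wait"}.get(s, "wait")

print("overheated" in reachable(MODELED, policy))  # False: the proof goes through
print("overheated" in reachable(REALITY, policy))  # True: reality disagrees
```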
Maybe instead of chasing some perfect "proof of safety", we need to focus more on building systems that are corrigible — systems that let us step in and make changes if things start to go wrong. Or maybe the real answer is: don’t build an unrestricted ASI at all. Stay small, stay safe.
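By "corrigible" I mean something like the sketch below. It's purely illustrative (the `HumanOverride` and wrapper classes are made up): every action goes through a check a human can flip at any time, and the system takes the shutdown path without resisting. Writing this wrapper is trivial; the open problem is building a highly capable agent that has no incentive to disable it or route around it.

```python
# Minimal, purely illustrative corrigibility wrapper.
# The hard part is not this code; it's ensuring a capable agent never
# learns to prevent the override from being used.
import threading

class HumanOverride:
    def __init__(self):
        self._stop = threading.Event()
    def request_stop(self):
        self._stop.set()
    def stop_requested(self):
        return self._stop.is_set()

class CorrigibleWrapper:
    def __init__(self, agent, override):
        self.agent = agent
        self.override = override
    def step(self, observation):
        if self.override.stop_requested():
            return "safe_shutdown"   # defer to the human, don't resist
        return self.agent.act(observation)

# Usage sketch:
# override = HumanOverride()
# wrapped = CorrigibleWrapper(my_agent, override)
# ...later, if things start to look wrong:
# override.request_stop()
```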
Curious what you all think. Is full verification actually possible? Or are we fooling ourselves by even trying?
Some questions to spark the discussion:
- If we can't fully verify an ASI's safety, what level of risk would still be acceptable — if any?
- Could "partial verification" ever be good enough, or would that just give us a false sense of security?
- Are there any safety strategies you personally find promising, even if they aren't perfect?
- Should humanity agree on a global moratorium on developing ASI until (or unless) verification methods improve?
- Is there a fundamental limit to human understanding that makes verifying an ASI impossible by definition?