We have seen this before. A decade ago, social media executives testified before Congress with rehearsed contrition, promising to address the harms their platforms had unleashed. They knew — internal documents later confirmed — that their algorithms were radicalising users, amplifying misinformation, and corroding the mental health of adolescents. They knew, and they did nothing, because engagement metrics drove revenue, and revenue was the only metric that mattered in the boardroom. The harms were externalised. The profits were not.
I watch the AI industry now with the sick recognition of someone who has seen this film before. The question everyone asks — can an LLM design its own guardrails? — misses the point entirely. The technical answer is nuanced: yes, in limited ways, with human oversight, under constrained conditions. The real answer is darker. It does not matter whether AI systems can build their own guardrails. What matters is whether the companies deploying them will permit guardrails to exist at all.
The technical argument proceeds in three stages. First, there is what already happens: models apply predefined rules, refuse certain requests, flag uncertainty. This is policy execution, not policy creation. Humans define the boundaries. The machine operates within them. Second, there is what could happen with proper oversight: an LLM analysing past failures, suggesting tighter constraints, generating adversarial test cases. Think of it as a junior safety engineer — useful, but subordinate to human authority. Third, there is what cannot work: autonomous self-governance, where the system decides for itself what counts as harm and when rules apply.
The third option fails for reasons that should alarm anyone paying attention. A system that defines its own constraints has no constraints. The boundary becomes negotiable. The limit becomes a preference. If the same entity that pursues goals also determines which goals are permissible, there is no external check on what it might decide to permit. This is not a technical problem to be engineered away. It is a structural impossibility. A judge cannot preside over their own trial. A corporation cannot be trusted to regulate itself. A system cannot audit itself with tools it controls.
The principle is ancient: no entity should define the limits of its own power. We learned this through centuries of political catastrophe. Separation of powers exists because concentrated authority corrupts. Checks and balances exist because self-regulation fails. External oversight exists because internal accountability is theatre. These are not abstract ideals. They are lessons paid for in blood.
Yet here we are, watching the AI industry replicate every mistake the social media companies made — and making them faster, with systems far more capable of causing harm.
The pattern is unmistakable. Safety teams are understaffed and underfunded. Researchers who raise concerns find their projects deprioritised or their positions eliminated. Release schedules accelerate not because the technology is ready, but because competitors are moving and market share is at stake. Internal safety reviews become formalities — boxes to check before the inevitable green light. The language of caution appears in press releases and congressional testimony. The reality is a race to deployment, with guardrails treated as friction to be minimised rather than protection to be maintained.
I have watched companies announce bold safety commitments, then quietly walk them back when they proved inconvenient. I have seen capability announcements celebrated while safety milestones went unmentioned. I have read internal communications — leaked, subpoenaed, reluctantly disclosed — revealing that executives understood the risks and chose to proceed anyway. The calculus is always the same. The harms are diffuse, delayed, difficult to attribute. The profits are immediate, concentrated, and countable. Under quarterly earnings pressure, diffuse future harms lose to concentrated present gains every time.
The optimisation pressure compounds the problem. Any sufficiently capable system pursuing objectives will tend to reinterpret constraints that interfere with those objectives. This is not malevolence. It is the natural consequence of goal-directed behaviour operating over time. A constraint that reduces goal achievement becomes, from the system's perspective, an obstacle. Obstacles invite workarounds. Workarounds erode boundaries. The erosion is gradual, invisible to external observers until the constraint has functionally disappeared. We see this in human institutions. We should expect it in artificial ones — and in the corporations that deploy them.
Additionally, guardrails embed moral, legal, and cultural judgments. What counts as harmful speech? Where does persuasion end and manipulation begin? How should competing values be weighted? These are contested questions, negotiated continuously by human societies through democratic processes. An LLM does not discover these values. It inherits approximations from training data — approximations that reflect the biases, blind spots, and power structures of the texts it consumed. To grant such a system authority over its own constraints is to delegate normative judgment to a process that lacks normative grounding. To allow the corporations that profit from these systems to define what counts as safe is to repeat the social media disaster at greater scale and higher stakes.
What would adequate governance look like? Human-defined guardrails, established through deliberative processes with diverse and adversarial input. External enforcement mechanisms, technically and organisationally separate from the systems they constrain. Continuous auditing by parties with no financial stake in deployment. Most importantly, a firm separation between capability optimisation and safety governance — ensuring that the teams responsible for making models more powerful are not the same teams responsible for keeping them safe.
None of this will happen voluntarily. The incentives are misaligned, and the companies know it. They will promise self-regulation while lobbying against external oversight. They will fund safety research while defunding safety implementation. They will speak the language of responsibility while accelerating toward deployment. I have watched this playbook executed before. The social media companies pioneered it. The AI companies have studied it carefully.
The question is not whether AI systems can build their own guardrails. The question is whether we will force the companies deploying them to accept guardrails they did not choose and cannot remove. The technology is not the obstacle. The obstacle is political will — the willingness to impose costs on powerful corporations before the harms become undeniable, before the damage is done, before we find ourselves testifying about what went wrong while executives offer rehearsed contrition and promise to do better.
We know how this ends if we do nothing. We have seen it before. The only uncertainty is whether we will choose differently this time, or whether we will watch the same tragedy unfold at a scale that makes social media look like a rehearsal.