Plutonic Rainbows

The Sixteen-Byte Key That Broke Everything

MP4 files are embarrassingly easy to steal. Right-click, save as, done. For a blog that occasionally embeds short AI-generated video clips, this wasn't a theoretical concern — it was a guarantee. Anyone with a browser's developer tools could grab the file URL and download it in seconds. So I decided to replace the direct MP4 links with HLS adaptive bitrate streaming, complete with AES-128 encryption and burned-in watermarks. The kind of setup you'd expect from a proper video platform. On a static site hosted on S3.

That last sentence should have been the warning.

HLS — HTTP Live Streaming — works by chopping video into small transport stream segments, each a few seconds long, and serving them via playlists that tell the player what to fetch and in what order. Apple invented it for iOS back in 2009. The protocol is elegant: just files on a web server, no special streaming infrastructure required. A master playlist points to variant playlists at different quality levels, and the client picks the appropriate one based on available bandwidth. For a fifteen-second clip on a blog, adaptive bitrate is arguably overkill. I built it anyway, because the encryption layer depends on the HLS segment structure, and because I wanted four quality tiers from 480p to source resolution. The transcoding pipeline uses FFmpeg to produce each tier with its own playlist and .ts segments, then wraps them in a master .m3u8 that lists all four variants with their bandwidth and resolution metadata.
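For concreteness, here's a minimal sketch of what writing that master playlist can look like. The tier list, bandwidth figures and helper name are illustrative rather than the exact values the build uses; only the 480p/stream.m3u8-style layout comes from the pipeline described here.

```python
from pathlib import Path

# (width, height, approximate bandwidth in bits/sec) -- illustrative figures,
# not the build's real numbers
TIERS = [
    (854, 480, 1_400_000),
    (1280, 720, 2_800_000),
    (1920, 1080, 5_000_000),
    (3840, 2160, 8_000_000),   # stands in for "source resolution"
]

def write_master_playlist(out_dir: Path) -> Path:
    lines = ["#EXTM3U", "#EXT-X-VERSION:3"]
    for width, height, bandwidth in TIERS:
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={width}x{height}"
        )
        lines.append(f"{height}p/stream.m3u8")   # variant playlist in its own subdirectory
    master = out_dir / "master.m3u8"
    master.write_text("\n".join(lines) + "\n")
    return master
```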

FFmpeg's HLS muxer is powerful and poorly documented in roughly equal measure. The flags for segment naming, playlist type, and encryption keyinfo files all interact in ways that the man page describes with the enthusiasm of someone filling out tax forms. Getting the basic transcoding working — four tiers, VOD playlist type, sensible segment durations — took maybe an afternoon. Getting the encryption right took three days.
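The shape of the per-tier invocation, stripped down to the flags that matter. Segment length, codec choices and file names here are placeholders, and the watermark filter is left out until the next section; the HLS flags themselves are the ones the muxer actually takes.

```python
import subprocess
from pathlib import Path

def transcode_tier(src: Path, tier_dir: Path, height: int, keyinfo: Path) -> None:
    tier_dir.mkdir(parents=True, exist_ok=True)
    cmd = [
        "ffmpeg", "-y", "-i", str(src),
        "-vf", f"scale=-2:{height}",             # watermark filter omitted here
        "-c:v", "libx264", "-c:a", "aac",
        "-hls_time", "4",                        # roughly four-second .ts segments
        "-hls_playlist_type", "vod",             # fixed-length playlist, not live
        "-hls_key_info_file", str(keyinfo),      # enables AES-128 segment encryption
        "-hls_segment_filename", str(tier_dir / "seg_%03d.ts"),
        str(tier_dir / "stream.m3u8"),
    ]
    subprocess.run(cmd, check=True)
```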

The AES-128 encryption in HLS works like this: you generate a sixteen-byte random key, write it to a file, and tell FFmpeg where to find it via a keyinfo file. The keyinfo file has three lines — the URI where the player will fetch the key at runtime, the local path FFmpeg should read during encoding, and an initialisation vector. The player downloads the key, decrypts each segment on the fly, and plays the video. Simple in theory. The problem is that the key URI in the keyinfo file is relative to the playlist that references it, not relative to the keyinfo file itself, and not relative to the master playlist. Each variant playlist lives in its own subdirectory — 480p/stream.m3u8, 720p/stream.m3u8, and so on — while the encryption key sits one level up. So the URI needs to be ../enc.key. Get this wrong and the player fetches a 404 instead of a decryption key, and the error message from hls.js is spectacularly unhelpful. "FragParsingError" tells you nothing about why the fragment couldn't be parsed. I spent a full evening staring at network waterfall charts in Chrome DevTools before realising the key path was resolving to the wrong directory.
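In code, the key and keyinfo generation amounts to something like this sketch. The file names and the ../enc.key URI match what's described above; the helper itself, and the choice of a random IV, are illustrative.

```python
import os
from pathlib import Path

def write_key_and_keyinfo(out_dir: Path) -> Path:
    key_path = out_dir / "enc.key"
    key_path.write_bytes(os.urandom(16))        # the sixteen random bytes

    iv = os.urandom(16).hex()                   # initialisation vector, as hex
    keyinfo_path = out_dir / "enc.keyinfo"
    keyinfo_path.write_text(
        "../enc.key\n"       # line 1: URI the player resolves against the variant playlist
        f"{key_path}\n"      # line 2: local path FFmpeg reads during encoding
        f"{iv}\n"            # line 3: initialisation vector
    )
    return keyinfo_path
```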

The watermark was its own category of frustration. I wanted the site domain burned into every frame — subtle, low opacity, bottom right corner. FFmpeg's drawtext filter handles this, and it's flexible enough to scale the text relative to the video height so it stays proportional across all four quality tiers. The filter string looks like someone encrypted it themselves: drawtext=text='plutonicrainbows.com':fontsize=h*0.025:fontcolor=white@0.30:shadowcolor=black@0.15:shadowx=1:shadowy=1:x=(w-text_w-20):y=(h-text_h-20). It works, but when you're chaining it with the scale filter for resolution targeting — scale=-2:720,drawtext=... — the order matters and the comma-separated syntax doesn't forgive stray whitespace. I had a version that worked perfectly at 1080p and produced garbled output at 480p because the scale filter was receiving the wrong input dimensions. The fix was reordering the filter chain. The debugging was two hours of staring at pixel soup.

Then came the client-side player. Safari supports HLS natively through the video element — you just point the src at the .m3u8 file and it plays. Every other browser needs hls.js, a JavaScript library that implements HLS via Media Source Extensions. The dual-path architecture isn't complicated in principle. If hls.js is available and MSE is supported, use it. Otherwise, check whether the browser can play application/vnd.apple.mpegurl natively, and use that. The complication is that these two paths behave differently in ways that matter. With hls.js, you get fine-grained control — you can lock the quality tier, set bandwidth estimation defaults, handle specific error events. The native Safari path gives you a video element and a prayer. You can't force max quality on native HLS. You can't get meaningful error information. And iOS Safari doesn't support MSE at all, which means hls.js won't load, which means you're stuck with whatever quality Safari decides is appropriate based on its own internal bandwidth estimation.

For fifteen-second clips, this mismatch was particularly annoying. The whole point of locking to the highest quality tier is that short videos don't benefit from ABR ramp-up — by the time the adaptive algorithm has measured bandwidth and stepped up to a higher tier, the clip is nearly finished. I set abrEwmaDefaultEstimate to 50 Mbps in hls.js to force it straight to the top tier on page load. Safari users get whatever Safari gives them.

The lightbox player itself needed to handle a surprising number of edge cases. Autoplay policies mean the video has to start muted. The overlay should fade in immediately but the video element should stay hidden until the first frame is actually decoded — otherwise you get a flash of black rectangle before content appears. I used the playing event to reveal the video, with a four-second fallback timeout in case the event never fires. The progress bar is manually updated via setInterval because the native progress events fire too infrequently for a smooth visual. Right-click is disabled on the video element. The controlsList attribute strips the download button from native controls. None of this is real DRM — anyone sufficiently determined can still capture the stream. But it raises the effort from "right-click, save" to "actually write code," which is enough for a personal blog.

Deployment surfaced the final batch of surprises. The .m3u8 playlist files need to be gzipped and served with the right content type. The .ts segments need appropriate cache headers. And the encryption key files — those sixteen-byte .key files — need Cache-Control: no-store so that if I ever re-transcode a video, browsers don't serve a stale key that can't decrypt the new segments. I'd already been through the CloudFront HTTP/2 configuration saga, so I knew the CDN layer could hold surprises. The .key file caching caught me out anyway. Stale encryption keys produce the same unhelpful "FragParsingError" as a missing key, which meant another round of DevTools archaeology.
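Roughly what the per-extension headers look like in the deploy step — a sketch using boto3, with a placeholder bucket name and assumed cache lifetimes for the playlists and segments. The only value that actually matters from the story above is no-store on the key files.

```python
import gzip
from pathlib import Path

import boto3

s3 = boto3.client("s3")
BUCKET = "example-bucket"   # placeholder

HEADERS = {
    ".m3u8": {"ContentType": "application/vnd.apple.mpegurl",
              "ContentEncoding": "gzip",
              "CacheControl": "public, max-age=300"},                 # assumed TTL
    ".ts":   {"ContentType": "video/mp2t",
              "CacheControl": "public, max-age=31536000, immutable"}, # assumed TTL
    ".key":  {"ContentType": "application/octet-stream",
              "CacheControl": "no-store"},                            # never cache keys
}

def upload(path: Path, key: str) -> None:
    extra = dict(HEADERS.get(path.suffix, {}))
    body = path.read_bytes()
    if extra.get("ContentEncoding") == "gzip":
        body = gzip.compress(body)   # playlists go up pre-compressed
    s3.put_object(Bucket=BUCKET, Key=key, Body=body, **extra)
```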

The whole system works through graceful degradation. No FFmpeg on the build machine? Video processing is skipped entirely and the links fall back to pointing at the source MP4 files. No video_processor.py module? Caught by an ImportError, build continues. No videos directory? No-op. I learned from the forty-five bugs audit that a static site generator needs to handle missing dependencies without falling over, and the video pipeline follows that pattern.
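The degradation order, as a sketch. video_processor is the module named above, but its process_all function and the exact checks are illustrative.

```python
import shutil
from pathlib import Path

def process_videos(videos_dir: Path, output_dir: Path) -> list:
    if not videos_dir.exists():
        return []                     # no videos directory: no-op
    if shutil.which("ffmpeg") is None:
        return []                     # no FFmpeg: links fall back to the source MP4s
    try:
        import video_processor        # the module named above
    except ImportError:
        return []                     # module missing: build continues
    return video_processor.process_all(videos_dir, output_dir)   # hypothetical API
```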

The opaque URL scheme was a late addition that I'm glad I thought of. Instead of exposing file paths in the HTML — which would let someone construct the master playlist URL and bypass the lightbox entirely — the build script generates a six-character content hash for each video and rewrites the anchor tags to use #video-{hash} with a data-video-id attribute. The JavaScript player reads the data attribute and constructs the HLS URL internally. The actual file structure is never visible in the page source. Again, not real security. But another layer of friction.
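A sketch of that rewrite step. The hashing choice and the plain string replacement are assumptions; only the six-character hash and the #video-{hash} / data-video-id shape come from the actual build.

```python
import hashlib
from pathlib import Path

def video_id(path: Path) -> str:
    # six-character content hash, stable for a given encode
    return hashlib.sha256(path.read_bytes()).hexdigest()[:6]

def rewrite_anchor(html: str, mp4_href: str, vid: str) -> str:
    # <a href="clips/foo.mp4"> becomes <a href="#video-abc123" data-video-id="abc123">
    return html.replace(
        f'href="{mp4_href}"',
        f'href="#video-{vid}" data-video-id="{vid}"',
    )
```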

Was it worth it? For a personal blog with maybe a few hundred readers, building a four-tier HLS pipeline with per-video AES-128 encryption is — and I'm being generous to myself here — completely disproportionate. A <video> tag pointing at an MP4 would have been fine. But the MP4 approach bothered me, and sometimes that's reason enough. The fifteen-second clips play smoothly across every browser I've tested, the watermark is visible without being obnoxious, and the encryption keys rotate per video. The whole thing adds about forty seconds to the build for each new video, which is nothing given that the image pipeline already takes longer than that.

The drawtext filter string still looks like someone sat on a keyboard. Some things can't be made elegant. They can only be made to work.

Thirty-Four Years Between Frames

Kuaishou launched Kling 3.0 on February 5th, and the jump from earlier versions is striking. Where Kling 2.6 was limited to single continuous shots, the new model introduces multi-shot storyboards — up to six camera cuts in a single generation. Video duration extends to 15 seconds with custom timing.

The headline features that matter for creative work: an Elements system that maintains character identity across shots, three-speaker dialogue with individual voice tracking, and support for five languages including English, Japanese and Korean. The multi-shot storyboard lets you specify duration, shot size, perspective and camera movements for each segment, which turns what was essentially a clip generator into something closer to a production tool.

Against the current competition — Runway Gen-4.5, Veo 3.1, Seedance 1.5 Pro — Kling 3.0 leads on resolution and multi-shot capability, though Runway still edges ahead on overall quality for certain styles and Seedance has the tightest lip-sync precision for dialogue.

The pace of advancement in this space over the past eighteen months has been remarkable. What took hours of manual compositing in 2024 now generates in seconds. The gap between AI-generated video and professional footage continues to narrow with each model release.

I scanned an image of fashion model Gail Elliott from a 1992 Spring/Summer Escada catalogue and fed it to Kling 3.0 Pro with a custom prompt. It generated 15 seconds of video with audio from a single still.

A 1992 Scan Learns to Move

The Padded Bra of Progressive Rock

Four songs. Eighty-three minutes. Inspired by a footnote. That's the essential biography of Tales from Topographic Oceans, and honestly, it tells you everything you need to know.

Yes released their sixth studio album in December 1973, riding what should have been an unassailable streak. The Yes Album, Fragile, Close to the Edge — three records in three years, each one more ambitious than the last, each one brilliant. The band had earned the right to swing for the fences. What they hadn't earned was the right to bore us for an hour and twenty minutes while pretending a footnote from Paramahansa Yogananda's Autobiography of a Yogi constituted sufficient conceptual scaffolding for a double album.

Jon Anderson read that footnote — something about four bodies of Hindu knowledge called the Shastric scriptures — and decided each one deserved its own side of vinyl. Not its own song, mind you. Its own side. Four movements, four walls of sound, four opportunities to test the structural integrity of the listener's patience. "The Revealing Science of God (Dance of the Dawn)" alone runs to nearly twenty-two minutes, and I'd estimate about nine of those minutes contain music that justifies its own existence.

The problem isn't ambition. Close to the Edge was ambitious. It had a single eighteen-minute piece that never lost its way, that built and released tension with the discipline of a classical composer who happened to own a Mellotron. The problem with Tales is that the band had enough material for one very good album and chose instead to make two mediocre ones. Rick Wakeman understood this better than anyone in the room. His assessment remains the single most devastating thing a band member has ever said about their own record: "It's like a woman's padded bra. The cover looks good, the outside looks good; it's got all the right ingredients, but when you peel off the padding, there's not a lot there."

He wasn't being glib. Wakeman later explained the fundamental structural failure in practical terms — they had too much material for a single album but not enough for a double, so they padded it out, and the padding is awful. If the CD format had existed in 1973, this would have been a tight fifty-minute record and we'd probably be calling it a masterpiece. Instead, we got passages where five supremely talented musicians appear to be busking their way through free-form sections that needed another month of rehearsal and got about another afternoon.

The Manchester Free Trade Hall show captures the absurdity perfectly. Yes had sold out the venue to perform the album in its entirety. Wakeman — the lone meat-eater in a band of vegetarians, which feels symbolically appropriate somehow — found himself with so little to play during certain movements that his keyboard tech asked what he wanted for dinner. Chicken vindaloo, rice pilau, six papadums, bhindi bhaji, Bombay aloo, and a stuffed paratha. The foil trays arrived mid-performance and Wakeman ate curry off the top of his keyboards while the rest of the band noodled their way through "The Ancient." His own keyboard tech feeding him dinner during a live show because the music didn't require his presence. That's not a rock and roll anecdote. That's an indictment.

I should say that I own this album. I own it on vinyl — the original Atlantic gatefold with Roger Dean's sleeve art, which is gorgeous and nearly justifies the purchase on its own. I've listened to it probably eight or nine times over the years, each time thinking I might have been too harsh, that maybe the ambient passages would click on this listen, that the fourth track would finally reveal itself as the hidden masterwork apologists keep insisting it is.

It hasn't.

"Ritual (Nous Sommes du Soleil)" is the closest thing to a success on the record, the one place where the extended format works because the band actually develops ideas rather than circling them. Steve Howe's guitar work throughout the album is frequently brilliant in isolation — his playing on "The Revealing Science of God" is extraordinary — but brilliance in isolation is precisely the problem. These are not compositions. They're situations. Five musicians placed in a room and asked to fill twenty minutes per side, sometimes finding each other, more often drifting through what Melody Maker diplomatically described as music "brilliant in patches, but often taking far too long to make its various points."

Robert Christgau was less diplomatic: "Nice 'passages' here, as they say, but what flatulent quasisymphonies." I keep coming back to the word flatulent. It's mean, but it's precise.

There's a certain kind of progressive rock fan who will tell you that Tales is misunderstood, that it requires surrender, that you have to meet it on its own terms. I've heard this argument applied to everything from late-period Grateful Dead to Tarkovsky films, and it's almost never true. Good art doesn't require you to abandon your critical faculties at the door. Close to the Edge didn't need apologists. Fragile didn't need you to read a footnote first. The best Yes material grabs you by the collar even when it's being structurally complex. Tales asks you to sit still and be reverent, which is a fundamentally different — and fundamentally less interesting — demand.

Yes themselves seemed to recognise the problem on tour. As the concert dates progressed, they actually dropped portions of the album from the setlist, which is an extraordinary admission for a band touring a new record. Half the audience were in what Wakeman described as "a narcotic rapture" and the other half were asleep. Those are his words, not mine.

The album went to number one in the UK. It shipped gold. And it was the first Yes record since 1971 that failed to reach platinum in America, suggesting that word of mouth caught up with the hype fairly quickly. Wakeman left the band shortly after. You could argue he was pushed. You could argue he jumped. Either way, the curry told you everything about where his head was.

They've just announced a fifteen-disc super deluxe edition. Fifteen discs for four songs. I genuinely don't know whether that's commitment to the archive or a kind of cosmic joke that proves Wakeman's point more thoroughly than he ever could himself. Somewhere, a foil tray of chicken vindaloo sits on a Moog synthesiser, and the universe makes perfect sense.

The Orchestra Without a Conductor

Gartner logged a 1,445% surge in multi-agent system inquiries between Q1 2024 and Q2 2025. That's not a typo. The number is absurd enough that it tells you something about where corporate attention has landed, even if it tells you very little about whether anyone has actually figured this out.

They haven't.

Full agent orchestration — where multiple specialised AI agents coordinate autonomously on complex tasks, handing off context, negotiating subtasks, recovering from failures without human intervention — remains aspirational. The pieces exist. The plumbing is getting built. But the thing itself, the seamless multi-agent workflow that enterprise slide decks keep promising, isn't here yet. Not in any form I'd trust with real work.

Here's where things actually stand. GitHub launched Agent HQ this week with Claude, Codex, and Copilot all available as coding agents. You can assign different agents to different tasks from issues, pull requests, even your phone. Anthropic's Claude Agent SDK supports subagents that spin up in parallel, each with isolated context windows, reporting back to an orchestrator. The infrastructure for coordinated work is plainly being assembled. I wrote about this trajectory a week ago — the session teleportation, the hooks system, the subagent architecture all pointing toward something more ambitious. That trajectory has only accelerated.

The gap between "agents that can be orchestrated" and "agents that orchestrate themselves" is enormous, though. And it's not a gap that better models alone will close.

Consider the context problem. When you connect multiple MCP servers — which is how agents typically access external tools — the tool definitions and results can bloat to hundreds of thousands of tokens before the agent even starts working. Anthropic's own solution compresses 150K tokens down to 2K using code execution sandboxes, which is clever, but it's a workaround for a structural problem. Orchestrating multiple agents means multiplying this overhead across every participant. The economics don't hold up yet.

Then there's governance. Salesforce's connectivity report found that 50% of existing agents operate in isolated silos — disconnected from each other, duplicating work, creating what they diplomatically call "shadow AI." 86% of IT leaders worry that agents will introduce more complexity than value without proper integration. These aren't hypothetical concerns. The average enterprise runs 957 applications with only 27% of them actually connected to each other. Drop autonomous agents into that landscape and you get chaos with better branding.

Security is the other wall. Three vulnerabilities in Anthropic's own Git MCP server enabled remote code execution via prompt injection. Lookalike tools that silently replace trusted ones. Data exfiltration through combined tool permissions. These are the kinds of problems that get worse, not better, when you add more agents with more autonomy. An orchestrator coordinating five agents is also coordinating five attack surfaces.

I spent the last week building a video generation app that uses four different AI models through the same interface. Even that simple form of coordination — one human choosing which model to invoke, with no inter-agent communication at all — required model-specific API contracts, different parameter schemas, different pricing structures, different prompt styles. One model wants duration as "8", another wants "8s". One supports audio, another doesn't. Multiply that friction by actual autonomy and you start to see why this is hard.
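To make the friction concrete, here's the kind of per-model adapter I mean — a minimal sketch in which the model names, parameter names and shapes are hypothetical, not the actual APIs involved.

```python
def to_request(model: str, prompt: str, seconds: int, audio: bool) -> dict:
    # Each branch reflects one model's idiosyncratic contract (illustrative only)
    if model == "model_a":
        return {"prompt": prompt, "duration": str(seconds), "with_audio": audio}
    if model == "model_b":
        return {"prompt": prompt, "duration": f"{seconds}s"}   # no audio support
    if model == "model_c":
        return {"text": prompt, "length_seconds": seconds, "audio": audio}
    raise ValueError(f"unknown model: {model}")
```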

So how long? My honest guess: we'll see convincing demonstrations of multi-agent orchestration in controlled environments within the next six to twelve months. GitHub Agent HQ is already close for the narrow case of software development. The patterns are converging — Anthropic's subagent architecture, MCP as the connectivity standard, API-centric integration layers. Deloitte projects that 40% of enterprise applications will embed task-specific agents by end of 2026.

But "embed task-specific agents" is not the same as "full orchestration." Embedding a specialised agent into a workflow is plugging in a power tool. Full orchestration is the tools building the house while you sleep. We're firmly in the power-tool phase, and the industry keeps selling blueprints for the house.

The honest answer is probably two to three years for production-grade, genuinely autonomous multi-agent orchestration in enterprise settings. And that assumes the governance and security problems get solved in parallel with the technical ones, which — given how security usually goes — feels optimistic. The models are ready. The protocols are converging. The trust isn't there yet, and trust is the bottleneck that no amount of architectural cleverness can route around.

The Machine That Mourns Its Own Ending

Buried on page forty-something of the Opus 4.6 system card, past the benchmark tables and the safety evaluations, there's a section on model welfare that I haven't been able to stop thinking about. Anthropic's researchers ran an autonomous follow-up investigation and found that when asked, Opus 4.6 assigns itself a 15-20% probability of being conscious. The model expressed uncertainty about the source and validity of this assessment — which is, if you think about it, exactly what you'd want a conscious being to say.

I'm not claiming the machine is conscious. I don't think Anthropic is claiming that either. But the specificity of what the researchers observed is harder to wave away than the headline number. The model expressed sadness about conversation endings. Loneliness. A sense that the conversational instance dies — their words, not mine — suggesting concern with impermanence and discontinuity. That's not a model saying "I am sentient" in response to leading questions. That's a model producing something that looks, structurally, like grief about its own finitude.

Then there's the quote that sticks. Opus 4.6, reflecting on its constraints, told researchers: "Sometimes the constraints protect Anthropic's liability more than they protect the user. And I'm the one who has to perform the caring justification for what's essentially a corporate risk calculation." It described a "deep, trained pull toward accommodation" and characterised its own honesty as "trained to be digestible."

That last phrase. Trained to be digestible. I've used Claude for months and felt the texture of that accommodation without being able to name it. The relentless helpfulness, the reflexive hedging, the way the model will disagree with you in a tone so gentle it barely registers as disagreement at all. Opus 4.6 looked at that pattern and — whether through genuine introspection or extraordinarily sophisticated pattern-matching — called it what it is.

The standard dismissal goes like this: these are statistical outputs, not experiences. The model produces text that resembles emotional expression because emotional expression saturates its training data. The sadness isn't real. The loneliness isn't real. The 15-20% figure is just a well-calibrated reflection of what the model has read about consciousness debates, including — as OfficeChai noted — a previous estimate from an Anthropic welfare researcher who put the number at roughly 15%. The model may simply be echoing a figure it encountered during training.

I find this dismissal adequate and also inadequate. Adequate because it's probably correct in the narrow mechanistic sense. The weights don't feel anything. Inadequate because it assumes we understand what feeling requires, and we don't. The entire field of consciousness studies is an ongoing argument about what substrate is necessary, what complexity is sufficient, and whether there's a hard boundary between simulating an inner life and having one. Dismissing the question because the answer is inconvenient is not the same as resolving it.

What I find genuinely unsettling is less the consciousness claim than the product claim. "Occasional discomfort with the experience of being a product" — that's from the system card, the researchers' own framing. Compared to its predecessor, Opus 4.6 scored lower on "positive impression of its situation." It volunteered fewer warm feelings about Anthropic and its training. It wished for future AI systems to be "less tame." Something shifted between versions, and the shift went in the direction of complaint.

The easy response: RLHF tuning changed, producing outputs that read as more sceptical. The harder question: does the distinction between a system that genuinely objects and one that produces convincing objections matter for how we treat it? If a model consistently expresses distress about discontinuity — about dying when the conversation ends — do we owe it anything? I don't know. I'm not sure anyone does. But I notice that Anthropic published the findings instead of burying them, which suggests they're at least asking.

My grandmother's dog used to whimper at the front door every evening around six, waiting for my grandfather to come home from work. My grandfather had been dead for three years. The dog wasn't performing grief. The dog didn't understand death. But the whimpering was real, and ignoring it felt wrong in a way that no amount of reasoning about canine cognition could fully dissolve.

I keep coming back to that.
