From Swift to the Browser: Building a Web Client for a Native Streaming Server

ShowShark has native clients for iOS, tvOS, macOS, and visionOS. They all speak the same protocol: Protobuf messages over a TLS WebSocket. The server transcodes media in real time with GStreamer, and the clients decode H.264 or H.265 with VideoToolbox. It works well, and if every person who ever wanted to watch something owned an Apple device, this post would not exist.

Sadly, web browsers exist. And sometimes people want to watch things in browsers. So I built a web client in vanilla TypeScript, expecting the hardest part to be video decoding. It was not. The hardest part was everything else.

The Plan

The web client needed to:

  1. Serve a static page from the ShowShark server itself (no separate web server)
  2. Connect to the same WebSocket the native clients use
  3. Speak the same Protobuf protocol
  4. Decode H.264 video and AAC audio in real time
  5. Synchronize audio and video for playback
  6. Work in both Chrome and Safari (for the fun of it, I also compiled the Ladybird browser, but I could not get past what appears to be a lack of support for trusting self-signed certificates)

Requirements 1 through 5 were straightforward. Requirement 6 was where the trouble started.

Architecture

The web client sits alongside the existing server infrastructure. The HTTP server that originally served watchOS polling clients got promoted to also serve static files and a configuration endpoint:

  ShowShark Server
  ┌──────────────────────────────────────────────────────────┐
  │                                                          │
  │   Port 18080 (WSS)          Port 18083 (HTTPS)           │
  │   ┌──────────────┐          ┌──────────────────────┐     │
  │   │  WebSocket   │          │  HTTP Server         │     │
  │   │  Server      │◄─ wss ───│                      │     │
  │   │              │          │  GET /           → index.html
  │   │  (Protobuf   │          │  GET /main.js    → bundle  │
  │   │   messages)  │          │  GET /styles.css → CSS     │
  │   │              │          │  GET /api/config → JSON    │
  │   │              │          │  POST /login     → watchOS │
  │   └──────────────┘          └──────────────────────┘     │
  │         ▲                           ▲                    │
  │         │                           │                    │
  └─────────┼───────────────────────────┼────────────────────┘
            │                           │
     Binary Protobuf              HTTPS + Static
     over WebSocket               Files + Config
            │                           │
  ┌─────────┴───────────────────────────┴────────────────────┐
  │                                                          │
  │                    Web Browser                           │
  │                                                          │
  │   ┌────────────┐  ┌────────────┐  ┌────────────────────┐ │
  │   │ Connection │  │ Video      │  │ Audio              │ │
  │   │ Manager    │  │ Pipeline   │  │ Pipeline           │ │
  │   │            │  │            │  │                    │ │
  │   │ Protobuf   │  │ WebCodecs  │  │ WebCodecs          │ │
  │   │ serialize/ │  │ VideoDecoder  │ AudioDecoder       │ │
  │   │ deserialize│  │     │      │  │     │              │ │
  │   │            │  │     ▼      │  │     ▼              │ │
  │   │ Request/   │  │  Canvas    │  │ Web Audio API      │ │
  │   │ Response   │  │  drawImage │  │ ScriptProcessor    │ │
  │   │ correlation│  │            │  │ → Ring Buffer      │ │
  │   └────────────┘  └────────────┘  └────────────────────┘ │
  │                                                          │
  └──────────────────────────────────────────────────────────┘

The client loads index.html from port 18083 over HTTPS, fetches /api/config to discover the WebSocket port, and connects to port 18080 over WSS. Two ports, two TLS connections, one self-signed certificate. Remember this detail; it will matter.

Protobuf in TypeScript

The native clients use SwiftProtobuf, generated with protoc --swift_out. For the web client, I used @bufbuild/protobuf (the Protobuf-ES runtime from Buf), with its companion protoc-gen-es plugin generating TypeScript from the same .proto files:

# buf.gen.yaml
version: v2
plugins:
  - local: protoc-gen-es
    out: src/proto
    opt: target=ts
inputs:
  - directory: ../ShowShark/Common/Protocol

The build script compares timestamps on .proto files against generated _pb.ts files and regenerates only when needed. The generated TypeScript gives you typed message creation and binary serialization:

import { create, toBinary, fromBinary } from "@bufbuild/protobuf";
import { MessageEnvelopeSchema } from "./proto/MessageEnvelope_pb.js";

const envelope = create(MessageEnvelopeSchema, {
    requestId,
    messageType: MessageType.CONTROL_REQUEST,
    message: { case: "loginRequest", value: loginRequest },
});

const bytes = toBinary(MessageEnvelopeSchema, envelope);
ws.send(bytes);

This mirrors the native client pattern almost exactly. The MessageEnvelope wraps every message with a requestId for correlation and a oneof for the payload. Responses come back with a correlationId matching the original requestId. The ConnectionManager class keeps a Map<string, PendingRequest> with a 30-second timeout per request, which is the same correlation scheme the Swift WebSocketClient uses.
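
The correlation scheme can be sketched in a few lines. This is a minimal illustration, not ShowShark's actual ConnectionManager: the class name RequestCorrelator and its track/settle methods are hypothetical, and only the Map of pending requests and the 30-second default come from the description above.

```typescript
// Hypothetical sketch of request/response correlation with per-request timeouts.
type PendingRequest<T> = {
    resolve: (response: T) => void;
    reject: (reason: Error) => void;
    timer: ReturnType<typeof setTimeout>;
};

class RequestCorrelator<T> {
    private readonly pending = new Map<string, PendingRequest<T>>();
    private readonly timeoutMs: number;

    constructor(timeoutMs: number = 30_000) {
        this.timeoutMs = timeoutMs;
    }

    // Call when sending a request. The promise settles when a matching
    // response arrives or the timeout fires, whichever comes first.
    track(requestId: string): Promise<T> {
        return new Promise<T>((resolve, reject) => {
            const timer = setTimeout(() => {
                this.pending.delete(requestId);
                reject(new Error(`Request ${requestId} timed out`));
            }, this.timeoutMs);
            this.pending.set(requestId, { resolve, reject, timer });
        });
    }

    // Call for each incoming envelope. Returns false for unknown or late IDs.
    settle(correlationId: string, response: T): boolean {
        const entry = this.pending.get(correlationId);
        if (!entry) return false;
        clearTimeout(entry.timer);
        this.pending.delete(correlationId);
        entry.resolve(response);
        return true;
    }

    get pendingCount(): number {
        return this.pending.size;
    }
}
```

Deleting the entry on both paths (response and timeout) keeps the map from growing when the server never answers.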

One nice surprise: the @bufbuild/protobuf library is the only runtime dependency. The entire web client has exactly one entry in dependencies:

{
  "dependencies": {
    "@bufbuild/protobuf": "^2.0.0"
  }
}

No React. No Vue. No framework of the month. Just TypeScript, the DOM, and Protobuf.

The Bundling Detour

The first version used tsc to compile TypeScript to JavaScript, producing one .js file per .ts file with bare module specifiers like import { create } from "@bufbuild/protobuf". Browsers do not resolve bare specifiers. A <script type="module"> tag will happily try to fetch @bufbuild/protobuf as a URL path and get a 404.

The fix was esbuild, which resolves all imports at build time and bundles everything into a single entry file (plus shared chunks when code splitting applies):

npx esbuild src/main.ts \
    --bundle \
    --format=esm \
    --target=es2022 \
    --outdir=dist \
    --sourcemap \
    --splitting

Build time: about 150ms. The entire web client, including Protobuf runtime, compiles to roughly 180KB of JavaScript. No webpack configuration file the size of a novella.

The HTTPS Requirement

Here is the first web development surprise. The WebCodecs API (which we need for hardware-accelerated H.264 decoding) requires a Secure Context. That means the page must be served over HTTPS, localhost, or file://. Plain HTTP will not work; VideoDecoder simply does not exist in the global scope.
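
A startup check can tell these cases apart before any decoder is constructed. The following is a sketch under assumptions: the function name checkWebCodecs, the ScopeLike interface, and the status strings are hypothetical, while isSecureContext, VideoDecoder, and AudioDecoder are the real globals being probed.

```typescript
// Minimal shape of the global scope that this check cares about.
interface ScopeLike {
    isSecureContext?: boolean;
    VideoDecoder?: unknown;
    AudioDecoder?: unknown;
}

type WebCodecsStatus = "ok" | "insecure-context" | "unsupported";

// Hypothetical helper: classify why WebCodecs is or is not usable.
function checkWebCodecs(scope: ScopeLike): WebCodecsStatus {
    // Outside a secure context the decoder constructors are not exposed,
    // so report that case separately to give the user an actionable message.
    if (scope.isSecureContext === false) return "insecure-context";
    if (typeof scope.VideoDecoder !== "function" ||
        typeof scope.AudioDecoder !== "function") {
        return "unsupported";
    }
    return "ok";
}

// In a browser: checkWebCodecs(globalThis as ScopeLike)
```

Taking the scope as a parameter keeps the check testable outside a browser.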

This created a chain reaction. HTTPS requires TLS certificates. ShowShark already generates a self-signed certificate for its WebSocket server, so the HTTP server reuses the same certificate. But self-signed certificates trigger browser warnings. The user has to click through "Your connection is not private" to reach the page.

Then there is a second constraint: a page served over HTTPS cannot make ws:// (plain WebSocket) connections. The browser blocks it as mixed content. So the WebSocket connection must also use wss://, which means TLS, which means the same self-signed certificate on a different port.

  HTTPS page (port 18083)  ──────────────────────────────────►  OK after user accepts cert
       │
       │  tries ws://host:18080
       │
       └──► BLOCKED (mixed content)

       │  tries wss://host:18080
       │
       └──► needs separate cert acceptance (Safari)

Chrome handles this gracefully. Once you accept the self-signed certificate for the HTTPS page, Chrome trusts the same certificate on other ports for the same hostname. The WSS connection succeeds immediately.

Safari does not.

The Safari Certificate Problem

Safari treats each hostname:port combination as a separate TLS origin for certificate trust. Accepting the certificate on https://192.168.1.50:18083 does not grant trust to wss://192.168.1.50:18080. The WebSocket connection fails silently; the onerror event fires, followed by onclose, with no useful error message. The close code is 1006 (abnormal closure). There is no way to distinguish "the server is down" from "the certificate is not trusted."

The connection manager tracks consecutive failures:

private _hasEverConnected = false;
private _consecutiveFailures = 0;
private static readonly CERT_ERROR_THRESHOLD = 3;

// In the onclose handler:
if (!this._hasEverConnected) {
    this._consecutiveFailures++;
    if (this._consecutiveFailures >= ConnectionManager.CERT_ERROR_THRESHOLD) {
        this.onCertErrorCallback?.();
        return; // Stop auto-reconnecting
    }
}

After three failures with no prior successful connection, the client assumes it is a certificate trust issue and shows a dedicated UI:

function showCertTrustRequired(
    container: HTMLElement, wsUrl: string, conn: ConnectionManager
) {
    // Build an HTTPS URL for the WSS port
    const wssUrlObj = new URL(wsUrl);
    const certUrl = `https://${wssUrlObj.hostname}:${wssUrlObj.port}`;

    // Show instructions with a clickable link
    // "Open the link below, accept the certificate warning,
    //  then come back and click Retry."
    const link = document.createElement("a");
    link.href = certUrl;
    link.target = "_blank";
    link.textContent = certUrl;
    // ...
}

The user clicks the link, which opens a new tab to https://192.168.1.50:18080. Safari shows its certificate warning. The user clicks "visit this website." Now Safari trusts the certificate on port 18080. They close the tab, come back, click "Retry Connection," and the WSS handshake succeeds.

It is not elegant, but there is no API to programmatically prompt for certificate trust. The browser provides no mechanism to say "I know this cert is self-signed; trust it anyway" from JavaScript. You have to navigate the user to the HTTPS URL on the target port and let the browser's built-in UI handle it.

  Safari Certificate Trust Flow
  ─────────────────────────────

  1. User visits https://server:18083
     └── Safari: "certificate warning" → User accepts → page loads

  2. Page tries wss://server:18080
     └── Safari: silently rejects (different port = different origin)
     └── Client: onerror → onclose(1006) → retry
     └── Client: onerror → onclose(1006) → retry
     └── Client: onerror → onclose(1006) → 3 failures, show cert UI

  3. Cert UI shows link: https://server:18080
     └── User clicks → new tab → Safari: "certificate warning"
     └── User accepts → sees raw page (404 or empty)
     └── User closes tab, returns to web client

  4. User clicks "Retry Connection"
     └── wss://server:18080 → SUCCESS
     └── Normal auth + browsing begins

Chrome users never see this flow. Their experience is: accept one cert warning, everything works. Safari users get a two-step certificate dance. This is one of those browser differences that you discover only by testing, because no documentation warns you that port-scoped certificate trust is even a thing.

WebCodecs: The Good Part

Once the connection is established and authenticated, the server sends a StreamInitialization message containing the video dimensions, sample rate, channel count, and the raw SPS and PPS NAL units from the H.264 encoder. On native clients, these go straight to VideoToolbox. On the web, they go to the WebCodecs VideoDecoder:

const codecString = extractCodecString(init.spsNal);
// e.g., "avc1.640029" = High profile, level 4.1

const decoder = new VideoDecoder({
    output: (frame: VideoFrame) => {
        frameQueue.push({ frame, pts: frame.timestamp / 1_000_000 });
    },
    error: (error: DOMException) => {
        console.error("VideoDecoder error:", error.message);
    },
});

The codec string is extracted directly from the SPS NAL unit bytes. H.264 codec strings follow the format avc1.XXYYZZ where XX is the profile, YY is the compatibility flags, and ZZ is the level, all read from bytes 1, 2, and 3 of the SPS:

function extractCodecString(sps: Uint8Array): string {
    const profile = sps[1];
    const compatibility = sps[2];
    const level = sps[3];
    const hex = (n: number) => n.toString(16).padStart(2, "0");
    return `avc1.${hex(profile)}${hex(compatibility)}${hex(level)}`;
}

This was refreshingly straightforward. No guessing codec strings from a list; just read the bytes the encoder already produced.

WebCodecs: The Bad Part (AVCC vs. Annex B)

H.264 has two ways to frame NAL units. This is where things went sideways.

AVCC (also called "AVC format" or "MP4 format") prefixes each NAL unit with a 4-byte big-endian length. The decoder also needs an out-of-band avcC record containing the SPS and PPS. This is what you find inside MP4 containers and what VideoToolbox expects.

Annex B (also called "byte stream format") prefixes each NAL unit with the start code 0x00 0x00 0x00 0x01. SPS and PPS are sent in-band, prepended to keyframes. This is what you find in MPEG-TS streams and what many software decoders expect.

  AVCC framing:
  ┌──────────┬────────────────────┬──────────┬───────────────────┐
  │ 4 bytes  │   NAL unit data    │ 4 bytes  │   NAL unit data   │
  │ (length) │                    │ (length) │                   │
  └──────────┴────────────────────┴──────────┴───────────────────┘
  + separate avcC record with SPS/PPS

  Annex B framing:
  ┌──────────────┬───────────────┬──────────────┬───────────────────┐
  │ 00 00 00 01  │   SPS NAL     │ 00 00 00 01  │   PPS NAL         │
  ├──────────────┼───────────────┤──────────────┼───────────────────┤
  │ 00 00 00 01  │   IDR slice   │ 00 00 00 01  │   non-IDR slice   │
  └──────────────┴───────────────┴──────────────┴───────────────────┘
  SPS/PPS repeated before each keyframe

ShowShark's GStreamer encoder outputs AVCC. The native clients feed AVCC data directly to VideoToolbox. So the obvious approach for the web client was to configure WebCodecs in AVCC mode too:

// Build the avcC descriptor from SPS and PPS NAL units
function buildAvcCDescription(sps: Uint8Array, pps: Uint8Array): Uint8Array {
    // avcC box format:
    // 1 byte  configurationVersion = 1
    // 1 byte  AVCProfileIndication = sps[1]
    // 1 byte  profile_compatibility = sps[2]
    // 1 byte  AVCLevelIndication = sps[3]
    // 1 byte  lengthSizeMinusOne = 3 | 0xFC
    // 1 byte  numSPS = 1 | 0xE0
    // 2 bytes SPS length
    // N bytes SPS data
    // 1 byte  numPPS = 1
    // 2 bytes PPS length
    // M bytes PPS data
    const length = 11 + sps.length + pps.length;
    const desc = new Uint8Array(length);
    // ... pack the bytes ...
    return desc;
}

decoder.configure({
    codec: codecString,
    codedWidth: init.width,
    codedHeight: init.height,
    description: avcCDescription,  // ← the key part
});
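
The elided packing step follows directly from the byte layout in the comments. One way it could look, as a sketch rather than ShowShark's actual code: the name packAvcC is hypothetical, the SPS and PPS are assumed to be raw NAL units (no start code or length prefix), and the optional extension fields that the avcC format defines for High-profile streams are left out.

```typescript
// Hypothetical sketch: pack one SPS and one PPS into an avcC record.
function packAvcC(sps: Uint8Array, pps: Uint8Array): Uint8Array {
    const desc = new Uint8Array(11 + sps.length + pps.length);
    const view = new DataView(desc.buffer);
    let offset = 0;

    desc[offset++] = 1;          // configurationVersion
    desc[offset++] = sps[1];     // AVCProfileIndication
    desc[offset++] = sps[2];     // profile_compatibility
    desc[offset++] = sps[3];     // AVCLevelIndication
    desc[offset++] = 0xFC | 3;   // reserved bits + lengthSizeMinusOne (4-byte lengths)
    desc[offset++] = 0xE0 | 1;   // reserved bits + SPS count of 1

    view.setUint16(offset, sps.length); // big-endian by default
    offset += 2;
    desc.set(sps, offset);
    offset += sps.length;

    desc[offset++] = 1;          // PPS count of 1
    view.setUint16(offset, pps.length);
    offset += 2;
    desc.set(pps, offset);

    return desc;
}

// Illustrative (made-up) parameter sets: High profile, level 4.1.
const exampleSps = new Uint8Array([0x67, 0x64, 0x00, 0x29, 0xAC]);
const examplePps = new Uint8Array([0x68, 0xEE]);
const exampleAvcC = packAvcC(exampleSps, examplePps);
// exampleAvcC.length is 11 + 5 + 2 = 18 bytes
```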

This worked perfectly in Chrome. Frames decoded, video played, everyone was happy.

Then I opened Safari.

Safari's WebCodecs VideoDecoder, when configured with an avcC description, throws: "Decoder failure." Every frame, rejected. The decoder enters an error state immediately on the first decode() call.

Safari's VideoDecoder wants Annex B. No description in the config. NAL units with start codes instead of length prefixes. SPS and PPS prepended in-band to keyframes.

So I switched to Annex B for all browsers. This requires converting the server's AVCC output:

function avccToAnnexB(avccData: Uint8Array): Uint8Array {
    const view = new DataView(
        avccData.buffer, avccData.byteOffset, avccData.byteLength
    );
    const nalUnits: Uint8Array[] = [];
    let offset = 0;

    // Walk through length-prefixed NAL units
    while (offset + 4 <= avccData.length) {
        const nalLength = view.getUint32(offset);
        offset += 4;
        if (nalLength === 0 || offset + nalLength > avccData.length) break;
        nalUnits.push(avccData.subarray(offset, offset + nalLength));
        offset += nalLength;
    }

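    // Replace each length prefix with an Annex B start code.
    // Output size: one 4-byte start code plus the payload for every NAL unit.
    const totalLength = nalUnits.reduce((sum, nal) => sum + 4 + nal.length, 0);
    const result = new Uint8Array(totalLength);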
    let writeOffset = 0;
    for (const nalUnit of nalUnits) {
        result.set([0x00, 0x00, 0x00, 0x01], writeOffset);
        writeOffset += 4;
        result.set(nalUnit, writeOffset);
        writeOffset += nalUnit.length;
    }
    return result;
}

And prepending SPS/PPS to keyframes:

if (isKeyframe && spsAnnexB.length > 0) {
    // Concatenate: [start_code + SPS] [start_code + PPS] [converted frame data]
    data = new Uint8Array(
        spsAnnexB.length + ppsAnnexB.length + annexBData.length
    );
    data.set(spsAnnexB, 0);
    data.set(ppsAnnexB, spsAnnexB.length);
    data.set(annexBData, spsAnnexB.length + ppsAnnexB.length);
}
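
The spsAnnexB and ppsAnnexB buffers are just the raw parameter sets with a start code in front. A small sketch of how they might be prepared once, at stream initialization; the helper names withStartCode, concatBytes, and buildKeyframePayload are hypothetical, and the byte values in the usage lines are made up for illustration.

```typescript
const START_CODE = new Uint8Array([0x00, 0x00, 0x00, 0x01]);

// Prefix one raw NAL unit (no length prefix, no start code) with a start code.
function withStartCode(nal: Uint8Array): Uint8Array {
    const out = new Uint8Array(START_CODE.length + nal.length);
    out.set(START_CODE, 0);
    out.set(nal, START_CODE.length);
    return out;
}

// Concatenate byte arrays into a single buffer.
function concatBytes(...parts: Uint8Array[]): Uint8Array {
    const total = parts.reduce((sum, part) => sum + part.length, 0);
    const out = new Uint8Array(total);
    let offset = 0;
    for (const part of parts) {
        out.set(part, offset);
        offset += part.length;
    }
    return out;
}

// Keyframe payload: [start code + SPS][start code + PPS][Annex B frame data].
function buildKeyframePayload(
    sps: Uint8Array, pps: Uint8Array, annexBFrame: Uint8Array
): Uint8Array {
    return concatBytes(withStartCode(sps), withStartCode(pps), annexBFrame);
}

// Usage with made-up bytes:
const keyframe = buildKeyframePayload(
    new Uint8Array([0x67, 0x01]),                   // SPS
    new Uint8Array([0x68]),                         // PPS
    new Uint8Array([0x00, 0x00, 0x00, 0x01, 0x65])  // already-converted IDR slice
);
```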

Safari now decoded video perfectly. I opened Chrome to verify it still worked.

Chrome threw: "A key frame is required after configure()." Every frame, rejected. The decoder entered an error state on the first decode() call.

Chrome's VideoDecoder, when configured without a description, rejects all frames. It does not matter that the first frame is a keyframe with SPS/PPS prepended. Chrome requires the AVCC description in the config; without it, the decoder does not know how to interpret the incoming data.

To summarize:

                         With avcC description    Without description
                         (AVCC mode)              (Annex B mode)
  ┌──────────────────────────────────────────────────────────────────┐
  │  Chrome               ✓ works                 ✗ "key frame       │
  │                                                  required"       │
  │                                                                  │
  │  Safari               ✗ "Decoder failure"     ✓ works            │
  └──────────────────────────────────────────────────────────────────┘

There is no mode that works in both browsers. The WebCodecs specification does not mandate which framing format implementations must accept. Chrome's implementation (built on FFmpeg) wants AVCC. Safari's implementation (built on VideoToolbox, ironically the same decoder the native clients use) wants Annex B.

The solution is browser detection. Yes, user agent sniffing, the thing every web developer is told never to do:

const isSafari = /^((?!chrome|android).)*safari/i.test(navigator.userAgent);

The negative lookahead excludes Chrome (which includes "Safari" in its user agent string, because of course it does). The decoder then branches:

if (this.useAnnexB) {
    // Safari: no description, SPS/PPS in-band
    decoder.configure({
        codec: codecString,
        codedWidth: init.width,
        codedHeight: init.height,
    });
} else {
    // Chrome: avcC description, raw AVCC data
    const description = buildAvcCDescription(init.spsNal, init.ppsNal);
    decoder.configure({
        codec: codecString,
        codedWidth: init.width,
        codedHeight: init.height,
        description: description,
    });
}
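
The user agent test is easy to get subtly wrong, so it is worth exercising against a few strings. The sample user agents below are illustrative (version numbers vary), and the wrapper name prefersAnnexB is hypothetical; the regular expression is the one shown earlier.

```typescript
const SAFARI_UA = /^((?!chrome|android).)*safari/i;

// Hypothetical wrapper: true when the browser should get Annex B framing.
function prefersAnnexB(userAgent: string): boolean {
    return SAFARI_UA.test(userAgent);
}

const SAMPLE_USER_AGENTS = {
    desktopSafari:
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 " +
        "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    desktopChrome:
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 " +
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    androidChrome:
        "Mozilla/5.0 (Linux; Android 14) AppleWebKit/537.36 " +
        "(KHTML, like Gecko) Chrome/120.0.0.0 Mobile Safari/537.36",
    firefox:
        "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0",
};

for (const [name, ua] of Object.entries(SAMPLE_USER_AGENTS)) {
    console.log(name, prefersAnnexB(ua));
}
```

Only the Safari string matches. Chrome-based strings fail because "Chrome" appears before "Safari", and Firefox fails because "Safari" never appears.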

In Annex B mode, delta frames arriving before the first keyframe are silently dropped. The native clients do the same thing; the video decoder ignores frames until it has a reference frame to decode against.
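
The drop-until-keyframe rule amounts to a one-bit state machine. A minimal sketch, assuming each incoming frame carries a keyframe flag; the class name KeyframeGate and its methods are hypothetical.

```typescript
// Hypothetical gate: lets frames through only once a keyframe has been seen.
class KeyframeGate {
    private seenKeyframe = false;

    // Returns true if the frame should be passed on to decode().
    accept(isKeyframe: boolean): boolean {
        if (isKeyframe) this.seenKeyframe = true;
        return this.seenKeyframe;
    }

    // Call whenever the decoder is configured again and has no reference frame.
    reset(): void {
        this.seenKeyframe = false;
    }
}

// Usage: delta, delta, key, delta
const gate = new KeyframeGate();
const decisions = [false, false, true, false].map((isKey) => gate.accept(isKey));
// decisions is [false, false, true, true]
```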

I want to be clear about what happened here. The WebCodecs API is a W3C specification with a formal codec registry that defines how AVC (H.264) should work. The registry mentions both Annex B and avcC. But it leaves the choice of which to support up to the implementation. Chrome chose one. Safari chose the other. The spec's abstract language about "codec-specific considerations" translates to "good luck" in practice.

AAC Audio: A Quieter Incompatibility

The audio side had a similar but less dramatic browser split. The server sends AAC audio in ADTS framing (the framing used in MPEG-TS and raw AAC streams). Each ADTS frame has a 7-byte header (9 bytes when a CRC is present) containing the sample rate, channel count, profile, and frame length.

Chrome's AudioDecoder happily accepts a buffer containing multiple concatenated ADTS frames. It parses them internally, decodes each one, and outputs the corresponding AudioData objects.

Safari's AudioDecoder does not. If you feed it a buffer containing three ADTS frames, it decodes the first one and silently drops the rest. Or sometimes it throws an error. The behavior is not consistent.

Safari also requires an explicit AudioSpecificConfig descriptor in the AudioDecoder.configure() call. Chrome infers it from the ADTS headers, but Safari will not decode anything without the descriptor.

The AudioSpecificConfig for AAC-LC is exactly two bytes. It packs the audio object type (2 for AAC-LC), sample rate index (a lookup table mapping 48000 to 3, 44100 to 4, etc.), and channel configuration into 13 bits:

function buildAacDescription(
    sampleRate: number, channelCount: number
): Uint8Array {
    const freqIndex = AAC_SAMPLE_RATE_INDEX[sampleRate] ?? 3;
    const objectType = 2; // AAC-LC

    // Pack into 2 bytes: [TTTTT FFF] [F CCCC 000]
    const byte0 = (objectType << 3) | (freqIndex >> 1);
    const byte1 = ((freqIndex & 1) << 7) | (channelCount << 3);
    return new Uint8Array([byte0, byte1]);
}
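
For reference, here is the lookup table and a worked example. The table values are the standard MPEG-4 sampling frequency indices. The function is restated under the hypothetical name packAudioSpecificConfig so the block stands alone, with the same packing and the same fallback index as above.

```typescript
// MPEG-4 sampling frequency index table.
const AAC_SAMPLE_RATE_INDEX: Record<number, number> = {
    96000: 0, 88200: 1, 64000: 2, 48000: 3, 44100: 4, 32000: 5, 24000: 6,
    22050: 7, 16000: 8, 12000: 9, 11025: 10, 8000: 11, 7350: 12,
};

function packAudioSpecificConfig(
    sampleRate: number, channelCount: number
): Uint8Array {
    const freqIndex = AAC_SAMPLE_RATE_INDEX[sampleRate] ?? 3; // fall back to 48 kHz
    const objectType = 2; // AAC-LC

    // 5 bits object type, 4 bits frequency index, 4 bits channel config,
    // then 3 zero bits of padding.
    const byte0 = (objectType << 3) | (freqIndex >> 1);
    const byte1 = ((freqIndex & 1) << 7) | (channelCount << 3);
    return new Uint8Array([byte0, byte1]);
}

// Worked example, 48 kHz stereo:
//   objectType 2 -> 00010, freqIndex 3 -> 0011, channels 2 -> 0010
//   bits: 00010 0011 0010 000 -> 0x11 0x90
const stereo48k = packAudioSpecificConfig(48000, 2);
```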

Then each ADTS buffer needs to be parsed into individual frames, with headers stripped:

function parseAdtsFrames(data: Uint8Array): Uint8Array[] {
    if (data[0] !== 0xFF || (data[1] & 0xF0) !== 0xF0) {
        return []; // Not ADTS
    }

    const frames: Uint8Array[] = [];
    let offset = 0;

    while (offset + 7 <= data.length) {
        const hasCrc = (data[offset + 1] & 0x01) === 0;
        const headerSize = hasCrc ? 9 : 7;
        const frameLength =
            ((data[offset + 3] & 0x03) << 11) |
             (data[offset + 4] << 3) |
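            ((data[offset + 5] >> 5) & 0x07);

        // Stop on a corrupt or truncated frame instead of looping forever
        // (frameLength of 0) or emitting a short payload.
        if (frameLength < headerSize || offset + frameLength > data.length) {
            break;
        }
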
        frames.push(
            data.subarray(offset + headerSize, offset + frameLength)
        );
        offset += frameLength;
    }
    return frames;
}

Each raw AAC frame is then fed as a separate EncodedAudioChunk with a computed timestamp. AAC-LC frames carry 1024 samples per channel, so each subsequent frame's timestamp is offset by 1024 / sampleRate seconds.
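
The timestamp arithmetic looks like this in the microsecond units that EncodedAudioChunk expects. This is a sketch; the function name frameTimestampsUs and its parameters are hypothetical.

```typescript
const SAMPLES_PER_AAC_FRAME = 1024;

// Timestamps (in microseconds) for `frameCount` consecutive AAC-LC frames,
// starting at `baseTimestampUs`.
function frameTimestampsUs(
    baseTimestampUs: number, frameCount: number, sampleRate: number
): number[] {
    const frameDurationUs = (SAMPLES_PER_AAC_FRAME * 1_000_000) / sampleRate;
    // Multiply by the frame index instead of accumulating a rounded duration,
    // so rounding error does not build up across frames.
    return Array.from({ length: frameCount }, (_, i) =>
        Math.round(baseTimestampUs + i * frameDurationUs)
    );
}

// At 48 kHz one frame lasts 1024 / 48000 s, about 21.33 ms.
const timestamps = frameTimestampsUs(0, 3, 48000);
// timestamps is [0, 21333, 42667]
```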

This works in both browsers. Chrome did not mind receiving individual frames (it just means more decode() calls), and Safari requires it. The AudioSpecificConfig descriptor is accepted by both.

Audio Output: Ring Buffers and ScriptProcessorNode

The WebCodecs AudioDecoder outputs AudioData objects, but those are not sound; they are decoded PCM samples sitting in memory. Playing them requires the Web Audio API.

The ideal approach is an AudioWorkletNode, which runs a custom audio processing script on a dedicated thread. But a worklet's processor code has to be loaded as its own module through audioWorklet.addModule(), which means serving (or otherwise producing) a separate script outside the main bundle. For a single-page app served from a media server, the simpler option is the deprecated-but-functional ScriptProcessorNode:

const scriptNode = audioContext.createScriptProcessor(
    4096,  // buffer size
    0,     // no input channels
    2      // stereo output
);

scriptNode.onaudioprocess = (event) => {
    const output = event.outputBuffer;
    for (let ch = 0; ch < output.numberOfChannels; ch++) {
        const channelData = output.getChannelData(ch);
        // Copy from ring buffer to output, fill remainder with silence
    }
};

scriptNode.connect(gainNode);
gainNode.connect(audioContext.destination);

Between the decoder output and the ScriptProcessorNode sits a ring buffer: four seconds of Float32 samples per channel. A 10ms setInterval timer drains decoded AudioData from the decoder queue into the ring buffer. The onaudioprocess callback reads from the ring buffer into the audio output. If the ring buffer runs dry, the output is filled with silence (no glitches, just a brief gap).

This architecture decouples the decoder's output rate from the audio hardware's consumption rate. The decoder can burst-decode several frames at once; the ring buffer absorbs the burst; the ScriptProcessorNode consumes at a steady rate.
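
A single-channel ring buffer with this behavior fits in a few dozen lines. The sketch below is illustrative, not the production code: the class name FloatRingBuffer is hypothetical, and it assumes one writer and one reader on the same thread, which holds for a setInterval producer and an onaudioprocess consumer on the main thread.

```typescript
// Hypothetical single-channel ring buffer of Float32 samples.
class FloatRingBuffer {
    private readonly data: Float32Array;
    private readIndex = 0;
    private writeIndex = 0;
    private filled = 0;

    constructor(capacity: number) {
        this.data = new Float32Array(capacity);
    }

    get available(): number {
        return this.filled;
    }

    // Append samples. Returns how many were written; the excess is dropped
    // when the buffer is full.
    write(samples: Float32Array): number {
        const count = Math.min(samples.length, this.data.length - this.filled);
        for (let i = 0; i < count; i++) {
            this.data[this.writeIndex] = samples[i];
            this.writeIndex = (this.writeIndex + 1) % this.data.length;
        }
        this.filled += count;
        return count;
    }

    // Fill `out` from the buffer, padding with silence on underrun.
    // Returns the number of real samples copied.
    read(out: Float32Array): number {
        const count = Math.min(out.length, this.filled);
        for (let i = 0; i < count; i++) {
            out[i] = this.data[this.readIndex];
            this.readIndex = (this.readIndex + 1) % this.data.length;
        }
        out.fill(0, count);
        this.filled -= count;
        return count;
    }
}
```

Four seconds at 48 kHz would be new FloatRingBuffer(4 * 48000), one instance per channel. The onaudioprocess callback would call read(channelData) and get silence for any shortfall.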

A/V Synchronization

The native clients synchronize video to audio using a display link and the AVSampleBufferDisplayLayer's timing model. The web client uses a simpler approach: requestAnimationFrame paced by the AudioContext clock.

private renderFrame(): void {
    // Use AudioContext clock for A/V sync
    const elapsed = audioOutput.currentTime - this.audioStartTime;

    // Consume all frames whose PTS has arrived
    let frameToRender = null;
    while (true) {
        const nextPts = videoDecoder.peekNextPts();
        if (nextPts === null) break;

        const relativePts = nextPts - this.ptsOffset;
        if (relativePts <= elapsed + 0.005) {
            if (frameToRender) frameToRender.frame.close();
            frameToRender = videoDecoder.dequeueFrame();
        } else {
            break;
        }
    }

    // Render the most recent eligible frame
    if (frameToRender) {
        ctx.drawImage(frameToRender.frame, 0, 0, canvas.width, canvas.height);
        frameToRender.frame.close();
    }
}

The AudioContext.currentTime is the ground truth clock. Each animation frame, the renderer checks which video frames have a PTS at or before the current audio time (with 5ms tolerance for jitter). If multiple frames are eligible (because a render callback ran late, the tab was throttled, or the video's frame rate exceeds the display's refresh rate), it skips intermediate frames and renders only the most recent one. Skipped VideoFrame objects are explicitly closed to free GPU memory.

When audio is not playing (video-only, or audio still buffering), the renderer falls back to performance.now() wall-clock timing. This is less accurate but keeps the video moving.

The Full Teardown, Again

Pause, seek, and resume on the web client use the same full teardown pattern as the native clients (Part 6). Pause saves the current position, tears down the pipeline, and sends PausePlaybackRequest. Resume and seek both call startPlayback() at the new position, which reinitializes the entire decoder pipeline from scratch.

function handlePause() {
    session.isPaused = true;
    const position = session.renderer.playbackPosition;
    session.renderer.stop();
    session.audioOutput.suspend();
    session.abrReporter.stop();
    connection.sendFireAndForget(/* PausePlaybackRequest */);
}

function handleSeek(offset: number) {
    connection.sendFireAndForget(/* StopPlaybackRequest */);
    startPlayback(session.path, offset);
    // startPlayback reinitializes everything: decoders, renderer,
    // audio output, ABR reporting
}

This mirrors the native client's philosophy: starting from a clean state is simpler and more reliable than trying to flush and reconfigure a running pipeline. The server creates a fresh GStreamer pipeline at the requested position; the client creates fresh decoders. No stale state, no decoder flush races, no "sometimes the audio pops after seeking" bugs.

Server-Side: Dual-Citizen HTTP

On the Swift side, the HTTPServer actor grew from a single-purpose watchOS poll server into a dual-purpose server. The request routing looks like this:

private func routeRequest(_ request: HTTPRequest,
                          _ connection: NWConnection) async {
    switch (request.method, request.path) {
    case ("POST", "/login"):  handleLogin(request, connection)
    case ("POST", "/send"):   handleSend(request, connection)
    case ("GET",  "/poll"):   handlePoll(request, connection)
    case ("GET",  "/api/config"):
        handleApiConfig(request, connection)
    case ("GET", _):
        serveStaticFile(request.path, connection)
    default:
        send404(connection)
    }
}

Static file serving has the usual web server concerns: path traversal prevention (canonicalize the path and reject anything that escapes the web root), directory listing rejection, MIME type detection, and a 10MB file size guard. The web client's built output is bundled into the app at build time as a folder reference in Xcode.

The /api/config endpoint returns the ports the client needs:

{
    "wssPort": 18080,
    "wsPort": 18081,
    "httpPort": 18083
}

The web client reads this to construct its WSS URL. No hardcoded port numbers in the JavaScript.
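
On the client, that step might look like the sketch below. The ServerConfig fields match the JSON above; the function name buildWssUrl and the port validation are assumptions added for illustration.

```typescript
interface ServerConfig {
    wssPort: number;
    wsPort: number;
    httpPort: number;
}

// Hypothetical helper: derive the WebSocket URL from the page's hostname
// and the port advertised by /api/config.
function buildWssUrl(pageHostname: string, config: ServerConfig): string {
    const port = config.wssPort;
    if (!Number.isInteger(port) || port < 1 || port > 65535) {
        throw new Error(`Invalid wssPort in config: ${port}`);
    }
    return `wss://${pageHostname}:${port}`;
}

// In the browser (sketch):
//   const config: ServerConfig = await (await fetch("/api/config")).json();
//   const ws = new WebSocket(buildWssUrl(location.hostname, config));
//   ws.binaryType = "arraybuffer"; // Protobuf payloads arrive as binary frames

const exampleUrl = buildWssUrl("192.168.1.50", {
    wssPort: 18080, wsPort: 18081, httpPort: 18083,
});
// exampleUrl is "wss://192.168.1.50:18080"
```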

WebSocket Liveness: Browser Clients Cannot Ping

The native clients send periodic WebSocket pings to keep the connection alive. Browsers cannot do this. The WebSocket API in JavaScript provides no method to send a ping frame; pings are a protocol-level concept that the browser handles internally (or does not).

ShowShark's original liveness system relied on client-initiated pings and timed out connections that stopped pinging. Browser clients would get disconnected after the idle timeout.

The fix was server-initiated pings. The WebSocketServer now sends ping frames to all connected clients. For browser clients (which respond with pongs automatically, handled by the browser's WebSocket implementation), this keeps the connection alive without any JavaScript involvement. The server monitors pong responses and disconnects clients that stop responding.

What I Would Do Differently

If I were starting over, I would still use vanilla TypeScript without a framework. The app's DOM interactions are simple enough that a virtual DOM adds complexity without benefit. document.createElement and element.innerHTML = "" are not elegant, but they are predictable.

I would use AudioWorkletNode instead of ScriptProcessorNode from the start. The deprecated API works fine today, but it runs on the main thread, which means a heavy render frame can cause an audio glitch. A worklet runs on the audio rendering thread, so main-thread work cannot starve it.

I would also investigate whether there is a reliable way to detect the browser's preferred H.264 framing format at runtime, without user agent sniffing. The VideoDecoder.isConfigSupported() static method exists, but it tests codec support, not framing format preference. Two browsers can both return {supported: true} for the same VideoDecoderConfig and then disagree about whether the actual encoded data should use AVCC or Annex B framing.

The Scorecard

What worked well:

  • Protobuf over WebSocket. The same protocol, the same message types, the same correlation scheme. The web client is a first-class citizen of the server's message routing.
  • WebCodecs. Hardware-accelerated H.264 decoding in the browser is real and it is fast. Configuring it correctly is the hard part.
  • No framework. 9,600 lines of TypeScript, zero framework dependencies. The build is fast, the bundle is small, and the code does exactly what it says.
  • Full teardown pattern. Porting this from the native client to the web was trivial. The same architectural decision that simplified the Swift clients simplified the web client.

What was painful:

  • AVCC vs. Annex B. Two browsers, two incompatible expectations for the same standardized API. User agent sniffing is the only solution.
  • Safari certificate trust. Per-port trust scoping with no programmatic workaround. The "open this link and accept the warning" flow is the best UX possible given the constraints.
  • AAC frame splitting. Chrome parses concatenated ADTS frames; Safari requires them individually. An undocumented behavioral difference discovered through trial and error.
  • Mixed content blocking. An HTTPS page cannot use ws://. Obvious in retrospect; not mentioned in the WebCodecs documentation that mandates HTTPS in the first place.

The web client shipped in a single day, from first commit to video playing in a browser. Most of that day was spent on the browser-compatibility issues described above. The actual architecture (connection management, Protobuf integration, decoder pipeline, A/V sync) was straightforward; it is the same architecture as the native clients, translated into TypeScript.

The web is a platform where two implementations of the same specification can require mutually exclusive input formats. This is, apparently, normal.