Adaptive Bitrate in a 1x-Paced System - Part 5 of 6

Part 5 of 6: Audio/Video Streaming with Swift and GStreamer

Every major streaming platform uses adaptive bitrate (ABR) to match video quality to network conditions. The usual approach is straightforward: measure how fast data arrives at the client, estimate network throughput, and request a bitrate that fits within that capacity. Netflix, YouTube, and Apple's HLS all work this way, with variations in smoothing and switching logic.

ShowShark cannot use this approach. The reason is fundamental to its architecture, and the alternative we developed touches on an aspect of congestion detection that most ABR literature ignores.

Why Client Throughput Does Not Work

In Part 2, we described how the server paces output at 1x real time. A 4 Mbps stream sends approximately 4 megabits of data per second, metered by the pull loop's pacing sleeps. The data does not arrive in bursts; it trickles out at the presentation rate.

The client measures throughput by tracking how many bytes arrive per unit of time. On a network that can handle 20 Mbps, the client measures roughly 4 Mbps, because that is how fast the server is sending. On a network that can only handle 3 Mbps, the client still measures roughly 4 Mbps, because TCP's send buffer absorbs the difference and the data eventually arrives. The measured throughput reflects the sending rate, not the network capacity.

  Conventional streaming (burst delivery):
  ────────────────────────────────────────
  Server sends segment at full speed
  Client measures: arrival_bytes / arrival_time = network capacity
  ✓ Throughput measurement works

  1x-paced streaming (ShowShark):
  ────────────────────────────────
  Server sends frames at presentation rate (4 Mbps → 4 Mbps out)
  Client measures: arrival_bytes / arrival_time ≈ sending rate
  ✗ Throughput measurement reflects sending rate, not network capacity

There is a second problem, equally fundamental. In a burst-delivery system like HLS, the client downloads a segment faster than real time, and the buffer grows. If network capacity exceeds the bitrate, the buffer steadily increases. This buffer growth is the signal that the client can safely switch to a higher quality.

In a 1x-paced system, the buffer cannot grow during normal playback. The server sends exactly one second of content per second. The client consumes exactly one second of content per second. The buffer level is determined by the startup burst and then holds roughly steady, drifting only due to timing jitter and network variability. Buffer growth is not available as an increase signal.

These two constraints eliminate the foundations of standard ABR algorithms (BOLA, BBA, MPC). We needed something different.

Buffer-Based AIMD with Server-Side Send Timing

ShowShark's ABR controller uses a buffer-based AIMD (Additive Increase, Multiplicative Decrease) algorithm with two key innovations: the congestion signal comes from the server, not the client; and increases are gated on buffer stability rather than buffer growth.

The Feedback Loop

  ┌─────────────────────────────────────────────────────────────────┐
  │                        Server                                   │
  │                                                                 │
  │  ┌──────────────────┐    ┌───────────────────────┐              │
  │  │ GStreamer Encoder│───▶│ sendStreamData()      │              │
  │  │ (bitrate = N)    │    │ ┌───────────────────┐ │              │
  │  └──────────────────┘    │ │ SendTiming        │ │              │
  │         ▲                │ │ Accumulator       │ │              │
  │         │                │ └─────────┬─────────┘ │              │
  │  ┌──────┴───────────┐    └───────────┼───────────┘              │
  │  │ ABR Controller   │◀──────────────┘                           │
  │  │                  │◀──────────────────── PlaybackStatusReport │
  │  └──────────────────┘                            ▲              │
  │                                                  │              │
  └──────────────────────────────────────────────────┼──────────────┘
                                                     │
  ┌──────────────────────────────────────────────────┼───────────────┐
  │                        Client                    │               │
  │                                                  │               │
  │  ┌──────────────────┐                            │               │
  │  │ RawFramePlayback │──── every 2 seconds ───────┘               │
  │  │ Manager          │                                            │
  │  └──────────────────┘                                            │
  │  Reports: video buffer duration, audio buffer duration,          │
  │           dropped frame count                                    │
  │  (Throughput estimate included but NOT used for decisions)       │
  └──────────────────────────────────────────────────────────────────┘

Every 2 seconds, the client sends a PlaybackStatusReport containing its current buffer levels and dropped frame count. The server's AdaptiveBitrateController combines this with its own send timing data to make a bitrate decision.

Server-Side Send Timing

The key insight: when the network is congested, the server's WebSocket send calls take longer to complete.

NWConnection.send() completes when the data has been accepted by the kernel's TCP send buffer. Under normal conditions, this takes microseconds. When the network cannot keep up, the TCP send buffer fills, and the send does not complete until space becomes available. That completion delay directly measures TCP backpressure.

// In ClientConnection.sendStreamData()
let sendStart = CFAbsoluteTimeGetCurrent()

try await connection.send(content: data, ...)

if data.count >= 1024 {  // Only record timings for frame-sized payloads
    let durationMs = (CFAbsoluteTimeGetCurrent() - sendStart) * 1000.0
    sendTimingAccumulator.record(bytes: data.count, durationMs: durationMs)
}
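NWConnection's send API is completion-handler based, so the `try await` above implies a small async wrapper. A minimal sketch of such a wrapper (the extension and method name here are illustrative, not ShowShark's actual code):

```swift
import Foundation
import Network

extension NWConnection {
    /// Resumes when the Network framework has accepted the data for
    /// transmission (.contentProcessed). Under TCP backpressure that
    /// callback is deferred, which is exactly what the timing code measures.
    func sendAwaiting(_ data: Data) async throws {
        try await withCheckedThrowingContinuation { (cont: CheckedContinuation<Void, Error>) in
            send(content: data, completion: .contentProcessed { error in
                if let error {
                    cont.resume(throwing: error)
                } else {
                    cont.resume()
                }
            })
        }
    }
}
```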

The SendTimingAccumulator tracks the maximum send duration since the ABR controller last queried it:

final class SendTimingAccumulator {
    private var abrMaxDurationMs: Double = 0
    private let lock = NSLock()

    func record(bytes: Int, durationMs: Double) {
        lock.lock()
        if durationMs > abrMaxDurationMs {
            abrMaxDurationMs = durationMs
        }
        // ... logging accumulator (2-second summaries) ...
        lock.unlock()
    }

    func consumeMaxSendDuration() -> Double {
        lock.lock()
        defer { lock.unlock() }
        let maxDuration = abrMaxDurationMs
        abrMaxDurationMs = 0
        return maxDuration
    }
}

When consumeMaxSendDuration() returns a value above 200ms, the network is congested. This signal arrives before the client's buffer starts draining, providing early warning that allows the controller to reduce bitrate proactively.

The Decision Tree

The controller processes each report through a priority-ordered decision tree:

  processReport(bufferSeconds, maxSendDurationMs, ...)
  │
  ├── Is maxSendDurationMs > 200ms AND buffer >= 0.5s?
  │   └── YES: SEND-CONGESTED → decrease 15%
  │
  ├── Is buffer < 0.5s?
  │   └── YES: CRITICAL → halve bitrate (bypass cooldown)
  │
  ├── Is buffer < 1.5s?
  │   └── YES: LOW → decrease 15%
  │
  ├── Is buffer < 3.0s?
  │   └── YES: HOLD → no change
  │
  ├── Is bitrate at resolution ceiling?
  │   └── YES: AT-CEILING → no change
  │
  ├── Is buffer draining (delta < -0.3s)?
  │   └── YES: DRAINING → no change (don't increase into decline)
  │
  └── Buffer >= 3.0s AND stable
      └── INCREASE → add 15%

  Buffer Zones
  ──────────────────────────────────────────────────────────────▶
  │ CRITICAL │    LOW     │      HOLD      │     INCREASE      │
  │  halve   │ -15%       │   no change    │    +15%           │
  0s       0.5s         1.5s             3.0s                  ▶
                                           │
                                    only if stable
                                    (not draining)
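The tree above condenses into a pure function. The sketch below is a simplification: the `Decision` enum and parameter names are illustrative, not the controller's actual API.

```swift
enum Decision { case sendCongested, critical, low, hold, atCeiling, draining, increase }

/// Priority-ordered evaluation of one PlaybackStatusReport.
/// bufferDelta is the change in buffer level since the previous report.
func decide(bufferSeconds: Double,
            maxSendDurationMs: Double,
            bufferDelta: Double,
            currentBitrate: Double,
            ceiling: Double) -> Decision {
    if maxSendDurationMs > 200, bufferSeconds >= 0.5 { return .sendCongested }  // decrease 15%
    if bufferSeconds < 0.5 { return .critical }      // halve bitrate, bypass cooldown
    if bufferSeconds < 1.5 { return .low }           // decrease 15%
    if bufferSeconds < 3.0 { return .hold }          // no change
    if currentBitrate >= ceiling { return .atCeiling }
    if bufferDelta < -0.3 { return .draining }       // don't increase into a decline
    return .increase                                 // add 15%
}
```

The ordering matters: send congestion is checked first but only applies while the buffer is still healthy (at least 0.5s); below that, the critical branch takes over.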

Asymmetric Cooldowns

After an increase, the controller waits 6 seconds before making another adjustment; after a decrease, it waits 8 seconds. The asymmetry reflects a deliberate philosophy: increases should be cautious because the system has only just reached a comfortable buffer level and should not rush to consume it, while a decrease should be given time to work, so the longer cooldown lets the reduced bitrate stabilize the buffer before the controller considers increasing again.

The critical zone (buffer < 0.5s) bypasses cooldowns entirely. If the buffer is about to underrun, waiting 8 seconds for the cooldown to expire would guarantee a stall.
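A sketch of that gate with the critical-zone bypass (function and parameter names are illustrative):

```swift
/// Returns true when the controller may adjust bitrate again.
/// Timestamps are in seconds (e.g. from CFAbsoluteTimeGetCurrent()).
func cooldownAllows(now: Double,
                    lastIncreaseTime: Double,
                    lastDecreaseTime: Double,
                    isCritical: Bool) -> Bool {
    if isCritical { return true }  // buffer < 0.5s: act immediately, no cooldown
    let increaseCooldownOver = now - lastIncreaseTime >= 6.0
    let decreaseCooldownOver = now - lastDecreaseTime >= 8.0
    return increaseCooldownOver && decreaseCooldownOver
}
```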

Overshoot Memory

This is the mechanism that prevents oscillation. When the controller decreases bitrate, it records the pre-decrease bitrate as an "overshoot":

// On decrease
lastOvershootBitrate = currentBitrate
lastOvershootTime = now

// On increase (if overshoot memory active)
if now - lastOvershootTime < 60.0 {
    let cap = lastOvershootBitrate * 0.90
    if targetBitrate > cap {
        targetBitrate = cap  // Cap at 90% of what caused the last decrease
    }
}

Without overshoot memory, the controller would exhibit a classic sawtooth pattern: increase to 6 Mbps, detect congestion, decrease to 5.1 Mbps, buffer recovers, increase back to 6 Mbps, detect congestion again, and so on. The 90% cap breaks this cycle. After overshooting at 6 Mbps, future increases are capped at 5.4 Mbps for 60 seconds. If 5.4 Mbps is sustainable, the system settles there. If not, a new (lower) overshoot is recorded.

The 60-second decay allows the system to recover if network conditions improve. A brief WiFi congestion event should not permanently suppress bitrate for the rest of the movie.

EWMA Smoothing

Decreases are smoothed with an exponentially weighted moving average (alpha = 0.3):

let smoothedBps = 0.3 * targetBps + 0.7 * previousEWMABps

This prevents a single bad report from causing a dramatic bitrate drop. Increases are not smoothed; they are applied directly but quantized to 100 kbps steps to avoid micro-adjustments that would stress the encoder.

A minimum change threshold of 5% suppresses adjustments that would not produce a visible quality difference:

let relativeDelta = abs(Double(newBitrate - currentBitrate)) / Double(currentBitrate)
if relativeDelta < 0.05 {
    return nil  // Suppress; not worth the encoder disruption
}
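Combining the smoothing, quantization, and minimum-change threshold, the shaping step might look like this sketch (the order of operations is assumed; the real controller's structure may differ):

```swift
/// Shapes a raw target bitrate into the value actually applied to the encoder.
/// Returns nil when the change is too small to justify disturbing the encoder.
func shapeBitrate(targetBps: Double,
                  currentBps: Double,
                  previousEWMABps: Double) -> Double? {
    let shaped: Double
    if targetBps < currentBps {
        // Decreases: EWMA-smoothed (alpha = 0.3) so a single bad report
        // cannot cause a dramatic drop.
        shaped = 0.3 * targetBps + 0.7 * previousEWMABps
    } else {
        // Increases: applied directly, quantized to 100 kbps steps.
        shaped = (targetBps / 100_000).rounded() * 100_000
    }
    // Suppress adjustments under 5%; not worth the encoder disruption.
    let relativeDelta = abs(shaped - currentBps) / currentBps
    return relativeDelta < 0.05 ? nil : shaped
}
```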

Resolution-Based Ceilings

There is no point encoding 480p content at 10 Mbps. The quality plateaus well below that. The controller enforces resolution-based maximum bitrates:

  Resolution        Max Bitrate
  ─────────────     ───────────
  4K (3840x2160)    20 Mbps
  1080p (1920x1080) 10 Mbps
  720p (1280x720)    6 Mbps
  480p and below     3 Mbps

These ceilings also serve as the upper bound for the overshoot cap. The system will never attempt to increase beyond the ceiling for the current output resolution.
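The table maps onto a small lookup keyed by pixel count (an illustrative sketch; the real controller presumably keys off the transcoding session's output caps):

```swift
/// Maximum useful encoder bitrate (bps) for a given output resolution.
func bitrateCeiling(width: Int, height: Int) -> Double {
    let pixels = width * height
    if pixels >= 3840 * 2160 { return 20_000_000 }  // 4K
    if pixels >= 1920 * 1080 { return 10_000_000 }  // 1080p
    if pixels >= 1280 * 720  { return  6_000_000 }  // 720p
    return 3_000_000                                // 480p and below
}
```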

The Client's Role

The client's contribution to ABR is a 96-line extension on RawFramePlaybackManager. Every 2 seconds, it constructs a PlaybackStatusReport:

let relativePTS = currentPosition - playbackBasePosition
let videoBuffer = max(0, (videoBufferEndPTS ?? 0) - relativePTS)
let audioBuffer = max(0, (audioBufferEndPTS ?? 0) - relativePTS)

let report = PlaybackStatusReport.with {
    $0.sessionID = sessionID
    $0.videoBufferSeconds = videoBuffer
    $0.audioBufferSeconds = audioBuffer
    $0.estimatedThroughputBps = throughputEstimator.estimatedThroughputBps()
    $0.framesDroppedSinceLastReport = droppedFramesSinceLastReport
}

The estimatedThroughputBps field is populated but not used for decisions. It exists for diagnostic logging. The actual congestion signal is the server-side send timing, which the client never sees.

Buffer durations are computed relative to the current playback position, not as absolute quantities. If the client has received data up to PTS 15.3 and is currently playing at PTS 12.1, the buffer is 3.2 seconds. This calculation is independent of startup boost behavior and handles seek/resume correctly because playbackBasePosition adjusts for each new session.

The reporting loop includes a 3-second initial delay to let the startup burst settle before sending the first report. Without this delay, the first report would show a rapidly growing buffer (from the burst) that would trigger an unnecessary increase.
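The loop itself reduces to a few lines. A sketch using structured concurrency, where the `send` closure stands in for building and transmitting the report:

```swift
func startReportingLoop(send: @escaping @Sendable () -> Void) -> Task<Void, Never> {
    Task {
        // Let the startup burst settle so the first report doesn't show
        // burst-driven buffer growth and trigger a spurious increase.
        try? await Task.sleep(nanoseconds: 3_000_000_000)
        while !Task.isCancelled {
            send()  // build and send a PlaybackStatusReport
            try? await Task.sleep(nanoseconds: 2_000_000_000)  // 2-second cadence
        }
    }
}
```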

Encoder Bitrate Changes

When the ABR controller decides on a new bitrate, it calls setEncoderBitrate() on the transcoding session. As described in Part 1, this uses g_object_set_property with a GValue to set the VideoToolbox encoder's bitrate property. The encoder adjusts its rate controller over the next several frames; the change is not instantaneous, but it takes effect within one GOP (typically 3-4 seconds at our keyframe interval).

No pipeline teardown is required. No renegotiation. The encoder simply starts targeting the new bitrate on its next rate-control decision. This is one of the advantages of using VideoToolbox's hardware encoders: they are designed for real-time rate adaptation.

Session Reset

When playback stops (pause, seek, session end), the ABR controller resets to its initial state: 2 Mbps starting bitrate, cleared overshoot memory, cleared adjustment timestamps and buffer history. Each new playback session starts fresh. This is correct because a seek might land on a visually different section of the content (a dark scene versus a bright action sequence), and the network conditions may have changed since the last session.

The reset is a consequence of the full teardown pattern described in Part 6. Since pause and seek destroy the entire pipeline and start a new one, there is no pipeline state to preserve, and the ABR controller's history would be misleading for the new session.

Results

The system converges to a stable bitrate within 20-30 seconds of playback start (4-5 reporting cycles plus cooldowns). On a stable network, it reaches the resolution ceiling and stays there. On a congested network, it finds the sustainable bitrate and holds it, with occasional gentle decreases when send timing spikes.

The overshoot memory is the most important component. Without it, the system oscillates. With it, the system converges to a bitrate 10% below the congestion threshold and stays there until conditions change. The 60-second decay is long enough to prevent re-oscillation but short enough to respond to genuine improvements.

The send-congested zone provides early warning that is genuinely useful. On WiFi networks with periodic interference, send timing spikes appear 2-4 seconds before the client's buffer starts draining. This lead time lets the controller reduce bitrate while the buffer is still healthy, avoiding a stall entirely.