# Realtime Video Background Removal API

Remove or change backgrounds from video streams in real time over a persistent WebSocket connection. Send JPEG frames, receive grayscale JPEG masks (black = background, white = foreground), and composite client-side.

[**Try out this capability in Bria's sandbox**](https://platform.bria.ai/video-editing/video-streaming-remove-background/sandbox)

Working end-to-end client (browser): [streaming-rmbg-example](https://github.com/bria-ai/streaming-rmbg-example)

> 💡 **Processing finite video files instead?** For batch workflows that need audio preservation, codec/container selection, or full editing capabilities, use the REST [`/remove_background`](/video-editing#operation/remove-background) endpoint.


## Performance

US-based clients can achieve **~24 FPS** real-time throughput with **<100 ms RTT** (~74 ms p50, ~97 ms p95).

Hitting the 24 FPS ceiling typically requires client-side optimizations — see [Client-Side Compositing & Optimization](#client-side-compositing--optimization) below.

## Connection


```
wss://streaming.prod.bria-api.com
```

## Authentication

Authenticate using either an API token **or** an OAuth access token — provide only one per connection.

### API token


```
wss://streaming.prod.bria-api.com?api_token=<YOUR_API_TOKEN>
```

### OAuth


```
wss://streaming.prod.bria-api.com?oauth=<YOUR_OAUTH_ACCESS_TOKEN>
```

## Message format

The protocol uses two message types over the same connection:

- **Binary messages** — video/audio frame data. 24-byte header + JPEG payload.
- **JSON messages** — session control, errors, and debug telemetry.
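
A receive loop can route the two by payload type. A minimal Python sketch (the function name is illustrative; most clients, including the `websockets` library used in the examples below, deliver binary messages as `bytes` and text messages as `str`):

```python
import json

def classify_message(message):
    """Route one incoming WebSocket message: binary frame vs. JSON control."""
    if isinstance(message, (bytes, bytearray)):
        return "binary", bytes(message)           # 24-byte header + JPEG payload
    event = json.loads(message)                   # error / frame_timing / etc.
    return event.get("type", "unknown"), event
```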


### Binary frame layout (24-byte header + payload)

| Offset | Size | Field | Type | Value |
|  --- | --- | --- | --- | --- |
| 0–3 | 4 B | Magic | string | `BRIA` (`0x42524941`) |
| 4 | 1 B | Version | uint8 | `3` |
| 5 | 1 B | App ID | uint8 | `1` (Remove Background) |
| 6 | 1 B | Media Type | uint8 | `1`=Video, `2`=Audio* |
| 7 | 1 B | Codec | uint8 | `1`=JPEG |
| 8–15 | 8 B | Frame ID | uint64 | Big-endian frame ID† |
| 16–23 | 8 B | PTS | int64 | Big-endian timestamp‡ |
| 24+ | var | Payload | bytes | JPEG frame (client→server) or mask (server→client) |


\* Video is processed and a mask is returned; audio passes through untouched.

† **Frame ID** is echoed by the server on the corresponding mask, so the client can pair masks with their source frames.

‡ **PTS** is opaque to the server and echoed back unchanged for client-side timing logic.

### Constants

| Constant | Value |
|  --- | --- |
| `MAGIC` | `BRIA` (`0x42524941`) |
| `VERSION` | `3` |
| `APP_RMBG` | `1` |
| `MEDIA_VIDEO` | `1` |
| `MEDIA_AUDIO` | `2` |
| `CODEC_JPEG` | `1` |
| `HEADER_SIZE` | `24` |


## Client → Server

### Video frame (binary)


```
[24-byte header][JPEG payload]
```

Encode each captured frame as JPEG, prepend the header, and send. Use a **monotonically increasing `frame_id`** so the server can echo it back on the corresponding mask.

### Stop session (JSON)


```json
{ "type": "stop" }
```

### Debug mode (JSON)

Enables per-frame timing telemetry on the server's responses.


```json
{ "type": "debug", "enabled": true }
```


```json
{ "type": "debug", "enabled": false }
```

## Server → Client

### Mask frame (binary)

Same 24-byte header layout as the inbound video frame. The payload is a **grayscale JPEG mask**, with `frame_id` echoed from the source frame.

> ⚠️ The mask is **not RGBA** and contains no color from the original frame. Use it as alpha against the matching source frame and composite client-side. See [Client-Side Compositing & Optimization](#client-side-compositing--optimization).


### Error (JSON)


```json
{ "type": "error", "message": "..." }
```

### Frame timing (JSON, debug only)

Returned only when debug mode is enabled.


```json
{
  "type": "frame_timing",
  "frame_id": 4,
  "frames_dropped_since_last": 0,
  "server_ms": {
    "decode_jpeg": 0.5,
    "pre": 0.17,
    "infer": 38.83,
    "post": 0.11,
    "encode_mask": 0.22,
    "engine_total": 40.22,
    "session_total": 41.34
  },
  "bytes_in": 60881,
  "bytes_out": 23254
}
```
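
These messages can be aggregated client-side to reproduce p50/p95 latency figures like those quoted under Performance. A small sketch (the helper name and the percentile method are illustrative):

```python
import statistics

def summarize_timings(timings):
    """Aggregate frame_timing dicts into p50/p95 server time and drop count."""
    totals = sorted(t["server_ms"]["session_total"] for t in timings)
    dropped = sum(t["frames_dropped_since_last"] for t in timings)
    p95_index = min(len(totals) - 1, int(0.95 * len(totals)))
    return {
        "p50_ms": statistics.median(totals),
        "p95_ms": totals[p95_index],
        "dropped": dropped,
    }
```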

## WebSocket close codes

| Code | Reason | Trigger |
|  --- | --- | --- |
| 1008 | `unauthorized` | Authentication failed or invalid credentials |
| 1013 | `capacity exceeded, please try again later` | Service is temporarily overloaded |
| 4003 | `session limit reached` | Session duration limit reached (plan restriction) |
| 4008 | `session timeout` | No media frames received within allowed time |
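
A client might map these codes to reconnect behavior along these lines (the policy and delays below are illustrative, not prescribed by the API):

```python
def reconnect_policy(close_code):
    """Decide whether to reconnect after a close, and how long to wait.

    Returns (should_retry, delay_seconds). Delays are illustrative.
    """
    if close_code == 1008:       # unauthorized: retrying won't help
        return False, 0.0
    if close_code == 1013:       # capacity exceeded: back off and retry
        return True, 5.0
    if close_code == 4003:       # session limit reached (plan restriction)
        return False, 0.0
    if close_code == 4008:       # session timeout: safe to reconnect
        return True, 1.0
    return True, 2.0             # unknown code: conservative retry
```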


## Client-side compositing & optimization

The API returns **mask JPEGs only**. Your integration is responsible for:

- Producing video frames as JPEG with monotonically increasing `frame_id` values.
- Keeping each source frame (or its JPEG) until the matching mask returns.
- Compositing mask × foreground (and any background) on your side. Backgrounds are **never inferred by the server** — solid colors, images, or video behind the subject are entirely client-side.


### Algorithm patterns

1. **Capture and encode on a steady clock** — Typically once per display frame (or your pipeline's tick). Run JPEG encode and heavy blending **off the UI thread** (worker, native module, GPU queue) so you do not block interaction or capture.
2. **Latest-frame-wins (unsent queue)** — If encoding or the network falls behind, **drop older frames that were never sent** and keep only the **newest** ready-to-send JPEG. A deep FIFO of unsent frames adds latency without improving quality.
3. **Bounded in-flight (AIMD-style window)** — Cap how many frames you have **sent but not yet received a mask for**. Grow that cap slowly when round-trip times are stable, **shrink it** when RTT spikes or when sends time out ("stale"). This self-throttles bandwidth and protects the server and your own memory.
4. **Same binary envelope both ways** — Outbound video and inbound masks both use the same 24-byte header + JPEG payload; parse inbound `frame_id` with the same rules you used when packing outbound frames.
5. **Mask handling** — Decode the mask JPEG to a bitmap (single-channel / grayscale weights after compression). **Look up the source frame for that `frame_id`**, multiply foreground by mask alpha, add background where you want it, output RGBA or premultiplied BGRA as your renderer needs. Optionally **serialize or cap** compositor work so a burst of replies cannot stall the app.
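
Patterns 2 and 3 can be combined in one small sender-side gate. The sketch below (class and method names are illustrative) keeps only the newest unsent frame and caps frames that are sent but not yet answered with a mask:

```python
class FrameGate:
    """Latest-frame-wins queue plus a bounded in-flight window."""

    def __init__(self, max_in_flight=4):
        self.max_in_flight = max_in_flight
        self.pending = None          # newest ready-to-send (frame_id, jpeg)
        self.in_flight = set()       # frame_ids sent, mask not yet received

    def offer(self, frame_id, jpeg):
        # Drop any older unsent frame; keep only the newest.
        self.pending = (frame_id, jpeg)

    def next_to_send(self):
        # Release a frame only while the in-flight window has room.
        if self.pending is None or len(self.in_flight) >= self.max_in_flight:
            return None
        frame_id, jpeg = self.pending
        self.pending = None
        self.in_flight.add(frame_id)
        return frame_id, jpeg

    def on_mask(self, frame_id):
        self.in_flight.discard(frame_id)

    def shrink(self):
        # Multiplicative decrease on RTT spikes or stale sends.
        self.max_in_flight = max(1, self.max_in_flight // 2)

    def grow(self):
        # Additive increase while round-trips stay stable.
        self.max_in_flight += 1
```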


### Practical tuning

- Reduce **frame size** and/or **JPEG quality** to lower CPU and uplink bytes.
- Prefer **GPU compositing** for full-resolution blends; separating **encode** and **composite** work avoids one stage blocking the other.
- Throughput will track **network RTT**, **server capacity**, and **how fast you can produce JPEGs** — measure masks/sec vs. composite time separately if you instrument both.
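
The blend itself is a per-pixel linear interpolation between foreground and background, weighted by the mask. A pure-Python sketch for a single pixel (a real client would run this on the GPU or with a vectorized library):

```python
def composite_channel(fg, bg, mask):
    """Blend one 8-bit channel: mask 255 keeps foreground, 0 keeps background."""
    return (fg * mask + bg * (255 - mask)) // 255

def composite_pixel(fg_px, bg_px, mask_val):
    """Blend one RGB pixel (3-tuples) using a single grayscale mask value."""
    return tuple(composite_channel(f, b, mask_val) for f, b in zip(fg_px, bg_px))
```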


A browser app implementing all of the patterns above is available as a reference: [streaming-rmbg-example](https://github.com/bria-ai/streaming-rmbg-example).

## Examples

A complete end-to-end client (capture, send, receive, composite) lives in [streaming-rmbg-example](https://github.com/bria-ai/streaming-rmbg-example). The snippets below illustrate the wire protocol.

### JavaScript: connect and send


```js
const HEADER_SIZE = 24;
const VERSION = 3;
const APP = 1;
const MEDIA_VIDEO = 1;
const CODEC_JPEG = 1;

function buildWsUrl(serverUrl, auth) {
  const url = new URL(serverUrl);

  if (auth.apiToken) {
    url.searchParams.set("api_token", auth.apiToken);
  } else if (auth.oauth) {
    url.searchParams.set("oauth", auth.oauth);
  } else {
    throw new Error("Missing authentication");
  }

  return url.toString();
}

function buildHeader(frameId, ptsUs) {
  const buf = new ArrayBuffer(HEADER_SIZE);
  const view = new DataView(buf);

  // Magic "BRIA"
  view.setUint8(0, 0x42); // 'B'
  view.setUint8(1, 0x52); // 'R'
  view.setUint8(2, 0x49); // 'I'
  view.setUint8(3, 0x41); // 'A'

  view.setUint8(4, VERSION);
  view.setUint8(5, APP);
  view.setUint8(6, MEDIA_VIDEO);
  view.setUint8(7, CODEC_JPEG);

  // Big-endian 64-bit fields; BigInt avoids manual 32-bit splitting.
  view.setBigUint64(8, BigInt(frameId), false);
  view.setBigInt64(16, BigInt(ptsUs), false);

  return buf;
}

const ws = new WebSocket(buildWsUrl(
  "wss://streaming.prod.bria-api.com",
  { apiToken: "YOUR_TOKEN" }
));

ws.binaryType = "arraybuffer";

ws.onmessage = (event) => {
  if (typeof event.data === "string") return;

  const buf = event.data;
  // Same 24-byte header as outbound; remainder is the mask JPEG.
  const jpegMask = buf.slice(HEADER_SIZE);
  // Parse frame_id from the header, load the matching source frame you kept for that id,
  // decode both JPEGs, composite (foreground × mask + optional background).
  // Reference: https://github.com/bria-ai/streaming-rmbg-example
};
```

### JavaScript: parse mask header


```js
function unpackHeader(buffer) {
  const view = new DataView(buffer);

  // Verify magic "BRIA" (0x42524941)
  if (view.getUint32(0, false) !== 0x42524941) return null;

  return {
    frameId: Number(view.getBigUint64(8, false)),
    ptsUs: Number(view.getBigInt64(16, false)),
    payload: buffer.slice(24),
  };
}
```

### Python: connect and send


```python
import struct
import asyncio
import websockets

HEADER_SIZE = 24

HEADER_FMT = ">4sBBBBQq"  # magic, version, app_id, media_type, codec, frame_id, pts

def build_header(frame_id, pts_us):
    # 24 big-endian bytes: "BRIA", version 3, app 1 (Remove Background),
    # media 1 (video), codec 1 (JPEG), uint64 frame_id, int64 pts.
    return struct.pack(HEADER_FMT, b"BRIA", 3, 1, 1, 1, frame_id, pts_us)

def build_ws_url(server_url, api_token=None, oauth=None):
    if bool(api_token) == bool(oauth):
        raise ValueError("Provide exactly one auth method")

    sep = "&" if "?" in server_url else "?"
    if api_token:
        return f"{server_url}{sep}api_token={api_token}"
    return f"{server_url}{sep}oauth={oauth}"

async def main():
    uri = build_ws_url(
        "wss://streaming.prod.bria-api.com",
        api_token="YOUR_TOKEN",
    )

    async with websockets.connect(uri) as ws:
        # Send one frame: 24-byte header followed by the JPEG bytes.
        with open("frame.jpg", "rb") as f:
            jpeg_bytes = f.read()
        await ws.send(build_header(frame_id=0, pts_us=0) + jpeg_bytes)

        # Binary replies reuse the same header; the payload is the mask JPEG.
        reply = await ws.recv()
        frame_id = struct.unpack_from(">Q", reply, 8)[0]
        mask_jpeg = reply[HEADER_SIZE:]
        # Pair mask_jpeg with the stored source frame for frame_id,
        # decode both JPEGs, and composite client-side.
        # Reference: https://github.com/bria-ai/streaming-rmbg-example

asyncio.run(main())
```