# Generate Structured Instruction

Description

Translates a user's text-based edit instruction and source image/mask into a detailed, machine-readable structured edit instruction in JSON format.

This endpoint uses the Gemini 2.5 Flash VLM bridge to understand the edit context. It returns only the JSON string and does not generate an image.

Context-Aware Masking

When a mask is provided, the VLM analyzes the specific region of interest in relation to the rest of the image. It generates a structured_instruction tailored to that area (e.g., matching lighting and perspective to the unmasked background) so that the edit integrates seamlessly when it is applied.

Why use this endpoint?
- Decoupling: Separates the "intent translation" step from the "image editing" step, so each step can be run, replaced, or scaled independently.
- Control & Auditability: Lets a human in the loop inspect, programmatically edit, or version the JSON before generating an image (e.g., in a custom UI).
- Consistency & Automation: Generate one structured_instruction and pass it to /v2/image/edit multiple times to create consistent, auditable variations.
- Hybrid Deployment: Use Bria's state-of-the-art VLM bridge via API while self-hosting the open-source FIBO image model on your own private cloud.

The resulting structured_instruction can be used as input for the /v2/image/edit endpoint.

---

Input Combination Rules
The request body must use exactly one of the following combinations:
* Global Instruction: images + instruction
* Masked Instruction: images + mask + instruction
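
The two combinations above can be sketched as a small payload builder. This is a minimal illustration, not an official client: the function name `build_payload` and the example URLs are hypothetical, and only the field names `images`, `mask`, and `instruction` come from this page.

```python
from typing import Optional


def build_payload(image: str, instruction: str, mask: Optional[str] = None) -> dict:
    """Build a request body for one of the two allowed input combinations.

    `image` and `mask` are either publicly available URLs or Base64 strings.
    """
    if not instruction.strip():
        raise ValueError("instruction is required")
    if not image:
        raise ValueError("exactly one source image is required")

    # Global instruction: images + instruction.
    # `images` is an array that must contain exactly one item.
    payload = {"images": [image], "instruction": instruction}

    # Masked instruction: images + mask + instruction.
    if mask is not None:
        payload["mask"] = mask
    return payload


# Hypothetical example URLs, for illustration only.
global_body = build_payload("https://example.com/room.png", "make the sky blue")
masked_body = build_payload(
    "https://example.com/room.png",
    "add a cat",
    mask="https://example.com/room_mask.png",
)
```

Wrapping the single image in a list inside the helper keeps callers from accidentally sending zero or several images.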

---

API Access

You can register and obtain an API token through Bria's platform.

Endpoint: POST /structured_instruction/generate

## Header parameters:

  - `api_token` (string, required)

## Request fields (application/json):

  - `instruction` (string, required)
    Required. Text-based edit instruction (e.g., "make the sky blue", "add a cat"). This parameter serves as the text prompt.

  - `images` (array, required)
    Required. The source image to be edited. Publicly available URL or Base64-encoded. Must contain exactly one item.

  - `mask` (string)
    Optional. Publicly available URL or Base64-encoded mask image (black and white). Black areas are preserved; white areas are edited. If omitted, the edit applies to the entire image.

  - `seed` (integer)
    Optional. Seed for deterministic generation. If omitted, a random seed is generated and used.

  - `sync` (boolean)
    Specifies the response mode. Optional.
    - When false (default), the request is processed asynchronously: the API immediately returns a status URL to track progress.
    - When true, the request is processed synchronously: the API holds the connection open until the process is complete and then returns the final result in the response.

  - `ip_signal` (boolean)
    If true, returns a warning for potential IP content in the instruction. Optional.

  - `prompt_content_moderation` (boolean)
    If true, returns 422 on instruction moderation failure. Optional.

  - `visual_input_content_moderation` (boolean)
    If true, returns 422 on images or mask moderation failure. Optional.
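
As a rough sketch of how the header and request fields above fit together, the following builds (but does not send) a POST request using Python's standard library. The base URL `https://api.example.invalid` and the helper name `make_request` are placeholders assumed for illustration; substitute the base URL and token from your own Bria account.

```python
import json
import urllib.request


def make_request(base_url: str, api_token: str, payload: dict) -> urllib.request.Request:
    """Prepare a POST to /structured_instruction/generate without sending it."""
    url = base_url.rstrip("/") + "/structured_instruction/generate"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "api_token": api_token,  # header parameter documented above
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = make_request(
    "https://api.example.invalid",  # hypothetical base URL, not from this page
    "YOUR_API_TOKEN",
    {
        "images": ["https://example.com/room.png"],  # hypothetical image URL
        "instruction": "make the sky blue",
        "seed": 42,    # optional: fixed seed for deterministic generation
        "sync": True,  # optional: wait for the final result
    },
)
# Passing `request` to urllib.request.urlopen would perform the call;
# it is left out here so the sketch runs without network access.
```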

## Response 200 fields (application/json):

  - `result` (object, required)

  - `result.seed` (integer, required)

  - `result.structured_instruction` (string, required)

  - `request_id` (string, required)

  - `warning` (string)
    Returned only when ip_signal = true and the instruction field includes potential IP content.

## Response 202 fields (application/json):

  - `request_id` (string, required)

  - `status_url` (string, required)
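
A client typically needs to branch on whether it received the 200 or the 202 shape described above. The sketch below assumes the response body has already been decoded into a dictionary. The helper name `interpret_response` and the sample bodies are hypothetical, and decoding `structured_instruction` with `json.loads` assumes the string holds valid JSON, as the description of this endpoint suggests.

```python
import json
from typing import Any, Dict, Tuple


def interpret_response(status: int, body: Dict[str, Any]) -> Tuple[str, Any]:
    """Map a decoded response to ("done", instruction) or ("pending", status_url)."""
    if status == 200:
        # `result.structured_instruction` is documented as a string,
        # so it is decoded here before inspection or editing.
        return "done", json.loads(body["result"]["structured_instruction"])
    if status == 202:
        # Asynchronous mode: the caller tracks progress via `status_url`.
        return "pending", body["status_url"]
    raise ValueError(f"unexpected status code: {status}")


# Hypothetical sample bodies; the keys inside the instruction are made up.
sync_body = {
    "result": {"seed": 42, "structured_instruction": '{"example_key": "example_value"}'},
    "request_id": "req-1",
}
async_body = {"request_id": "req-2", "status_url": "https://api.example.invalid/status/req-2"}

state, instruction = interpret_response(200, sync_body)
pending_state, status_url = interpret_response(202, async_body)
```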

## Response 400 fields (application/json):

  - `error` (object, required)

  - `error.code` (integer, required)
    Example: 123

  - `error.message` (string, required)

  - `error.details` (string, required)

  - `request_id` (string, required)
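
One way to surface the 400 body above in client code is to convert it into an exception that keeps the `request_id` for support or logging. The class and function names are hypothetical, and the sample message and details are invented for illustration; only the field names and the example code 123 come from this page.

```python
from typing import Any, Dict


class StructuredInstructionError(Exception):
    """Carries the fields of a 400 response body."""

    def __init__(self, code: int, message: str, details: str, request_id: str) -> None:
        super().__init__(f"[{code}] {message}: {details} (request_id={request_id})")
        self.code = code
        self.message = message
        self.details = details
        self.request_id = request_id


def error_from_body(body: Dict[str, Any]) -> StructuredInstructionError:
    """Build an exception from a decoded 400 response body."""
    err = body["error"]
    return StructuredInstructionError(
        err["code"], err["message"], err["details"], body["request_id"]
    )


# Hypothetical sample body.
sample_error = error_from_body(
    {
        "error": {"code": 123, "message": "Bad request", "details": "images must contain exactly one item"},
        "request_id": "req-3",
    }
)
```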


