Generative AI has historically operated as a "black box": you provide a text prompt, and the model returns an image. While this enables rapid creative iteration, it often lacks the precision, consistency, and reproducibility required for enterprise workflows.
Bria introduces a paradigm shift with Visual GenAI Language (VGL). This section explains the fundamental concepts behind VGL and how it transforms visual generation from a game of chance into a structured engineering discipline.
To achieve full control over visual generation—and to enable true automation—developers and enterprises need a language that goes beyond natural language prompts. They need a structured, explicit representation where every parameter is defined, independent, and readable by both humans and machines.
VGL is that language. It represents a move away from ambiguous prompt-based generation toward structured specification.
- Prompt-Based Generation: Relies on natural language. It is fast for creative brainstorming but interpretation varies between runs, making it difficult to reproduce specific results.
- VGL-Based Generation: Uses a structured specification. It is built for precision, offering reproducible, auditable outputs and the ability to systematically modify specific parameters.
Most generative models are opaque; you have no visibility into how the model interpreted your prompt or made its decisions. VGL changes this by "opening the black box".
The Fibo model family is trained natively on VGL. It is not a wrapper or a post-processing layer—VGL is the internal representation the models use. When you generate an image, you receive the complete VGL specification as part of the output.
This transparency means every visual attribute is exposed, including:
- Objects: Positions, shapes, and relationships.
- Lighting: Conditions, directions, and shadows.
- Camera: Angles, depth of field, and lens choices.
- Aesthetics: Mood, color palettes, and artistic styles.
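As a concrete illustration, the specification returned with an image might expose these sections as machine-readable fields. The sketch below is hand-written: the field names mirror the VGL schema described later on this page, but the values and exact nesting are assumptions, not actual model output.

```python
# Hand-written sketch of a VGL specification returned alongside an image.
# Field names follow the VGL schema; values and nesting are illustrative.
generated_vgl = {
    "objects": [
        {
            "description": "small bistro table",
            "bounding_box": [0.40, 0.60, 0.30, 0.25],  # normalized [x, y, w, h]
            "relative_distance": "midground",
        }
    ],
    "lighting": {
        "conditions": "soft afternoon light",
        "direction": "upper left",
        "shadows": "long and soft",
    },
    "photographic_characteristics": {
        "camera_angle": "eye level",
        "depth_of_field": "shallow",
        "lens_focal_length": "50mm",
    },
    "aesthetics": {
        "mood_atmosphere": "cozy",
        "color_palette": [[214, 137, 66]],  # RGB values
    },
}

# Every attribute the model used is inspectable after generation.
print(sorted(generated_vgl))
```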
VGL is designed to give you ownership over the AI's output through specific structural properties:
- Explicit: Every visual attribute is fully declared. Nothing is implied or hidden. If a specific lighting condition is used, it is stated in the VGL.
- Disentangled: You can modify one parameter without affecting others. For example, you can change the lighting direction without accidentally changing the object's texture or the background setting.
- Physically-Grounded: Parameters map to real-world dimensions. VGL deals in pixel-level space, camera optics, and physical lighting conditions rather than abstract concepts.
- Natively-Typed: Attributes use their natural data representations—semantics are words, quantities are numbers, positions are coordinates, and colors are RGB values.
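The disentanglement property can be sketched as a plain data edit: changing one parameter leaves every other field untouched. The field names below come from the VGL schema later on this page; the structure itself is illustrative.

```python
import copy

# Sketch: a disentangled edit changes exactly one parameter.
# Field names follow the VGL schema; values are illustrative.
spec = {
    "lighting": {"conditions": "overcast", "direction": "front", "shadows": "soft"},
    "objects": [{"description": "ceramic mug", "texture": "glossy"}],
}

edited = copy.deepcopy(spec)
edited["lighting"]["direction"] = "upper left"  # the only change

assert edited["objects"] == spec["objects"]      # texture and objects untouched
assert edited["lighting"]["shadows"] == "soft"   # other lighting fields untouched
assert edited["lighting"]["direction"] == "upper left"
```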
VGL transforms complex image manipulation into structured data operations. By treating images as data structures, VGL enables workflows that were previously impossible with standard prompting.
VGL transforms what would be difficult pixel-level edits into simple structured text edits. Because the system understands the scene structure (e.g., knowing that a "person" is an addressable object with specific clothing properties), you can programmatically execute logic like "change shirt colors" without manual masking or reprompting.
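A minimal sketch of such a structured edit, assuming the clothing property holds a free-text description as shown in the schema below (the exact string format is an assumption):

```python
# Sketch: recolor every person's shirt by editing the spec, not the pixels.
# The clothing string format shown here is an assumption.
spec = {
    "objects": [
        {"description": "woman at table", "gender": "female",
         "clothing": "blue shirt, dark jeans"},
        {"description": "chalkboard sign"},  # non-human object, no clothing
    ]
}

for obj in spec["objects"]:
    if "clothing" in obj:  # only human objects carry a clothing property
        obj["clothing"] = obj["clothing"].replace("blue shirt", "red shirt")

print(spec["objects"][0]["clothing"])  # -> red shirt, dark jeans
```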
In enterprise environments, knowing why an image looks the way it does is critical. With VGL, compliance teams can inspect every parameter used to generate an asset. Furthermore, the same VGL specification will produce the same output, allowing you to share specifications across teams or store them for future use.
Because VGL is machine-readable JSON, AI agents and programmatic systems can read, validate, transform, and chain VGL specifications without human interpretation. This allows for the creation of "Agentic Workflows" where systems autonomously analyze and modify visuals based on logic.
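For example, a pipeline step might validate an incoming VGL document before passing it on. The required-field list below is taken from the schema on this page; the helper itself is a hypothetical sketch, not a Bria API:

```python
import json

# Required top-level fields, per the VGL schema on this page.
REQUIRED = {"short_description", "background_setting", "lighting",
            "aesthetics", "style_medium"}

def missing_fields(vgl_json: str) -> list[str]:
    """Return the required fields absent from a VGL document (hypothetical helper)."""
    spec = json.loads(vgl_json)
    return sorted(REQUIRED - spec.keys())

doc = json.dumps({"short_description": "cafe scene", "style_medium": "photography"})
print(missing_fields(doc))  # -> ['aesthetics', 'background_setting', 'lighting']
```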
Instead of asking a model to "make it look like our brand" (which is abstract and unreliable), VGL allows you to encode brand guidelines as concrete parameters. You can lock specific color palettes, lighting conditions, and moods to ensure every generated asset is compliant by definition.
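One way to encode such guidelines is a merge step that overwrites the style-governing fields before generation. This is a hypothetical sketch: the brand values are invented and the merge strategy is an assumption, not a Bria-provided mechanism.

```python
# Hypothetical brand guidelines expressed as concrete VGL parameters.
BRAND = {
    "aesthetics": {
        "color_palette": [[0, 84, 166], [255, 255, 255]],  # brand RGB values
        "mood_atmosphere": "clean, confident",
    },
    "lighting": {"conditions": "bright, even studio light"},
}

def apply_brand(spec: dict) -> dict:
    """Overlay locked brand parameters onto a draft spec (illustrative merge)."""
    out = dict(spec)
    for section, values in BRAND.items():
        out[section] = {**spec.get(section, {}), **values}
    return out

draft = {"short_description": "product on a table",
         "lighting": {"direction": "front"}}
final = apply_brand(draft)

# Brand fields are locked; unrelated fields survive the merge.
assert final["lighting"] == {"direction": "front",
                             "conditions": "bright, even studio light"}
```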
The schema below defines every dimension of an image — what's in it, where things are, how it's lit, and how it feels.
- short_description string required The semantic anchor — a concise description of what the visual represents. Example: "short_description": "A warm and inviting outdoor cafe patio scene, featuring a woman enjoying a coffee at a small bistro table. A prominent A-frame chalkboard sign stands on cobblestones, announcing 'TODAY'S SPECIAL' and 'Pumpkin Spice.' Potted plants add greenery, and a glowing neon 'OPEN' sign is visible in the window behind. Paper coffee cups are on the table."
- objects array<object> Array of all discrete elements in the scene. Each object is fully specified with positioning, appearance, and relationships. Object Properties (Non-Human):
- description — What the object is
- bounding_box — Normalized coordinates [x, y, w, h] for exact pixel-wise positioning
- shape_and_color — Physical form and primary color characteristics. (Note: In the next version, shape_and_color will be replaced with primary_color and secondary_color as array<[R, G, B]>.)
- texture — Surface quality (matte, glossy, rough, smooth, etc.)
- appearance_details — Additional visual characteristics
- relationship — Spatial/semantic relationship to other objects
- orientation — Direction the object faces or is positioned
- relative_distance — Depth positioning (foreground, midground, background)

Human-Specific Properties (extends object): When an object represents a human, the following additional properties are available:
- age (number) — Apparent age in years
- gender (enum: male | female | NA) — Gender presentation
- pose (array<[pose_name, [x, y]]>) — Body keypoints with named positions and normalized coordinates
- expression — Facial expression and emotional state
- clothing — Attire and accessories (Note: In the next version, clothing will be replaced with array<object>)
- action — What the person is doing
- skin_tone_and_texture — Skin appearance description
- monk_scale — Monk Skin Tone Scale value 1-10
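Putting the shared and human-specific properties together, a single human object entry might look like the following sketch (all values are invented for illustration):

```python
# Illustrative human object entry combining shared and human-specific
# properties from the VGL schema. All values are invented.
person = {
    "description": "woman enjoying a coffee",
    "bounding_box": [0.55, 0.30, 0.30, 0.55],  # normalized [x, y, w, h]
    "relative_distance": "midground",
    "relationship": "seated at the bistro table",
    "age": 32,
    "gender": "female",
    "expression": "relaxed, content",
    "clothing": "cream knit sweater, gold earrings",
    "action": "lifting a paper coffee cup",
    "monk_scale": 4,  # Monk Skin Tone Scale, 1-10
}

assert 1 <= person["monk_scale"] <= 10
```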
- background_setting string required Environment, location, and scene context. Example: "background_setting": "European-style outdoor café patio with cobblestone ground, storefront with large windows, urban street setting, autumn season atmosphere"
- lighting object required Light characteristics in the scene.
- conditions — Overall lighting quality
- direction — Primary light source direction
- shadows — Shadow characteristics
- aesthetics object required Visual treatment and emotional qualities.
- composition — Framing and arrangement
- mood_atmosphere — Emotional quality
- color_palette — Global color governance (RGB array)
- photographic_characteristics object optional Camera and lens simulation properties.
- depth_of_field — Focus depth
- focus — Focus target and sharpness
- camera_angle — Viewpoint
- lens_focal_length — Simulated lens
- style_medium string required The rendering medium or visual style category. One of: "photography" | "digital illustration" | "3D render" | "vector art" | "watercolor" | "oil painting" | "pencil sketch" | "anime" | "pixel art"
- artistic_style string optional Specific stylistic references or artistic influences. Example: "artistic_style": "lifestyle photography, café culture aesthetic, Instagram-style warm tones"
- context string optional Narrative and situational framing that informs generation. Example: "context": "Autumn seasonal marketing image for café or coffee brand, conveying comfort and the pleasure of small moments"
- text_render array<object> optional Array of typography elements embedded in the visual.
- text — The actual text content
- color — Primary text color in RGB
- font — Font family or style description
- appearance_details — Additional styling
- bounding_box — Normalized coordinates for placement [x, y, w, h]
- relative_distance — Layer positioning relative to scene elements
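A text_render entry combining the properties above might look like this sketch (values are invented):

```python
# Illustrative text_render entry; property names follow the VGL schema,
# values are invented.
text_element = {
    "text": "TODAY'S SPECIAL",
    "color": [255, 255, 255],                  # primary text color in RGB
    "font": "hand-drawn chalk lettering",
    "appearance_details": "slightly uneven strokes, chalk dust texture",
    "bounding_box": [0.08, 0.55, 0.20, 0.08],  # normalized [x, y, w, h]
    "relative_distance": "foreground",
}

assert len(text_element["color"]) == 3  # an [R, G, B] triple
```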
VGL is the universal language spoken by the entire Fibo model family. Whether you are using Fibo Generation for high-fidelity hero images, Fibo Lite for high-volume production, or Fibo Edit for refinement, the VGL specification remains portable and consistent.
A VGL file created in one part of your pipeline can be executed by a different model in another part without translation, ensuring a seamless flow from concept to production.
Ready to implement? Proceed to the API Reference to see how to structure VGL requests.