# Dataset

Manage training datasets

## Create Dataset

 - [POST /tailored-gen/datasets](https://docs.bria.ai/tailored-generation/dataset/create-dataset.md): Create a new dataset.

Datasets are trained on structured JSON data (visual_schema). You must generate a visual schema via POST /tailored-gen/generate_visual_schema before marking the dataset as completed.

Completion Requirements:
A dataset needs at least 5 images before it can be marked as completed.

Upload types:
* Basic upload type: supports up to 200 images, uploaded as individual image files
* Advanced upload type: supports up to 5000 images, uploaded as a single zip file
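
As a rough illustration of the limits above, the following sketch picks an upload type from the number of images you plan to train on. The function name and the rule "use basic whenever it fits" are assumptions made for this example, not part of the API; only the numeric limits and the "basic"/"advanced" labels come from this page.

```python
BASIC_MAX_IMAGES = 200      # limit stated for the basic upload type
ADVANCED_MAX_IMAGES = 5000  # limit stated for the advanced upload type
MIN_IMAGES_TO_COMPLETE = 5  # minimum needed to mark a dataset as completed


def choose_upload_type(image_count: int) -> str:
    """Suggest "basic" or "advanced" for a planned number of images.

    Hypothetical helper: prefers the basic type whenever the count fits.
    """
    if image_count < MIN_IMAGES_TO_COMPLETE:
        raise ValueError(
            f"{image_count} images is below the minimum of {MIN_IMAGES_TO_COMPLETE}"
        )
    if image_count <= BASIC_MAX_IMAGES:
        return "basic"
    if image_count <= ADVANCED_MAX_IMAGES:
        return "advanced"
    raise ValueError(
        f"{image_count} images exceeds the maximum of {ADVANCED_MAX_IMAGES}"
    )
```

Nothing on this page prevents a small dataset from using the advanced type, so treat the result as a suggestion rather than a rule.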

## Get Datasets

 - [GET /tailored-gen/datasets](https://docs.bria.ai/tailored-generation/dataset/get-datasets.md): Retrieve a list of all datasets. If there are no datasets, returns an empty array.

## Get Datasets by Project

 - [GET /tailored-gen/projects/{project_id}/datasets](https://docs.bria.ai/tailored-generation/dataset/get-datasets-by-project.md): Retrieve all datasets for a specific project.

## Get Dataset by ID

 - [GET /tailored-gen/datasets/{dataset_id}](https://docs.bria.ai/tailored-generation/dataset/get-dataset-by-id.md): Retrieve a specific dataset including its images.

## Update Dataset

 - [PUT /tailored-gen/datasets/{dataset_id}](https://docs.bria.ai/tailored-generation/dataset/update-dataset.md): Update a dataset.

You can update visual_schema only when the dataset status is draft.
  
Completion Requirements:
To set status to completed, the dataset must have at least 5 images.
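
The two rules above can be checked on the client before sending the PUT request. This is a minimal sketch: the function and its arguments are hypothetical, and it assumes the status values are the literal strings "draft" and "completed" used on this page.

```python
from typing import List, Optional

MIN_IMAGES_TO_COMPLETE = 5  # from the completion requirement above


def validate_dataset_update(
    current_status: str,
    image_count: int,
    new_status: Optional[str] = None,
    new_visual_schema: Optional[str] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the update looks valid."""
    problems: List[str] = []
    # visual_schema may only change while the dataset is still a draft.
    if new_visual_schema is not None and current_status != "draft":
        problems.append("visual_schema can only be updated while status is draft")
    # Completing a dataset requires a minimum number of images.
    if new_status == "completed" and image_count < MIN_IMAGES_TO_COMPLETE:
        problems.append(
            f"need at least {MIN_IMAGES_TO_COMPLETE} images to complete, "
            f"have {image_count}"
        )
    return problems
```

The server remains the source of truth; a local check like this only saves a round trip for the obvious cases.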

## Delete Dataset

 - [DELETE /tailored-gen/datasets/{dataset_id}](https://docs.bria.ai/tailored-generation/dataset/delete-dataset.md): Delete a specific dataset. Deletes all associated images.

## Clone Dataset As Draft

 - [POST /tailored-gen/datasets/{dataset_id}/clone](https://docs.bria.ai/tailored-generation/dataset/clone-dataset.md): Create a new draft dataset based on an existing one. This is useful when you want to reuse a dataset for another training run with some modifications (a variation). The cloned dataset inherits the visual_schema from the source dataset.

## Upload Image Files

 - [POST /tailored-gen/datasets/{dataset_id}/images](https://docs.bria.ai/tailored-generation/dataset/upload-image.md): Upload a new image to a dataset.

Image Requirements:
- Recommended minimum resolution: 1024x1024 pixels for best quality
  - By default, smaller images (down to 256x256) will be automatically upscaled (increase_resolution=true)
  - To strictly enforce the 1024x1024 minimum, set increase_resolution=false
- Supported formats: jpg, jpeg, png, webp
- Preferably use original high-quality assets

Dataset Guidelines:
- Recommended: 5-50 images for optimal results
- Maximum supported: 200 images
- Ensure consistency in style, structure, and visual elements
- Balance diversity in content while maintaining consistency in key elements

For optimal training (especially for characters/objects):
- Subject should occupy most of the image area
- Minimize unnecessary margins around the subject
- Transparent backgrounds will be converted to black
- For character datasets: include diverse poses, environments, attires, and interactions

Constraints:
- Available only for "basic" upload type datasets. Use images/bulk-upload for advanced datasets.
- Dataset must have at least 5 images before it can be marked as completed
- Dataset cannot exceed 200 images
- Cannot upload to a completed dataset

This API endpoint supports content moderation via an optional parameter.
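
The format and resolution rules above can be sketched as a local pre-upload check. The helper below is hypothetical and assumes the size limits apply to both width and height; it takes the dimensions as arguments, so reading them from the file is left to whichever image library you use.

```python
from pathlib import PurePath
from typing import List

SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}
RECOMMENDED_MIN_SIDE = 1024  # enforced when increase_resolution is False
UPSCALE_MIN_SIDE = 256       # smallest size accepted when upscaling is on


def check_image(
    filename: str,
    width: int,
    height: int,
    increase_resolution: bool = True,
) -> List[str]:
    """Return a list of problems; an empty list means the image looks uploadable."""
    problems: List[str] = []
    if PurePath(filename).suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {filename}")
    # With upscaling enabled, images down to 256x256 are accepted.
    min_side = UPSCALE_MIN_SIDE if increase_resolution else RECOMMENDED_MIN_SIDE
    if width < min_side or height < min_side:
        problems.append(
            f"{width}x{height} is below the {min_side}x{min_side} minimum"
        )
    return problems
```

For example, a 512x512 PNG passes with the default increase_resolution=true but is flagged when increase_resolution=false.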

## Get Images

 - [GET /tailored-gen/datasets/{dataset_id}/images](https://docs.bria.ai/tailored-generation/dataset/get-images.md): Retrieve all images in a specific dataset.

## Regenerate All Captions

 - [PUT /tailored-gen/datasets/{dataset_id}/images](https://docs.bria.ai/tailored-generation/dataset/regenerate-all-captions.md): Regenerate captions for all images in a dataset. Do this after updating the visual schema so that every caption matches the new schema.

This is an asynchronous operation. Once called, poll Get Dataset by ID until captions_update_status changes to 'completed'.
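
The polling step could look roughly like the sketch below. It assumes that captions_update_status is a top-level key of the Get Dataset by ID response; the fetch_dataset callable, the interval, and the timeout are all hypothetical choices for this example. The HTTP call itself is left to the caller so the loop stays independent of any client library.

```python
import time
from typing import Any, Callable, Dict


def wait_for_captions(
    fetch_dataset: Callable[[], Dict[str, Any]],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until captions_update_status is 'completed' or the timeout passes.

    fetch_dataset is a caller-supplied function that performs
    GET /tailored-gen/datasets/{dataset_id} and returns the parsed JSON.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        dataset = fetch_dataset()
        if dataset.get("captions_update_status") == "completed":
            return dataset
        if time.monotonic() >= deadline:
            raise TimeoutError("caption regeneration did not complete in time")
        sleep(interval_s)
```

This page does not list the other values captions_update_status can take, so the loop only tests for 'completed' and treats everything else as "keep waiting"; a production client would probably also stop on a failure status.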

## Advanced Image Upload

 - [POST /tailored-gen/datasets/{dataset_id}/images/bulk-upload](https://docs.bria.ai/tailored-generation/dataset/bulk-upload-images.md): Efficiently upload a large volume of images (up to 5000) from a ZIP file to an advanced dataset.

Upload without Schema: You can initiate a bulk upload even if visual_schema is null. Images will be uploaded but caption generation will be skipped. You must call Regenerate All Captions after defining the schema.

General:
* Asynchronous operation; status can be retrieved via GET /tailored-gen/datasets/{dataset_id}/images/bulk-upload/status.
* Supported for 'advanced' upload type datasets only.
* The request fails if the dataset is not empty, if another bulk upload is in progress, or if a bulk upload was previously attempted on this dataset.

Image Requirements:
* Supported formats: jpg, jpeg, png, webp.
* Minimum dimensions: 1024 x 1024 pixels.
* Total size limit: 5 GB zip file.
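
Because a failed attempt blocks further bulk uploads to the same dataset, it can be worth checking the archive locally first. The sketch below is a hypothetical pre-flight check using Python's standard zipfile module. It assumes "5 GB" means 5 × 10^9 bytes (the more conservative reading) and does not verify pixel dimensions, which would require an image library.

```python
import io
import zipfile
from pathlib import PurePosixPath
from typing import BinaryIO, List

SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}
MAX_IMAGES = 5000
MAX_ZIP_BYTES = 5 * 1000**3  # assumption: "5 GB" read as decimal gigabytes


def preflight_zip(zip_file: BinaryIO) -> List[str]:
    """Return a list of problems found in the archive; empty means none found."""
    problems: List[str] = []

    # Measure the archive size without reading it into memory.
    zip_file.seek(0, io.SEEK_END)
    size = zip_file.tell()
    zip_file.seek(0)
    if size > MAX_ZIP_BYTES:
        problems.append(f"archive is {size} bytes, over the 5 GB limit")

    with zipfile.ZipFile(zip_file) as archive:
        names = [info.filename for info in archive.infolist() if not info.is_dir()]

    if len(names) > MAX_IMAGES:
        problems.append(f"archive holds {len(names)} files, over the {MAX_IMAGES} limit")
    for name in names:
        if PurePosixPath(name).suffix.lower() not in SUPPORTED_EXTENSIONS:
            problems.append(f"unsupported file: {name}")
    return problems
```

A typical call would be `with open("dataset.zip", "rb") as f: problems = preflight_zip(f)`, where dataset.zip is a placeholder path.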

## Get Image by ID

 - [GET /tailored-gen/datasets/{dataset_id}/images/{image_id}](https://docs.bria.ai/tailored-generation/dataset/get-image.md): Retrieve full image information.

## Update Image Caption

 - [PUT /tailored-gen/datasets/{dataset_id}/images/{image_id}](https://docs.bria.ai/tailored-generation/dataset/update-image-caption.md): Update the caption of a specific image. Two mutually exclusive options:

1. Provide a new caption: Use the caption parameter (sets caption_source to "manual"). The caption must be a string containing a valid JSON structure.
2. Regenerate automatically: Set regenerate_caption to true (sets caption_source to "automatic").

Constraints:
* Cannot update captions in a completed dataset
* Cannot provide both caption and regenerate_caption
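
The mutual exclusion between the two options can be enforced before the request is sent. The sketch below builds the request payload; the parameter names caption and regenerate_caption come from this page, while the helper itself and the assumption that both are sent as JSON body fields are illustrative only.

```python
import json
from typing import Any, Dict, Optional


def build_caption_update(
    caption: Optional[Dict[str, Any]] = None,
    regenerate_caption: bool = False,
) -> Dict[str, Any]:
    """Build a payload for exactly one of the two caption update options."""
    # Exactly one option must be chosen: a manual caption or regeneration.
    if (caption is not None) == regenerate_caption:
        raise ValueError("provide either caption or regenerate_caption=True, not both")
    if regenerate_caption:
        return {"regenerate_caption": True}
    # The caption is sent as a string that contains a JSON structure,
    # so the dict is serialized once here.
    return {"caption": json.dumps(caption)}
```

Serializing the caption with json.dumps guarantees the string holds valid JSON, which is the stated requirement for manual captions.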

## Delete Image

 - [DELETE /tailored-gen/datasets/{dataset_id}/images/{image_id}](https://docs.bria.ai/tailored-generation/dataset/delete-image.md): Permanently remove an image from a dataset. Cannot delete images from completed datasets.

## Get Bulk Upload Status

 - [GET /tailored-gen/datasets/{dataset_id}/images/bulk-upload/status](https://docs.bria.ai/tailored-generation/dataset/get-bulk-upload-status.md): Retrieve the status and progress of a bulk image upload job.

## Generate Visual Schema

 - [POST /tailored-gen/generate_visual_schema](https://docs.bria.ai/tailored-generation/dataset/generate-visual-schema.md): Generates a structured JSON visual schema (backbone) based on the provided sample images.

The visual schema captures the characteristics shared across the training images (style, IP, colors, etc.) and is used for:
1. Caption generation during image upload.
2. Prompt translation (user text → structured prompt) during generation.

Usage:
- Provide 5-10 representative images of your style/IP.
- The returned visual_schema string must be added to your dataset using PUT /tailored-gen/datasets/{dataset_id}.

This endpoint supports content moderation via an optional parameter.
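
The second usage step, attaching the returned schema to a dataset, might be prepared as in the sketch below, which uses only Python's standard library. The base URL, the authentication headers, and the JSON body shape {"visual_schema": ...} are assumptions for illustration and should be checked against the Update Dataset reference; only the path and the PUT method come from this page.

```python
import json
import urllib.request
from typing import Dict


def build_schema_update_request(
    base_url: str,
    dataset_id: str,
    visual_schema: str,
    headers: Dict[str, str],
) -> urllib.request.Request:
    """Prepare (but do not send) a PUT request that sets visual_schema."""
    # Fail early if the schema string is not parseable JSON (raises ValueError).
    json.loads(visual_schema)
    # Assumed body shape: a JSON object with a single visual_schema field.
    body = json.dumps({"visual_schema": visual_schema}).encode("utf-8")
    url = f"{base_url.rstrip('/')}/tailored-gen/datasets/{dataset_id}"
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json", **headers},
    )
```

The prepared request can then be sent with urllib.request.urlopen(request). Keeping construction separate from sending makes the URL and body easy to inspect before anything reaches the server.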

## Refine Structured Prompt

 - [POST /tailored-gen/refine_structured_prompt](https://docs.bria.ai/tailored-generation/dataset/refine-json.md): Refines a Structured Prompt object (such as a Visual Schema or an Image Caption) based on user instructions.

Access Control & Validation:
* Requires a valid dataset_id to verify ownership.
* The referenced dataset must be in draft mode.

Use Cases:
1. Refine Visual Schema: Input the initial schema and instructions like "Make the style description more detailed".
2. Refine Image Caption: Input a specific image's caption and instructions like "Fix the description of the hair color".

## Download Dataset

 - [GET /datasets/{dataset_id}/download](https://docs.bria.ai/tailored-generation/dataset/download-dataset.md): Download an advanced dataset. Returns a pre-signed URL for downloading the dataset.