POST /generate/controlnet-sd15
{
  "images": [
    {
      "image_b64": "<string>",
      "removed_for_safety": true,
      "seed": 123,
      "safety_score": 123
    }
  ],
  "prediction_time_ms": 123
}
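A minimal sketch of how a client might unpack the response shown above. It assumes that `image_b64` holds plain base64 of the encoded image file and that entries with `removed_for_safety` set to true should be skipped; neither is confirmed by this page. The helper name `decode_images` is illustrative.

```python
import base64
import json


def decode_images(response_body: str) -> list[tuple[int, bytes]]:
    """Return (seed, image_bytes) for every image the safety filter kept."""
    payload = json.loads(response_body)
    kept = []
    for item in payload["images"]:
        # Assumption: a removed image carries no usable pixel data.
        if item.get("removed_for_safety"):
            continue
        kept.append((item["seed"], base64.b64decode(item["image_b64"])))
    return kept
```

Keeping the seed next to each image makes it possible to re-request the same image later with a fixed `seed`.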

Body

application/json
prompt
string
required

Text describing the image content to generate.

prompt_2
string | null

Text with a high-level description of the image to generate. Used only by SD XL.

negative_prompt
string | null

Text describing image traits to avoid during generation.

negative_prompt_2
string | null

Text with a high-level description of things to avoid during generation. Used only by SD XL.

checkpoint
string | null

Custom checkpoint to be used during image generation.

controlnet
string | null

ControlNet to be used during image generation.

vae
string | null

Custom VAE to be used during image generation.

textual_inversions
object | null

A dictionary of textual inversions to use during image generation, with textual inversions as keys and their trigger words as values.

loras
object | null

A dictionary of LoRAs to apply, with LoRAs as keys and their weights (float) as values.
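As a rough illustration of the two dictionary shapes described above, the sketch below builds a request body fragment. The identifiers `example-embedding` and `example-lora` and the trigger word `<my-style>` are placeholders, not real assets, and placing the trigger word in the prompt is an assumption about how trigger words are used.

```python
import json

# Placeholder names: real LoRA and textual-inversion identifiers depend on
# what your account or deployment exposes.
payload = {
    "prompt": "a watercolor fox, <my-style>",
    # textual inversion -> trigger word
    "textual_inversions": {"example-embedding": "<my-style>"},
    # LoRA -> weight (float)
    "loras": {"example-lora": 0.75},
}

body = json.dumps(payload)
```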

sampler
enum<string>

Sampler name (also known as 'scheduler') to use during image generation.

Available options:
PNDM,
LMS,
KLMS,
DDIM,
DDPM,
HEUN,
K_HEUN,
K_EULER,
K_EULER_ANCESTRAL,
DPM_SOLVER_MULTISTEP,
DPM_PLUS_PLUS_2M_KARRAS,
DPM_SINGLE,
DPM_2,
DPM_2_ANCESTRAL,
DPM_PLUS_PLUS_SDE_KARRAS,
UNI_PC,
LCM
height
integer | null

Height of the image to generate, in pixels. If null, defaults to 512 for SD 1.5 and 1024 for SD XL and SSD.

Supported resolutions (w, h):
SDXL: (1536, 640), (768, 1344), (832, 1216), (1344, 768), (1152, 896), (640, 1536), (1216, 832), (896, 1152), (1024, 1024)
SD1.5: (768, 576), (1024, 576), (640, 512), (384, 704), (640, 768), (640, 640), (1024, 768), (1536, 1024), (768, 1024), (576, 448), (1024, 1024), (896, 896), (704, 1216), (512, 512), (448, 576), (832, 512), (512, 704), (576, 768), (1216, 704), (512, 768), (512, 832), (1024, 1536), (576, 1024), (704, 384), (768, 512)
SSD: (1536, 640), (768, 1344), (832, 1216), (1344, 768), (1152, 896), (640, 1536), (1216, 832), (896, 1152), (1024, 1024)

width
integer | null

Width of the image to generate, in pixels. If null, defaults to 512 for SD 1.5 and 1024 for SD XL and SSD.

Supported resolutions (w, h):
SDXL: (1536, 640), (768, 1344), (832, 1216), (1344, 768), (1152, 896), (640, 1536), (1216, 832), (896, 1152), (1024, 1024)
SD1.5: (768, 576), (1024, 576), (640, 512), (384, 704), (640, 768), (640, 640), (1024, 768), (1536, 1024), (768, 1024), (576, 448), (1024, 1024), (896, 896), (704, 1216), (512, 512), (448, 576), (832, 512), (512, 704), (576, 768), (1216, 704), (512, 768), (512, 832), (1024, 1536), (576, 1024), (704, 384), (768, 512)
SSD: (1536, 640), (768, 1344), (832, 1216), (1344, 768), (1152, 896), (640, 1536), (1216, 832), (896, 1152), (1024, 1024)
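Because this endpoint targets SD 1.5, a client can check a (width, height) pair against the SD 1.5 list before sending a request. The sketch below copies that list from the `height` and `width` descriptions; the function name `resolve_size` is illustrative, and treating each missing dimension independently as 512 is an assumption about the server's default handling.

```python
# SD 1.5 (width, height) pairs copied from the parameter descriptions.
SD15_RESOLUTIONS = {
    (768, 576), (1024, 576), (640, 512), (384, 704), (640, 768),
    (640, 640), (1024, 768), (1536, 1024), (768, 1024), (576, 448),
    (1024, 1024), (896, 896), (704, 1216), (512, 512), (448, 576),
    (832, 512), (512, 704), (576, 768), (1216, 704), (512, 768),
    (512, 832), (1024, 1536), (576, 1024), (704, 384), (768, 512),
}
DEFAULT_SIDE = 512  # documented default for SD 1.5


def resolve_size(width=None, height=None):
    """Apply the SD 1.5 default and reject unsupported (w, h) pairs."""
    w = DEFAULT_SIDE if width is None else width
    h = DEFAULT_SIDE if height is None else height
    if (w, h) not in SD15_RESOLUTIONS:
        raise ValueError(f"unsupported SD 1.5 resolution: {(w, h)}")
    return w, h
```

Note that the pairs are ordered: (640, 512) is listed but (512, 640) is not, so swapping width and height can turn a valid request into an invalid one.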

cfg_scale
number
default: 12

Floating-point number representing how closely to adhere to the prompt description. Must be a positive number no greater than 50.0.

steps
integer
default: 30

Integer representing how many steps of diffusion to run. Must be greater than 0 and less than or equal to 200.
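The documented bounds for `cfg_scale` and `steps` can be checked on the client before a request is sent. This is only a sketch; the function name `check_sampling` is hypothetical, and the server remains the authority on what it accepts.

```python
def check_sampling(cfg_scale: float = 12.0, steps: int = 30) -> None:
    """Raise ValueError if a value falls outside the documented range."""
    # cfg_scale: positive and no greater than 50.0
    if not (0 < cfg_scale <= 50.0):
        raise ValueError(f"cfg_scale out of range: {cfg_scale}")
    # steps: greater than 0 and at most 200
    if not (0 < steps <= 200):
        raise ValueError(f"steps out of range: {steps}")
```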

num_images
integer
default: 1

Integer representing how many output images to generate with a single prompt/configuration.

seed

Integer or list of integers giving the seeds of the random generators. Fixing the random seed is useful when attempting to reproduce a specific image. Each seed must be greater than 0 and less than 2^32.
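The seed range above is open at both ends: 0 and 2^32 are both excluded. A small sketch of validating and generating seeds under that rule follows; the helper names are illustrative and not part of the API.

```python
import random

MAX_SEED = 2**32  # exclusive upper bound from the description


def check_seeds(seed):
    """Accept a single int or a list of ints and return a validated list."""
    seeds = [seed] if isinstance(seed, int) else list(seed)
    for s in seeds:
        if not (0 < s < MAX_SEED):
            raise ValueError(f"seed out of range: {s}")
    return seeds


def random_seeds(n, rng=None):
    """Draw n seeds uniformly from 1 .. 2**32 - 1."""
    rng = rng or random.Random()
    return [rng.randrange(1, MAX_SEED) for _ in range(n)]
```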

controlnet_image
string | null

ControlNet image, encoded as a base64 string, used to guide image generation. Required for ControlNet engines.

init_image
string | null

Starting-point image, encoded as a base64 string, for image-to-image generation mode.

mask_image
string | null

Base64-encoded mask image for inpainting. White areas indicate where to paint.
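The `controlnet_image`, `init_image`, and `mask_image` fields all take base64 strings. The sketch below shows one way to produce them with the standard library. It assumes the API expects plain base64 of the image file bytes with no `data:` URI prefix, which this page does not state; the helper names are illustrative.

```python
import base64
from pathlib import Path


def encode_b64(data: bytes) -> str:
    """Base64-encode raw image bytes as an ASCII string."""
    return base64.b64encode(data).decode("ascii")


def encode_file(path: str) -> str:
    """Read an image file from disk and base64-encode its contents."""
    return encode_b64(Path(path).read_bytes())
```

For example, `encode_file("mask.png")` would return a string suitable for the `mask_image` field under the assumption above.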

strength
number
default: 0.8

Floating-point number indicating how creative the image-to-image generation mode should be. Must be greater than 0 and less than or equal to 1.0.

style_preset
enum<string> | null

Pre-defined styles used to guide the output image towards a particular style. Pre-defined styles are only supported by SDXL.

Available options:
base,
3d-model,
analog-film,
anime,
cinematic,
comic-book,
Craft Clay,
modeling-compound,
digital-art,
enhance,
fantasy-art,
isometric,
line-art,
low-poly,
neon-punk,
origami,
photographic,
pixel-art,
tile-texture,
Advertising,
Food Photography,
Real Estate,
Abstract,
Cubist,
Graffiti,
Hyperrealism,
Impressionist,
Pointillism,
Pop Art,
Psychedelic,
Renaissance,
Steampunk,
Surrealist,
Typography,
Watercolor,
Fighting Game,
GTA,
Super Mario,
Minecraft,
Pokémon,
Retro Arcade,
Retro Game,
RPG Fantasy Game,
Strategy Game,
Street Fighter,
Legend of Zelda,
Architectural,
Disco,
Dreamscape,
Dystopian,
Fairy Tale,
Gothic,
Grunge,
Horror,
Minimalist,
Monochrome,
Nautical,
Space,
Stained Glass,
Techwear Fashion,
Tribal,
Zentangle,
Collage,
Flat Papercut,
Kirigami,
Paper Mache,
Paper Quilling,
Papercut Collage,
Papercut Shadow Box,
Stacked Papercut,
Thick Layered Papercut,
Alien,
Film Noir,
HDR,
Long Exposure,
Neon Noir,
Silhouette,
Tilt-Shift
use_refiner
boolean
default: true

Whether to enable and apply the SDXL refiner model to the image generation.

high_noise_frac
number
default: 0.8

Floating-point number that defines the fraction of steps to perform with the base model. Used only by SD XL. Must be greater than or equal to 0.0 and less than or equal to 1.0.

controlnet_conditioning_scale
number
default: 1

How strong the effect of the controlnet should be.

controlnet_early_stop
number | null

If provided, indicates the fraction of steps at which to stop applying ControlNet. This can sometimes produce better outputs.

controlnet_preprocess
boolean
default: true

Whether to apply automatic ControlNet preprocessing.

clip_skip
integer | null

Optionally skip later layers of the text encoder. Higher values lead to more abstract interpretations of the prompt.

outpainting
boolean
default: false

Whether the request requires outpainting or not. If so, special preprocessing is applied for better results.

image_encoding
enum<string>

Define which encoding process should be applied before returning the generated image(s).

Available options:
jpeg,
png
transfer_images
object | null

A dictionary containing a mapping of trigger words to a list of sample images which demonstrate the desired object or style to transfer.

Response

200 - application/json
images
object[]
required

List of ImageGeneration(s) generated by the request.

prediction_time_ms
number
required

Total runtime of the image generation(s), in milliseconds.
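Putting the pieces together, a request to this endpoint could be assembled with the standard library as sketched below. The base URL `https://api.example.com` and the bearer-token `Authorization` header are assumptions, since this page specifies neither the host nor the authentication scheme; `build_request` is a hypothetical helper.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host, replace with the real one


def build_request(prompt, controlnet, controlnet_image_b64, api_key, **options):
    """Build (but do not send) a POST request for /generate/controlnet-sd15."""
    payload = {
        "prompt": prompt,
        "controlnet": controlnet,
        "controlnet_image": controlnet_image_b64,
        **options,  # any other documented body field, e.g. steps=20
    }
    return urllib.request.Request(
        BASE_URL + "/generate/controlnet-sd15",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token authentication.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the JSON body of a 200 response then carries the `images` list and `prediction_time_ms` described above.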