Image Gen REST API
All of our image generation models are accessible via REST API. Below you can see simple cURL, Python SDK and TypeScript SDK examples for our image gen endpoints, along with explanations of all parameters.
Our URL for image generations is https://image.octoai.run/generate/{engine_id}, where engine_id is one of the following:
- sdxl: Stable Diffusion XL v1.0
- sd: Stable Diffusion v1.5
- controlnet-sdxl: ControlNet SDXL
- controlnet-sd15: ControlNet SD1.5
This includes text-to-image, image-to-image, controlnets, photo merge, inpainting and outpainting.
Input Sample
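A minimal sketch of a text-to-image request, assuming the same bearer-token header and images response field used in the ControlNet example at the end of this page. The helper names build_generate_url and generate are hypothetical, and the standard-library urllib is used here in place of requests only to keep the sketch dependency-free.

```python
import json
import urllib.request

BASE_URL = "https://image.octoai.run/generate"
ENGINE_IDS = {"sdxl", "sd", "controlnet-sdxl", "controlnet-sd15"}


def build_generate_url(engine_id: str) -> str:
    """Return the generation URL for one of the documented engine IDs."""
    if engine_id not in ENGINE_IDS:
        raise ValueError(f"unknown engine_id: {engine_id!r}")
    return f"{BASE_URL}/{engine_id}"


def generate(engine_id: str, payload: dict, token: str) -> list:
    """POST the payload as JSON and return the list of generated image entries.

    Assumes the reply is JSON with an "images" list, as in the ControlNet
    example on this page. urlopen raises HTTPError on non-2xx replies.
    """
    request = urllib.request.Request(
        build_generate_url(engine_id),
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)["images"]


# Example payload using documented arguments and their SDXL defaults.
payload = {
    "prompt": "A portrait of a handsome cat wearing a little hat",
    "negative_prompt": "Blurry photo, distortion, low-res, bad quality",
    "width": 1024,
    "height": 1024,
    "num_images": 1,
    "steps": 30,
    "cfg_scale": 12,
}

# To send it (requires a valid token in the OCTOAI_TOKEN environment variable):
# import os
# images = generate("sdxl", payload, os.environ["OCTOAI_TOKEN"])
```

Each entry in the returned list is expected to carry the image as a base64 string, which can be decoded the same way as in the ControlNet example.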
Image Generation Arguments
- prompt: A string describing the image to generate.
  - We currently have a 77 token limit on prompts for SDXL and 231 for SD 1.5.
  - You can use prompt weighting, e.g. (A tall (beautiful:1.5) woman:1.0) (some other prompt with weight:0.8). The weight of a token is the product of the weights of all brackets it is a member of. The brackets, colons and weights do not count towards the number of tokens.
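To illustrate the "product of all brackets" rule, here is a small sketch that computes the effective weight of each word in a weighted prompt. It is not the service's parser: it assumes balanced brackets, treats whitespace-separated words as stand-ins for real tokens, and gives unbracketed text a weight of 1.0. The function name word_weights is hypothetical.

```python
import re

# A trailing ":<number>" inside a bracket is that bracket's weight.
_WEIGHT = re.compile(r":\s*([0-9]*\.?[0-9]+)\s*$")


def _parse_group(prompt: str, pos: int):
    """Parse from pos up to the matching ')' (or end of string).

    Returns (items, next_pos), where items is a list of (word, weight).
    """
    items = []
    buf = []  # plain text collected at the current nesting level
    while pos < len(prompt):
        ch = prompt[pos]
        if ch == "(":
            items.extend((w, 1.0) for w in "".join(buf).split())
            buf = []
            inner, pos = _parse_group(prompt, pos + 1)
            items.extend(inner)
        elif ch == ")":
            text = "".join(buf)
            weight = 1.0
            match = _WEIGHT.search(text)
            if match:
                weight = float(match.group(1))
                text = text[: match.start()]
            items.extend((w, 1.0) for w in text.split())
            # Every word inside this bracket is scaled by the bracket's weight.
            return [(w, wt * weight) for w, wt in items], pos + 1
        else:
            buf.append(ch)
            pos += 1
    items.extend((w, 1.0) for w in "".join(buf).split())
    return items, pos


def word_weights(prompt: str):
    """Return (word, effective_weight) pairs for a weighted prompt."""
    items, _ = _parse_group(prompt, 0)
    return items


weights = word_weights(
    "(A tall (beautiful:1.5) woman:1.0) (some other prompt with weight:0.8)"
)
```

In this example "beautiful" sits inside both the 1.5 bracket and the outer 1.0 bracket, so its effective weight is 1.5 × 1.0 = 1.5, while every word in the second bracket gets 0.8.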
- prompt_2: This only applies to SDXL. By default, setting only prompt copies the input to both prompt and prompt_2. When prompt and prompt_2 are both set, they serve different purposes.
  - prompt is used for "word salad" style control of the image. This is the type of prompting you are likely familiar with from SD 1.5. Prompts like the following work well:
    prompt = "photorealistic, high definition, masterpiece, sharp lines"
  - prompt_2 is meant for more human readable descriptions of the desired image. For example:
    prompt_2 = "A portrait of a handsome cat wearing a little hat. The cat is in front of a colorful background."
- negative_prompt (Optional): A string indicating a prompt for guidance to steer away from. Unused when not provided.
- negative_prompt_2: This only applies to SDXL. This prompt is meant for human readable descriptions of what you don't want in the image, e.g. you would say "Low resolution" in negative_prompt, then "Bad hands" in negative_prompt_2.
- sampler (Optional): A string specifying which scheduler to use when generating an image. Defaults to DDIM.
  - Regular samplers include DDIM, DDPM, DPM_PLUS_PLUS_2M_KARRAS, DPM_SINGLE, DPM_SOLVER_MULTISTEP, K_EULER, K_EULER_ANCESTRAL, PNDM, UNI_PC.
  - Premium samplers (2x price) include DPM_2, DPM_2_ANCESTRAL, DPM_PLUS_PLUS_SDE_KARRAS, HEUN and KLMS.
- height (Optional): An integer specifying the height of the output image. Defaults to 1024 for SDXL and 512 for SD 1.5.
- width (Optional): An integer specifying the width of the output image. Defaults to 1024 for SDXL and 512 for SD 1.5.
Supported Output Resolutions (Width x Height) are as follows:
For SDXL:
(1024, 1024), (896, 1152), (832, 1216), (768, 1344), (640, 1536), (1536, 640), (1344, 768), (1216, 832), (1152, 896)
For SD1.5:
(512, 512), (640, 512), (768, 512), (512, 704), (512, 768), (576, 768), (640, 768), (576, 1024), (1024, 576)
init_image and mask_image will be resized to the specified resolution before applying img2img or inpainting.
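Since only the resolutions listed above are supported, it can help to check a requested size on the client before sending a request. The sketch below is one way to do that; the helper names are hypothetical, and picking the nearest aspect ratio is an illustrative choice, not documented service behavior. Only the sdxl and sd engines are covered, because the list above does not say which resolutions the controlnet engines accept.

```python
# (width, height) pairs copied from the supported resolution lists.
SUPPORTED_RESOLUTIONS = {
    "sdxl": [
        (1024, 1024), (896, 1152), (832, 1216), (768, 1344), (640, 1536),
        (1536, 640), (1344, 768), (1216, 832), (1152, 896),
    ],
    "sd": [
        (512, 512), (640, 512), (768, 512), (512, 704), (512, 768),
        (576, 768), (640, 768), (576, 1024), (1024, 576),
    ],
}


def is_supported(engine_id: str, width: int, height: int) -> bool:
    """Return True if (width, height) is a listed resolution for the engine."""
    return (width, height) in SUPPORTED_RESOLUTIONS[engine_id]


def closest_supported(engine_id: str, width: int, height: int):
    """Return the listed resolution whose aspect ratio is nearest to width/height."""
    target = width / height
    return min(
        SUPPORTED_RESOLUTIONS[engine_id],
        key=lambda wh: abs(wh[0] / wh[1] - target),
    )
```

For example, a 1920 x 1080 source has an aspect ratio of about 1.78, so closest_supported("sdxl", 1920, 1080) picks (1344, 768), whose ratio is 1.75.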
- cfg_scale (Optional): How strictly the diffusion process adheres to the prompt text (higher values keep your image closer to your prompt). Defaults to 12 when not set.
- steps (Optional): How many steps of diffusion to perform. Higher values increase image clarity but proportionally increase the runtime. Defaults to 30 when not set.
- num_images: An integer describing the number of images to generate. Defaults to 1.
- seed (Optional): An integer that fixes the random noise of the model. Using the same seed guarantees the same output image, which can be useful for testing or replication. Use null to select a random seed.
- use_refiner: This only applies to SDXL. A boolean (true or false) that determines whether to use the refiner.
- high_noise_frac (Optional): This only applies to SDXL. A floating point or integer determining how much noise should be applied using the base model vs. the refiner. A value of 0.8 will apply the base model at 80% and the refiner at 20%. Defaults to 0.8 when not set.
- checkpoint: Here you can specify a checkpoint either from the OctoAI asset library or your private asset library. Note that using a custom asset increases generation time.
- loras: Here you can specify LoRAs, in name-weight pairs, either from the OctoAI asset library or your private asset library. Note that using a custom asset increases generation time.
- textual_inversions: Here you can specify textual inversions and their corresponding trigger words. Note that using a custom asset increases generation time.
- vae: Here you can specify variational autoencoders. Note that using a custom asset increases generation time.
Here's an example of how to mix OctoAI assets (checkpoints, loras, and textual_inversions) in the same API request.
OctoAI assets require an "octoai:" prefix but your private assets DO NOT. Asset names need to be unique per account.
payload = {
    # ... other generation arguments ...
    "checkpoint": "octoai:realcartoon",
    "loras": {
        "octoai:crayon-style": 0.7,
        "your-custom-lora": 0.3,
    },
    "textual_inversions": {
        "octoai:NegativeXL": "negativeXL_D",
    },
    "vae": "your_vae_name",
    # ...
}
- style_preset (Optional): This only applies to SDXL. Used to guide the output image towards a particular style. Defaults to None. Supported values for style_preset include base, 3d-model, Abstract, Advertising, Alien, analog-film, anime, Architectural, cinematic, Collage, comic-book, Craft Clay, Cubist, digital-art, Disco, Dreamscape, Dystopian, enhance, Fairy Tale, fantasy-art, Fighting Game, Film Noir, Flat Papercut, Food Photography, Gothic, Graffiti, Grunge, HDR, Horror, Hyperrealism, Impressionist, isometric, Kirigami, line-art, Long Exposure, low-poly, Minimalist, modeling-compound, Monochrome, Nautical, Neon Noir, neon-punk, origami, Paper Mache, Paper Quilling, Papercut Collage, Papercut Shadow Box, photographic, pixel-art, Pointillism, Pop Art, Psychedelic, Real Estate, Renaissance, Retro Arcade, Retro Game, RPG Fantasy, Game, Silhouette, Space, Stacked Papercut, Stained Glass, Steampunk, Strategy Game, Surrealist, Techwear Fashion, Thick Layered Papercut, tile-texture, Tilt-Shift, Tribal, Typography, Watercolor, Zentangle.
- init_image (Optional): Only applicable for img2img and inpainting use cases, i.e. to use an image as a starting point for image generation. Argument takes an image encoded as a string in base64 format.
  - Use .jpg format to ensure best latency.
- strength (Optional): Only applicable for img2img use cases. A floating point or integer that determines how much noise should be applied. Values that approach 1.0 allow for high variation, i.e. ignoring the image entirely, but will also produce images that are not semantically consistent with the input, while 0.0 keeps the input image as-is. Defaults to 0.8 when not set.
- mask_image (Optional): Only applicable for inpainting use cases, i.e. to specify which area of the picture to paint onto. Argument takes an image encoded as a string in base64 format.
  - Use .jpg format to ensure best latency.
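As a sketch of how init_image, strength and mask_image fit together, the helper below base64-encodes raw image bytes and assembles an img2img or inpainting payload. The function names are hypothetical, and the 0.0 to 1.0 range check on strength is an assumption drawn from the description above rather than a documented validation rule.

```python
import base64
from typing import Optional


def encode_image(data: bytes) -> str:
    """Encode raw image bytes (e.g. the contents of a .jpg file) as base64 text."""
    return base64.b64encode(data).decode("utf-8")


def img2img_payload(
    prompt: str,
    init_image: bytes,
    strength: Optional[float] = None,
    mask_image: Optional[bytes] = None,
) -> dict:
    """Build a payload for img2img, or for inpainting when a mask is given."""
    payload = {"prompt": prompt, "init_image": encode_image(init_image)}
    if strength is not None:
        # Assumed valid range: 0.0 keeps the input as-is, 1.0 ignores it.
        if not 0.0 <= strength <= 1.0:
            raise ValueError("strength must be between 0.0 and 1.0")
        payload["strength"] = strength
    if mask_image is not None:
        payload["mask_image"] = encode_image(mask_image)
    return payload
```

In practice the bytes would come from reading a file in binary mode, for example open("cat.jpeg", "rb").read(). Leaving strength unset lets the service apply its default of 0.8.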
- outpainting (Optional): Only applicable for outpainting use cases. Argument takes a boolean value that determines whether the request requires outpainting. If so, special preprocessing is applied for better results. Defaults to false.
- transfer_images (Optional): This is our Photo Merge feature. Applicable for use cases where you wish to transfer the subject in the uploaded image(s) to the output image(s). Argument takes a dictionary containing a mapping of trigger words to a list of sample images which demonstrate the desired object to transfer.
payload = {
    # ... other generation arguments ...
    "prompt": "A trigger_word_1 sitting on a golden throne",
    "negative_prompt": "Blurry photo, distortion, low-res, bad quality",
    "checkpoint": "octoai:RealVisXL",
    "width": 1024,
    "height": 1024,
    "num_images": 2,
    "sampler": "K_EULER_ANCESTRAL",
    "steps": 20,
    "cfg_scale": 7.5,
    "transfer_images": {"trigger_word_1": ["$BASE64_IMAGE_1", "$BASE64_IMAGE_2"]},
    # ...
}
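The $BASE64_IMAGE placeholders above stand for base64-encoded sample images. A minimal sketch of building the transfer_images mapping from raw image bytes follows; the function name is hypothetical, and the check that each trigger word appears in the prompt is an assumption based on the example above, not a documented requirement.

```python
import base64


def photo_merge_payload(prompt: str, samples: dict) -> dict:
    """Build a Photo Merge payload.

    samples maps each trigger word to a list of raw image bytes that
    demonstrate the subject to transfer.
    """
    transfer_images = {}
    for trigger_word, images in samples.items():
        # Assumption: the prompt should mention the trigger word, as in the example.
        if trigger_word not in prompt:
            raise ValueError(f"prompt does not mention {trigger_word!r}")
        transfer_images[trigger_word] = [
            base64.b64encode(data).decode("utf-8") for data in images
        ]
    return {"prompt": prompt, "transfer_images": transfer_images}
```

The returned dictionary can be extended with the other generation arguments (checkpoint, width, height and so on) before it is sent.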
- controlnet (Optional): Required if using a controlnet engine. Argument takes the name of the ControlNet to be used during image generation. We offer the following public OctoAI controlnet checkpoints in the OctoAI Asset Library:
  octoai:canny_sdxl, octoai:depth_sdxl, octoai:openpose_sdxl, octoai:canny_sd15, octoai:depth_sd15, octoai:inpaint_sd15, octoai:ip2p_sd15, octoai:lineart_sd15, octoai:openpose_sd15, octoai:scribble_sd15, octoai:tile_sd15
  Other than using the default controlnet checkpoints, you can also upload private ControlNet checkpoints into the OctoAI Asset Library and then use those checkpoints at generation time via the controlnet parameter. For custom controlnet checkpoints, make sure to provide your own ControlNet mask in the controlnet_image parameter.
- controlnet_conditioning_scale (Optional): Only applicable if using ControlNets. Argument determines how strong the effect of the controlnet will be. Defaults to 1.
- controlnet_early_stop (Optional): Only applicable if using ControlNets. If provided, indicates the fraction of steps at which to stop applying the controlnet. This can sometimes be used to generate better outputs.
- controlnet_image (Optional): Required if using a controlnet engine. ControlNet image encoded as a base64 string for guiding image generation.
- controlnet_preprocess (Optional): Only applicable if using ControlNets. Argument takes a boolean value that determines whether or not to apply automatic ControlNet preprocessing. For the set of controlnet checkpoints listed above, we default to autogenerating the corresponding controlnet map/mask that will be fed into the controlnet, but you can override the default by additionally specifying a controlnet_preprocess: false parameter.
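The ControlNet arguments above can be combined as in the following sketch. The function name is hypothetical, optional arguments are only included when set so that the service defaults apply otherwise, and the 0.0 to 1.0 range check on controlnet_early_stop is an assumption based on it being described as a fraction of steps.

```python
from typing import Optional


def controlnet_payload(
    prompt: str,
    controlnet: str,
    controlnet_image_b64: str,
    preprocess: Optional[bool] = None,
    conditioning_scale: Optional[float] = None,
    early_stop: Optional[float] = None,
) -> dict:
    """Build a payload for a controlnet engine."""
    payload = {
        "prompt": prompt,
        "controlnet": controlnet,
        "controlnet_image": controlnet_image_b64,
    }
    if preprocess is not None:
        # False means controlnet_image is already a map/mask (e.g. Canny edges).
        payload["controlnet_preprocess"] = preprocess
    if conditioning_scale is not None:
        payload["controlnet_conditioning_scale"] = conditioning_scale
    if early_stop is not None:
        # Assumed range: a fraction of the total steps.
        if not 0.0 <= early_stop <= 1.0:
            raise ValueError("controlnet_early_stop must be between 0.0 and 1.0")
        payload["controlnet_early_stop"] = early_stop
    return payload
```

With a public checkpoint such as octoai:canny_sdxl, leaving preprocess unset passes a plain photo as controlnet_image and relies on the automatic preprocessing; setting it to False is the case where you supply your own map.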
Python Example for ControlNet Canny with a custom controlnet map:
import base64
import io
import os
import sys

import cv2
import PIL.Image
import requests


def _process_test(endpoint_url):
    image_path = "cat.jpeg"
    img = cv2.imread(image_path)
    if img is None:
        # cv2.imread returns None instead of raising when the file is unreadable
        sys.exit(f"Could not read {image_path}")
    img = cv2.resize(img, (1024, 1024))  # Resize to a resolution supported by OctoAI SDXL
    edges = cv2.Canny(img, 100, 200)  # 100 and 200 are thresholds for determining canny edges
    height, width = edges.shape

    # Convert Canny edge map to PIL Image
    edges_image = PIL.Image.fromarray(edges)

    # Create a BytesIO buffer to hold the image data
    image_buffer = io.BytesIO()
    edges_image.save(image_buffer, format="JPEG")
    image_bytes = image_buffer.getvalue()
    encoded_image = base64.b64encode(image_bytes).decode("utf-8")

    model_request = {
        "controlnet_image": encoded_image,
        "controlnet": "octoai:canny_sdxl",
        # The edge map is supplied directly, so skip automatic preprocessing
        "controlnet_preprocess": False,
        "prompt": "A photo of a cute tiger astronaut in space",
        "negative_prompt": "low quality, bad quality, sketches, unnatural",
        "steps": 20,
        "num_images": 1,
        "seed": 768072361,
        "height": height,
        "width": width,
    }

    prod_token = os.environ.get("OCTOAI_TOKEN")
    reply = requests.post(
        endpoint_url,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {prod_token}",
        },
        json=model_request,
    )
    if reply.status_code != 200:
        print(reply.text)
        sys.exit(1)

    # Each entry holds one generated image as a base64 string
    img_list = reply.json()["images"]
    for i, idict in enumerate(img_list):
        img_bytes = base64.b64decode(idict["image_b64"])
        img = PIL.Image.open(io.BytesIO(img_bytes))
        img.load()
        img.save(f"result_image{i}.jpg")


if __name__ == "__main__":
    # The controlnet-sdxl engine matches the octoai:canny_sdxl checkpoint used above
    endpoint = "https://image.octoai.run/generate/controlnet-sdxl"
    _process_test(endpoint)