Businesses can generate customizable avatars using OctoAI’s Photo Merge feature

Janisha Anand, Josh Fromm & Brunno Goldstein

Feb 16, 2024

6 minutes

In this article

Solution overview

Workflow steps

Prerequisites

Walkthrough

Get started using Photo Merge today

OctoAI Image Gen Solution introduces Photo Merge, which lets you seamlessly integrate a photo’s subject into high-quality AI-generated output. It removes the need for time-consuming custom facial fine-tunes, which typically require numerous tuning images and 15-30 minutes of training per SDXL LoRA. Photo Merge needs only 1-4 images and delivers precise results within a few seconds. Businesses can now easily apply GenAI-powered imagery to needs ranging from realistic CGI characters to personalized product recommendations and digital avatars.

Photo Merge is accessed through the "transfer_images" parameter of OctoAI’s Image Generation API. This parameter accepts a key-value pair: a trigger word mapped to an array of up to 4 images. Photo Merge works only with SDXL models, and with those models it can be combined with style presets, ControlNets, checkpoints, and LoRAs, which makes it considerably more flexible.
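To make that shape concrete, here is a minimal Python sketch of a transfer_images value: a dict that maps one trigger word to a list of at most four images. It assumes the images are already base64-encoded strings. The helper name make_transfer_images is hypothetical (it is not part of any OctoAI SDK), and it only enforces the limits described above.

```python
from typing import Dict, List

# Limit described in the post: up to 4 images per trigger word.
MAX_TRANSFER_IMAGES = 4


def make_transfer_images(trigger_word: str, images_b64: List[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: build {trigger_word: [image, ...]} for transfer_images."""
    if not trigger_word.strip():
        raise ValueError("trigger word must not be empty")
    if not 1 <= len(images_b64) <= MAX_TRANSFER_IMAGES:
        raise ValueError(
            f"expected 1-{MAX_TRANSFER_IMAGES} images, got {len(images_b64)}"
        )
    # Copy the list so later changes by the caller don't alter the payload.
    return {trigger_word: list(images_b64)}


# Usage with placeholder strings standing in for real base64 images.
print(make_transfer_images("luis", ["<b64 image 1>", "<b64 image 2>"]))
# {'luis': ['<b64 image 1>', '<b64 image 2>']}
```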

In this post, we walk you through how to use Photo Merge in the OctoAI Image Gen API via the new transfer_images parameter, and we show how it compares to custom fine-tuning.

Solution overview

For our use case, we assume the role of a retail media marketing and e-commerce advertising platform. Starting from photos of a real model, our goal is to seamlessly combine that person’s likeness with a variety of products. First, we generate AI-powered images of our human model, trying both the traditional approach (a custom fine-tune trained on the model’s images) and the new Photo Merge approach, and we compare the results and latency of the two. Next, we create a custom fine-tune for the products our model will represent. Finally, we show the model’s face integrated with the corresponding product.

Workflow steps

1. Create a custom facial fine-tune of the human model with 10-12 portrait images.
2. Use Photo Merge (the transfer_images parameter in OctoAI’s SDXL Image Gen API) with only 1-4 images of the human model, instead of the custom facial fine-tune created in step 1.
3. Compare the results of the two approaches.
4. Create a custom fine-tune (SDXL LoRA) for a retail product.
5. Combine the human model images (generated in step 2) with the retail product fine-tune (created in step 4).

Prerequisites

For this walkthrough, make sure you have generated an OctoAI API token and set it in your environment. You can call OctoAI’s Image Generation API from any of the supported interfaces: the Python SDK, the TypeScript SDK, the CLI, or curl. Refer to our API documentation for details.
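As a minimal sketch of the "token in your environment" step, the following reads the token and builds request headers using only the Python standard library. The variable name OCTOAI_TOKEN and the Bearer authorization scheme are assumptions here; use whatever name you exported and confirm the scheme in the API documentation.

```python
import os
from typing import Dict

# Assumption: the token was exported as OCTOAI_TOKEN; change the name if you
# stored it under a different environment variable.
TOKEN_ENV_VAR = "OCTOAI_TOKEN"


def auth_headers(env_var: str = TOKEN_ENV_VAR) -> Dict[str, str]:
    """Read the API token from the environment and return JSON request headers.

    Assumption: the API accepts the token as a Bearer token.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"environment variable {env_var} is not set")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```

Failing early with a clear error makes a missing token easy to spot before any request is sent.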

Walkthrough

Create a custom fine-tune (SDXL LoRA) of a human model: We use 10 images of OctoAI’s CEO, Luis Ceze, as our tuning dataset.

OctoAI web UI showing a fine-tune of OctoAI CEO Luis Ceze

Next, we create a custom fine-tune from OctoAI’s web UI. Navigate to Image Generation → Tuning & Datasets.

OctoAI web UI screenshot showing where to select fine-tuning in the drop-down menu

Click the ‘+New Tune’ button to begin, then adjust the fine-tuning settings: select a base checkpoint (in our case, the default SDXL), assign a trigger word (used in prompts to place your subject in images), and specify the number of training steps. A range of 400 to 1,200 steps generally yields the best results. Finally, upload the image dataset and submit the fine-tuning job.

OctoAI web UI showing all the required fields to create a new fine-tune on the platform

With 800 steps and 10 tuning images, the job takes roughly 20-30 minutes to complete. Once it finishes, let’s evaluate how well our custom facial fine-tune works.

Navigate to Image Generation → Image Tools and click the Text to Image card. Here, we use the following parameters:

Step-by-step instructions for using the web UI to find your face-tune LoRA and apply it to Text2Img output on OctoAI

The output images clearly don’t fully resemble the human model in our tuning dataset, Luis Ceze. The man in the output bears some resemblance to the tuning images, but he doesn’t closely resemble Luis.

Four realistic AI-generated images of a man in a coffee shop, created with the Luis Ceze face fine-tune

Achieving a closer resemblance would require a larger tuning dataset (64-100 images), more training steps, or both, which is time- and cost-intensive and does not scale.

Use the Photo Merge feature: Now let’s try the new Photo Merge feature and compare the output of the two approaches. We’ll use the transfer_images parameter in OctoAI’s Image Gen API to showcase this functionality.

Let’s start by uploading four images of our human model, Luis Ceze.
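Images sent in a JSON payload are commonly base64-encoded. The sketch below assumes the API expects plain base64 strings without a "data:" URI prefix (confirm this in the API documentation); the helper names and file names are hypothetical, while luis_b64_images matches the variable name used in this post’s final payload.

```python
import base64
from pathlib import Path
from typing import Iterable, List


def encode_image_bytes(data: bytes) -> str:
    """Base64-encode raw image bytes into an ASCII string."""
    return base64.b64encode(data).decode("ascii")


def encode_images(paths: Iterable[str]) -> List[str]:
    """Read each image file and return its base64 encoding, in order."""
    return [encode_image_bytes(Path(p).read_bytes()) for p in paths]


# Usage (file names are placeholders for your own photos):
# luis_b64_images = encode_images(["luis_1.jpg", "luis_2.jpg", "luis_3.jpg", "luis_4.jpg"])
```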

OctoAI code example showing the four images of Luis Ceze used for Photo Merge

Next, we add the transfer_images parameter, in the form {"<trigger word>": [list of images]}, to the payload sent to OctoAI’s SDXL Image Gen API at https://image.octoai.run/generate/sdxl.
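For readers not using an SDK, here is a minimal standard-library sketch that packages a payload as a POST request to the endpoint quoted above, without sending it. The Bearer authorization header and JSON content type are assumptions, and build_sdxl_request is a hypothetical helper name.

```python
import json
import urllib.request
from typing import Any, Dict

# Endpoint quoted in the post.
SDXL_URL = "https://image.octoai.run/generate/sdxl"


def build_sdxl_request(payload: Dict[str, Any], token: str) -> urllib.request.Request:
    """Package a JSON payload as a POST request to the SDXL endpoint (not sent)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        SDXL_URL,
        data=body,
        method="POST",
        headers={
            # Assumption: bearer-token auth and a JSON request body.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Sending it would look like:
# with urllib.request.urlopen(build_sdxl_request(payload, token)) as resp:
#     result = json.load(resp)
```

Keeping request construction separate from sending makes it easy to inspect the exact JSON body before spending any API calls.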

OctoAI code example calling the API, showing the trigger word in the prompt

In this example, we use the trigger word ‘luis’ and map it to the four images uploaded earlier. We then include the trigger word in the prompt.

Prompt: A man luis sitting in a coffee shop.

All other parameters stay the same as in the first approach. Note that no LoRA is used here. We also use a checkpoint named ‘RealVisXL’, an OctoAI asset checkpoint optimized for Photo Merge; however, Photo Merge also works with the base SDXL checkpoint.
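The payload for this first Photo Merge request can be sketched as below: a prompt that mentions the trigger word, a transfer_images map, no LoRA, and an optional checkpoint. The field name "checkpoint" and its value format are assumptions (the post does not show the exact identifier for RealVisXL, so a placeholder is used), and the trigger-word check is our own safeguard reflecting how every prompt in this post mentions the trigger word.

```python
from typing import Any, Dict, List, Optional


def photo_merge_payload(
    prompt: str,
    trigger_word: str,
    images_b64: List[str],
    checkpoint: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build a Photo Merge request body without any LoRA."""
    # The post's prompts always mention the trigger word; enforce that here.
    words = [w.strip(".,!?") for w in prompt.split()]
    if trigger_word not in words:
        raise ValueError(f"prompt does not contain trigger word {trigger_word!r}")
    payload: Dict[str, Any] = {
        "prompt": prompt,
        "transfer_images": {trigger_word: list(images_b64)},
    }
    if checkpoint is not None:
        # Assumption: field name "checkpoint"; use the identifier the API docs
        # give for RealVisXL, which this post does not spell out.
        payload["checkpoint"] = checkpoint
    return payload


payload = photo_merge_payload(
    "A man luis sitting in a coffee shop.",
    "luis",
    ["<b64 image 1>", "<b64 image 2>", "<b64 image 3>", "<b64 image 4>"],
    checkpoint="<RealVisXL checkpoint id>",  # placeholder, not a real ID
)
```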

The request takes approximately 8.8 seconds and generates the following output:

Pretty accurate, isn’t it? Let’s try a few different prompts and combine Photo Merge with other style presets, LoRAs, and checkpoints to confirm that we consistently get accurate results.

Let’s use the transfer_images parameter together with the ‘Graffiti’ style preset, keeping all other parameter values the same as in the payload above.
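One way to keep "all other parameter values the same" between runs is to derive each variant from a shared base payload. This sketch deep-copies the base and adds a style preset; the field name "style_preset" is an assumption to be checked against the API documentation, and with_style_preset is a hypothetical helper.

```python
import copy
from typing import Any, Dict


def with_style_preset(payload: Dict[str, Any], preset: str) -> Dict[str, Any]:
    """Return a copy of payload with a style preset set; the input is not modified."""
    styled = copy.deepcopy(payload)
    # Assumption: the API field is called "style_preset"; check the API docs.
    styled["style_preset"] = preset
    return styled


base = {"prompt": "A man luis sitting in a coffee shop.", "transfer_images": {"luis": ["<b64>"]}}
graffiti = with_style_preset(base, "Graffiti")
```

Because the base payload is never mutated, it can be reused unchanged for the comparison runs that follow.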

Code example highlighting the ability to add the ‘Graffiti’ style preset

The request takes approximately 8.7 seconds and generates the following output:

Four images of Luis Ceze generated with Photo Merge and the Graffiti style preset

Now let’s use the transfer_images parameter with a pre-trained style LoRA. We have already imported a pre-trained style LoRA into OctoAI’s Asset Library. In the payload below, we reference it by its asset ID and assign it a weight of 1.0.
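The "loras" field maps asset IDs to weights, the same shape used in this post’s final payload. The sketch below adds LoRAs to a copy of an existing payload and merges with any LoRAs already present, so several can be stacked. The helper name and the asset ID are hypothetical placeholders.

```python
import copy
from typing import Any, Dict, Mapping


def with_loras(payload: Dict[str, Any], loras: Mapping[str, float]) -> Dict[str, Any]:
    """Return a copy of payload whose "loras" map includes the given asset IDs and weights."""
    updated = copy.deepcopy(payload)
    merged = dict(updated.get("loras", {}))
    for asset_id, weight in loras.items():
        merged[asset_id] = float(weight)  # normalize weights to floats
    updated["loras"] = merged
    return updated


# "asset_<style-lora-id>" is a hypothetical placeholder; use your own asset ID.
base = {"prompt": "A man luis sitting in a coffee shop.", "transfer_images": {"luis": ["<b64>"]}}
styled = with_loras(base, {"asset_<style-lora-id>": 1.0})
```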

Code highlight showing the LoRA weight in the API call

The request takes approximately 16.9 seconds and generates the following output:

Four images of Luis generated with Photo Merge and a style LoRA

The AI-generated images of our human model closely resemble his actual photos. Photo Merge’s results are significantly more precise, and they don’t require spending 20-30 minutes fine-tuning a custom LoRA first.

Comparison between custom fine-tunes for faces (SDXL LoRAs) and OctoAI’s Photo Merge

Approach	Tuning images	Fine-tune steps	Fine-tune time	Inference latency	Results
Custom fine-tune for faces (SDXL LoRAs)	16-64	800-1,000	20-30 minutes, increasing linearly with more tuning data and steps	A few seconds	Poor to mediocre quality
Photo Merge	1-4	N/A	N/A	A few seconds	Precise and accurate

Now that we've determined the best approach for generating accurate images of our human model, let's bring it all together. We'll create a custom fine-tune for our retail product and seamlessly integrate our AI-generated human model's image with it.

Create a custom fine-tune (SDXL LoRA) for a retail product: The steps are similar to those shown earlier in this post. We upload 10-12 images of our product, which in our case are different-colored Lacoste polo shirts for men.

OctoAI web UI showing the creation of a fine-tune for Lacoste collared shirts

We then create a custom fine-tune by configuring the appropriate fine-tuning parameters (as shown earlier), assigning it a different trigger word, and uploading our tuning dataset.

OctoAI web UI showing the Lacoste fine-tune and the trigger word used to invoke it in a prompt

After approximately 20-30 minutes, our custom LoRA fine-tuned on the branded polo shirts is available.

We are now ready to bring everything together. We use the transfer_images parameter (Photo Merge) to generate accurate images of our human model, Luis, and apply the ‘lacosteshirt-finetune’ LoRA to the shirt he is wearing:

"prompt": "A man luis wearing a pink T-shirt lacosteshirt1:1, sitting in a coffee shop"
"loras": {"asset_01hp5hsn6mfh6b0zf47q862a6b": 1.0}

"transfer_images": {"luis": luis_b64_images}

Code highlight putting all the parts together: Photo Merge, LoRA weighting, and the fine-tune

Please note that "luis" serves as the trigger word associated with Luis’s images in transfer_images. We position this trigger word immediately after the subject, "man," enabling our human model to inherit the facial attributes of Luis’s images. Additionally, we input the asset ID of the custom LoRA tuned for Lacoste polo shirts for men, which is associated with the trigger word "lacosteshirt1." This trigger word is placed immediately after the word "T-shirt" in our prompt, ensuring that the required attributes are applied to the shirt.
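The placement rule described above (each trigger word goes immediately after the noun it should modify) can be sketched as a small string helper. attach_trigger is hypothetical, not an OctoAI function; the "lacosteshirt1:1" token is copied verbatim from this post’s prompt, and the sketch does not interpret the ":1" suffix.

```python
import re


def attach_trigger(prompt: str, subject: str, trigger: str) -> str:
    """Insert `trigger` right after the first whole-word occurrence of `subject`."""
    pattern = re.compile(rf"\b{re.escape(subject)}\b")
    match = pattern.search(prompt)
    if match is None:
        raise ValueError(f"subject {subject!r} not found in prompt")
    end = match.end()
    return f"{prompt[:end]} {trigger}{prompt[end:]}"


prompt = "A man wearing a pink T-shirt, sitting in a coffee shop"
prompt = attach_trigger(prompt, "man", "luis")                 # face from Photo Merge
prompt = attach_trigger(prompt, "T-shirt", "lacosteshirt1:1")  # product LoRA
print(prompt)
# A man luis wearing a pink T-shirt lacosteshirt1:1, sitting in a coffee shop
```

Matching whole words keeps "man" from matching inside words such as "woman", so each trigger lands next to the intended subject.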

The request takes a few seconds and generates the following output:

AI-generated Luis Ceze in a Lacoste shirt, created with OctoAI’s new Photo Merge Image Gen feature

Voilà! The generated output appears to seamlessly integrate our human model’s face (in this case, Luis’s) with the corresponding product: a pink Lacoste polo shirt.

This post showcases just one of the possibilities of OctoAI’s Photo Merge feature. Photo Merge offers endless potential: whether in entertainment, gaming, marketing agencies, or the fashion and retail sectors, it can help craft personalized avatars, advertisements, and brand ambassador representations. It can also enable virtual try-ons and lifelike digital product showcases. To learn more, refer to our documentation.

Get started using Photo Merge today

Please join us on Discord to engage with the team and our community. We’ll use the Discord channel to share news about upcoming features, promotions, and competitions. Stay tuned to learn more; I look forward to seeing the applications and imagery you build using OctoAI Image Gen Solution.