FluxPipeline supports multiple image generation pipelines, each available with or without ControlNets: text-to-image (txt2img), image-to-image (img2img), and inpainting (inpaint).

Important Note

Ensure that a volume mount is added to the deployment, because all generated images are written to the /data/outputs directory inside the container.

Model Optimization Configuration

Optimization Settings

To enable attention caching, add the following under the optimization config:

"optimisations": {
    "attention_caching": {
      "type": "auto",
      "enabled": true,
      "extra_params": {
        "threshold": 0.1
      }
    }
}
  • Higher threshold values result in greater speed gains but may degrade image generation accuracy.
  • We recommend a threshold of 0.1, which can provide up to a 40% speed improvement during inference while maintaining reasonable quality.
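
As a rough illustration of the setting above, the following Python sketch builds the same "optimisations" block with a configurable threshold. The function name build_attention_caching_config and the non-negative check are assumptions made for this example, not part of FluxPipeline; only the JSON keys and the 0.1 default come from the text above.

```python
import json

# Threshold recommended in the text above.
RECOMMENDED_THRESHOLD = 0.1


def build_attention_caching_config(threshold: float = RECOMMENDED_THRESHOLD,
                                   enabled: bool = True) -> dict:
    """Return the "optimisations" block with the chosen threshold.

    Hypothetical helper: the valid range of the threshold is not documented,
    so this sketch only rejects negative values (an assumption).
    """
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    return {
        "optimisations": {
            "attention_caching": {
                "type": "auto",
                "enabled": enabled,
                "extra_params": {"threshold": threshold},
            }
        }
    }


if __name__ == "__main__":
    # Print the block so it can be pasted into the optimization config.
    print(json.dumps(build_attention_caching_config(), indent=2))
```

Generating the block from one place makes it easier to compare speed and quality at a few threshold values before settling on one.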

Pipeline Settings

ControlNet models are configured in the pipeline config:

  • A multi-mode ControlNet model can be added under the controlnets section.
  • Each ControlNet model requires a name, a source, and authentication details if needed.
  • A multi-mode model unifies all conditioning sources in a single model, replacing multiple single-mode ControlNets.

Current Limitations

  • Supports only one multi-mode ControlNet model.
  • Does not support multi-mode ControlNet with additional single-mode ControlNets.
  • In other words, if you use a multi-mode model (for example, FLUX.1-dev-ControlNet-Union-Pro), you cannot combine it with other standalone ControlNets.

Example pipeline configuration:

{
  "type": "flux",
  "loras": [],
  "lora_repo": {
    "path": "",
    "type": "",
    "secret": {
      "type": ""
    },
    "ownership": ""
  },
  "pipelines": [
    "txt2img"
  ],
  "controlnets": [
    {
      "name": "canny",
      "source": {
        "path": "Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro",
        "type": "hf",
        "secret": {
          "type": "hf",
          "token": ""
        }
      }
    }
  ],
  "model_choice": {
    "flux_type": "flux"
  },
  "custom_pipeline_config": [],
  "custom_pipeline_resources": ""
}
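
The limitations listed above can be checked before deployment. The Python sketch below is one possible pre-flight check, not FluxPipeline's own validation. The config does not mark a model as multi-mode, so the helper is_multi_mode guesses from the source path containing "union"; that heuristic and both function names are assumptions for illustration.

```python
from typing import List


def is_multi_mode(controlnet: dict) -> bool:
    """Guess whether a ControlNet entry is a multi-mode (union) model.

    Assumption: the source path contains "union", as in
    Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro. Adjust for other models.
    """
    path = controlnet.get("source", {}).get("path", "")
    return "union" in path.lower()


def validate_controlnets(config: dict) -> List[str]:
    """Return a list of problems found in the controlnets section."""
    problems = []
    controlnets = config.get("controlnets", [])

    # Each ControlNet model requires a name and a source.
    for i, cn in enumerate(controlnets):
        if not cn.get("name"):
            problems.append(f"controlnets[{i}]: missing name")
        if not cn.get("source", {}).get("path"):
            problems.append(f"controlnets[{i}]: missing source path")

    multi = [cn for cn in controlnets if is_multi_mode(cn)]
    if len(multi) > 1:
        problems.append("only one multi-mode ControlNet model is supported")
    if multi and len(controlnets) > len(multi):
        problems.append(
            "a multi-mode ControlNet cannot be combined with single-mode ControlNets"
        )
    return problems
```

An empty list means the controlnets section is consistent with the limitations described above.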

Understanding ControlNet Parameters

When using a multi-mode ControlNet, these parameters define its behavior:

Parameter | Description
control_image | List of images used as conditioning (e.g., depth map, canny edges).
controlnet_conditioning_scale | List of scaling factors, one for each conditioning image.
control_mode | List of mode indices, one for each control image, selecting the conditioning type (e.g., canny, depth).
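
Because each conditioning image is paired with one scaling factor and one mode, the three lists are expected to line up. The Python sketch below checks this on a request body, using the field names that appear in the example requests (control_images, control_weightages, control_modes). The function name check_control_params is hypothetical, and the equal-length rule is an assumption inferred from the "for each" wording above.

```python
CONTROL_FIELDS = ("control_images", "control_weightages", "control_modes")


def check_control_params(request: dict) -> None:
    """Raise ValueError if the control lists are missing or misaligned."""
    missing = [field for field in CONTROL_FIELDS if field not in request]
    if missing:
        raise ValueError("missing control parameters: " + ", ".join(missing))

    # Assumption: one weight and one mode per control image.
    lengths = {field: len(request[field]) for field in CONTROL_FIELDS}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"control lists must have equal length, got {lengths}")
```

Running this check on the client side gives a clearer error than a failed inference call.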

Supported Pipelines

  1. txt2img - Generates an image from text input.
  2. txt2img_controlnet - Generates an image from text input, guided by ControlNet control images.
  3. img2img - Generates an image based on an input image and a given prompt.
  4. img2img_controlnet - Generates an image based on an input image and a given prompt, with ControlNet support.
  5. inpaint - Modifies specific regions of an image based on a mask and a given prompt.
  6. inpaint_controlnet - Modifies specific regions of an image based on a mask and a given prompt, with ControlNet support.
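
The six pipelines differ mainly in which inputs they need. The Python sketch below records that as a lookup table keyed by model_type. The field sets are inferred from the example requests in this document rather than from a published schema, and the names REQUIRED_FIELDS and missing_fields are hypothetical.

```python
# Control fields sent with every *_controlnet pipeline.
CONTROL = {"control_images", "control_weightages", "control_modes"}

# Assumption: required inputs per pipeline, inferred from the example requests.
REQUIRED_FIELDS = {
    "txt2img": {"prompt"},
    "txt2img_controlnet": {"prompt"} | CONTROL,
    "img2img": {"prompt", "image"},
    "img2img_controlnet": {"prompt", "image"} | CONTROL,
    "inpaint": {"prompt", "image", "mask_image"},
    "inpaint_controlnet": {"prompt", "image", "mask_image"} | CONTROL,
}


def missing_fields(request: dict) -> list:
    """Return the sorted names of inputs the chosen pipeline still needs."""
    model_type = request.get("model_type")
    if model_type not in REQUIRED_FIELDS:
        raise ValueError(f"unsupported model_type: {model_type!r}")
    return sorted(REQUIRED_FIELDS[model_type] - set(request))
```

For example, a request with model_type "inpaint" that omits mask_image would report that field as missing.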

Example Requests

txt2img

{
    "prompt": "A girl in city, 25 years old, cool, futuristic <lora:multimodalart/plstps-local-feature:0.3> <lora:XLabs-AI/flux-RealismLora:0.3>",
    "negative_prompt": "canvas frame, (high contrast:1.2), (over saturated:1.2), (glossy:1.1), cartoon, 3d, ((disfigured)), ((bad art)), ((b&w)), blurry, ((bad anatomy)), (((bad proportions))), ((extra limbs)), cloned face, (((disfigured))), extra limbs, (bad anatomy), gross proportions, (malformed limbs), ((missing arms)), ((missing legs)), (((extra arms))), (((extra legs))), mutated hands, (fused fingers), (too many fingers), (((long neck))), Photoshop, video game, ugly, tiling, poorly drawn hands, 3d render",
    "height": 1024,
    "width": 1024,
    "num_images_per_prompt": 4,
    "num_inference_steps": 20,
    "seed": 2064977189,
    "guidance_scale": 4.5,
    "strength": 0.8,
    "scheduler": "EULER-A",
    "model_type": "txt2img"
}

txt2img_controlnet

{
    "prompt": "A girl in city, 25 years old, cool, futuristic <lora:multimodalart/plstps-local-feature:0.3> <lora:XLabs-AI/flux-RealismLora:0.3>",
    "negative_prompt": "canvas frame, (high contrast:1.2), (over saturated:1.2), (glossy:1.1), cartoon, 3d, ((disfigured)), ((bad art)), ((b&w)), blurry, ((bad anatomy)), (((bad proportions))), ((extra limbs)), cloned face, (((disfigured))), extra limbs, (bad anatomy), gross proportions, (malformed limbs), ((missing arms)), ((missing legs)), (((extra arms))), (((extra legs))), mutated hands, (fused fingers), (too many fingers), (((long neck))), Photoshop, video game, ugly, tiling, poorly drawn hands, 3d render",
    "height": 1024,
    "width": 1024,
    "num_images_per_prompt": 4,
    "num_inference_steps": 20,
    "seed": 2064977189,
    "guidance_scale": 4.5,
    "control_images": [
      "https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro/resolve/main/assets/depth.jpg",
      "https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro/resolve/main/assets/canny.jpg"
    ],
    "control_weightages": [0.2, 0.4],
    "control_modes": [2, 0],
    "strength": 0.8,
    "scheduler": "EULER-A",
    "model_type": "txt2img_controlnet"
}

img2img

{
    "prompt": "cat wizard, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney, 8k <lora:multimodalart/plstps-local-feature:0.3> <lora:XLabs-AI/flux-RealismLora:0.3>",
    "negative_prompt": "canvas frame, (high contrast:1.2), (over saturated:1.2), (glossy:1.1), cartoon, 3d, ((disfigured)), ((bad art)), ((b&w)), blurry, ((bad anatomy)), (((bad proportions))), ((extra limbs)), cloned face, (((disfigured))), extra limbs, (bad anatomy), gross proportions, (malformed limbs), ((missing arms)), ((missing legs)), (((extra arms))), (((extra legs))), mutated hands, (fused fingers), (too many fingers), (((long neck))), Photoshop, video game, ugly, tiling, poorly drawn hands, 3d render",
    "height": 1024,
    "width": 1024,
    "num_images_per_prompt": 4,
    "num_inference_steps": 20,
    "image": "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg",
    "seed": 89395930,
    "guidance_scale": 7.0,
    "strength": 0.5,
    "scheduler": "EULER-A",
    "model_type": "img2img"
}

img2img_controlnet

Note: Currently only num_images_per_prompt = 1 is supported.

{
    "prompt": "cat wizard, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney, 8k <lora:multimodalart/plstps-local-feature:0.3> <lora:XLabs-AI/flux-RealismLora:0.3>",
    "negative_prompt": "canvas frame, (high contrast:1.2), (over saturated:1.2), (glossy:1.1), cartoon, 3d, ((disfigured)), ((bad art)), ((b&w)), blurry, ((bad anatomy)), (((bad proportions))), ((extra limbs)), cloned face, (((disfigured))), extra limbs, (bad anatomy), gross proportions, (malformed limbs), ((missing arms)), ((missing legs)), (((extra arms))), (((extra legs))), mutated hands, (fused fingers), (too many fingers), (((long neck))), Photoshop, video game, ugly, tiling, poorly drawn hands, 3d render",
    "height": 1024,
    "width": 1024,
    "num_images_per_prompt": 1,
    "num_inference_steps": 20,
    "image": "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg",
    "control_images": [
        "https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro/resolve/main/assets/depth.jpg",
        "https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro/resolve/main/assets/canny.jpg"
    ],
    "control_weightages": [0.2, 0.4],
    "control_modes": [2, 0],
    "seed": 89395930,
    "guidance_scale": 7.0,
    "strength": 0.5,
    "scheduler": "EULER-A",
    "model_type": "img2img_controlnet"
}

inpaint

{
    "prompt": "Face of a yellow cat, high resolution, sitting on a park bench <lora:multimodalart/plstps-local-feature:0.3> <lora:XLabs-AI/flux-RealismLora:0.3>",
    "height": 1024,
    "width": 1024,
    "num_images_per_prompt": 4,
    "num_inference_steps": 20,
    "image": "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png",
    "mask_image": "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png",
    "seed": 89395930,
    "guidance_scale": 7.0,
    "strength": 0.5,
    "scheduler": "EULER-A",
    "clip_skip": 0,
    "use_foocus": true,
    "model_type": "inpaint"
}

inpaint_controlnet

Note: Currently only num_images_per_prompt = 1 is supported.

{
    "prompt": "Face of a yellow cat, high resolution, sitting on a park bench <lora:multimodalart/plstps-local-feature:0.3> <lora:XLabs-AI/flux-RealismLora:0.3>",
    "height": 1024,
    "width": 1024,
    "num_images_per_prompt": 1,
    "num_inference_steps": 20,
    "image": "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png",
    "mask_image": "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png",
    "control_images": [
        "https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro/resolve/main/assets/depth.jpg",
        "https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro/resolve/main/assets/canny.jpg"
    ],
    "control_weightages": [0.2, 0.4],
    "control_modes": [2, 0],
    "seed": 89395930,
    "guidance_scale": 7.0,
    "strength": 0.5,
    "scheduler": "EULER-A",
    "clip_skip": 0,
    "use_foocus": true,
    "model_type": "inpaint_controlnet"
}

Example Response

{
    "response_id": "afbc439946a44d98bb8062c8b36ec16d",
    "inference_time_taken": 6.336474418640137,
    "lora_time": 1.8092551231384277,
    "total_time_taken": 7.176232099533081,
    "request_id": "6b060ab415d84117b7b6403d622414f5",
    "error": null
}
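
A client will usually want to parse this response and stop on failure. The Python sketch below shows one way to do that with the fields from the example response. The class name FluxResponse and the function parse_response are hypothetical, and the format of a non-null error value is not documented here, so it is handled as an opaque value.

```python
import json
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class FluxResponse:
    response_id: str
    request_id: str
    inference_time_taken: float
    lora_time: float
    total_time_taken: float
    error: Optional[Any]  # null on success; shape on failure is an assumption


def parse_response(body: str) -> FluxResponse:
    """Parse a JSON response body and raise if it reports an error."""
    data = json.loads(body)
    # Keep only the known fields; absent ones become None.
    response = FluxResponse(**{f.name: data.get(f.name) for f in fields(FluxResponse)})
    if response.error is not None:
        raise RuntimeError(
            f"request {response.request_id} failed: {response.error}"
        )
    return response
```

The response carries timings and identifiers only; the generated images themselves are written to /data/outputs in the container, as noted at the top of this page.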

Key Notes

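  • Ensure a volume is mounted in the deployment for image storage.
  • ControlNet models are not loaded by default.
  • Ensure all three control_ parameters (control_images, control_weightages, control_modes) are sent in the request when using a ControlNet pipeline.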
  • Supports multiple pipelines for text-to-image, image-to-image, and inpainting.