ControlNet, IP-Adapter, LoRA — Fine-Grained Control
Base diffusion models give you creative possibilities. Adapters give you creative PRECISION. Master the three that matter most.
Lesson map
What this lesson covers, in order:
1. The control stack
2. ControlNet
3. IP-Adapter
4. LoRA
Section 1
The control stack
A bare diffusion model reads a text prompt and generates something plausible. Production creative work needs more: a specific pose, a specific character, a specific style. Three adapter families — ControlNet, IP-Adapter, and LoRA — cover 95% of professional use cases. They compose cleanly.
ControlNet — structural conditioning
ControlNet (Zhang et al., 2023) adds structural guidance to a diffusion model via an auxiliary network. You pass a conditioning image (edge map, depth map, pose skeleton, normal map, segmentation) and the model respects that structure while the text prompt fills in appearance. It's the foundation of 'put THIS character in THAT pose' and 'keep the composition, change the style.'
- Canny edges — preserve line structure of a sketch.
- Depth — preserve 3D layout of a reference photo.
- OpenPose — specify exact human body pose.
- Scribble — rough doodle controls composition.
- Segmentation — color-coded regions define what goes where.
- Tile — upscale with seamless detail.
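To make the first variant concrete, here is a minimal Canny sketch on SD 1.5 in Diffusers. The hub IDs are the standard public Canny ControlNet and SD 1.5 checkpoints; the reference image path is a placeholder.
# Minimal Canny ControlNet sketch (standard hub IDs; reference path is a placeholder)
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image

# 1. Turn the reference into an edge map the ControlNet understands
ref = np.array(load_image("./refs/sketch.png"))  # placeholder path
edges = cv2.Canny(ref, 100, 200)                 # low/high thresholds
edges = np.stack([edges] * 3, axis=-1)           # 1-channel -> 3-channel
control_image = Image.fromarray(edges)

# 2. Wire the ControlNet into the pipeline at construction time
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# 3. Edges fix the structure; the prompt fills in appearance
image = pipe(
    "watercolor illustration, soft morning light",
    image=control_image,
    num_inference_steps=30,
).images[0]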
IP-Adapter — image prompting
IP-Adapter (Ye et al., 2023) lets you prompt with an IMAGE, not just text. Feed it a reference image; the diffusion model's output borrows the reference's subject, style, or composition (depending on the variant). Crucial for character consistency across a comic, style matching across a brand system, and face-preserving portraits.
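A minimal IP-Adapter sketch on SD 1.5 using the Diffusers loader (hub IDs follow the Diffusers docs; the reference path is a placeholder). The scale is the main dial: low values borrow loose style, high values copy the subject closely.
# IP-Adapter sketch on SD 1.5 (hub IDs per Diffusers docs; paths are placeholders)
import torch
from diffusers import StableDiffusionPipeline
from diffusers.utils import load_image

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Attach the adapter, then set how strongly the reference steers generation
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models",
                     weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(0.6)  # ~0.3 loose style, ~0.8 close subject copy

reference = load_image("./refs/character.png")  # placeholder reference image
image = pipe(
    prompt="the character reading in a cozy library",
    ip_adapter_image=reference,
    num_inference_steps=30,
).images[0]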
LoRA — lightweight fine-tuning
LoRA (Low-Rank Adaptation, Hu et al., 2021) adds a small set of trainable matrices to the diffusion model's attention layers. You can fine-tune a few MB of weights on as few as 10-30 images to teach the model a new character, object, artist style, or concept. Swap LoRAs at inference time — 'same base Flux, three different brand styles' is a one-line change.
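The update itself is simple enough to write out. A toy sketch with illustrative dimensions: the frozen weight W gets a low-rank delta B @ A, and only A and B train.
# Toy illustration of the LoRA update on one attention weight (illustrative sizes, not library code)
import torch

d, r = 1024, 16               # model dim, LoRA rank (r << d)
W = torch.randn(d, d)         # frozen pretrained weight
A = torch.randn(r, d) * 0.01  # trainable down-projection
B = torch.zeros(d, r)         # trainable up-projection (zero init: adapter starts as a no-op)
alpha = 16.0                  # scaling hyperparameter

W_adapted = W + (alpha / r) * (B @ A)  # what the adapted layer effectively computes

# W has d*d ~ 1.0M params per layer; the delta adds only 2*d*r ~ 33K trainable params.
# That is why a whole adapter ships as a few MB of safetensors, and why swapping
# LoRAs at inference is just swapping deltas on top of a frozen W.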
Compare the options
| Tool | What it controls | When to use |
|---|---|---|
| ControlNet | Structure (pose, depth, edges). | You have a reference composition and want to re-style it. |
| IP-Adapter | Style or subject from a reference image. | You want the 'vibe' of a reference or a consistent character. |
| LoRA | A learned concept (character, style, object). | You have 10+ reference images of a specific thing and want to generate more. |
| Textual Inversion | A learned concept as a single prompt token. | Similar to LoRA but lower capacity; less common in 2026. |
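By contrast, a Textual Inversion concept is a single learned token embedding, and loading one in Diffusers is a one-liner. The concept repo below is a public example from the Diffusers docs.
# Textual Inversion: the learned concept is one token embedding
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_textual_inversion("sd-concepts-library/cat-toy")

# The trigger token <cat-toy> now refers to the learned concept
image = pipe("a <cat-toy> sitting on a beach towel").images[0]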
Stacking adapters
The professional pipeline typically stacks: base model (Flux Dev) + character LoRA + style LoRA + ControlNet pose + IP-Adapter for facial consistency. Each layer adds constraint. The art is knowing when you're over-constraining (outputs look muddy, burnt) vs. under-constraining (outputs drift).
Production-style adapter stacking on Flux Dev in Diffusers (a sketch; exact loader arguments vary across diffusers versions).
# Diffusers-style pseudocode stacking adapters on Flux (APIs evolve; check current docs)
import torch
from diffusers import FluxControlNetPipeline, FluxControlNetModel
from diffusers.utils import load_image

# Pose control must be wired in when the pipeline is built, not after
controlnet = FluxControlNetModel.from_pretrained(
    "XLabs-AI/flux-controlnet-pose", torch_dtype=torch.bfloat16
)
pipe = FluxControlNetPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    controlnet=controlnet,
    torch_dtype=torch.bfloat16,
).to("cuda")

# Load a character LoRA (trained on 15 images of our mascot)
pipe.load_lora_weights("./loras/mascot-flux-lora.safetensors", adapter_name="mascot")
# Load a brand-style LoRA, then activate both (style at reduced strength)
pipe.load_lora_weights("./loras/brand-style-lora.safetensors", adapter_name="brand_style")
pipe.set_adapters(["mascot", "brand_style"], adapter_weights=[1.0, 0.7])

# IP-Adapter for facial consistency with the hero shot
pipe.load_ip_adapter("XLabs-AI/flux-ip-adapter")  # some versions also need weight_name / an image encoder
pipe.set_ip_adapter_scale(0.6)

pose_reference = load_image("./refs/pose-skeleton.png")  # OpenPose skeleton (placeholder path)
hero_face = load_image("./refs/hero-face.png")           # reference face (placeholder path)

image = pipe(
    prompt="The mascot standing confidently in a neon-lit lab, cinematic",
    control_image=pose_reference,
    ip_adapter_image=hero_face,
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
Training a LoRA yourself
1. Collect 15-30 images of your concept. Varied angles, consistent subject.
2. Caption each image precisely; use a unique trigger token (e.g., 'TDRLMSCT' — a made-up word) for the concept.
3. Train with kohya_ss, the Replicate FLUX trainer, or the Fal trainer. Typical time: ~30 min on an H100, ~$3-10.
4. Validate: generate outputs with and without the LoRA (see the sketch after this list). Check the concept is captured without overfitting.
5. Version. Tag the LoRA with training date, base model, and trigger token.
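Step 4 can be scripted. A sketch of the A/B check, assuming the `pipe` from the stacking example above with the mascot LoRA loaded; prompts, seed, and output paths are illustrative.
# A/B validation sketch (assumes `pipe` from the stacking example; values are illustrative)
from pathlib import Path
import torch

out = Path("validation")
out.mkdir(exist_ok=True)

prompts = [
    "TDRLMSCT in a neon-lit lab, cinematic",
    "TDRLMSCT riding a bicycle, golden hour",
    "TDRLMSCT as a watercolor illustration",
]

for prompt in prompts:
    for tag, use_lora in (("lora", True), ("base", False)):
        if use_lora:
            pipe.enable_lora()
        else:
            pipe.disable_lora()
        # Same seed for both runs so the only difference is the adapter
        gen = torch.Generator("cuda").manual_seed(42)
        img = pipe(prompt, generator=gen, num_inference_steps=28).images[0]
        img.save(out / f"{prompt[:24].replace(' ', '_')}-{tag}.png")

pipe.enable_lora()  # leave the pipeline in its normal state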
