
How to create an original character LoRA [SD1.5 Training]

⏱️24min read
📅 Apr 13, 2025
🔄 Apr 13, 2025
Category:📂 Advanced

This article details LoRA training for SD1.5 using the Kohya ss GUI. At the time of writing, SD1.5 is no longer the primary model, but it trains quickly, which makes it easy to experiment with a variety of training parameters. Its characteristics are not exactly the same as those of SDXL (Illustrious/Pony), currently the main family for illustration AI, but it is a good introduction. If you do not know how to install the Kohya ss GUI or how to create datasets, I recommend reading the article below first.


About the LoRA Process

To understand LoRA training, it is important to understand the training process. Briefly: input images and caption tags are fed into the base model, the model's output is compared with the original image, and an error (Loss) is calculated. From this Loss, gradients are computed and the trainable LoRA parameters are updated. Repeating this cycle is a single training step, and the trained parameters become the LoRA model. LoRA is characterized by the fact that the base model's parameters are never changed directly; everything is learned through an adapter.

Process diagram within the LoRA training step
The comparison is drawn with images for easier understanding, but in reality the output data (features and predictions) is compared with the target data derived from the input images.
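As a supplement, here is a minimal, self-contained sketch of that single training step in PyTorch. It is a toy illustration of the mechanism only (the frozen weight W stands in for a base-model layer, and the random target stands in for the real training signal), not Kohya's actual implementation.

import torch

torch.manual_seed(0)
d, rank = 16, 4
W = torch.randn(d, d)                          # frozen base-model weight (never updated)
A = torch.randn(rank, d, requires_grad=True)   # LoRA down-projection (trainable)
B = torch.zeros(d, rank, requires_grad=True)   # LoRA up-projection (trainable, starts at zero)
opt = torch.optim.AdamW([A, B], lr=1e-4)       # the optimizer sees only the adapter

x = torch.randn(8, d)                          # stand-in for the encoded input
target = torch.randn(8, d)                     # stand-in for the "correct" data

pred = x @ (W + B @ A).T                       # base output plus adapter output
loss = torch.nn.functional.mse_loss(pred, target)  # the error (Loss) in the diagram
loss.backward()                                # gradients flow only into A and B
opt.step()                                     # update the adapter parameters
opt.zero_grad()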

Training with Kohya ss GUI default values

Dataset

First, let’s train with the default parameters. The training dataset is the one created in the previous article. If you want to train on the same dataset, it is available on Patreon, but only paid supporters can download it.

Dataset Image Sample List 

Default Parameters

Once the dataset is ready, train with parameters like the following, with a few values adjusted for training illustrations on an SD1.5 model. Values that must be entered or changed from the defaults are annotated below.

  • Pretrained model name or path: runwayml/stable-diffusion-v1-5
  • Trained Model output name: DCAI_Girl_SD15_def *Output name of the model
  • Instance prompt: dcai-girl *The caption method used here ignores this value, but leaving it blank causes an error.
  • Class prompt: 1girl *Entered for the same reason as above.
  • Repeats: 5 [Default: 40] *The training source has 100 images and we want a total of 500 images per epoch.
  • Presets: none
  • LoRA type: Standard
  • Train batch size: 1
  • Epoch: 1
  • Max train epoch: 0
  • Max train steps: 1600
  • Save every N epochs: 1
  • Seed: 123 [Default: 0 = random] *Set a fixed seed so that parameter comparisons are reproducible.
  • LR Scheduler: cosine
  • Optimizer: AdamW8bit
  • Learning rate: 0.0001 (1e-4)
  • Text Encoder learning rate: 0.00005 (5e-5) [Default: 0.0001 (1e-4)] *Changed to the recommended defaults in the official documentation.
  • Unet learning rate: 0.0001 (1e-4)
  • LR warmup (% of total steps): 10
  • Network Rank (Dimension): 8
  • Network Alpha: 1
  • clip_skip: 2 [Default: 1] *Recommended values for SD1.5 Illustration AI
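For reference, the GUI ultimately drives Kohya's sd-scripts, so the settings above correspond roughly to a train_network.py invocation like the sketch below. The paths are placeholders and the exact flag set depends on your sd-scripts version; the GUI also converts the warmup percentage into a concrete step count (10% of 1600 = 160).

accelerate launch train_network.py ^
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" ^
  --train_data_dir="path\to\dataset" --output_dir="path\to\output" ^
  --output_name="DCAI_Girl_SD15_def" --network_module=networks.lora ^
  --network_dim=8 --network_alpha=1 ^
  --learning_rate=1e-4 --unet_lr=1e-4 --text_encoder_lr=5e-5 ^
  --lr_scheduler="cosine" --lr_warmup_steps=160 ^
  --optimizer_type="AdamW8bit" --train_batch_size=1 ^
  --max_train_steps=1600 --save_every_n_epochs=1 ^
  --seed=123 --clip_skip=2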

About the default Pretrained model

The default model, runwayml/stable-diffusion-v1-5, is the foundation of the SD1.5 family, so it is versatile, but it is not well suited to training illustration styles. If you are new to LoRA training, try this model first to get a basic understanding of LoRA.

If the Pretrained model is set to the default runwayml/stable-diffusion-v1-5, the following Diffusers pretrained model is automatically loaded.

  • vae: diffusion_pytorch_model.safetensors (335 MB)
  • text_encoder: model.safetensors (492 MB)
  • unet: diffusion_pytorch_model.safetensors (3.44 GB)

The Runway repository has since been removed; at the time of writing, the files are automatically downloaded from benjamin-paine’s repository instead.

About Performance Monitoring During Training

While training in the Kohya ss GUI, check performance in the Windows Task Manager. The SD1.5 model is lightweight compared to SDXL, so you usually don’t have to worry, but if you run out of dedicated GPU memory (VRAM), shared GPU memory will be used instead. This slows training considerably, so switching to lighter-load settings is recommended.

Task Manager > Performance 
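If you prefer a command-line view (or are not on Windows), nvidia-smi can poll the same information; the dedicated-VRAM figures are what matter here:

nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 1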

Test generation using trained LoRA

The trained LoRA was used with the A1111 WebUI to generate the results shown in the figure below. The “AnyOrangeMix” checkpoint was used for generation.

Default parameter results 

The generated results show only limited fidelity, but compared with the results before LoRA was applied (figure below), we can see that the costume is affected. The face was hardly affected.

Generated results without LoRA

Test generation parameters

cinematic lighting, upper body,
(dcai-girl, 1girl, :1.1),solo, short hair, orange hair, brown eyes, animal ears,
dress, blue dress, long sleeves, black bowtie,
(skirt, black skirt, belt, brown footwear, white thighhighs, thigh strap,:0.8)
masterpiece, meadow, sky
<lora:DCAI_Girl_SD15_def:1.0>
(easynegative:1.0),(worst quality,low quality:1.2),(bad anatomy:1.4),(realistic:1.1),nose,lips,adult,fat,sad, (inaccurate limb:1.2),extra digit,fewer digits,six fingers,(monochrome:0.95)
Checkpoint Model: anyorangemixAnything_mint
Sampler: DPM++ SDE Karras
Steps: 20
CFG scale: 6
Seed: 3547528961
Width: 768
Height: 512
Clip skip: 2
Textual Inversion: easynegative
ADetailer: on
Hires upscaler: 4x-UltraSharp

How to check the metadata of a trained LoRA model

You can use DCAI’s 🔗LoRA Inspector🧐 to view a trained model’s LoRA metadata. When you have trained many LoRAs for validation, it is hard to remember every training parameter, so use the inspector to check them.

Training with default values and animefullFinalPruned model

Next, let’s change the pretrained model to the anime base model “Animefull-final-pruned”.

Download the model file to your \kohya_ss\models folder and select it with the 📄 button under “Pretrained model name or path” in the Kohya ss GUI.

The parameters are the same as before; only the pretrained model has been changed. The trained results are shown below.

Result of the Animefull-final-pruned default parameter 

The fidelity has increased compared with the first test generation. Furthermore, we can see the effect on the face as well as the costume. I made a weight comparison to make each effect easier to see.

Weighted comparison of Base and AFFP. 

A different pretrained model alone makes this much difference in quality.

Checkpoint Model: anyorangemixAnything_mint
Sampler: DPM++ SDE Karras
Steps: 20
CFG scale: 6
Seed: 3547528961
Width: 512
Height: 512
Clip skip: 2
Textual Inversion: easynegative
ADetailer: off

Looking at the average Loss in TensorBoard, the Animefull-final-pruned run is slightly lower.

Average Loss in TensorBoard
Sky-blue: stable-diffusion-v1-5 Pink: Animefull-final-pruned

Methods to improve the quality of LoRA

It is difficult to define what makes a LoRA high quality, but a LoRA that generates images faithful to its purpose can be considered high quality. To achieve that, it is important to clarify the purpose of the LoRA. Typical LoRA objectives include:

  • Character
  • Costume
  • Image style
  • Concept
  • Poses
  • Background
  • Vehicle

The training source images, captions, and training parameters all vary with the objective. In this case I want to train a character LoRA, so I will focus on the person’s features and costume. With that emphasis, the dataset and training parameters (number of source images, total training steps, learning rate, and network rank) must be balanced accordingly.

Look again at the LoRA weight comparison image from earlier.

Weighted comparison of Base and AFFP 

The weight 1.0 image at the right of the top row shows a slight reflection of the costume’s features, but it is a big departure from the training source image. In the lower row the costume is getting closer, but it still only resembles the original. Even so, we can see that LoRA is working better than in the upper row.

Underfitting and Overfitting

Two indicators of LoRA quality are underfitting and overfitting. Simply put, a LoRA whose effect is too weak is underfitting, while a LoRA whose effect is too strong is overfitting. Usually these symptoms relate to how far the Unet has been trained. Take a look at the following comparison images.

Comparison of training steps 

These are LoRAs saved every epoch when training with Epoch set to 20; each epoch covers 500 images.

Underfitting

The leftmost <lora:DCAI_Girl_SD15_afFP-000002:1> in the comparison chart is the LoRA at training step 1,000. It has picked up features of the training source image, but not accurately enough for the purpose of this LoRA training.

Overfitting

Conversely, at step 10,000, <lora:DCAI_Girl_SD15_afFP:1> accurately reproduces the character and costume. However, the gray background of the training source images is probably why the “meadow” prompt is no longer respected and only simple backgrounds are generated. For the purpose of this LoRA training, that counts as overfitting. Other signs of overfitting include collapsing face and hand shapes, harsh contrast, and color blurring.

Sample of overfitting
Sample of overfitting with the Unet learning rate changed to 0.005 and 2,000 training steps. Most of the prompt is ignored and only similar images are generated.

One way to identify underfitting or overfitting is to adjust the LoRA weights.

Weight Comparison 

Sweep the weight in the plus and minus directions and check the generated results. For this model, the appropriate value is around 1.0. For an underfitted or overfitted LoRA, this appropriate value will be significantly off (underfitting example: best at a weight of 1.5; overfitting example: best at a weight of 0.5).
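For example, generate with the same seed and prompt while changing only the LoRA weight (the values below are just an illustrative sweep):

<lora:DCAI_Girl_SD15_afFP:0.5>
<lora:DCAI_Girl_SD15_afFP:1.0>
<lora:DCAI_Girl_SD15_afFP:1.5>

In the A1111 WebUI, the X/Y/Z plot script with “Prompt S/R” can automate such a sweep.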

Unet learning rate and Text Encoder learning rate

The Unet learning rate and Text Encoder learning rate are among the most important of LoRA’s training parameters. Beginners should start from the default values and, if the results are not good, adjust them together with the step/epoch counts. Now, let’s look at the comparison images.

Comparison of Unet and TE weights 

In the A1111 WebUI, you can weight the Unet and Text Encoder separately with the syntax <lora:loraName:Unet:TE>. From left to right: [all applied / Unet only / Unet and half TE / TE only / TE and half Unet]. It is difficult to judge much from this comparison image alone, so let me explain the Unet and TE.

Unet

The Unet learns the visual elements of the training images and where they sit in the image structure.

The Unet learning rate has the greatest impact on under- and overfitting. If it is too low, underfitting occurs and the fidelity of the learned elements drops when the LoRA is used. If it is too high, overfitting may occur, and the illustration style of the training source images comes out strongly, overriding the prompt.

Look at the “Unet only” and “Unet and half TE” comparison images: with Unet only, the costume is no longer the costume from the training source image; with Unet and half TE, the accuracy of the costume increases.

Text Encoder

The Text Encoder controls how the AI interprets the prompts during generation.

If the Text Encoder learning rate is too low, prompts will not work well when using the LoRA. Conversely, if it is too high, the model binds too strongly to the captions, and elements of the training source images that are not in the prompt (e.g., the background) can appear in generations.

Look at the “TE only” and “TE and half Unet” comparison images: with TE only, the costume elements are close but the detail is not faithful; with TE and half Unet, the costume is closer to the training source image.

This test gives some sense of how strongly the Unet learning rate and Text Encoder learning rate each act.

Network Rank (Dimension) and Network Alpha

Network Rank (Dimension)

Network Rank (Dimension) specifies the amount of information the LoRA can store for the Unet and TE. In general a larger value is better, but if it is too large, unnecessary information gets learned as well. For AI illustrations, values around 8/16/32/64 are typical.

Network Alpha

Network Alpha acts like a brake to prevent underfitting. Generally, a value equal to or half of Network Rank (Dimension) is used. If it is too high, trained poses and facial expressions may become fixed in generated illustrations.
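As background for reading the comparison grid: in common LoRA implementations, including Kohya’s, the adapter update is scaled by alpha divided by rank, so the three rows below correspond to different update scales. A quick sketch of that arithmetic:

for rank in (8, 16, 32, 64):
    for alpha in (1, rank // 2, rank):
        # W_effective = W + (alpha / rank) * (B @ A)
        print(f"rank={rank:2d} alpha={alpha:2d} -> update scale {alpha / rank:.3f}")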

With the above in mind, let’s look at the comparative images.

Comparison of Network Rank (Dimension) and Network Alpha 

The top row is trained with ranks 8/16/32/64 and alpha fixed at 1. The higher the rank, the more detail appears. However, because alpha barely acts at 1, fidelity is lost, as is freedom of expression.

The middle row uses alpha at half the value of the rank. Here too, the higher the rank, the higher the fidelity.

The bottom row uses the same value for rank and alpha. It looks almost identical to the middle row, but fidelity increases slightly.

TensorBoard and Loss

As a guide to training progress, we can look at the Loss in TensorBoard. First, let’s look at TensorBoard for the default and “Animefull-final-pruned” runs.

TensorBoard for the default and “Animefull-final-pruned” training
Sky-blue: stable-diffusion-v1-5 Pink: Animefull-final-pruned

Basically, steps increase from left to right. A typical graph starts with a high Loss value, converges toward the middle of the run, and rises again slightly near the end. Let’s look at the loss/epoch graph, whose shape is easiest to read.

TensorBoard’s Epoch/Loss values
Sky-blue: stable-diffusion-v1-5 Pink: Animefull-final-pruned

The Loss value drops rapidly from epoch 1 to 2 and settles down by epoch 3. Now let’s overlay a run trained to 10,000 steps with Animefull-final-pruned and compare the results.

Sample of overfitting
Sky-blue: stable-diffusion-v1-5 Pink: Animefull-final-pruned Purple: Animefull-final-pruned 10,000steps
Overfitting Epoch/Loss 

The Epoch/Loss graph is enlarged because it is the easiest to read. You can see Loss rising sharply at epoch 5, which is a sign of overfitting, followed by very large swings up and down. A graph like this is highly likely to indicate overfitting.

TensorBoard’s Loss should be treated as a rough guide only; a lower Loss is not necessarily better. Use it mainly to check that training is not taking an unusual shape.
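To open these graphs yourself, point TensorBoard at the logging folder you configured in the Kohya ss GUI (the path below is just an example):

tensorboard --logdir path\to\kohya_ss\logs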

Scale weight norms

Scale weight norms rescales the weight norms of the trained LoRA so that it does not have too much influence when used with other LoRAs. The comparison below sets an image trained with the default Animefull-final-pruned settings against a model trained with Scale weight norms set to 1.

Comparison of scale weight norms 

In the comparison image the generated results barely change, but fidelity is slightly higher when multiple LoRAs are applied. The next comparison applies “🔗flat2” at 0.85; to make the effect easier to see, DCAI’s LoRA is applied at 1.2.

Comparison when multiple LORAs are applied 

CrossAttention

Specifies the cross-attention implementation. On NVIDIA RTX 30/40-series GPUs, sdpa performs better. The comparison image below pits xformers against sdpa.

Comparison of CrossAttention 

In this comparison, training time was slightly faster and fidelity slightly higher.

Average Loss comparison when using CrossAttention-sdpa
Pink: xformers Yellow: sdpa

Shuffle caption/Keep n tokens

Keeps the first n tokens of each caption fixed, as specified in Keep n tokens, and randomly shuffles the remaining tags during training, as the sketch below illustrates.
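A minimal sketch of the idea, assuming tag-level shuffling for illustration (Kohya’s actual token counting may differ in detail):

import random

caption = "dcai-girl, 1girl, solo, short hair, orange hair, dress, blue dress"
keep_n = 2  # e.g., keep the instance and class tags fixed at the front

tags = [t.strip() for t in caption.split(",")]
head, tail = tags[:keep_n], tags[keep_n:]
random.shuffle(tail)                 # only the trailing tags are reordered
print(", ".join(head + tail))        # a new tag order each time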

To set this up, enter the instance tag and class tag into an A1111 WebUI prompt and check the token count in the counter at the upper right.

In the Kohya ss GUI, under Parameters > Advanced, enter the number of tokens you just checked in the “Keep n tokens” field. Then, check the “Shuffle caption” checkbox to complete the setup.

Comparison of Shuffle caption 

The comparison images show little change; however, comparing the average Loss in TensorBoard, we can see that Loss has gone down.

Average Loss comparison when using Shuffle caption
Pink: Without Shuffle Green: With Shuffle

Min SNR gamma

Smooths the average Loss and stabilizes training. The Min-SNR paper recommends a value of 5, and the comparison images were trained with Min SNR gamma set to 5.
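For reference, the Min-SNR weighting from the paper multiplies each timestep’s loss by min(SNR, gamma) / SNR (for epsilon-prediction models), which caps the influence of low-noise timesteps. A small sketch:

import torch

def min_snr_weight(snr: torch.Tensor, gamma: float = 5.0) -> torch.Tensor:
    # Per-timestep loss weight: min(SNR, gamma) / SNR
    return torch.clamp(snr, max=gamma) / snr

snr = torch.tensor([0.1, 1.0, 5.0, 25.0, 100.0])   # example per-timestep SNRs
print(min_snr_weight(snr))                          # -> 1.0, 1.0, 1.0, 0.2, 0.05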

Comparison of Min SNR gamma 

The results are almost the same, but with a slight increase in fidelity. Average Loss was almost the same but slightly increased.

Average Loss comparison when using Min SNR gamma
Pink: Without Min SNR gamma Sky-blue: Min SNR gamma 5

Change parameters to train a high quality LoRA

Use the techniques I have described so far to set up your LoRA training.

These settings focus on fidelity to the training source image. Generating only the trained character is fine, but when multiple characters appear, the character’s elements will bleed into the others (LoRA bleeding). Correcting this requires lowering the applied LoRA weight, adding regularization images, or adjusting the attention blocks, none of which are covered in this article.

Training Parameters

Now let’s look at the parameters. Entered and changed values are annotated below.

  • Pretrained model name or path: anyorangemixAnything_mint [Default: runwayml/stable-diffusion-v1-5]
  • Trained Model output name: DCAI_Girl_SD15_V1
  • Instance prompt: dcai-girl
  • Class prompt: 1girl
  • Repeats: 5 [Default: 40]
  • Presets: none
  • LoRA type: Standard
  • Train batch size: 1
  • Epoch: 4 [Default: 1] *Total steps are controlled via epochs instead of Max train steps (see the step math after this list).
  • Max train epoch: 0
  • Max train steps: 0 [Default: 1600] *Disabled so that total steps come from the epoch count.
  • Save every N epochs: 1
  • Seed: 123 [Default: 0]
  • LR Scheduler: cosine
  • Optimizer: AdamW [Default: AdamW8bit] *If you are low on VRAM, use AdamW8bit.
  • Learning rate: 0.0001 (1e-4)
  • Text Encoder learning rate: 0.00005 (5e-5) [Default: 0.0001 (1e-4)] *Changed to the recommended defaults in the official documentation.
  • Unet learning rate: 0.0001 (1e-4)
  • LR warmup (% of total steps): 5 [Default: 10]
  • Network Rank (Dimension): 32 [Default: 8]
  • Network Alpha: 32 [Default: 1] *To increase fidelity
  • Scale weight norms: 1 [Default: 0]
  • Keep n tokens: 8 [Default: 0]
  • clip_skip: 2 [Default: 1]
  • Shuffle caption: true [Default: false]
  • CrossAttention: sdpa [Default: xformers]
  • Min SNR gamma: 5 [Default: 0]
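As a sanity check, the total step count implied by these settings (with the 100 source images mentioned earlier) works out as follows:

images, repeats, epochs, batch_size = 100, 5, 4, 1
steps_per_epoch = images * repeats // batch_size   # 500 steps per epoch
total_steps = steps_per_epoch * epochs             # 2000 steps in total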

Let’s run the training with the above settings.

Training Results

Using the trained model, the generated image looks like this.

Training Results 

LoRA for SD1.5-series models is more versatile than LoRA for SDXL-series models, so a LoRA trained on “Animefull-final-pruned” can be used with other models without problems. As shown in the figure below, it captures the character’s features even on illustration-based models and on realistic models such as dreamshaper8. For reference, I also trained a LoRA on VRoid Studio captures without style conversion, using the same training settings (Repeats set to 10 because the dataset has half as many images) and the same generation settings.

Comparison of Checkpoint Models 
Sample of style conversion from captured image
Checkpoint model comparison of LoRA using VRoid Studio dataset
Without style conversion

Comparing the two comparison images, the latter has a stronger VRoid Studio illustration style. If you want the source illustration style to be trained, it is better not to convert the style.

Finally, here is the TensorBoard graph of the final results.

TensorBoard graph of final results
Sky-blue: stable-diffusion-v1-5 Pink: Animefull-final-pruned Yellow: Final training

The LoRA of the final results is available on Civitai for anyone interested.

Conclusion

In this article, I explained how to train an original character LoRA for SD1.5 models. Together with the dataset production from the previous article, we trained a versatile character LoRA from source images with a CG-like appearance. With this method, it should also be possible to strip the illustration style from the training source and create an original character LoRA that takes on the checkpoint model’s own illustration style.

In the next article, I will explain SDXL character LoRAs by model lineage, since SDXL has two text encoders and it is difficult to make LoRAs as versatile as SD1.5’s.
