How to create an original character LoRA [SD1.5 Training]
This article details training SD1.5 LoRAs using the Kohya ss GUI. At the time of writing, SD1.5 is no longer the primary model, but training on it is fast and lets you experiment with a wide range of training parameters. Its characteristics are not identical to those of SDXL (Illustrious/Pony), currently the mainstay of illustration AI, but it makes a good introduction. If you do not know how to install the Kohya ss GUI or how to create datasets, I recommend reading the article below first.


About the LoRA Process
To understand LoRA training, it is important to understand the training process. Briefly: input images and their caption tags are fed into the base model, the model’s output is compared with the original image, and an error (Loss) is calculated. From this Loss, gradients are computed, and the parameters to be trained in the LoRA are updated. Repeating this cycle, one training step at a time, produces the trained LoRA model. What characterizes LoRA is that the base model’s parameters are not changed directly; instead, learning happens through an adapter.
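To make the cycle concrete, here is a minimal sketch of one such step in PyTorch-style code. This is not Kohya ss code; the names (`lora_forward`, `training_step`, and their arguments) are hypothetical, and the scaling term reflects the common LoRA formulation:

```python
import torch

# Conceptual sketch of LoRA: the frozen base weight W is left untouched,
# and a low-rank update (up @ down), scaled by alpha / rank, is added on top.
def lora_forward(x, W_frozen, lora_down, lora_up, alpha, rank):
    return x @ W_frozen.T + (alpha / rank) * (x @ lora_down.T @ lora_up.T)

# One training step: compare the model output against the target, compute
# the error (Loss), backpropagate, and update only the adapter parameters.
def training_step(model, optimizer, noisy_latent, caption_embedding, target):
    predicted = model(noisy_latent, caption_embedding)
    loss = torch.nn.functional.mse_loss(predicted, target)
    loss.backward()        # gradients flow only into the LoRA adapter weights
    optimizer.step()       # the optimizer holds only the adapter parameters
    optimizer.zero_grad()
    return loss.item()
```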

Learning with Kohya ss GUI default values
Dataset
First, let’s train with the default parameters. The training source is the dataset created in the previous article. If you want to train on the same dataset, it is available on Patreon, but only paid supporters can download it.


Default Parameters
Once the dataset is ready, train with parameters like the following, slightly adjusted from the defaults to suit illustration training on the SD1.5 model. Fields that need to be entered or changed are noted in red text.
- Pretrained model name or path: runwayml/stable-diffusion-v1-5
- Trained Model output name: DCAI_Girl_SD15_def *Output name of the model
- Instance prompt: dcai-girl *The caption method used in this case ignores this value, but if you do not enter it, an error will occur.
- Class prompt: 1girl *Entered for the same reason as above.
- Repeats: 5 [Default: 40] *This is because the training source has 100 images and we want to make the total number of images 500.
- Presets: none
- LoRA type: Standard
- Train batch size: 1
- Epoch: 1
- Max train epoch: 0
- Max train steps: 1600
- Save every N epochs: 1
- Seed: 123 [Default: 0 = random] *Fix the seed to an arbitrary number so that parameter comparisons are reproducible.
- LR Scheduler: cosine
- Optimizer: AdamW8bit
- Learning rate: 0.0001 (1e-4)
- Text Encoder learning rate: 0.00005 (5e-5) [Default: 0.0001 (1e-4)] *Changed to the recommended defaults in the official documentation.
- Unet learning rate: 0.0001 (1e-4)
- LR warmup (% of total steps): 10
- Network Rank (Dimension): 8
- Network Alpha: 1
- clip_skip: 2 [Default: 1] *Recommended values for SD1.5 Illustration AI
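For reference, these GUI fields correspond roughly to the options of the underlying kohya-ss sd-scripts `train_network.py`. The sketch below is an illustration under assumptions (dataset and caption options are omitted, and flag spellings should be verified against your sd-scripts version), not the exact command the GUI issues:

```python
import subprocess

# Hedged sketch: launch sd-scripts' train_network.py with flags matching the
# GUI fields above. Dataset/caption arguments (e.g. --train_data_dir) omitted.
cmd = [
    "accelerate", "launch", "train_network.py",
    "--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5",
    "--output_name=DCAI_Girl_SD15_def",
    "--network_module=networks.lora",   # LoRA type: Standard
    "--train_batch_size=1",
    "--max_train_steps=1600",
    "--save_every_n_epochs=1",
    "--seed=123",
    "--lr_scheduler=cosine",
    "--optimizer_type=AdamW8bit",
    "--learning_rate=1e-4",
    "--text_encoder_lr=5e-5",
    "--unet_lr=1e-4",
    "--network_dim=8",
    "--network_alpha=1",
    "--clip_skip=2",
]
subprocess.run(cmd, check=True)
```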
About the default Pretrained model
The default model, runwayml/stable-diffusion-v1-5, is the foundation of the SD1.5 family, so it is versatile, but it is not well suited to training illustration styles. If you are new to LoRA training, try training with this model first to get a basic understanding of LoRA.
When the Pretrained model is left at the default runwayml/stable-diffusion-v1-5, the following Diffusers pretrained components are loaded automatically.
- vae: diffusion_pytorch_model.safetensors (335 MB)
- text_encoder: model.safetensors (492 MB)
- unet: diffusion_pytorch_model.safetensors (3.44 GB)
The Runway model has since been removed; at the time of writing, the corresponding version is downloaded automatically from benjamin-paine’s repository.
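If you want to confirm what is being loaded, here is a minimal diffusers sketch (the repo id is an assumption based on the mirror mentioned above; substitute whichever SD1.5 mirror is current):

```python
from diffusers import StableDiffusionPipeline

# Pulls the same three components listed above: VAE, text encoder, and Unet.
pipe = StableDiffusionPipeline.from_pretrained("benjamin-paine/stable-diffusion-v1-5")

print(type(pipe.vae).__name__)           # AutoencoderKL
print(type(pipe.text_encoder).__name__)  # CLIPTextModel
print(type(pipe.unet).__name__)          # UNet2DConditionModel
```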

About Performance Monitoring During Training
While training in the Kohya ss GUI, check performance in the Windows Task Manager. The SD1.5 model is lightweight compared to SDXL, so you usually do not need to worry, but if you run out of dedicated GPU memory (VRAM), shared GPU memory will be used instead. That slows training considerably, so it is recommended to switch to a lighter-load configuration.
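Besides Task Manager, you can also query dedicated VRAM from Python with PyTorch, for example with a small check like this (a sketch; it reports the current CUDA device only):

```python
import torch

# Once allocated memory approaches the card's total, Windows begins spilling
# into shared GPU memory and training slows down sharply.
total = torch.cuda.get_device_properties(0).total_memory
allocated = torch.cuda.memory_allocated(0)
reserved = torch.cuda.memory_reserved(0)
print(f"VRAM allocated: {allocated / 2**30:.2f} / {total / 2**30:.2f} GiB "
      f"(reserved by PyTorch: {reserved / 2**30:.2f} GiB)")
```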

Test generation using trained LoRA
The trained LoRA was used with the A1111 WebUI to generate the results shown in the figure below. The “AnyOrangeMix” checkpoint was used for generation.

The generated results show only limited fidelity, but compared with the results before LoRA was applied (figure below), we can see that the costume is affected. The face is hardly affected at all.

Test generation parameters
cinematic lighting, upper body,
(dcai-girl, 1girl, :1.1),solo, short hair, orange hair, brown eyes, animal ears,
dress, blue dress, long sleeves, black bowtie,
(skirt, black skirt, belt, brown footwear, white thighhighs, thigh strap,:0.8)
masterpiece, meadow, sky
<lora:DCAI_Girl_SD15_def:1.0>
(easynegative:1.0),(worst quality,low quality:1.2),(bad anatomy:1.4),(realistic:1.1),nose,lips,adult,fat,sad, (inaccurate limb:1.2),extra digit,fewer digits,six fingers,(monochrome:0.95)
Checkpoint Model: anyorangemixAnything_mint
Sampler: DPM++ SDE Karras
Steps: 20
CFG scale: 6
Seed: 3547528961
Width: 768
Height: 512
Clip skip: 2
Textual Inversion: easynegative
ADetailer: on
Hires upscaler: 4x-UltraSharp
How to check the metadata of a trained LoRA model
You can use DCAI’s 🔗LoRA Inspector🧐 to view the metadata of a trained LoRA model. When you have trained a large number of LoRAs for validation, it is hard to remember every run’s training parameters, so use the LoRA Inspector to look them up.
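The same metadata can also be read locally: LoRA trainers such as kohya-ss sd-scripts store training parameters as string key/value pairs in the safetensors header. A minimal sketch with the safetensors library (the file name is a placeholder):

```python
from safetensors import safe_open

# Training parameters are embedded in the safetensors header metadata
# (kohya-ss keys are prefixed with "ss_", e.g. ss_network_dim).
with safe_open("DCAI_Girl_SD15_def.safetensors", framework="pt") as f:
    metadata = f.metadata() or {}

for key, value in sorted(metadata.items()):
    print(f"{key}: {value}")
```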
Training with default values and animefullFinalPruned model
Next, let’s change the Pretrained model to the animated base model “Animefull-final-pruned”.
Download the following models to your \kohya_ss\models folder, then select the model with the 📄 button under “Pretrained model name or path” in the Kohya ss GUI.

The parameters are the same as before, only the pretrained model has been changed and the trained results are shown below.

Fidelity has increased compared with the first test generation. Furthermore, we can now see an effect on the face as well as the costume. I made a weight comparison to make the effect of each easier to see.

A difference in LoRA’s pretrained model alone makes this much difference in quality.
Checkpoint Model: anyorangemixAnything_mint
Sampler: DPM++ SDE Karras
Steps: 20
CFG scale: 6
Seed: 3547528961
Width: 512
Height: 512
Clip skip: 2
Textual Inversion: easynegative
ADetailer: off
In TensorBoard, the average Loss is also slightly lower than that of the default run.

Methods to improve the quality of LoRA
It is difficult to define what constitutes a high-quality LoRA, but a LoRA that can generate images faithful to its purpose can be considered high quality. To achieve a high-quality LoRA, it is therefore important to clarify its purpose. Let’s look at some typical LoRA objectives.
- Character
- Costume
- Image style
- Concept
- Poses
- Background
- Vehicle
The training source images, captions, and training parameters will vary depending on these objectives. In this case I want to train a character LoRA, so I will focus on the person’s features and costume. With that emphasis, the dataset and training parameters (number of training source images, total training steps, learning rate, and network rank) must be balanced against each other.
Look again at the LoRA weight comparison image from earlier.

The weight 1.0 image at the right of the top row shows a slight reflection of the costume’s features, but it is a big departure from the training source image. In the bottom row the costume is getting closer, though it still only resembles the original. Even so, we can see that LoRA is working better than in the top row.
Underfitting and Overfitting
Two indicators of LoRA quality are underfitting and overfitting. Simply put, a LoRA whose effect is too weak is underfitted, while one whose effect is too strong is overfitted. These symptoms usually relate to how far the Unet has been trained. Take a look at the following comparison images.

These compare the LoRA saved at each epoch when training with Epoch set to 20; 500 images were trained per epoch.
Underfitting
The leftmost <lora:DCAI_Girl_SD15_afFP-000002:1> in the comparison chart is the LoRA at training step 1,000. It has picked up the features of the training source image, but not accurately enough for the purpose of this LoRA training.
Overfitting
Conversely, at step 10,000, <lora:DCAI_Girl_SD15_afFP:1> accurately reproduces the character and costume. However, the gray background of the training source images is probably why the “meadow” prompt is no longer respected and only simple backgrounds are generated. For the purpose of this LoRA training, that counts as overfitting. Other signs of overfitting include collapsing face and hand shapes, excessive contrast, and blurred colors.

One way to identify underfitting or overfitting is to adjust the LoRA weights.

Swing the weight in the plus/minus direction and check the generated results. For this model, the appropriate value is around 1.0. For an underfitted or overfitted LoRA, this appropriate value will be significantly off (underfitting example: best at a weight of around 1.5; overfitting example: best at a weight of around 0.5).
Unet learning rate and Text Encoder learning rate
Among LoRA’s training parameters, the Unet learning rate and Text Encoder learning rate are especially important. Beginners should use the default values; if the results are not good, adjust the steps/epochs first. Now, let’s look at the comparison images.

In the A1111 WebUI, you can weight the Unet and Text Encoder separately in the format <lora:loraName:Unet:TE>. From left to right: [all applied / Unet only / Unet and half TE / TE only / TE and half Unet]. It is difficult to judge much from this comparison image alone, but let me explain Unet and TE.
Unet
The Unet learns the elements of the training images and the structural relationships between their positions.
The Unet learning rate is the value with the greatest impact on under- and overfitting. If it is too low, underfitting occurs and fidelity to the learned elements drops when the LoRA is used. If it is too high, overfitting can occur, and the illustration style of the training source images can dominate the prompt.
Looking at the Unet only / Unet and half TE comparisons: with Unet only, the costume is no longer the costume from the training source image; with Unet and half TE, the accuracy of the costume increases.
Text Encoder
The Text Encoder controls how the AI interprets the prompts during generation.
If the Text Encoder learning rate is too low, prompts will not work well when using the LoRA. Conversely, if it is too high, the LoRA binds too strongly to the captions, and elements of the training source images that are not described in the prompt (e.g., the background) can appear in generations.
Looking at the TE only / TE and half Unet comparisons: with TE only, the costume elements are close but the detail is not faithful; with TE and half Unet, the costume is closer to the training source image.
This test gives some idea of the strength of LoRA’s Unet learning rate and Text Encoder learning rate.
Network Rank (Dimension) and Network Alpha
Network Rank (Dimension)
Network Rank (Dimension) specifies the amount of information the LoRA can hold for the Unet and TE. Generally, the larger the value the better, but if it is too large, unnecessary information will also be learned. For AI illustration, values around 8/16/32/64 are typical.
Network Alpha
It acts like a brake on the trained weights, preventing underflow. Generally, a value equal to or half of the Network Rank (Dimension) is used. If the value is too high, trained poses and facial expressions may become fixed in the generated illustrations.
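Numerically, the strength of the LoRA update is scaled by alpha divided by rank (this is how kohya-ss sd-scripts applies alpha), which is why alpha equal to the rank behaves as if the brake were released:

```python
# LoRA applies its update as W' = W + (alpha / rank) * (up @ down),
# so alpha / rank is the effective damping factor.
def lora_scale(alpha: float, rank: int) -> float:
    return alpha / rank

print(lora_scale(1, 8))    # 0.125 -> default run: update strongly damped
print(lora_scale(16, 32))  # 0.5   -> alpha at half the rank
print(lora_scale(32, 32))  # 1.0   -> alpha equal to rank: no damping
```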
With the above in mind, let’s look at the comparative images.

The top row is trained at ranks 8/16/32/64 with all alphas set to 1. The higher the rank, the more detail appears. However, since alpha is not doing its job here, fidelity is lost, as is freedom of expression.
The middle row uses an alpha of half the rank value. Here too, the higher the rank, the higher the fidelity.
The bottom row uses the same value for rank and alpha. It looks almost the same as the middle row, but fidelity is slightly higher.
TensorBoard and Loss
As a guide to training progress, we can look at the Loss in TensorBoard. First, let’s look at the TensorBoard graphs for the default and “Animefull-final-pruned” training runs.

Basically, steps increase from left to right. A typical graph starts with a high Loss value, converges toward the middle of the run, and then rises again slightly in the latter half. Let’s look at the loss/epoch graph, whose shape is the easiest to read.

The Loss value drops rapidly from epoch 1 to 2 and settles down from epoch 3. Now, let’s overlay on this graph a run trained up to 10,000 steps with Animefull-final-pruned and compare the results.


The Epoch/Loss graph is enlarged because it is the easiest to read. You can see Loss rising sharply at epoch 5, which is a sign of overfitting, followed by very large swings in Loss. A graph of this shape has a high probability of overfitting.
TensorBoard’s Loss should be treated as a rough guide only, and a lower Loss does not necessarily mean better. It is best used simply to check that training is not taking an unusual shape.
Other Recommended Parameters
Scale weight norms
Scale weight norms averages out the weights of the trained LoRA so that it does not exert too much influence when used together with other LoRAs. The comparison image below compares a model trained with the default settings on the Animefull-final-pruned model against one trained with Scale weight norms set to 1.

In the comparison image, the generated results have not changed much, but fidelity is slightly higher when multiple LoRAs are applied. The next comparison image applies “🔗flat2” at 0.85; to make the effect easier to see, DCAI’s LoRA is applied at 1.2.
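Conceptually, Scale weight norms caps the weight norm of each LoRA module during training, rescaling any module that exceeds the threshold. The following is a simplified sketch of that idea, not the exact sd-scripts implementation:

```python
import torch

def scale_weight_norm(weight: torch.Tensor, max_norm: float = 1.0) -> torch.Tensor:
    # If a module's weight norm exceeds the threshold, scale it back down so
    # no single LoRA module dominates when several LoRAs are combined.
    norm = weight.norm()
    if norm > max_norm:
        weight = weight * (max_norm / norm)
    return weight
```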

CrossAttention
Specifies the cross-attention implementation. If your GPU is an NVIDIA 30X0/40X0-series card, sdpa performs better. The comparison image below compares xformers and sdpa.

In this comparison, sdpa gave a slightly faster training time and higher fidelity.

Shuffle caption/Keep n tokens
This keeps the caption’s first n tokens fixed, as specified in Keep n tokens, and randomly reorders the remaining tags during training.
To set it up, enter the instance tag and class tag into the A1111 WebUI prompt box and check the number of tokens in the counter at the upper right.
In the Kohya ss GUI, under Parameters > Advanced, enter the number of tokens you just checked in the “Keep n tokens” field. Then, check the “Shuffle caption” checkbox to complete the setup.
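What this does to each caption can be sketched in a few lines. One simplification: sd-scripts shuffles the comma-separated tags, while the GUI field counts tokenizer tokens as checked above, so the `keep` argument here counts tags instead (the example caption is hypothetical):

```python
import random

def shuffle_caption(caption: str, keep: int) -> str:
    # Keep the leading tags (instance/class tags) fixed, shuffle the rest.
    tags = [t.strip() for t in caption.split(",")]
    head, tail = tags[:keep], tags[keep:]
    random.shuffle(tail)
    return ", ".join(head + tail)

print(shuffle_caption("dcai-girl, 1girl, short hair, orange hair, dress", keep=2))
```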

The comparison images show little change; however, comparing the average Loss in TensorBoard, we can see that Loss has gone down.

Min SNR gamma
Min SNR gamma smooths the average Loss and stabilizes training. The paper recommends a value of 5, so the comparison images were trained with Min SNR gamma set to 5.

The results are almost the same, with a slight increase in fidelity. The average Loss was almost unchanged, rising only slightly.
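For reference, the Min-SNR weighting from the paper clamps each timestep’s loss weight at min(SNR, γ)/SNR, so low-noise (high-SNR) timesteps stop dominating the average Loss. A sketch of the weight computation for noise prediction:

```python
import torch

def min_snr_weight(snr: torch.Tensor, gamma: float = 5.0) -> torch.Tensor:
    # High-SNR (low-noise) timesteps get down-weighted; the rest are untouched.
    return torch.minimum(snr, torch.full_like(snr, gamma)) / snr

snr = torch.tensor([0.1, 1.0, 5.0, 50.0])
print(min_snr_weight(snr))  # tensor([1.0000, 1.0000, 1.0000, 0.1000])
```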

Change parameters to train a high quality LoRA
Use the techniques I have described so far to set up your LoRA training.
These settings focus on fidelity to the training source image. Generating only the trained character is fine, but when multiple characters are generated, the trained character’s elements will affect the others (LoRA bleeding). Correcting this requires lowering the applied LoRA weight, using regularization images, or adjusting the attention blocks, none of which are covered in this article.
Training Parameters
Now let’s look at the parameters. The input and changed parts are noted in red text.
- Pretrained model name or path: anyorangemixAnything_mint [Default: runwayml/stable-diffusion-v1-5]
- Trained Model output name: DCAI_Girl_SD15_V1
- Instance prompt: dcai-girl
- Class prompt: 1girl
- Repeats: 5 [Default: 40]
- Presets: none
- LoRA type: Standard
- Train batch size: 1
- Epoch: 4 [Default: 1] *To adjust total steps in Epoch
- Max train epoch: 0
- Max train steps: 0 [Default: 1600] *To adjust total steps in Epoch
- Save every N epochs: 1
- Seed: 123 [Default: 0]
- LR Scheduler: cosine
- Optimizer: AdamW [Default: AdamW8bit] *If you are low on VRAM, use AdamW8bit.
- Learning rate: 0.0001 (1e-4)
- Text Encoder learning rate: 0.00005 (5e-5) [Default: 0.0001 (1e-4)] *Changed to the recommended defaults in the official documentation.
- Unet learning rate: 0.0001 (1e-4)
- LR warmup (% of total steps): 5 [Default: 10]
- Network Rank (Dimension): 32 [Default: 8]
- Network Alpha: 32 [Default: 1] *To increase fidelity
- Scale weight norms: 1 [Default: 0]
- Keep n tokens: 8 [Default: 0]
- clip_skip: 2 [Default: 1]
- Shuffle caption: true [Default: false]
- CrossAttention: sdpa [Default: xformers]
- Min SNR gamma: 5 [Default: 0]
Let’s run the training with the above settings.
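For reference, the same settings can also be expressed as an sd-scripts config file and passed to `train_network.py --config_file=...`. The sketch below writes such a file; key names mirror the CLI flags and should be checked against your sd-scripts version, the model path is a placeholder, and the LR warmup is omitted because the GUI converts its percentage into a step count at launch time:

```python
from pathlib import Path

# Hedged sketch of the final training run as a TOML config for sd-scripts.
Path("dcai_girl_v1.toml").write_text("""\
pretrained_model_name_or_path = "./models/anyorangemixAnything_mint.safetensors"
output_name = "DCAI_Girl_SD15_V1"
network_module = "networks.lora"
train_batch_size = 1
max_train_epochs = 4
save_every_n_epochs = 1
seed = 123
lr_scheduler = "cosine"
optimizer_type = "AdamW"
learning_rate = 1e-4
text_encoder_lr = 5e-5
unet_lr = 1e-4
network_dim = 32
network_alpha = 32
scale_weight_norms = 1.0
keep_tokens = 8
clip_skip = 2
shuffle_caption = true
sdpa = true
min_snr_gamma = 5
""")
```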
Training Results
Using the trained model, the generated image looks like this.

LoRA for SD1.5-series models is more versatile than LoRA for SDXL-series models, so a LoRA trained with “Animefull-final-pruned” can be used with other models without problems. As shown in the figures below, characters that capture the source’s features can be generated both with illustration-style models and with realistic models such as dreamshaper8. For reference, I have also included a LoRA trained with the same settings on VRoid Studio captures without style conversion (Repeats set to 10 because there are half as many images), generated with the same generation settings.



Comparing the two comparison images, the latter has a stronger VRoid Studio illustration style. If you want the source illustration style to be trained, it is recommended that you do not convert the style.
Finally, here is the TensorBoard graph of the final results.

The final LoRA is available for download on Civitai for anyone interested.

Conclusion
In this article, I explained how to train an original character LoRA for the SD1.5 model. Together with the dataset production covered in the previous article, we were able to train a versatile character LoRA from training source images with a CG-like appearance. Using this method, it should be possible to eliminate the illustration style of the training source images and create an original character LoRA that takes on the native illustration style of the checkpoint model.
In the next article, I will cover SDXL character LoRAs by model lineage, since SDXL has two text encoders and it is difficult to make its LoRAs as versatile as SD1.5’s.

