Fine-Tuning a Stable Diffusion 3 Model on SCNet

1 AI Compute Feedback

1.1 Product Used

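The product I ran is the "Stable Diffusion 3 text-to-image high-quality AI painting inference service", on a heterogeneous accelerator card AI 64G environment. The setup steps are as follows: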

1.1.1 Purchase the Model Service

First, purchase the model service, so that we don't have to download the pretrained model from Hugging Face ourselves.

1.1.2 Choose a Development Machine

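Click Model Development and configure the development environment (note that the environment needs to be synchronized). Here I chose the single heterogeneous accelerator card 64G + pytorch dtk24.04.1 environment.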

1.1.3 Open the Development Environment

Wait until the Notebook environment is ready, then click JupyterLab to open it.

1.1.4 Create a New Project File

Create a new .ipynb file (all of the following steps are run in this notebook). Note that the Python version must be 3.10.

1.2 Run Log

Before starting, define the project's working directory:

work_path="/root/private_data/apprepo/model/20240729095814/stable-diffusion-3-medium-diffusers-2407251517"
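To keep track of where each later step reads and writes, the directory layout implied by work_path can be sketched with pathlib. The helper name project_paths is hypothetical; the sub-paths simply mirror the %cd targets and training arguments used later in this walkthrough.

```python
from pathlib import Path


def project_paths(work_path: str) -> dict:
    """Hypothetical helper: derive the directories used in this walkthrough.

    The layout mirrors the %cd targets and the --instance_data_dir,
    --output_dir and --pretrained_model_name_or_path arguments used later.
    """
    root = Path(work_path)
    dreambooth = root / "diffusers" / "examples" / "dreambooth"
    return {
        "diffusers": root / "diffusers",
        "dreambooth": dreambooth,
        "dataset": dreambooth / "dataset" / "dog",
        "output": dreambooth / "trained-sd3-lora",
        "pretrained": root / "stabilityai" / "stable-diffusion-3-medium-diffusers",
    }


if __name__ == "__main__":
    for name, path in project_paths("/tmp/sd3-demo").items():
        print(f"{name:>10}: {path}")
```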

1.2.1 Install diffusers

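Hugging Face's diffusers repository is an open-source library for working with diffusion models, and in this task we use it for fine-tuning. Before fine-tuning, we need to install diffusers. First, run the following commands so that cloning the large GitHub repository does not fail partway through.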

# Keep large GitHub repositories from failing to clone
!git config --global http.sslVerify "false"
!git config --global http.postBuffer 1048576000
!git config --global core.compression -1
!git config --global http.lowSpeedLimit 0 
!git config --global http.lowSpeedTime 999999
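The git settings above can also be expressed as data and applied from Python, which makes them easier to review or reuse. This is a minimal sketch with hypothetical helper names; it only prints the commands unless dry_run=False, in which case it runs them with subprocess.run. Note that http.sslVerify=false disables TLS certificate verification for every repository, not only this clone.

```python
import subprocess

# The same settings as the cell above, expressed as data.
GIT_SETTINGS = {
    "http.sslVerify": "false",
    "http.postBuffer": "1048576000",
    "core.compression": "-1",
    "http.lowSpeedLimit": "0",
    "http.lowSpeedTime": "999999",
}


def git_config_commands(settings: dict) -> list:
    """Build one `git config --global KEY VALUE` argv list per setting."""
    return [["git", "config", "--global", key, value] for key, value in settings.items()]


def apply_git_config(settings: dict, dry_run: bool = True) -> list:
    """Print each command; execute it only when dry_run is False."""
    commands = git_config_commands(settings)
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
    return commands


if __name__ == "__main__":
    apply_git_config(GIT_SETTINGS)  # dry run: prints the commands only
```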

Run the following command so that git saves your credentials locally, which is convenient for later development.

# Let git store your credentials
!git config --global credential.helper store

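Run the following code to download and install the diffusers repository. Because of network conditions, the server may have trouble downloading the repository, so a shallow clone (depth 1) is recommended.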

%cd $work_path
# Clone Hugging Face's diffusers repository (removing any previous copy first)
!rm -rf diffusers
!git clone https://github.com/huggingface/diffusers --depth 1
# Enter the diffusers directory
%cd diffusers
# Install diffusers in editable mode
!pip install -e .
# Write a default Accelerate configuration
!accelerate config default
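Because a clone over an unreliable network can fail partway, the shallow clone can be wrapped in a simple retry loop. This is a hypothetical helper, not part of git or diffusers; it removes a partial checkout before each retry (the same reason the cell above runs rm -rf diffusers) and defaults to a dry run that only prints the command.

```python
import shutil
import subprocess
import time


def clone_command(url: str, dest: str, depth: int = 1) -> list:
    """argv for a shallow clone, equivalent to `git clone <url> --depth 1`."""
    return ["git", "clone", "--depth", str(depth), url, dest]


def shallow_clone(url: str, dest: str, retries: int = 3, wait_s: float = 5.0,
                  dry_run: bool = True) -> bool:
    """Hypothetical retry wrapper: re-run a failed clone up to `retries` times."""
    argv = clone_command(url, dest)
    for attempt in range(1, retries + 1):
        print(f"attempt {attempt}: {' '.join(argv)}")
        if dry_run:
            return True
        if subprocess.run(argv).returncode == 0:
            return True
        # A failed clone can leave a partial directory behind; remove it first.
        shutil.rmtree(dest, ignore_errors=True)
        time.sleep(wait_s)
    return False


if __name__ == "__main__":
    shallow_clone("https://github.com/huggingface/diffusers", "diffusers")
```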

1.2.2 Install the Fine-Tuning Dependencies

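If you intend to fine-tune the model, you also need to install the fine-tuning dependencies. Note that on these domestic accelerator cards the runtime is the DTK framework, which requires a specific build of torch. Installing the torchvision package, however, automatically pulls in the CUDA build of torch, so we comment torchvision out of the requirements file before running the install command.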

%cd $work_path/diffusers/examples/dreambooth
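# Comment out torchvision so pip does not replace the DTK build of torch
!sed -i 's/torchvision/#torchvision/' requirements_sd3.txt
# Install the remaining dependencies from requirements_sd3.txt
!pip install -r requirements_sd3.txt -i https://pypi.mirrors.ustc.edu.cn/simple/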
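The sed one-liner works for this file, but it rewrites any line that merely contains the substring torchvision, and running the cell twice produces ##torchvision. A slightly stricter Python equivalent could look like the sketch below: it only comments out lines whose package name is exactly the given one and leaves already-commented lines alone. The function name is hypothetical, and the sample requirements text in the test is illustrative, not the actual contents of requirements_sd3.txt.

```python
import re
from pathlib import Path


def comment_out_requirement(text: str, package: str) -> str:
    """Prefix '#' to requirement lines whose package name is exactly `package`."""
    # Name at line start, followed by a version specifier, extras, marker, or end of line.
    pattern = re.compile(rf"^\s*{re.escape(package)}(?=\s*([<>=!~\[;]|$))", re.IGNORECASE)
    out = []
    for line in text.splitlines():
        if pattern.match(line):
            line = "#" + line.lstrip()
        out.append(line)
    return "\n".join(out) + ("\n" if text.endswith("\n") else "")


if __name__ == "__main__":
    req = Path("requirements_sd3.txt")
    if req.exists():  # only rewrites the file when run inside examples/dreambooth
        req.write_text(comment_out_requirement(req.read_text(), "torchvision"))
```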

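Next, install torchvision manually without its dependencies. My torch version is 2.1.0, so the matching torchvision version is 0.16. For the full compatibility information, see the table below or the torchvision repo.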

torch	torchvision	Python
main / nightly	main / nightly	>=3.8, <=3.12
2.4	0.19	>=3.8, <=3.12
2.3	0.18	>=3.8, <=3.12
2.2	0.17	>=3.8, <=3.11
2.1	0.16	>=3.8, <=3.11
2.0	0.15	>=3.8, <=3.11

!pip install torchvision==0.16 -i https://pypi.mirrors.ustc.edu.cn/simple/ --no-deps
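The compatibility table can be turned into a small lookup, so that the torchvision pin is derived from the installed torch version (for example torch.__version__) rather than typed by hand. The mapping copies only the numbered rows of the table above; the helper name is hypothetical, and the version string is cut at '+' in case the installed build carries a local version tag.

```python
# torch minor version -> torchvision minor version, copied from the table above.
TORCHVISION_FOR_TORCH = {
    "2.4": "0.19",
    "2.3": "0.18",
    "2.2": "0.17",
    "2.1": "0.16",
    "2.0": "0.15",
}


def torchvision_for(torch_version: str) -> str:
    """Map a torch version such as '2.1.0' to the matching torchvision version."""
    base = torch_version.split("+", 1)[0]  # drop any local build tag
    major_minor = ".".join(base.split(".")[:2])
    if major_minor not in TORCHVISION_FOR_TORCH:
        raise ValueError(f"torch {torch_version} is not in the compatibility table")
    return TORCHVISION_FOR_TORCH[major_minor]


if __name__ == "__main__":
    print(f"pip install torchvision=={torchvision_for('2.1.0')} --no-deps")
```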

1.2.3 Fine-Tune Your Stable Diffusion Model

Run the following Python code in the notebook to download the required dataset from Hugging Face.

%cd $work_path/diffusers/examples/dreambooth
# Download the required dataset
from huggingface_hub import snapshot_download
# Local directory for the dataset
local_dir = "./dataset/dog"
# Download the "diffusers/dog-example" dataset from the Hugging Face Hub into local_dir
snapshot_download(
    "diffusers/dog-example",
    local_dir=local_dir,
    repo_type="dataset",
    ignore_patterns=".gitattributes",
)
# The unneeded .huggingface directory must be removed here
!rm -rf ./dataset/dog/.huggingface
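The reason for deleting .huggingface (and for ignoring .gitattributes) appears to be that the training script treats every entry in --instance_data_dir as an instance image, so anything that is not an image breaks data loading. A minimal sketch for spotting such entries before training; the helper name and the set of image suffixes are assumptions.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to your dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def split_instance_dir(instance_dir) -> tuple:
    """Return (image_files, other_entries) for a DreamBooth instance folder.

    Entries in other_entries (sub-directories such as a leftover
    .huggingface folder, or non-image files) are candidates for removal.
    """
    images, others = [], []
    for entry in sorted(Path(instance_dir).iterdir()):
        if entry.is_file() and entry.suffix.lower() in IMAGE_SUFFIXES:
            images.append(entry)
        else:
            others.append(entry)
    return images, others


if __name__ == "__main__":
    d = Path("./dataset/dog")
    if d.is_dir():
        imgs, extra = split_instance_dir(d)
        print(f"{len(imgs)} images; remove before training: {[e.name for e in extra]}")
```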

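If you want to use the official pretrained model from Hugging Face, go to the Stable Diffusion 3 page on Hugging Face, log in with your Hugging Face account, and accept the license agreement, which commits you to non-commercial use. After accepting it, run the following command manually in a terminal: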

# Configure your token (run the command --> paste your token --> Enter --> type Y --> Enter)
huggingface-cli login

If you are using the pretrained model provided by SCNet, no action is needed, but make sure you do not use it commercially either.
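Whether a login is needed depends on where the model is loaded from. The sketch below is a hypothetical check: a local directory (such as the SCNet copy under work_path) needs no token, while a Hub repo id needs either the HF_TOKEN environment variable or the token file written by huggingface-cli login, assumed here to live at <HF_HOME>/token with HF_HOME defaulting to ~/.cache/huggingface.

```python
import os
from pathlib import Path


def hub_token_available() -> bool:
    """True if HF_TOKEN is set or an (assumed) login token file exists."""
    if os.environ.get("HF_TOKEN"):
        return True
    default_home = os.path.join(os.path.expanduser("~"), ".cache", "huggingface")
    hf_home = os.environ.get("HF_HOME", default_home)
    return os.path.isfile(os.path.join(hf_home, "token"))


def needs_hub_login(model_ref: str) -> bool:
    """Hypothetical check: would loading `model_ref` require a Hub login first?"""
    if Path(model_ref).is_dir():
        return False  # local copy, e.g. the SCNet model under work_path
    return not hub_token_available()


if __name__ == "__main__":
    print(needs_hub_login("stabilityai/stable-diffusion-3-medium-diffusers"))
```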

# Enter the diffusers/examples/dreambooth directory
%cd $work_path/diffusers/examples/dreambooth
# Launch the train_dreambooth_lora_sd3.py training script with Accelerate
!accelerate launch train_dreambooth_lora_sd3.py \
  --pretrained_model_name_or_path=$work_path/stabilityai/stable-diffusion-3-medium-diffusers \
  --instance_data_dir="./dataset/dog" \
  --output_dir="trained-sd3-lora" \
  --mixed_precision="fp16" \
  --instance_prompt="a photo of sks dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --learning_rate=1e-5 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --validation_prompt="A photo of sks dog in a bucket" \
  --validation_epochs=25 \
  --seed="0"
# Note: if you use the official pretrained model and have not successfully accepted
# the license agreement, training may fail with the following error:
# OSError: Can't load tokenizer for 'stabilityai/stable-diffusion-3-medium'. 
# If you were trying to load it from 'https://huggingface.co/models', 
# make sure you don't have a local directory with the same name. 
# Otherwise, make sure 'stabilityai/stable-diffusion-3-medium' is the correct 
# path to a directory containing all relevant files for a CLIPTokenizer tokenizer.
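With these arguments it helps to work out how long training actually runs. The sketch below assumes diffusers/dog-example contains 5 images, that no prior-preservation images are used, and that the script derives the epoch count as ceil(max_train_steps / ceil(batches_per_epoch / gradient_accumulation_steps)) and validates at epochs divisible by --validation_epochs; please check these against the script version you cloned.

```python
import math


def dreambooth_schedule(num_images: int, train_batch_size: int, grad_accum: int,
                        max_train_steps: int, validation_epochs: int) -> dict:
    """Rough training schedule for the accelerate command above (assumptions in lead-in)."""
    batches_per_epoch = math.ceil(num_images / train_batch_size)
    updates_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    epochs = math.ceil(max_train_steps / updates_per_epoch)
    return {
        "effective_batch_size": train_batch_size * grad_accum,
        "updates_per_epoch": updates_per_epoch,
        "epochs": epochs,
        # Validation at epochs 0, validation_epochs, 2 * validation_epochs, ...
        "validation_runs": math.ceil(epochs / validation_epochs),
    }


if __name__ == "__main__":
    # 5 instance images is an assumption about diffusers/dog-example.
    print(dreambooth_schedule(5, 1, 4, 500, 25))
```

Under these assumptions each optimizer step sees 4 images, one epoch is 2 updates, and 500 steps correspond to 250 epochs with about 10 validation runs.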

Training may end with the error "expected scalar type Float but found Half". This appears to be a Hugging Face bug and can be ignored; the training itself completed correctly.

1.2.4 Run Inference with the Fine-Tuned Model

Next, run inference. Note that inference must be wrapped in with torch.autocast("cuda"); otherwise you get the same error as at the end of training.

%cd $work_path/diffusers/examples/dreambooth/trained-sd3-lora
from diffusers import StableDiffusion3Pipeline
import torch
model_path = "./checkpoint-500"
pipe = StableDiffusion3Pipeline.from_pretrained(f"{work_path}/stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16)
pipe.load_lora_weights(model_path)
pipe.to("cuda")
prompt = "A photo of sks dog in a bucket."
# autocast is required here, otherwise the Float/Half error from training reappears
with torch.autocast("cuda"):
    image = pipe(prompt).images[0]
image.save("output.png")
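The inference cell hard-codes ./checkpoint-500. If you change --max_train_steps, a small helper can pick the most recent checkpoint-<step> directory instead. This is a hypothetical sketch; it compares step numbers numerically, since a plain string sort would put checkpoint-1000 before checkpoint-500.

```python
import re
from pathlib import Path
from typing import Optional


def latest_checkpoint(output_dir) -> Optional[Path]:
    """Return the checkpoint-<step> sub-directory with the highest step, or None."""
    best_step, best_path = -1, None
    for entry in Path(output_dir).iterdir():
        match = re.fullmatch(r"checkpoint-(\d+)", entry.name)
        if entry.is_dir() and match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), entry
    return best_path


if __name__ == "__main__":
    # Run from inside trained-sd3-lora, like the inference cell.
    if Path(".").is_dir():
        print(latest_checkpoint("."))
```

In the inference cell you could then set model_path = str(latest_checkpoint(".")) instead of the fixed path.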

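The output after inference is shown below; the generated image quality is quite decent.
[Figure: image generated by the fine-tuned model (output.png)]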