Performance Metrics in Evaluating Stable Diffusion Models

1. Performance Metrics in Evaluating Stable Diffusion Models

Note sources:
1. Performance Metrics in Evaluating Stable Diffusion Models
2. Denoising Diffusion Probabilistic Models
3. A simple explanation of the Inception Score
4. What is the inception score (IS)?
5. Kullback–Leibler divergence
6. Inception Score (IS) and Fréchet Inception Distance (FID)
7. Fréchet inception distance
8. Using CLIP Score to evaluated images

Figure source: Wikipedia

1.1 Inception Score (IS): Evaluating Realism Through Classification


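IS evaluates generated images with a pre-trained image classifier (Inception v3). It rewards images that the classifier confidently assigns to a single class, and image sets whose predicted classes are spread across many different classes.

A higher IS indicates that the generated images are individually recognizable (high quality) and collectively diverse, suggesting that the model captures the essential features of real images.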

Prerequisites
(1)Pre-trained Inception v3 Network: This model is used to classify the generated images.
(2)Generated Images: A diverse set of images generated by the Stable Diffusion model based on various text prompts.

Steps to Calculate Inception Score
(1)Generate Images
Use the Stable Diffusion model to generate a large number of images from diverse text prompts. The more diverse the text prompts, the better the evaluation.
(2)Preprocess Images
Ensure that the images are correctly sized (typically 299x299 pixels) and normalized to the format expected by the Inception v3 network.
(3)Pass Images Through Inception v3
Feed each generated image into the Inception v3 network to obtain the predicted label distribution p(y|x). This provides a probability distribution over classes for each image (x: image, y: label).

(4)Compute Marginal Distribution
Calculate the marginal distribution p(y) over all generated images.

(5)Calculate KL Divergence
Compute the Kullback-Leibler (KL) divergence between the conditional distribution p(y|x) of each generated image and the marginal distribution p(y) over all generated images. Average the KL divergences across all images.
(The KL divergence is a measure of how similar/different two probability distributions are.)

Figure source: Kullback–Leibler divergence

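The KL divergence measures how much two probability distributions differ; computing it tells us how similar the two distributions are.
The more the two distributions differ, the larger the KL divergence.
The less the two distributions differ, the smaller the KL divergence.
If the two distributions are identical, the KL divergence is 0.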
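The properties listed above can be checked with a minimal NumPy sketch of the discrete KL divergence. It uses the natural log (result in nats), the same convention as `scipy.stats.entropy(pk, qk)` in the IS code below; the function name `kl_divergence` is our own, and it assumes both inputs are probability vectors over the same classes with q > 0 wherever p > 0.

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p_i * log(p_i / q_i), in nats.

    Terms with p_i == 0 contribute nothing; q_i must be > 0 wherever p_i > 0
    (otherwise the divergence is infinite).
    """
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, p))  # 0.0: identical distributions
print(kl_divergence(p, q))  # about 0.5108
print(kl_divergence(q, p))  # about 0.3681: KL is not symmetric
```

Note the asymmetry: KL is not a true distance, which is why the IS definition fixes the order as D_KL(p(y|x) || p(y)).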


Here is how the KL divergence changes depending on our two distributions:

(6)Exponentiation

The Inception Score is the exponentiation of the average KL divergence.
To get the final score, we first average the KL divergence over all of our images and then take the exponential of that average (the exponential makes the score grow to bigger numbers, which makes improvements easier to see). The result is the Inception Score!
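Written out as a formula, the steps above correspond to the standard definition, where p_g denotes the distribution of generated images and N the number of generated images used to estimate the marginal:

```latex
\mathrm{IS} = \exp\Big( \mathbb{E}_{x \sim p_g}\big[ D_{\mathrm{KL}}\big( p(y \mid x) \,\|\, p(y) \big) \big] \Big),
\qquad
p(y) = \mathbb{E}_{x \sim p_g}\big[ p(y \mid x) \big] \approx \frac{1}{N} \sum_{i=1}^{N} p(y \mid x_i),
\qquad
D_{\mathrm{KL}}\big( p(y \mid x) \,\|\, p(y) \big) = \sum_{y} p(y \mid x) \log \frac{p(y \mid x)}{p(y)}
```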

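Summary of the computation:
(1) Use the Inception v3 network to obtain the class probability distribution p(y|x) of each generated image.
(2) Compute the marginal distribution p(y) over all generated images.
(3) Compute the KL divergence between each image's p(y|x) and the marginal p(y); this gives one KL value per image.
(4) Sum these KL values and take their average.
(5) Exponentiate the value from (4) to obtain the final Inception Score.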
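The five steps above can be sketched in a few lines of NumPy. This assumes the p(y|x) rows have already been collected into an (N, C) array (such as the `preds` array built by the script below); the function name `inception_score_from_probs` is hypothetical, and `eps` is an assumed small constant guarding against log(0).

```python
import numpy as np

def inception_score_from_probs(probs, splits=1, eps=1e-12):
    """Inception Score from an (N, C) array whose rows are p(y|x).

    Step (2): marginal p(y); step (3): per-image KL; step (4): average;
    step (5): exponentiate. Returns (mean, std) over `splits` equal subsets.
    """
    probs = np.asarray(probs, dtype=np.float64)
    n = probs.shape[0] // splits
    scores = []
    for k in range(splits):
        part = probs[k * n:(k + 1) * n]
        py = part.mean(axis=0, keepdims=True)  # marginal p(y) of this subset
        # KL(p(y|x) || p(y)) for every image in the subset
        kl = np.sum(part * (np.log(part + eps) - np.log(py + eps)), axis=1)
        scores.append(np.exp(kl.mean()))  # average first, then exponentiate
    return float(np.mean(scores)), float(np.std(scores))

# 10 images, each classified with certainty into a different one of 10 classes
print(inception_score_from_probs(np.eye(10)))
# 10 images whose predictions are all uniform over the 10 classes
print(inception_score_from_probs(np.full((10, 10), 0.1)))
```

The two calls show the bounds of the score: confident predictions spread evenly over C classes give IS = C (here 10), while predictions that say nothing about the class give IS = 1.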

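A large KL divergence for a single image means that its prediction p(y|x) is confident and differs from the marginal p(y), i.e. the image is clear enough for the classifier to tell which class it belongs to.
A large IS means the generated images are both of high quality (confident predictions) and diverse (the marginal p(y) is spread over many classes).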
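The order of steps (4) and (5) in the summary matters: averaging and exponentiating do not commute. By Jensen's inequality, the mean of exp(KL) is always at least exp(mean KL), with equality only when all KL values are equal, so exponentiating before averaging inflates the score. A tiny check with two hypothetical per-image KL values:

```python
import numpy as np

# Hypothetical per-image KL values for two generated images (illustration only)
kl = np.array([0.0, 2.0])

is_correct = np.exp(kl.mean())  # exp(E[KL]) = e^1, the Inception Score
is_wrong = np.exp(kl).mean()    # E[exp(KL)] = (1 + e^2) / 2, NOT the Inception Score
print(f"exp(mean KL) = {is_correct:.4f}")  # 2.7183
print(f"mean(exp KL) = {is_wrong:.4f}")    # 4.1945
```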

Computing IS on generated images. The code is adapted from "Python implementation of Inception Score (reading your own generated images)"; see also: sbarratt/inception-score-pytorch

import os
import argparse

import numpy as np
import torch
from torch.nn import functional as F
from torchvision.models import inception_v3, Inception_V3_Weights
from PIL import Image
from scipy.stats import entropy
from tqdm import tqdm

'''
(1) Generate images.
(2) Preprocess images: ensure they are correctly sized (299x299 pixels) and normalized
    to the format expected by the Inception v3 network.
(3) Compute the predicted label distributions p(y|x): pass the images through Inception v3
    to obtain a probability distribution over classes for each image.
(4) Compute the marginal distribution p(y) over all generated images.
(5) Calculate the KL divergence D_KL(p(y|x) || p(y)) for each image.
(6) Average the KL divergences across all images: E_x[D_KL(p(y|x) || p(y))].
(7) Exponentiate: IS = exp(E_x[D_KL(p(y|x) || p(y))]).
    Note: this is NOT equivalent to E_x[exp(D_KL(p(y|x) || p(y)))], which is always >= IS.
'''

# Usage: python Inception_score.py --input_image_dir path_of_your_generated_images
parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
parser.add_argument('--input_image_dir', type=str, default='./input_images', help='Directory containing input images')
parser.add_argument('--batch_size', type=int, default=1, help='Batch size for processing images')
parser.add_argument('--device', type=str, choices=["cuda:0", "cpu"], default="cuda:0", help='Device for computation')
args = parser.parse_args()

# (2) Preprocess images: ImageNet normalization constants for torchvision's Inception v3
mean_inception = [0.485, 0.456, 0.406]  # Mean for normalization
std_inception = [0.229, 0.224, 0.225]  # Standard deviation for normalization
IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg', '.bmp', '.webp')


def imread(filename, size=None):
    """Loads an image as a (height, width, 3) uint8 array, optionally resized to size=(w, h)."""
    image = Image.open(filename).convert('RGB')  # also handles grayscale and RGBA files
    if size is not None:
        image = image.resize(size, Image.BILINEAR)
    return np.asarray(image, dtype=np.uint8)


def read_dir():
    """Recursively collects image file paths from args.input_image_dir."""
    dirPath = args.input_image_dir
    allFiles = []
    if os.path.isdir(dirPath):
        for root, _, files in os.walk(dirPath):
            for file in files:
                if file.lower().endswith(IMAGE_EXTENSIONS):  # skip non-image files
                    allFiles.append(os.path.join(root, file))
    else:
        print('Error: Specified path is not a directory.')
    return sorted(allFiles)


def inception_score(batch_size=args.batch_size, resize=True, splits=1):
    """Computes the Inception Score for the images in args.input_image_dir.

    Args:
        batch_size (int): Number of images to process in each batch.
        resize (bool): Whether to resize images to the 299x299 input size of Inception v3.
            If False, all images must already share one size (at least 75x75).
        splits (int): Number of equal subsets; IS is computed on each subset.

    Returns:
        tuple: Mean and standard deviation of the Inception Score across splits.
    """
    device = torch.device(args.device)
    # Load pre-trained Inception v3. With transform_input=True the model converts
    # ImageNet-normalized input into the scaling its ported weights expect.
    inception_model = inception_v3(weights=Inception_V3_Weights.IMAGENET1K_V1, transform_input=True).to(device)
    inception_model.eval()  # Set model to evaluation mode
    mean = torch.tensor(mean_inception, device=device).view(1, 3, 1, 1)
    std = torch.tensor(std_inception, device=device).view(1, 3, 1, 1)

    files = read_dir()
    N = len(files)  # number of generated images
    assert N > 0, 'No images found.'
    assert batch_size > 0 and 0 < splits <= N
    if batch_size > N:
        print('Warning: Batch size is larger than the number of images. Setting batch size to data size.')
        batch_size = N

    print('Computing predictions using Inception v3 model')
    # p(y|x) of each image: one row per image, 1000 Inception v3 (ImageNet) classes
    preds = np.zeros((N, 1000))
    size = (299, 299) if resize else None
    with torch.no_grad():
        for start in tqdm(range(0, N, batch_size)):
            end = min(start + batch_size, N)
            images = np.stack([imread(f, size) for f in files[start:end]]).astype(np.float32)
            # (n, h, w, 3) -> (n, 3, h, w), scale to [0, 1], then ImageNet-normalize
            batch = torch.from_numpy(images.transpose((0, 3, 1, 2)) / 255.0).to(device)
            batch = (batch - mean) / std
            preds[start:end] = F.softmax(inception_model(batch), dim=1).cpu().numpy()

    print('Computing KL Divergence')
    split_scores = []
    for k in range(splits):
        part = preds[k * (N // splits): (k + 1) * (N // splits), :]  # equal-sized subset
        py = np.mean(part, axis=0)  # marginal p(y) of this subset
        # D_KL(p(y|x) || p(y)) per image; scipy's entropy(pk, qk) returns KL(pk || qk)
        scores = [entropy(pyx, py) for pyx in part]
        # Average first, then exponentiate: exp(E[D_KL(p(y|x) || p(y))])
        split_scores.append(np.exp(np.mean(scores)))
    return np.mean(split_scores), np.std(split_scores)


if __name__ == '__main__':
    # splits > 1 divides the predictions into subsets; the std across subsets
    # indicates how stable the score is (a common setting is 10 splits).
    mean_is, std_is = inception_score(splits=1)
    print(f'IS: {mean_is:.4f} +/- {std_is:.4f}')

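Six images were generated with a pre-trained model. Computing IS in practice requires a large number of images (e.g. 50,000); this is only a test.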

The IS result is shown in the figure below. A larger IS indicates higher-quality and more diverse generated images.

Limitations of IS

(1) If you are learning to generate something not present in the classifier's training data (e.g. human faces, for which there is no dedicated class among the 1000 ILSVRC classes), you may always get a low IS despite generating high-quality images, because such images do not get classified as a distinct class.

(2) If the classifier network cannot detect features relevant to your concept of image quality, then poor quality images may still get high scores.

1.2 Fréchet inception distance (FID): Assessing Image Distribution Similarity

Differences between IS and FID
Unlike the earlier Inception Score (IS), which evaluates only the distribution of generated images, FID compares the distribution of generated images with the distribution of a set of real images ("ground truth").

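In the "worldview" of Inception V3, any data that does not look like ImageNet is unrealistic, and there is no guarantee that it yields a sharp prediction distribution. So, to evaluate generative models better, we need a more effective way to measure the distance between the real distribution and the generated samples. FID measures exactly this distance between generated samples and real-world samples. (Quoted from: Inception Score (IS) and Fréchet Inception Distance (FID))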

FID

FID is a cornerstone metric that measures the distance between the distributions of generated and real images, using features extracted by a pre-trained Inception network.

A lower FID means the distribution of generated images is closer to that of real images, i.e. the model better mimics the real data distribution.

Figure source: Fréchet inception distance
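For reference, FID models the Inception features of the real images and of the generated images as two multivariate Gaussians, with means μ_r, μ_g and covariances Σ_r, Σ_g, and computes the Fréchet distance between them:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\Big( \Sigma_r + \Sigma_g - 2 \big( \Sigma_r \Sigma_g \big)^{1/2} \Big)
```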

(1)Generating Images with Prompts
Use your diffusion model to generate images from text prompts.

(2)Extract Features
Pass both the generated images and a set of reference (real) images through a pre-trained Inception network to extract feature vectors. Usually the Inception v3 model is used, taking the 2048-dimensional activations of its final average-pooling layer.

(3)Compute FID Score
Fit a multivariate Gaussian (mean and covariance) to each set of features, then compute the Fréchet distance between the two Gaussians; this distance is the FID score.
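Step (3) can be sketched with NumPy and SciPy, assuming the features of both image sets have already been extracted into (N, D) arrays (e.g. N x 2048 Inception features). This mirrors the FID formula but is not pytorch-fid's code; the function name `frechet_distance` is hypothetical. In real evaluations N should be much larger than D so that the covariance estimates are well conditioned.

```python
import numpy as np
from scipy import linalg

def frechet_distance(feats_real, feats_gen):
    """FID between two (N, D) feature arrays, each modeled as a Gaussian."""
    feats_real = np.asarray(feats_real, dtype=np.float64)
    feats_gen = np.asarray(feats_gen, dtype=np.float64)
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_g = np.cov(feats_gen, rowvar=False)
    # Matrix square root of the covariance product
    covmean = linalg.sqrtm(sigma_r @ sigma_g)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary parts from numerical error
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_g) - 2 * np.trace(covmean))

# Toy 4-dimensional "features" (illustration only)
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 4))
print(frechet_distance(real, real))                                  # about 0
print(frechet_distance(real, real + np.array([1.0, 2.0, 0.0, 0.0])))  # about 5 = ||shift||^2
```

Shifting every feature vector by a constant leaves the covariances unchanged, so only the mean term contributes, which makes the second result easy to verify by hand.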

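Code reference: mseitzer/pytorch-fid, whose two main files are the InceptionV3 model and the FID score computation.
After installing it, you can use it as a module and compute FID directly: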

pip install pytorch-fid

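Use the generated images as the sample dataset and the ImageNet dataset itself as the reference dataset; each path below is a folder containing the images of one dataset: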

python -m pytorch_fid path/to/dataset1 path/to/dataset2

1.3 CLIP Score

In text-guided image generation, a model such as StableDiffusionPipeline generates images from text prompts, and CLIP scores are then used to evaluate the results.

CLIP score measures how well an image fits its caption, as the cosine similarity between the CLIP embeddings of the image and the caption. Higher scores signify better compatibility between the image and its associated caption.
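A minimal sketch of the score itself, assuming the image and text embeddings have already been produced by the same CLIP model. The original CLIPScore paper defines CLIP-S = w * max(cos(E_I, E_C), 0) with w = 2.5, torchmetrics' `CLIPScore` reports 100 times the clamped cosine, and code reference 2 below returns the raw cosine (w = 1); the function name `clip_score_from_embeddings` and the default `w=100.0` are our own choices.

```python
import numpy as np

def clip_score_from_embeddings(image_emb, text_emb, w=100.0):
    """CLIP score = w * max(cos(image_emb, text_emb), 0).

    image_emb, text_emb: 1-D embedding vectors from the same CLIP model.
    The scale w is a convention (2.5 in the CLIPScore paper, 100 in torchmetrics).
    """
    image_emb = np.asarray(image_emb, dtype=np.float64)
    text_emb = np.asarray(text_emb, dtype=np.float64)
    cos = image_emb @ text_emb / (np.linalg.norm(image_emb) * np.linalg.norm(text_emb))
    return float(w * max(cos, 0.0))  # negative similarities are clamped to 0

# Toy 2-D vectors stand in for real CLIP embeddings (illustration only)
print(clip_score_from_embeddings([1.0, 0.0], [2.0, 0.0]))  # same direction: 100.0
print(clip_score_from_embeddings([1.0, 0.0], [0.0, 1.0]))  # orthogonal: 0.0
```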


Practical Implementation
(1)Generating Images with Prompts
StableDiffusionPipeline generates images from multiple prompts, producing a diverse set of images aligned with the given textual cues.

(2)Computing CLIP Scores
After generating images, the CLIP scores are calculated to quantify the compatibility between each image and its corresponding prompt.

(3)Comparative Evaluation

Comparing Different Checkpoints: generate images from the same prompts with different checkpoints, calculate the CLIP scores for each set, and compare them to assess the performance differences between versions. For example, comparing the v1-4 and v1-5 checkpoints showed better performance (higher CLIP scores) for v1-5.
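The comparison step can be sketched as ranking checkpoints by their mean CLIP score. The helper `compare_checkpoints` is hypothetical, and the scores below are placeholder numbers for illustration, not measured results; for a fair comparison each list should come from the same prompts (and ideally the same random seeds) for every checkpoint.

```python
import numpy as np

def compare_checkpoints(scores_by_ckpt):
    """Rank checkpoints by mean CLIP score (higher = better text-image agreement).

    scores_by_ckpt: dict mapping checkpoint name -> list of per-image CLIP scores.
    Returns [(name, (mean, std)), ...] sorted from best to worst mean.
    """
    summary = {name: (float(np.mean(s)), float(np.std(s)))
               for name, s in scores_by_ckpt.items()}
    return sorted(summary.items(), key=lambda kv: kv[1][0], reverse=True)

# Placeholder scores for two hypothetical checkpoints (NOT real measurements)
scores = {
    "ckpt_a": [30.0, 32.0, 31.0],
    "ckpt_b": [33.0, 31.0, 35.0],
}
for name, (mean, std) in compare_checkpoints(scores):
    print(f"{name}: {mean:.2f} +/- {std:.2f}")
```

Reporting the spread alongside the mean helps judge whether a difference between checkpoints is larger than the prompt-to-prompt variation.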

The following site can directly score an image against its corresponding text: taesiri/CLIPScore

Code reference 1: CLIP Score for PyTorch

Install PyTorch

pip install torch  # Choose a version that suits your GPU

Install CLIP

pip install git+https://github.com/openai/CLIP.git

Install clip-score from PyPI

pip install clip-score

Usage

python -m clip_score path/to/image path/to/text

Code reference 2: Using CLIP Score to evaluated images

pip install -U torch torchvision
pip install -U git+https://github.com/openai/CLIP.git

import torch
import clip
from PIL import Image


def get_clip_score(image_path, text):
    # Run on GPU if available
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Load the pre-trained CLIP model and its matching preprocessing transform
    model, preprocess = clip.load('ViT-B/32', device=device)
    image = Image.open(image_path)

    # Preprocess the image and tokenize the text
    image_input = preprocess(image).unsqueeze(0).to(device)
    text_input = clip.tokenize([text]).to(device)

    # Generate embeddings for the image and text
    with torch.no_grad():
        image_features = model.encode_image(image_input)
        text_features = model.encode_text(text_input)

    # Normalize the features
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)

    # Cosine similarity of the normalized embeddings = (unscaled) CLIP score
    clip_score = torch.matmul(image_features, text_features.T).item()
    return clip_score


image_path = "path/to/your/image.jpg"
text = "your text description"
score = get_clip_score(image_path, text)
print(f"CLIP Score: {score}")

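This article was contributed by an internet user; the views expressed are the author's own and do not represent the position of this site. This site only provides information storage space, does not own the rights, and assumes no related legal liability. If reprinting, please cite the source: http://www.xdnf.cn/news/1487120.html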
