Day 14 - CV Project in Practice: SAR Aircraft Detection and Recognition

Original paper:

SAR-AIRcraft-1.0: High-Resolution SAR Aircraft Detection and Recognition Dataset - CNKI

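Looking at the 7 images in the first row, an untrained eye can hardly tell that they correspond to the aircraft in the second row.

The aircraft annotated in the figure above are just as puzzling to a layperson: it is not obvious why these regions are aircraft and the others are not. To an outsider the problem looks daunting, because you cannot spot the aircraft in the image, let alone annotate them. This situation comes up often when working on problems from other industries, and it can feel unsolvable. That is fine. Many things are counterintuitive, and this task is not hard once you actually start on it.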

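First, annotation is critical for object detection.

How do we annotate the data?

We cannot read or label much of this specialist data ourselves, and that is fine: annotation is a job for domain experts.

Create the project folder CVProject / label_demo / images ---> holds all the images

Create the project file CVProject / label_demo / classes.txt ---> holds all the class names

The paper defines seven classes; we write them into classes.txt, one per line.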
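The folder layout and class file described above can be created with a few lines of Python. This is a minimal sketch: the helper name `init_label_project` is hypothetical, and the seven class names and their order are assumed to match the `label2idx` mapping used in the VOC example in this post.

```python
from pathlib import Path

# Class names in index order (assumed to match label2idx in this post).
CLASSES = ["A330", "A320/321", "A220", "ARJ21", "Boeing737", "Boeing787", "other"]


def init_label_project(root="CVProject"):
    """Create <root>/label_demo/images and <root>/label_demo/classes.txt."""
    demo_dir = Path(root) / "label_demo"
    # images/ holds every picture to be annotated
    (demo_dir / "images").mkdir(parents=True, exist_ok=True)
    # classes.txt holds one class name per line; the line order defines the class id
    classes_file = demo_dir / "classes.txt"
    classes_file.write_text("\n".join(CLASSES) + "\n", encoding="utf8")
    return classes_file
```

Calling `init_label_project()` from the directory that should contain CVProject creates both the images folder and classes.txt. It is safe to run again, although it overwrites classes.txt.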

Object detection has three classic annotation formats.

1. VOC:

.xml
xmin, ymin, xmax, ymax
Top-left corner and bottom-right corner
Absolute coordinates (pixel values)

from xml.etree import ElementTree

# Parse the XML file
tree = ElementTree.parse(source="0000001.xml")
# Get the root element of the XML file
root = tree.getroot()

# Get the image path
img_path = root.find(path="path").text
print(img_path)

# Get the image width and height
img_width = int(root.find(path="size").find(path="width").text)
img_height = int(root.find(path="size").find(path="height").text)
print(img_height, img_width)

# Iterate over all annotated objects
for obj in root.findall(path="object"):
    # Object (class) name
    print(obj.find("name").text)
    # Bounding box coordinates, in pixels
    bndbox = obj.find("bndbox")
    xmin = int(bndbox.find("xmin").text)
    ymin = int(bndbox.find("ymin").text)
    xmax = int(bndbox.find("xmax").text)
    ymax = int(bndbox.find("ymax").text)
    print(xmin, ymin, xmax, ymax)
    print("-" * 80)  # separator line

# Mapping from label to index
label2idx = {
    "A330": 0,
    "A320/321": 1,
    "A220": 2,
    "ARJ21": 3,
    "Boeing737": 4,
    "Boeing787": 5,
    "other": 6,
}
# Mapping from index to label
idx2label = {idx: label for label, idx in label2idx.items()}
print(idx2label)

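2. YOLO

.txt
cls_id, x_center, y_center, w, h
Class id, box center x, box center y, box width, box height
Class id: follows the order of the classes used during annotation, starting from 0
Relative coordinates, as fractions in the range 0 to 1 (every x-related value is divided by the image width, every y-related value by the image height)

YOLO official docs: https://docs.ultralytics.com

They build the technology with a product mindset: almost anything you can think of has already been implemented for you.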

from xml.etree import ElementTree


# Convert a VOC annotation file to YOLO format.
# Relies on the label2idx mapping defined in the VOC example.
def transfer_yolo(file_name="./0000001.xml"):
    # Parse the XML file and get its root element
    tree = ElementTree.parse(source=file_name)
    root = tree.getroot()
    # Image width and height, used to normalize the coordinates
    img_width = int(root.find(path="size").find(path="width").text)
    img_height = int(root.find(path="size").find(path="height").text)
    # Write the YOLO labels next to the XML file, with a .txt extension
    with open(file=file_name.replace(".xml", ".txt"), mode="w", encoding="utf8") as f:
        for obj in root.findall(path="object"):
            # Class name and its index
            name = obj.find("name").text
            cls_id = label2idx.get(name)
            # Bounding box corners, in pixels
            bndbox = obj.find("bndbox")
            xmin = int(bndbox.find("xmin").text)
            ymin = int(bndbox.find("ymin").text)
            xmax = int(bndbox.find("xmax").text)
            ymax = int(bndbox.find("ymax").text)
            # Normalized box center
            x_center = round((xmin + xmax) / 2 / img_width, 6)
            y_center = round((ymin + ymax) / 2 / img_height, 6)
            # Normalized box width and height
            box_width = round((xmax - xmin) / img_width, 6)
            box_height = round((ymax - ymin) / img_height, 6)
            # One object per line: cls_id x_center y_center w h
            print(cls_id, x_center, y_center, box_width, box_height, file=f)


# Run the conversion
transfer_yolo()
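Going the other way is useful for checking a converted label file, for example to draw the boxes back onto the image. The sketch below inverts the normalization; the function name `yolo_to_voc` is hypothetical, and rounding to whole pixels is an assumption that may shift a box by one pixel compared with the original annotation.

```python
def yolo_to_voc(x_center, y_center, w, h, img_width, img_height):
    """Convert a normalized YOLO box to pixel (xmin, ymin, xmax, ymax)."""
    # Scale the relative values back to pixels
    cx, cy = x_center * img_width, y_center * img_height
    bw, bh = w * img_width, h * img_height
    # Move from center/size to corner coordinates
    xmin = round(cx - bw / 2)
    ymin = round(cy - bh / 2)
    xmax = round(cx + bw / 2)
    ymax = round(cy + bh / 2)
    return xmin, ymin, xmax, ymax


# A 200x200 px box centered at (200, 300) in a 1000x800 image
print(yolo_to_voc(0.2, 0.375, 0.2, 0.25, 1000, 800))  # (100, 200, 300, 400)
```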

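3. COCO

.json
x, y, width, height
x: x coordinate of the center point
y: y coordinate of the center point
width: box width
height: box height
Absolute coordinates (pixel values)
Note: the center-point convention above describes the JSON file parsed in this post, whose layout (a top-level list, "image" and "annotations" keys, and a "label" plus "coordinates" per box) matches the Create ML style of JSON annotation. In the standard COCO format the bbox field is [x, y, width, height], where (x, y) is the top-left corner of the box.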

import json

# Open the JSON file read-only, encoded as UTF-8
with open(file="./0000001.json", mode="r", encoding="utf8") as f:
    # Read the content and parse it into Python objects (dicts and lists)
    data = json.loads(s=f.read())

# Take the first entry (data is assumed to be a list with one entry per image)
objs = data[0]
# All keys of the first entry
keys = objs.keys()
print(keys)

# Image information
image_info = objs["image"]
print(image_info)

# Iterate over the annotations
for obj in objs["annotations"]:
    # Label of the annotated object
    label = obj["label"]
    # Coordinates of the annotated box
    coordinates = obj["coordinates"]
    print(label)
    print(coordinates)
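The file parsed above stores the box center, whereas the standard COCO `bbox` field is `[x_min, y_min, width, height]` with the top-left corner, so a small conversion is often needed when mixing tools. The sketch below assumes that `coordinates` is a dict with the keys `x`, `y`, `width` and `height`, center-based and in pixels; that is an assumption about the file, and the function name `center_to_coco_bbox` is hypothetical.

```python
def center_to_coco_bbox(coordinates):
    """Convert a center-based box dict to a COCO-style [x_min, y_min, width, height] list."""
    w, h = coordinates["width"], coordinates["height"]
    # Shift from the box center to the top-left corner
    x_min = coordinates["x"] - w / 2
    y_min = coordinates["y"] - h / 2
    return [x_min, y_min, w, h]


box = {"x": 200, "y": 300, "width": 200, "height": 200}
print(center_to_coco_bbox(box))  # [100.0, 200.0, 200, 200]
```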

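Training metrics

Epoch: training round (one pass over the whole training set is one epoch)
GPU_mem: GPU memory usage
box_loss: bounding-box regression loss (how far the predicted box is from the ground-truth box)
cls_loss: classification loss
dfl_loss: Distribution Focal Loss, an additional box-regression term that supervises the predicted distribution of each box edge's position
Instances: number of objects in the current batch
Size: image size used for training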
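Box:
- P: Precision = TP / (TP + FP)
  - Measures false alarms
  - Of all predicted positive boxes, how many are truly positive
  - In the example above, only 83.1% of the aircraft detected as A330 are truly A330, i.e. 16.9% are false alarms
  - Image classification puts a lot of weight on accuracy (ACC). Object detection does not report ACC because negatives dominate overwhelmingly: every background region is a negative sample, so true negatives cannot be counted in a meaningful way
- R: Recall = TP / (TP + FN)
  - Measures missed detections
  - Reflects the ability to find the positive samples
  - For example, with a recall of 88.4%, about 88 out of every 100 A330s in the test set are found, i.e. 11.6% are missed
- mAP:
  - Mean Average Precision
  - For example, in the "all" row of the figure above, "mean" is the average over the 7 classes, and AP is the average precision of a single class, i.e. the area under its precision-recall curve
  - mAP is built on precision, so it penalizes false alarms; because that precision is averaged across recall levels, it also reflects missed detections
  - In mAP50, the 50 is the IoU threshold (0.5): a predicted box counts as a positive only if its IoU with a ground-truth box is at least 0.5. This is what defines positive and negative samples
  - Common notations: mAP_0.5 and mAP@0.5 both mean mAP at an IoU threshold of 0.5; mAP_0.5:0.95 is the average of mAP over IoU thresholds from 0.5 to 0.95 in steps of 0.05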
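To make the link between IoU, positives and the P/R numbers concrete, here is a simplified sketch for one class in one image. It matches predictions to ground-truth boxes greedily in descending score order, which is an assumption for illustration; real evaluation tools differ in matching details, work over the whole dataset, and compute AP by sweeping the confidence threshold. The function names are hypothetical.

```python
def iou(a, b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax) in pixels."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds, gts, iou_thr=0.5):
    """Precision and recall for one class in one image.

    preds: list of (score, box); gts: list of ground-truth boxes.
    Each ground-truth box can be matched by at most one prediction.
    """
    matched = set()
    tp = 0
    # Higher-scoring predictions get the first chance to claim a ground truth
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            value = iou(box, gt)
            if value > best_iou:
                best_iou, best_j = value, j
        # A prediction is a true positive only if it overlaps enough
        if best_j is not None and best_iou >= iou_thr:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp   # false alarms
    fn = len(gts) - tp     # missed detections
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if gts else 0.0
    return precision, recall


gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(0.9, (0, 0, 10, 10)), (0.8, (0, 0, 10, 5)), (0.7, (50, 50, 60, 60))]
print(precision_recall(preds, gts))  # one TP, two FP, one FN
```

In this toy case the second prediction overlaps a ground truth that is already claimed, and the third overlaps nothing, so both count as false alarms: precision is 1/3 and recall is 1/2.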