Up to this point, the goal was really just to explore a new detection framework and the ACL inference workflow on Atlas. The whole process took three to four weeks. Looking back, it took some real tricks to pull off; without prior experience, this code would have been very hard to get working. The work is mainly based on Huawei's official samples: I fixed parts of the original code that relied on unsupported libraries, solved several problems in the model conversion step, and discovered that ACL does not support multi-threading.
The road is long and the search goes on; even when progress is hard, keep pressing forward.
Dataset preparation and annotation:
The dataset was collected from the web, 1,407 images in total. Annotation was done with labelImg and took a bit over a week.
In the spirit of giving back, the dataset is shared here (https://download.csdn.net/download/qq_14845119/89775339); download it yourself if needed.
The dataset directory structure is as follows:
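The original post shows the layout only as a screenshot. A typical YOLOv7-style layout is sketched below as an assumption; adjust the paths to whatever data/sleep.yaml actually points to.

```text
sleep_dataset/
├── images/
│   ├── train/        # *.jpg training images
│   └── val/          # *.jpg validation images
└── labels/
    ├── train/        # one YOLO-format *.txt per training image
    └── val/          # one YOLO-format *.txt per validation image
```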
Model training:
Training the yolov7-tiny model:
python train.py --workers 8 --device 1 --batch-size 32 --data data/sleep.yaml --img 640 640 --cfg cfg/training/yolov7-tiny-sleep.yaml --weights ./yolov7.pt --name yolov7 --hyp data/hyp.scratch.tiny.yaml
Training results:
Visualization:
Training the yolov7 model:
python train.py --workers 8 --device 0 --batch-size 32 --data data/sleep.yaml --img 640 640 --cfg cfg/training/yolov7-sleep.yaml --weights ./yolov7.pt --name yolov7 --hyp data/hyp.scratch.p5.yaml
Training results:
Visualization:
Comparing the trained yolov7 and yolov7-tiny models, yolov7 was clearly more accurate, so it was chosen as the final model.
Model conversion:
Convert the .pt model to an ONNX model:
python export.py --weights runs/train/yolov73/weights/best.pt --grid --end2end --simplify --topk-all 100 --iou-thres 0.65 --conf-thres 0.35 --img-size 640 640 --max-wh 640
First, inspect the model with the Netron tool to obtain the names of its three final output nodes.
cd model/
atc --model=yolov7.onnx --framework=5 --output=yolov7 --soc_version=Ascend310P3 --out_nodes="/model/model.105/m.0/Conv:0;/model/model.105/m.1/Conv:0;/model/model.105/m.2/Conv:0" --input_shape="images:1,3,640,640"
Post-processing model conversion:
cd src/
python3 postProcessOperator.py
cd model/
atc --model=postprocess.onnx --soc_version=Ascend310P3 --output=postprocess --framework=5 --input_shape='img_info:1,4'
The post-processing script postProcessOperator.py is as follows:
import onnx
from onnx import helper

h, w = 640, 640
boxNum = 3
outNum = 3
#classes = 80
classes = 1
coords = 4
f_h, f_w = h // 8, w // 8

# yolov7 anchors
anchor = [12.0, 16.0, 19.0, 36.0, 40.0, 28.0, 36.0, 75.0, 76.0, 55.0,
          72.0, 146.0, 142.0, 110.0, 192.0, 243.0, 459.0, 401.0]

#input_shape_0 = [1, 255, 80, 80]
#input_shape_1 = [1, 255, 40, 40]
#input_shape_2 = [1, 255, 20, 20]
input_shape_0 = [1, 18, 80, 80]
input_shape_1 = [1, 18, 40, 40]
input_shape_2 = [1, 18, 20, 20]

img_info_num = 4
max_boxes_out = 6 * 1024
box_num = 8
pre_nms_topn = 1024
post_nms_topn = 1024
relative = 1
out_box_dim = 2
obj_threshold = 0.25
score_threshold = 0.25
iou_threshold = 0.45

input_0 = helper.make_tensor_value_info("input_0", onnx.TensorProto.FLOAT, input_shape_0)
input_1 = helper.make_tensor_value_info("input_1", onnx.TensorProto.FLOAT, input_shape_1)
input_2 = helper.make_tensor_value_info("input_2", onnx.TensorProto.FLOAT, input_shape_2)

# scale 0 (stride 8)
crd_align_len = f_h * f_w
obj_align_len = boxNum * f_h * f_w
coord_data_0 = helper.make_tensor_value_info("yolo_coord_0", onnx.TensorProto.FLOAT,
                                             ['batch', boxNum * 4, crd_align_len])
obj_prob_0 = helper.make_tensor_value_info("yolo_obj_0", onnx.TensorProto.FLOAT,
                                           ['batch', obj_align_len])
classes_prob_0 = helper.make_tensor_value_info("yolo_classes_0", onnx.TensorProto.FLOAT,
                                               ['batch', classes, obj_align_len])
yolo_pre_node_0 = helper.make_node('YoloPreDetection',
                                   inputs=["input_0"],
                                   outputs=["yolo_coord_0", "yolo_obj_0", "yolo_classes_0"],
                                   boxes=boxNum,
                                   coords=coords,
                                   classes=classes,
                                   yolo_version='V5',
                                   name="yolo_pre_node_0")

# scale 1 (stride 16)
f_h, f_w = f_h // 2, f_w // 2
crd_align_len = f_h * f_w
obj_align_len = boxNum * f_h * f_w
coord_data_1 = helper.make_tensor_value_info("yolo_coord_1", onnx.TensorProto.FLOAT,
                                             ['batch', boxNum * 4, crd_align_len])
obj_prob_1 = helper.make_tensor_value_info("yolo_obj_1", onnx.TensorProto.FLOAT,
                                           ['batch', obj_align_len])
classes_prob_1 = helper.make_tensor_value_info("yolo_classes_1", onnx.TensorProto.FLOAT,
                                               ['batch', classes, obj_align_len])
yolo_pre_node_1 = helper.make_node('YoloPreDetection',
                                   inputs=["input_1"],
                                   outputs=["yolo_coord_1", "yolo_obj_1", "yolo_classes_1"],
                                   boxes=boxNum,
                                   coords=coords,
                                   classes=classes,
                                   yolo_version='V5',
                                   name="yolo_pre_node_1")

# scale 2 (stride 32)
f_h, f_w = f_h // 2, f_w // 2
crd_align_len = f_h * f_w
obj_align_len = boxNum * f_h * f_w
coord_data_2 = helper.make_tensor_value_info("yolo_coord_2", onnx.TensorProto.FLOAT,
                                             ['batch', boxNum * 4, crd_align_len])
obj_prob_2 = helper.make_tensor_value_info("yolo_obj_2", onnx.TensorProto.FLOAT,
                                           ['batch', obj_align_len])
classes_prob_2 = helper.make_tensor_value_info("yolo_classes_2", onnx.TensorProto.FLOAT,
                                               ['batch', classes, obj_align_len])
yolo_pre_node_2 = helper.make_node('YoloPreDetection',
                                   inputs=["input_2"],
                                   outputs=["yolo_coord_2", "yolo_obj_2", "yolo_classes_2"],
                                   boxes=boxNum,
                                   coords=coords,
                                   classes=classes,
                                   yolo_version='V5',
                                   name="yolo_pre_node_2")

# create yolo detection output layer
img_info = helper.make_tensor_value_info("img_info", onnx.TensorProto.FLOAT,
                                         ['batch', img_info_num])
box_out = helper.make_tensor_value_info("box_out", onnx.TensorProto.FLOAT,
                                        ['batch', max_boxes_out])
box_out_num = helper.make_tensor_value_info("box_out_num", onnx.TensorProto.INT32,
                                            ['batch', box_num])
yolo_detect_node = helper.make_node('YoloV5DetectionOutput',
                                    inputs=[f"yolo_coord_{i}" for i in range(outNum)] +
                                           [f"yolo_obj_{i}" for i in range(outNum)] +
                                           [f"yolo_classes_{i}" for i in range(outNum)] +
                                           ['img_info'],
                                    outputs=['box_out', 'box_out_num'],
                                    boxes=boxNum,
                                    coords=coords,
                                    classes=classes,
                                    pre_nms_topn=pre_nms_topn,
                                    post_nms_topn=post_nms_topn,
                                    relative=relative,
                                    out_box_dim=out_box_dim,
                                    obj_threshold=obj_threshold,
                                    score_threshold=score_threshold,
                                    iou_threshold=iou_threshold,
                                    biases=anchor,
                                    name='YoloV5DetectionOutput')

# make graph
graph = helper.make_graph(nodes=[yolo_pre_node_0, yolo_pre_node_1,
                                 yolo_pre_node_2, yolo_detect_node],
                          name="yolo",
                          inputs=[input_0, input_1, input_2, img_info],
                          outputs=[box_out, box_out_num])
onnx_model = helper.make_model(graph, producer_name="onnx-parser")
onnx_model.opset_import[0].version = 12
onnx.save(onnx_model, "./models/postprocess.onnx")
Note: adjust the values of classes, anchor, input_shape_0, input_shape_1, and input_shape_2 to match your own training configuration.
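The channel dimension of those input shapes follows directly from the class count: each YOLO head outputs boxNum * (coords + 1 objectness + classes) channels per scale. A quick sanity check of the values used above:

```python
def head_channels(box_num, coords, classes):
    """Per-scale output channels of a YOLO detection head."""
    return box_num * (coords + 1 + classes)

# This 1-class model: 3 * (4 + 1 + 1) = 18 -> [1, 18, 80, 80], etc.
print(head_channels(3, 4, 1))   # 18
# The stock 80-class COCO model: 3 * (4 + 1 + 80) = 255 -> [1, 255, ...]
print(head_channels(3, 4, 80))  # 255
```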
Writing the inference code:
import os
import sys
import time

import cv2
import numpy as np

from label import labels

sys.path.append("./acllite/python/")
from acllite_imageproc import AclLiteImage
from acllite_imageproc import AclLiteImageProc
from acllite_model import AclLiteModel
from acllite_resource import AclLiteResource
from acllite_logger import log_info


class YOLOV7_NMS_ONNX(object):
    def __init__(self):
        self.yolo_model_path = "./models/yolov7.om"
        self.yolo_model = None
        self.yolo_model_width = 640
        self.yolo_model_height = 640
        self.postprocess_model = None
        self.postprocess_model_path = "./models/postprocess.om"
        self.resource = None
        self.init_resource()

    def init_resource(self):
        # init acl resource
        self.resource = AclLiteResource()
        self.resource.init()
        # load yolo model from file
        self.yolo_model = AclLiteModel(self.yolo_model_path)
        # load postprocess model from file
        self.postprocess_model = AclLiteModel(self.postprocess_model_path)

    def yolo_process_input(self, image):
        # BGR -> RGB, resize, normalize, HWC -> CHW
        image = image[:, :, ::-1]
        resized = cv2.resize(image, (self.yolo_model_width, self.yolo_model_height))
        new_image = resized.astype(np.float32) / 255.0
        resized_image = new_image.transpose(2, 0, 1).copy()
        return resized_image

    def postprocess_process_input(self, yolo_result):
        # construct image info
        image_info = np.array([self.yolo_model_width, self.yolo_model_height,
                               self.yolo_model_width, self.yolo_model_height],
                              dtype=np.float32)
        # construct postprocess input
        postprocess_input = [*yolo_result, image_info]
        return postprocess_input

    def yolo_inference(self, resized_image):
        # inference
        yolo_result = self.yolo_model.execute([resized_image])
        return yolo_result

    def postprocess_inference(self, postprocess_input):
        postprocess_result = self.postprocess_model.execute(postprocess_input)
        return postprocess_result

    def postprocess_get_result(self, src_image, postprocess_result):
        box_num = postprocess_result[1][0, 0]
        box_info = postprocess_result[0].flatten()
        scale_x = src_image.shape[1] / self.yolo_model_width
        scale_y = src_image.shape[0] / self.yolo_model_height
        ids = []
        scores = []
        boxes = []
        for n in range(int(box_num)):
            id = int(box_info[5 * int(box_num) + n])
            score = box_info[4 * int(box_num) + n]
            top_left_x = box_info[0 * int(box_num) + n] * scale_x
            top_left_y = box_info[1 * int(box_num) + n] * scale_y
            bottom_right_x = box_info[2 * int(box_num) + n] * scale_x
            bottom_right_y = box_info[3 * int(box_num) + n] * scale_y
            if id == 0:
                ids.append(id)
                scores.append(float(score))
                boxes.append([int(top_left_x), int(top_left_y),
                              int(bottom_right_x), int(bottom_right_y)])
        return ids, scores, boxes

    def release_resource(self):
        # release resource includes acl resource, data set and unload model
        self.yolo_model.__del__()
        self.postprocess_model.__del__()
        self.resource.__del__()
        AclLiteResource.__del__ = lambda x: 0
        AclLiteModel.__del__ = lambda x: 0

    def process(self, image):
        resized_image = self.yolo_process_input(image)
        yolo_result = self.yolo_inference(resized_image)
        postprocess_input = self.postprocess_process_input(yolo_result)
        postprocess_result = self.postprocess_inference(postprocess_input)
        ids, scores, boxes = self.postprocess_get_result(image, postprocess_result)
        return ids, scores, boxes

    def draw(self, image, ids, scores, boxes):
        colors = [0, 0, 255]
        # draw the boxes in original image
        for id, score, box in zip(ids, scores, boxes):
            label = labels[id] + ":" + str("%.2f" % score)
            cv2.rectangle(image, (box[0], box[1]), (box[2], box[3]), colors)
            p3 = (max(box[0], 15), max(box[1], 15))
            cv2.putText(image, label, p3, cv2.FONT_ITALIC, 0.6, colors, 2)
        return image

    def __del__(self):
        self.release_resource()


def test_images():
    current_dir = os.path.dirname(os.path.abspath(__file__))
    images_path = os.path.join(current_dir, "./data")
    all_path = []
    for path in os.listdir(images_path):
        if path.split(".")[-1] != 'mp4':
            total_path = os.path.join(images_path, path)
            all_path.append(total_path)
    print(all_path)
    if len(all_path) == 0:
        raise Exception("the directory is empty, please download image")
    net = YOLOV7_NMS_ONNX()
    for image_path in all_path:
        image = cv2.imread(image_path)
        t1 = time.time()
        ids, scores, boxes = net.process(image)
        src_image = net.draw(image, ids, scores, boxes)
        t2 = time.time()
        print("time cost:", t2 - t1)
        output_path = os.path.join("./out", os.path.basename(image_path))
        cv2.imwrite(output_path, src_image)
        log_info("success")


def test_video():
    yolov7 = YOLOV7_NMS_ONNX()
    # Open the video file
    video_path = "./data/sleep.mp4"
    cap = cv2.VideoCapture(video_path)
    fourcc = cv2.VideoWriter_fourcc('X', 'V', 'I', 'D')  # codec for the saved video
    output = cv2.VideoWriter("output.mp4", fourcc, 25, (852, 480))  # size must match the input frames
    # Loop through the video frames
    while cap.isOpened():
        # Read a frame from the video
        success, frame = cap.read()
        if success:
            # Run detection on the frame
            t1 = time.time()
            ids, scores, boxes = yolov7.process(frame)
            t2 = time.time()
            annotated_frame = yolov7.draw(frame, ids, scores, boxes)
            t3 = time.time()
            print("time", t2 - t1, t3 - t2, t3 - t1)
            output.write(annotated_frame)
            # Break the loop if 'q' is pressed
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
        else:
            # Break the loop if the end of the video is reached
            break
    # Release the video capture object and close the display window
    cap.release()
    cv2.destroyAllWindows()


if __name__ == "__main__":
    #test_images()
    test_video()
Test results:
Video test:
Download a sleep video from Bilibili, here using the you-get tool:
you-get <Bilibili page URL for "当代高三牲睡觉图鉴">. The actual result is shown below.
Other issues:
I originally wanted to build a gRPC service around this, but in practice Huawei's AclLiteModel turned out to be poorly implemented for that: it is very unfriendly to multi-process and multi-thread use, and inference only returns correct results when everything runs in a single process with the same context. No good workaround was found, so the architecture had to be redesigned.
Huawei official documentation: aclmdlExecute (Model Loading and Execution, AscendCL API Reference, Application Development (C++), Inference Application Development, CANN Commercial Edition 6.0.1, Ascend Community).
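One pattern that sidesteps this constraint is to confine all ACL calls to a single dedicated worker process and feed it through queues, so a gRPC server (or any multi-threaded caller) never touches ACL directly. The sketch below illustrates the pattern only: the stand-in `process` lambda replaces a real `YOLOV7_NMS_ONNX()` instance, which would be constructed inside the worker and nowhere else. This is my own assumption about a viable layout, not part of the Huawei samples.

```python
import multiprocessing as mp

def acl_worker(task_q, result_q):
    # In the real service this would be: net = YOLOV7_NMS_ONNX()
    # (ACL init, context, and both .om models live only inside this process).
    process = lambda image: ("ids", "scores", "boxes")  # stand-in for net.process
    while True:
        image = task_q.get()
        if image is None:        # poison pill -> shut down cleanly
            break
        result_q.put(process(image))

if __name__ == "__main__":
    task_q, result_q = mp.Queue(), mp.Queue()
    worker = mp.Process(target=acl_worker, args=(task_q, result_q))
    worker.start()
    task_q.put("frame-1")        # any caller thread just enqueues a frame...
    print(result_q.get())        # ...and blocks for the result
    task_q.put(None)
    worker.join()
```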
References:
https://github.com/WongKinYiu/yolov7
Ascend/modelzoo-GPL - Gitee.com
samples: CANN Samples - Gitee.com