
On slow multithreaded performance on the RK3588 CPU

news 2025/7/4 10:49:57

Project scenario

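I received a project requirement to deploy an object detection model on the RK3588, processing at most 4 images at the same time. A quick look showed that the RK3588's CPU includes A76 big cores, so it should be reasonably fast. I therefore tried both CPU and NPU deployment and compared their latency, and that is where the problems started.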

Problem description

CPU deployment

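I deployed on the CPU with the Paddle-Lite framework and selected the high-load mode. It turned out that 2 threads was the fastest configuration, and 4 threads was actually slower. Besides object detection, some non-deep-learning algorithms run in parallel, and with 4 threads their runtime went up by 2x. In short, adding more threads made everything slower.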

NPU deployment

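For this deployment I chose FastDeploy. To make the model compatible I modified some of the post-processing, which also involves some CPU computation; it may cost some time, but not much. This approach was even more baffling: running the model on a single thread still made the non-deep-learning algorithms take 2x longer. Running the model with multiple threads greatly reduced the model latency, but the non-deep-learning algorithms remained slow.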

Root cause analysis:

The likely cause is the system's CPU scheduling policy: some threads get placed on the little cores, which slows them down.
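One way to check this hypothesis is to record which core a thread is actually running on while it works. The following is a minimal sketch for Linux with glibc, not the code used in this project: `run_and_track_cpu` is a hypothetical helper name, and `step` stands in for one iteration of whatever algorithm is being measured. It relies on `sched_getcpu()`, which returns the core the calling thread is executing on at that moment.

```cpp
#include <sched.h>  // sched_getcpu (glibc, Linux only)
#include <map>

// Hypothetical helper: run `step` `iterations` times on the calling thread
// and count, per core id, how often the thread was observed on that core
// right after a step finished.
template <typename Step>
std::map<int, int> run_and_track_cpu(Step step, int iterations) {
    std::map<int, int> hits;  // core id -> number of observations
    for (int i = 0; i < iterations; ++i) {
        step();                   // one unit of the workload being measured
        ++hits[sched_getcpu()];   // core the thread is on right now
    }
    return hits;
}
```

If most observations land on core ids whose `cpuinfo_max_freq` is below 2 GHz, the thread is spending its time on the little cores. On RK3588 boards the little cores usually appear as cpu0 to cpu3, but that numbering is an assumption worth confirming on the target system.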

Solution:

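Bind the threads to the big cores. I asked an AI and then wrote code that detects the big cores and sets the thread affinity. After binding the non-deep-learning threads to the big cores, the speed went up immediately, so the problem is partly solved. As for 4 threads being slower than 2 threads in the CPU deployment, that is harder to address and I have shelved it for now. The binding code is below: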

#include <fstream>
#include <string>
#include <vector>

#if defined(__ANDROID__) || defined(__linux__)
#include <sched.h>
#include <sys/syscall.h>
#include <unistd.h>

// Read each core's cpuinfo_max_freq (in kHz) and collect the cores on the
// requested side of the 2 GHz threshold. Falls back to core 0 if none match.
static std::vector<int> detect_cores(bool want_big) {
    std::vector<int> cores;
    const int MAX_CORE = 16;             // supports up to 16 cores
    const long THRESHOLD_KHZ = 2000000;  // 2 GHz
    for (int i = 0; i < MAX_CORE; ++i) {
        std::ifstream ifs("/sys/devices/system/cpu/cpu" + std::to_string(i) +
                          "/cpufreq/cpuinfo_max_freq");
        long freq = 0;
        if (!(ifs >> freq)) continue;    // core missing or no cpufreq entry
        if (want_big ? (freq > THRESHOLD_KHZ) : (freq < THRESHOLD_KHZ))
            cores.push_back(i);
    }
    return cores.empty() ? std::vector<int>{0} : cores;  // default to core 0
}

// Dynamically detect the high-performance (big) cores: above 2 GHz.
std::vector<int> detect_big_cores() { return detect_cores(true); }

// Dynamically detect the little cores: below 2 GHz.
std::vector<int> detect_small_cores() { return detect_cores(false); }

// Core binding function. Returns 0 on success, -1 on failure.
// is_process == false: bind the calling thread.
// is_process == true: bind the process's main thread; threads created
// afterwards inherit the mask, existing threads are not changed.
int bind_to_cores(const std::vector<int>& core_ids, bool is_process = false) {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    for (int id : core_ids) {
        if (id >= 0 && id < CPU_SETSIZE) CPU_SET(id, &mask);  // build the CPU mask
    }
    pid_t pid = is_process ? getpid() : static_cast<pid_t>(syscall(SYS_gettid));
    return sched_setaffinity(pid, sizeof(mask), &mask) == 0 ? 0 : -1;
}
#else
// Other platforms: not implemented (Windows would need SetThreadAffinityMask).
std::vector<int> detect_big_cores() { return std::vector<int>(); }
std::vector<int> detect_small_cores() { return std::vector<int>(); }
int bind_to_cores(const std::vector<int>& core_ids, bool is_process = false) {
    (void)core_ids; (void)is_process;
    return -1;
}
#endif

// Usage example: run the following inside a thread to bind that thread to the first big core.
auto cores = detect_big_cores();
if (cores.size() > 0) bind_to_cores({cores[0]});
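Since a failed binding silently leaves the thread wherever the scheduler puts it, it can help to read the affinity mask back after setting it. The sketch below is an illustration under assumptions, not part of the original solution: `current_affinity` and `pin_and_verify` are hypothetical names, and the code assumes Linux with glibc, where `sched_getaffinity` and the `CPU_EQUAL` macro are available.

```cpp
#include <sched.h>  // sched_setaffinity, sched_getaffinity, CPU_* macros
#include <vector>

// Hypothetical helper: return the core ids the calling thread may run on.
// Returns an empty vector if the kernel call fails.
std::vector<int> current_affinity() {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    std::vector<int> cpus;
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0) return cpus;
    for (int i = 0; i < CPU_SETSIZE; ++i) {
        if (CPU_ISSET(i, &mask)) cpus.push_back(i);
    }
    return cpus;
}

// Hypothetical helper: pin the calling thread to `core_ids`, then read the
// mask back. Returns true only if the kernel reports exactly that mask.
bool pin_and_verify(const std::vector<int>& core_ids) {
    cpu_set_t wanted;
    CPU_ZERO(&wanted);
    for (int id : core_ids) {
        if (id >= 0 && id < CPU_SETSIZE) CPU_SET(id, &wanted);
    }
    if (sched_setaffinity(0, sizeof(wanted), &wanted) != 0) return false;

    cpu_set_t actual;
    CPU_ZERO(&actual);
    if (sched_getaffinity(0, sizeof(actual), &actual) != 0) return false;
    return CPU_EQUAL(&wanted, &actual) != 0;
}
```

Passing `0` as the pid targets the calling thread in both calls, so these helpers are meant to be called from inside the worker thread, for example `pin_and_verify(detect_big_cores())` to allow the thread on every detected big core instead of just the first one.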