[HPC Storage Performance Testing] 01 - OpenMPI Deployment
Table of Contents
- I. Foreword
  - 1. About HPC
  - 2. About MPI
- II. Deployment
  - 1. Passwordless SSH setup
  - 2. Software installation
    - 2.1 Online installation
    - 2.2 Source installation
- III. Configuration and usage
  - 1. General syntax
  - 2. Example
- IV. Q&A
  - 1. Run fails with "was unable to find any relevant network interfaces"
    - Problem
    - Resolution
  - 2. Run fails with "bash: orted: command not found"
    - Problem
    - Resolution
  - 3. Run fails with "mpirun was unable to launch the specified application as it could not access or execute an executable"
    - Problem
    - Resolution
  - 4. Run fails with "The authenticity of host 'client47 (192.168.10.47)' can't be established."
    - Problem
    - Resolution
I. Foreword
1. About HPC
HPC is short for High Performance Computing. An HPC system aggregates multiple computing resources to solve large computational problems quickly.
HPC drives research and innovation in industries such as healthcare, life sciences, media and entertainment, financial services, and energy. Researchers, scientists, and analysts use HPC systems to run experiments, simulations, and prototype evaluations. HPC workloads such as seismic processing, genome sequencing, media rendering, and climate modeling generate and access large volumes of data at ever-increasing data rates and ever-decreasing latencies (high bandwidth, low latency). High-performance storage is therefore a key building block of HPC infrastructure.
Because of its enormous numerical-computation and data-processing capacity, high-performance parallel computing receives great attention worldwide and has produced remarkable results in scientific research, engineering, and military applications. Parallel computing solves a large computational problem by decomposing it into many smaller, largely independent (though possibly communicating) subproblems, distributing them across the cluster nodes, and executing them in parallel.
2. About MPI
MPI (Message Passing Interface) is a standard message-passing interface for parallel computing.
When benchmarking HPC storage, an MPI launcher is typically used to run the test tool on multiple nodes in parallel so as to measure the file system's peak performance. Common MPI implementations include:

| | MPICH | MVAPICH | OpenMPI | Intel MPI |
|---|---|---|---|---|
| Open source | Yes | Yes | Yes | No |
| Supported networks | Ethernet | InfiniBand, Ethernet | InfiniBand, Ethernet | InfiniBand, Ethernet |
| MPI standard | 2.2, 3.0 | 2.2 | 2.2 | 2.2 |
| Predecessor | MPICH | MVAPICH | LAM-MPI | / |
II. Deployment
1. Passwordless SSH setup
The example uses the following nodes:

| Node IP | Hostname | Role |
|---|---|---|
| 172.16.21.93 | node93 | master node |
| 172.16.21.94 | node94 | worker node |

- On all nodes, edit the /etc/hosts file to add the hostname-to-IP mappings:
[root@node93 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.16.21.93 node93
172.16.21.94 node94
- On the master node, run the following to enable passwordless SSH from master to worker:
[root@node93 ~]# ssh-keygen
[root@node93 ~]# ssh-copy-id node94
- On the worker node, run the following to enable passwordless SSH from worker to master:
[root@node94 ~]# ssh-keygen
[root@node94 ~]# ssh-copy-id node93
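Once both directions are configured, the key setup can be verified without risking an interactive password prompt. The sketch below is a dry run under the example hostnames above: it only prints the ssh command to run on each side (`BatchMode=yes` makes ssh fail fast instead of prompting when key authentication is not actually in place).

```shell
# Dry-run sketch: print one verification command per direction.
# BatchMode=yes -> ssh exits with an error instead of asking for a
# password when key-based login is not working.
cmds=""
for pair in node93:node94 node94:node93; do
  src=${pair%%:*}   # node to run the check on
  dst=${pair##*:}   # node being logged into
  cmds="$cmds[$src] ssh -o BatchMode=yes -o ConnectTimeout=5 $dst hostname
"
done
printf '%s' "$cmds"
```

Run the printed command on the node shown in brackets; a successful key setup prints the remote hostname with no password prompt.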
2. Software installation
2.1 Online installation
- Configure a local mirror repository (Aliyun mirrors shown here)
yum install wget -y
mv /etc/yum.repos.d/ /etc/yum.repos.d-bak/
mkdir /etc/yum.repos.d/
wget http://mirrors.aliyun.com/repo/Centos-7.repo -P /etc/yum.repos.d/
wget http://mirrors.aliyun.com/repo/epel-7.repo -P /etc/yum.repos.d/
yum makecache
- Install the openmpi packages
yum install openmpi openmpi-devel -y
- Configure the environment variables (binary and library paths; the yum packages install under /usr/lib64/openmpi)
echo "export PATH=\$PATH:/usr/lib64/openmpi/bin" >> /root/.bashrc
echo "export LD_LIBRARY_PATH=/usr/lib64/openmpi/lib:\$LD_LIBRARY_PATH" >> /root/.bashrc
source /root/.bashrc
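Re-running the setup appends the same export lines to /root/.bashrc again. A small idempotent variant is sketched below, demonstrated on a temporary file so it can run anywhere; on a real node, point RC at /root/.bashrc. The helper name `append_once` is ours, not part of any tool.

```shell
# append_once: add a line to a file only if that exact line is not
# already present (grep -x matches whole lines, -F literally).
RC=$(mktemp)                       # stand-in for /root/.bashrc in this demo
append_once() {
  grep -qxF "$1" "$2" || echo "$1" >> "$2"
}
append_once 'export PATH=$PATH:/usr/lib64/openmpi/bin' "$RC"
append_once 'export LD_LIBRARY_PATH=/usr/lib64/openmpi/lib:$LD_LIBRARY_PATH' "$RC"
# running the same step again changes nothing:
append_once 'export PATH=$PATH:/usr/lib64/openmpi/bin' "$RC"
wc -l < "$RC"    # 2 lines, not 3
```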
2.2 Source installation
- Build from source
Note: the --prefix option sets the OpenMPI installation path
wget https://download.open-mpi.org/release/open-mpi/v3.1/openmpi-3.1.0.tar.gz
tar -zxvf openmpi-3.1.0.tar.gz
mkdir /usr/local/openmpi
cd openmpi-3.1.0
./configure --prefix="/usr/local/openmpi/"
make
make install
- Configure the environment variables
echo "export PATH=\$PATH:/usr/local/openmpi/bin" >> /root/.bashrc
echo "export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:/usr/local/openmpi/lib/" >> /root/.bashrc
source /root/.bashrc
III. Configuration and usage
References: mpirun(1) man page (version 3.0.6); Tuning the run-time characteristics of MPI TCP communications
1. General syntax
mpirun [ -allow-run-as-root ] [ -host <host1,host2> | -hostfile <hostfile-name> ] [ -np <num-procs> ] [ --mca btl_tcp_if_include <nic-name> ] [ -mca plm_rsh_no_tree_spawn 1 ] <program>
- -allow-run-as-root: by default mpirun refuses to run as root; add this flag when executing commands as the root user.
- -host <host1,host2> | -hostfile <hostfile-name>: mpirun runs the command on several client nodes in parallel; the nodes can be given in two ways:
  - -host <host1,host2>: name the nodes on the command line, e.g. -host node93,node94 runs the command on node93 and node94. Each node runs one process by default; to run more processes per node, use the second form.
  - -hostfile <hostfile-name>: list the nodes in a file, e.g. -hostfile ./hosts runs the command on node93, node94, and node95 with the hosts file below, where the slots parameter sets how many processes a node may run (default 1 when omitted).
    Note: -np need not equal the sum of all slots in the hostfile. mpirun fills the hostfile line by line, so if the first few nodes' slots already cover the requested process count, the remaining nodes are never used (with the hosts file below and -np 5, the command runs only on node93 and node94):
    node93 slots=1
    node94 slots=4
    node95 slots=4
- -np: the total number of processes.
- --mca btl_tcp_if_include <nic-name>: restrict MPI traffic to the named network interface, e.g. --mca btl_tcp_if_include eth0.
- -mca plm_rsh_no_tree_spawn 1: controls how the parallel job is launched; it forces the non-tree spawn mode (every remote daemon is started directly over ssh/rsh instead of through intermediate nodes), which can work around launch failures in some network setups.
- program: the program or command to run on the client nodes.
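The slot-filling rule in the note above can be sketched in shell: walk the hostfile line by line, give each host at most its slots, and stop once -np processes are placed. The hostfile contents and np=5 are taken from the example; the by-slot fill order shown is Open MPI's default placement.

```shell
# Simulate by-slot placement: with np=5 and the hostfile from the note,
# node93 gets 1 process, node94 gets 4, and node95 gets none.
np=5
plan=""
while read -r host kv; do
  if [ "$np" -le 0 ]; then break; fi
  slots=${kv#slots=}                  # "slots=4" -> 4
  use=$(( np < slots ? np : slots ))  # at most this host's slots
  plan="$plan$host:$use "
  np=$(( np - use ))
done <<'EOF'
node93 slots=1
node94 slots=4
node95 slots=4
EOF
echo "$plan"    # node93:1 node94:4
```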
2. Example
Run the hostname command on node58 and node86 with a total of 2 processes:
[root@node58 ~]# cat hosts
node58 slots=1
node86 slots=2
[root@node58 ~]# mpirun -allow-run-as-root -hostfile ./hosts -np 2 hostname
node58
node86
IV. Q&A
1. Run fails with "was unable to find any relevant network interfaces"
Problem
The run fails with the following error output:
[root@node200 client]# mpirun --allow-run-as-root -hostfile hostfile -np 24 ./IOR -b 32G -t 1m -w -r -g -F -e -k -B -o /client/ior/testfile-8
--------------------------------------------------------------------------
[[34205,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
Host: node200

Another transport will be used instead, although this may result in
lower performance.

NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
[node200][[34205,1],0][btl_tcp_endpoint.c:626:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[34205,1],4]
--------------------------------------------------------------------------
WARNING: Open MPI accepted a TCP connection from what appears to be a
another Open MPI process but cannot find a corresponding process
entry for that peer.

This attempted connection will be ignored; your MPI job may or may not
continue properly.

Local host: node200
PID: 29308
--------------------------------------------------------------------------
[node200][[34205,1],2][btl_tcp_endpoint.c:626:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[34205,1],3]
[node200][[34205,1],5][btl_tcp_endpoint.c:626:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[34205,1],2]
[node200][[34205,1],6][btl_tcp_endpoint.c:626:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[34205,1],0]
[node200][[34205,1],4][btl_tcp_endpoint.c:626:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[34205,1],7]
[node200][[34205,1],3][btl_tcp_endpoint.c:626:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[34205,1],5]
[node200][[34205,1],7][btl_tcp_endpoint.c:626:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[34205,1],6]
[node200:29300] 23 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[node200:29300] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
Resolution
The node has multiple NICs. Tell Open MPI which one to use, e.g. --mca btl_tcp_if_include eth0. With this parameter added, the test ran successfully:
mpirun --allow-run-as-root -hostfile hostfile --mca btl_tcp_if_include eth0 -np 12 ./IOR -b 64G -t 1m -w -r -g -F -e -k -B -o /client/ior/testfile-8
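To find the interface name to pass to --mca btl_tcp_if_include, list the node's IPv4 interfaces, e.g. with `ip -o -4 addr show`. The sketch below extracts the non-loopback NIC names from that output; a trimmed sample of the output is embedded here with made-up addresses — on a real node, pipe the live command in instead.

```shell
# Parse `ip -o -4 addr show`-style lines: field 2 is the interface name.
nics=""
while read -r _ nic _; do
  if [ "$nic" = "lo" ]; then continue; fi   # skip loopback
  nics="$nics$nic "
done <<'EOF'
1: lo    inet 127.0.0.1/8 scope host lo
2: eth0  inet 172.16.21.93/24 brd 172.16.21.255 scope global eth0
3: eth2  inet 192.168.10.93/24 brd 192.168.10.255 scope global eth2
EOF
echo "$nics"    # eth0 eth2
```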
2. Run fails with "bash: orted: command not found"
Problem
The run fails with the following error output:
[root@casic client]# mpirun --allow-run-as-root -hostfile hostfile -np 30 ./IOR -b 25G -t 1m -w -r -g -F -e -k -B -o /client/ior-test/testfile-10
bash: orted: command not found
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
one or more nodes. Please check your PATH and LD_LIBRARY_PATH
settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
Please check with your sys admin to determine the correct location to use.

* compilation of the orted with dynamic libraries when static are required
(e.g., on Cray). Please check your configure cmd line and consider using
one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
lack of common network interfaces and/or no route found between
them. Please check network connectivity (including firewalls
and network routing requirements).
--------------------------------------------------------------------------
--------------------------------------------------------------------------
ORTE does not know how to route a message to the specified daemon
located on the indicated node:

my node: client58
target node: client60

This is usually an internal programming error that should be
reported to the developers. In the meantime, a workaround may
be to set the MCA param routed=direct on the command line or
in your environment. We apologize for the problem.
Resolution
The environment variables are missing. Add the bin and lib paths to the /root/.bashrc file:
echo "export PATH=\$PATH:/usr/local/openmpi/bin" >> /root/.bashrc
echo "export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:/usr/local/openmpi/lib/" >> /root/.bashrc
echo "export PATH LD_LIBRARY_PATH" >> /root/.bashrc
source /root/.bashrc
3. Run fails with "mpirun was unable to launch the specified application as it could not access or execute an executable"
Problem
The run fails with the following error output:
[root@casic client]# mpirun --allow-run-as-root -hostfile hostfile -np 24 --mca btl_tcp_if_include eth2 ./IOR -b 8G -t 1m -w -r -g -F -e -k -B -o /client/ior-test/file-8
--------------------------------------------------------------------------
mpirun was unable to launch the specified application as it could not access
or execute an executable:

Executable: ./IOR
Node: client59

while attempting to start process rank 8.
--------------------------------------------------------------------------
16 total processes failed to start
[warn] Epoll MOD(1) on fd 28 failed. Old events were 6; read change was 0 (none); write change was 2 (del): Bad file descriptor
[warn] Epoll MOD(4) on fd 28 failed. Old events were 6; read change was 2 (del); write change was 0 (none): Bad file descriptor
Resolution
Identify the executable and the node named in the error, then try running the executable directly on that node. If it fails there too, rebuild the executable for that node's environment.
4. Run fails with "The authenticity of host 'client47 (192.168.10.47)' can't be established."
Problem
The run fails with the following prompts:
[root@client58 hpc]# mpirun --allow-run-as-root -hostfile hostfile -np 12 --mca btl_tcp_if_include eth2 ./IOR -b 16G -t 1m -w -r -g -F -e -k -B -o /dcc/hpc/ior-04/
The authenticity of host 'client47 (192.168.10.47)' can't be established.
RSA key fingerprint is c7:67:bd:6a:81:67:e3:b9:6d:8e:4e:c8:18:f6:e9:ab.
Are you sure you want to continue connecting (yes/no)? The authenticity of host 'client36 (192.168.10.36)' can't be established.
RSA key fingerprint is c7:67:bd:6a:81:67:e3:b9:6d:8e:4e:c8:18:f6:e9:ab.
Are you sure you want to continue connecting (yes/no)? The authenticity of host 'client26 (192.168.10.26)' can't be established.
RSA key fingerprint is c7:67:bd:6a:81:67:e3:b9:6d:8e:4e:c8:18:f6:e9:ab.
Are you sure you want to continue connecting (yes/no)? The authenticity of host 'client37 (192.168.10.37)' can't be established.
RSA key fingerprint is c7:67:bd:6a:81:67:e3:b9:6d:8e:4e:c8:18:f6:e9:ab.
Are you sure you want to continue connecting (yes/no)? The authenticity of host 'client41 (192.168.10.41)' can't be established.
RSA key fingerprint is c7:67:bd:6a:81:67:e3:b9:6d:8e:4e:c8:18:f6:e9:ab.
Are you sure you want to continue connecting (yes/no)? The authenticity of host 'client42 (192.168.10.42)' can't be established.
RSA key fingerprint is c7:67:bd:6a:81:67:e3:b9:6d:8e:4e:c8:18:f6:e9:ab.
Are you sure you want to continue connecting (yes/no)? The authenticity of host 'client50 (192.168.10.50)' can't be established.
RSA key fingerprint is c7:67:bd:6a:81:67:e3:b9:6d:8e:4e:c8:18:f6:e9:ab.
Are you sure you want to continue connecting (yes/no)? The authenticity of host 'client48 (192.168.10.48)' can't be established.
RSA key fingerprint is c7:67:bd:6a:81:67:e3:b9:6d:8e:4e:c8:18:f6:e9:ab.
Are you sure you want to continue connecting (yes/no)? yes
Resolution
mpirun uses the hostfile to ssh into the other client nodes. After setting up passwordless login, ssh to each client once ("ssh <hostname>") and answer "yes" so that the host's RSA key is saved to known_hosts.
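As an alternative to answering "yes" by hand on every node, the host keys can be collected non-interactively with ssh-keyscan before the run. A sketch follows; the host names are the ones from the error output above — adjust them to your hostfile.

```shell
# Pre-populate known_hosts so the ssh sessions mpirun opens never stop
# at the yes/no host-key prompt.
mkdir -p "$HOME/.ssh"
for h in client47 client36 client26 client37 client41 client42 client50 client48; do
  # -H hashes the host name in known_hosts; -T 1 keeps unreachable
  # hosts from stalling the loop
  ssh-keyscan -H -T 1 "$h" >> "$HOME/.ssh/known_hosts" 2>/dev/null || true
done
```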