24Hibench

1. Hibench

官网

​ HiBench is a big data benchmark suite that helps evaluate different big data frameworks in terms of speed, throughput and system resource utilizations. It contains a set of Hadoop, Spark and streaming workloads, including Sort, WordCount, TeraSort, Repartition, Sleep, SQL, PageRank, Nutch indexing, Bayes, Kmeans, NWeight and enhanced DFSIO, etc. It also contains several streaming workloads for Spark Streaming, Flink, Storm and Gearpump.

1.1 workloads

There are totally 29 workloads in HiBench. The workloads are divided into 6 categories which are micro, ml(machine learning), sql, graph, websearch and streaming.

1.2 install maven

首先需要安装maven,并配好环境安装教程

mkdir repo
cd conf
vim settings.xml

修改仓库地址

<localRepository>/opt/module/maven/apache-maven-3.8.6/repo</localRepository>

阿里云镜像文件中已经有了,注释掉其他mirror

  <mirror><id>alimaven</id><name>aliyun maven</name><url>http://maven.aliyun.com/nexus/content/groups/public/</url><mirrorOf>central</mirrorOf>
</mirror>

1.3 bulid

下载zip文件,上传解压后的文件 不要使用7.11 会出现版本问题

参照文档中的,构建HiBench项目,我使用的是全部安装:

#ALL
mvn -Dspark=3.1 -Dscala=2.12 clean package
#SPARK
mvn -Psparkbench -Dspark=3.1 -Dscala=2.12 clean package

image-20221113191059611

如果出现以下错误:

image-20221113171241273

原因是maven没有安装好,没有设置好镜像以及安装仓库,详情见安装教程

build成功

image-20221114134337141

第二次

image-20221214105422318

1.4 configure

hadoop.conf

image-20221122101805886

spark.conf

image-20221122101910798

Input data size

image-20230408154837846

if you chose a real large data size ,you may find the errors:

image-20230608165007024

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

you need to modify the mapred-site.xml, and add the context:

<property><name>mapred.task.timeout</name><value>800000</value><final>true</final>
</property>
cluster mode
vim /opt/module/hibench/HiBench-master/HiBench-master/bin/functions/workload_functions.sh# 修改run_spark_job 方法

image-20230702113310572

1.5 hadoop example

/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/wordcount/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/wordcount/hadoop/run.sh

官网地址

运行成功

image-20221122121433468

image-20221122121637126

更详细的介绍

/opt/module/Hibench/HiBench-master/report/wordcount/hadoop

1.6 spark example

准备输入数据

/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/wordcount/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/terasort/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/sort/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/ml/kmeans/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/ml/bayes/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/ml/lr/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/websearch/pagerank/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/graph/nweight/prepare/prepare.sh

控制台输出

image-20221122102053983

yarn

image-20221122102505411

进入HDFS查看准备的输入数据

image-20221122102227590

准备命令

/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/wordcount/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/terasort/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/sort/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/ml/kmeans/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/ml/bayes/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/ml/lr/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/websearch/pagerank/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/graph/nweight/prepare/prepare.sh

注意jar文件夹中不要包含其他备用的jar包

运行命令

/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/wordcount/spark/run.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/terasort/spark/run.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/sort/spark/run.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/ml/kmeans/spark/run.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/ml/bayes/spark/run.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/ml/lr/spark/run.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/websearch/pagerank/spark/run.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/graph/nweight/spark/run.sh

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

成功!

image-20221214112321741

结果

image-20230610145858128

trouble

网络配置问题

所有任务在yarn上都用的是内网IP

image-20230608180507018

image-20230609113252954

Permission denied

# 进入bin目录
chmod -R +x ./bin/

multi-job

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

Parsing conf: /opt/module/hibench/HiBench-master/HiBench-master/conf/hibench.conf
Parsing conf: /opt/module/hibench/HiBench-master/HiBench-master/conf/spark.conf
Parsing conf: /opt/module/hibench/HiBench-master/HiBench-master/conf/workloads/websearch/pagerank.conf
probe sleep jar: /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.1.3-tests.jar
ERROR, execute cmd: '( /opt/module/hadoop-3.1.3/bin/yarn node -list 2> /dev/null | grep RUNNING )' timedout.STDOUT:STDERR:Please check!
Traceback (most recent call last):File "/opt/module/hibench/HiBench-master/HiBench-master/bin/functions/load_config.py", line 685, in <module>load_config(conf_root, workload_configFile, workload_folder, patching_config)File "/opt/module/hibench/HiBench-master/HiBench-master/bin/functions/load_config.py", line 217, in load_configgenerate_optional_value()File "/opt/module/hibench/HiBench-master/HiBench-master/bin/functions/load_config.py", line 613, in generate_optional_valueprobe_masters_slaves_hostnames()File "/opt/module/hibench/HiBench-master/HiBench-master/bin/functions/load_config.py", line 549, in probe_masters_slaves_hostnamesprobe_masters_slaves_by_Yarn()File "/opt/module/hibench/HiBench-master/HiBench-master/bin/functions/load_config.py", line 500, in probe_masters_slaves_by_Yarnassert 0, "Get workers from yarn-site.xml page failed, reason:%s\nplease set `hibench.masters.hostnames` and `hibench.slaves.hostnames` manually" % e
AssertionError: Get workers from yarn-site.xml page failed, reason:( /opt/module/hadoop-3.1.3/bin/yarn node -list 2> /dev/null | grep RUNNING ) executed timedout for 5 seconds
please set `hibench.masters.hostnames` and `hibench.slaves.hostnames` manually
start ScalaSparkPagerank bench
/opt/module/hibench/HiBench-master/HiBench-master/bin/functions/workload_functions.sh: line 38: .: filename argument required
.: usage: . filename [arguments]
/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/websearch/pagerank/spark/run.sh: line 26: OUTPUT_HDFS: unbound variable

原因可能是系统负载过高导致的响应迟钝

image-20230710195619538

2.records

parallelism = 18

有input的stage的任务数是由数据数据的大小决定的,spark.default.parallelism决定的是shuffle后的stage的任务数

LR

内存不够

http://192.168.10.102:3000/d/e9e40733-bb3a-42c8-8704-38ec27cbee3f/spark-perf-dashboard-v04-custom?from=1692258304054&to=1692264023063&orgId=1&var-UserName=jaken&var-ApplicationId=app-20230817154625-0003&var-groupbyInterval=1s

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

image-20230710222144197

http://192.168.10.102:18080/history/app-20230817154625-0003/stages/

image-20230817182124188

image-20230823160549361

Bayes

http://192.168.10.102:3000/d/e9e40733-bb3a-42c8-8704-38ec27cbee3f/spark-perf-dashboard-v04-custom?from=1692265076383&to=1692265140060&orgId=1&var-UserName=jaken&var-ApplicationId=app-20230817173746-0004&var-groupbyInterval=1s

image-20230817182326359

http://192.168.10.102:18080/history/app-20230817173746-0004/jobs/

image-20230817182448207

image-20230817182410877

image-20230823160616907

NWeightGraphX

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

image-20230710222031686

ScalaPageRank

http://192.168.10.102:3000/d/e9e40733-bb3a-42c8-8704-38ec27cbee3f/spark-perf-dashboard-v04-custom?from=1692277885573&to=1692278062264&orgId=1&var-UserName=jaken&var-ApplicationId=app-20230817211121-0012&var-groupbyInterval=1s

image-20230817212034137

http://192.168.10.102:18080/history/app-20230817211121-0012/stages/

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

distinct

image-20230817211940302

flatMap

image-20230817211756998

DenseKMeans

http://192.168.10.102:3000/d/e9e40733-bb3a-42c8-8704-38ec27cbee3f/spark-perf-dashboard-v04-custom?from=1692273890319&to=1692274051643&orgId=1&var-UserName=jaken&var-ApplicationId=app-20230817200442-0009&var-groupbyInterval=1s

image-20230817203320695

http://192.168.10.102:18080/history/app-20230817200442-0009/stages/

image-20230817203354123

image-20230823160320490

map

image-20230817210720121

collect

image-20230817210901877

image-20230817210942909

sort

http://192.168.10.102:3000/d/e9e40733-bb3a-42c8-8704-38ec27cbee3f/spark-perf-dashboard-v04-custom?from=1692276453862&to=1692276644236&orgId=1&var-UserName=jaken&var-ApplicationId=app-20230817204743-0011&var-groupbyInterval=1s

image-20230814105408629

http://192.168.10.102:18080/history/app-20230817204743-0011/stages/

image-20230817205354207

map

image-20230817205420719

reduce

image-20230817205507056

TeraSort

http://192.168.10.102:3000/d/e9e40733-bb3a-42c8-8704-38ec27cbee3f/spark-perf-dashboard-v04-custom?from=1692275803257&to=1692276079162&orgId=1&var-UserName=jaken&var-ApplicationId=app-20230817203719-0010&var-groupbyInterval=1s

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

http://192.168.10.102:18080/history/app-20230817203719-0010/stages/

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

map

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

reduce

image-20230817205732605

WordCount

http://192.168.10.102:3000/d/e9e40733-bb3a-42c8-8704-38ec27cbee3f/spark-perf-dashboard-v04-custom?from=1691983226006&to=1691983545077&orgId=1&var-UserName=jaken&var-ApplicationId=app-20230814112024-0002&var-groupbyInterval=1s

image-20230814112936597

parallelism=20

image-20230817210134565

http://192.168.10.102:18080/history/app-20230814112024-0002/jobs/

map

image-20230814113241553

reduce

image-20230814113216632

hdfs维护

hadoop fs -rm -r -skipTrash /hibench_test/HiBench/

hadoop dfsadmin -safemode leave

var-ApplicationId=app-20230814112024-0002&var-groupbyInterval=1s

[外链图片转存中…(img-eGnFeoml-1696143711685)]

parallelism=20

[外链图片转存中…(img-tqMq87MT-1696143711685)]

http://192.168.10.102:18080/history/app-20230814112024-0002/jobs/

map

[外链图片转存中…(img-ckoZlP4I-1696143711686)]

reduce

[外链图片转存中…(img-yA8CbUjp-1696143711686)]

hdfs维护

hadoop fs -rm -r -skipTrash /hibench_test/HiBench/

hadoop dfsadmin -safemode leave

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.xdnf.cn/news/147040.html

如若内容造成侵权/违法违规/事实不符,请联系一条长河网进行投诉反馈,一经查实,立即删除!

相关文章

中断向量控制器(NVIC)

1. 什么是中断 在处理器中&#xff0c;中断是一个过程&#xff0c;即CPU在正常执行程序的过程中&#xff0c;遇到外部/内部的紧急事件需要处理&#xff0c;暂时中止当前程序的执行&#xff0c;转而去为处理紧急的事件&#xff0c;待处理完毕后再返回被打断的程序处继续往下执行…

博客无限滚动加载(html、css、js)实现

介绍 这是一个简单实现了类似博客瀑布流加载功能的页面&#xff0c;使用html、css、js实现。简单易懂&#xff0c;值得学习借鉴。&#x1f44d; 演示地址&#xff1a;https://i_dog.gitee.io/easy-web-projects/infinite_scroll_blog/index.html 代码 index.html <!DOCT…

[Linux 基础] 一篇带你了解linux权限问题

文章目录 1、Linux下的两种用户2、文件类型和访问权限&#xff08;事物属性&#xff09;2.1 Linux下的文件类型2.2 基本权限2.3 文件权限值的表示方法&#xff08;1&#xff09;字符表示方法&#xff08;2&#xff09;8进制数值表示方法 2.4 文件访问权限的相关设置方法(1) chm…

R语言中更改R包安装路径

看到这些包下载到我的C盘&#xff0c;我蛮不爽的&#xff1a; 所以决定毫不犹豫的改到D盘&#xff1a; 首先&#xff0c;我们需要在RStudio中新建一个初始启动文件&#xff1a; file.edit(~/.Rprofile) 然后去你喜欢的环境新建一个文件夹存放安装的包的位置&#xff0c;我喜欢…

数据结构与算法课后题-第三章(顺序队和链队)

#include <iostream> //引入头文件 using namespace std;typedef int Elemtype;#define Maxsize 5 #define ERROR 0 #define OK 1typedef struct {Elemtype data[Maxsize];int front, rear;int tag; }SqQueue;void InitQueue(SqQueue& Q) //初始化队列 {Q.rear …

春招秋招,在线测评应用得越来越普及

这年代提到测评&#xff0c;很多人都比较熟悉&#xff0c;它有一种根据所选的问题给予合适答案方面的作用。因为不同的测评带来的影响不一样&#xff0c;所以很多人都会关注在线测评的内容有哪些。在校园招聘上面&#xff0c;在线测评也频繁出现了&#xff0c;这让很多人好奇它…

[Linux]线程互斥

[Linux]线程互斥 文章目录 [Linux]线程互斥线程并发访问问题线程互斥控制--加锁pthread_mutex_init函数pthread_mutex_destroy函数pthread_mutex_lock函数pthread_mutex_unlock函数锁相关函数使用示例使用锁的细节加锁解锁的实现原理 线程安全概念常见的线程不安全的情况常见的…

CV面试知识点总结

一.卷积操作和图像处理中的中值滤波操作有什么区别&#xff1f; 1.1卷积操作 卷积操作是一种线性操作&#xff0c;通常用于特征的提取&#xff0c;通过卷积核的加权求和来得到新的像素值。1.2中值滤波 原文&#xff1a; https://blog.csdn.net/weixin_51571728/article/detai…

【Linux】UDP的服务端 + 客户端

文章目录 &#x1f4d6; 前言1. TCP和UDP2. 网络字节序2.1 大小端字节序&#xff1a;2.2 转换接口&#xff1a; 3. socket接口3.1 sockaddr结构&#xff1a;3.2 配置sockaddr_in&#xff1a;3.3 inet_addr&#xff1a;3.4 inet_ntoa&#xff1a;3.5 bind绑定&#xff1a; 4. 服…

【面试经典150 | 矩阵】旋转图像

文章目录 写在前面Tag题目来源题目解读解题思路方法一&#xff1a;原地旋转方法二&#xff1a;翻转代替旋转 写在最后 写在前面 本专栏专注于分析与讲解【面试经典150】算法&#xff0c;两到三天更新一篇文章&#xff0c;欢迎催更…… 专栏内容以分析题目为主&#xff0c;并附带…

线性表的链式存储结构——链表

一、顺序表优缺点 优点&#xff1a;我们知道顺序表结构简单&#xff0c;便于随机访问表中任一元素&#xff1b; 缺点&#xff1a;顺序存储结构不利于插入和删除&#xff0c;不利于扩充&#xff0c;也容易造成空间浪费。 二、链表的定义 ①&#xff1a;概念&#xff1a; 用一组任…

Springboot+vue的在线试题题库管理系统(有报告),Javaee项目,springboot vue前后端分离项目。

演示视频&#xff1a; Springbootvue的在线试题题库管理系统&#xff08;有报告&#xff09;&#xff0c;Javaee项目&#xff0c;springboot vue前后端分离项目。 项目介绍&#xff1a; 本文设计了一个基于Springbootvue的前后端分离的在线试题题库管理系统&#xff0c;采用M&…

PHP 数码公司运营管理系统mysql数据库web结构apache计算机软件工程网页wamp

一、源码特点 PHP 数码公司运营管理系统系统是一套完善的web设计系统&#xff0c;对理解php编程开发语言有帮助&#xff0c;系统具有完整的源代码和数据库&#xff0c;系统主要采用B/S模式开发。 php 数码公司运营管理系统 代码 https://download.csdn.net/download/qq_41…

基于 jasypt 实现spring boot 配置文件脱敏

前言 在项目构建过程中&#xff0c;保护敏感信息的安全性至关重要&#xff0c;为了提高系统的安全性能&#xff0c;我们采用了Jasypt来对配置文件中的敏感信息进行加密处理&#xff0c;以确保系统的机密信息不被轻易泄露。 步骤 添加Maven依赖 首先&#xff0c;我们需要添加…

【kubernetes】kubernetes中的StatefulSet使用

TOC 1 为什么需要StatefulSet 常规的应用通常使用Deployment&#xff0c;如果需要在所有机器上部署则使用DaemonSet&#xff0c;但是有这样一类应用&#xff0c;它们在运行时需要存储一些数据&#xff0c;并且当Pod在其它节点上重建时也希望这些数据能够在重建后的Pod上获取&…

buuctf-[Zer0pts2020]Can you guess it?

点击source&#xff0c;进入源代码 <?php include config.php; // FLAG is defined in config.phpif (preg_match(/config\.php\/*$/i, $_SERVER[PHP_SELF])) {exit("I dont know what you are thinking, but I wont let you read it :)"); }if (isset($_GET[so…

【算法学习】-【双指针】-【复写零】

LeetCode原题链接&#xff1a;1089. 复写零 下面是题目描述&#xff1a; 给你一个长度固定的整数数组 arr &#xff0c;请你将该数组中出现的每个零都复写一遍&#xff0c;并将其余的元素向右平移。 注意&#xff1a;请不要在超过该数组长度的位置写入元素。请对输入的数组 …

【浅记】分而治之

归并排序 算法流程&#xff1a; 将数组A[1,n]排序问题分解为A[1,n/2]和A[n/21,n]排序问题递归解决子问题得到两个有序的子数组将两个子数组合并为一个有序数组 符合分而治之的思想&#xff1a; 分解原问题解决子问题合并问题解 递归式求解 递归树法 用树的形式表示抽象递…

7.网络原理之TCP_IP(下)

文章目录 4.传输层重点协议4.1TCP协议4.1.1TCP协议段格式4.1.2TCP原理4.1.2.1确认应答机制 ACK&#xff08;安全机制&#xff09;4.1.2.2超时重传机制&#xff08;安全机制&#xff09;4.1.2.3连接管理机制&#xff08;安全机制&#xff09;4.1.2.4滑动窗口&#xff08;效率机制…

uni-app:实现元素在屏幕中的居中(绝对定位absolute)

一、实现水平居中 效果 代码 <template><view><view class"center">我需要居中</view></view> </template><style>.center {position: absolute;left:50%;transform: translateX(-50%);border:1px solid black;} </s…