1. Upload and extract
1. Upload the installation package to the folder on your VM where you keep installation packages
2. Extract it and configure the environment variables
1. Upload it to /opt/modules
2. Extract it: tar -zxvf datax.tar.gz -C /opt/installs
3. Edit the profile: vi /etc/profile
Add the environment variables:
export DATAX_HOME=/opt/installs/datax
export PATH=$PATH:$DATAX_HOME/bin
4. Refresh the environment variables:
source /etc/profile
DataX is now installed.
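To verify the installation: the DataX release tarball normally ships a self-test job at job/job.json (a stream-to-stream example); assuming your package includes it, you can run it directly:
# self-check, assuming the standard tarball layout under /opt/installs/datax
cd /opt/installs/datax
python bin/datax.py job/job.json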
2. Let's try it out
1. MySQLReader example
Using DataX is essentially just writing JSON job files.
1. Change into your DataX job directory: cd /opt/installs/datax/job
2. Create a JSON file and start writing the job
3. The file in this example is named mysql2stream.json (the commands are summarized right after this list)
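Putting steps 1-3 together (mysql2stream.json is just the name used in this walkthrough):
cd /opt/installs/datax/job
vi mysql2stream.json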
2. Writing the JSON
A few notes on the file below: "username": "root" and "password": "123456" are the account and password used to connect to the database; "querySql" holds the SQL statement to run; "jdbcUrl" is the connection to your database (sqoop is the name of the database); the writer is "streamwriter" with "print": true and "encoding": "UTF-8", which means the results are simply printed to the console.
{
"job": {
"setting": {
"speed": {
"channel":1
}
},
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"username": "root",
"password": "123456",
"connection": [
{
"querySql": [
"select * from emp where empno < 7788;" 编写的sql 语句
],
"jdbcUrl": [
"jdbc:mysql://bigdata01:3306/sqoop" 这个是连接你的数据库 sqoop是数据库的名子
]
}
]
}
},
"writer": {
"name": "streamwriter",
"parameter": {
"print": true,
"encoding": "UTF-8"
}
}
}
]
}
}
3. Run it
datax.py mysql2stream.json
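If datax.py is not found on your PATH for some reason, you can also invoke it through python with explicit paths (a sketch assuming the install location used above):
python /opt/installs/datax/bin/datax.py /opt/installs/datax/job/mysql2stream.json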
3. Advanced DataX usage
1. Exporting data from Hive to MySQL
1. First check in Hive that the table to be exported actually has data:
select * from par3;
2. Create a table in MySQL to receive the data (a sketch of the DDL follows below)
Goal: export the data in par3 to MySQL; the job below writes it into a MySQL table that is also named par3.
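A minimal sketch of the receiving table, assuming it lives in the sqoop database and only needs the two columns (id, age) listed in the writer below; adjust the names and types to match your real data:
-- run in MySQL inside the sqoop database (hypothetical DDL for this example)
CREATE TABLE par3 (
    id  INT,
    age INT
);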
3. Write the JSON; as before, create the file in your datax/job folder
{
"job": {
"setting": {
"speed": {
"channel": 3
}
},
"content": [
{
"reader": {
"name": "hdfsreader",
"parameter": {
"path": "/user/hive/warehouse/yhdb.db/par3/*",
"defaultFS": "hdfs://bigdata01:9820",
"column": [
{
"index": 0,
"type": "long"
},
{
"index": 1,
"type": "long"
}
],
"fileType": "text",
"encoding": "UTF-8",
"fieldDelimiter": ","
}
},
"writer": {
"name": "mysqlwriter",
"parameter": {
"writeMode": "insert",
"username": "root",
"password": "123456",
"column": [
"id",
"age"
],
"connection": [
{
"jdbcUrl": "jdbc:mysql://bigdata01:3306/sqoop",
"table": [
"par3"
]
}
]
}
}
}
]
}
}
DataX has no hiveReader, but it does have hdfsreader, so exporting from Hive is essentially exporting from HDFS to MySQL.
"path": "/user/hive/warehouse/yhdb.db/par3/*" is the HDFS path of the table's data files.