I. Database environment
# MySQL environment
root@192.168.150.162 20:28: [(none)]> select version();
+-----------+
| version() |
+-----------+
| 8.0.26    |
+-----------+
1 row in set (0.00 sec)
root@192.168.150.162 20:28: [(none)]> show variables like '%char%';
+--------------------------+-----------------------------------+
| Variable_name            | Value                             |
+--------------------------+-----------------------------------+
| character_set_client     | utf8mb4                           |
| character_set_connection | utf8mb4                           |
| character_set_database   | utf8mb4                           |
| character_set_filesystem | binary                            |
| character_set_results    | utf8mb4                           |
| character_set_server     | utf8mb4                           |
| character_set_system     | utf8mb3                           |
| character_sets_dir       | /usr/local/mysql8/share/charsets/ |
+--------------------------+-----------------------------------+
8 rows in set (0.00 sec)
# OceanBase environment
obclient [test]> select version();
+------------------------------+
| version()                    |
+------------------------------+
| 5.7.25-OceanBase_CE-v4.2.1.2 |
+------------------------------+
1 row in set (0.002 sec)
obclient [test]> show variables like '%chara%';
+--------------------------+---------+
| Variable_name            | Value   |
+--------------------------+---------+
| character_set_client     | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database   | utf8mb4 |
| character_set_filesystem | binary  |
| character_set_results    | utf8mb4 |
| character_set_server     | utf8mb4 |
| character_set_system     | utf8mb4 |
+--------------------------+---------+
7 rows in set (0.005 sec)
This confirms the MySQL and OceanBase character sets match (only character_set_system differs, utf8mb3 vs utf8mb4; that is the server-internal metadata charset and does not affect table data).
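The manual comparison above can also be scripted. A minimal sketch, with the variable values hard-coded from the two `show variables` outputs above rather than fetched over a live connection:

```python
# Charset variables copied from the MySQL and OceanBase outputs above.
mysql_vars = {
    "character_set_client": "utf8mb4",
    "character_set_connection": "utf8mb4",
    "character_set_database": "utf8mb4",
    "character_set_filesystem": "binary",
    "character_set_results": "utf8mb4",
    "character_set_server": "utf8mb4",
    "character_set_system": "utf8mb3",
}
ob_vars = {
    "character_set_client": "utf8mb4",
    "character_set_connection": "utf8mb4",
    "character_set_database": "utf8mb4",
    "character_set_filesystem": "binary",
    "character_set_results": "utf8mb4",
    "character_set_server": "utf8mb4",
    "character_set_system": "utf8mb4",
}

# Report every variable whose value differs between the two servers.
diffs = {k: (mysql_vars[k], ob_vars[k])
         for k in mysql_vars if mysql_vars[k] != ob_vars[k]}
print(diffs)  # only character_set_system differs
```

In a real check you would populate the two dicts from `show variables like '%char%'` on each server before diffing them.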
II. Migrating data to OceanBase with mysqldump
Export the data to a SQL text dump with mysqldump on the MySQL side, transfer the dump file to the OceanBase host, then load it into OceanBase with the source command.
# Tables currently in MySQL
MySQL [(none)]> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
MySQL [test]> show tables;
+-------------------+
| Tables_in_test    |
+-------------------+
| cluster_test      |
| cluster_test1     |
| cluster_test2     |
| cluster_test3     |
| cluster_test4     |
| t1                |
| t2                |
| t8                |
| t_smallint        |
| test_clustered    |
| test_nonclustered |
+-------------------+
11 rows in set (0.00 sec)
# Export the data with mysqldump
mysqldump -h 192.168.150.162 -uroot -P4000 -p --databases test > test_oceanbase.sql
# Transfer the dump file to the OceanBase server
scp test_oceanbase.sql 192.168.150.116:/home/admin
# Import into OceanBase
obclient [test]> source test_oceanbase.sql
obclient [test]> show tables;
+-------------------+
| Tables_in_test    |
+-------------------+
| cluster_test      |
| cluster_test1     |
| cluster_test2     |
| cluster_test3     |
| cluster_test4     |
| t1                |
| t2                |
| t8                |
| t_smallint        |
| test_clustered    |
| test_nonclustered |
+-------------------+
11 rows in set (0.004 sec)
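The two `show tables` lists can be diffed mechanically instead of eyeballed. A sketch with both lists hard-coded from the outputs above:

```python
# Table lists copied from the MySQL and OceanBase `show tables` outputs above.
mysql_tables = {
    "cluster_test", "cluster_test1", "cluster_test2", "cluster_test3",
    "cluster_test4", "t1", "t2", "t8", "t_smallint",
    "test_clustered", "test_nonclustered",
}
ob_tables = {
    "cluster_test", "cluster_test1", "cluster_test2", "cluster_test3",
    "cluster_test4", "t1", "t2", "t8", "t_smallint",
    "test_clustered", "test_nonclustered",
}

missing = sorted(mysql_tables - ob_tables)  # tables that did not arrive
extra = sorted(ob_tables - mysql_tables)    # tables only on the target
print(missing, extra)  # [] [] -> the two lists match
```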
# Spot-check that the tables and data were imported
obclient [test]> select * from big;
+----------------------+---------------------+
| id                   | id1                 |
+----------------------+---------------------+
| 18446744073709551615 | 9223372036854775807 |
+----------------------+---------------------+
1 row in set (0.003 sec)
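This is a useful spot-check precisely because the two values are the maxima of BIGINT UNSIGNED and signed BIGINT, so the boundary values survived the dump/restore without truncation. A quick arithmetic check:

```python
# Max values for MySQL BIGINT UNSIGNED (full 64 bits)
# and signed BIGINT (63 value bits plus sign).
max_bigint_unsigned = (1 << 64) - 1
max_bigint_signed = (1 << 63) - 1

print(max_bigint_unsigned)  # 18446744073709551615, matches column id
print(max_bigint_signed)    # 9223372036854775807, matches column id1
```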
III. Offline data import from MySQL to OceanBase with DataX
# DataX deployment
DataX download: https://datax-opensource.oss-cn-hangzhou.aliyuncs.com/202210/datax.tar.gz
1. Download DataX directly on the server
wget https://datax-opensource.oss-cn-hangzhou.aliyuncs.com/202210/datax.tar.gz
2. Unpack DataX
[admin@localhost ~]$ tar zxvf datax.tar.gz
3. Install Java
yum install java
4. Run the bundled self-test job to verify the installation
[admin@localhost bin]$ python datax.py ../job/job.json
DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright © 2010-2017, Alibaba Group. All Rights Reserved.
2023-12-21 23:22:50.245 [main] INFO MessageSource - JVM TimeZone: GMT+08:00, Locale: zh_CN
2023-12-21 23:22:50.248 [main] INFO MessageSource - use Locale: zh_CN timeZone: sun.util.calendar.ZoneInfo[id="GMT+08:00",offset=28800000,dstSavings=0,useDaylight=false,transitions=0,lastRule=null]
2023-12-21 23:22:50.307 [main] INFO VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2023-12-21 23:22:50.312 [main] INFO Engine - the machine info =>
osInfo: Red Hat, Inc. 1.8 25.392-b08
jvmInfo: Linux amd64 3.10.0-1160.el7.x86_64
cpu num: 8
totalPhysicalMemory: -0.00G
freePhysicalMemory: -0.00G
maxFileDescriptorCount: -1
currentOpenFileDescriptorCount: -1
GC Names [PS MarkSweep, PS Scavenge]
MEMORY_NAME | allocation_size | init_size
PS Eden Space | 256.00MB | 256.00MB
Code Cache | 240.00MB | 2.44MB
Compressed Class Space | 1,024.00MB | 0.00MB
PS Survivor Space | 42.50MB | 42.50MB
PS Old Gen | 683.00MB | 683.00MB
Metaspace | -0.00MB | 0.00MB
2023-12-21 23:22:50.329 [main] INFO Engine -
{
    "content":[
        {
            "reader":{
                "name":"streamreader",
                "parameter":{
                    "column":[
                        {
                            "type":"string",
                            "value":"DataX"
                        },
                        {
                            "type":"long",
                            "value":19890604
                        },
                        {
                            "type":"date",
                            "value":"1989-06-04 00:00:00"
                        },
                        {
                            "type":"bool",
                            "value":true
                        },
                        {
                            "type":"bytes",
                            "value":"test"
                        }
                    ],
                    "sliceRecordCount":100000
                }
            },
            "writer":{
                "name":"streamwriter",
                "parameter":{
                    "encoding":"UTF-8",
                    "print":false
                }
            }
        }
    ],
    "setting":{
        "errorLimit":{
            "percentage":0.02,
            "record":0
        },
        "speed":{
            "channel":1
        }
    }
}
2023-12-21 23:22:50.348 [main] WARN Engine - prioriy set to 0, because NumberFormatException, the value is: null
2023-12-21 23:22:50.361 [main] INFO PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2023-12-21 23:22:50.361 [main] INFO JobContainer - DataX jobContainer starts job.
2023-12-21 23:22:50.365 [main] INFO JobContainer - Set jobId = 0
2023-12-21 23:22:50.402 [job-0] INFO JobContainer - jobContainer starts to do prepare …
2023-12-21 23:22:50.402 [job-0] INFO JobContainer - DataX Reader.Job [streamreader] do prepare work .
2023-12-21 23:22:50.402 [job-0] INFO JobContainer - DataX Writer.Job [streamwriter] do prepare work .
2023-12-21 23:22:50.402 [job-0] INFO JobContainer - jobContainer starts to do split …
2023-12-21 23:22:50.403 [job-0] INFO JobContainer - Job set Channel-Number to 1 channels.
2023-12-21 23:22:50.403 [job-0] INFO JobContainer - DataX Reader.Job [streamreader] splits to [1] tasks.
2023-12-21 23:22:50.403 [job-0] INFO JobContainer - DataX Writer.Job [streamwriter] splits to [1] tasks.
2023-12-21 23:22:50.421 [job-0] INFO JobContainer - jobContainer starts to do schedule …
2023-12-21 23:22:50.426 [job-0] INFO JobContainer - Scheduler starts [1] taskGroups.
2023-12-21 23:22:50.429 [job-0] INFO JobContainer - Running by standalone Mode.
2023-12-21 23:22:50.438 [taskGroup-0] INFO TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2023-12-21 23:22:50.442 [taskGroup-0] INFO Channel - Channel set byte_speed_limit to -1, No bps activated.
2023-12-21 23:22:50.442 [taskGroup-0] INFO Channel - Channel set record_speed_limit to -1, No tps activated.
2023-12-21 23:22:50.468 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2023-12-21 23:22:50.789 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[335]ms
2023-12-21 23:22:50.790 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] completed it's tasks.
2023-12-21 23:23:00.452 [job-0] INFO StandAloneJobContainerCommunicator - Total 100000 records, 2600000 bytes | Speed 253.91KB/s, 10000 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.046s | All Task WaitReaderTime 0.056s | Percentage 100.00%
2023-12-21 23:23:00.453 [job-0] INFO AbstractScheduler - Scheduler accomplished all tasks.
2023-12-21 23:23:00.453 [job-0] INFO JobContainer - DataX Writer.Job [streamwriter] do post work.
2023-12-21 23:23:00.453 [job-0] INFO JobContainer - DataX Reader.Job [streamreader] do post work.
2023-12-21 23:23:00.453 [job-0] INFO JobContainer - DataX jobId [0] completed successfully.
2023-12-21 23:23:00.454 [job-0] INFO HookInvoker - No hook invoked, because base dir not exists or is a file: /home/admin/datax/hook
2023-12-21 23:23:00.455 [job-0] INFO JobContainer -
[total cpu info] =>
averageCpu | maxDeltaCpu | minDeltaCpu
-1.00% | -1.00% | -1.00%
[total gc info] =>
NAME | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
PS MarkSweep | 0 | 0 | 0 | 0.000s | 0.000s | 0.000s
PS Scavenge | 0 | 0 | 0 | 0.000s | 0.000s | 0.000s
2023-12-21 23:23:00.455 [job-0] INFO JobContainer - PerfTrace not enable!
2023-12-21 23:23:00.456 [job-0] INFO StandAloneJobContainerCommunicator - Total 100000 records, 2600000 bytes | Speed 253.91KB/s, 10000 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.046s | All Task WaitReaderTime 0.056s | Percentage 100.00%
2023-12-21 23:23:00.457 [job-0] INFO JobContainer -
Job start time          : 2023-12-21 23:22:50
Job end time            : 2023-12-21 23:23:00
Total elapsed time      : 10s
Average throughput      : 253.91KB/s
Record write speed      : 10000rec/s
Total records read      : 100000
Total read/write errors : 0
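The summary figures are internally consistent with the per-interval log line above it (100,000 records and 2,600,000 bytes over the 10 s run). A quick check of the reported rates:

```python
# Totals taken from the DataX job summary above.
total_records = 100_000
total_bytes = 2_600_000
elapsed_s = 10

records_per_s = total_records // elapsed_s
kb_per_s = total_bytes / elapsed_s / 1024  # DataX reports KB as 1024 bytes

print(records_per_s)       # 10000, matches "10000 records/s"
print(round(kb_per_s, 2))  # 253.91, matches "253.91KB/s"
```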
5. Create the DataX job JSON
{
    "job": {
        "entry": {
            "jvm": "-Xms1024m -Xmx1024m"
        },
        "setting": {
            "speed": {
                "channel": 4
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.1
            }
        },
        "content": [{
            "reader": {
                "name": "mysqlreader",
                "parameter": {
                    "username": "root",
                    "password": "oracle123",
                    "column": ["*"],
                    "connection": [{
                        "table": ["Tab_A"],
                        "jdbcUrl": ["jdbc:mysql://192.168.150.162:4000/test?useUnicode=true&characterEncoding=utf8&useSSL=false"]
                    }]
                }
            },
            "writer": {
                "name": "oceanbasev10writer",
                "parameter": {
                    "obWriteMode": "insert",
                    "column": ["*"],
                    "preSql": ["truncate table Tab_A"],
                    "connection": [{
                        "jdbcUrl": "||_dsc_ob10_dsc_||obdemo:obmysql||_dsc_ob10_dsc_||jdbc:oceanbase://192.168.150.116:2883/test?useLocalSessionState=true&allowBatch=true&allowMultiQueries=true&rewriteBatchedStatements=true",
                        "table": ["Tab_A"]
                    }],
                    "username": "root",
                    "password": "oracle123",
                    "writerThreadCount": 10,
                    "batchSize": 1000,
                    "memstoreThreshold": "0.9"
                }
            }
        }]
    }
}
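A malformed job file only fails at launch time, so it can help to sanity-check the JSON first. A minimal sketch; the trimmed config below copies only the fields being validated from the job above:

```python
import json

# Trimmed copy of the job above (only the fields we validate here).
job_text = """
{
  "job": {
    "setting": {"speed": {"channel": 4},
                "errorLimit": {"record": 0, "percentage": 0.1}},
    "content": [{
      "reader": {"name": "mysqlreader"},
      "writer": {"name": "oceanbasev10writer"}
    }]
  }
}
"""

job = json.loads(job_text)  # raises ValueError if the JSON is malformed
content = job["job"]["content"][0]
assert content["reader"]["name"] == "mysqlreader"
assert content["writer"]["name"] == "oceanbasev10writer"
assert job["job"]["setting"]["speed"]["channel"] >= 1
print("job config looks sane")
```

For the real file you would replace `job_text` with `open("../job/mysql2ob.json").read()` so the same checks run against the job DataX will execute.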
6. Run the offline data sync
Source data:
MySQL [test]> select * from Tab_A;
+----+------+------+------+------+------+--------+
| id | bid  | cid  | name | type | num  | amt    |
+----+------+------+------+------+------+--------+
|  1 |    1 |    1 | A01  | 01   |  111 | 111.00 |
|  2 |    2 |    2 | A01  | 01   |  112 | 111.00 |
|  3 |    3 |    3 | A02  | 02   |  113 | 111.00 |
|  4 |    4 |    4 | A02  | 02   |  112 | 111.00 |
|  5 |    5 |    5 | A01  | 01   |  111 | 111.00 |
|  6 |    6 |    6 | A02  | 02   |  113 | 111.00 |
|  7 |    5 |    7 | A01  | 01   |  111 |  88.00 |
|  8 |    6 |    8 | A02  | 02   |  113 |  88.00 |
+----+------+------+------+------+------+--------+
8 rows in set (0.26 sec)
Target data:
obclient [test]> select * from Tab_A;
Empty set (0.133 sec)
Run the sync:
python ./datax.py ../job/mysql2ob.json
2023-12-22 00:42:13.745 [job-0] INFO JobContainer - PerfTrace not enable!
2023-12-22 00:42:13.745 [job-0] INFO StandAloneJobContainerCommunicator - Total 8 records, 134 bytes | Speed 13B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.020s | All Task WaitReaderTime 0.000s | Percentage 100.00%
2023-12-22 00:42:13.747 [job-0] INFO JobContainer -
Job start time          : 2023-12-22 00:41:54
Job end time            : 2023-12-22 00:42:13
Total elapsed time      : 19s
Average throughput      : 13B/s
Record write speed      : 0rec/s
Total records read      : 8
Total read/write errors : 0
7. Verify the data:
obclient [test]> select * from Tab_A;
+----+------+------+------+------+------+--------+
| id | bid  | cid  | name | type | num  | amt    |
+----+------+------+------+------+------+--------+
|  1 |    1 |    1 | A01  | 01   |  111 | 111.00 |
|  2 |    2 |    2 | A01  | 01   |  112 | 111.00 |
|  3 |    3 |    3 | A02  | 02   |  113 | 111.00 |
|  4 |    4 |    4 | A02  | 02   |  112 | 111.00 |
|  5 |    5 |    5 | A01  | 01   |  111 | 111.00 |
|  6 |    6 |    6 | A02  | 02   |  113 | 111.00 |
|  7 |    5 |    7 | A01  | 01   |  111 |  88.00 |
|  8 |    6 |    8 | A02  | 02   |  113 |  88.00 |
+----+------+------+------+------+------+--------+
8 rows in set (0.002 sec)