Installing and Deploying orchestrator


I. Orchestrator in brief

1. What is orchestrator

Orchestrator is an open-source high-availability and topology-visualization management tool for MySQL replication, written in Go. It actively discovers the current topology and the state of master/slave replication, and it supports reshaping MySQL replication topologies, automatic failover of a failed master, and manual master/slave switchover.

Orchestrator stores its metadata in a MySQL or SQLite backend. It provides a web interface that displays the topology and instance status of a MySQL cluster and lets you change some instance settings through the browser, and it also offers a command-line tool and an HTTP API for more flexible automated operations. Failover of a MySQL master can be automatic or manual; manual switchover comes in several flavors: recover, force-master-failover, force-master-takeover, and graceful-master-takeover.

Compared with MHA, Orchestrator puts more emphasis on managing the replication topology: it can rearrange virtually any MySQL replication layout and builds high availability on top of that. Orchestrator itself can also be deployed as multiple nodes that use the raft distributed-consensus protocol for its own high availability.

2. Features of orchestrator

1. Detection and inspection of replication clusters: it actively discovers the current topology and the master/slave replication state, and reads basic MySQL information such as replication status and configuration.
2. Safe topology refactoring: it can move a server to replicate from another machine. orchestrator understands the replication rules and can work with binlog file/position, GTID, Pseudo-GTID, and binlog servers. It offers clean topology visualization, makes replication problems visible, and lets you change the topology with simple drag and drop.
3. Failure recovery: using the topology information it can identify many kinds of failures and, depending on configuration, perform automatic recovery.

3. Architecture diagram

II. Setting up Orchestrator

1. Prepare the environment

Host            Hostname   Role
192.168.10.10   swarm01    master MySQL (5.7.40)
192.168.10.20   swarm02    slave MySQL (5.7.40)
192.168.10.30   swarm03    slave MySQL (5.7.40)
192.168.10.40   swarm04    Orchestrator host

2. Configure /etc/hosts name resolution on every machine

192.168.10.10 swarm01
192.168.10.20 swarm02
192.168.10.30 swarm03
192.168.10.40 swarm04
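
These entries can be appended in one step on every machine (a minimal sketch):

cat >> /etc/hosts << 'EOF'
192.168.10.10 swarm01
192.168.10.20 swarm02
192.168.10.30 swarm03
192.168.10.40 swarm04
EOF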

3. Set up passwordless SSH login on every host (each host must be able to reach every other host without a password)

ssh-keygen -t rsa ## press Enter three times
ssh-copy-id -i ~/.ssh/id_rsa.pub swarm04
ssh-copy-id -i ~/.ssh/id_rsa.pub swarm01  ## passwordless login to 192.168.10.10
ssh-copy-id -i ~/.ssh/id_rsa.pub swarm02  ## passwordless login to 192.168.10.20
ssh-copy-id -i ~/.ssh/id_rsa.pub swarm03  ## passwordless login to 192.168.10.30
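
The same key distribution can be scripted; a minimal sketch using the hostnames above (each ssh-copy-id still prompts once for the target's password):

# run once on each host after ssh-keygen
for h in swarm01 swarm02 swarm03 swarm04; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub "$h"
done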

4. Disable the firewall and SELinux

systemctl stop firewalld
systemctl disable firewalld
# permanently disable SELinux
sed -i '/^SELINUX=/ s/enforcing/disabled/' /etc/selinux/config

III. Building the master/slave replication environment

1. Install MySQL

Install MySQL with the script below (all four servers need it):

#!/bin/bash
read -p "Confirm that /data/mysql/ is empty and no MySQL service is running on this server (yes|no): " result
case $result in
yes)
echo "continuing"
;;
no)
echo "exiting"
exit
;;
*)
echo "please answer yes or no"
exit
;;
esac

# install the software dependencies
yum install cmake ncurses-devel gcc gcc-c++ lsof bzip2 openssl-devel ncurses-compat-libs -y

# unpack the MySQL binary tarball
tar xf mysql-5.7.40-el7-x86_64.tar.gz

# move the unpacked tree to /usr/local and rename it to mysql
mv mysql-5.7.40-el7-x86_64 /usr/local/mysql

# create the mysql group and user
groupadd mysql

# the mysql user belongs to the mysql group and gets /bin/false as its shell
useradd -r -g mysql -s /bin/false mysql

# stop the firewalld service and disable it at boot
systemctl stop firewalld
systemctl disable firewalld

# disable SELinux for the current boot
setenforce 0

# permanently disable SELinux
sed -i '/^SELINUX=/ s/enforcing/disabled/' /etc/selinux/config

# create the data directory
mkdir -p /data/mysql
# give /data/mysql to the mysql user and group so mysqld can read and write it
chown mysql:mysql /data/mysql
# only the mysql user and group may access it; everyone else is locked out
chmod 750 /data/mysql

# enter /usr/local/mysql/bin
cd /usr/local/mysql/bin

# initialize mysql
./mysqld --initialize --user=mysql --basedir=/usr/local/mysql/ --datadir=/data/mysql &>passwd.txt

# generate the key material that lets mysql accept SSL logins
./mysql_ssl_rsa_setup --datadir=/data/mysql/

# extract the temporary password
tem_passwd=$(cat passwd.txt | grep "temporary" | awk '{print $NF}')
       # $NF is the last field
       # abc=$(command) runs the command first, then assigns its output to abc

# add the mysql bin directory to PATH
# for the current shell only
export PATH=/usr/local/mysql/bin/:$PATH
# and permanently, so it survives a reboot
echo 'PATH=/usr/local/mysql/bin:$PATH' >>/root/.bashrc

# install support-files/mysql.server as /etc/init.d/mysqld
cp ../support-files/mysql.server /etc/init.d/mysqld

# point the init script at our data directory
sed -i '70c datadir=/data/mysql' /etc/init.d/mysqld

# generate the /etc/my.cnf configuration file
cat >/etc/my.cnf <<EOF
[mysqld_safe]

[client]
socket=/data/mysql/mysql.sock

[mysqld]
socket=/data/mysql/mysql.sock
port=3306
open_files_limit=8192
innodb_buffer_pool_size=512M
character-set-server=utf8

[mysql]
auto-rehash
prompt=\\u@\\d \\R:\\m mysql>
EOF

# raise the kernel open-files limit
ulimit -n 1000000
# make the limit stick across reboots
echo "ulimit -n 1000000" >>/etc/rc.local
chmod +x /etc/rc.d/rc.local

# start the mysqld process
service mysqld start

# register mysqld with the system service manager
/sbin/chkconfig --add mysqld
# start the mysql service at boot
/sbin/chkconfig mysqld on

# the first password change must use the --connect-expired-password option
# -e passes a statement for mysql to execute
# set password='123'; sets the root password to 123
mysql -uroot -p$tem_passwd --connect-expired-password  -e "set password='123';"

# verify the password change: if this lists the databases, it worked
mysql -uroot -p'123' -e "show databases;"

############# the installation succeeded if the databases are listed ##################

2. Configure my.cnf on master, slave-1, and slave-2

Edit the my.cnf file on the master.

The semi-sync replication plugins need to be installed first (some builds already ship them):

Install the plugin on the master (rpl_semi_sync_master):
mysql> install plugin rpl_semi_sync_master soname 'semisync_master.so';
Query OK, 0 rows affected (0.08 sec)
# reset the flag for now; my.cnf will enable it
mysql> set global rpl_semi_sync_master_enabled = 0;
Query OK, 0 rows affected (0.00 sec)


Install the plugin on each slave (rpl_semi_sync_slave):
mysql> install plugin rpl_semi_sync_slave soname 'semisync_slave.so';
Query OK, 0 rows affected (0.02 sec)
# reset the flag for now; my.cnf will enable it
mysql> set global rpl_semi_sync_slave_enabled = 0;
Query OK, 0 rows affected (0.00 sec)


# Install both plugins on the master instance and on every slave instance,
# because after a master/slave switchover a slave becomes the master.
List the semi-sync plugins to confirm:
mysql> SELECT PLUGIN_NAME FROM INFORMATION_SCHEMA.PLUGINS WHERE PLUGIN_NAME LIKE 'rpl_semi_sync_%';
+----------------------+
| PLUGIN_NAME          |
+----------------------+
| rpl_semi_sync_master |
| rpl_semi_sync_slave  |
+----------------------+
2 rows in set (0.00 sec)
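Once replication is enabled and running (section 6 below), the semi-sync status counters show whether it is actually in effect; a quick sketch, run on the master:

mysql -uroot -p'123' -e "SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_master%';"
# Rpl_semi_sync_master_status: ON means the master is running in semi-sync mode
# Rpl_semi_sync_master_clients counts the connected semi-sync slaves
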
[root@master ~]# cat /etc/my.cnf
[mysqld_safe]

[client]
socket=/data/mysql/mysql.sock

[mysqld]
socket=/data/mysql/mysql.sock
port=3306
open_files_limit=8192
innodb_buffer_pool_size=512M
character-set-server=utf8

log_bin=1
server_id=1
# semi-sync; some builds lack the plugin: leave these lines commented out, install the plugin, then uncomment and restart the service
rpl_semi_sync_master_enabled=1
rpl_semi_sync_master_timeout=3000 # 3 seconds
expire_logs_days=7
# enable GTID
gtid-mode=on
enforce-gtid-consistency=on
relay_log_purge=0
log_slave_updates=1

[mysql]
auto-rehash
prompt=\u@\d \R:\m mysql>

Edit the my.cnf file on slave-1

[root@slave-1 ~]# cat /etc/my.cnf
[mysqld_safe]

[client]
socket=/data/mysql/mysql.sock

[mysqld]
socket=/data/mysql/mysql.sock
port=3306
open_files_limit=8192
innodb_buffer_pool_size=512M
character-set-server=utf8

# binlog
log_bin=1
server_id=2
expire-logs-days=7
# semi-sync; some builds lack the plugin: leave this line commented out, install the plugin, then uncomment and restart the service
rpl_semi_sync_slave_enabled=1
# enable GTID
gtid_mode=on
enforce-gtid-consistency=on
log_slave_updates=on
relay_log_purge=0

[mysql]
auto-rehash
prompt=\u@\d \R:\m mysql>

Edit the my.cnf file on slave-2

[root@slave-2 ~]# cat /etc/my.cnf
[mysqld_safe]

[client]
socket=/data/mysql/mysql.sock

[mysqld]
socket=/data/mysql/mysql.sock
port=3306
open_files_limit=8192
innodb_buffer_pool_size=512M
character-set-server=utf8

# log
log_bin=1
server_id=3
expire-logs-days=7
# enable GTID
gtid_mode=on
enforce-gtid-consistency=on
# semi-sync; some builds lack the plugin: leave this line commented out, install the plugin, then uncomment and restart the service
rpl_semi_sync_slave_enabled=1
relay_log_purge=0
log_slave_updates=1

[mysql]
auto-rehash
prompt=\u@\d \R:\m mysql>

# restart the service after changing the config

service mysqld restart

3. Create two symlinks on master, slave-1, and slave-2

[root@swarm01 opt]# ln -s /usr/local/mysql/bin/mysql /usr/sbin
[root@swarm01 opt]# ln -s /usr/local/mysql/bin/mysqlbinlog /usr/sbin
[root@swarm01 opt]# ll /usr/sbin/mysql
lrwxrwxrwx 1 root root 26 Feb 29 09:59 /usr/sbin/mysql -> /usr/local/mysql/bin/mysql
[root@swarm01 opt]# ll /usr/sbin/mysqlbinlog
lrwxrwxrwx 1 root root 32 Feb 29 10:00 /usr/sbin/mysqlbinlog -> /usr/local/mysql/bin/mysqlbinlog

4. Restart the MySQL service on all three machines and verify it came back up

# repeat on slave-1 and slave-2 exactly as on the master
[root@swarm01 opt]# systemctl restart mysqld
[root@swarm01 opt]# netstat -anplut |grep mysqld
tcp6       0      0 :::3306                 :::*                   LISTEN      9680/mysqld

5. Configure master/slave replication through GTID

Import the base data on all the slave machines so every server starts from the same dataset (this step can trigger create-database errors on the slaves and may be skipped):

# dump everything on the master
[root@swarm01 opt]# mysqldump --all-databases --set-gtid-purged=OFF -uroot -p123 >all_db.sql
mysqldump: [Warning] Using a password on the command line interface can be insecure.
# the warning is expected: passing the password on the command line is insecure
# copy the dump from the master into /data/mysql on slave-1 and slave-2
[root@swarm01 opt]# scp /opt/all_db.sql swarm02:/data/mysql/
all_db.sql                                    100% 1062KB  54.3MB/s   00:00
[root@swarm01 opt]# scp /opt/all_db.sql swarm03:/data/mysql/
all_db.sql      

# load the dump on each slave so its data matches the master
[root@swarm02 opt]# mysql -uroot -p123 </data/mysql/all_db.sql
mysql: [Warning] Using a password on the command line interface can be insecure.
[root@swarm03 opt]# mysql -uroot -p123 </data/mysql/all_db.sql
mysql: [Warning] Using a password on the command line interface can be insecure.

Create the replication user on the master

root@(none) 01:23  mysql>grant replication slave on *.* to 'xiaobai'@'192.168.10.%' identified by '123';
Query OK, 0 rows affected, 1 warning (0.02 sec)
# replication slave grants the account the privilege to act as a replication slave

6. Enable GTID auto-positioning and start replication (run on every slave server)

# first stop the slave threads on each slave server
stop slave;

# run this on every slave
CHANGE MASTER TO MASTER_HOST='192.168.10.10' ,
MASTER_USER='xiaobai',
MASTER_PASSWORD='123',
MASTER_PORT=3306,
master_auto_position=1;

# (optional) allow remote root logins
mysql> GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '123' WITH GRANT OPTION;
# start the slave threads again
start slave;

# check whether replication is working
root@(none) 10:20 mysql>show slave status\G
*************************** 1. row ***************************
              Slave_IO_State: Waiting for master to send event
                Master_Host: 192.168.10.10
                Master_User: xiaobai
                Master_Port: 3306
              Connect_Retry: 60
            Master_Log_File: 1.000001
        Read_Master_Log_Pos: 185808
              Relay_Log_File: swarm02-relay-bin.000003
              Relay_Log_Pos: 186005
      Relay_Master_Log_File: 1.000001
            Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
            Replicate_Do_DB:
        Replicate_Ignore_DB:
          Replicate_Do_Table:
      Replicate_Ignore_Table:
    Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
                  Last_Errno: 0
                  Last_Error:
                Skip_Counter: 0
        Exec_Master_Log_Pos: 185808
            Relay_Log_Space: 186391
            Until_Condition: None
              Until_Log_File:
              Until_Log_Pos: 0
          Master_SSL_Allowed: No
          Master_SSL_CA_File:
          Master_SSL_CA_Path:
            Master_SSL_Cert:
          Master_SSL_Cipher:
              Master_SSL_Key:
      Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
              Last_IO_Errno: 0
              Last_IO_Error:
              Last_SQL_Errno: 0
              Last_SQL_Error:
Replicate_Ignore_Server_Ids:
            Master_Server_Id: 1
                Master_UUID: a8c1930e-d69e-11ee-9fe3-000c29877eaa
            Master_Info_File: /data/mysql/master.info
                  SQL_Delay: 0
        SQL_Remaining_Delay: NULL
    Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
          Master_Retry_Count: 86400
                Master_Bind:
    Last_IO_Error_Timestamp:
    Last_SQL_Error_Timestamp:
              Master_SSL_Crl:
          Master_SSL_Crlpath:
          Retrieved_Gtid_Set: a8c1930e-d69e-11ee-9fe3-000c29877eaa:1-65
          Executed_Gtid_Set: a8c1930e-d69e-11ee-9fe3-000c29877eaa:1-65
              Auto_Position: 1
        Replicate_Rewrite_DB:
                Channel_Name:
          Master_TLS_Version:
1 row in set (0.00 sec)
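
For a scripted health check, the two replication threads and the lag can be pulled out directly (a quick sketch, run on each slave):

mysql -uroot -p'123' -e "show slave status\G" | egrep 'Slave_IO_Running:|Slave_SQL_Running:|Seconds_Behind_Master'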

As the last step, create the mha user on the master; replication propagates it to all the slave servers.

root@(none) 22:07  mysql> grant all privileges on *.* to mha@'192.168.10.%' identified by 'mha_chaoge';
Query OK, 0 rows affected (0.00 sec)

IV. Installing Orchestrator

1. Install the jq tool

# install the EPEL repository
[root@orche ~]# yum install epel-release -y

# install the jq tool
[root@orche ~]# yum install jq -y

2. Install orchestrator

Download from: https://github.com/openark/orchestrator/releases

# the client package is only needed when running an orchestrator cluster
[root@swarm04 ~]# rpm -ivh orchestrator-3.2.6-1.x86_64.rpm
# check the installed packages
[root@swarm04 ~]# rpm -qa | grep orchestrator
orchestrator-3.2.6-1.x86_64
orchestrator-client-3.2.6-1.x86_64

On the MySQL instance at 192.168.10.40, create an orchestrator account that Orchestrator will use to store its high-availability metadata, plus a database for it:

CREATE DATABASE IF NOT EXISTS orchestrator;
create user 'orchestrator'@'%' identified by '123456';
GRANT ALL PRIVILEGES ON `orchestrator`.* TO 'orchestrator'@'%';

On the MySQL instances at 192.168.10.10, 192.168.10.20, and 192.168.10.30, create an orchestrator account that Orchestrator uses to probe each instance: reading status, mapping the master/slave topology, and switching masters and slaves.

create user 'orchestrator'@'%' identified by '123456';
GRANT SUPER, PROCESS, REPLICATION SLAVE, RELOAD ON *.* TO 'orchestrator'@'%';
GRANT SELECT ON mysql.slave_master_info TO 'orchestrator'@'%';

#### A "user already exists" error here is expected: the slaves already received the user through replication; just run the GRANT statements. ####
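
orchestrator on swarm04 must be able to reach all three instances with this account; a quick connectivity sketch, run from swarm04:

for h in swarm01 swarm02 swarm03; do
    mysql -h"$h" -uorchestrator -p123456 -e "select @@hostname, @@server_id;"
done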

3. Configure the orchestrator configuration file

Run this on 192.168.10.40, the orchestrator server.

Create orchestrator.conf.json by copying the sample:

[root@swarm04 ~]# cd /usr/local/orchestrator/
[root@swarm04 orchestrator]# ls
orchestrator                    orchestrator-sample-sqlite.conf.json  orc_vip.sh
orchestrator.conf.json          orch_vip.sh                            resources
orchestrator-sample.conf.json   orc.log
[root@swarm04 orchestrator]# cp orchestrator-sample.conf.json orchestrator.conf.json

orchestrator.conf.json is a JSON file; the main parameters to configure are:

{
       "Debug": true,
       "EnableSyslog": false,
       "ListenAddress": ":3000",
       "MySQLTopologyUser": "orchestrator",
       "MySQLTopologyPassword": "123456",
       "MySQLTopologyCredentialsConfigFile": "",
       "MySQLTopologySSLPrivateKeyFile": "",
       "MySQLTopologySSLCertFile": "",
       "MySQLTopologySSLCAFile": "",
       "MySQLTopologySSLSkipVerify": true,
       "MySQLTopologyUseMutualTLS": false,
       "MySQLOrchestratorHost": "192.168.10.40",
       "MySQLOrchestratorPort": 3306,
       "MySQLOrchestratorDatabase": "orchestrator",
       "MySQLOrchestratorUser": "orchestrator",
       "MySQLOrchestratorPassword": "123456",
       "MySQLOrchestratorCredentialsConfigFile": "",
       "MySQLOrchestratorSSLPrivateKeyFile": "",
       "MySQLOrchestratorSSLCertFile": "",
       "MySQLOrchestratorSSLCAFile": "",
       "MySQLOrchestratorSSLSkipVerify": true,
       "MySQLOrchestratorUseMutualTLS": false,
       "MySQLConnectTimeoutSeconds": 1,
       "DefaultInstancePort": 3306,
       "DiscoverByShowSlaveHosts": true,
       "InstancePollSeconds": 5,
       "DiscoveryIgnoreReplicaHostnameFilters": [
               "a_host_i_want_to_ignore[.]example[.]com",
               ".*[.]ignore_all_hosts_from_this_domain[.]example[.]com",
               "a_host_with_extra_port_i_want_to_ignore[.]example[.]com:3307"
      ],
       "UnseenInstanceForgetHours": 240,
       "SnapshotTopologiesIntervalHours": 0,
       "InstanceBulkOperationsWaitTimeoutSeconds": 10,
       "HostnameResolveMethod": "default",
       "MySQLHostnameResolveMethod": "@@hostname",
       "SkipBinlogServerUnresolveCheck": true,
       "ExpiryHostnameResolvesMinutes": 60,
       "RejectHostnameResolvePattern": "",
       "ReasonableReplicationLagSeconds": 10,
       "ProblemIgnoreHostnameFilters": [],
       "VerifyReplicationFilters": false,
       "ReasonableMaintenanceReplicationLagSeconds": 20,
       "CandidateInstanceExpireMinutes": 60,
       "AuditLogFile": "",
       "AuditToSyslog": false,
       "RemoveTextFromHostnameDisplay": ".mydomain.com:3306",
       "ReadOnly": false,
       "AuthenticationMethod": "",
       "HTTPAuthUser": "",
       "HTTPAuthPassword": "",
       "AuthUserHeader": "",
       "PowerAuthUsers": [
               "*"
      ],
       "ClusterNameToAlias": {
               "127.0.0.1": "test suite"
      },
       "ReplicationLagQuery": "",
       "DetectClusterAliasQuery": "SELECT SUBSTRING_INDEX(@@hostname, '.', 1)",
       "DetectClusterDomainQuery": "",
       "DetectInstanceAliasQuery": "",
       "DetectPromotionRuleQuery": "",
       "DataCenterPattern": "[.]([^.]+)[.][^.]+[.]mydomain[.]com",
       "PhysicalEnvironmentPattern": "[.]([^.]+[.][^.]+)[.]mydomain[.]com",
       "PromotionIgnoreHostnameFilters": [],
       "DetectSemiSyncEnforcedQuery": "",
       "ServeAgentsHttp": false,
       "AgentsServerPort": ":3001",
       "AgentsUseSSL": false,
       "AgentsUseMutualTLS": false,
       "AgentSSLSkipVerify": false,
       "AgentSSLPrivateKeyFile": "",
       "AgentSSLCertFile": "",
       "AgentSSLCAFile": "",
       "AgentSSLValidOUs": [],
       "UseSSL": false,
       "UseMutualTLS": false,
       "SSLSkipVerify": false,
       "SSLPrivateKeyFile": "",
       "SSLCertFile": "",
       "SSLCAFile": "",
       "SSLValidOUs": [],
       "URLPrefix": "",
       "StatusEndpoint": "/api/status",
       "StatusSimpleHealth": true,
       "StatusOUVerify": false,
       "AgentPollMinutes": 60,
       "UnseenAgentForgetHours": 6,
       "StaleSeedFailMinutes": 60,
       "SeedAcceptableBytesDiff": 8192,
       "PseudoGTIDPattern": "",
       "PseudoGTIDPatternIsFixedSubstring": false,
       "PseudoGTIDMonotonicHint": "asc:",
       "DetectPseudoGTIDQuery": "",
       "BinlogEventsChunkSize": 10000,
       "SkipBinlogEventsContaining": [],
       "ReduceReplicationAnalysisCount": true,
       "FailureDetectionPeriodBlockMinutes": 1,
       "FailMasterPromotionOnLagMinutes": 0,
       "RecoveryPeriodBlockSeconds": 30,
       "RecoveryIgnoreHostnameFilters": [],
       "RecoverMasterClusterFilters": [
               "*"
      ],
       "RecoverIntermediateMasterClusterFilters": [
               "*"
      ],
       "OnFailureDetectionProcesses": [
               "echo 'Detected {failureType} on {failureCluster}. Affected replicas: {countSlaves}' >> /tmp/recovery.log"
      ],
       "PreGracefulTakeoverProcesses": [
               "echo 'Planned takeover about to take place on {failureCluster}. Master will switch to read_only' >> /tmp/recovery.log"
      ],
       "PreFailoverProcesses": [
               "echo 'Will recover from {failureType} on {failureCluster}' >> /tmp/recovery.log"
      ],
       "PostFailoverProcesses": [
               "echo '(for all types) Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}{failureClusterAlias}' >> /tmp/recovery.log", "/home/orch/orch_hook.sh {failureType} {failureClusterAlias} {failedHost} {successorHost} >> /tmp/orch.log"
      ],
       "PostUnsuccessfulFailoverProcesses": [],
       "PostMasterFailoverProcesses": [
               "echo 'Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Promoted: {successorHost}:{successorPort}' >> /tmp/recovery.log"
      ],
       "PostIntermediateMasterFailoverProcesses": [
               "echo 'Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log"
      ],
       "PostGracefulTakeoverProcesses": [
               "echo 'Planned takeover complete' >> /tmp/recovery.log"
      ],
       "CoMasterRecoveryMustPromoteOtherCoMaster": true,
       "DetachLostSlavesAfterMasterFailover": true,
       "ApplyMySQLPromotionAfterMasterFailover": true,
       "PreventCrossDataCenterMasterFailover": false,
       "PreventCrossRegionMasterFailover": false,
       "MasterFailoverDetachReplicaMasterHost": false,
       "MasterFailoverLostInstancesDowntimeMinutes": 0,
       "PostponeReplicaRecoveryOnLagMinutes": 0,
       "OSCIgnoreHostnameFilters": [],
       "GraphiteAddr": "",
       "GraphitePath": "",
       "GraphiteConvertHostnameDotsToUnderscores": true,
       "ConsulAddress": "",
       "ConsulAclToken": "",
       "ConsulKVStoreProvider": "consul"
}

Configuration changes:

MySQLTopologyUser and MySQLTopologyPassword

are the MySQL account created on 192.168.10.10, 192.168.10.20, and 192.168.10.30:

"MySQLTopologyUser": "orchestrator",
      "MySQLTopologyPassword": "123456",
      "MySQLTopologyCredentialsConfigFile": "",
      "MySQLTopologySSLPrivateKeyFile": "",
      "MySQLTopologySSLCertFile": "",
      "MySQLTopologySSLCAFile": "",
      "MySQLTopologySSLSkipVerify": true,
      "MySQLTopologyUseMutualTLS": false,
      "MySQLOrchestratorHost": "192.168.10.40",
      "MySQLOrchestratorPort": 3306,
      "MySQLOrchestratorDatabase": "orchestrator",
      "MySQLOrchestratorUser": "orchestrator",
      "MySQLOrchestratorPassword": "123456",

The parameters above point at the database on 192.168.10.40 that stores Orchestrator's own state. Be sure to keep the two sets of credentials straight; they are easy to confuse.

RecoverMasterClusterFilters and RecoverIntermediateMasterClusterFilters

Set both of these to "*"; otherwise, after the first master/slave switchover, the next one will not succeed.

RecoveryPeriodBlockSeconds 3600 ## after a failover, the next one on the same cluster is blocked for 3600 s
FailureDetectionPeriodBlockMinutes 1 ## a failure detected again within this window is not reported a second time

In a test environment, lower both values, or repeated master/slave switchovers will run into problems.
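
jq was installed earlier, so the values that matter here can be checked straight from the JSON (a quick sketch):

cd /usr/local/orchestrator
jq '{RecoverMasterClusterFilters, RecoverIntermediateMasterClusterFilters, RecoveryPeriodBlockSeconds, FailureDetectionPeriodBlockMinutes}' orchestrator.conf.json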

4. Start orchestrator

From the /usr/local/orchestrator directory, run:

nohup ./orchestrator --debug http > orc.log 2>&1 &

Open the orchestrator web UI:

http://192.168.10.40:3000/

In the web UI, enter the master (swarm01, port 3306) and click Submit to discover the topology,

then click Dashboard.

# Deployment is complete
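
orchestrator also exposes an HTTP API, which is handy for scripting; a quick sketch (the /api/discover and /api/clusters endpoints are standard in orchestrator 3.x, and jq from step 1 pretty-prints the JSON):

# ask orchestrator to discover the master; it crawls the rest of the topology from there
curl -s http://192.168.10.40:3000/api/discover/swarm01/3306 | jq .
# list the clusters orchestrator currently knows about
curl -s http://192.168.10.40:3000/api/clusters | jq .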

V. Node recovery

1. Stop MySQL on the master node

[root@swarm01 opt]# systemctl stop mysqld

swarm02 has become the new master.

2. Rejoin the original master to the cluster

[root@swarm01 opt]# systemctl start mysqld
[root@swarm01 opt]# mysql -uroot -p123

###############################################
CHANGE MASTER TO MASTER_HOST='192.168.10.20' ,
MASTER_USER='xiaobai',
MASTER_PASSWORD='123',
MASTER_PORT=3306,
master_auto_position=1;
################################################
root@(none) 11:49 mysql>stop slave;
Query OK, 0 rows affected, 1 warning (0.00 sec)

root@(none) 11:49 mysql>CHANGE MASTER TO MASTER_HOST='192.168.10.20' ,
   -> MASTER_USER='xiaobai',
   -> MASTER_PASSWORD='123',
   -> MASTER_PORT=3306,
   -> master_auto_position=1;
Query OK, 0 rows affected, 2 warnings (0.00 sec)

root@(none) 11:49 mysql>start slave;
Query OK, 0 rows affected (0.00 sec)

VI. Master/slave switchover

1. How the VIP floats

Configure a hook in Orchestrator by adding it to the orchestrator.conf.json configuration file.

The logic: add a shell script to PostFailoverProcesses, and orchestrator will run it after a successful master/slave switchover. The other Pre*/Post* settings above hook scripts into orchestrator's other states; only the VIP hook is covered here.

"/home/orch/orch_hook.sh {failureType} {failureClusterAlias} {failedHost} {successorHost} >> /tmp/orch.log"

This entry already appears in the configuration file above and can be copied from there, so our shell script lives at /home/orch/orch_hook.sh:

#!/bin/bash

isitdead=$1
cluster=$2
oldmaster=$3
newmaster=$4
mysqluser="orchestrator"
export MYSQL_PWD="123456"

logfile="/var/log/orch_hook.log"

# list of clusternames
#clusternames=(rep blea lajos)

# clustername=( interface IP user Inter_IP)
#rep=( ens32 "192.168.56.121" root "192.168.56.125")

if [[ $isitdead == "DeadMaster" ]]; then

      array=( ens33 "192.168.10.200" root "192.168.10.40")
      interface=${array[0]}
      IP=${array[1]}
      user=${array[2]}

      if [ ! -z ${IP} ] ; then

              echo $(date)
              echo "Revocering from: $isitdead"
              echo "New master is: $newmaster"
              echo "/usr/local/orchestrator/orch_vip.sh -d 1 -n $newmaster -i ${interface} -I ${IP} -u ${user} -o $oldmaster" | tee $logfile
              /usr/local/orchestrator/orch_vip.sh -d 1 -n $newmaster -i ${interface} -I ${IP} -u ${user} -o $oldmaster
              #mysql -h$newmaster -u$mysqluser < /usr/local/bin/orch_event.sql
      else

              echo "Cluster does not exist!" | tee $logfile

      fi
elif [[ $isitdead == "DeadIntermediateMasterWithSingleSlaveFailingToConnect" ]]; then

      array=( ens33 "192.168.10.200" root "192.168.10.40")
      interface=${array[0]}
      IP=${array[1]}
      user=${array[2]}
      slavehost=`echo $5 | cut -d":" -f1`

      echo $(date)
      echo "Revocering from: $isitdead"
      echo "New intermediate master is: $slavehost"
      echo "/usr/local/orchestrator/orch_vip.sh -d 1 -n $slavehost -i ${interface} -I ${IP} -u ${user} -o $oldmaster" | tee $logfile
      /usr/local/orchestrator/orch_vip.sh -d 1 -n $slavehost -i ${interface} -I ${IP} -u ${user} -o $oldmaster


elif [[ $isitdead == "DeadIntermediateMaster" ]]; then

      array=( ens33 "192.168.10.200" root "192.168.10.40")
      interface=${array[0]}
      IP=${array[3]}
      user=${array[2]}
      slavehost=`echo $5 | sed -E "s/:[0-9]+//g" | sed -E "s/,/ /g"`
      showslave=`mysql -h$newmaster -u$mysqluser -sN -e "SHOW SLAVE HOSTS;" | awk '{print $2}'`
      newintermediatemaster=`echo $slavehost $showslave | tr ' ' '\n' | sort | uniq -d`

      echo $(date)
      echo "Revocering from: $isitdead"
      echo "New intermediate master is: $newintermediatemaster"
      echo "/usr/local/orchestrator/orch_vip.sh -d 1 -n $newintermediatemaster -i ${interface} -I ${IP} -u ${user} -o $oldmaster" | tee $logfile
      /usr/local/orchestrator/orch_vip.sh -d 1 -n $newintermediatemaster -i ${interface} -I ${IP} -u ${user} -o $oldmaster

fi

The line to adjust in the script is array=( ens33 "192.168.10.200" root "192.168.10.40" ):

ens33 is the network interface; use the one on your own machines.

192.168.10.200 is the VIP address.

192.168.10.40 is the IP where orchestrator is installed.

root is the Linux account with passwordless SSH access.
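
Once this hook and the orch_vip.sh script from the next subsection are both in place, the hook can be exercised by hand before trusting a real failover; the arguments mirror orchestrator's {failureType} {failureClusterAlias} {failedHost} {successorHost} placeholders. Note that this really moves the VIP to swarm02:

/home/orch/orch_hook.sh DeadMaster test-cluster swarm01 swarm02
tail /var/log/orch_hook.log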

2. The VIP failover script

/usr/local/orchestrator/orch_vip.sh, the script that actually moves the VIP:

#!/bin/bash

emailaddress="email@example.com"
sendmail=0

function usage {
cat << EOF
usage: $0 [-h] [-d master is dead] [-o old master ] [-s ssh options] [-n new master] [-i interface] [-I] [-u SSH user]

OPTIONS:
  -h       Show this message
  -o string Old master hostname or IP address
  -d int   If master is dead should be 1, otherwise 0
  -s string SSH options
  -n string New master hostname or IP address
  -i string Interface, for example eth0:1
  -I string Virtual IP
  -u string SSH user
EOF

}

while getopts ho:d:s:n:i:I:u: flag; do
case $flag in
  o)
    orig_master="$OPTARG";
    ;;
  d)
    isitdead="${OPTARG}";
    ;;
  s)
    ssh_options="${OPTARG}";
    ;;
  n)
    new_master="$OPTARG";
    ;;
  i)
    interface="$OPTARG";
    ;;
  I)
    vip="$OPTARG";
    ;;
  u)
    ssh_user="$OPTARG";
    ;;
  h)
    usage;
    exit 0;
    ;;
  *)
    usage;
    exit 1;
    ;;
esac
done


if [ $OPTIND -eq 1 ]; then
  echo "No options were passed";
  usage;
fi

shift $(( OPTIND - 1 ));

# discover commands from our path
ssh=$(which ssh)
arping=$(which arping)
ip2util=$(which ip)

# command for adding our vip
cmd_vip_add="sudo -n $ip2util address add ${vip} dev ${interface}"
# command for deleting our vip
cmd_vip_del="sudo -n $ip2util address del ${vip}/32 dev ${interface}"
# command for discovering if our vip is enabled
cmd_vip_chk="sudo -n $ip2util address show dev ${interface} to ${vip%/*}/32"
# command for sending gratuitous arp to announce ip move
cmd_arp_fix="sudo -n $arping -c 1 -I ${interface} ${vip%/*}   "
# command for sending gratuitous arp to announce ip move on current server
cmd_local_arp_fix="sudo -n $arping -c 1 -I ${interface} ${vip%/*}   "

vip_stop() {
  rc=0

  # ensure the vip is removed
  $ssh ${ssh_options} -tt ${ssh_user}@${orig_master} \
  "[ -n \"\$(${cmd_vip_chk})\" ] && ${cmd_vip_del} && sudo ${ip2util} route flush cache || [ -z \"\$(${cmd_vip_chk})\" ]"
  rc=$?
  return $rc
}

vip_start() {
  rc=0

  # ensure the vip is added
  # this command should exit with failure if we are unable to add the vip
  # if the vip already exists always exit 0 (whether or not we added it)
  $ssh ${ssh_options} -tt ${ssh_user}@${new_master} \
    "[ -z \"\$(${cmd_vip_chk})\" ] && ${cmd_vip_add} && ${cmd_arp_fix} || [ -n \"\$(${cmd_vip_chk})\" ]"
  rc=$?
  $cmd_local_arp_fix
  return $rc
}

vip_status() {
  $arping -c 1 -I ${interface} ${vip%/*}
  if ping -c 1 -W 1 "$vip"; then
      return 0
  else
      return 1
  fi
}

if [[ $isitdead == 0 ]]; then
  echo "Online failover"
  if vip_stop; then
      if vip_start; then
          echo "$vip is moved to $new_master."
          if [ $sendmail -eq 1 ]; then mail -s "$vip is moved to $new_master." "$emailaddress" < /dev/null &> /dev/null ; fi
      else
          echo "Can't add $vip on $new_master!"
          if [ $sendmail -eq 1 ]; then mail -s "Can't add $vip on $new_master!" "$emailaddress" < /dev/null &> /dev/null ; fi
          exit 1
      fi
  else
      echo $rc
      echo "Can't remove the $vip from orig_master!"
      if [ $sendmail -eq 1 ]; then mail -s "Can't remove the $vip from orig_master!" "$emailaddress" < /dev/null &> /dev/null ; fi
      exit 1
  fi


elif [[ $isitdead == 1 ]]; then
  echo "Master is dead, failover"
  # make sure the vip is not available
  if vip_status; then
      if vip_stop; then
          if [ $sendmail -eq 1 ]; then mail -s "$vip is removed from orig_master." "$emailaddress" < /dev/null &> /dev/null ; fi
      else
          if [ $sendmail -eq 1 ]; then mail -s "Couldn't remove $vip from orig_master." "$emailaddress" < /dev/null &> /dev/null ; fi
          exit 1
      fi
  fi

  if vip_start; then
        echo "$vip is moved to $new_master."
        if [ $sendmail -eq 1 ]; then mail -s "$vip is moved to $new_master." "$emailaddress" < /dev/null &> /dev/null ; fi

  else
        echo "Can't add $vip on $new_master!"
        if [ $sendmail -eq 1 ]; then mail -s "Can't add $vip on $new_master!" "$emailaddress" < /dev/null &> /dev/null ; fi
        exit 1
  fi
else
  echo "Wrong argument, the master is dead or live?"

fi

Make sure both files have the right ownership and are executable; for example:
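
chmod +x /home/orch/orch_hook.sh /usr/local/orchestrator/orch_vip.sh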

3. Add the VIP on the MySQL master node

[root@swarm01 ~]# ip addr add dev ens33 192.168.10.200/32
# to remove the VIP:
ip addr del 192.168.10.200/32 dev ens33
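
Verify the VIP is present and reachable before testing failover (a quick sketch):

ip addr show dev ens33 | grep 192.168.10.200
ping -c 1 192.168.10.200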

4. Test the VIP failover:

Stop MySQL on the master node and check whether the VIP 192.168.10.200 moves (it should float from the old master to the promoted node, where it can be checked with ip addr); the switchover can also be watched in orchestrator.

5. Optionally, deploy a WordPress blog to test

Create the wordpress database on the master node

root@(none) 16:46  mysql>create database wordpress;
Query OK, 1 row affected (0.00 sec)
#### allow the root user to log in remotely
root@(none) 16:46 mysql>GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'root' WITH GRANT OPTION;

Deploy WordPress:
[root@swarm01 opt]# docker run -d --name wordpress -p 80:80 wordpress

For the database host, enter the VIP address.

After the installation completes,

shut down the master node and check that the service keeps working and that the old master can rejoin as a slave (the same procedure as in node recovery above);

Rejoin a node as a slave of the current master (MASTER_HOST='the master node's IP'):

CHANGE MASTER TO MASTER_HOST='192.168.10.10' ,
MASTER_USER='xiaobai',
MASTER_PASSWORD='123',
MASTER_PORT=3306,
master_auto_position=1;
