Installing and Deploying orchestrator


I. Orchestrator in brief

1. What is orchestrator

Orchestrator is an open-source high-availability and topology-visualization management tool for MySQL replication, written in Go. It actively discovers the current topology and the state of master/slave replication, and it supports reshaping MySQL replication topologies, automatic failover of a failed master, and manual master/slave switchover.

Orchestrator stores its metadata in a MySQL or SQLite backend. It provides a web interface that displays the topology and instance status of a MySQL cluster and lets you change some instance settings through the browser, and it also offers a command-line tool and an HTTP API for more flexible automated operations. Failover of a MySQL master can be automatic or manual; manual switchover comes in several flavors: recover, force-master-failover, force-master-takeover, and graceful-master-takeover.

Compared with MHA, Orchestrator puts more emphasis on managing the replication topology: it can rearrange virtually any MySQL replication layout and builds high availability on top of that. Orchestrator itself can also be deployed as multiple nodes that use the raft distributed-consensus protocol for its own high availability.

2. Features of orchestrator

1. Detection and inspection of replication clusters: it actively discovers the current topology and the master/slave replication state, and reads basic MySQL information such as replication status and configuration.
2. Safe topology refactoring: it can move a server to replicate from another machine. orchestrator understands the replication rules and can work with binlog file/position, GTID, Pseudo-GTID, and binlog servers. It offers clean topology visualization, makes replication problems visible, and lets you change the topology with simple drag and drop.
3. Failure recovery: using the topology information it can identify many kinds of failures and, depending on configuration, perform automatic recovery.

3. Architecture diagram

II. Setting up Orchestrator

1. Prepare the environment

Host            Hostname   Role
192.168.10.10   swarm01    master MySQL (5.7.40)
192.168.10.20   swarm02    slave MySQL (5.7.40)
192.168.10.30   swarm03    slave MySQL (5.7.40)
192.168.10.40   swarm04    Orchestrator host

2. Configure /etc/hosts name resolution on every machine

192.168.10.10 swarm01
192.168.10.20 swarm02
192.168.10.30 swarm03
192.168.10.40 swarm04
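
These entries can be appended in one step on every machine (a minimal sketch):

cat >> /etc/hosts << 'EOF'
192.168.10.10 swarm01
192.168.10.20 swarm02
192.168.10.30 swarm03
192.168.10.40 swarm04
EOF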

3. Set up passwordless SSH login on every host (each host must be able to reach every other host without a password)

ssh-keygen -t rsa ## press Enter three times
ssh-copy-id -i ~/.ssh/id_rsa.pub swarm04
ssh-copy-id -i ~/.ssh/id_rsa.pub swarm01  ## passwordless login to 192.168.10.10
ssh-copy-id -i ~/.ssh/id_rsa.pub swarm02  ## passwordless login to 192.168.10.20
ssh-copy-id -i ~/.ssh/id_rsa.pub swarm03  ## passwordless login to 192.168.10.30
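
The same key distribution can be scripted; a minimal sketch using the hostnames above (each ssh-copy-id still prompts once for the target's password):

# run once on each host after ssh-keygen
for h in swarm01 swarm02 swarm03 swarm04; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub "$h"
done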

4. Disable the firewall and SELinux

systemctl stop firewalld
systemctl disable firewalld
# permanently disable SELinux
sed -i '/^SELINUX=/ s/enforcing/disabled/' /etc/selinux/config

III. Building the master/slave replication environment

1. Install MySQL

Install MySQL with the script below (all four servers need it):

#!/bin/bash
read -p "Confirm that /data/mysql/ is empty and no MySQL service is running on this server (yes|no): " result
case $result in
yes)
echo "continuing"
;;
no)
echo "exiting"
exit
;;
*)
echo "please answer yes or no"
exit
;;
esac

# install the software dependencies
yum install cmake ncurses-devel gcc gcc-c++ lsof bzip2 openssl-devel ncurses-compat-libs -y

# unpack the MySQL binary tarball
tar xf mysql-5.7.40-el7-x86_64.tar.gz

# move the unpacked tree to /usr/local and rename it to mysql
mv mysql-5.7.40-el7-x86_64 /usr/local/mysql

# create the mysql group and user
groupadd mysql

# the mysql user belongs to the mysql group and gets /bin/false as its shell
useradd -r -g mysql -s /bin/false mysql

# stop the firewalld service and disable it at boot
systemctl stop firewalld
systemctl disable firewalld

# disable SELinux for the current boot
setenforce 0

# permanently disable SELinux
sed -i '/^SELINUX=/ s/enforcing/disabled/' /etc/selinux/config

# create the data directory
mkdir -p /data/mysql
# give /data/mysql to the mysql user and group so mysqld can read and write it
chown mysql:mysql /data/mysql
# only the mysql user and group may access it; everyone else is locked out
chmod 750 /data/mysql

# enter /usr/local/mysql/bin
cd /usr/local/mysql/bin

# initialize mysql
./mysqld --initialize --user=mysql --basedir=/usr/local/mysql/ --datadir=/data/mysql &>passwd.txt

# generate the key material that lets mysql accept SSL logins
./mysql_ssl_rsa_setup --datadir=/data/mysql/

# extract the temporary password
tem_passwd=$(cat passwd.txt | grep "temporary" | awk '{print $NF}')
       # $NF is the last field
       # abc=$(command) runs the command first, then assigns its output to abc

# add the mysql bin directory to PATH
# for the current shell only
export PATH=/usr/local/mysql/bin/:$PATH
# and permanently, so it survives a reboot
echo 'PATH=/usr/local/mysql/bin:$PATH' >>/root/.bashrc

# install support-files/mysql.server as /etc/init.d/mysqld
cp ../support-files/mysql.server /etc/init.d/mysqld

# point the init script at our data directory
sed -i '70c datadir=/data/mysql' /etc/init.d/mysqld

# generate the /etc/my.cnf configuration file
cat >/etc/my.cnf <<EOF
[mysqld_safe]

[client]
socket=/data/mysql/mysql.sock

[mysqld]
socket=/data/mysql/mysql.sock
port=3306
open_files_limit=8192
innodb_buffer_pool_size=512M
character-set-server=utf8

[mysql]
auto-rehash
prompt=\\u@\\d \\R:\\m mysql>
EOF

# raise the kernel open-files limit
ulimit -n 1000000
# make the limit stick across reboots
echo "ulimit -n 1000000" >>/etc/rc.local
chmod +x /etc/rc.d/rc.local

# start the mysqld process
service mysqld start

# register mysqld with the system service manager
/sbin/chkconfig --add mysqld
# start the mysql service at boot
/sbin/chkconfig mysqld on

# the first password change must use the --connect-expired-password option
# -e passes a statement for mysql to execute
# set password='123'; sets the root password to 123
mysql -uroot -p$tem_passwd --connect-expired-password  -e "set password='123';"

# verify the password change: if this lists the databases, it worked
mysql -uroot -p'123' -e "show databases;"

############# the installation succeeded if the databases are listed ##################

2. Configure my.cnf on master, slave-1, and slave-2

Edit the my.cnf file on the master.

The semi-sync replication plugins need to be installed first (some builds already ship them):

Install the plugin on the master (rpl_semi_sync_master):
mysql> install plugin rpl_semi_sync_master soname 'semisync_master.so';
Query OK, 0 rows affected (0.08 sec)
# reset the flag for now; my.cnf will enable it
mysql> set global rpl_semi_sync_master_enabled = 0;
Query OK, 0 rows affected (0.00 sec)


Install the plugin on each slave (rpl_semi_sync_slave):
mysql> install plugin rpl_semi_sync_slave soname 'semisync_slave.so';
Query OK, 0 rows affected (0.02 sec)
# reset the flag for now; my.cnf will enable it
mysql> set global rpl_semi_sync_slave_enabled = 0;
Query OK, 0 rows affected (0.00 sec)


# Install both plugins on the master instance and on every slave instance,
# because after a master/slave switchover a slave becomes the master.
List the semi-sync plugins to confirm:
mysql> SELECT PLUGIN_NAME FROM INFORMATION_SCHEMA.PLUGINS WHERE PLUGIN_NAME LIKE 'rpl_semi_sync_%';
+----------------------+
| PLUGIN_NAME          |
+----------------------+
| rpl_semi_sync_master |
| rpl_semi_sync_slave  |
+----------------------+
2 rows in set (0.00 sec)
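Once replication is enabled and running (section 6 below), the semi-sync status counters show whether it is actually in effect; a quick sketch, run on the master:

mysql -uroot -p'123' -e "SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_master%';"
# Rpl_semi_sync_master_status: ON means the master is running in semi-sync mode
# Rpl_semi_sync_master_clients counts the connected semi-sync slaves
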
[root@master ~]# cat /etc/my.cnf
[mysqld_safe]

[client]
socket=/data/mysql/mysql.sock

[mysqld]
socket=/data/mysql/mysql.sock
port=3306
open_files_limit=8192
innodb_buffer_pool_size=512M
character-set-server=utf8

log_bin=1
server_id=1
# semi-sync; some builds lack the plugin: leave these lines commented out, install the plugin, then uncomment and restart the service
rpl_semi_sync_master_enabled=1
rpl_semi_sync_master_timeout=3000 # 3 seconds
expire_logs_days=7
# enable GTID
gtid-mode=on
enforce-gtid-consistency=on
relay_log_purge=0
log_slave_updates=1

[mysql]
auto-rehash
prompt=\u@\d \R:\m mysql>

Edit the my.cnf file on slave-1

[root@slave-1 ~]# cat /etc/my.cnf
[mysqld_safe]

[client]
socket=/data/mysql/mysql.sock

[mysqld]
socket=/data/mysql/mysql.sock
port=3306
open_files_limit=8192
innodb_buffer_pool_size=512M
character-set-server=utf8

# binlog
log_bin=1
server_id=2
expire-logs-days=7
# semi-sync; some builds lack the plugin: leave this line commented out, install the plugin, then uncomment and restart the service
rpl_semi_sync_slave_enabled=1
# enable GTID
gtid_mode=on
enforce-gtid-consistency=on
log_slave_updates=on
relay_log_purge=0

[mysql]
auto-rehash
prompt=\u@\d \R:\m mysql>

Edit the my.cnf file on slave-2

[root@slave-2 ~]# cat /etc/my.cnf
[mysqld_safe]

[client]
socket=/data/mysql/mysql.sock

[mysqld]
socket=/data/mysql/mysql.sock
port=3306
open_files_limit=8192
innodb_buffer_pool_size=512M
character-set-server=utf8

# log
log_bin=1
server_id=3
expire-logs-days=7
# enable GTID
gtid_mode=on
enforce-gtid-consistency=on
# semi-sync; some builds lack the plugin: leave this line commented out, install the plugin, then uncomment and restart the service
rpl_semi_sync_slave_enabled=1
relay_log_purge=0
log_slave_updates=1

[mysql]
auto-rehash
prompt=\u@\d \R:\m mysql>

# restart the service after changing the config

service mysqld restart

3. Create two symlinks on master, slave-1, and slave-2

[root@swarm01 opt]# ln -s /usr/local/mysql/bin/mysql /usr/sbin
[root@swarm01 opt]# ln -s /usr/local/mysql/bin/mysqlbinlog /usr/sbin
[root@swarm01 opt]# ll /usr/sbin/mysql
lrwxrwxrwx 1 root root 26 Feb 29 09:59 /usr/sbin/mysql -> /usr/local/mysql/bin/mysql
[root@swarm01 opt]# ll /usr/sbin/mysqlbinlog
lrwxrwxrwx 1 root root 32 Feb 29 10:00 /usr/sbin/mysqlbinlog -> /usr/local/mysql/bin/mysqlbinlog

4. Restart the MySQL service on all three machines and verify it came back up

# repeat on slave-1 and slave-2 exactly as on the master
[root@swarm01 opt]# systemctl restart mysqld
[root@swarm01 opt]# netstat -anplut |grep mysqld
tcp6       0      0 :::3306                 :::*                   LISTEN      9680/mysqld

5. Configure master/slave replication through GTID

Import the base data on all the slave machines so every server starts from the same dataset (this step can trigger create-database errors on the slaves and may be skipped):

# dump everything on the master
[root@swarm01 opt]# mysqldump --all-databases --set-gtid-purged=OFF -uroot -p123 >all_db.sql
mysqldump: [Warning] Using a password on the command line interface can be insecure.
# the warning is expected: passing the password on the command line is insecure
# copy the dump from the master into /data/mysql on slave-1 and slave-2
[root@swarm01 opt]# scp /opt/all_db.sql swarm02:/data/mysql/
all_db.sql                                    100% 1062KB  54.3MB/s   00:00
[root@swarm01 opt]# scp /opt/all_db.sql swarm03:/data/mysql/
all_db.sql      

# load the dump on each slave so its data matches the master
[root@swarm02 opt]# mysql -uroot -p123 </data/mysql/all_db.sql
mysql: [Warning] Using a password on the command line interface can be insecure.
[root@swarm03 opt]# mysql -uroot -p123 </data/mysql/all_db.sql
mysql: [Warning] Using a password on the command line interface can be insecure.

Create the replication user on the master

root@(none) 01:23  mysql>grant replication slave on *.* to 'xiaobai'@'192.168.10.%' identified by '123';
Query OK, 0 rows affected, 1 warning (0.02 sec)
# replication slave grants the account the privilege to act as a replication slave

6. Enable GTID auto-positioning and start replication (run on every slave server)

# first stop the slave threads on each slave server
stop slave;

# run this on every slave
CHANGE MASTER TO MASTER_HOST='192.168.10.10' ,
MASTER_USER='xiaobai',
MASTER_PASSWORD='123',
MASTER_PORT=3306,
master_auto_position=1;

# (optional) allow remote root logins
mysql> GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '123' WITH GRANT OPTION;
# start the slave threads again
start slave;

# check whether replication is working
root@(none) 10:20 mysql>show slave status\G
*************************** 1. row ***************************
              Slave_IO_State: Waiting for master to send event
                Master_Host: 192.168.10.10
                Master_User: xiaobai
                Master_Port: 3306
              Connect_Retry: 60
            Master_Log_File: 1.000001
        Read_Master_Log_Pos: 185808
              Relay_Log_File: swarm02-relay-bin.000003
              Relay_Log_Pos: 186005
      Relay_Master_Log_File: 1.000001
            Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
            Replicate_Do_DB:
        Replicate_Ignore_DB:
          Replicate_Do_Table:
      Replicate_Ignore_Table:
    Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
                  Last_Errno: 0
                  Last_Error:
                Skip_Counter: 0
        Exec_Master_Log_Pos: 185808
            Relay_Log_Space: 186391
            Until_Condition: None
              Until_Log_File:
              Until_Log_Pos: 0
          Master_SSL_Allowed: No
          Master_SSL_CA_File:
          Master_SSL_CA_Path:
            Master_SSL_Cert:
          Master_SSL_Cipher:
              Master_SSL_Key:
      Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
              Last_IO_Errno: 0
              Last_IO_Error:
              Last_SQL_Errno: 0
              Last_SQL_Error:
Replicate_Ignore_Server_Ids:
            Master_Server_Id: 1
                Master_UUID: a8c1930e-d69e-11ee-9fe3-000c29877eaa
            Master_Info_File: /data/mysql/master.info
                  SQL_Delay: 0
        SQL_Remaining_Delay: NULL
    Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
          Master_Retry_Count: 86400
                Master_Bind:
    Last_IO_Error_Timestamp:
    Last_SQL_Error_Timestamp:
              Master_SSL_Crl:
          Master_SSL_Crlpath:
          Retrieved_Gtid_Set: a8c1930e-d69e-11ee-9fe3-000c29877eaa:1-65
          Executed_Gtid_Set: a8c1930e-d69e-11ee-9fe3-000c29877eaa:1-65
              Auto_Position: 1
        Replicate_Rewrite_DB:
                Channel_Name:
          Master_TLS_Version:
1 row in set (0.00 sec)
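
For a scripted health check, the two replication threads and the lag can be pulled out directly (a quick sketch, run on each slave):

mysql -uroot -p'123' -e "show slave status\G" | egrep 'Slave_IO_Running:|Slave_SQL_Running:|Seconds_Behind_Master'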

As the last step, create the mha user on the master; replication propagates it to all the slave servers.

root@(none) 22:07  mysql> grant all privileges on *.* to mha@'192.168.10.%' identified by 'mha_chaoge';
Query OK, 0 rows affected (0.00 sec)

IV. Installing Orchestrator

1. Install the jq tool

# install the EPEL repository
[root@orche ~]# yum install epel-release -y

# install the jq tool
[root@orche ~]# yum install jq -y

2. Install orchestrator

Download from: https://github.com/openark/orchestrator/releases

# the client package is only needed when running an orchestrator cluster
[root@swarm04 ~]# rpm -ivh orchestrator-3.2.6-1.x86_64.rpm
# check the installed packages
[root@swarm04 ~]# rpm -qa | grep orchestrator
orchestrator-3.2.6-1.x86_64
orchestrator-client-3.2.6-1.x86_64

On the MySQL instance at 192.168.10.40, create an orchestrator account that Orchestrator will use to store its high-availability metadata, plus a database for it:

CREATE DATABASE IF NOT EXISTS orchestrator;
create user 'orchestrator'@'%' identified by '123456';
GRANT ALL PRIVILEGES ON `orchestrator`.* TO 'orchestrator'@'%';

On the MySQL instances at 192.168.10.10, 192.168.10.20, and 192.168.10.30, create an orchestrator account that Orchestrator uses to probe each instance: reading status, mapping the master/slave topology, and switching masters and slaves.

create user 'orchestrator'@'%' identified by '123456';
GRANT SUPER, PROCESS, REPLICATION SLAVE, RELOAD ON *.* TO 'orchestrator'@'%';
GRANT SELECT ON mysql.slave_master_info TO 'orchestrator'@'%';

#### A "user already exists" error here is expected: the slaves already received the user through replication; just run the GRANT statements. ####
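
orchestrator on swarm04 must be able to reach all three instances with this account; a quick connectivity sketch, run from swarm04:

for h in swarm01 swarm02 swarm03; do
    mysql -h"$h" -uorchestrator -p123456 -e "select @@hostname, @@server_id;"
done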

3. Configure the orchestrator configuration file

Run this on 192.168.10.40, the orchestrator server.

Create orchestrator.conf.json by copying the sample:

[root@swarm04 ~]# cd /usr/local/orchestrator/
[root@swarm04 orchestrator]# ls
orchestrator                    orchestrator-sample-sqlite.conf.json  orc_vip.sh
orchestrator.conf.json          orch_vip.sh                            resources
orchestrator-sample.conf.json   orc.log
[root@swarm04 orchestrator]# cp orchestrator-sample.conf.json orchestrator.conf.json

orchestrator.conf.json is a JSON file; the main parameters to configure are:

{
       "Debug": true,
       "EnableSyslog": false,
       "ListenAddress": ":3000",
       "MySQLTopologyUser": "orchestrator",
       "MySQLTopologyPassword": "123456",
       "MySQLTopologyCredentialsConfigFile": "",
       "MySQLTopologySSLPrivateKeyFile": "",
       "MySQLTopologySSLCertFile": "",
       "MySQLTopologySSLCAFile": "",
       "MySQLTopologySSLSkipVerify": true,
       "MySQLTopologyUseMutualTLS": false,
       "MySQLOrchestratorHost": "192.168.10.40",
       "MySQLOrchestratorPort": 3306,
       "MySQLOrchestratorDatabase": "orchestrator",
       "MySQLOrchestratorUser": "orchestrator",
       "MySQLOrchestratorPassword": "123456",
       "MySQLOrchestratorCredentialsConfigFile": "",
       "MySQLOrchestratorSSLPrivateKeyFile": "",
       "MySQLOrchestratorSSLCertFile": "",
       "MySQLOrchestratorSSLCAFile": "",
       "MySQLOrchestratorSSLSkipVerify": true,
       "MySQLOrchestratorUseMutualTLS": false,
       "MySQLConnectTimeoutSeconds": 1,
       "DefaultInstancePort": 3306,
       "DiscoverByShowSlaveHosts": true,
       "InstancePollSeconds": 5,
       "DiscoveryIgnoreReplicaHostnameFilters": [
               "a_host_i_want_to_ignore[.]example[.]com",
               ".*[.]ignore_all_hosts_from_this_domain[.]example[.]com",
               "a_host_with_extra_port_i_want_to_ignore[.]example[.]com:3307"
      ],
       "UnseenInstanceForgetHours": 240,
       "SnapshotTopologiesIntervalHours": 0,
       "InstanceBulkOperationsWaitTimeoutSeconds": 10,
       "HostnameResolveMethod": "default",
       "MySQLHostnameResolveMethod": "@@hostname",
       "SkipBinlogServerUnresolveCheck": true,
       "ExpiryHostnameResolvesMinutes": 60,
       "RejectHostnameResolvePattern": "",
       "ReasonableReplicationLagSeconds": 10,
       "ProblemIgnoreHostnameFilters": [],
       "VerifyReplicationFilters": false,
       "ReasonableMaintenanceReplicationLagSeconds": 20,
       "CandidateInstanceExpireMinutes": 60,
       "AuditLogFile": "",
       "AuditToSyslog": false,
       "RemoveTextFromHostnameDisplay": ".mydomain.com:3306",
       "ReadOnly": false,
       "AuthenticationMethod": "",
       "HTTPAuthUser": "",
       "HTTPAuthPassword": "",
       "AuthUserHeader": "",
       "PowerAuthUsers": [
               "*"
      ],
       "ClusterNameToAlias": {
               "127.0.0.1": "test suite"
      },
       "ReplicationLagQuery": "",
       "DetectClusterAliasQuery": "SELECT SUBSTRING_INDEX(@@hostname, '.', 1)",
       "DetectClusterDomainQuery": "",
       "DetectInstanceAliasQuery": "",
       "DetectPromotionRuleQuery": "",
       "DataCenterPattern": "[.]([^.]+)[.][^.]+[.]mydomain[.]com",
       "PhysicalEnvironmentPattern": "[.]([^.]+[.][^.]+)[.]mydomain[.]com",
       "PromotionIgnoreHostnameFilters": [],
       "DetectSemiSyncEnforcedQuery": "",
       "ServeAgentsHttp": false,
       "AgentsServerPort": ":3001",
       "AgentsUseSSL": false,
       "AgentsUseMutualTLS": false,
       "AgentSSLSkipVerify": false,
       "AgentSSLPrivateKeyFile": "",
       "AgentSSLCertFile": "",
       "AgentSSLCAFile": "",
       "AgentSSLValidOUs": [],
       "UseSSL": false,
       "UseMutualTLS": false,
       "SSLSkipVerify": false,
       "SSLPrivateKeyFile": "",
       "SSLCertFile": "",
       "SSLCAFile": "",
       "SSLValidOUs": [],
       "URLPrefix": "",
       "StatusEndpoint": "/api/status",
       "StatusSimpleHealth": true,
       "StatusOUVerify": false,
       "AgentPollMinutes": 60,
       "UnseenAgentForgetHours": 6,
       "StaleSeedFailMinutes": 60,
       "SeedAcceptableBytesDiff": 8192,
       "PseudoGTIDPattern": "",
       "PseudoGTIDPatternIsFixedSubstring": false,
       "PseudoGTIDMonotonicHint": "asc:",
       "DetectPseudoGTIDQuery": "",
       "BinlogEventsChunkSize": 10000,
       "SkipBinlogEventsContaining": [],
       "ReduceReplicationAnalysisCount": true,
       "FailureDetectionPeriodBlockMinutes": 1,
       "FailMasterPromotionOnLagMinutes": 0,
       "RecoveryPeriodBlockSeconds": 30,
       "RecoveryIgnoreHostnameFilters": [],
       "RecoverMasterClusterFilters": [
               "*"
      ],
       "RecoverIntermediateMasterClusterFilters": [
               "*"
      ],
       "OnFailureDetectionProcesses": [
               "echo 'Detected {failureType} on {failureCluster}. Affected replicas: {countSlaves}' >> /tmp/recovery.log"
      ],
       "PreGracefulTakeoverProcesses": [
               "echo 'Planned takeover about to take place on {failureCluster}. Master will switch to read_only' >> /tmp/recovery.log"
      ],
       "PreFailoverProcesses": [
               "echo 'Will recover from {failureType} on {failureCluster}' >> /tmp/recovery.log"
      ],
       "PostFailoverProcesses": [
               "echo '(for all types) Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}{failureClusterAlias}' >> /tmp/recovery.log", "/home/orch/orch_hook.sh {failureType} {failureClusterAlias} {failedHost} {successorHost} >> /tmp/orch.log"
      ],
       "PostUnsuccessfulFailoverProcesses": [],
       "PostMasterFailoverProcesses": [
               "echo 'Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Promoted: {successorHost}:{successorPort}' >> /tmp/recovery.log"
      ],
       "PostIntermediateMasterFailoverProcesses": [
               "echo 'Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log"
      ],
       "PostGracefulTakeoverProcesses": [
               "echo 'Planned takeover complete' >> /tmp/recovery.log"
      ],
       "CoMasterRecoveryMustPromoteOtherCoMaster": true,
       "DetachLostSlavesAfterMasterFailover": true,
       "ApplyMySQLPromotionAfterMasterFailover": true,
       "PreventCrossDataCenterMasterFailover": false,
       "PreventCrossRegionMasterFailover": false,
       "MasterFailoverDetachReplicaMasterHost": false,
       "MasterFailoverLostInstancesDowntimeMinutes": 0,
       "PostponeReplicaRecoveryOnLagMinutes": 0,
       "OSCIgnoreHostnameFilters": [],
       "GraphiteAddr": "",
       "GraphitePath": "",
       "GraphiteConvertHostnameDotsToUnderscores": true,
       "ConsulAddress": "",
       "ConsulAclToken": "",
       "ConsulKVStoreProvider": "consul"
}

Configuration changes:

MySQLTopologyUser and MySQLTopologyPassword

are the MySQL account created on 192.168.10.10, 192.168.10.20, and 192.168.10.30:

"MySQLTopologyUser": "orchestrator",
      "MySQLTopologyPassword": "123456",
      "MySQLTopologyCredentialsConfigFile": "",
      "MySQLTopologySSLPrivateKeyFile": "",
      "MySQLTopologySSLCertFile": "",
      "MySQLTopologySSLCAFile": "",
      "MySQLTopologySSLSkipVerify": true,
      "MySQLTopologyUseMutualTLS": false,
      "MySQLOrchestratorHost": "192.168.10.40",
      "MySQLOrchestratorPort": 3306,
      "MySQLOrchestratorDatabase": "orchestrator",
      "MySQLOrchestratorUser": "orchestrator",
      "MySQLOrchestratorPassword": "123456",

The parameters above point at the database on 192.168.10.40 that stores Orchestrator's own state. Be sure to keep the two sets of credentials straight; they are easy to confuse.

RecoverMasterClusterFilters and RecoverIntermediateMasterClusterFilters

Set both of these to "*"; otherwise, after the first master/slave switchover, the next one will not succeed.

RecoveryPeriodBlockSeconds 3600 ## after a failover, the next one on the same cluster is blocked for 3600 s
FailureDetectionPeriodBlockMinutes 1 ## a failure detected again within this window is not reported a second time

In a test environment, lower both values, or repeated master/slave switchovers will run into problems.
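
jq was installed earlier, so the values that matter here can be checked straight from the JSON (a quick sketch):

cd /usr/local/orchestrator
jq '{RecoverMasterClusterFilters, RecoverIntermediateMasterClusterFilters, RecoveryPeriodBlockSeconds, FailureDetectionPeriodBlockMinutes}' orchestrator.conf.json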

4. Start orchestrator

From the /usr/local/orchestrator directory, run:

nohup ./orchestrator --debug http > orc.log 2>&1 &

Open the orchestrator web UI:

http://192.168.10.40:3000/

In the web UI, enter the master (swarm01, port 3306) and click Submit to discover the topology,

then click Dashboard.

# Deployment is complete
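
orchestrator also exposes an HTTP API, which is handy for scripting; a quick sketch (the /api/discover and /api/clusters endpoints are standard in orchestrator 3.x, and jq from step 1 pretty-prints the JSON):

# ask orchestrator to discover the master; it crawls the rest of the topology from there
curl -s http://192.168.10.40:3000/api/discover/swarm01/3306 | jq .
# list the clusters orchestrator currently knows about
curl -s http://192.168.10.40:3000/api/clusters | jq .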

V. Node recovery

1. Stop MySQL on the master node

[root@swarm01 opt]# systemctl stop mysqld

swarm02 has become the new master.

2. Rejoin the original master to the cluster

[root@swarm01 opt]# systemctl start mysqld
[root@swarm01 opt]# mysql -uroot -p123

###############################################
CHANGE MASTER TO MASTER_HOST='192.168.10.20' ,
MASTER_USER='xiaobai',
MASTER_PASSWORD='123',
MASTER_PORT=3306,
master_auto_position=1;
################################################
root@(none) 11:49 mysql>stop slave;
Query OK, 0 rows affected, 1 warning (0.00 sec)

root@(none) 11:49 mysql>CHANGE MASTER TO MASTER_HOST='192.168.10.20' ,
   -> MASTER_USER='xiaobai',
   -> MASTER_PASSWORD='123',
   -> MASTER_PORT=3306,
   -> master_auto_position=1;
Query OK, 0 rows affected, 2 warnings (0.00 sec)

root@(none) 11:49 mysql>start slave;
Query OK, 0 rows affected (0.00 sec)

VI. Master/slave switchover

1. How the VIP floats

Configure a hook in Orchestrator by adding it to the orchestrator.conf.json configuration file.

The logic: add a shell script to PostFailoverProcesses, and orchestrator will run it after a successful master/slave switchover. The other Pre*/Post* settings above hook scripts into orchestrator's other states; only the VIP hook is covered here.

"/home/orch/orch_hook.sh {failureType} {failureClusterAlias} {failedHost} {successorHost} >> /tmp/orch.log"

This entry already appears in the configuration file above and can be copied from there, so our shell script lives at /home/orch/orch_hook.sh:

#!/bin/bash

isitdead=$1
cluster=$2
oldmaster=$3
newmaster=$4
mysqluser="orchestrator"
export MYSQL_PWD="123456"

logfile="/var/log/orch_hook.log"

# list of clusternames
#clusternames=(rep blea lajos)

# clustername=( interface IP user Inter_IP)
#rep=( ens32 "192.168.56.121" root "192.168.56.125")

if [[ $isitdead == "DeadMaster" ]]; then

      array=( ens33 "192.168.10.200" root "192.168.10.40")
      interface=${array[0]}
      IP=${array[1]}
      user=${array[2]}

      if [ ! -z ${IP} ] ; then

              echo $(date)
              echo "Revocering from: $isitdead"
              echo "New master is: $newmaster"
              echo "/usr/local/orchestrator/orch_vip.sh -d 1 -n $newmaster -i ${interface} -I ${IP} -u ${user} -o $oldmaster" | tee $logfile
              /usr/local/orchestrator/orch_vip.sh -d 1 -n $newmaster -i ${interface} -I ${IP} -u ${user} -o $oldmaster
              #mysql -h$newmaster -u$mysqluser < /usr/local/bin/orch_event.sql
      else

              echo "Cluster does not exist!" | tee $logfile

      fi
elif [[ $isitdead == "DeadIntermediateMasterWithSingleSlaveFailingToConnect" ]]; then

      array=( ens33 "192.168.10.200" root "192.168.10.40")
      interface=${array[0]}
      IP=${array[1]}
      user=${array[2]}
      slavehost=`echo $5 | cut -d":" -f1`

      echo $(date)
      echo "Revocering from: $isitdead"
      echo "New intermediate master is: $slavehost"
      echo "/usr/local/orchestrator/orch_vip.sh -d 1 -n $slavehost -i ${interface} -I ${IP} -u ${user} -o $oldmaster" | tee $logfile
      /usr/local/orchestrator/orch_vip.sh -d 1 -n $slavehost -i ${interface} -I ${IP} -u ${user} -o $oldmaster


elif [[ $isitdead == "DeadIntermediateMaster" ]]; then

      array=( ens33 "192.168.10.200" root "192.168.10.40")
      interface=${array[0]}
      IP=${array[3]}
      user=${array[2]}
      slavehost=`echo $5 | sed -E "s/:[0-9]+//g" | sed -E "s/,/ /g"`
      showslave=`mysql -h$newmaster -u$mysqluser -sN -e "SHOW SLAVE HOSTS;" | awk '{print $2}'`
      newintermediatemaster=`echo $slavehost $showslave | tr ' ' '\n' | sort | uniq -d`

      echo $(date)
      echo "Revocering from: $isitdead"
      echo "New intermediate master is: $newintermediatemaster"
      echo "/usr/local/orchestrator/orch_vip.sh -d 1 -n $newintermediatemaster -i ${interface} -I ${IP} -u ${user} -o $oldmaster" | tee $logfile
      /usr/local/orchestrator/orch_vip.sh -d 1 -n $newintermediatemaster -i ${interface} -I ${IP} -u ${user} -o $oldmaster

fi

The line to adjust in the script is array=( ens33 "192.168.10.200" root "192.168.10.40" ):

ens33 is the network interface; use the one on your own machines.

192.168.10.200 is the VIP address.

192.168.10.40 is the IP where orchestrator is installed.

root is the Linux account with passwordless SSH access.
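
Once this hook and the orch_vip.sh script from the next subsection are both in place, the hook can be exercised by hand before trusting a real failover; the arguments mirror orchestrator's {failureType} {failureClusterAlias} {failedHost} {successorHost} placeholders. Note that this really moves the VIP to swarm02:

/home/orch/orch_hook.sh DeadMaster test-cluster swarm01 swarm02
tail /var/log/orch_hook.log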

2. The VIP failover script

/usr/local/orchestrator/orch_vip.sh, the script that actually moves the VIP:

#!/bin/bash

emailaddress="email@example.com"
sendmail=0

function usage {
cat << EOF
usage: $0 [-h] [-d master is dead] [-o old master ] [-s ssh options] [-n new master] [-i interface] [-I] [-u SSH user]

OPTIONS:
  -h       Show this message
  -o string Old master hostname or IP address
  -d int   If master is dead should be 1, otherwise 0
  -s string SSH options
  -n string New master hostname or IP address
  -i string Interface, for example eth0:1
  -I string Virtual IP
  -u string SSH user
EOF

}

while getopts ho:d:s:n:i:I:u: flag; do
case $flag in
  o)
    orig_master="$OPTARG";
    ;;
  d)
    isitdead="${OPTARG}";
    ;;
  s)
    ssh_options="${OPTARG}";
    ;;
  n)
    new_master="$OPTARG";
    ;;
  i)
    interface="$OPTARG";
    ;;
  I)
    vip="$OPTARG";
    ;;
  u)
    ssh_user="$OPTARG";
    ;;
  h)
    usage;
    exit 0;
    ;;
  *)
    usage;
    exit 1;
    ;;
esac
done


if [ $OPTIND -eq 1 ]; then
  echo "No options were passed";
  usage;
fi

shift $(( OPTIND - 1 ));

# discover commands from our path
ssh=$(which ssh)
arping=$(which arping)
ip2util=$(which ip)

# command for adding our vip
cmd_vip_add="sudo -n $ip2util address add ${vip} dev ${interface}"
# command for deleting our vip
cmd_vip_del="sudo -n $ip2util address del ${vip}/32 dev ${interface}"
# command for discovering if our vip is enabled
cmd_vip_chk="sudo -n $ip2util address show dev ${interface} to ${vip%/*}/32"
# command for sending gratuitous arp to announce ip move
cmd_arp_fix="sudo -n $arping -c 1 -I ${interface} ${vip%/*}   "
# command for sending gratuitous arp to announce ip move on current server
cmd_local_arp_fix="sudo -n $arping -c 1 -I ${interface} ${vip%/*}   "

vip_stop() {
  rc=0

  # ensure the vip is removed
  $ssh ${ssh_options} -tt ${ssh_user}@${orig_master} \
  "[ -n \"\$(${cmd_vip_chk})\" ] && ${cmd_vip_del} && sudo ${ip2util} route flush cache || [ -z \"\$(${cmd_vip_chk})\" ]"
  rc=$?
  return $rc
}

vip_start() {
  rc=0

  # ensure the vip is added
  # this command should exit with failure if we are unable to add the vip
  # if the vip already exists always exit 0 (whether or not we added it)
  $ssh ${ssh_options} -tt ${ssh_user}@${new_master} \
    "[ -z \"\$(${cmd_vip_chk})\" ] && ${cmd_vip_add} && ${cmd_arp_fix} || [ -n \"\$(${cmd_vip_chk})\" ]"
  rc=$?
  $cmd_local_arp_fix
  return $rc
}

vip_status() {
  $arping -c 1 -I ${interface} ${vip%/*}
  if ping -c 1 -W 1 "$vip"; then
      return 0
  else
      return 1
  fi
}

if [[ $isitdead == 0 ]]; then
  echo "Online failover"
  if vip_stop; then
      if vip_start; then
          echo "$vip is moved to $new_master."
          if [ $sendmail -eq 1 ]; then mail -s "$vip is moved to $new_master." "$emailaddress" < /dev/null &> /dev/null ; fi
      else
          echo "Can't add $vip on $new_master!"
          if [ $sendmail -eq 1 ]; then mail -s "Can't add $vip on $new_master!" "$emailaddress" < /dev/null &> /dev/null ; fi
          exit 1
      fi
  else
      echo $rc
      echo "Can't remove the $vip from orig_master!"
      if [ $sendmail -eq 1 ]; then mail -s "Can't remove the $vip from orig_master!" "$emailaddress" < /dev/null &> /dev/null ; fi
      exit 1
  fi


elif [[ $isitdead == 1 ]]; then
  echo "Master is dead, failover"
  # make sure the vip is not available
  if vip_status; then
      if vip_stop; then
          if [ $sendmail -eq 1 ]; then mail -s "$vip is removed from orig_master." "$emailaddress" < /dev/null &> /dev/null ; fi
      else
          if [ $sendmail -eq 1 ]; then mail -s "Couldn't remove $vip from orig_master." "$emailaddress" < /dev/null &> /dev/null ; fi
          exit 1
      fi
  fi

  if vip_start; then
        echo "$vip is moved to $new_master."
        if [ $sendmail -eq 1 ]; then mail -s "$vip is moved to $new_master." "$emailaddress" < /dev/null &> /dev/null ; fi

  else
        echo "Can't add $vip on $new_master!"
        if [ $sendmail -eq 1 ]; then mail -s "Can't add $vip on $new_master!" "$emailaddress" < /dev/null &> /dev/null ; fi
        exit 1
  fi
else
  echo "Wrong argument, the master is dead or live?"

fi

Make sure both files have the right ownership and are executable; for example:
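
chmod +x /home/orch/orch_hook.sh /usr/local/orchestrator/orch_vip.sh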

3. Add the VIP on the MySQL master node

[root@swarm01 ~]# ip addr add dev ens33 192.168.10.200/32
# to remove the VIP:
ip addr del 192.168.10.200/32 dev ens33
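
Verify the VIP is present and reachable before testing failover (a quick sketch):

ip addr show dev ens33 | grep 192.168.10.200
ping -c 1 192.168.10.200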

4. Test the VIP failover:

Stop MySQL on the master node and check whether the VIP 192.168.10.200 moves (it should float from the old master to the promoted node, where it can be checked with ip addr); the switchover can also be watched in orchestrator.

5. Optionally, deploy a WordPress blog to test

Create the wordpress database on the master node

root@(none) 16:46  mysql>create database wordpress;
Query OK, 1 row affected (0.00 sec)
#### allow the root user to log in remotely
root@(none) 16:46 mysql>GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'root' WITH GRANT OPTION;

Deploy WordPress:
[root@swarm01 opt]# docker run -d --name wordpress -p 80:80 wordpress

For the database host, enter the VIP address.

After the installation completes,

shut down the master node and check that the service keeps working and that the old master can rejoin as a slave (the same procedure as in node recovery above);

Rejoin a node as a slave of the current master (MASTER_HOST='the master node's IP'):

CHANGE MASTER TO MASTER_HOST='192.168.10.10' ,
MASTER_USER='xiaobai',
MASTER_PASSWORD='123',
MASTER_PORT=3306,
master_auto_position=1;
