Heesung Yang

December 21, 2021

How to Build a CentOS 7 HA Cluster (Pacemaker, DRBD, docker-compose)

This post explains how to set up an application that runs under docker-compose as an Active/Standby HA cluster.

Environment Goals

OS : CentOS 7
Two servers in an Active/Standby configuration
- Pacemaker is used for this
The docker-compose setup follows a typical web application layout
- nginx container
- database container
- app container
The database container's data must be identical on both servers
- A replication solution provided by the DBMS could be used for this
- However, the configuration differs for each DBMS
- DRBD is used so that replication works regardless of the DBMS

DRBD Setup

Reference : https://linbit.com/drbd-user-guide/users-guide-drbd-8-4/

DRBD requires an unused partition. This post creates a new partition on /dev/sdb.

Installation

~$ sudo rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
~$ sudo yum install -y https://www.elrepo.org/elrepo-release-7.el7.elrepo.noarch.rpm
~$ sudo yum install -y kmod-drbd84 drbd84-utils

Create a Partition for DRBD

~$ sudo fdisk /dev/sdb <<EOF
n
p



w
EOF
~$ sudo fdisk -l
...

Disk /dev/sdb: 10.7 GB, 10737418240 bytes, 20971520 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x6ee68984

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1            2048    20971519    10484736   83  Linux              <<<<< Partition is created !!!

Global Configuration

/etc/drbd.d/global_common.conf

global {
        usage-count no;
        udev-always-use-vnr;
}
common {
        handlers {
        }
        startup {
        }
        options {
        }
        disk {
                disk-barrier yes;
                disk-flushes yes;
        }
        net {
                protocol C;
                sndbuf-size 512k;
        }
}

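global

usage-count
- Controls whether installation statistics are reported to the DRBD project
- The statistics can be viewed at http://usage.drbd.org/index.html
- The default value is ask
- If set to yes, statistics are sent automatically whenever DRBD is updated to a new version
udev-always-use-vnr
- vnr : volume number
- When udev asks drbdadm for the list of symbolic links for a device, the response depends on whether this option is present
- The default is off for backward compatibility, but enabling it is recommended
- With this option, device files are always created using the naming convention below
  - /dev/drbd/by-res/{resource name}/0
  - /dev/drbd/by-res/{resource name}/1
- Without this option, a single-volume resource is treated as one volume with vnr 0, and its device file is created without the volume number
  - /dev/drbd/by-res/{resource name}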
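common

disk
- disk-barrier
- disk-flushes
net
- protocol
  - A
    - Asynchronous replication
    - Mainly used for DR (Disaster Recovery) setups
      - In a DR setup the servers are usually far apart, so the network link is not fast
  - B
    - Semi-synchronous replication
    - A write is considered complete once the local disk write has finished and the replication packet has reached the peer server (the peer's disk write is not confirmed)
    - Normally no data is lost on a forced failover, but the most recent writes may be lost if both servers lose power at the same time
  - C
    - Synchronous replication
    - Typically used for servers in the same data center (very short distance); most DRBD setups use this protocol
    - A write is considered complete only after both the local disk and the peer server's disk have finished writing
- sndbuf-size
  - When the nodes are physically very close on the network, increasing the network buffer size can improve throughput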
Creating the Resource

Configuration File

Create a file named ${RESOURCE_NAME}.res under /etc/drbd.d/ (e.g. /etc/drbd.d/mydisk.res).

resource mydisk {                  # create a resource named mydisk
    device /dev/drbd0;             # DRBD device file
    disk   /dev/sdb1;              # partition used for DRBD
    meta-disk internal;

    on node-01 {                    # hostname of the first node (check with the hostname command)
        address  10.120.1.117:7789; # IP:port of the first node
    }
    on node-02 {                    # hostname of the second node (check with the hostname command)
        address  10.120.1.118:7789; # IP:port of the second node
    }
}

The same settings can also be written per node, as shown below.

resource mydisk {

    on node-01 {
        device /dev/drbd0;
        disk   /dev/sdb1;
        meta-disk internal;
        address  10.120.1.117:7789;
    }
    on node-02 {
        device /dev/drbd0;
        disk   /dev/sdb1;
        meta-disk internal;
        address  10.120.1.118:7789;
    }
}

Creation

# drbdadm create-md ${RESOURCE_NAME}
~$ sudo drbdadm create-md mydisk
initializing activity log
initializing bitmap (640 KB) to all zero
Writing meta data...
New drbd meta data block successfully created.

# drbdadm up ${RESOURCE_NAME}
~$ sudo drbdadm up mydisk

Check Status

~$ sudo lsblk
NAME      MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda         8:0    0   50G  0 disk
├─sda1      8:1    0    1G  0 part /boot/efi
├─sda2      8:2    0    1G  0 part /boot
└─sda3      8:3    0   48G  0 part /
sdb         8:16   0   20G  0 disk
└─sdb1      8:17   0   20G  0 part
  └─drbd0 147:0    0   20G  1 disk              # check this !
sr0        11:0    1  1.3G  0 rom

~$ drbdadm status mydisk
mydisk role:Secondary
  disk:Inconsistent
  peer role:Secondary
    replication:Established peer-disk:Inconsistent

Initial Synchronization

Run on node 1 only.

~$ sudo drbdadm primary mydisk --force
~$ sudo drbdadm status mydisk
mydisk role:Primary
  disk:UpToDate
  peer role:Secondary
    replication:SyncSource peer-disk:Inconsistent done:0.25
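
The initial synchronization can take a while on a large partition. The following is a minimal sketch, assuming the `drbdadm status` layout shown above, that classifies a status dump read from standard input. The function name `drbd_sync_state` and its three output words are hypothetical and are not part of DRBD.

```shell
#!/bin/sh
# Hypothetical helper (not part of DRBD): classify `drbdadm status` text on stdin.
# Assumes the layout shown above, with "disk:" and "peer-disk:" fields.
drbd_sync_state() {
    status=$(cat)
    if printf '%s\n' "$status" | grep -Eq '^[[:space:]]*disk:UpToDate' &&
       printf '%s\n' "$status" | grep -q 'peer-disk:UpToDate'; then
        # Both the local disk and the peer disk report UpToDate.
        echo synced
    elif printf '%s\n' "$status" | grep -Eq 'replication:Sync(Source|Target)'; then
        # A resync is still running in either direction.
        echo syncing
    else
        echo other
    fi
}
```

One possible use is to poll `sudo drbdadm status mydisk | drbd_sync_state` until it prints `synced` before starting the replication test.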

Replication Test

On node 1, mount the drbd0 device, create a file, and then unmount it again.

[hsyang@node-01] ~$ sudo mkfs.xfs /dev/drbd0
[hsyang@node-01] ~$ sudo mkdir /data
[hsyang@node-01] ~$ sudo mount /dev/drbd0 /data/
[hsyang@node-01] ~$ sudo touch /data/testfile
[hsyang@node-01] ~$ sudo umount /data
[hsyang@node-01] ~$ sudo rmdir /data

Demote node 1

[hsyang@node-01] ~$ sudo drbdadm secondary mydisk

[hsyang@node-01] ~$ drbdadm status
mydisk role:Secondary
  disk:UpToDate
  peer role:Primary
    replication:Established peer-disk:UpToDate

Promote node 2

[hsyang@node-02] ~$ sudo drbdadm primary mydisk
[hsyang@node-02] ~$ sudo mkdir /data
[hsyang@node-02] ~$ sudo mount /dev/drbd0 /data

# Check that the test file created on node 1 exists
[hsyang@node-02] ~$ ls /data
testfile

# Restore the original state after the test (unmount before demoting)
[hsyang@node-02] ~$ sudo rm /data/testfile
[hsyang@node-02] ~$ sudo umount /data
[hsyang@node-02] ~$ sudo drbdadm secondary mydisk

Pacemaker

Installing Pacemaker

Install the required packages on each server.

~$ sudo yum install -y pcs pacemaker fence-agents-all

If firewalld is running, add the high-availability service.

~$ sudo firewall-cmd --permanent --add-service=high-availability
~$ sudo firewall-cmd --add-service=high-availability

# or disable firewalld
~$ sudo systemctl disable --now firewalld

Set a password for the hacluster account that was created during package installation.
```
~$ sudo passwd hacluster
```

Start the pcsd service.

~$ sudo systemctl enable --now pcsd.service

Register the server IPs and names in /etc/hosts.

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.120.1.117 node-01
10.120.1.118 node-02
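
Because /etc/hosts has to be edited on both servers, an idempotent helper can be convenient. The sketch below is one possible approach and is not taken from the original setup: the function name `add_host_entry` is hypothetical, and it appends an `IP NAME` line only when NAME is not already mapped in the given file.

```shell
#!/bin/sh
# Hypothetical helper: append "IP NAME" to a hosts file unless NAME is already mapped.
# Usage: add_host_entry FILE IP NAME
add_host_entry() {
    file=$1
    ip=$2
    name=$3
    # Look for NAME as a whole field (column 2 or later) on a non-comment line.
    if awk -v n="$name" '
        !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$file" 2>/dev/null; then
        return 0    # already present, nothing to do
    fi
    printf '%s %s\n' "$ip" "$name" >> "$file"
}
```

Run as root, `add_host_entry /etc/hosts 10.120.1.117 node-01` followed by the same call for node-02 would produce the two entries shown above, and running it again changes nothing.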

Authenticate the nodes (run on one host only)

~$ sudo pcs cluster auth node-01 node-02
Username: hacluster
Password:
node02: Authorized
node01: Authorized

Create the cluster (run on one host only). For node-01 and node-02, use the names registered in /etc/hosts.

~$ CLUSTER_NAME=mycluster
~$ sudo pcs cluster setup --start --name $CLUSTER_NAME node-01 node-02

Destroying cluster on nodes: node01, node02...
node01: Stopping Cluster (pacemaker)...
node02: Stopping Cluster (pacemaker)...
node02: Successfully destroyed cluster
node01: Successfully destroyed cluster

Sending 'pacemaker_remote authkey' to 'node01', 'node02'
node01: successful distribution of the file 'pacemaker_remote authkey'
node02: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
node01: Succeeded
node02: Succeeded

Starting cluster on nodes: node01, node02...
node01: Starting Cluster (corosync)...
node02: Starting Cluster (corosync)...
node01: Starting Cluster (pacemaker)...
node02: Starting Cluster (pacemaker)...

Synchronizing pcsd certificates on nodes node01, node02...
node02: Success
node01: Success
Restarting pcsd on the nodes in order to reload the certificates...
node02: Success
node01: Success

To have a node rejoin the cluster automatically after a reboot, run the command below (run on one host only).
```
~$ sudo pcs cluster enable --all
node-01: Cluster Enabled
node-02: Cluster Enabled
```
- To start the cluster manually instead, run the command below.
```
~$ sudo pcs cluster start
```

Check the current cluster status.

~$ sudo pcs cluster status

Cluster Status:
 Stack: corosync
 Current DC: node01 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
 Last updated: Wed Aug 25 13:58:41 2021
 Last change: Wed Aug 25 13:42:20 2021 by hacluster via crmd on node01
 2 nodes configured
 0 resource instances configured

PCSD Status:
  node02: Online
  node01: Online

Fencing Configuration (Disable)

Check the current state

~$ sudo crm_verify -L -V

error: unpack_resources:    Resource start-up disabled since no STONITH resources have been defined
   error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
   error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid

Disable STONITH (run on one node only)

~$ sudo pcs property set stonith-enabled=false

# After disabling STONITH, crm_verify no longer reports any errors
~$ sudo crm_verify -L -V

Adding the Pacemaker docker-compose Resource Agent

The docker-compose resource agent is not installed by default, so it has to be installed manually.

Add the docker-compose resource agent

~$ wget https://raw.githubusercontent.com/ClusterLabs/resource-agents/master/heartbeat/docker-compose
~$ chmod +x docker-compose
~$ sudo chown root:root docker-compose
~$ sudo mv docker-compose /usr/lib/ocf/resource.d/heartbeat/

Verify the added resource agent

~$ sudo pcs resource list docker-compose

ocf:heartbeat:docker-compose - This script manages docker services using docker-compose.

Adding the Virtual IP Resource

Resource 이름 : VIP
IP address : 10.120.1.119
Netmask : 23
Monitoring period : 5 seconds

NIC : ens192

~$ RESOURCE_NAME=VIP
~$ sudo pcs resource create $RESOURCE_NAME ocf:heartbeat:IPaddr2 ip=10.120.1.119 cidr_netmask=23 nic=ens192 op monitor interval=5s

# Check that the IP address was added
~$ ip addr

...
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:0c:29:e5:32:bd brd ff:ff:ff:ff:ff:ff
    inet 10.120.1.117/23 brd 10.120.1.255 scope global noprefixroute ens192
       valid_lft forever preferred_lft forever
>>> inet 10.120.1.119/23 brd 10.120.1.255 scope global secondary ens192
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fee5:32bd/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
...
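
To check from a script which node currently holds the virtual IP, the `ip addr` output can be searched for the address. This is a minimal sketch under the assumption that the output looks like the listing above; the function name `has_inet_addr` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the `ip addr` text on stdin lists ADDR.
# Usage: ip addr show dev ens192 | has_inet_addr 10.120.1.119
has_inet_addr() {
    # -F treats the pattern as a fixed string, so the dots in ADDR are literal.
    # The trailing "/" prevents 10.120.1.11 from matching 10.120.1.117.
    grep -qF "inet $1/"
}
```

On the active node the pipeline in the usage comment should exit with status 0, and on the standby node with a non-zero status.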

Adding the DRBD Resource

~$ sudo pcs cluster cib drbd_cfg

# drbd_resource must match the resource name defined in /etc/drbd.d/mydisk.res
~$ RESOURCE_NAME=my_data
~$ sudo pcs -f drbd_cfg resource create $RESOURCE_NAME ocf:linbit:drbd drbd_resource=mydisk op monitor interval=5s

~$ MASTER_SLAVE_NAME=MyData
~$ sudo pcs -f drbd_cfg resource master $MASTER_SLAVE_NAME $RESOURCE_NAME master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

# verify resources
~$ sudo pcs -f drbd_cfg resource show
Resource Group: mycluster
    vip    (ocf::heartbeat:IPaddr2):    Started node01
Master/Slave Set: MyData [my_data]
    Stopped: [ node01 node02 ]

~$ sudo pcs cluster cib-push drbd_cfg
CIB updated

Verify the added resources

~$ sudo pcs status
Cluster name: mycluster
Stack: corosync
Current DC: node02 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Mon Dec 13 10:59:31 2021
Last change: Mon Dec 13 10:59:28 2021 by root via cibadmin on node01

2 nodes configured
3 resource instances configured

Online: [ node01 node02 ]

Full list of resources:

Resource Group: mycluster
    vip    (ocf::heartbeat:IPaddr2):    Started node01
Master/Slave Set: MyData [my_data]
    Masters: [ node02 ]
    Slaves: [ node01 ]

Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled

Adding the Filesystem Resource

~$ RESOURCE_NAME=fs
~$ sudo pcs cluster cib fs_cfg
~$ sudo pcs -f fs_cfg resource create $RESOURCE_NAME Filesystem device=/dev/drbd0 directory=/home/hsyang/data fstype=xfs
Assumed agent name 'ocf:heartbeat:Filesystem' (deduced from 'Filesystem')

~$ sudo pcs -f fs_cfg constraint colocation add $RESOURCE_NAME with $MASTER_SLAVE_NAME INFINITY with-rsc-role=Master
~$ sudo pcs -f fs_cfg constraint order promote $MASTER_SLAVE_NAME then start $RESOURCE_NAME
Adding MyData fs (kind: Mandatory) (Options: first-action=promote then-action=start)

Verify the added resources

~$ sudo pcs status
Cluster name: mycluster
Stack: corosync
Current DC: node02 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Mon Dec 13 11:28:17 2021
Last change: Mon Dec 13 10:59:28 2021 by root via cibadmin on node01

2 nodes configured
3 resource instances configured

Online: [ node01 node02 ]

Full list of resources:

Resource Group: mycluster
    vip    (ocf::heartbeat:IPaddr2):    Started node01
Master/Slave Set: MyData [my_data]
    Masters: [ node02 ]
    Slaves: [ node01 ]

Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled

~$ sudo pcs resource show
Resource Group: mycluster
    vip    (ocf::heartbeat:IPaddr2):    Started node01
Master/Slave Set: MyData [my_data]
    Masters: [ node02 ]
    Slaves: [ node01 ]

Setting Resource Start Order and Constraints

~$ VIP_RESOURCE_NAME=VIP
~$ FS_RESOURCE_NAME=fs
~$ sudo pcs -f fs_cfg constraint colocation add $VIP_RESOURCE_NAME with $FS_RESOURCE_NAME INFINITY
~$ sudo pcs -f fs_cfg constraint order $FS_RESOURCE_NAME then $VIP_RESOURCE_NAME
Adding fs vip (kind: Mandatory) (Options: first-action=start then-action=start)

~$ sudo pcs -f fs_cfg constraint
Location Constraints:
Ordering Constraints:
promote MyData then start fs (kind:Mandatory)
start fs then start vip (kind:Mandatory)
Colocation Constraints:
fs with MyData (score:INFINITY) (with-rsc-role:Master)
vip with fs (score:INFINITY)
Ticket Constraints:

~$ sudo pcs -f fs_cfg resource show
Resource Group: mycluster
    vip (ocf::heartbeat:IPaddr2):    Started node01
Master/Slave Set: MyData [my_data]
    Masters: [ node02 ]
    Slaves: [ node01 ]
fs  (ocf::heartbeat:Filesystem):    Stopped

# commit
~$ sudo pcs cluster cib-push fs_cfg
CIB updated

Verify the Resources

~$ pcs resource show --full

Group: mycluster
Resource: vip (class=ocf provider=heartbeat type=IPaddr2)
Attributes: cidr_netmask=23 ip=10.120.1.119 nic=ens192
Operations: monitor interval=30s (vip-monitor-interval-30s)
            start interval=0s timeout=20s (vip-start-interval-0s)
            stop interval=0s timeout=20s (vip-stop-interval-0s)
Master: MyData
Meta Attrs: clone-max=2 clone-node-max=1 master-max=1 master-node-max=1 notify=true
Resource: my_data (class=ocf provider=linbit type=drbd)
Attributes: drbd_resource=mycluster
Operations: demote interval=0s timeout=90 (my_data-demote-interval-0s)
            monitor interval=5s (my_data-monitor-interval-5s)
            notify interval=0s timeout=90 (my_data-notify-interval-0s)
            promote interval=0s timeout=90 (my_data-promote-interval-0s)
            reload interval=0s timeout=30 (my_data-reload-interval-0s)
            start interval=0s timeout=240 (my_data-start-interval-0s)
            stop interval=0s timeout=100 (my_data-stop-interval-0s)
Resource: fs (class=ocf provider=heartbeat type=Filesystem)
Attributes: device=/dev/drbd0 directory=/data fstype=xfs
Operations: monitor interval=20s timeout=40s (fs-monitor-interval-20s)
            notify interval=0s timeout=60s (fs-notify-interval-0s)
            start interval=0s timeout=60s (fs-start-interval-0s)
            stop interval=0s timeout=60s (fs-stop-interval-0s)

Adding the docker-compose Resource

~$ RESOURCE_NAME=myapp
~$ VIP_RESOURCE_NAME=VIP
~$ pcs cluster cib app_cfg
~$ pcs -f app_cfg resource create $RESOURCE_NAME ocf:heartbeat:docker-compose dirpath=/home/hsyang/app
~$ pcs -f app_cfg constraint colocation add $RESOURCE_NAME with $VIP_RESOURCE_NAME INFINITY
~$ pcs -f app_cfg constraint order $VIP_RESOURCE_NAME then $RESOURCE_NAME
Adding vip myapp (kind: Mandatory) (Options: first-action=start then-action=start)

# commit
~$ pcs cluster cib-push app_cfg
