Corosync/Pacemaker on CentOS 6

Install Pacemaker/Corosync

From my reading online, you can also use Heartbeat 3.x alongside Pacemaker to achieve similar results. I’ve decided to go with Corosync as it’s backed by Red Hat and SUSE and looks to have more active development. Not to mention that the Pacemaker project says you should now use Corosync :)

There are packages included in the CentOS 6.x base/updates repositories, so we can just use yum to install the needed packages.

yum install pacemaker corosync

Setup Corosync

Generate AuthKey

Corosync requires an authkey for communication within its cluster. This file must be copied to each of the nodes that you want to add to the cluster.

If the corosync executive logs an “Invalid digest” message, the keys are not consistent between nodes.

To generate the authkey, Corosync has a utility called corosync-keygen. Invoke this command as the root user to generate the authkey. The key will be written to /etc/corosync/authkey.

Grab a cup of coffee; this process takes a while to complete, as it pulls from the more secure /dev/random. You don’t have to press anything on the keyboard; it will still generate the authkey, though key presses add entropy and speed things up.

sudo corosync-keygen 
Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/random.
Press keys on your keyboard to generate entropy.
Press keys on your keyboard to generate entropy (bits = 128).
Press keys on your keyboard to generate entropy (bits = 192).
Press keys on your keyboard to generate entropy (bits = 256).
Press keys on your keyboard to generate entropy (bits = 328).
Press keys on your keyboard to generate entropy (bits = 392).
Press keys on your keyboard to generate entropy (bits = 456).
Press keys on your keyboard to generate entropy (bits = 520).
Press keys on your keyboard to generate entropy (bits = 592).
Press keys on your keyboard to generate entropy (bits = 656).
Press keys on your keyboard to generate entropy (bits = 720).
Press keys on your keyboard to generate entropy (bits = 784).
Press keys on your keyboard to generate entropy (bits = 848).
Press keys on your keyboard to generate entropy (bits = 912).
Press keys on your keyboard to generate entropy (bits = 976).
Writing corosync key to /etc/corosync/authkey.

Now you just need to copy this authkey to the other nodes in your cluster.

sudo scp /etc/corosync/authkey root@<node2>:/etc/corosync/
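
Since a mismatched key is what triggers the “Invalid digest” message, it can be worth confirming the copy before starting anything. Below is a minimal sketch, assuming sha256sum is available on both nodes. The helper names digest_only and same_digest are my own for illustration and are not part of Corosync.

```shell
#!/bin/sh
# Keep only the digest column of sha256sum output ("<digest>  <file>").
digest_only() {
    cut -d' ' -f1
}

# Succeed only when both digests are non-empty and identical.
same_digest() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Example usage on the first node (commented out so the sketch has no side effects):
# local_sum=$(sudo sha256sum /etc/corosync/authkey | digest_only)
# remote_sum=$(ssh root@<node2> sha256sum /etc/corosync/authkey | digest_only)
# same_digest "$local_sum" "$remote_sum" && echo "authkey matches"
```

Comparing digests rather than the files themselves means the key never has to leave either node a second time.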

Configure corosync.conf

All changes listed below will need to be performed on ALL nodes in the cluster.

The first thing we’ll need to do is copy the corosync.conf.example.udpu file to corosync.conf. I’ll be using the udpu (UDP unicast) configuration here as we’ll only have two nodes.

cp /etc/corosync/corosync.conf.example.udpu /etc/corosync/corosync.conf

Now we’ll edit this file to set the user corosync will run as. This is necessary so that corosync can manage the Pacemaker resources.

sudo vim /etc/corosync/corosync.conf

Add the following to the top of the corosync.conf file.

aisexec {
        # Run as root - this is necessary to be able to manage resources with Pacemaker
        user:        root
        group:       root
}

Edit the totem section to include the members in your cluster and set the bindnetaddr that corosync will listen on. You can leave the other settings default for now.

Add cluster members

interface {
                member {
                        memberaddr: 10.1.22.28
                }
                member {
                        memberaddr: 10.1.22.29
                }

Set bindnetaddr to the node’s own address; this will be unique per node in the cluster.

bindnetaddr: 10.1.22.28
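
To show how the member list and bindnetaddr sit together inside the totem section, here is a rough sketch that prints a complete section for one node. render_totem is a hypothetical helper of my own. The values for version, secauth, ringnumber, mcastport and ttl are what I recall from the shipped udpu example, so treat them as assumptions and keep whatever your copy of the file contains.

```shell
#!/bin/sh
# render_totem (hypothetical helper): print a totem section.
# $1 is this node's bindnetaddr; the remaining arguments are the cluster members.
render_totem() {
    bind_addr=$1
    shift
    printf 'totem {\n'
    printf '        version: 2\n'            # assumed from the example file
    printf '        secauth: off\n'          # assumed from the example file
    printf '        interface {\n'
    for addr in "$@"; do
        printf '                member {\n'
        printf '                        memberaddr: %s\n' "$addr"
        printf '                }\n'
    done
    printf '                ringnumber: 0\n'     # assumed from the example file
    printf '                bindnetaddr: %s\n' "$bind_addr"
    printf '                mcastport: 5405\n'   # assumed from the example file
    printf '                ttl: 1\n'            # assumed from the example file
    printf '        }\n'
    printf '        transport: udpu\n'
    printf '}\n'
}

# Example: the section for the first node in this post.
render_totem 10.1.22.28 10.1.22.28 10.1.22.29
```

On the second node only the first argument changes (10.1.22.29); the member list stays the same everywhere.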

Create pcmk service.d file

Now we’ll create a pacemaker service.d file to tell corosync to control/run the Pacemaker resources.

sudo vim /etc/corosync/service.d/pcmk

Add the following into the file you just created.

Changing ver: to 1 stops corosync from launching Pacemaker itself, so you can start the pacemaker service manually for troubleshooting.

service {
# Load the Pacemaker Cluster Resource Manager
name: pacemaker
ver: 0
}
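
The ver: setting changes which init scripts you have to start, which is easy to forget later. The sketch below reads the setting back out of a pcmk file and prints the matching start commands. pcmk_ver and start_commands are hypothetical helper names. I am assuming the stock /etc/init.d/pacemaker script from the CentOS 6 package is what you would use when ver: is 1.

```shell
#!/bin/sh
# Print the numeric value of the first "ver:" line in a pcmk service file.
pcmk_ver() {
    sed -n 's/^[[:space:]]*ver:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Print the init scripts that need starting for that file's ver: value.
# With ver: 0 corosync launches Pacemaker itself; with ver: 1 it does not.
start_commands() {
    echo "/etc/init.d/corosync start"
    if [ "$(pcmk_ver "$1")" = "1" ]; then
        echo "/etc/init.d/pacemaker start"
    fi
}

# Example usage (commented out; needs the real file):
# start_commands /etc/corosync/service.d/pcmk
```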

Start/Verify Corosync is correctly configured

Now let’s start corosync on the first node in the cluster.

sudo /etc/init.d/corosync start

Check to see if corosync is running as expected

sudo /etc/init.d/corosync status
corosync (pid  18376) is running...

or

sudo crm_mon
#
# Output from crm_mon 
============
Last updated: Wed May  2 07:51:20 2012
Last change: 
Current DC: NONE
0 Nodes configured, unknown expected votes
0 Resources configured.
============
Online: [ pg1.stage.net ]

With the first node up and running, you can now start the second node.
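
Once corosync is running on the second node, both hosts should appear in the Online list. Here is a small sketch for pulling that list out of crm_mon output so it can be checked from a script. online_nodes is a hypothetical helper name, and pg2.stage.net in the comment is a made-up name for the second node.

```shell
#!/bin/sh
# online_nodes (hypothetical helper): read crm_mon text on stdin and print
# each node named on the "Online: [ ... ]" line, one per line.
online_nodes() {
    # Fields look like: Online: [ node1 node2 ]
    # so the node names run from field 3 up to the field before the closing bracket.
    awk '/^Online:/ { for (i = 3; i < NF; i++) print $i }'
}

# Example usage (commented out; needs a running cluster). The -1 flag makes
# crm_mon print the status once and exit instead of refreshing:
# sudo crm_mon -1 | online_nodes
# which would be expected to list pg1.stage.net and pg2.stage.net
```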

Configure Active/Passive Cluster

The first step is to check the cluster configuration using crm_verify -L.

sudo crm_verify -L
crm_verify[19478]: 2012/05/02_07:52:37 ERROR: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
crm_verify[19478]: 2012/05/02_07:52:37 ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
crm_verify[19478]: 2012/05/02_07:52:37 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
  -V may provide more details

You’ll notice a few errors. This is because Pacemaker enables STONITH (Shoot The Other Node In The Head) by default and no STONITH resources have been defined yet. For now we can disable it for our basic configuration.

sudo crm configure property stonith-enabled=false

Running crm_verify -L again will now complete without any errors.

Adding ClusterIP Resource

The first thing we need to do for a cluster is add a resource like an IP address so we can always contact and communicate with the cluster regardless of where the cluster services are running. This must be a NEW address not associated with ANY node.

In the example below you’ll need to set ip and cidr_netmask to the address for your cluster. You can also set the monitor interval to a lower number if you want quicker failover. The example uses 30s; I have set mine to 1s so failover is almost instantaneous.

crm configure primitive ClusterIP ocf:heartbeat:IPaddr2 \
params ip=172.25.3.20 cidr_netmask=21 \
op monitor interval=30s
# Output:
crm_verify[19566]: 2012/05/02_08:04:21 WARN: cluster_status: We do not have quorum - fencing and resource management disabled
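
If you are more used to dotted netmasks than prefix lengths, the cidr_netmask value can be translated with a little shell arithmetic. This is only a sketch, and cidr_to_netmask is a hypothetical helper name of my own.

```shell
#!/bin/sh
# cidr_to_netmask (hypothetical helper): convert a prefix length (0-32)
# into a dotted-quad netmask.
cidr_to_netmask() {
    bits=$1
    mask=""
    for i in 1 2 3 4; do
        if [ "$bits" -ge 8 ]; then
            # A full octet of network bits.
            octet=255
            bits=$((bits - 8))
        else
            # A partial (or empty) octet: set the top $bits bits.
            octet=$((256 - (1 << (8 - bits))))
            bits=0
        fi
        mask="${mask}${mask:+.}${octet}"
    done
    echo "$mask"
}

cidr_to_netmask 21   # prints 255.255.248.0
```

So cidr_netmask=21 in the primitive above corresponds to a 255.255.248.0 netmask on the cluster address.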

View/verify that the ClusterIP has been added

sudo crm configure show
node pg1.stage.net
primitive ClusterIP ocf:heartbeat:IPaddr2 \
  params ip="172.25.3.20" cidr_netmask="21" \
  op monitor interval="30s"
property $id="cib-bootstrap-options" \
  dc-version="1.1.6-3.el6-a02c0f19a00c1eb2527ad38f146ebc0834814558" \
  cluster-infrastructure="openais" \
  expected-quorum-votes="2" \
  stonith-enabled="false"

Because we are setting up a two-node cluster, which cannot keep quorum once either node fails (quorum needs more than half of the votes), we need to tell Pacemaker to ignore the loss of quorum.
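
The arithmetic behind this is short enough to sketch. I am assuming the usual majority rule, where quorum needs strictly more than half of the expected votes. quorum_votes and tolerated_failures are hypothetical helper names.

```shell
#!/bin/sh
# Votes needed for quorum: strictly more than half of the expected votes.
quorum_votes() {
    echo $(( $1 / 2 + 1 ))
}

# How many nodes can fail before quorum is lost.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

quorum_votes 2         # prints 2
tolerated_failures 2   # prints 0
tolerated_failures 3   # prints 1
```

With two nodes both votes are needed, so a single failure leaves the survivor without quorum. That is exactly the situation an active/passive pair is meant to handle.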

sudo crm configure property no-quorum-policy=ignore

Now verify that no-quorum-policy is set to ignore.

sudo crm configure show
node pg1.stage.net
primitive ClusterIP ocf:heartbeat:IPaddr2 \
  params ip="172.25.3.20" cidr_netmask="21" \
  op monitor interval="30s"
property $id="cib-bootstrap-options" \
  dc-version="1.1.6-3.el6-a02c0f19a00c1eb2527ad38f146ebc0834814558" \
  cluster-infrastructure="openais" \
  expected-quorum-votes="2" \
  stonith-enabled="false" \
  no-quorum-policy="ignore"
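
Reading the whole configuration by eye is fine for two properties, but a small filter makes the check scriptable. This is a sketch that assumes the name="value" layout shown in the output above. cib_property is a hypothetical helper name.

```shell
#!/bin/sh
# cib_property (hypothetical helper): read "crm configure show" text on stdin
# and print the value of the property named in $1, if it is present.
cib_property() {
    sed -n "s/^[[:space:]]*$1=\"\([^\"]*\)\".*/\1/p"
}

# Example usage (commented out; needs a running cluster):
# sudo crm configure show | cib_property no-quorum-policy
# sudo crm configure show | cib_property stonith-enabled
```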

Resources

Bits & Bytes of Life

Clusters from Scratch
