Discussion:
DRBD (XFS) + Pacemaker + Corosync with 2 nodes and an arbiter (virtual node) to avoid split brain: are Stonith and Quorum needed?
aTTi
2014-09-17 20:23:29 UTC
Permalink
Hi!

I want to build a DRBD cluster, with no primary/primary setup, just a normal
primary node and a passive secondary node, as standard.
I will use the XFS file system on CentOS 7 with 2 NICs: eth0 to the switch
and the internet, eth1 as a direct (crossover) cable to the other server.
I will define 2 rings: ring 0 on the eth1 crosslink connection, ring 1 on the
eth0 connection. I expect ring 1 will normally never be used, only ring 0.
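As a rough sketch, my corosync.conf nodelist would look something like this
(addresses and node names are only examples, not final):

    totem {
        version: 2
        cluster_name: drbd-cluster
        transport: udpu
        rrp_mode: passive
    }

    nodelist {
        node {
            nodeid: 1
            ring0_addr: 10.0.0.1        # server1, eth1 crosslink
            ring1_addr: 192.168.1.11    # server1, eth0 via the switch
        }
        node {
            nodeid: 2
            ring0_addr: 10.0.0.2        # server2, eth1 crosslink
            ring1_addr: 192.168.1.12    # server2, eth0 via the switch
        }
    }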

If I had just these 2 servers with the extra direct cable connection, split
brain could still happen (for example, someone unplugs all the network cables
from server 1 or from server 2).

I have more servers available. I want to find the best solution, because I
don't want split brain and lost data. I also don't want to fix problems
manually if there is a solution that makes split brain, or any other
DRBD-killing disaster, (nearly) impossible...

My 3rd server has nearly the same hardware as server 1 and server 2, with a
low load. If needed, I also have a 4th server, with a slightly higher load
and the same hardware as server 3. I just want a safe, clear and simple DRBD
solution.

My 4 ideas / plans:

1. Use the 3rd server as a virtual node (arbiter). Question: how? Does using
server 3 help avoid split brain situations?

2. Use the 3rd server as a backup server with iSCSI (the 3rd server is also
on the same gigabit switch). Is that a good idea? Can this solution help
avoid split brain?

3. Use server 3 as a stacked node with DRBD. I found this:

http://www.drbd.org/users-guide-8.3/s-pacemaker-floating-peers.html

Does it help avoid split brain?

4. Add the 3rd server to DRBD as a normal passive node. This would be a
problem for me because there is no direct connection between all 3 servers. I
don't want to go this way.


Which is best? I don't want to invent any new solution; I just want a safe,
working DRBD setup. If the 3rd server is not needed for that, I will not use
it and will stay with 2 nodes. But I think the 3rd server could make this
DRBD cluster safer. Please help me set that up.

My plan:

use server 3 to make the DRBD pair (server 1 and server 2) safer, and to take
backups of those 2 servers.
I just want the data on server 1 + server 2 to be really safe.

For 2 nodes, STONITH is recommended, and I also want to use it.
I understand the STONITH mechanism: it kills a server if needed. That's
good... if it's really needed.

I will use Pacemaker + Corosync to manage DRBD and services.

Is quorum needed if I have a +1 node for DRBD?

I have used DRBD a few times, but I have never installed it on servers myself.

Please describe the best way to run a normal active/passive DRBD as safely as
possible, using server 3 for that.
I want the simplest and safest solution that works for most scenarios. I just
want to be a happy DRBD user.

Thank you,
aTTi
Digimer
2014-09-18 06:09:45 UTC
Permalink
Hi aTTi,

The only way to prevent split-brains is with fencing, full stop.
Quorum deals with a different issue. Whether you have 2 nodes with quorum
disabled or three nodes with quorum, you will still need fencing (aka
stonith).

The best plan is to configure stonith in pacemaker (IPMI or the like is the
most common method; switched PDUs are another option). Then, when that is
working properly, configure DRBD to use the crm-fence-peer.sh fence handler
and set the fencing policy to resource-and-stonith.
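As a minimal sketch (DRBD 8.4 syntax; the resource name r0 is just a
placeholder), the relevant parts of the DRBD resource file look like this:

    resource r0 {
        disk {
            # block I/O and fence the peer when replication breaks
            fencing resource-and-stonith;
        }
        handlers {
            # ask pacemaker to fence/unfence the peer (scripts ship with drbd-utils)
            fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
            after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
        }
    }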

With this, should communication break, both nodes will block and call a
fence. The faster node will power off the slower node. Then, and only then,
storage will unblock and recovery will begin, if needed. You can control
which node wins a race like this by adding 'delay="15"' to the fence method
configuration of the node you want to survive.
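For illustration only, with IPMI fencing on EL7 and pcs, that delay ends up
looking something like this (node names, addresses and credentials are made up):

    # the fence device for server1 gets the delay, so server1 wins a fence race
    pcs stonith create fence_server1 fence_ipmilan pcmk_host_list="server1" \
        ipaddr="10.0.10.1" login="admin" passwd="secret" lanplus=1 delay=15 \
        op monitor interval=60s
    pcs stonith create fence_server2 fence_ipmilan pcmk_host_list="server2" \
        ipaddr="10.0.10.2" login="admin" passwd="secret" lanplus=1 \
        op monitor interval=60s
    # keep each fence device off the node it is meant to kill
    pcs constraint location fence_server1 avoids server1
    pcs constraint location fence_server2 avoids server2
    pcs property set stonith-enabled=true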
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
Digimer
2014-09-19 08:30:31 UTC
Permalink
Hi aTTi,

Comments in-line;
Post by aTTi
Hi Digimer!
Thanks for your answer. I have a lot of questions, and not just for Digimer - for everyone.
So, if I have just 2 nodes with quorum disabled and I use fencing (aka
STONITH) + Pacemaker, will it be safe for production use? (Are there other
recommended non-default settings? Any howto?)
"Production ready" requires many things. Fencing is one of those things,
of course, but there are others.

Details are hard to give without a better idea of your environment...
What operating system? What versions of corosync, pacemaker and DRBD? etc.

With 2-node clusters, you need to put a delay on one node, and you need
to be careful to avoid fence loops. That is to say, either don't let the
cluster stack start on boot (always my recommendation), or at least use
wait_for_all if you have corosync v2+.

See:

https://alteeve.ca/w/AN!Cluster_Tutorial_2#Giving_Nodes_More_Time_to_Start_and_Avoiding_.22Fence_Loops.22
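For reference, a corosync v2 quorum section for a 2-node cluster looks
roughly like this:

    quorum {
        provider: corosync_votequorum
        # two_node: 1 relaxes quorum for a 2-node cluster and implies wait_for_all
        two_node: 1
        # a booting node waits until it has seen its peer before forming a cluster,
        # which prevents a lone, freshly-fenced node from starting services
        wait_for_all: 1
    }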
Post by aTTi
If STONITH kills the slower node, won't that cause data loss on the slower
server? Is it a remote shutdown, power off or reset? Or is it the same as if
I started a shutdown as root?
With DRBD, both nodes stop writing when the connection is lost. This way,
when the slower node is powered off, no data is lost. If your OS itself uses
a journaled file system and you're not doing something silly like using
hardware RAID in write-back mode without a BBU, then the OS should be safe as
well.

When the fenced server boots back up, DRBD on the surviving node will know
exactly which blocks changed while the peer was gone, so it only has to copy
that data to bring the peer back into full sync.
Post by aTTi
So, if communication breaks, it will be like a western movie: the faster one
kills the slower one and only 1 stays alive. Can it happen that both nodes
die?
It can happen that both nodes die in some cases. This can be avoided with a
few precautions: disable acpid if you have IPMI fencing, and set a delay
against one node.

Please read the section immediately below the example config file here:

https://alteeve.ca/w/AN!Cluster_Tutorial_2#Using_the_Fence_Devices
Post by aTTi
With a good setup and no hardware errors, what are the most common problems
with DRBD? How can I test for that?
With good fencing, there are no problems. I have used it in production since
2009 on dozens of 2-node clusters all over North America. The trick is the
good fencing.
Post by aTTi
Where can I find documentation about DRBD test cases? Or recommended
configurations and an installation manual for 2 nodes with CentOS 7?
I don't know how much documentation exists for CentOS 7; it is very new.
However, the concepts from CentOS 6 are very similar.

You can read a lot about the logic and concepts behind how we use DRBD in
our 2-node clusters here:

https://alteeve.ca/w/AN!Cluster_Tutorial_2#Hooking_DRBD_into_the_Cluster.27s_Fencing
Post by aTTi
Server 1 = DRBD active node with running services, server 2 = DRBD passive node.
Server 1 has a hardware error and goes offline, so server 2 becomes the active node.
Server 2 takes over the virtual IP needed for the active role, then starts the services.
After server 1's hardware is repaired, server 1 comes online again.
With STONITH installed, what is the safest way to switch back so that server
1 is active and server 2 is passive again? Do I need a script, or just a few
commands?
As soon as there is a problem, both nodes block and call a fence. The
faster node powers off the slower node, gets confirmation that it is
off, and *then* begins recovery. Maybe the fenced node will boot back
up, or maybe it's a pile of rubble and will never power on again... it
doesn't matter to the cluster.

Once the node is gone, the surviving machine will review the pacemaker
configuration, determine what has to be done to recover your services,
and then do that. What "that" means will depend entirely on your
configuration.

An example might be to:

1. promote DRBD to primary
2. mount the file system on drbd
3. start a service like httpd or postgresql that uses the DRBD data
4. take over the virtual IP address

This is just an example though.
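To make that concrete, here is a sketch of such a stack with pcs on EL7 (the
resource names, DRBD resource r0, device, mount point, service and IP are all
just examples, not a recommendation for your setup):

    # DRBD managed by pacemaker, one Primary at a time
    pcs resource create drbd_r0 ocf:linbit:drbd drbd_resource=r0 op monitor interval=60s
    pcs resource master ms_drbd_r0 drbd_r0 master-max=1 master-node-max=1 \
        clone-max=2 clone-node-max=1 notify=true
    # file system, service and virtual IP, started as one group
    pcs resource create fs_r0 ocf:heartbeat:Filesystem device=/dev/drbd0 \
        directory=/srv/data fstype=xfs
    pcs resource create svc_httpd systemd:httpd
    pcs resource create vip ocf:heartbeat:IPaddr2 ip=192.168.1.100 cidr_netmask=24
    pcs resource group add grp_services fs_r0 svc_httpd vip
    # the group may only run where DRBD is Primary, and only after promotion
    pcs constraint colocation add grp_services with ms_drbd_r0 INFINITY with-rsc-role=Master
    pcs constraint order promote ms_drbd_r0 then start grp_services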
Post by aTTi
Any real-life experience with periodically (weekly, monthly) swapping the
active and passive nodes? Like in the last example: server 1 active, server 2
passive, then each month I switch which node is active. In January server 1
is the active node, in February server 2 is active, in March server 1 is
active again... so both servers wear evenly.
Migration of services can be controlled however you want, but time-based
migration is not something I have seen. Nothing stops you from moving the
services manually, though, if you want. Generally, though, services migrate
in reaction to a specific event, like a component failure.
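If you do want to move things by hand, a sketch with pcs (the group name is
taken from the example above):

    # push the service group to the other node...
    pcs resource move grp_services server1
    # ...then drop the location constraint that 'move' leaves behind
    # (on older pcs, remove the cli- constraint with 'pcs constraint remove' instead)
    pcs resource clear grp_services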
Post by aTTi
Do you recommend using the 3rd server as a backup node or not? And in what
way should I use the third node? As a stacked node? Or iSCSI sync? Or as a
normal passive node? (I don't want that; I want my DRBD solution to stay
simple and safe.)
A cluster does _NOT_ replace backups. You still need backups, always.
Generally, I have a dedicated machine, in another building, that
periodically rsyncs the production data into a date-coded directory. This
way, I can go back in time to retrieve deleted or corrupted files.

How you set up your backup, though, is entirely up to you. Backup is very
different from HA.
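A minimal sketch of that kind of backup job, run from cron on the backup
machine (the host, paths and schedule are assumptions, not from this thread):

    #!/bin/sh
    # pull the production data from the cluster's virtual IP into a date-coded dir
    DEST=/backup/$(date +%Y-%m-%d)
    mkdir -p "$DEST"
    rsync -a --delete cluster-vip:/srv/data/ "$DEST/"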
Post by aTTi
Can I combine DRBD server pairs? For example, servers 1+2 are DRBD1 nodes
1+2, and servers 3+4 are DRBD2 nodes 1+2. Then I add server 3 or 4 to DRBD1
as a third node, and server 1 or 2 to DRBD2 as a third node. Is there any
point to this? Or, to make it stranger: add DRBD1's node-3 storage space to
DRBD2's disk space?
I don't think it's a good idea, I just want to know. I do have the disk
space for it; I'm asking purely theoretically.
I don't know if it is possible, but I think it would be.
Post by aTTi
If DRBD is really safe with 2 nodes, I don't want to use more nodes. I will
make automatic backups of the data; I just want HA with no service outage and
no data loss if a server fails. I know DRBD is just one part of an HA
solution, but it's an important part.
As I said, I have used DRBD in 2-node clusters only for several years
without any issue.
Post by aTTi
Do you recommend using at least 2 corosync rings? Ring 0 = crossover cable,
ring 1 = switch connection. Are there any disadvantages to that?
It's up to you. I use active/passive bonding with the network links
spanning two switches for full network redundancy. Redundant rings are
good, too. I go with bonding only because it protects all traffic,
including DRBD traffic.
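For illustration, an active-backup bond on CentOS 7 might be defined like
this (device names and addresses are examples only):

    # /etc/sysconfig/network-scripts/ifcfg-bond0
    DEVICE=bond0
    TYPE=Bond
    BONDING_MASTER=yes
    BONDING_OPTS="mode=active-backup miimon=100"
    IPADDR=192.168.1.11
    PREFIX=24
    ONBOOT=yes

    # /etc/sysconfig/network-scripts/ifcfg-eth0 (repeat for eth1)
    DEVICE=eth0
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes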
Post by aTTi
Thank you again for your help.
aTTi
Always happy to help.

PS - Please keep replies on the mailing list. Conversations like this
can help others in the future when they are in the archives.
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?