Discussion:
DRBD Split-Brain auto recover
trm asn
2011-11-26 21:49:35 UTC
Dear List,

I have an HA NFS setup with DRBD. The primary is the NFS1 server and the
secondary is the NFS2 server.

Please help me configure automatic recovery from split-brain.

Below are my config and package details.


Packages:
kmod-drbd83-8.3.8-1.el5.centos
drbd83-8.3.8-1.el5.centos

/etc/drbd.conf [same on both boxes]

common { syncer { rate 100M; al-extents 257; } }

resource main {
    protocol C;
    handlers { pri-on-incon-degr "halt -f"; }
    disk { on-io-error detach; }
    startup { degr-wfc-timeout 60; wfc-timeout 60; }

    on NFS1 {
        address   10.20.137.8:7789;
        device    /dev/drbd0;
        disk      /dev/sdc;
        meta-disk internal;
    }
    on NFS2 {
        address   10.20.137.9:7789;
        device    /dev/drbd0;
        disk      /dev/sdc;
        meta-disk internal;
    }
}
Nick Khamis
2011-11-27 16:16:42 UTC
I could be wrong, but a topic as important as a disk replicator's ability
to automatically recover from split brain has been covered multiple times
on its list, not to mention the thorough documentation.

http://www.drbd.org/users-guide/s-configure-split-brain-behavior.html
http://www.drbd.org/users-guide/s-configure-split-brain-behavior.html#s-automatic-split-brain-recovery-configuration
http://www.drbd.org/users-guide/s-configure-split-brain-behavior.html#s-split-brain-notification
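
In short, what the guide describes boils down to a net section plus an
optional notification handler in the resource definition. A minimal sketch
against the config in the original post; the policy choices below are only
examples, not a recommendation for this particular setup:

resource main {
    ...
    handlers {
        pri-on-incon-degr "halt -f";
        # mail the admin whenever DRBD detects a split brain
        split-brain "/usr/lib/drbd/notify-split-brain.sh root";
    }
    net {
        # neither node was Primary at detection time:
        # if only one node changed data, take its changes
        after-sb-0pri discard-zero-changes;
        # exactly one node was Primary: discard the Secondary's changes
        after-sb-1pri discard-secondary;
        # both nodes were Primary: no auto-recovery, stay disconnected
        after-sb-2pri disconnect;
    }
    ...
}

Be aware that these auto-recovery policies silently throw away one node's
writes; read the sections linked above before enabling them.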

How about it......

Nick from Toronto.
trm asn
2011-12-08 07:14:49 UTC
Below I am getting packet loss warning messages, and because of that both
servers end up in StandAlone status. Is there any mechanism to increase the
tolerated packet drop count in DRBD?



Dec 7 19:23:13 NFS1 heartbeat: [12280]: WARN: 1 lost packet(s) for [nfs2] [1782:1784]
Dec 7 19:27:21 NFS1 heartbeat: [12280]: WARN: 1 lost packet(s) for [nfs2] [1906:1908]
Dec 7 19:28:27 NFS1 heartbeat: [12280]: WARN: 1 lost packet(s) for [nfs2] [1939:1941]
Dec 7 19:38:49 NFS1 heartbeat: [12280]: WARN: 1 lost packet(s) for [nfs2] [2250:2252]
Dec 7 19:40:01 NFS1 heartbeat: [12280]: WARN: 1 lost packet(s) for [nfs2] [2286:2288]
Dec 7 19:41:31 NFS1 heartbeat: [12280]: WARN: 1 lost packet(s) for [nfs2] [2331:2333]
Dec 7 19:46:01 NFS1 heartbeat: [12280]: WARN: 1 lost packet(s) for [nfs2] [2466:2468]
Dec 7 19:46:47 NFS1 heartbeat: [12280]: WARN: 1 lost packet(s) for [nfs2] [2489:2491]
Dec 7 19:46:59 NFS1 heartbeat: [12280]: WARN: 1 lost packet(s) for [nfs2] [2495:2497]
Dec 7 19:47:09 NFS1 heartbeat: [12280]: WARN: 1 lost packet(s) for [nfs2] [2500:2502]
Dec 8 06:52:48 NFS1 heartbeat: [12280]: WARN: 1 lost packet(s) for [nfs2] [90:92]
Dec 8 06:52:54 NFS1 heartbeat: [12280]: WARN: 1 lost packet(s) for [nfs2] [93:95]
Dec 8 06:59:14 NFS1 heartbeat: [12280]: WARN: 1 lost packet(s) for [nfs2] [283:285]


Thanks & Regards,
Tarak Ranjan
Andreas Kurz
2011-12-09 13:01:33 UTC
Post by trm asn
Below I am getting packet loss warning messages, and because of that both
servers end up in StandAlone status. Is there any mechanism to increase the
tolerated packet drop count in DRBD?
That has nothing to do with DRBD; these are messages from Heartbeat's
messaging layer ... flaky network?
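
If the link really is lossy (rather than just reordering packets), the
relevant knobs live in Heartbeat's ha.cf, not in DRBD. A rough sketch with
illustrative values only:

# /etc/ha.d/ha.cf (same on both nodes)
keepalive 2        # seconds between heartbeat packets
warntime 10        # log a warning if no heartbeat seen for this long
deadtime 30        # declare the peer dead after this long
initdead 60        # extra grace period right after startup

Raising warntime/deadtime only hides the symptom, though; a flaky cluster
or replication link is worth fixing at the network level.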

Regards,
Andreas
--
Need help with DRBD & Pacemaker?
http://www.hastexo.com/now
Lars Ellenberg
2011-12-09 15:02:53 UTC
Post by Andreas Kurz
Post by trm asn
Below I am getting packet loss warning messages, and because of that both
servers end up in StandAlone status. Is there any mechanism to increase the
tolerated packet drop count in DRBD?
That has nothing to do with DRBD; these are messages from Heartbeat's
messaging layer ... flaky network?
Slight reordering of messages can always happen.
Heartbeat tries to be nice and not warn about it if it receives the
"missing" messages within a (configurable, short) timeout.

Versions since [I don't know exactly when; it also depends somewhat on
platform, compile-time and run-time environment] until 3.0.5 would
unfortunately set this timeout to zero, so they would complain about each
reordering as if it really were a lost packet.

I recommend upgrading to the latest Heartbeat.
--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
Digimer
2011-11-27 17:08:16 UTC
Post by trm asn
Dear List,
I have one HA NFS setup with DRBD. Primary is NFS1 server & secondary is
NFS2 server.
Please help me out to configure the auto recovery from split-brain.
You can't safely recover from split-brain automatically. Consider:

Node A saved a Fedora ISO, ~3.5GB written.
Node B saved an hour's worth of credit card transactions, ~1MB written.

Which node has the more valuable data?
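
Whichever you decide, resolving the split brain is a manual call: you pick
the node whose changes get thrown away and tell DRBD so. Roughly, using the
resource name "main" from the original post, against a disconnected or
StandAlone resource:

# on the node whose changes you are sacrificing (the split-brain victim)
drbdadm secondary main
drbdadm -- --discard-my-data connect main

# on the node whose data survives, if it is also StandAlone
drbdadm connect main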

The best you can do is configure and test fencing so that you can avoid
split brain conditions in the first place.
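
On the DRBD side that usually means a fencing policy plus a fence-peer
handler. A sketch for a Heartbeat-based cluster using dopd; the handler
path may differ per distribution:

resource main {
    disk {
        on-io-error detach;
        # when replication breaks, call the fence-peer handler to
        # outdate the peer's disk instead of shooting the whole node
        fencing resource-only;
    }
    handlers {
        # dopd ships with Heartbeat and marks the peer Outdated
        # over the cluster communication channel
        fence-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
    }
}

Node-level fencing (STONITH) in the cluster manager remains the stronger
guarantee.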
--
Digimer
E-Mail: digimer-***@public.gmane.org
Freenode handle: digimer
Papers and Projects: http://alteeve.com
Node Assassin: http://nodeassassin.org
"omg my singularity battery is dead again.
stupid hawking radiation." - epitron