Discussion:
Understanding degr-wfc-timeout
Andrew Gideon
2010-12-01 23:33:32 UTC
I have nodes 0 and 1. I stopped the drbd service on node 1 and then on
node 0. I then started the drbd service on node 1. Should this use
degr-wfc-timeout or wfc-timeout?

This is the output I saw on node 1:

***************************************************************
DRBD's startup script waits for the peer node(s) to appear.
- In case this node was already a degraded cluster before the
reboot the timeout is 60 seconds. [degr-wfc-timeout]
- If the peer was available before the reboot the timeout will
expire after 0 seconds. [wfc-timeout]
(These values are for resource 'drbd1'; 0 sec -> wait forever)
To abort waiting enter 'yes' [13580]:

Because node 1 was stopped first, it was never a part of a degraded
cluster, right? So using wfc-timeout is correct?

On the other hand, when I follow the same sequence but start node 0
first, I see:

***************************************************************
DRBD's startup script waits for the peer node(s) to appear.
- In case this node was already a degraded cluster before the
reboot the timeout is 60 seconds. [degr-wfc-timeout]
- If the peer was available before the reboot the timeout will
expire after 0 seconds. [wfc-timeout]
(These values are for resource 'drbd1'; 0 sec -> wait forever)
To abort waiting enter 'yes' [2623]:

It seems to be using wfc-timeout in this case too. But node 0 was
running for a time w/o node 1. So shouldn't degr-wfc-timeout be used in
this case? Or have I misunderstood what "degraded" means? I thought it
meant "running with only a single node".

Am I hitting a difference between stopping a node and "breaking" a node?
Apparently not: if I break communication before I shut down nodes 1 and 0,
then whichever node I start first, it still uses wfc-timeout.
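
For what it's worth, by "break comm" I mean dropping the DRBD replication
traffic with iptables, roughly like this; 7788 is only the usual default
port, so substitute whatever port the resource is actually configured with:

  # block DRBD replication traffic in both directions
  # (port 7788 is an assumption; use the port from the resource's config)
  iptables -A INPUT  -p tcp --dport 7788 -j DROP
  iptables -A OUTPUT -p tcp --dport 7788 -j DROP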

So: When is degr-wfc-timeout used?

Thanks...

Andrew
J. Ryan Earl
2010-12-02 19:36:25 UTC
Post by Andrew Gideon
I have nodes 0 and 1. I stopped the drbd service on node 1 and then on
node 0. I then started the drbd service on node 1. Should this use
degr-wfc-timeout or wfc-timeout?
degr-wfc-timeout = how many seconds to wait for the connection after the
cluster was in a degraded state. If you "gracefully" stop DRBD on one node,
the cluster is not "degraded." Degraded means a non-graceful separation due
to a crash, power outage, network issue, etc., where one end detects that
the other is gone instead of being told to close the connection gracefully.

So think of degr-wfc-timeout as how long you wait for the other node when
you know it had an exceptional disconnection event. wfc-timeout is used
when you expect both nodes to come up normally, because they weren't
previously disconnected through an exceptional event.

At least that's my understanding.
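
Roughly speaking, these are the knobs in the startup section of drbd.conf.
The values below are only an illustration; they happen to match what your
startup output reports:

  startup {
    # used when the peer was reachable before the last shutdown;
    # 0 means wait forever
    wfc-timeout       0;
    # used when this node was already a degraded cluster before the reboot
    degr-wfc-timeout  60;
  }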

-JR
Andrew Gideon
2010-12-03 02:13:09 UTC
Post by J. Ryan Earl
If you "gracefully" stop DRBD on one
node, it's not "degraded." Degraded is from like a non-graceful
separation due to a crash, power-outage, network issue, etc where one
end detects the other is gone instead of being told to gracefully close
connection between nodes.
I issued a "stop" (a graceful shutdown) only after I broke the DRBD
connection by blocking the relevant packets. So before the stop, the
cluster was in a degraded state:

1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

Using "stop" still causes a clean shutdown which then avoids degr-wfc-
timeout?

Is there any way that a network issue, or anything else short of a crash
of the system, can invoke degr-wfc-timeout? I've even tried 'kill -9' on
the drbd processes, but they seem immune to this.

I can force a system crash if I have to, but that's something of a pain
in the neck, so I'd prefer another option if one is available.

Or have I misunderstood? I've been assuming that degr-wfc-timeout
applies only to the WFC at startup (because the timeout value is in the
startup block of the configuration file). Is this controlling some other
WFC period?

When I break the connection (and with no extra fencing logic specified),
I see that both nodes go into a WFC state. But this is lasting well
longer than the 60 seconds I have defined in degr-wfc-timeout.

Thanks...

Andrew
Lars Ellenberg
2010-12-03 08:43:11 UTC
Post by Andrew Gideon
Post by J. Ryan Earl
If you "gracefully" stop DRBD on one
node, it's not "degraded." Degraded is from like a non-graceful
separation due to a crash, power-outage, network issue, etc where one
end detects the other is gone instead of being told to gracefully close
connection between nodes.
I issued a "stop" (a graceful shutdown) only after I broke the DRBD
connection by blocking the relevant packets. So before the stop, the
1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
Using "stop" still causes a clean shutdown which then avoids degr-wfc-
timeout?
Is there any way that a network issue, or anything else short of a crash
of the system, can invoke degr-wfc-timeout? I've even tried 'kill -9' on
the drbd processes, but they seem immune to this.
I can force a system crash if I have to, but that's something of a pain
in the neck, so I'd prefer another option if one is available.
Or have I misunderstood? I've been assuming that degr-wfc-timeout
applies only to the WFC at startup (because the timeout value is in the
startup block of the configuration file). Is this controlling some other
WFC period?
When I break the connection (and with no extra fencing logic specified),
I see that both nodes go into a WFC state. But this is lasting well
longer than the 60 seconds I have defined in degr-wfc-timeout.
See if my post
[DRBD-user] DRBD Failover Not Working after Cold Shutdown of Primary
dated Tue Jan 8 11:56:00 CET 2008 helps.
http://lists.linbit.com/pipermail/drbd-user/2008-January/008223.html
and other archives

BTW, that setting only affects drbdadm/drbdsetup wait-connect, as used
for example by the init script, if used without an explicit timeout.
It does not affect anything else.
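
In other words, it only matters for something like this (a sketch; the
init script effectively runs the first form for every configured resource,
and the exact drbdsetup option spelling may differ between versions):

  # wait for the peer of resource drbd1, honoring wfc-timeout /
  # degr-wfc-timeout from the startup section
  drbdadm wait-connect drbd1

  # with an explicit timeout the config values are not consulted
  # (option names from memory; check drbdsetup's usage output)
  drbdsetup 1 wait-connect --wfc-timeout 20 --degr-wfc-timeout 20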

What is it you are trying to prove/trying to achieve?
--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed
Andrew Gideon
2010-12-03 17:12:23 UTC
Post by Lars Ellenberg
See if my post
[DRBD-user] DRBD Failover Not Working after Cold Shutdown of Primary
dated Tue Jan 8 11:56:00 CET 2008 helps.
http://lists.linbit.com/pipermail/drbd-user/2008-January/008223.html and
other archives
Perhaps I'm still not grasping this, but - based on that URL - I thought
the situation below would make use of degr-wfc-timeout:

I had two nodes, both primary.

Using iptables, I "broke" the connection. Both nodes were still up, but
reporting:

1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

and

1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

I then crashed ("xm destroy") one node and then booted it. As I
understand the above-cited post, this should have make use of the degr-
wfc-timeout value but - apparently - it did not:

Starting drbd: Starting DRBD resources: [
drbd1
Found valid meta data in the expected location, 16105058304 bytes into /dev/xvdb1.
d(drbd1) drbd: bd_claim(cfe1ad00,cc00c800); failed [d108e4d0;c0478e79;1]
1: Failure: (114) Lower device is already claimed. This usually means it
is mounted.

[drbd1] cmd /sbin/drbdsetup 1 disk /dev/xvdb1 /dev/xvdb1 internal
--set-defaults --create-device failed - continuing!

s(drbd1) n(drbd1) ]..........
***************************************************************
DRBD's startup script waits for the peer node(s) to appear.
- In case this node was already a degraded cluster before the
reboot the timeout is 60 seconds. [degr-wfc-timeout]
- If the peer was available before the reboot the timeout will
expire after 0 seconds. [wfc-timeout]
(These values are for resource 'drbd1'; 0 sec -> wait forever)
To abort waiting enter 'yes' [ 208]:


What am I doing/understanding wrong?
Post by Lars Ellenberg
BTW, that setting only affects drbdadm/drbdsetup wait-connect, as used
for example by the init script, if used without an explicit timeout. It
does not affect anything else.
What is it you are trying to prove/trying to achieve?
At this point, I'm trying to understand DRBD. Specifically in this case,
I'm trying to understand the startup process and how it deals with various
partition/split-brain cases. I come from a Cluster Suite world, where
"majority voting" is the answer to these issues, so I'm working to come
up to speed on how these issues are addressed by DRBD.

The idea of waiting forever seems like a problem if only one node is
available to go back into production. I know that the wait can be
overridden manually, but is there a way to not wait forever?

This is the context in which I started looking at degr-wfc-timeout.

FWIW, I've also posted in the thread "RedHat Clustering Services does not
fence when DRBD breaks" trying to understand the fencing process. I
think I managed to suspend all I/O in the case of a fence failure (the
handler returning a value of 6), but I'm not sure. Does:

1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C s----
ns:0 nr:0 dw:4096 dr:28 al:1 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

indicate suspension? Is that what "s----" means? I've failed to find
documentation for that bit of string in /proc/drbd.

Thanks...

Andrew
Lars Ellenberg
2010-12-06 11:48:50 UTC
Post by Andrew Gideon
Post by Lars Ellenberg
See if my post
[DRBD-user] DRBD Failover Not Working after Cold Shutdown of Primary
dated Tue Jan 8 11:56:00 CET 2008 helps.
http://lists.linbit.com/pipermail/drbd-user/2008-January/008223.html and
other archives
Perhaps I'm still not grasping this, but - based on that URL - I thought
the situation below would make use of degr-wfc-timeout:
I had two nodes, both primary.
Using iptables, I "broke" the connection. Both nodes were still up, but
1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
and
1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
I then crashed ("xm destroy") one node and then booted it. As I
understand the above-cited post, this should have make use of the degr-
Starting drbd: Starting DRBD resources: [
drbd1
Found valid meta data in the expected location, 16105058304 bytes into /dev/xvdb1.
d(drbd1) drbd: bd_claim(cfe1ad00,cc00c800); failed [d108e4d0;c0478e79;1]
1: Failure: (114) Lower device is already claimed. This usually means it
is mounted.
There. It cannot even attach.
Because it cannot attach, it cannot read its meta data.
Thus it does not know anything about itself.
Post by Andrew Gideon
[drbd1] cmd /sbin/drbdsetup 1 disk /dev/xvdb1 /dev/xvdb1 internal
--set-defaults --create-device failed - continuing!
You better make sure xvdb1 is not used by someone else
at the time your drbd tries to attach it.

You may need to fix your fstab, or your lvm.conf,
or your initrd, or whatever other "magic" is going on there.
Post by Andrew Gideon
s(drbd1) n(drbd1) ]..........
***************************************************************
DRBD's startup script waits for the peer node(s) to appear.
- In case this node was already a degraded cluster before the
reboot the timeout is 60 seconds. [degr-wfc-timeout]
- If the peer was available before the reboot the timeout will
expire after 0 seconds. [wfc-timeout]
(These values are for resource 'drbd1'; 0 sec -> wait forever)
What am I doing/understanding wrong?
The disk you ask DRBD to attach to is used by something else,
file system, device mapper, whatever. Fix that.
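
If it is not obvious who is claiming it, something along these lines
usually shows the culprit (just the obvious candidates, nothing
DRBD-specific about it):

  mount | grep xvdb1     # is it mounted?
  dmsetup ls             # device-mapper targets sitting on it?
  pvs | grep xvdb1       # has LVM grabbed it as a physical volume?
  lsof /dev/xvdb1        # anything else holding it open?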
Post by Andrew Gideon
Post by Lars Ellenberg
BTW, that setting only affects drbdadm/drbdsetup wait-connect, as used
for example by the init script, if used without an explicit timeout. It
does not affect anything else.
What is it you are trying to prove/trying to achieve?
At this point, I'm trying to understand DRBD. Specifically in this case,
I'm trying to understand the startup process and how it deals with various
partition/split-brain cases. I come from a Cluster Suite world, where
"majority voting" is the answer to these issues, so I'm working to come
up to speed on how these issues are addressed by DRBD.
The idea of waiting forever seems like a problem if only one node is
available to go back into production. I know that the wait can be
overridden manually, but is there a way to not wait forever?
This is the context in which I started looking at degr-wfc-timeout.
FWIW, I've also posted in the thread "RedHat Clustering Services does not
fence when DRBD breaks" trying to understand the fencing process. I
think I managed to suspend all I/O in the case of a fence failure (the
handler returning a value of 6), but I'm not sure. Does:
1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C s----
ns:0 nr:0 dw:4096 dr:28 al:1 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
indicate suspension? Is that what "s----" means?
At that position, yes: that flag means application I/O is
s: suspended, or r: running/resumed.
You can manually resume it with "drbdadm resume-io".
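
e.g., with the resource name as in your config:

  drbdadm resume-io drbd1    # resume the suspended application I/O
  cat /proc/drbd             # the 's' flag should go back to 'r'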
Post by Andrew Gideon
I've failed to find documentation for that bit of string in /proc/drbd.
Is that so.
--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed
Andrew Gideon
2010-12-06 16:26:22 UTC
Post by Lars Ellenberg
Starting drbd: Starting DRBD resources: [ drbd1 Found valid meta data
in the expected location, 16105058304 bytes into /dev/xvdb1.
d(drbd1) drbd: bd_claim(cfe1ad00,cc00c800); failed
[d108e4d0;c0478e79;1] 1: Failure: (114) Lower device is already
claimed. This usually means it is mounted.
There. It cannot even attach.
Because it cannot attach, it cannot read its meta data. Thus it does
not know anything about itself.
Ah! I noticed that, but it didn't click for me that this would be
related, so I just put that issue off.
Post by Lars Ellenberg
[drbd1] cmd /sbin/drbdsetup 1 disk /dev/xvdb1 /dev/xvdb1 internal
--set-defaults --create-device failed - continuing!
You better make sure xvdb1 is not used by someone else at the time your
drbd tries to attach it.
You may need to fix your fstab, or your lvm.conf, or your initrd, or
whatever other "magic" is going on there.
LVM; the underlying device is being activated by LVM so I need to block
that. I'll make the change and try again.

[...]
Post by Lars Ellenberg
FWIW, I've also posted in the thread "RedHat Clustering Services does
not fence when DRBD breaks" trying to understand the fencing process.
I think I managed to suspend all I/O in the case of a fence failure (the
handler returning a value of 6), but I'm not sure. Does:
1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C s----
ns:0 nr:0 dw:4096 dr:28 al:1 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
indicate suspension? Is that what "s----" means?
At that position, yes: that flag means application I/O is
s: suspended, or r: running/resumed.
You can manually resume it with "drbdadm resume-io".
Perfect. Thanks.
Post by Lars Ellenberg
I've failed to find documentation for that bit of string in /proc/drbd.
Is that so.
Yes. I would have thought it to be in

http://www.drbd.org/users-guide/ch-admin.html#s-proc-drbd

but if so then I'm missing it. Or is it elsewhere?

Thanks again...

Andrew
J. Ryan Earl
2010-12-06 19:37:42 UTC
Post by Andrew Gideon
Post by Lars Ellenberg
You better make sure xvdb1 is not used by someone else at the time your
drbd tries to attach it.
You may need to fix your fstab, or your lvm.conf, or your initrd, or
whatever other "magic" is going on there.
LVM; the underlying device is being activated by LVM so I need to block
that. I'll make the change and try again.
You need something like the following; replace <DRBD-backing-store> with
the appropriate device/partition:

--- lvm.conf.orig 2010-10-01 14:01:59.350001934 -0500
+++ lvm.conf 2010-10-01 14:24:11.306001934 -0500
@@ -50,7 +50,7 @@


# By default we accept every block device:
- filter = [ "a/.*/" ]
+ filter = [ "r|/dev/<DRBD-backing-store>|", "a/.*/" ]


-JR
Andrew Gideon
2010-12-06 21:22:53 UTC
Post by J. Ryan Earl
You need something like the following; replace <DRBD-backing-store> with
the appropriate device/partition:
Thanks, but this much (albeit not much more {8^) I did understand. I'd
just put it off because I didn't realize it was interfering with the
testing in which I was actually interested.

This is also well-documented in:

http://www.drbd.org/users-guide/s-lvm-drbd-as-pv.html

I have now seen degr-wfc-timeout used.

I also have fencing failing well.

I also noticed that a node coming up that cannot reach its peer calls the
fencing handler (as long as "Fencing" is set correctly; I'm using
resource-and-stonith). This is terrific.
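
For anyone finding this thread later, the relevant pieces of my resource
config look roughly like this; the handler path is only a placeholder for
whatever fence-peer script is actually in use:

  disk {
    # freeze I/O and call the fence-peer handler when the peer is lost
    fencing resource-and-stonith;
  }
  handlers {
    # placeholder path; point this at your actual fence script
    fence-peer "/usr/local/sbin/my-fence-peer.sh";
  }

(That is the setup in which my handler returning 6 apparently left I/O
suspended, as described above.)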

Thanks again, everyone, for the help. I'm sure I'll have more questions
as I move forward.

- Andrew
