Discussion:
[DRBD-user] Initial Sync - Fast then really slow
Travis Kriza
2004-04-06 00:08:56 UTC
Hello. I'm just getting DRBD set up on some boxes to do some NFS serving,
with a shared volume of about 40 GB. I've got it running on a pair of HPs
running RedHat Enterprise 3 with ultramonkey.

At first I completely missed the sync limits in the drbd.conf file and was
wondering what was taking so long; 40 GB at a 1 MB/s maximum would take a
while. So I raised that maximum to 100M (the boxes are on a shared gigabit
switch).

I then reset the sync and noticed it was going at a decent clip, in the
40 MB/s range. I got distracted by another chore, and when I came back to
the machine the rate had dropped to about 500 K/sec. (It happens to do this
not long after reaching roughly the 10 GB mark.) I raised sync-min to 10M,
but this is still occurring and I'm not quite sure what's going on.

I am also running IDE software RAID on each box, which I know could have
some impact on performance, although I'd expect it to be capable of
sustaining far more throughput than 500 K/sec.
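
A couple of quick checks are worth running while the sync is in that slow
state (a sketch; /dev/hda is only a placeholder for whichever disks back
the md arrays):

  # Watch the syncer progress and rate as reported by DRBD 0.6:
  watch -n1 cat /proc/drbd

  # Rough sequential read speed of the underlying disk, to see whether the
  # hardware itself can sustain more than 500 K/sec:
  hdparm -t /dev/hda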

Any ideas? Thanks,

Travis

PS: Here is a trimmed version of the drbd.conf file:

#
# drbd.conf example
#

resource drbd0 {

  protocol = C
  fsckcmd  = fsck -p -y

  disk {
    do-panic
    disk-size = 40313912k
  }

  net {
    sync-min    = 10M
    sync-max    = 100M   # maximal average syncer bandwidth
    tl-size     = 5000   # transfer log size, ensures strict write ordering
    timeout     = 60     # unit: 0.1 seconds
    connect-int = 10     # unit: seconds
    ping-int    = 10     # unit: seconds
    ko-count    = 4      # if some block send times out this many times,
                         # the peer is considered dead, even if it still
                         # answers ping requests
  }

  on server-1 {
    device  = /dev/nb0
    disk    = /dev/md2
    address = 192.168.0.1
    port    = 7788
  }

  on server-2 {
    device  = /dev/nb0
    disk    = /dev/md4
    address = 192.168.0.2
    port    = 7788
  }
}
Jeff Goris
2004-04-06 05:18:24 UTC
Post by Travis Kriza
I then reset the sync and noticed it was going at a decent clip, in the
40 MB/s range. I got distracted by another chore, and when I came back to
the machine the rate had dropped to about 500 K/sec. (It happens to do this
not long after reaching roughly the 10 GB mark.) I raised sync-min to 10M,
but this is still occurring and I'm not quite sure what's going on.
I am also running IDE software RAID on each box, which I know could have
some impact on performance, although I'd expect it to be capable of
sustaining far more throughput than 500 K/sec.
I'm running a fairly similar configuration: RedHat 9.0, drbd-0.6.12,
channel-bonded gigabit crossover, with 90 GB software RAID devices on each
node, and I came across this same problem when I set it up.
Post by Travis Kriza
Any ideas? Thanks,
Perhaps check the clocks on each machine. I did notice that the clocks on
each of my hosts were different, as one was set to the wrong timezone. I
fixed the timezone and had each host sync off an NTP source to ensure they
kept accurate time. Then the replication rate was stable the whole way
through at about 22,500 K/sec.
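
One quick way to compare and then correct the clocks (a sketch; pool.ntp.org
is just a placeholder for whatever NTP source you actually use):

  # Compare the clocks; run on each node:
  date -u

  # Check the offset against an NTP server without setting anything:
  ntpdate -q pool.ntp.org

  # Step the clock once, then keep it disciplined with ntpd:
  ntpdate pool.ntp.org
  service ntpd start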

Jeff.
Philipp Reisner
2004-04-06 08:59:35 UTC
Post by Jeff Goris
Post by Travis Kriza
I then reset the sync and noticed it was going at a decent clip, in the
40 MB/s range. I got distracted by another chore, and when I came back to
the machine the rate had dropped to about 500 K/sec. (It happens to do this
not long after reaching roughly the 10 GB mark.) I raised sync-min to 10M,
but this is still occurring and I'm not quite sure what's going on.
I am also running IDE software RAID on each box, which I know could have
some impact on performance, although I'd expect it to be capable of
sustaining far more throughput than 500 K/sec.
I'm running a fairly similar configuration: RedHat 9.0, drbd-0.6.12,
channel-bonded gigabit crossover, with 90 GB software RAID devices on each
node, and I came across this same problem when I set it up.
Post by Travis Kriza
Any ideas? Thanks,
Perhaps check the clocks on each machine. I did notice that the clocks on
each of my hosts were different as one was set to the wrong timezone. I
fixed the timezone and had each host sync off an NTP source to ensure they
kept accurate time. Then the replication rate was stable the whole way
through at about 22,500 K/sec.
Jeff.
Just an uneducated guess:

Might it be that the software RAID started its resync process?
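
A quick way to check for that while the DRBD sync is running (a sketch using
the standard md status files, nothing DRBD-specific):

  # If an md array is resyncing, a progress bar and speed show up here:
  cat /proc/mdstat

  # The md resync throttles itself; the current limits (in KB/sec) are in:
  cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max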

-Philipp
--
: Dipl-Ing Philipp Reisner Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austria http://www.linbit.com :
Jeff Goris
2004-04-07 01:37:05 UTC
Post by Philipp Reisner
Post by Jeff Goris
Post by Travis Kriza
I am also running IDE software RAID on each box, which I know could have
some impact on performance, although I'd expect it to be capable of
sustaining far more throughput than 500 K/sec.
I'm running a fairly similar configuration: RedHat 9.0, drbd-0.6.12,
channel-bonded gigabit crossover, with 90 GB software RAID devices on each
node, and I came across this same problem when I set it up.
Post by Travis Kriza
Any ideas? Thanks,
Perhaps check the clocks on each machine. I did notice that the clocks on
each of my hosts were different, as one was set to the wrong timezone. I
fixed the timezone and had each host sync off an NTP source to ensure they
kept accurate time. Then the replication rate was stable the whole way
through at about 22,500 K/sec.
Jeff.
Might it be that the software RAID started its resync process?
-Philipp
Your 'uneducated guess' certainly fits the symptoms. I did check the RAID
device status some of the times the problem occurred, but the problem also
happened when all the RAID devices were healthy and synchronised. At first I
would take drbd down to stop the resync, as it would have taken days
otherwise. Eventually, since I couldn't find what was wrong, I let it try
to resync at 500 K/sec. When it was synchronising and the rate dropped to
500 K/sec, eventually one of the hosts would freeze completely, requiring
the reset button. It was always the same host, regardless of whether I had
set it to primary or secondary. I could only get it to complete the sync
without crashing when I left the max sync rate at 2000K, which it was able to
maintain the entire time. I have seen the sync rate drop significantly on
two other systems when the RAID device was also resyncing, but drbd has
always coped with this, although that was with drbd-0.6.8 and earlier.

I had actually put this problem aside for over a week due to other more
urgent work. It was only when using rpm to install some security updates
that I became aware of the time zone problem on the dodgy host. I can't be
positive that I didn't change anything else on either host. It was immediately
after fixing the time that I tried a max sync rate of 50 M/sec and managed
to get it to sync fairly constantly at 22,500 K/sec.

However, you have me thinking that your guess is right and that I may have
been experiencing the problems due to the software RAID resyncing at the same
time, especially since after the host crashed the RAID would have to resync
too. I will attempt to do a drbd resync at the same time as a software RAID
resync and see how this goes.

Jeff.
Lars Ellenberg
2004-04-07 06:43:27 UTC
/ 2004-04-07 11:37:05 +1000
Post by Jeff Goris
I had actually put this problem aside for over a week due to other more
urgent work. It was only when using rpm to install some security updates
that I became aware of the time zone problem on the dodgy host. I can't be
positive that I didn't change anything else on either host. It was immediately
after fixing the time that I tried a max sync rate of 50 M/sec and managed
to get it to sync fairly constantly at 22,500 K/sec.
DRBD has no idea about wall clock time or timezone,
so I cannot think of how a different wall clock time
on the peers would influence DRBD behaviour in any way.

Lars Ellenberg
Jeff Goris
2004-04-08 01:58:22 UTC
Post by Lars Ellenberg
/ 2004-04-07 11:37:05 +1000
Post by Jeff Goris
I had actually put this problem aside for over a week due to other more
urgent work. It was only when using rpm to install some security updates
that I became aware of the time zone problem on the dodgy host. I can't be
positive that I didn't change anything else on either host. It was immediately
after fixing the time that I tried a max sync rate of 50 M/sec and managed
to get it to sync fairly constantly at 22,500 K/sec.
DRBD has no idea about wall clock time or timezone,
so I cannot think of how a different wall clock time
on the peers would influence DRBD behaviour in any way.
Lars Ellenberg
Sorry about all my previous posts being in a new thread. Hopefully, this won't
happen again.

Okay, I have reproduced the problem. The crash happened whilst NTP was
adjusting the clock from the incorrect time back to the correct time. The
clock was wrong because a) the system clock used UTC, b) the timezone was set
about 18 hours behind what it should have been, and c) I had manually set the
time on this host to the current local time, so the system clock ended up
about 18 hours ahead of the other host.

To reproduce the problem I simply used the date command on the secondary to
set the wall clock time 3 hours in advance of the primary and manually started
a full sync on the primary. Part way through the sync the secondary locks up
and the primary looks like this:

version: 0.6.12 (api:64/proto:62)

0: cs:WFConnection st:Primary/Unknown ns:267736652 nr:0 dw:144 dr:267736805
pe:0 ua:0
NEEDS_SYNC

Jeff.
Lars Ellenberg
2004-04-08 09:59:54 UTC
/ 2004-04-08 01:58:22 +0000
Post by Jeff Goris
Post by Lars Ellenberg
/ 2004-04-07 11:37:05 +1000
Post by Jeff Goris
I had actually put this problem aside for over a week due to other more
urgent work. It was only when using rpm to install some security updates
that I became aware of the time zone problem on the dodgy host. I can't be
positive that I didn't change anything else on either host. It was immediately
after fixing the time that I tried a max sync rate of 50 M/sec and managed
to get it to sync fairly constantly at 22,500 K/sec.
DRBD has no idea about wall clock time or timezone,
so I cannot think of how a different wall clock time
on the peers would influence DRBD behaviour in any way.
Lars Ellenberg
Sorry about all my previous posts being in a new thread. Hopefully, this won't
happen again.
Okay, I have reproduced the problem. The crash happened whilst NTP was
adjusting the clock from the incorrect time back to the correct time. The
clock was wrong because a) the system clock used UTC, b) the timezone was set
about 18 hours behind what it should have been, and c) I had manually set the
time on this host to the current local time, so the system clock ended up
about 18 hours ahead of the other host.
To reproduce the problem I simply used the date command on the secondary to
set the wall clock time 3 hours in advance of the primary and manually started
a full sync on the primary. Part way through the sync the secondary locks up
version: 0.6.12 (api:64/proto:62)
0: cs:WFConnection st:Primary/Unknown ns:267736652 nr:0 dw:144 dr:267736805
pe:0 ua:0
NEEDS_SYNC
and DRBD is the "only" thing running here, and if the nodes do not
differ in time all works smoothly.

is this what you are saying?

lge
Jeff Goris
2004-04-08 16:17:24 UTC
Post by Lars Ellenberg
Post by Jeff Goris
To reproduce the problem I simply used the date command on the secondary to
set the wall clock time 3 hours in advance of the primary and manually started
a full sync on the primary. Part way through the sync the secondary locks up
<< snip >>
Post by Lars Ellenberg
and DRBD is the "only" thing running here, and if the nodes do not
differ in time all works smoothly.
is this what you are saying?
lge
That is correct. The machines are only doing DRBD - both are freshly
installed with just DRBD and heartbeat set up. The problem occurs whether
/dev/nb0 is mounted or unmounted. The only resources Heartbeat manages in
the cluster are one DRBD device and one virtual IP address.

Jeff.
Lars Ellenberg
2004-04-08 17:13:08 UTC
/ 2004-04-08 16:17:24 +0000
Post by Jeff Goris
That is correct. The machines are only doing DRBD - both are freshly
installed with just DRBD and heartbeat set up. The problem occurs whether
/dev/nb0 is mounted or unmounted. The only resources Heartbeat manages in
the cluster are one DRBD device and one virtual IP address.
please reproduce it without heartbeat... "by hand"
Thanks.

Lars
Travis Kriza
2004-04-08 23:27:32 UTC
Post by Lars Ellenberg
please reproduce it without heartbeat... "by hand"
Thanks.
Lars,

I actually experienced that fast-then-slow condition without heartbeat as
well. (I was wondering what had been happening originally and was manually
starting the drbd processes on each box.) Anyway, after setting up NTP and
resyncing the clocks (and a reboot), I was able to achieve pretty good
throughput speeds.

However, I'm now noticing I may have too much of a bottleneck between
running software RAID 1 on each box plus the DRBD mirroring of those RAID 1
volumes. I'm running some iozone tests first to see if I get noticeable
improvements in performance. (Basically, I had Apache logging via NFS to
this pair of boxes, and the load got out of control: the Apache box went up
to a load of about 20, whereas the NFS box only went up to maybe 2.) To
improve performance, I was thinking it may make sense to switch the shared
(DRBD-replicated) drives to be based on a RAID 0 volume instead of the
RAID 1. (I was also thinking it may be wiser to either pipe the Apache logs
so they are written to both spots, or simply rsync the logs at a regular
interval.) Any thoughts on this?

Also, can anyone recommend an easy way to convert a RAID 1 into a RAID 0?
Doing partition management during install is easy with Disk Druid, but I'm
lousy at disk management from the command line; I've never had to deal with
it that much.

Thanks,

Travis
Lars Ellenberg
2004-04-09 04:49:54 UTC
/ 2004-04-08 18:27:32 -0500
Post by Travis Kriza
Post by Lars Ellenberg
please reproduce it without heartbeat... "by hand"
Thanks.
Lars,
I actually experienced that fast-then-slow condition without heartbeat as
well.
Ok.
Post by Travis Kriza
(I was wondering what had been happening originally and was manually
starting the drbd processes on each box.) Anyway, after setting up NTP and
resyncing the clocks (and a reboot), I was able to achieve pretty good
throughput speeds.
Oh, I just wondered: did you really measure the "slow", or do you rely
solely on the output of /proc/drbd?

If the latter, and you happen to have your syslog from that time, and
you happen to know (or are able to figure out) the wall-clock time that
passed between "Sync Started" and "Sync Done", please recalculate and
verify that one.
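
Something along these lines would do it (a sketch; the log file location and
the exact wording of the kernel messages vary, so adjust the patterns to what
your syslog actually contains):

  # Pull the syncer start/finish messages for the device out of syslog:
  grep drbd0 /var/log/messages | grep -iE 'sync (started|done)'

  # Effective rate in K/sec = device size in KB / elapsed seconds, e.g. for
  # the 40313912k device above, if the timestamps are 30 minutes apart:
  echo $(( 40313912 / 1800 ))    # roughly 22396 K/sec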
Post by Travis Kriza
However, I'm now noticing I may have too much of a bottleneck between
running software RAID 1 on each box plus the DRBD mirroring of those RAID 1
volumes. I'm running some iozone tests first to see if I get noticeable
improvements in performance. (Basically, I had Apache logging via NFS to
this pair of boxes, and the load got out of control: the Apache box went up
to a load of about 20, whereas the NFS box only went up to maybe 2.) To
improve performance, I was thinking it may make sense to switch the shared
(DRBD-replicated) drives to be based on a RAID 0 volume instead of the
RAID 1. (I was also thinking it may be wiser to either pipe the Apache logs
so they are written to both spots, or simply rsync the logs at a regular
interval.) Any thoughts on this?
For me, the main question(s) in (Computer/Network/Data) Security is
not "how can I prevent things from happening" (which is of course still
an important question), but:

What does it mean (to me) WHEN things are happening,
what do I lose worst case, how can I cope,
how fast can I recover?

So, if you don't mind losing some minutes of apache log (worst case),
then rsync should be just fine.
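
For example, something as simple as a cron'd rsync would do (a sketch; the
paths and the "logserver" host name are placeholders, not from this thread):

  # Copy the Apache logs off the web server every few minutes:
  rsync -az /var/log/httpd/ logserver:/srv/logs/webfront/

  # crontab entry, every 5 minutes:
  # */5 * * * * rsync -az /var/log/httpd/ logserver:/srv/logs/webfront/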

Your load problem is probably mostly with the Apache log on NFS,
not so much with DRBD...
Does your log really grow so fast that it turns out to be a performance
problem? Hard to believe that this is the first bottleneck you hit.
BTW, if you happen to have your Apache session files on (NFS on top of)
DRBD, think about having them on DRBD on top of RAM disks...
If both nodes go down, the sessions have lost their meaning anyway...
As long as one stays up, you still have them.
Post by Travis Kriza
Also, can anyone recommend an easy way to convert a RAID 1 into a RAID 0?
Doing partition management during install is easy with Disk Druid, but I'm
lousy at disk management from the command line; I've never had to deal with
it that much.
Won't comment on that...
you may come back and blame me if things go wrong :)

Lars Ellenberg
Jeff Goris
2004-04-11 15:38:05 UTC
Post by Lars Ellenberg
Post by Jeff Goris
That is correct. The machines are only doing DRBD - both are freshly
installed with just DRBD and heartbeat set up. The problem occurs whether
/dev/nb0 is mounted or unmounted. The only resources Heartbeat manages in
the cluster are one DRBD device and one virtual IP address.
please reproduce it without heartbeat... "by hand"
Thanks.
Lars
I managed to reproduce it again. /dev/nb0 was unmounted and heartbeat and DRBD
were stopped on both hosts. I then checked that all RAID devices were healthy
and that the time on both hosts was correct and synchronised. I started DRBD
(command 'service drbd start') on the host that was last primary, then started
drbd on the secondary and monitored the hosts during the syncall of /dev/nb0.
When that finished, I set the clock on the secondary forward 8 hours with the
'date' command. Finally, I started a resync on the primary with the
command 'drbdsetup /dev/nb0 replicate'. I monitored both RAID and DRBD during
the resync until the secondary host locked up. I did not see the sync rate
drop prior to the lockup.
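
In shell terms the sequence was roughly the following (a sketch of the steps
described above; the exact date invocation is an assumption, any way of
setting the clock 8 hours ahead will do):

  # On both hosts, heartbeat stopped and /dev/nb0 unmounted:
  cat /proc/mdstat                  # confirm the md devices are healthy
  service drbd start                # last-primary host first, then the peer

  # After the initial sync completes, skew the secondary's clock:
  date -s "$(date -d '+8 hours')"   # GNU date, run on the secondary

  # Kick off a full resync from the primary and watch it:
  drbdsetup /dev/nb0 replicate
  watch -n5 'cat /proc/drbd /proc/mdstat'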

I suspect now that the slow sync rate was due to the software RAID 1 also
syncing, as you "guessed", since the last two times I reproduced this lock-up
I did not see the sync rate drop. However, I am pretty sure that the lock-up
on the secondary occurs when its system clock is drifting from a time in the
future back to the correct time whilst DRBD is resyncing. I can't see what
else could be causing the host to lock up whilst DRBD is resyncing. I've tried
to stop everything running other than DRBD and NTPD.

If you are very sure that DRBD should not be failing under these conditions,
then I think I will need to try a fresh "minimal" install of RedHat without
software RAID and without channel bonding, then introduce components one at a
time and see if I can ascertain which one is causing the problem.

Cheers,
Jeff.
Lars Ellenberg
2004-04-11 18:19:53 UTC
/ 2004-04-11 15:38:05 +0000
Post by Jeff Goris
Post by Lars Ellenberg
Post by Jeff Goris
That is correct. The machines are only doing DRBD - both are freshly
installed with just DRBD and heartbeat set up. The problem occurs whether
/dev/nb0 is mounted or unmounted. The only resources Heartbeat manages in
the cluster are one DRBD device and one virtual IP address.
please reproduce it without heartbeat... "by hand"
Thanks.
I managed to reproduce it again. /dev/nb0 was unmounted and heartbeat and DRBD
were stopped on both hosts. I then checked that all RAID devices were healthy
and that the time on both hosts was correct and synchronised. I started DRBD
(command 'service drbd start') on the host that was last primary, then started
drbd on the secondary and monitored the hosts during the syncall of /dev/nb0.
When that finished, I set the clock on the secondary forward 8 hours with the
'date' command. Finally, I started a resync on the primary with the
command 'drbdsetup /dev/nb0 replicate'. I monitored both RAID and DRBD during
the resync until the secondary host locked up. I did not see the sync rate
drop prior to the lockup.
I suspect now that the slow sync rate was due to the software RAID 1 also
syncing, as you "guessed", since the last two times I reproduced this lock-up
I did not see the sync rate drop. However, I am pretty sure that the lock-up
on the secondary occurs when its system clock is drifting from a time in the
future back to the correct time whilst DRBD is resyncing. I can't see what
else could be causing the host to lock up whilst DRBD is resyncing. I've tried
to stop everything running other than DRBD and NTPD.
If you are very sure that DRBD should not be failing under these conditions,
then I think I will need to try a fresh "minimal" install of RedHat without
software RAID and without channel bonding, then introduce components one at a
time and see if I can ascertain which one is causing the problem.
Ok.

The only places where DRBD has a notion of time are some local
timers, connection timeouts and so on. It just defers certain
actions by some configured amount of time (usually seconds) relative
to "now". E.g. I issue a DrbdPing, and then set a timer to notify
me after, say, 6 seconds... if the peer answers in time, the timer
action is discarded again.

A smoothly adjusting time via NTPD by sub-second amounts...
How should that be noticed at all, and by whom? Not by DRBD.

If I really try very hard to imagine what could go wrong,
I might be not too sure about what exactly might happen
when you have a more or less *large* timeskew whilst DRBD is running.
Though I cannot think of a reason for it to lock up the box.
Worst case, and still extremely unlikely, it might drop the
connection and immediately reconnect.

If you just have a time difference between the two nodes,
DRBD is completely innocent.
DRBD does not care about, and has no knowledge of, the peer's time.
TCP has timestamps, but who cares.
heartbeat is known to "dislike" time differences between its
nodes, and rightfully so, but I doubt it would lock up the box
because of that...

So I can just say ' ?? :-/ '

BTW, did you use a serial console?
NMI watchdog? enabled sysrq?
any reaction? any message?

Lars Ellenberg
Todd Denniston
2004-04-12 17:15:19 UTC
Post by Lars Ellenberg
/ 2004-04-11 15:38:05 +0000
Post by Jeff Goris
Post by Lars Ellenberg
Post by Jeff Goris
That is correct. The machines are only doing DRBD - both are freshly
installed with just DRBD and heartbeat set up. The problem occurs whether
/dev/nb0 is mounted or unmounted. The only resources Heartbeat manages in
the cluster are one DRBD device and one virtual IP address.
please reproduce it without heartbeat... "by hand"
Thanks.
I managed to reproduce it again. /dev/nb0 was unmounted and heartbeat and DRBD
were stopped on both hosts. I then checked that all RAID devices were healthy
and that the time on both hosts was correct and synchronised. I started DRBD
(command 'service drbd start') on the host that was last primary, then started
drbd on the secondary and monitored the hosts during the syncall of /dev/nb0.
When that finished, I set the clock on the secondary forward 8 hours with the
'date' command. Finally, I started a resync on the primary with the
<SNIP>
Post by Lars Ellenberg
Ok.
The only places where DRBD has a notion of time are some local
timers, connection timeouts and so on. It just defers certain
actions by some configured amount of time (usually seconds)
relative to "now". E.g. I issue a DrbdPing, and then set a
timer to notify me after, say, 6 seconds... if the peer
answers in time, the timer action is discarded again.
A smoothly adjusting time via NTPD by sub-second amounts...
How should that be noticed at all, and by whom? Not by DRBD.
<SNIP>
Note: unless '-x' is passed to ntpd, with an offset of 8 hours ntp will "jump"
(step) the time roughly 8 to 10 minutes[1] after it detects the large offset;
it will also do this any time the offset is more than 128 ms. '-x' 'forces the
time to be slewed in all cases.'[2]
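
A quick way to see whether ntpd is about to step rather than slew, and how to
force slewing (a sketch; on a Red Hat style box the extra option would
typically be added wherever your init script reads ntpd's flags):

  # Show the current peers and the measured offset (in milliseconds); an
  # offset beyond 128 ms means ntpd will eventually step the clock:
  ntpq -p

  # Run ntpd with stepping disabled so large offsets are slewed instead:
  ntpd -x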


So Lars, should we expect bad things to happen if the jump is over 6 seconds
in one direction or another? [Jump forward should expire the timer early in
wall time] [Jump back should expire the timer 'jump time delta' later in wall
time] I have seen X (or rather X apps) do funny things when you set the system
time backward while it is running, like not updating screens.


[1] The 8 to 12 minutes is from personal experimentation, during the time when
the machines are checking time every 64 seconds. If ntpd is synching every 1024
seconds and you only have one host, it may take more like 30 minutes to 1 hour
before it does the jump.

[2] http://www.eecis.udel.edu/~mills/ntp/html/ntpd.html
--
Todd Denniston
Crane Division, Naval Surface Warfare Center (NSWC Crane)
Harnessing the Power of Technology for the Warfighter
Lars Ellenberg
2004-04-12 17:52:59 UTC
/ 2004-04-12 12:15:19 -0500
Post by Lars Ellenberg
The only places where DRBD has a notion of time are some local
timers, connection timeouts and so on. It just defers certain
actions by some configured amount of time (usually seconds)
relative to "now". E.g. I issue a DrbdPing, and then set a
timer to notify me after, say, 6 seconds... if the peer
answers in time, the timer action is discarded again.
A smoothly adjusting time via NTPD by sub-second amounts...
How should that be noticed at all, and by whom? Not by DRBD.
<SNIP> Note: unless '-x' is passed to ntpd, with an offset of 8
hours ntp will "jump" (step) the time roughly 8 to 10 minutes[1]
after it detects the large offset; it will also do this any time
the offset is more than 128 ms. '-x' 'forces the time to be slewed
in all cases.'[2]
So Lars, should we expect bad things to happen if the jump is
over 6 seconds in one direction or another?
No. As I said, this *should* go unnoticed.
After smoking some psychedelic stuff, I might be able to imagine a race
condition which could lead to connection loss and immediate reconnect
under very weird circumstances... I'd of course deny that when sober...
[Jump forward should expire the timer early in wall time] [Jump
back should expire the timer 'jump time delta' later in wall
time] I have seen X (or rather X apps) do funny things when you
set the system time backward while it is running, like not
updating screens.
Hm. Under very weird circumstances, it might take longer to recognize a
broken link or crashed peer. But not by more than another iteration of the
"hey, peer, do you copy? / yes, peer here, what's up? / ..." game...
which happens with an offset to "now" after every successfully
transmitted/received packet, unless some other packet is tx/rx earlier,
in which case the game is deferred further, but always relative to "now"...

But, ok, many if not all timers used by the kernel may, under certain
conditions, behave "strangely" when the time is not monotonically increasing,
including the network-internal timers, and maybe even some hardware
related stuff...

Lars Ellenberg
