Discussion:
MySQL-over-DRBD Performance
Art Age Software
17 years ago
Permalink
I'm testing a configuration of MySQL 5 running on a 2-node DRBD
cluster. It is all configured and seems to be running fine. However,
upon running the MySQL sql-bench tests I am seeing some surprising
(and alarming) results. I would be interested to hear from anybody who
has configured a similar setup about what sort of performance you
are seeing and what you have done to maximize performance of MySQL
over DRBD.

In my case, I am replacing a 3-year old pair of Dell 2850's (4GB,
Dual-Proc/Single-Core) with a pair of new Dell 2950's (8GB,
Dual-Proc/Quad-Core). Clearly, I expect to see an overall performance
boost from the new servers. And for most operations I am seeing better
performance. However for writes, I am seeing **worse** performance.
**But only when the database is located on the DRBD device.**

Here is some sample result data from the sql-bench insert test:

Old Servers:
Database on Local Storage
insert test: 212.00 sec.
Database on DRBD Device
insert test: 998.00 sec.
----------------------------
DRBD Overhead: 786 sec. = 370%

New Servers:
Database on Local Storage
insert test: 164.00 sec. (22% better than old servers)
Database on DRBD Device
insert test: 1137.00 sec. (14% *worse* than old servers)
----------------------------
DRBD Overhead: 973 sec. = 590%

As you can see, the new servers performed better when writing locally,
but performed worse when writing to the DRBD device. I have tested the
local write performance on both primary and secondary, and all
hardware and software config is identical for both nodes. So I believe
this rules out local I/O subsystem as the culprit and points to either
DRBD or something in the TCP networking stack as the problem.

The dedicated GigE link connecting the DRBD peers seems to be
operating well. I performed a full resync with the syncer rate set at
100 MB/s, and DRBD reported throughput very close to 100 MB/s during the
entire sync process.
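(For reference, that rate was set via the syncer section of drbd.conf; a
minimal snippet looks roughly like this, with the resource name being just a
placeholder:)

resource r0 {
  syncer {
    rate 100M;
  }
}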

Here are the main differences (that I know about) between the old and
new configs (other than hardware):

1. Old config runs DRBD version 0.7.13
New config runs DRBD version 8.0.6

2. Old config runs a single GigE cross-connect between servers for
DRBD traffic.
New config runs a bonded dual-GigE cross-connect between servers for
DRBD traffic.

So, a couple of questions:

1) I have read that DRBD should impose no more than about a 30%
performance penalty on I/O when a dedicated gigabit ethernet
connection is used for the DRBD link. I'm seeing inserts take **7
times longer** to the DRBD device as compared to the local disk. Can
that be right?

2) Why is performance of the new configuration worse than the old
configuration when DRBD is involved (when it is clearly better when
DRBD is *not* involved)? Is DRBD 8.x generally slower than DRBD 0.7.x?
Or might there be something else going on?

In general, I'm a bit at a loss here and would appreciate any input
that might shed some light.

Thanks,

Sam
Lars Ellenberg
17 years ago
Permalink
...
you can try and pin drbd threads (using taskset) as well as the interrupt
handler for your NIC (/proc/interrupts, /proc/irq/<nr>/smp_affinity)
to one dedicated cpu, and see if that changes things.

it used to help in similar weird cases.
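
a sketch of what that looks like (interface name and IRQ number here are
just placeholders, look them up on your box):

grep eth1 /proc/interrupts             # find the IRQ of the replication NIC
echo 1 > /proc/irq/90/smp_affinity     # bind that IRQ to CPU0 (mask 0x01)
taskset -p 0x01 $(pidof drbd0_worker)  # pin a drbd kernel thread to the same CPU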

you can also verify whether the network
bonding introduces additional latency.
...
--
: commercial DRBD/HA support and consulting: sales at linbit.com :
: Lars Ellenberg Tel +43-1-8178292-0 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe http://www.linbit.com :
__
please use the "List-Reply" function of your email client.
Oliver Hookins
17 years ago
Permalink
...
I've also set up a very similar cluster, and we have seen some performance
degradation due to DRBD, but I haven't measured it numerically, and it hasn't
impacted the cluster to the extent that we feel it warrants investigation.
That being said, we know that write performance isn't what we want it to be
but that has largely been attributed to running with 'sync_binlog' turned on
in MySQL. We are planning to mitigate this with a battery backed RAID card
so we can enable write-back caching without fear of data loss.
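
For reference, the my.cnf settings involved look roughly like this (the
values shown are just the fully durable settings, not a tuning
recommendation):

[mysqld]
# fsync the binary log after every transaction
sync_binlog = 1
# fsync the InnoDB log at every commit
innodb_flush_log_at_trx_commit = 1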

In some tests we've performed with write queries we've seen quite significant
performance gains, in the hundreds of percent faster with the write-back
caching.
--
Regards,
Oliver Hookins
Anchor Systems
Art Age Software
17 years ago
Permalink
...
Thanks for your reply. My storage subsystem on both nodes is a PERC
RAID-10 with battery-backed write-back cache enabled. The database is
all InnoDB and sync_binlog is enabled. I was under the impression that
sync_binlog should be enabled regardless of the presence of
battery-backed cache. Is this not the case?

It seems that there is a lot of uncertainty out there regarding how to
best configure MySQL on DRBD. Some sort of "best practices" document
with configuration settings for DRBD and MySQL would be a huge help in
this regard. I realize there are a lot of variables to take into
consideration. But having some solid tested examples to start from
would be quite helpful. Anyone up to the task? :)

Sam

Florian Haas
17 years ago
Permalink
Post by Art Age Software
It seems that there is a lot of uncertainty out there regarding how to
best configure MySQL on DRBD. Some sort of "best practices" document
with configuration settings for DRBD and MySQL would be a huge help in
this regard. I realize there are a lot of variables to take into
consideration. But having some solid tested examples to start from
would be quite helpful. Anyone up to the task? :)
That would be me, I suppose. :-)

About the performance issues you mentioned: the performance tests I've
conducted were run using mysqlslap, and backed up with some lower-level
latency tests, but the results were initially similar. Going back to your old-server numbers:
Post by Art Age Software
Database on Local Storage
insert test: 212.00 sec.
Database on DRBD Device
insert test: 998.00 sec.
----------------------------
DRBD Overhead: 786 sec. = 370%
Eliminate competition for CPU resources between DRBD and MySQL by pinning
DRBD's kernel threads to one CPU core and mysqld to the other, like so
(assuming drbd0 is the device you're running your MySQL databases on):

for thread in worker asender receiver; do
    taskset -p 0x01 $(pidof drbd0_$thread)
done
taskset -p 0x02 $(pidof mysqld)

If you're unfamiliar with taskset, be sure to read its man page to understand
what those CPU affinity masks mean.
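
Running taskset -p with just a PID reports the current mask, which is a quick
way to confirm the pinning took effect (the PID and output here are only an
example):

taskset -p $(pidof mysqld)
# pid 4321's current affinity mask: 2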

Do that, then re-run your tests (on your old server) and share your results.

That tweak alone reduced my DRBD overhead for mysqlslap from 224% to 57% on
the test system I have at my disposal.
Post by Art Age Software
Database on Local Storage
insert test: 164.00 sec. (22% better than old servers)
Database on DRBD Device
insert test: 1137.00 sec. (14% *worse* than old servers)
----------------------------
DRBD Overhead: 973 sec. = 590%
Yup. Given the CPU resource competition issues described earlier, these
are probably being exacerbated by the fact that there are now 8 logical
CPUs (cores) available, versus 2 on your old server. You can do one of two
things here:

1. Pin the DRBD threads to one core, and allocate the others to MySQL.
2. Pin the DRBD threads to one core, and allocate only a few of the others to
MySQL. I've heard some SMP issues are present in InnoDB; more cores doesn't
necessarily mean better performance.
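
For option 1 on a box with 8 logical CPUs, the masks might look like this
(just a sketch; core numbering varies between systems, so check /proc/cpuinfo
before copying it):

# CPU0 (mask 0x01) for the drbd0 kernel threads
for thread in worker asender receiver; do
    taskset -p 0x01 $(pidof drbd0_$thread)
done
# CPUs 1-7 (mask 0xfe) for mysqld
taskset -p 0xfe $(pidof mysqld)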

There are other settings that can be tweaked (I can come down to about 26%
overhead on my system, which is about as low as the network will let me), but
I'd be interested to learn whether you can confirm my results with regard to
CPU affinity tweaking.

Cheers,
Florian
--
: Florian G. Haas
: LINBIT Information Technologies GmbH
: Vivenotgasse 48, A-1120 Vienna, Austria

Please note: when replying, there is no need to CC my
personal address. Replying to the list is fine. Thank you.
Art Age Software
17 years ago
Permalink
...
Thank you very much for the suggestions, Florian. I will experiment
with the CPU affinity today and report my findings back to the list.
Unfortunately, I can only do so on the new servers as the old ones are
currently running in production and I can't risk breaking them. :)

Sam
Art Age Software
17 years ago
Permalink
...
OK, so I have re-run the benchmarks after pinning the DRBD threads to
one core in the first CPU and the mysqld process to 4 cores in the
second CPU like so:

taskset -p 0x80 (pids of drbd threads)
taskset -p 0x0f (pid of mysqld)

The benchmark result improved, but not as dramatically as I had hoped:

Database on DRBD Device
insert test: 938.00 sec.

Should I have expected a more dramatic improvement? What else can I do
to get to the bottom of the poor performance of DRBD in my setup?
Thoughts?

Sam
Todd Denniston
17 years ago
Permalink
...
out of curiosity, does spreading the drbd threads across multiple cores in
the first CPU work better or worse than pinning them all to just one core in
that CPU?
i.e., (_assuming_ CPU0 holds cores 0x80, 0x40, 0x20 & 0x10)
taskset -p 0x80 drbd_pid_1
taskset -p 0x40 drbd_pid_2
taskset -p 0x20 drbd_pid_3
taskset -p 0x10 drbd_pid_4

taskset -p 0x80 drbd_pid_5
taskset -p 0x40 drbd_pid_6
taskset -p 0x20 drbd_pid_7
taskset -p 0x10 drbd_pid_8 ...


and did you try Florian's second suggestion of pinning all of the MySQLs to
_one_core_ in _one_CPU_?
taskset -p 0x02 (pid of mysqld)

That is, would you be kind enough to run two or three more test sets?
--
Todd Denniston
Crane Division, Naval Surface Warfare Center (NSWC Crane)
Harnessing the Power of Technology for the Warfighter
Lars Ellenberg
17 years ago
Permalink
...
please do
one-node# ping -w 10 -f -s 4100 replication-link-ip-of-other-node
and show me the output.
--
: Lars Ellenberg http://www.linbit.com :
: DRBD/HA support and consulting sales at linbit.com :
: LINBIT Information Technologies GmbH Tel +43-1-8178292-0 :
: Vivenotgasse 48, A-1120 Vienna/Europe Fax +43-1-8178292-82 :
__
please use the "List-Reply" function of your email client.
Lars Ellenberg
17 years ago
Permalink
Post by Lars Ellenberg
please do
one-node# ping -w 10 -f -s 4100 replication-link-ip-of-other-node
and show me the output.
also,

1)
drbdadm disconnect your-resource-name
drbd now StandAlone Primary/Unknown
dd if=/dev/zero bs=4096 count=10000 of=/some/file/on/your/drbd oflag=dsync

2)
drbdadm adjust all
wait for the resync
drbd now Connected Primary/Secondary
dd if=/dev/zero bs=4096 count=10000 of=/some/file/on/your/drbd oflag=dsync

3)
dd if=/dev/zero bs=4096 count=10000 of=/some/file/NOT/on/your/drbd oflag=dsync

do each dd command several times,
do this when nothing else happens on the box.

the important part here is the "dsync" flag.
if your dd does not know about that, upgrade.

the dd bs=4096 count=10000 oflag=dsync
(many small requests, each single one of them synchronous by itself)
is to give an idea for the average latency of one single write request,
* with drbd disconnected
* with drbd connected
* without drbd (should be as close as possible to the lower level
device of drbd, preferably on the same hardware)

the "ping -w 10 -f -s 4100" is to give an idea for the
average round-trip time between your nodes.

together, it gives an expectation
of what should theoretically be possible to reach.
--
: Lars Ellenberg http://www.linbit.com :
: DRBD/HA support and consulting sales at linbit.com :
: LINBIT Information Technologies GmbH Tel +43-1-8178292-0 :
: Vivenotgasse 48, A-1120 Vienna/Europe Fax +43-1-8178292-82 :
__
please use the "List-Reply" function of your email client.
Art Age Software
17 years ago
Permalink
...
OK, here are the results:

Test 1: DRBD Disconnected

[node1 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/drbd/tmp/testfile oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 3.14315 seconds, 13.0 MB/s
[node1 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/drbd/tmp/testfile oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 3.05737 seconds, 13.4 MB/s
[node1 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/drbd/tmp/testfile oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 3.08115 seconds, 13.3 MB/s
[node1 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/drbd/tmp/testfile oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 3.17052 seconds, 12.9 MB/s
[node1 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/drbd/tmp/testfile oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 3.0727 seconds, 13.3 MB/s


Test 2: DRBD Connected

[node1 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/drbd/tmp/testfile oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 11.8043 seconds, 3.5 MB/s
[node1 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/drbd/tmp/testfile oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 11.9506 seconds, 3.4 MB/s
[node1 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/drbd/tmp/testfile oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 12.2863 seconds, 3.3 MB/s
[node1 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/drbd/tmp/testfile oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 11.203 seconds, 3.7 MB/s
[node1 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/drbd/tmp/testfile oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 11.212 seconds, 3.7 MB/s


Test 3: Non-DRBD

[node1 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/tmp/testfile oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 3.14307 seconds, 13.0 MB/s
[node1 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/tmp/testfile oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 2.98458 seconds, 13.7 MB/s
[node1 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/tmp/testfile oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 2.95751 seconds, 13.8 MB/s
[node1 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/tmp/testfile oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 2.90936 seconds, 14.1 MB/s
[node1 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/tmp/testfile oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 3.04481 seconds, 13.5 MB/s
Art Age Software
17 years ago
Permalink
Post by Lars Ellenberg
please do
one-node# ping -w 10 -f -s 4100 replication-link-ip-of-other-node
and show me the output.
Lars,

Thanks for your help on this. Here is the output of the ping test.

[node1 ~]$ ping -w 10 -f -s 4100 node2
--- ping statistics ---
46900 packets transmitted, 46899 received, 0% packet loss, time 10000ms
rtt min/avg/max/mdev = 0.159/0.184/20.047/0.154 ms, pipe 2, ipg/ewma
0.213/0.193 ms

[node2 ~]$ ping -w 10 -f -s 4100 node1
--- ping statistics ---
48061 packets transmitted, 48060 received, 0% packet loss, time 10001ms
rtt min/avg/max/mdev = 0.154/0.180/20.333/0.172 ms, pipe 2, ipg/ewma
0.208/0.183 ms
Lars Ellenberg
17 years ago
Permalink
...
you have a very interesting maximum and a huge deviation there.

but, let's use the 0.180 ms average rtt of 4k packets.

averages from the dd commands below are

drbd disconnected: 0.310 ms per 4k request
drbd connected:    1.170 ms per 4k request
non-drbd:          0.300 ms per 4k request

I've also already seen non-drbd be slower than
drbd-unconnected on the same hardware,
there are funny effects in play.
but in your case they are within 3% of each other, which is expected.

however your drbd-connected seems bad.
from ping rtt and non-drbd numbers we'd expect that
latency of drbd connected should be ~ 0.480 ms.
your measurement indicates it is worse than this
expectation by a factor of 2.5.
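
for reference, the arithmetic behind those figures (the times are the
averages of the dd runs quoted above):

# per-request latency = elapsed seconds / number of writes
#   drbd disconnected: ~3.10 s / 10000 = 0.310 ms per 4k write
#   drbd connected:    ~11.7 s / 10000 = 1.170 ms per 4k write
#   non-drbd:          ~3.00 s / 10000 = 0.300 ms per 4k write
# expected connected latency ~= non-drbd latency + 4k rtt
#                             = 0.300 ms + 0.180 ms = 0.480 ms
# measured vs expected: 1.170 / 0.480 ~= 2.4
echo "scale=3; 11.7 * 1000 / 10000" | bc   # prints 1.170 (ms per connected write)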

in all setups I have tuned so far,
the actual (measured) latency of drbd,
and the rough estimate given by said ping and dd commands
are very close.

so I suspect your secondary's ("node2") io subsystem is slower.
please verify.

other than that, pinning of drbd related threads to one CPU,
preferably the same where you pinned the NIC driver irq to,
could help to reduce latency.
...
--
: Lars Ellenberg http://www.linbit.com :
: DRBD/HA support and consulting sales at linbit.com :
: LINBIT Information Technologies GmbH Tel +43-1-8178292-0 :
: Vivenotgasse 48, A-1120 Vienna/Europe Fax +43-1-8178292-82 :
__
please use the "List-Reply" function of your email client.
Art Age Software
17 years ago
Permalink
Lars,

Thanks for looking at this...
you have a very interesting maximum and a huge deviation there.
Does this reflect a problem with my TCP stack? What might be causing
the huge max?
but, let's use the 0.180 ms average rtt of 4k packets.
averages from the dd commands below are
drbd disconnected: 0.310 ms per 4k request
drbd connected 1.170 ms per 4k request
non-drbd 0.300 ms per 4k request
I've also already seen non-drbd be slower than
drbd-unconnected on the same hardware,
there are funny effects in play.
but they are close within 3%, this is expected.
Hmmm, it looks to me like non-drbd is **faster** than
drbd-disconnected from my numbers, which I would expect. Am I
mis-reading?
however your drbd-connected seems bad.
from ping rtt and non-drbd numbers we'd expect that
latency of drbd connected should be ~ 0.480 ms.
your measurement indicates it is worse than this
expectation by a factor of 2.5.
Yes, this is the crux of the problem I am experiencing - now confirmed
with MySQL out of the equation.
in all setups I have tuned so far,
the actual (measured) latency of drbd,
and the rough estimate given by said ping and dd commands
are very close.
so I suspect your secondary's ("node2") io subsystem is slower.
please verify.
The 2 nodes are identical - right down to the io subsystem (identical
RAID-10 hardware with battery-backed write-back cache enabled and
identical model hard drives).
other than that, pinning of drbd related threads to one CPU,
preferably the same where you pinned the NIC driver irq to,
could help to reduce latency.
I have not pinned NIC driver IRQs. (I don't know how.) I have pinned
the DRBD-related threads to a single CPU core and the test results
reflect that configuration.

I'm really at a loss here. Do you have any other suggestions for
getting to the bottom of this?

Should I disable the irqbalance daemon? (I tried it and it seemed to make
no difference).

Should I disable SELinux?

Thanks,

Sam
Lars Ellenberg
17 years ago
Permalink
Post by Art Age Software
Lars,
Thanks for looking at this...
you have a very interesting maximum and a huge deviation there.
Does this reflect a problem with my TCP stack? What might be causing
the huge max?
sorry, my crystal ball is in for cleaning atm.
...
what I meant is that they are close, and that I have seen on various
setups either the exact same values, or one or the other showing less
latency. yes, obviously in _your_ setup, non-drbd appears to have less
latency, as is naively expected anyway. but I wanted to point out that
sometimes you also see the counter-intuitive thing, namely
disconnected drbd showing less latency than the underlying device by
itself. which is where said "funny effects" come into play.
...
did you _measure_ that.
or do you just "know" that.

because we have already had the case where identical hardware showed a factor
of 100 difference in latency. specifically, we just recently had two
DELL MD3000s, one of which was showing 0.7 ms while the other had
70.x ms. and yes, the additional latency was attributed to drbd at
first, as well. only that on the same boxes we also had other storage,
and there it behaved nicely. we have not been able to track down the
cause, but believe that resetting the storage made the effect
"go away".

so do measure.
Post by Art Age Software
Should I disable SELinux?
that is an interesting question.
I have no idea in how far this could affect latency.
--
: Lars Ellenberg Tel +43-1-8178292-55 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe http://www.linbit.com :
Art Age Software
17 years ago
Permalink
...
Got it. Thanks for the clarification.
Post by Lars Ellenberg
Post by Art Age Software
The 2 nodes are identical - right down to the io subsystem (identical
RAID-10 hardware with battery-backed write-back cache enabled and
identical model hard drives).
did you _measure_ that.
or do you just "know" that.
Well, I ran the MySQL benchmarks **without** DRBD on both machines and
got similar results. However, I will re-run the dd tests in the other
direction and report back the results.

Sam
Art Age Software
17 years ago
Permalink
Lars,

Here are the results of the dd test on node2. Results look similar...
...
Test 1: DRBD Disconnected

[node2 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/drbd/tmp/testfile oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 3.03173 seconds, 13.5 MB/s
[node2 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/drbd/tmp/testfile oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 3.00959 seconds, 13.6 MB/s
[node2 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/drbd/tmp/testfile oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 2.99447 seconds, 13.7 MB/s
[node2 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/drbd/tmp/testfile oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 2.9992 seconds, 13.7 MB/s
[node2 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/drbd/tmp/testfile oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 3.0024 seconds, 13.6 MB/s



Test 2: DRBD Connected

[node2 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/drbd/tmp/testfile oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 18.553 seconds, 2.2 MB/s
[node2 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/drbd/tmp/testfile oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 17.0005 seconds, 2.4 MB/s
[node2 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/drbd/tmp/testfile oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 12.2118 seconds, 3.4 MB/s
[node2 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/drbd/tmp/testfile oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 12.1346 seconds, 3.4 MB/s
[node2 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/drbd/tmp/testfile oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 12.0736 seconds, 3.4 MB/s


Test 3: Non-DRBD

[node2 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/tmp/testfile oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 3.14032 seconds, 13.0 MB/s
[node2 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/tmp/testfile oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 3.09048 seconds, 13.3 MB/s
[node2 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/tmp/testfile oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 3.0784 seconds, 13.3 MB/s
[node2 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/tmp/testfile oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 3.08517 seconds, 13.3 MB/s
[node2 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/tmp/testfile oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 3.06288 seconds, 13.4 MB/s
Art Age Software
17 years ago
Permalink
I have run some additional tests:

1) Disabled bonding on the network interfaces (both nodes). No
significant change.

2) Changed the DRBD communication interface. Was using a direct
crossover connection between the on-board NICs of the servers. I
switched to Intel Gigabit NIC cards in both machines, connecting
through a Gigabit switch. No significant change.

3) Ran a file copy from node1 to node2 via scp. Even with the
additional overhead of scp, I get a solid 65 MB/sec. throughput.

So, at this stage I have seemingly ruled out:

1) Slow IO subsystem (both machines measured and check out fine).

2) Bonding driver (additional latency)

3) On-board NICs (hardware/firmware problem)

4) Network copy speed.

What's left? I'm stumped as to why DRBD can only do about 3.5 MB/sec.
on this very fast hardware.

Sam
Matteo Tescione
17 years ago
Permalink
Sorry if already asked, but are you using hardware raid or software raid? If
so, is it raid 5/6? I discovered a huge performance hole, much like the one you
report, with that kind of setup. Search the list for previous posts about how
the performance problem was solved.

Regards,

--
#Matteo Tescione
#RMnet srl
...
Art Age Software
17 years ago
Permalink
Hardware RAID-10. There is no problem with the disks. We have measured
raw I/O performance through the RAID on both nodes.
...
Matteo Tescione
17 years ago
Permalink
Ok, can you show the output of iostat -x -m 1 during your drbd test case
and, if possible, on your raw raid subsystem?
Look at svctm, await and %util; they all help you investigate further.
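
A sketch of how to capture that while the dd test runs (the log path and the
60-second duration are arbitrary):

# log extended device stats once per second in the background
iostat -x -m 1 60 > /tmp/iostat-during-dd.log &
dd if=/dev/zero bs=4096 count=10000 of=/drbd/tmp/testfile oflag=dsync
wait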
Regards,
--matteo
...
Lars Ellenberg
17 years ago
Permalink
Post by Art Age Software
1) Disabled bonding on the network interfaces (both nodes). No
significant change.
2) Changed the DRBD communication interface. Was using a direct
crossover connection between the on-board NICs of the servers. I
switched to Intel Gigabit NIC cards in both machines, connecting
through a Gigabit switch. No significant change.
3) Ran a file copy from node1 to node2 via scp. Even with the
additional overhead of scp, I get a solid 65 MB/sec. throughput.
this is streaming.
completely different than what we measured below.
Post by Art Age Software
1) Slow IO subsystem (both machines measured and check out fine).
2) Bonding driver (additional latency)
3) On-board NICs (hardware/firmware problem)
4) Network copy speed.
What's left? I'm stumped as to why DRBD can only do about 3.5 MB/sec.
on this very fast hardware.
doing one-by-one synchronous 4k writes, which are latency bound.
if you do streaming writes, it probably gets back up to your 65 MB/sec.
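
an easy way to check the streaming case on the drbd device itself
(path and sizes are just placeholders; conv=fdatasync forces a flush at the
end so the reported rate reflects real disk plus replication throughput):

dd if=/dev/zero bs=1M count=1000 of=/drbd/tmp/streamtest conv=fdatasync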

sorry, latency tuning can be complex,
and is not easily covered by "general advice".
--
: Lars Ellenberg http://www.linbit.com :
: DRBD/HA support and consulting sales at linbit.com :
: LINBIT Information Technologies GmbH Tel +43-1-8178292-0 :
: Vivenotgasse 48, A-1120 Vienna/Europe Fax +43-1-8178292-82 :
__
please use the "List-Reply" function of your email client.
Art Age Software
17 years ago
Permalink
...
Ok, but we have tested that with and without DRBD using the dd command,
right? So at this point, by all tests performed so far, it looks like
DRBD is the bottleneck. What other tests can I perform that can say
otherwise?
Post by Lars Ellenberg
sorry, latency tuning can be complex,
and is not easily covered by "general advice".
Lars Ellenberg
17 years ago
Permalink
...
sure.
but comparing 3.5 (with drbd) against 13.5 (without drbd) is bad enough,
no need to now compare it with some streaming number (65) to make it
look _really_ bad ;-)
--
: Lars Ellenberg Tel +43-1-8178292-55 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe http://www.linbit.com :
Art Age Software
17 years ago
Permalink
...
Sorry, my intent was not to make DRBD look bad. I think DRBD is
**fantastic** and I just want to get it working properly. My point in
trying the streaming test was simply to make sure that there was
nothing totally broken on the network side. I suppose I should also
try a streaming test to the DRBD device and compare that to the raw
streaming number. And, back to my last question: What other tests can
I perform at this point to narrow down the source of the (latency?)
problem?
Carlos Xavier
17 years ago
Permalink
Hi,
I have been following this thread since I want to do a very similar
configuration.

The system is running on Dell SC1435 servers, each one with 2 dual-core AMD
Opterons and 4GB of RAM.
the network cards are:
01:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5721 Gigabit
Ethernet PCI Express (rev 21)
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5721 Gigabit
Ethernet PCI Express (rev 21)
06:00.0 Ethernet controller: Intel Corporation 82572EI Gigabit Ethernet
Controller (Copper) (rev 06)

Right now it is running OCFS2 over DRBD and we don't have the MySQL database
on it yet. I ran the commands to see the write throughput on the
disk. As you can see below, when DRBD is up and connected the
throughput falls to a little less than half of the value we got with it
disconnected.

DRBD and OCFS2 cluster connected

***@apolo1:~# dd if=/dev/zero bs=4096 count=10000 of=/clusterdisk/testfile
oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 3.89017 s, 10.5 MB/s


DRBD connected and OCFS2 remote disconnected
***@apolo1:~# dd if=/dev/zero bs=4096 count=10000 of=/clusterdisk/testfile
oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 3.65195 s, 11.2 MB/s

DRBD remote stopped and OCFS2 local mounted
***@apolo1:~# dd if=/dev/zero bs=4096 count=10000 of=/clusterdisk/testfile
oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 1.50187 s, 27.3 MB/s

Regards,
Carlos.


----- Original Message -----
From: "Art Age Software" <artagesw-***@public.gmane.org>
To: <drbd-user-63ez5xqkn6DQT0dZR+***@public.gmane.org>
Sent: Thursday, December 20, 2007 7:35 PM
Subject: Re: [DRBD-user] MySQL-over-DRBD Performance
...
Art Age Software
17 years ago
Permalink
Well, at least you are getting much better performance than I am getting.

I don't understand why even my local write performance is so much
worse than yours. What sort of disk subsystem are you using?
...
Carlos Xavier
17 years ago
Permalink
I'm sorry for the big delay in answering, I was on vacation.

I have 2 clusters running, one with Dell PowerEdge SC 1435 servers and the
BCM5785 controller, and another on Dell PowerEdge 1900 servers with the SAS1068
controller. On both systems the disks used are SATA disks (WDC WD2500JS).


----- Original Message -----
From: "Art Age Software" <artagesw-***@public.gmane.org>
To: <drbd-user-63ez5xqkn6DQT0dZR+***@public.gmane.org>
Sent: Friday, December 21, 2007 6:05 PM
Subject: Re: [DRBD-user] MySQL-over-DRBD Performance
...
Carlos Xavier
17 years ago
Permalink
Hi Sam,
I'm sorry for being late once again, I'm still on vacation.

----- Original Message -----
From: "Art Age Software" <artagesw-***@public.gmane.org>
To: "Carlos Xavier" <cbastos-y1ricOmiHYbtqW+***@public.gmane.org>
Sent: Friday, January 25, 2008 6:28 PM
Subject: Re: [DRBD-user] MySQL-over-DRBD Performance
Hi Carlos,
Do your Dell PowerEdge SC 1435 servers use the Dell-supplied SAS 5/iR
adapter? That is what I am using, and my benchmark results are
abysmal.
The system does not have any optional controller. The controller in use is
provided by the chipset of the mainboard.
The controller is a Broadcom BCM5785 [HT1000]
The following results are direct to disk (no DRBD).
# dd if=/dev/zero bs=4096 count=10000 of=/tmp/testfile
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 0.178766 seconds, 229 MB/s
# dd if=/dev/zero bs=4096 count=10000 of=/tmp/testfile oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 275.086 seconds, 149 kB/s
What do you make of that?
I can't reproduce your tests right now, since the system is in production, but
soon I'll be able to. We are migrating the system.
Thanks,
Sam
Regards,
Carlos.
...
Ben Clewett
17 years ago
Permalink
Post by Carlos Xavier
Hi Sam,
I'm sorry for being late once again, I'm still on vacation.
Sent: Friday, January 25, 2008 6:28 PM
Subject: Re: [DRBD-user] MySQL-over-DRBD Performance
Hi Carlos,
Do your Dell PowerEdge SC 1435 servers use the Dell-supplied SAS 5/iR
adapter? That is what I am using, and my benchmark results are
abysmal.
The system does not have any optional controller. The controller in use is
provided by the chipset of the mainboard.
The controller is a Broadcom BCM5785 [HT1000]
Hi Carlos,

Myself and other people on this mailing list have reported problems
with Broadcom NICs. The solution in my case was to upgrade the firmware
*and* the drivers to the latest available, which fixed this problem for me.

I believe some Dell users also removed the TOC jumper on their NICs to
achieve the same result.

Let us know...

Ben
...
Art Age Software
17 years ago
Permalink
...
This is a bit over my head - but I will look into it.
Post by Lars Ellenberg
you can also verify whether the network bonding introduces additional latency.
Yes, I was planning to investigate this further. But since performance
of the resync was fine, it did not seem that bonding was causing any
issues.
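
A simple check is to run the same flood ping over the bond and over a single
link and compare the round-trip times (interface names and addresses here are
just placeholders):

ping -w 10 -f -s 4100 -I bond0 peer-ip-on-bond
ping -w 10 -f -s 4100 -I eth2 peer-ip-on-direct-link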

What write performance overhead should I expect from DRBD in a
configuration like mine? (And thanks for the suggestions.)

Sam