Discussion:
Drbd 0.7.12 - Buffer I/O error on device drbd0
Poyner, Brandon
2005-08-25 20:33:16 UTC
I began the installation of a new pair of servers today, so I downloaded
drbd 0.7.12 for them. These servers were fully functional under drbd
0.6.x last week; I just wiped them and installed RHEL 4 AS. Everything
was OK for the first hour or so, but now I'm getting this error from the
kernel. Ideas?

Aug 25 15:05:27 artemis kernel: drbd: initialised. Version: 0.7.12 (api:77/proto:74)
Aug 25 15:05:27 artemis kernel: drbd: SVN Revision: 1924 build by root-UY1Nr1VP/mSsycaad8f+***@public.gmane.org, 2005-08-25 14:04:46
Aug 25 15:05:27 artemis kernel: drbd: registered as block device major 147
Aug 25 15:05:33 artemis kernel: drbd0: resync bitmap: bits=7831552 words=244736
Aug 25 15:05:33 artemis kernel: drbd0: size = 29 GB (31326208 KB)
Aug 25 15:05:34 artemis kernel: drbd0: 0 KB marked out-of-sync by on disk bit-map.
Aug 25 15:05:34 artemis kernel: drbd0: No usable activity log found.
Aug 25 15:05:34 artemis kernel: drbd0: drbdsetup [27251]: cstate Unconfigured --> StandAlone
Aug 25 15:05:34 artemis kernel: drbd0: drbdsetup [27264]: cstate StandAlone --> Unconnected
Aug 25 15:05:34 artemis kernel: drbd0: drbd0_receiver [27265]: cstate Unconnected --> WFConnection
Aug 25 15:05:37 artemis kernel: drbd0: drbd0_receiver [27265]: cstate WFConnection --> WFReportParams
Aug 25 15:05:37 artemis kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
Aug 25 15:05:37 artemis kernel: drbd0: Connection established.
Aug 25 15:05:37 artemis kernel: drbd0: I am(S): 1:00000002:00000001:00000002:00000001:01
Aug 25 15:05:37 artemis kernel: drbd0: Peer(S): 1:00000002:00000001:00000001:00000001:01
Aug 25 15:05:37 artemis kernel: drbd0: drbd0_receiver [27265]: cstate WFReportParams --> WFBitMapS
Aug 25 15:05:37 artemis kernel: drbd0: Secondary/Unknown --> Secondary/Secondary
Aug 25 15:05:37 artemis kernel: drbd0: drbd0_receiver [27265]: cstate WFBitMapS --> SyncSource
Aug 25 15:05:37 artemis kernel: drbd0: Resync started as SyncSource (need to sync 0 KB [0 bits set]).
Aug 25 15:05:37 artemis kernel: drbd0: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
Aug 25 15:05:37 artemis kernel: drbd0: drbd0_receiver [27265]: cstate SyncSource --> Connected
Aug 25 15:08:43 artemis kernel: drbd0: Secondary/Secondary --> Secondary/Primary
Aug 25 16:18:53 artemis kernel: drbd0: Not in Primary state, no IO requests allowed
Aug 25 16:18:53 artemis kernel: Buffer I/O error on device drbd0, logical block 0
Aug 25 16:18:53 artemis kernel: drbd0: Not in Primary state, no IO requests allowed
Aug 25 16:18:53 artemis kernel: Buffer I/O error on device drbd0, logical block 1
Aug 25 16:18:53 artemis kernel: drbd0: Not in Primary state, no IO requests allowed
Aug 25 16:18:53 artemis kernel: Buffer I/O error on device drbd0, logical block 2
Aug 25 16:18:53 artemis kernel: drbd0: Not in Primary state, no IO requests allowed
Aug 25 16:18:53 artemis kernel: Buffer I/O error on device drbd0, logical block 3
Aug 25 16:18:53 artemis kernel: drbd0: Not in Primary state, no IO requests allowed
Aug 25 16:18:53 artemis kernel: Buffer I/O error on device drbd0, logical block 4
Aug 25 16:18:53 artemis kernel: Buffer I/O error on device drbd0, logical block 5
Aug 25 16:18:53 artemis kernel: Buffer I/O error on device drbd0, logical block 6
Aug 25 16:18:53 artemis kernel: Buffer I/O error on device drbd0, logical block 7
Aug 25 16:18:53 artemis kernel: Buffer I/O error on device drbd0, logical block 8
Aug 25 16:18:53 artemis kernel: Buffer I/O error on device drbd0, logical block 9
Aug 25 16:18:53 artemis kernel: Buffer I/O error on device drbd0, logical block 10
Aug 25 16:18:53 artemis kernel: Buffer I/O error on device drbd0, logical block 11
Aug 25 16:18:53 artemis kernel: Buffer I/O error on device drbd0, logical block 12
Aug 25 16:18:53 artemis kernel: Buffer I/O error on device drbd0, logical block 13
Aug 25 16:18:53 artemis kernel: Buffer I/O error on device drbd0, logical block 14
Aug 25 16:18:53 artemis kernel: Buffer I/O error on device drbd0, logical block 15
Aug 25 16:18:53 artemis kernel: Buffer I/O error on device drbd0, logical block 0
Aug 25 16:18:53 artemis kernel: Buffer I/O error on device drbd0, logical block 1
Aug 25 16:18:53 artemis kernel: Buffer I/O error on device drbd0, logical block 2
Aug 25 16:18:53 artemis kernel: Buffer I/O error on device drbd0, logical block 3
Aug 25 16:18:53 artemis kernel: Buffer I/O error on device drbd0, logical block 4
Aug 25 16:18:53 artemis kernel: Buffer I/O error on device drbd0, logical block 5
Aug 25 16:18:53 artemis kernel: Buffer I/O error on device drbd0, logical block 6
Aug 25 16:18:53 artemis kernel: Buffer I/O error on device drbd0, logical block 7
Aug 25 16:18:53 artemis kernel: Buffer I/O error on device drbd0, logical block 8
Aug 25 16:18:53 artemis kernel: Buffer I/O error on device drbd0, logical block 9
Aug 25 16:18:53 artemis kernel: Buffer I/O error on device drbd0, logical block 10
Aug 25 16:18:53 artemis kernel: Buffer I/O error on device drbd0, logical block 11
Aug 25 16:18:53 artemis kernel: Buffer I/O error on device drbd0, logical block 12
Aug 25 16:18:53 artemis kernel: Buffer I/O error on device drbd0, logical block 13
Aug 25 16:18:53 artemis kernel: Buffer I/O error on device drbd0, logical block 14
Aug 25 16:18:53 artemis kernel: Buffer I/O error on device drbd0, logical block 15
Aug 25 16:18:53 artemis kernel: Buffer I/O error on device drbd0, logical block 0
Aug 25 16:18:53 artemis kernel: Buffer I/O error on device drbd0, logical block 0


global {
    minor-count 3;
    dialog-refresh 10;
}

resource web {
    protocol C;
    incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";
    startup {
        degr-wfc-timeout 120;    # 2 minutes.
    }
    disk {
        on-io-error panic;
    }
    net {
        on-disconnect reconnect;
    }
    syncer {
        rate 100M;
        group 1;
        al-extents 257;
    }
    on apollo.ccac.edu {
        device /dev/drbd0;
        disk /dev/sys/web;
        address 192.168.9.81:7788;
        meta-disk internal;
    }
    on artemis.ccac.edu {
        device /dev/drbd0;
        disk /dev/sys/web;
        address 192.168.9.82:7788;
        meta-disk internal;
    }
}


Brandon Poyner
Network Engineer III
CCAC - College Office
412-237-3086
Poyner, Brandon
2005-08-25 20:38:19 UTC
Actually, this might be related to patching the system's lvm2. The
underlying device is an LVM partition. I added drbd to the lvm2 filter.
Would this be a good assumption?





Lars Ellenberg
2005-08-26 09:59:26 UTC
/ 2005-08-25 16:33:16 -0400
Post by Poyner, Brandon
I began the installation of a new pair of servers today so I downloaded
drbd 0.7.12 for them. These servers were fully functional under drbd
0.6.x last week, I just wiped them and installed RHEL 4 AS. Everything
was ok for the first hour or so, but now I'm getting this error from the
kernel. Ideas?
Aug 25 15:08:43 artemis kernel: drbd0: Secondary/Secondary --> Secondary/Primary
Aug 25 16:18:53 artemis kernel: drbd0: Not in Primary state, no IO requests allowed
Aug 25 16:18:53 artemis kernel: Buffer I/O error on device drbd0, logical block 0
what else can we do than give error messages in plain english?
--
: Lars Ellenberg Tel +43-1-8178292-0 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe http://www.linbit.com :
__
please use the "List-Reply" function of your email client.
Poyner, Brandon
2005-08-26 13:12:13 UTC
Post by Lars Ellenberg
Post by Poyner, Brandon
Aug 25 15:08:43 artemis kernel: drbd0: Secondary/Secondary --> Secondary/Primary
Aug 25 16:18:53 artemis kernel: drbd0: Not in Primary state, no IO requests allowed
Aug 25 16:18:53 artemis kernel: Buffer I/O error on device drbd0, logical block 0
Post by Lars Ellenberg
what else can we do than give error messages in plain english?
Fair enough, but _I_ wasn't requesting I/O on the drbd device. I didn't
attempt to mount the device, read it with dd, snapshot it, or otherwise
mess with it.

It appears that when a kernel RPM is installed, RHEL's %post script runs
/sbin/new-kernel-pkg, which runs 'echo "showlabels" | /sbin/nash
--force --quiet', and that makes the kernel throw those error messages.
I'm not sure why it throws that error, as that command displays the uuid
cache.

It's not exactly intuitive where the error came from, especially if
you're using RHN and RHEL is patching itself. A solution is to add a
line to lvm.conf that excludes drbd devices, for example:

filter = [ "r/drbd/", "a/.*/" ]
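For anyone unsure how such a filter line is read: LVM tries the patterns in order, the first one that matches a device path decides whether the device is accepted ("a") or rejected ("r"), and a path that matches nothing is accepted. Below is a rough Python model of that rule, as a sketch only. The function names are made up for illustration, Python's regex flavor is assumed to be close enough to LVM's for these simple patterns, and the filter only governs LVM's own tools.

```python
import re

def parse_filter(patterns):
    """Turn LVM-style filter strings such as "r/drbd/" into (accept, regex) pairs.

    Hypothetical helper: the first character is the action (a = accept,
    r = reject), the second is the delimiter that must also end the string.
    """
    rules = []
    for pat in patterns:
        if len(pat) < 3 or pat[0] not in ("a", "r") or pat[-1] != pat[1]:
            raise ValueError(f"malformed filter pattern: {pat!r}")
        rules.append((pat[0] == "a", re.compile(pat[2:-1])))
    return rules

def accepted(path, rules):
    """Return True if the device path would be scanned under these rules."""
    for accept, regex in rules:
        if regex.search(path):   # unanchored match anywhere in the path
            return accept        # first matching pattern wins
    return True                  # no pattern matched: accepted by default

if __name__ == "__main__":
    rules = parse_filter(["r/drbd/", "a/.*/"])
    for dev in ("/dev/drbd0", "/dev/sys/web", "/dev/sda1"):
        print(dev, "accepted" if accepted(dev, rules) else "rejected")
```

Order matters in this model: with the two patterns swapped, "a/.*/" matches first and /dev/drbd0 is no longer rejected.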

Poyner, Brandon
2005-08-26 14:05:03 UTC
Post by Poyner, Brandon
filter = [ "r/drbd/", "a/.*/" ]
Actually that doesn't prevent the kernel error when running 'echo
showlabels | nash'. It only stops errors from lvm commands such as
vgscan.

To me I/O can mean read or write. Could the error message be made a bit
more clear that a read request is being made? It's worth panicking if I
see a drbd write error on the secondary. Perhaps something like this.
Thanks.

===================================================================
RCS file: drbd_req.c,v
retrieving revision 1.1
diff -c -r1.1 drbd_req.c
*** drbd_req.c 2005/08/26 13:52:47 1.1
--- drbd_req.c 2005/08/26 13:55:41
***************
*** 196,203 ****
          if (mdev->state != Primary &&
                  ( !disable_bd_claim || rw == WRITE ) ) {
                  if (DRBD_ratelimit(5*HZ,5)) {
!                         ERR("Not in Primary state, no %s requests allowed\n",
!                             disable_bd_claim ? "WRITE" : "IO");
                  }
                  drbd_bio_IO_error(bio);
                  return 0;
--- 196,203 ----
          if (mdev->state != Primary &&
                  ( !disable_bd_claim || rw == WRITE ) ) {
                  if (DRBD_ratelimit(5*HZ,5)) {
!                         ERR("Not in Primary state, %s request not allowed\n",
!                             (rw == WRITE) ? "WRITE" : "READ");
                  }
                  drbd_bio_IO_error(bio);
                  return 0;
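
To see which requests the check in this hunk rejects, and how the two message variants would label them, the condition can be modeled in user space. This is a rough Python sketch, not driver code: the function names are made up for illustration, and disable_bd_claim is treated as an opaque boolean because the thread does not explain what it does.

```python
from itertools import product

def rejected(is_primary, disable_bd_claim, is_write):
    """Mirror of: state != Primary && (!disable_bd_claim || rw == WRITE)."""
    return (not is_primary) and ((not disable_bd_claim) or is_write)

def current_label(disable_bd_claim):
    """Request kind named by the existing message ("WRITE" or "IO")."""
    return "WRITE" if disable_bd_claim else "IO"

def proposed_label(is_write):
    """Request kind named by the proposed message ("WRITE" or "READ")."""
    return "WRITE" if is_write else "READ"

def table():
    """List every rejected request on a non-Primary device with both labels."""
    rows = []
    for claim_off, is_write in product((False, True), repeat=2):
        if rejected(False, claim_off, is_write):
            rows.append((claim_off, is_write,
                         current_label(claim_off), proposed_label(is_write)))
    return rows

if __name__ == "__main__":
    print("disable_bd_claim  is_write  current  proposed")
    for claim_off, is_write, cur, new in table():
        print(f"{claim_off!s:16}  {is_write!s:8}  {cur:7}  {new}")
```

Read this way, the existing message picks its wording from disable_bd_claim alone, while the proposed one picks it from the direction of the request that was actually refused.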


Lars Ellenberg
2005-08-26 14:18:20 UTC
/ 2005-08-26 10:05:03 -0400
Post by Poyner, Brandon
Post by Poyner, Brandon
filter = [ "r/drbd/", "a/.*/" ]
Actually that doesn't prevent the kernel error when running 'echo
showlabels | nash'. It only stops errors from lvm commands such as
vgscan.
To me I/O can mean read or write. Could the error message be made a bit
more clear that a read request is being made? It's worth panicking if I
see a drbd write error on the secondary. Perhaps something like this.
definitely not. IO means read _and_ write. both are not allowed.

if you see this message, it is some application trying to access
/dev/drbd while it is in secondary state.

this message is never logged for drbd internal requests, those go to the
lower level device, and if such io fails, depending on your config,
a detach or panic will follow... you for sure will notice.
Poyner, Brandon
2005-08-26 14:44:33 UTC
Post by Lars Ellenberg
Post by Poyner, Brandon
To me I/O can mean read or write. Could the error message
be made a bit more clear that a read request is being made?
It's worth panicking if I see a drbd write error on the
secondary. Perhaps something like this.
definitely not. IO means read _and_ write. both are not allowed.
Unless I'm mistaken an I/O request can either be read or
write to a single device with single channel access, but not
both read and write _at the same time_. Are we talking about
completely different things? All I'm really saying is that
logging 'no IO requests allowed' doesn't tell me if the I/O
request was read or write. Many system utilities open the
device read only to try to probe it, and that's not such a
worry when done read only. As you say both modes aren't
allowed and that's fine, but the difference seems significant to me.
Post by Lars Ellenberg
if you see this message, it is some application trying to access
/dev/drbd while it is in secondary state.
this message is never logged for drbd internal requests,
those go to the lower level device, and if such io fails,
depending on your config, a detach or panic will follow...
you for sure will notice.
I follow you here.

Lars Ellenberg
2005-08-26 15:17:07 UTC
/ 2005-08-26 10:44:33 -0400
Post by Poyner, Brandon
Post by Lars Ellenberg
Post by Poyner, Brandon
To me I/O can mean read or write. Could the error message
be made a bit more clear that a read request is being made?
It's worth panicking if I see a drbd write error on the
secondary. Perhaps something like this.
definitely not. IO means read _and_ write. both are not allowed.
All I'm really saying is that
logging 'no IO requests allowed' doesn't tell me if the I/O
request was read or write.
and it does not matter.
Post by Poyner, Brandon
Many system utilities open the
device read only to try to probe it, and that's not such a
worry when done read only. As you say both modes aren't
allowed and that's fine, but the difference seems significant to me.
well, if someone tries to write to the secondary,
he gets an io error, and the message is "no io allowed".
if someone tries to read from the secondary,
he gets an io error, and the message is "no io allowed".
where is the problem?

I can make the message read
"neither read nor write requests allowed while in secondary mode"

if you are worried about failing writes,
watch out for kernel messages like
"lost page write due to I/O error on <devicename>"
and similar...

ideally, if not primary, we'd disallow on open (/dev/drbd),
in which case the read could not be requested in the first place...

but since the configuration is done by ioctl on the device node,
we have to allow at least a readonly open even on an unconfigured device,
otherwise we could not configure it (or make a secondary primary or...).

configuring a block device by ioctl via the device node itself is broken
by design. we know that. we apologize. it is there for historic reasons.

we plan to move to the configfs interface,
once that proves suitable and is included in the mainline kernel.
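
The distinction between "open is allowed" and "read is refused" can be observed from user space. The following Python sketch reports separately whether a read-only open and a first read succeed. The helper name is made up for illustration and /dev/drbd0 is only the device name used in this thread; on a Secondary the open would be expected to succeed and the read to fail with an I/O error, as described above.

```python
import errno
import os
import sys

def probe(path, nbytes=1):
    """Open path read-only and try to read nbytes, reporting which step failed.

    Returns ("open failed", errno name), ("read failed", errno name)
    or ("read ok", number of bytes read).
    """
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        return ("open failed", errno.errorcode.get(exc.errno, str(exc.errno)))
    try:
        data = os.read(fd, nbytes)
    except OSError as exc:
        return ("read failed", errno.errorcode.get(exc.errno, str(exc.errno)))
    finally:
        os.close(fd)  # always release the descriptor, even after a failed read
    return ("read ok", len(data))

if __name__ == "__main__":
    # Device path is an assumption; pass another one on the command line.
    target = sys.argv[1] if len(sys.argv) > 1 else "/dev/drbd0"
    print(target, probe(target))
```

This is the same kind of access that a read-only probing utility performs, so running it against a Secondary should also trigger the kernel message discussed here.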
--
: Lars Ellenberg Tel +43-1-8178292-0 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe http://www.linbit.com :
__
please use the "List-Reply" function of your email client.
Poyner, Brandon
2005-08-26 15:39:26 UTC
Post by Lars Ellenberg
well, if someone tries to write to the secondary,
he gets an io error, and the message is "no io allowed".
if someone tries to read from the secondary,
he gets an io error, and the message is "no io allowed".
where is the problem?
The problem is the assumption that I know what was going on. It's one
thing if I were trying to mount the device; then I could directly see
the relation between my commands and the error. It's another matter when
that error shows up overnight for no obvious reason. Normally I'd be
left wondering 1) what tried to access the device, 2) was the IO request
a read or a write, and 3) is this a serious error? I don't think the
drbd module would have direct access to #1 (yes/no?). It should know the
answer to #2, and #3 depends on #1 and #2. Even if I only had access to
#2 I could make a more informed decision.
Post by Lars Ellenberg
I can make the message read
"neither read nor write requests allowed while in secondary mode"
if you are worried about failing writes,
watch out for kernel messages like
"lost page write due to I/O error on <devicename>"
and similar...
Ok, that's helpful, but I stand by my assertion that clarifying the IO
type would help me figure out the gravity of the situation.

Lars Ellenberg
2005-08-26 15:53:43 UTC
/ 2005-08-26 11:39:26 -0400
Post by Poyner, Brandon
Post by Lars Ellenberg
well, if someone tries to write to the secondary,
he gets an io error, and the message is "no io allowed".
if someone tries to read from the secondary,
he gets an io error, and the message is "no io allowed".
where is the problem?
The problem is the assumption that I know what was going on. It's one
thing if I were trying to mount the device, I could directly see the
relation between my commands and the error. It's another matter when
that error shows up overnight for no obvious reason. Normally I'd be
left wondering 1) what tried to access the device 2) was the IO request
read or write 3) and is this a serious error? I don't think the drbd
module would have direct access to #1 (yes/no?). It should know the
answer to #2, and #3 depends on #1 and #2. If I only had access to #2 I
could make a more informed decision.
#1: we could log the current comm and pid, if that helps.
#2: see below.
Post by Poyner, Brandon
Post by Lars Ellenberg
I can make the message read
"neither read nor write requests allowed while in secondary mode"
if you are worried about failing writes,
watch out for kernel messages like
"lost page write due to I/O error on <devicename>"
and similar...
Ok, that's helpful, but I stand by my assertion that clarifying the IO
type would help me figure out the gravity of the situation.
as a secondary cannot be opened with write intent,
write requests cannot be submitted.
--
: Lars Ellenberg Tel +43-1-8178292-0 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe http://www.linbit.com :
__
please use the "List-Reply" function of your email client.
Poyner, Brandon
2005-08-26 15:56:10 UTC
Post by Lars Ellenberg
1) what tried to access the device 2) was the IO request read or
write
Post by Lars Ellenberg
#1: we could log the current comm and pid, if that helps.
#2: see below.
That would be perfect. With everything else you explained, that would
do nicely.
Post by Lars Ellenberg
Ok, that's helpful, but I stand by my assertion that
clarifying the IO
type would help me figure out the gravity of the situation.
as a secondary cannot be opened with write intent,
write requests cannot be submitted.
Lars Ellenberg
2005-08-26 16:34:20 UTC
/ 2005-08-26 11:56:10 -0400
Post by Poyner, Brandon
Post by Lars Ellenberg
1) what tried to access the device 2) was the IO request read or
write
Post by Lars Ellenberg
#1: we could log the current comm and pid, if that helps.
#2: see below.
That would be perfect. With everything else you explained, that would
do nicely.
Index: drbd_req.c
===================================================================
--- drbd_req.c (Revision 1934)
+++ drbd_req.c (Arbeitskopie)
@@ -196,8 +196,11 @@
         if (mdev->state != Primary &&
                 ( !disable_bd_claim || rw == WRITE ) ) {
                 if (DRBD_ratelimit(5*HZ,5)) {
-                        ERR("Not in Primary state, no %s requests allowed\n",
-                            disable_bd_claim ? "WRITE" : "IO");
+                        ERR("Not in Primary state, no %s requests allowed "
+                            "(%s[%u]; %s)\n",
+                            disable_bd_claim ? "WRITE" : "IO",
+                            current->comm, current->pid,
+                            (rw & WRITE) ? "WRITE" : "READ" );
                 }
                 drbd_bio_IO_error(bio);
                 return 0;


output now looks like
head -c1 < /dev/drbd0
drbd0: Not in Primary state, no IO requests allowed (head[927]; READ)

dd if=/dev/drbd0 bs=1 count=1
drbd0: Not in Primary state, no IO requests allowed (dd[928]; READ)
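
With the process name, pid and direction in the message, the offending accessor can be pulled out of a log mechanically. A small Python sketch, assuming the message keeps exactly the format shown above; the function name and the returned field names are made up for illustration.

```python
import re

# Matches the extended message: "... requests allowed (comm[pid]; READ|WRITE)".
PATTERN = re.compile(
    r"(?P<device>drbd\d+): Not in Primary state, "
    r"no (?P<kind>\w+) requests allowed "
    r"\((?P<comm>.+)\[(?P<pid>\d+)\]; (?P<direction>READ|WRITE)\)"
)

def parse_rejection(line):
    """Return a dict describing a rejected request, or None if no match.

    Lines in the older format (without the parenthesised suffix) do not match.
    """
    match = PATTERN.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["pid"] = int(info["pid"])
    return info

if __name__ == "__main__":
    samples = [
        "drbd0: Not in Primary state, no IO requests allowed (head[927]; READ)",
        "drbd0: Not in Primary state, no IO requests allowed (dd[928]; READ)",
        "drbd0: Not in Primary state, no IO requests allowed",
    ]
    for sample in samples:
        print(parse_rejection(sample))
```

Because the pattern is searched rather than anchored, it also works on full syslog lines that carry a timestamp and host prefix.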

cheers,