Discussion:
Kernel panic on 6.1: init dies under load
Dan Cross
2017-05-15 14:28:42 UTC
Permalink
Synopsis: init dies causing kernel panic on virtualized hosts.
Category: system
System : OpenBSD 6.1
Details : OpenBSD 6.1 (GENERIC) #6: Sat May 6 09:33:26 CEST 2017
***@syspatch-61-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC

Architecture: OpenBSD.amd64
Machine : amd64
Description:
Kernel panics under moderate/heavy load when running under a
hypervisor (I believe my VPS provider is using Xen); init(8)
dies and the machine panics. `boot sync` does not work and
the filesystem requires manual fsck on reboot.

I have not seen this on hardware.

Console data from the panic is as follows:

: tempest; cat panic
coredump of syslogd(94574), write failed: errno 14
coredump of init(1), write failed: errno 14
panic: init died (signal 10, exit 0)
Stopped at Debugger+0x9: leave
TID PID UID PRFLAGS PFLAGS CPU COMMAND
*285197 1 0 0x802 0x2000 0 init
Debugger() at Debugger+0x9
panic() at panic+0xfe
exit1() at exit1+0x58d
trapsignal() at trapsignal+0x110
trap() at trap+0x309
--- trap (number 4) ---
end of kernel
end trace frame: 0xff, count: 10
0x18057281cfdc
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports. Insufficient info makes it difficult to find and fix bugs.
ddb>
: tempest;
How-To-Repeat:
Run some CPU/memory intensive workload; for example, rebuilding
the Go compiler and toolchain. Occasionally the system will survive,
but gets into a state where processes are dying.
Fix:
Unknown.


dmesg:
OpenBSD 6.1 (GENERIC) #6: Sat May 6 09:33:26 CEST 2017
***@syspatch-61-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
real mem = 520093696 (496MB)
avail mem = 499785728 (476MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xeb01f (10 entries)
bios0: vendor Xen version "3.4.4" date 07/15/2016
bios0: Xen HVM domU
acpi0 at bios0: rev 2
acpi0: sleep states S3 S4 S5
acpi0: tables DSDT FACP APIC
acpi0: wakeup devices
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
ioapic0 at mainbus0: apid 1 pa 0xfec00000, version 11, 48 pins
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Xeon(R) CPU L5640 @ 2.27GHz, 2267.15 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,SSSE3,CX16,SSE4.1,SSE4.2,POPCNT,HV,NXE,LONG,LAHF
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
acpiprt0 at acpi0: bus 0 (PCI0)
acpicpu0 at acpi0: C1(@1 halt!)
"PNP0F13" at acpi0 not configured
"PNP0303" at acpi0 not configured
"PNP0700" at acpi0 not configured
"PNP0501" at acpi0 not configured
"PNP0400" at acpi0 not configured
pvbus0 at mainbus0: Xen 3.4
xen0 at pvbus0: features 0x5, 32 grant table frames, event channel 2
"vkbd" at xen0: device/vkbd/0 not configured
"vfb" at xen0: device/vfb/0 not configured
xbf0 at xen0 backend 0 channel 4: disk
scsibus1 at xbf0: 2 targets
sd0 at scsibus1 targ 0 lun 0: <Xen, file hda 768, 0000> SCSI3 0/direct fixed
sd0: 20480MB, 512 bytes/sector, 41943040 sectors
xnf0 at xen0 backend 0 channel 5: address 00:16:3e:15:9a:43
xnf1 at xen0 backend 0 channel 6: address 00:16:3e:48:5b:04
"console" at xen0: device/console/0 not configured
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel 82441FX" rev 0x02
pcib0 at pci0 dev 1 function 0 "Intel 82371SB ISA" rev 0x00
pciide0 at pci0 dev 1 function 1 "Intel 82371SB IDE" rev 0x00: DMA, channel 0 wired to compatibility, channel 1 wired to compatibility
pciide0: channel 0 disabled (no drives)
pciide0: channel 1 disabled (no drives)
piixpm0 at pci0 dev 1 function 3 "Intel 82371AB Power" rev 0x01: SMBus disabled
vga1 at pci0 dev 2 function 0 "Cirrus Logic CL-GD5446" rev 0x00
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
xspd0 at pci0 dev 3 function 0 "XenSource Platform Device" rev 0x01: apic 1 int 28
isa0 at pcib0
isadma0 at isa0
fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
fd0 at fdc0 drive 0: density unknown
fd1 at fdc0 drive 1: density unknown
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
wsmouse0 at pms0 mux 0
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
lpt0 at isa0 port 0x378/4 irq 7
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
root on sd0a (e0bfc277bba6b729.a) swap on sd0b dump on sd0b

usbdevs:
usbdevs: no USB controllers found
Mike Belopuhov
2017-05-15 15:01:44 UTC
Permalink
Hi,

Thanks for reporting this, however there's not enough info to follow
up on this right now. What is clear is that your provider is using
an ancient version of Xen that doesn't even support the callback
vector interrupt delivery (the emulated xspd0 device is delivering
all interrupts). We have developed code for Xen 4.5+ platforms and
there was only some testing done by users on 3.x. So, in a way, you
can consider Xen 3.x to not be officially supported at this point.

Having said that, I've got a few questions:

- Do you see other write failures as well?

- Do you have swap enabled? (pstat -s)

- Do you see crashes when bsd.mp is used instead of a single processor
kernel (that's right, even on the single processor VM)?

Regards,
Mike
Dan Cross
2017-05-15 15:18:16 UTC
Permalink
Post by Mike Belopuhov
Thanks for reporting this, however there's not enough info to follow
up on this right now. What is clear is that your provider is using
an ancient version of Xen that doesn't even support the callback
vector interrupt delivery (the emulated xspd0 device is delivering
all interrupts). We have developed code for Xen 4.5+ platforms and
there was only some testing done by users on 3.x. So, in a way, you
can consider Xen 3.x to not be officially supported at this point.
That's unfortunate. Sadly, this is common across two different providers
(Panix and rootbsd.net). The latter, I'm sure, would at least be interested
in coordinating with you guys to get a fix. I'll open a trouble ticket with
them.
Post by Mike Belopuhov
- Do you see other write failures as well?
Yes. E.g., syslogd had a similar write failure before the panic.

Post by Mike Belopuhov
- Do you have swap enabled? (pstat -s)
Yes; a gig:

: jaan; pstat -s
Device 1K-blocks Used Avail Capacity Priority
/dev/sd0b 1048249 0 1048249 0% 0
: jaan;

Post by Mike Belopuhov
- Do you see crashes when bsd.mp is used instead of a single processor
kernel (that's right, even on the single processor VM)?
Yes; the panic happens whether using single- or multi-processor kernels.

- Dan C.


Mike Belopuhov
2017-05-15 15:28:40 UTC
Permalink
Post by Dan Cross
Post by Mike Belopuhov
Thanks for reporting this, however there's not enough info to follow
up on this right now. What is clear is that your provider is using
an ancient version of Xen that doesn't even support the callback
vector interrupt delivery (the emulated xspd0 device is delivering
all interrupts). We have developed code for Xen 4.5+ platforms and
there was only some testing done by users on 3.x. So, in a way, you
can consider Xen 3.x to not be officially supported at this point.
That's unfortunate. Sadly, this is common across two different providers
(Panix and rootbsd.net). The latter, I'm sure, would at least be interested
in coordinating with you guys to get a fix. I'll open a trouble ticket with
them.
Post by Mike Belopuhov
- Do you see other write failures as well?
Yes. E.g, syslogd had a similar write failure before panic.
Can you reproduce any of these write failures at will?

What happens when you just send a signal to dump the core?
You can test this by running "sleep 100", and then call
"pkill -ABRT -lf sleep".
Post by Dan Cross
- Do you have swap enabled? (pstat -s)
: jaan; pstat -s
Device 1K-blocks Used Avail Capacity Priority
/dev/sd0b 1048249 0 1048249 0% 0
: jaan;
Do you see swap being used under your load?
Post by Dan Cross
- Do you see crashes when bsd.mp is used instead of a single processor
kernel (that's right, even on the single processor VM)?
Yes; the panic happens whether using single- or multi-processor kernels.
Good, nothing has slipped through those cracks again.
Dan Cross
2017-05-15 15:45:58 UTC
Permalink
Post by Dan Cross
Post by Dan Cross
Post by Mike Belopuhov
Thanks for reporting this, however there's not enough info to follow
up on this right now. What is clear is that your provider is using
an ancient version of Xen that doesn't even support the callback
vector interrupt delivery (the emulated xspd0 device is delivering
all interrupts). We have developed code for Xen 4.5+ platforms and
there was only some testing done by users on 3.x. So, in a way, you
can consider Xen 3.x to not be officially supported at this point.
That's unfortunate. Sadly, this is common across two different providers
(Panix and rootbsd.net). The latter, I'm sure, would at least be interested
in coordinating with you guys to get a fix. I'll open a trouble ticket with
them.
Post by Mike Belopuhov
- Do you see other write failures as well?
Yes. E.g, syslogd had a similar write failure before panic.
Can you reproduce any of these write failures at will?
I'm not sure what you mean. If I induce the load conditions, then the VM
will panic fairly reliably.

Post by Mike Belopuhov
What happens when you just send a signal to dump the core?
You can test this by running "sleep 100", and then call
"pkill -ABRT -lf sleep".
I'm not sure what this shows, but sure I can do that:

: jaan; /bin/sleep 100&
[1] 20701
: jaan; pkill -ABRT -lf sleep
20701 sleep
: jaan;
[1] + abort (core dumped) /bin/sleep 100
: jaan; ls -l sleep.core
-rw------- 1 cross staff 4208416 May 15 15:42 sleep.core
: jaan;

The panic-inducing condition seems to be that, for whatever reason, the
kernel gets into a funny state where processes like init(8) die due to
having part of their VM image corrupted; the kernel then panics because
`init` dies.
Post by Dan Cross
- Do you have swap enabled? (pstat -s)
Post by Dan Cross
: jaan; pstat -s
Device 1K-blocks Used Avail Capacity Priority
/dev/sd0b 1048249 0 1048249 0% 0
: jaan;
Do you see swap being used under your load?
I'm not sure. I can try to crash a machine again and poke at a kernel
variable from ddb to see; anything in particular you want me to look at?
Post by Dan Cross
- Do you see crashes when bsd.mp is used instead of a single processor
kernel (that's right, even on the single processor VM)?
Yes; the panic happens whether using single- or multi-processor kernels.
Good, nothing has slipped through those cracks again.
I can see the value in narrowing down the search space. :-)

- Dan C.
Mike Belopuhov
2017-05-15 17:24:18 UTC
Permalink
Post by Dan Cross
Post by Dan Cross
Post by Dan Cross
Post by Mike Belopuhov
Thanks for reporting this, however there's not enough info to follow
up on this right now. What is clear is that your provider is using
an ancient version of Xen that doesn't even support the callback
vector interrupt delivery (the emulated xspd0 device is delivering
all interrupts). We have developed code for Xen 4.5+ platforms and
there was only some testing done by users on 3.x. So, in a way, you
can consider Xen 3.x to not be officially supported at this point.
That's unfortunate. Sadly, this is common across two different providers
(Panix and rootbsd.net). The latter, I'm sure, would at least be interested
in coordinating with you guys to get a fix. I'll open a trouble ticket with
them.
Post by Mike Belopuhov
- Do you see other write failures as well?
Yes. E.g, syslogd had a similar write failure before panic.
Can you reproduce any of these write failures at will?
I'm not sure what you mean. If I induce the load conditions, then the VM
will panic fairly reliably.
I was wondering if you have seen any other write errors apart
from those that cause the panic.
Post by Dan Cross
What happens when you just send a signal to dump the core?
You can test this by running "sleep 100", and then call
"pkill -ABRT -lf sleep".
There are quite a number of different I/O codepaths in the
kernel, and some are wonkier than others.
Post by Dan Cross
: jaan; /bin/sleep 100&
[1] 20701
: jaan; pkill -ABRT -lf sleep
20701 sleep
: jaan;
[1] + abort (core dumped) /bin/sleep 100
: jaan; ls -l sleep.core
-rw------- 1 cross staff 4208416 May 15 15:42 sleep.core
: jaan;
The panic-inducing condition seems to be that, for whatever reason, the
kernel gets into a funny state where processes like init(8) die due to
having part of their VM image corrupted; the kernel then panics because
`init` dies.
Post by Dan Cross
- Do you have swap enabled? (pstat -s)
Post by Dan Cross
: jaan; pstat -s
Device 1K-blocks Used Avail Capacity Priority
/dev/sd0b 1048249 0 1048249 0% 0
: jaan;
Do you see swap being used under your load?
I'm not sure. I can try to crash a machine again and poke at a kernel
variable from ddb to see; anything in particular you want me to look at?
Indeed. You can run a "show uvmexp" DDB command.

Please try running with the diff below. It will log all polled
and bounced transfers as well as some additional info.



diff --git sys/dev/pv/xbf.c sys/dev/pv/xbf.c
index d5c44770acb..29e7615d0fc 100644
--- sys/dev/pv/xbf.c
+++ sys/dev/pv/xbf.c
@@ -36,11 +36,11 @@
#include <scsi/scsi_all.h>
#include <scsi/cd.h>
#include <scsi/scsi_disk.h>
#include <scsi/scsiconf.h>

-/* #define XBF_DEBUG */
+#define XBF_DEBUG

#ifdef XBF_DEBUG
#define DPRINTF(x...) printf(x)
#else
#define DPRINTF(x...)
@@ -478,10 +478,11 @@ xbf_load_xs(struct scsi_xfer *xs, int desc)
sge->sge_first = i > 0 ? 0 :
((vaddr_t)xs->data & PAGE_MASK) >> XBF_SEC_SHIFT;
sge->sge_last = sge->sge_first +
(map->dm_segs[i].ds_len >> XBF_SEC_SHIFT) - 1;

+ if (ISSET(xs->flags, SCSI_POLL))
DPRINTF("%s: seg %d/%d ref %lu len %lu first %u last %u\n",
sc->sc_dev.dv_xname, i + 1, map->dm_nsegs,
map->dm_segs[i].ds_addr, map->dm_segs[i].ds_len,
sge->sge_first, sge->sge_last);

@@ -640,10 +641,11 @@ xbf_submit_cmd(struct scsi_xfer *xs)
xrd->xrd_req.req_op = operation;
xrd->xrd_req.req_unit = (uint16_t)sc->sc_unit;
xrd->xrd_req.req_sector = lba;

if (operation == XBF_OP_READ || operation == XBF_OP_WRITE) {
+ if (ISSET(xs->flags, SCSI_POLL))
DPRINTF("%s: desc %d %s%s lba %llu nsec %u len %d\n",
sc->sc_dev.dv_xname, desc, operation == XBF_OP_READ ?
"read" : "write", ISSET(xs->flags, SCSI_POLL) ? "-poll" :
"", lba, nblk, xs->datalen);

@@ -718,10 +720,11 @@ xbf_complete_cmd(struct scsi_xfer *xs, int desc)
BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
bus_dmamap_unload(sc->sc_dmat, map);

sc->sc_xs[desc] = NULL;

+ if (ISSET(xs->flags, SCSI_POLL))
DPRINTF("%s: completing desc %d(%llu) op %u with error %d\n",
sc->sc_dev.dv_xname, desc, xrd->xrd_rsp.rsp_id,
xrd->xrd_rsp.rsp_op, xrd->xrd_rsp.rsp_status);

id = xrd->xrd_rsp.rsp_id;
Mike Belopuhov
2017-05-18 22:20:48 UTC
Permalink
Post by Mike Belopuhov
Indeed. You can run a "show uvmexp" DDB command.
Please try running with the diff below. It will log all polled
and bounced transfers as well as some additional info.
Hi,

While I'm still interested in the "show uvmexp" output, I'd like
to ask you to hold off on testing this diff. I've identified
a few issues and am working on resolving them.

Cheers,
Mike
Dan Cross
2017-05-19 00:48:55 UTC
Permalink
Thanks; sorry I've been sidetracked the last couple of days. Let me see if
I can get a machine to panic and grab the "show uvmexp" output.
Dan Cross
2017-05-19 01:15:02 UTC
Permalink
Okay, here is the output. I apologize for the screen shot; there's no other
particularly great way to capture the console output from the VPS and I
don't trust myself to type it all in without making a mistake of some kind.
Mike Belopuhov
2017-05-19 19:58:53 UTC
Permalink
Post by Dan Cross
Okay, here is the output. I apologize for the screen shot; there's no other
particularly great way to capture the console output from the VPS and I
don't trust myself to type it all in without making a mistake of some kind.
That's OK, I can see that there's quite some swapping going on.
I haven't finished investigating yet, but the first thing I've
noticed is that FFS read-ahead issues 64k read requests. xbf(4)
cannot handle more than 45056 at a time so it fails the request.
This might be causing some serious problems.
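
To make the numbers concrete: 45056 is 11 * 4096, which appears to line up
with the Xen block interface's limit of 11 page-sized segments per request.
The following is a minimal standalone sketch, not kernel code, of how much
of a 64k read is left over when a device caps a single transfer at that
size. All function and macro names here are hypothetical and exist only
for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constants for illustration; the values come from this thread. */
#define SKETCH_READAHEAD 65536u	/* 64k read issued by FFS read-ahead */
#define SKETCH_XBF_LIMIT 45056u	/* largest single xbf(4) transfer: 11 * 4096 */

/* Bytes that fit into one request when the device caps transfers at limit. */
size_t
sketch_xfer_len(size_t datalen, size_t limit)
{
	assert(limit > 0);
	return (datalen < limit ? datalen : limit);
}

/* Bytes left untransferred (the residual) after one capped request. */
size_t
sketch_resid(size_t datalen, size_t limit)
{
	return (datalen - sketch_xfer_len(datalen, limit));
}
```

With the values above, sketch_resid(SKETCH_READAHEAD, SKETCH_XBF_LIMIT)
comes to 20480 bytes, while a 16k request fits entirely and leaves nothing
behind.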

Unfortunately, it turned out that our SCSI and VFS layers don't
implement proper handling of short reads (b_resid is ignored
on clustered reads by the buffercache and SCSI doesn't do
anything about it either), so I took a stab at getting it
working.

For now, the most appropriate way I've found to solve this is
to invalidate the read-ahead portion of a cluster read: when FFS asks
for a block, e.g. 16k, bread_cluster creates an array of bufs for
a MAXPHYS worth of I/O sliced into chunks of the block size (e.g.
16k). Then, after the I/O is done, we can walk the array from the
last chunk back toward the first and throw away all chunks that
correspond to failed I/O. For example, if b_resid is 20480 and we
were using 16k chunks, then we have to invalidate the last two
bufs (32k).
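
The counting rule in that example can be sketched in isolation. This is
only an illustration under stated assumptions, not the kernel code from the
diff: the function name is hypothetical, every chunk is assumed to have the
same size, and chunk 0 (the block that was actually requested) is never
counted as read-ahead.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: given a cluster read of nchunks equally sized
 * chunks and resid bytes missing from its tail, return how many
 * trailing read-ahead chunks have to be invalidated.
 */
size_t
sketch_chunks_to_invalidate(size_t nchunks, size_t chunksize, size_t resid)
{
	size_t n;

	assert(nchunks >= 1);
	assert(chunksize > 0);

	/* Round up: a chunk that was only partially filled is unusable too. */
	n = (resid + chunksize - 1) / chunksize;

	/* Chunk 0 is the requested block, so at most nchunks - 1 qualify. */
	if (n > nchunks - 1)
		n = nchunks - 1;
	return (n);
}
```

For a 64k cluster of four 16k chunks and a residual of 20480 bytes, this
gives 2, matching the example above: one chunk is lost completely and a
second one only partially, so both must go.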

Unfortunately, there's a major problem that this diff doesn't
solve: the case where we've read even less than what we were initially
asked for (excluding all of the read-ahead blocks). This is because the
biodone for xbpp[0], aka "the bp", is done from sd_buf_done
directly *before* we can do buf_fix_mapping and restore its
intended bp->b_bcount. In other words, when sd_buf_done calls
biodone, you cannot correlate b_bcount and b_resid and mark the
buffer B_INVAL because you don't know its intended length.

This is not a final version, but as I won't get back to it
before Monday, I wanted to post it for a wider audience.


diff --git sys/kern/vfs_bio.c sys/kern/vfs_bio.c
index 95bc80bc0e6..1cc1943d752 100644
--- sys/kern/vfs_bio.c
+++ sys/kern/vfs_bio.c
@@ -534,11 +534,29 @@ bread_cluster_callback(struct buf *bp)
*/
buf_fix_mapping(bp, newsize);
bp->b_bcount = newsize;
}

- for (i = 1; xbpp[i] != 0; i++) {
+ /* Invalidate read-ahead buffers if read short */
+ if (bp->b_resid > 0) {
+ for (i = 0; xbpp[i] != NULL; i++)
+ continue;
+ for (i = i - 1; i != 0; i--) {
+ if (xbpp[i]->b_bufsize <= bp->b_resid) {
+ bp->b_resid -= xbpp[i]->b_bufsize;
+ SET(xbpp[i]->b_flags, B_INVAL);
+ } else if (bp->b_resid > 0) {
+ bp->b_resid = 0;
+ SET(xbpp[i]->b_flags, B_INVAL);
+ } else
+ break;
+ }
+ if (bp->b_resid > 0)
+ printf("short read %ld\n", bp->b_resid);
+ }
+
+ for (i = 1; xbpp[i] != NULL; i++) {
if (ISSET(bp->b_flags, B_ERROR))
SET(xbpp[i]->b_flags, B_INVAL | B_ERROR);
biodone(xbpp[i]);
}

@@ -605,11 +623,11 @@ bread_cluster(struct vnode *vp, daddr_t blkno, int size, struct buf **rbpp)
}
}

bp = xbpp[0];

- xbpp[howmany] = 0;
+ xbpp[howmany] = NULL;

inc = btodb(size);

for (i = 1; i < howmany; i++) {
bcstats.pendingreads++;
diff --git sys/dev/pv/xbf.c sys/dev/pv/xbf.c
index d5c44770acb..9a94e3dc48f 100644
--- sys/dev/pv/xbf.c
+++ sys/dev/pv/xbf.c
@@ -448,29 +448,32 @@ xbf_load_xs(struct scsi_xfer *xs, int desc)
struct xbf_softc *sc = xs->sc_link->adapter_softc;
struct xbf_sge *sge;
union xbf_ring_desc *xrd;
bus_dmamap_t map;
int i, error, mapflags;
+ bus_size_t datalen;

xrd = &sc->sc_xr->xr_desc[desc];
map = sc->sc_xs_map[desc];

+ datalen = MIN(xs->datalen, sc->sc_maxphys);
+
mapflags = (sc->sc_domid << 16);
if (ISSET(xs->flags, SCSI_NOSLEEP))
mapflags |= BUS_DMA_NOWAIT;
else
mapflags |= BUS_DMA_WAITOK;
if (ISSET(xs->flags, SCSI_DATA_IN))
mapflags |= BUS_DMA_READ;
else
mapflags |= BUS_DMA_WRITE;

- error = bus_dmamap_load(sc->sc_dmat, map, xs->data, xs->datalen,
+ error = bus_dmamap_load(sc->sc_dmat, map, xs->data, datalen,
NULL, mapflags);
if (error) {
- DPRINTF("%s: failed to load %d bytes of data\n",
- sc->sc_dev.dv_xname, xs->datalen);
+ DPRINTF("%s: failed to load %ld bytes of data\n",
+ sc->sc_dev.dv_xname, datalen);
return (error);
}

for (i = 0; i < map->dm_nsegs; i++) {
sge = &xrd->xrd_req.req_sgl[i];
@@ -726,11 +729,11 @@ xbf_complete_cmd(struct scsi_xfer *xs, int desc)

id = xrd->xrd_rsp.rsp_id;
memset(xrd, 0, sizeof(*xrd));
xrd->xrd_req.req_id = id;

- xs->resid = 0;
+ xs->resid = xs->datalen - MIN(xs->datalen, sc->sc_maxphys);

xbf_reclaim_xs(xs, desc);
xbf_scsi_done(xs, error);
}
Mike Belopuhov
2017-05-24 23:37:31 UTC
Permalink
Post by Dan Cross
Thanks for the patch; I just got a few minutes today and I applied it,
rebuilt and installed the kernel and rebooted. Sadly, I get a similar
panic. Attached is a screenshot of console output. Note that, 'boot sync'
from ddb hangs forever.
- Dan C.
That's OK. I've discovered more problems related to 64k transfers.
The reason we didn't notice anything bad when aborting sleep
is that sleep has a small memory footprint; if you dump
core of a larger (> 64k) program, you'd notice the issue because
the core dump routine, like some other places in the kernel, assumes
that 64k transfers always work.

I've attempted to attack this problem from a different angle:
ensure that xbf(4) can handle 64k transfers. Solutions to this
problem are notoriously messy and complicated, and so far this
one is no exception. Today I got to the point where the system
boots multiuser, but I couldn't test further. I've noticed, however,
that "boot dump" from ddb still crashes, so I know it's not 100%
right just yet. Since I won't get around to doing anything
about this until early next week, I'd appreciate a quick test
if possible.
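
As a rough picture of what "handling a 64k transfer" can involve when a
single request is capped at a smaller size, here is a generic standalone
sketch that cuts one transfer into device-sized pieces. It is an assumption
made for illustration and not necessarily what the linked diff does; the
function name and its interface are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only: cut one transfer of
 * total bytes into pieces of at most limit bytes each. The piece
 * lengths are written to lens[], which has room for maxpieces entries.
 * Returns the number of pieces, or 0 if total is 0 or lens[] is too
 * small to hold them all.
 */
size_t
sketch_split_xfer(size_t total, size_t limit, size_t *lens, size_t maxpieces)
{
	size_t n = 0;

	assert(limit > 0);
	while (total > 0) {
		size_t len = total < limit ? total : limit;

		if (n == maxpieces)
			return (0);	/* caller's array is too small */
		lens[n++] = len;
		total -= len;
	}
	return (n);
}
```

With the sizes discussed in this thread, a 65536-byte transfer and a
45056-byte limit yield two pieces of 45056 and 20480 bytes.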

I'm not attaching the diff since it's rather large:

http://gir.theapt.org/~mike/xbf.diff

Cheers,
Mike
Dan Cross
2017-05-27 01:33:47 UTC
Permalink
Thanks for this latest patch; it seems to help. At least, I was able to put
a fairly significant amount of load on the machine without a panic. I'll
try to load it up more and see where we get, but so far this is positive.
Mike Belopuhov
2017-06-03 17:03:22 UTC
Hi Dan,

That's good news, thanks for testing! I've updated the diff
slightly. Unfortunately I couldn't figure out what's causing
"boot dump" to crash. I've exercised the coredump, physio and
read-ahead codepaths. I'll commit the diff next week unless
there are reports of breakage.

The diff is available from the same location as previously:
http://gir.theapt.org/~mike/xbf.diff

Thanks for testing!
Mike Belopuhov
2017-06-06 21:26:55 UTC
Hi,

I've checked in a slightly amended version of the diff.

Regards,
Mike
Dan Cross
2017-06-07 16:03:52 UTC
Wonderful, Mike. I just rebuilt kernels and am running some largish jobs
and everything seems to be surviving. Thanks for turning around a fix so
quickly!