Discussion:
System hang when connecting sockets+setsockopt on same port in a loop
o***@ethnao.no-ip.biz
2017-05-31 08:11:45 UTC
Permalink
Synopsis: An unpriviledged user can hang the system with socket+setsockopt+connect in a loop
Category: system user kernel amd64 i386
System : OpenBSD 6.1
Details : OpenBSD 6.1-current (GENERIC.MP) #78: Wed May 24 06:44:15 MDT 2017
***@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC.MP

Architecture: OpenBSD.i386
Machine : i386
The following code seems to trigger something weird.
I've tested on slow i386, faster amd64, all bare metal, on a VMM. Everytime the system freeze after a few iterations.
No log, no trace, no panic, no ping from outside. I don't know what happen.
$ cat tcp_connect.c [7/46]
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

#include <err.h>
#include <string.h>

int
main()
{
struct addrinfo hints, *res;
int error;
int s;
int i;

memset(&hints, 0, sizeof(hints));
hints.ai_family = AF_UNSPEC;
hints.ai_socktype = SOCK_STREAM;
error = getaddrinfo("localhost", "25", &hints, &res);
if (error)
errx(1, "%s", gai_strerror(error));

for (i = 0 ; i <1024 ; i++) {
s = socket(res->ai_family, res->ai_socktype,res->ai_protocol);
if (s == -1)
errx(1,"socket");
int enable = 1;
if (setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &enable, sizeof(int)) < 0)
errx(1, "setsockopt");
if (connect(s, res->ai_addr, res->ai_addrlen) == -1)
errx(1,"connect");
}
return 0;
}

$ gcc tcp_connect.c
$ while true ; do ./a.out ; done
a.out: socket
a.out: socket
packet_write_wait: Connection to 10.0.0.2 port 22: Broken pipe
Nope.


dmesg:
OpenBSD 6.1-current (GENERIC.MP) #78: Wed May 24 06:44:15 MDT 2017
***@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC.MP
cpu0: Intel(R) Atom(TM) CPU N270 @ 1.60GHz ("GenuineIntel" 686-class) 1.61 GHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,NXE,SSE3,DTES64,MWAIT,DS-CPL,EST,TM2,SSSE3,xTPR,PDCM,MOVBE,LAHF,PERF,SENSOR
real mem = 1064452096 (1015MB)
avail mem = 1031258112 (983MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: date 01/04/10, BIOS32 rev. 0 @ 0xf0010, SMBIOS rev. 2.5 @ 0xf06b0 (32 entries)
bios0: vendor American Megatrends Inc. version "1302" date 01/04/2010
bios0: ASUSTeK Computer INC. B202
acpi0 at bios0: rev 2
acpi0: sleep states S0 S1 S3 S4 S5
acpi0: tables DSDT FACP APIC MCFG OEMB HPET SSDT
acpi0: wakeup devices P0P2(S4) P0P1(S4) MC97(S4) P0P4(S4) P0P5(S4) P0P7(S4) P0P8(S4) P0P9(S4) USB0(S4) USB1(S4) EUSB(S4) P0P6(S4) HDAC(S4) USB2(S4) USB3(S4)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 134MHz
cpu0: mwait min=64, max=64, C-substates=0.2.2.0.2, IBE
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Intel(R) Atom(TM) CPU N270 @ 1.60GHz ("GenuineIntel" 686-class) 1.61 GHz
cpu1: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,NXE,SSE3,DTES64,MWAIT,DS-CPL,EST,TM2,SSSE3,xTPR,PDCM,MOVBE,LAHF,PERF,SENSOR
ioapic0 at mainbus0: apid 2 pa 0xfec00000, version 20, 24 pins
acpimcfg0 at acpi0 addr 0xe0000000, bus 0-255
acpihpet0 at acpi0: 14318179 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 4 (P0P1)
acpiprt2 at acpi0: bus 3 (P0P4)
acpiprt3 at acpi0: bus 2 (P0P5)
acpiprt4 at acpi0: bus 1 (P0P6)
acpicpu0 at acpi0: C1(@1 halt!), PSS
acpicpu1 at acpi0: C1(@1 halt!), PSS
aibs0 at acpi0
aibs0: TSIF not found
aibs0: FSIF not found
aibs0: VSIF not found
aibs0: no sensors found
acpibtn0 at acpi0: PWRB
acpivideo0 at acpi0: VGA_
bios0: ROM list: 0xc0000/0xec00!
cpu0: Enhanced SpeedStep 1609 MHz: speeds: 1600, 1333, 1067, 800 MHz
pci0 at mainbus0 bus 0: configuration mode 1 (bios)
pchb0 at pci0 dev 0 function 0 "Intel 82945GME Host" rev 0x03
inteldrm0 at pci0 dev 2 function 0 "Intel 82945GME Video" rev 0x03
drm0 at inteldrm0
intagp0 at inteldrm0
agp0 at intagp0: aperture at 0xd0000000, size 0x10000000
inteldrm0: apic 2 int 16
inteldrm0: 1024x768, 32bpp
wsdisplay0 at inteldrm0 mux 1: console (std, vt100 emulation)
wsdisplay0: screen 1-5 added (std, vt100 emulation)
"Intel 82945GM Video" rev 0x03 at pci0 dev 2 function 1 not configured
azalia0 at pci0 dev 27 function 0 "Intel 82801GB HD Audio" rev 0x02: msi
azalia0: codecs: Realtek ALC888
audio0 at azalia0
ppb0 at pci0 dev 28 function 0 "Intel 82801GB PCIE" rev 0x02: apic 2 int 16
pci1 at ppb0 bus 3
re0 at pci1 dev 0 function 0 "Realtek 8168" rev 0x02: RTL8168C/8111C (0x3c00), msi, address 00:22:15:d5:1d:4f
rgephy0 at re0 phy 7: RTL8169S/8110S/8211 PHY, rev. 2
ppb1 at pci0 dev 28 function 1 "Intel 82801GB PCIE" rev 0x02: apic 2 int 17
pci2 at ppb1 bus 2
ral0 at pci2 dev 0 function 0 "Ralink RT2790" rev 0x00: apic 2 int 17, address 00:22:43:24:92:32
ral0: MAC/BBP RT2872 (rev 0x0200), RF RT2720 (MIMO 1T2R)
ppb2 at pci0 dev 28 function 2 "Intel 82801GB PCIE" rev 0x02: apic 2 int 18
pci3 at ppb2 bus 1
uhci0 at pci0 dev 29 function 0 "Intel 82801GB USB" rev 0x02: apic 2 int 23
uhci1 at pci0 dev 29 function 1 "Intel 82801GB USB" rev 0x02: apic 2 int 19
uhci2 at pci0 dev 29 function 2 "Intel 82801GB USB" rev 0x02: apic 2 int 21
uhci3 at pci0 dev 29 function 3 "Intel 82801GB USB" rev 0x02: apic 2 int 22
ehci0 at pci0 dev 29 function 7 "Intel 82801GB USB" rev 0x02: apic 2 int 23
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 configuration 1 interface 0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
ppb3 at pci0 dev 30 function 0 "Intel 82801BAM Hub-to-PCI" rev 0xe2
pci4 at ppb3 bus 4
ichpcib0 at pci0 dev 31 function 0 "Intel 82801GBM LPC" rev 0x02: PM disabled
pciide0 at pci0 dev 31 function 2 "Intel 82801GBM SATA" rev 0x02: DMA, channel 0 wired to compatibility, channel 1 wired to compatibility
wd0 at pciide0 channel 0 drive 0: <SAMSUNG HM080JI>
wd0: 16-sector PIO, LBA48, 76319MB, 156301488 sectors
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 6
ichiic0 at pci0 dev 31 function 3 "Intel 82801GB SMBus" rev 0x02: apic 2 int 19
iic0 at ichiic0
spdmem0 at iic0 addr 0x50: 512MB DDR2 SDRAM non-parity PC2-5300CL5 SO-DIMM
spdmem1 at iic0 addr 0x51: 512MB DDR2 SDRAM non-parity PC2-5300CL5 SO-DIMM
usb1 at uhci0: USB revision 1.0
uhub1 at usb1 configuration 1 interface 0 "Intel UHCI root hub" rev 1.00/1.00 addr 1
usb2 at uhci1: USB revision 1.0
uhub2 at usb2 configuration 1 interface 0 "Intel UHCI root hub" rev 1.00/1.00 addr 1
usb3 at uhci2: USB revision 1.0
uhub3 at usb3 configuration 1 interface 0 "Intel UHCI root hub" rev 1.00/1.00 addr 1
usb4 at uhci3: USB revision 1.0
uhub4 at usb4 configuration 1 interface 0 "Intel UHCI root hub" rev 1.00/1.00 addr 1
isa0 at ichpcib0
isadma0 at isa0
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
wbsio0 at isa0 port 0x2e/2: W83627DHG rev 0x23
lm1 at wbsio0 port 0x290/8: W83627DHG
npx0 at isa0 port 0xf0/16: reported by CPUID; using exception 16
umass0 at uhub0 port 6 configuration 1 interface 0 "Generic USB2.0-CRW" rev 2.00/58.87 addr 2
umass0: using SCSI over Bulk-Only
scsibus1 at umass0: 2 targets, initiator 0
sd0 at scsibus1 targ 1 lun 0: <Generic-, Multi-Card, 1.00> SCSI0 0/direct removable serial.0bda0158114173400000
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
root on wd0a (8cd5b95cfb345a5a.a) swap on wd0b dump on wd0b

usbdevs:
Controller /dev/usb0:
addr 1: high speed, self powered, config 1, EHCI root hub(0x0000), Intel(0x8086), rev 1.00
port 1 powered
port 2 powered
port 3 powered
port 4 powered
port 5 powered
port 6 addr 2: high speed, power 500 mA, config 1, USB2.0-CRW(0x0158), Generic(0x0bda), rev 58.87, iSerialNumber 20071114173400000
port 7 powered
port 8 powered
Controller /dev/usb1:
addr 1: full speed, self powered, config 1, UHCI root hub(0x0000), Intel(0x8086), rev 1.00
port 1 powered
port 2 powered
Controller /dev/usb2:
addr 1: full speed, self powered, config 1, UHCI root hub(0x0000), Intel(0x8086), rev 1.00
port 1 powered
port 2 powered
Controller /dev/usb3:
addr 1: full speed, self powered, config 1, UHCI root hub(0x0000), Intel(0x8086), rev 1.00
port 1 powered
port 2 powered
Controller /dev/usb4:
addr 1: full speed, self powered, config 1, UHCI root hub(0x0000), Intel(0x8086), rev 1.00
port 1 powered
port 2 powered
Alexander Bluhm
2017-05-31 12:03:53 UTC
Permalink
Post by o***@ethnao.no-ip.biz
I've tested on slow i386, faster amd64, all bare metal, on a VMM. Everytime the system freeze after a few iterations.
No log, no trace, no panic, no ping from outside. I don't know what happen.
I cannot reproduce a hang with your programm.
Post by o***@ethnao.no-ip.biz
OpenBSD 6.1-current (GENERIC.MP) #78: Wed May 24 06:44:15 MDT 2017
Can you try a newer kernel? I have used

kern.version=OpenBSD 6.1-current (GENERIC.MP) #83: Mon May 29 18:42:49 MDT 2017
***@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC.MP

bluhm
Olivier A.
2017-05-31 13:25:38 UTC
Permalink
Date: Wed, 31 May 2017 14:03:53 +0200
Subject: Re: System hang when connecting sockets+setsockopt on same port in a
loop
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.8.2 (2017-04-18)
X-Provags-ID: V03:K0:Vb9PeyUJY82wfo2RZQ8yDyXDNLl1xC+h8QTVC2JRxQOzecKudKF
wCaFEBNTf71ofKVm0Pe/SbNld7iost/63VwVLJzS8PYZHiCIRNiDg34win7mwua81MKr9RK
4+SHpG9gNkEgaYpheUKF2zVZcA8JyanupY+FrvOMi4l40mrJ3CV3uH/w/0HO6nE9DlvlTIL
GrSS3mY5tdgwX70hZcUSQ==
X-UI-Out-Filterresults: notjunk:1;V01:K0:IhRDPJ8GoeM=:DvFSTqVOTw7ph4/6aQ177W
L+VDFvwwPhvQmkntEijkZOBKR7mIwJjJNj/LGXMZiku9UHjrZdI5j3Iyl8vAbCSoQbfNvEnfj
HTPXloaLhUFjZPP1F+WzYeADAj609w1jDkKOGr7/yDkSZXL6iorbfc7o0z0UvTbuUu71myupn
NS7g6PlVllCWYUG7J3IZMsyKqGazLxUjLrKYBwD2mDmABy9d2i51tSxpnK6/DPRhmVhngrHpo
TIUByGBeZDpZd7HJOI6PigRrsqqwZI/Rw//9eJTAcNG1qqXzzUKQJpsODn86QvxHe1+7dv1nJ
UQLh3x3Lqxy0gCl1eNk3DBupFE+cmY9AQqvwcQ07QzmP7fpTalWR9Ib2YlHUL29UyTvcOwgeG
0gPWtFHjEgHciy5jyxB23lsotNI6dwMV5o3Hp7totDi4kA2wHpjAfyKS0+IAbN9qco6XUzgfJ
k9W0Qlo3+ybtc2ohdToFTB08SfX3VULQOgojTXBPjMBBhdXN93Q+CrUV01XgK00WrSrwz6ax8
UoFWSJWFDnoPj+g92s0BvGlNMgx7i88ZgrZw8GtJVvVK5gpfndllIDDMGoaS60sRSqkPejw9W
F1fuzGacbBoKOIFJw3LU5gXiZxOwdt6IpgtTU9C/gkc07ln0EHTjJvJYkwxP5nMY+X7X5mKo+
BL+V/i7kLfrhiPSB6hp3SAQE5RNV6q8h9RRXB9mBNRVwT2ZjV+vyfHp02Nt94mmaNO7TmfMdn
Mjq5bzwo9dbunG/Eiwh8GWC/6Pn59OwXrRhUNulIixa+4a4olY9RuK8gmvY=
Post by o***@ethnao.no-ip.biz
I've tested on slow i386, faster amd64, all bare metal, on a VMM. Everytime the system freeze after a few iterations.
No log, no trace, no panic, no ping from outside. I don't know what happen.
I cannot reproduce a hang with your programm.
Post by o***@ethnao.no-ip.biz
OpenBSD 6.1-current (GENERIC.MP) #78: Wed May 24 06:44:15 MDT 2017
Can you try a newer kernel? I have used
kern.version=OpenBSD 6.1-current (GENERIC.MP) #83: Mon May 29 18:42:49 MDT 2017
bluhm
Tried with :
kern.version=OpenBSD 6.1-current (GENERIC) #84: Tue May 30 21:02:10 MDT 2017
***@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC

Same problÃem.
Seems to be more obvious on port 22 with infinit loop (ie. for(;;))

Only tested on a VM this time. Will try on real hardware as soon as possible.
Olivier A.
2017-05-31 19:05:34 UTC
Permalink
Post by Olivier A.
Only tested on a VM this time. Will try on real hardware as soon as possible.
This time on real hardware with last -current snapshot available :
kern.version=OpenBSD 6.1-current (GENERIC.MP) #87: Tue May 30 21:05:25 MDT 2017
***@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

on ssh port, infinit loop enclosed by while true in the shell as described above. It freeze within two minutes on my machine.

Nothing fancy on my system, I think. Just a base system+xenocara. It happen also on my B202 ASUS Box (i386 plateform ; dmesg on the original sendbug report).
Martin Pieuchot
2017-06-01 08:40:33 UTC
Permalink
Post by Olivier A.
Post by Olivier A.
Only tested on a VM this time. Will try on real hardware as soon as possible.
kern.version=OpenBSD 6.1-current (GENERIC.MP) #87: Tue May 30 21:05:25 MDT 2017
on ssh port, infinit loop enclosed by while true in the shell as described above. It freeze within two minutes on my machine.
Can you enter ddb and type 'ps' ?
Olivier A.
2017-06-01 11:36:15 UTC
Permalink
Post by Martin Pieuchot
Can you enter ddb and type 'ps' ?
Unfortunately no. There's no panic. No ddb prompt. On my laptop under X, xenocara is completly frozen, mouse pointer stop mouving. On a VMM, with ddb.console=1 and cu(1) sending BREAK does nothing.
Martin Pieuchot
2017-06-01 11:46:32 UTC
Permalink
Post by Olivier A.
Post by Martin Pieuchot
Can you enter ddb and type 'ps' ?
Unfortunately no. There's no panic. No ddb prompt. On my laptop under X, xenocara is completly frozen, mouse pointer stop mouving. On a VMM, with ddb.console=1 and cu(1) sending BREAK does nothing.
In this case could you build a kernel with option MP_LOCKDEBUG and see if
it enters ddb?
Olivier A.
2017-06-01 13:48:38 UTC
Permalink
Post by Martin Pieuchot
In this case could you build a kernel with option MP_LOCKDEBUG and see if
it enters ddb?
Ok, I've compiled a new kernel with option MP_LOCKDEBUG uncommented on GENRERIC.MP conf file.
After booting :

kern.version=OpenBSD 6.1 (GENERIC.MP) #1: Thu Jun 1 15:47:28 CEST 2017
***@openbsd.noip.biz:/sys/arch/amd64/compile/GENERIC.MP

$ while true ; do ./a.out ; done
a.out: socket
a.out: socket
<- somewhere after this point, the server stop responding

Sending a BREAK via cu(1) does nothing.

From another server :
...
64 bytes from 10.0.0.2: icmp_seq=52 ttl=255 time=0.505 ms
64 bytes from 10.0.0.2: icmp_seq=53 ttl=255 time=0.565 ms
64 bytes from 10.0.0.2: icmp_seq=54 ttl=255 time=0.497 ms
64 bytes from 10.0.0.2: icmp_seq=55 ttl=255 time=104.545 ms <- this at last print 'a.out: socket'
64 bytes from 10.0.0.2: icmp_seq=56 ttl=255 time=0.560 ms
64 bytes from 10.0.0.2: icmp_seq=57 ttl=255 time=0.596 ms
64 bytes from 10.0.0.2: icmp_seq=58 ttl=255 time=0.551 ms
64 bytes from 10.0.0.2: icmp_seq=59 ttl=255 time=0.568 ms
64 bytes from 10.0.0.2: icmp_seq=60 ttl=255 time=0.510 ms
64 bytes from 10.0.0.2: icmp_seq=61 ttl=255 time=0.614 ms
64 bytes from 10.0.0.2: icmp_seq=62 ttl=255 time=0.511 ms
64 bytes from 10.0.0.2: icmp_seq=63 ttl=255 time=0.564 ms
^C
--- 10.0.0.2 ping statistics ---
468 packets transmitted, 64 packets received, 86.3% packet loss
round-trip min/avg/max/std-dev = 0.485/2.188/104.545/12.896 ms
Martin Pieuchot
2017-06-01 15:00:15 UTC
Permalink
Post by Olivier A.
Post by Martin Pieuchot
In this case could you build a kernel with option MP_LOCKDEBUG and see if
it enters ddb?
Ok, I've compiled a new kernel with option MP_LOCKDEBUG uncommented on GENRERIC.MP conf file.
kern.version=OpenBSD 6.1 (GENERIC.MP) #1: Thu Jun 1 15:47:28 CEST 2017
$ while true ; do ./a.out ; done
a.out: socket
a.out: socket
<- somewhere after this point, the server stop responding
Sending a BREAK via cu(1) does nothing.
So the only difference is that now ping works?
Olivier A.
2017-06-01 15:36:07 UTC
Permalink
Post by Martin Pieuchot
So the only difference is that now ping works?
Oh. I would have explicited that.
No. Ping don't work after 64th icmp echo request. I have CTRL-C at 468th packet according to ping status.
It doesn't seems to be determinist. Sometimes it hang faster that other time.
Martin Pieuchot
2017-06-01 15:45:24 UTC
Permalink
Post by Olivier A.
Post by Martin Pieuchot
So the only difference is that now ping works?
Oh. I would have explicited that.
No. Ping don't work after 64th icmp echo request. I have CTRL-C at 468th packet according to ping status.
It doesn't seems to be determinist. Sometimes it hang faster that other time.
Ok. So it's not a KERNEL_LOCK() related issue, you can remove option
MP_LOCKDEBUG.

Could you try with option WITNESS instead? Note that it won't work if
you run X, so try to reproduce the problem in a VM with option WITNESS.

If it's a lock issue it should catch it.
Olivier A.
2017-06-01 16:22:34 UTC
Permalink
Post by Martin Pieuchot
In this case could you build a kernel with option MP_LOCKDEBUG and see if
it enters ddb?

Oh wait! I've a ddb when running with MP_LOCKDEBUG on real hardware!
Previously it was on VMM.

so 'ps' show a huge list of sshd command:
PID TID PPID UID S FLAGS WAIT COMMAND
68506 314235 39940 0 3 0x82 netio sshd
56104 208367 39940 0 3 0x82 netio sshd
...
77266 154542 66005 1000 3 0x83 netcon2 a.out
36210 35972 35940 0 2 0x82 netio sshd
...


trace show:

Debugger() at Debugger+0x9
__mp_lock() at __mp_lock+0x73
__mp_acquire_count() at __mp_acquire_count+0x2b
mi_switch() at mi_switch+0x21e
sleep_finish() at tsleep+0x154
soreceive() at socreceive+0x429
soo_read at soo_read+0x22
dofilereadv() at dofilereadv+0x204
sys_read() at sys_read+0x89
syscall() at syscall+0x27b
--- syscall (number 3) ---
end trace frame: 0x0, count: -11
0x145b662b0c5a

I can't copy/paste so I've typed this from pictures.

Loading...