close() of active socket does not work on FreeBSD 6  
davidxu
Posted: 2006-12-13 7:33:00

On Wednesday 13 December 2006 04:49, Daniel Eischen wrote:
> On Tue, 12 Dec 2006, Poul-Henning Kamp wrote:
> > In message <email***@***.com>, Bruce Evans writes:
> >> On Mon, 11 Dec 2006, Daniel Eischen wrote:
> >>
> >> It's probably a nightmare in the kernel too. close() starts looking
> >> like revoke(), and revoke() has large problems and bugs in this area.
> >
> > There is the distinctive difference that revoke() operates on a name
> > and close() on a filedescriptor, but otherwise I agree.
>
> Well, if threads waiting on IO are interruptable by signals,
> can't we make a new signal that's only used by the kernel
> and send it to all threads waiting on IO for that descriptor?
> When it gets out to actually setup the signal handler, it
> just resumes like it is returning from an SA_RESTART signal
> handler (which according to another posting would reissue
> the IO command and get EBADF).

Stop using signals; they are slow for threaded processes. First, you
don't know which threads are using the descriptor. Second, you have
to run a long code path in the kernel signal code to find and deliver
the signals to all interested threads, which is too expensive for
benchmarks like the Apache benchmark.

David Xu


_______________________________________________
email***@***.com mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-java
To unsubscribe, send any mail to "email***@***.com"
 
Arne H. Juul
Posted: 2006-12-13 10:00:00

On Tue, 12 Dec 2006, Greg Lewis wrote:
> This is, unfortunately, a known problem. See
>
> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/97921

Yeah, I got redirected there by somebody else too; it seems I've
managed to spark lots of debate now at least. Copying linux_close.c and
thereby working around the problem in the Java VM looks like it solves
my immediate problem, and could probably be integrated into the diablo
jdk too? I've tried the C version of the test program and it behaves
"badly" on Linux, FreeBSD 6, and NetBSD at least, so patching over it
is probably the best medium-term solution.

- Arne H. J.
 
 
 
Bruce Evans
Posted: 2006-12-13 11:28:00

On Wed, 13 Dec 2006, David Xu wrote:

> On Wednesday 13 December 2006 04:49, Daniel Eischen wrote:
>> On Tue, 12 Dec 2006, Poul-Henning Kamp wrote:
>>> In message <email***@***.com>, Bruce Evans writes:
>>>> On Mon, 11 Dec 2006, Daniel Eischen wrote:
>>>>
>>>> It's probably a nightmare in the kernel too. close() starts looking
>>>> like revoke(), and revoke() has large problems and bugs in this area.
>>>
>>> There is the distinctive difference that revoke() operates on a name
>>> and close() on a filedescriptor, but otherwise I agree.
>>
>> Well, if threads waiting on IO are interruptable by signals,
>> can't we make a new signal that's only used by the kernel
>> and send it to all threads waiting on IO for that descriptor?
>> When it gets out to actually setup the signal handler, it
>> just resumes like it is returning from an SA_RESTART signal
>> handler (which according to another posting would reissue
>> the IO command and get EBADF).
>
> Stop using signal, it is slow for threaded process, first you don't
> know which threads are using the descriptor, second, you have
> to run long code path in kernel signal code to find and deliver
> the signals to all interested threads, that is too expensive for
> benchmark like apache benchmark.

A signal would be fast enough for revoke() since revoke() is not used
much, and would work well if the signal could be sent, and is unmaskable,
and all device drivers catch signals (oops, all of them act like
applications whose signal catching function just sets a flag, except
while they sleep, so they have the usual problems with just setting a
flag -- they may run for too long before actually using the setting).
However, I think there is no way to determine which threads are using
an fd short of doing the equivalent of fstat(1) searching through kmem.
Kernel data structures just aren't set up to do this search efficiently,
and shouldn't be bloated to do it.

For close() on non-devices, there is the additional problem of infinite
disk waits due to things like nfs servers down and bugs. Then signals
don't work and you wouldn't like close() by a thread trying to clean
up the problem to hang too. Otherwise close()/revoke() would be a good
way to cancel an infinite disk wait.

Bruce
 
 
 
kostikbel
Posted: 2006-12-13 15:11:00

On Tue, Dec 12, 2006 at 08:18:32AM -0500, Daniel Eischen wrote:
> On Mon, 11 Dec 2006, Daniel Eischen wrote:
>
> >On Tue, 12 Dec 2006, Arne H. Juul wrote:
> >
> >>On Tue, 12 Dec 2006, David Xu wrote:
> >>>On Tuesday 12 December 2006 06:34, Arne H. Juul wrote:
> >>><snip>
> >>>>This is exactly the sort of issue that should be solved by the
> >>>>thread library / kernel threads implementation and not in every
> >>>>threaded application that needs it, in my view.
> >>>>
> >>>It should not be done in the new thread library; do you want a bloated
> >>>and error-prone thread library? Instead, if this semantic is really
> >>>necessary, it should be done in the kernel.
> >>
> >>Well, it depends on the alternatives.
> >>If a clean kernel implementation is possible - yes please, of course.
> >>If only a complex, error-prone kernel implementation is possible,
> >>I would prefer to have the complexity in the thread library.
> >
> >Hacking libthr or libpthread to do this for you is not
> >an option. They would then look like libc_r since all
> >fd's accesses would need to be wrapped. If this needs
> >to be done, it must be in the kernel.
>
> It also couldn't be entirely solved by fixing it in the
> threads library. You could still have a non-threaded
> application that waits on a read operation, but receives
> a signal and closes the socket in the signal handler.

This is not the problem. The read (as a syscall being executed) is aborted
when the signal is delivered. The original poster considered the situation
where read() is active (in particular, f_count of struct file has been
incremented by fget, which caused the reported behaviour).



 
 
Julian Elischer
Posted: 2006-12-13 15:12:00

Bruce Evans wrote:
> On Wed, 13 Dec 2006, David Xu wrote:
>
>> On Wednesday 13 December 2006 04:49, Daniel Eischen wrote:
>>> On Tue, 12 Dec 2006, Poul-Henning Kamp wrote:
>>>> In message <email***@***.com>, Bruce Evans writes:
>>>>> On Mon, 11 Dec 2006, Daniel Eischen wrote:
>>>>>
>>>>> It's probably a nightmare in the kernel too. close() starts looking
>>>>> like revoke(), and revoke() has large problems and bugs in this area.
>>>>
>>>> There is the distinctive difference that revoke() operates on a name
>>>> and close() on a filedescriptor, but otherwise I agree.
>>>
>>> Well, if threads waiting on IO are interruptable by signals,
>>> can't we make a new signal that's only used by the kernel
>>> and send it to all threads waiting on IO for that descriptor?
>>> When it gets out to actually setup the signal handler, it
>>> just resumes like it is returning from an SA_RESTART signal
>>> handler (which according to another posting would reissue
>>> the IO command and get EBADF).
>>
>> Stop using signal, it is slow for threaded process, first you don't
>> know which threads are using the descriptor, second, you have
>> to run long code path in kernel signal code to find and deliver
>> the signals to all interested threads, that is too expensive for
>> benchmark like apache benchmark.
>
> A signal would be fast enough for revoke() since revoke() is not used
> much, and would work well if the signal could be sent, and is unmaskable,
> and all device drivers catch signals (oops, all of them act like
> applications whose signal catching function just sets a flag, except
> while they sleep, so they have the usual problems with just setting a
> flag -- they may run for too long before actually using the setting).
> However, I think there is no way to determine which threads are using
> an fd short of doing the equivalent of fstat(1) searching through kmem.
> Kernel data structures just aren't set up to do this search efficiently,
> and shouldn't be bloated to do it.

That's processes. Which thread in the process is the one that is
currently waiting on the socket?

>
> For close() on non-devices, there is the additional problem of infinite
> disk waits due to things like nfs servers down and bugs. Then signals
> don't work and you wouldn't like close() by a thread trying to clean
> up the problem to hang too. Otherwise close()/revoke() would be a good
> way to cancel an infinite disk wait.
>
> Bruce
> _______________________________________________
> email***@***.com mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to "email***@***.com"

 
 
 
Bruce Evans
Posted: 2006-12-13 19:28:00

On Tue, 12 Dec 2006, Julian Elischer wrote:

> Bruce Evans wrote:
>> A signal would be fast enough for revoke() since revoke() is not used
>> much, and would work well if the signal could be sent, and is unmaskable,
>> and all device drivers catch signals (oops, all of them act like
>> ...
>> However, I think there is no way to determine which threads are using
>> an fd short of doing the equivalent of fstat(1) searching through kmem.
>> Kernel data structures just aren't set up to do this search efficiently,
>> and shouldn't be bloated to do it.
>
> that's processes.. which thread in the process is the one that is currently
> waiting on the socket?

Do you mean that this wouldn't work because the signal would need to be
per-thread but signals are per-process? Aren't there per-thread signals now?

It's not just one thread, at least for general files. There can be any
number. I just remembered that SIGIO delivery has problems near here.
There is some i/o on a device, and the kernel has to figure out all
open fd's on the device with O_ASYNC set on the open file of the fd. It
has difficulty doing this, even with some data structures pointing from
the device back to the processes. (In theory there can be any number of
fd's with the same open file and the signal should be broadcast to the
processes owning these fd's.) This is still simpler than signaling
threads in i/o functions since the signal is broadcast.

I said that something like fstat(1) groping in kmem is needed to find all
the relevant threads. That is nowhere near enough -- fstat cannot tell
which threads are currently in i/o functions. Something closer to what
debuggers do is needed -- stop all threads and stack trace them all to
see where they are :-).

Bruce
 
 
 
David Xu
Posted: 2006-12-13 20:11:00

On Wednesday 13 December 2006 04:49, Daniel Eischen wrote:
> On Tue, 12 Dec 2006, Poul-Henning Kamp wrote:
> > In message <email***@***.com>, Bruce Evans writes:
> >> On Mon, 11 Dec 2006, Daniel Eischen wrote:
> >>
> >> It's probably a nightmare in the kernel too. close() starts looking
> >> like revoke(), and revoke() has large problems and bugs in this area.
> >
> > There is the distinctive difference that revoke() operates on a name
> > and close() on a filedescriptor, but otherwise I agree.
>
> Well, if threads waiting on IO are interruptable by signals,
> can't we make a new signal that's only used by the kernel
> and send it to all threads waiting on IO for that descriptor?
> When it gets out to actually setup the signal handler, it
> just resumes like it is returning from an SA_RESTART signal
> handler (which according to another posting would reissue
> the IO command and get EBADF).

Even if you have implemented close() with the interruption, another
thread opening a file can still reuse the file handle immediately:
according to the specifications, the lowest free file handle will be returned.
If SA_RESTART is used and the interrupted thread restarts the syscall,
it will be using the wrong file. I think even if we implement the
feature in the kernel, userland threads still have a serious race to fix.

David Xu
 
 
 
deischen
Posted: 2006-12-13 22:28:00

On Mon, 11 Dec 2006, Daniel Eischen wrote:
>
> Common sense leads me to think that a close() should release
> threads in IO operations (reads/writes/selects/polls) and
> return EBADF or something appropriate. At least when behavior
> is not dictated by POSIX or other historical/de facto behavior.

BTW, I tested the behavior on Solaris. Solaris returns EBADF
with the posted sample C program.

--
DE