Issue Details (XML | Word | Printable)

Key: GLASSFISH-16217
Type: Bug Bug
Status: Resolved Resolved
Resolution: Fixed
Priority: Major Major
Assignee: Harshad Vilekar
Reporter: makiey
Votes: 5
Watchers: 8
Operations

If you were logged in you would be able to see more operations.
glassfish

Hanging threads caused by POAImpl.acquireLock(...)

Created: 15/Mar/11 08:29 AM   Updated: 16/Aug/13 03:23 PM   Resolved: 29/Aug/12 01:54 PM
Component/s: orb
Affects Version/s: 3.1
Fix Version/s: None

Time Tracking:
Not Specified

File Attachments: 1. Java Archive File glassfish-corba-orb.jar (1.52 MB) 14/Aug/12 06:42 AM - Harshad Vilekar
2. Text File ThreadDump_POAImpl.txt (1.18 MB) 15/Mar/11 08:29 AM - makiey

Environment:

Linux SLES 10.2 x86_64 2 Dual-Core CPUs, JDK build 1.6.0_24-b07, 64 bit


Tags: 3_1_1-scrubbed 3_1_2-exclude
Participants: bantunes, bebbo, djgerbavore, dlaudams, Harshad Vilekar, makiey, seanespn, szehau and ywaka


 Description  « Hide

Hanging threads caused by POAImpl.acquireLock(...) are causing busy waits, and slowly but surely the whole machine will become unresponsive (huge CPU load); evtl. the given GF Instance will run out-of-threads. See also attached thread dump.

It seems we are dealing here with JDK Bug 6822370. Judging from the code, POAImpl.acquireLock(...) shold be already prepared for a "lost-wakeup" situation.

However, the implemented workaround seems to be partially broken. In our opinion the "Thread.currentThread().interrupt();" should be moved away out of the while-loop. The proposed change has been successfully tested on our side.

Related Issue: http://java.net/jira/browse/GLASSFISH-14348

Standalone reproducer unfortunately does not exist.



makiey added a comment - 16/Mar/11 03:58 AM

and here the proposed patch:

com.sun.corba.ee.impl.oa.poa.POAImpl:
...
private void acquireLock( Lock lock ) {
...
while (!locked) {
if (count >= reportingThreshhold) {
acquireLockWaiting( count ) ;
if (reportingThreshhold < Integer.MAX_VALUE/2) { reportingThreshhold *= 2 ; } // assume that no one will wait for 2^31 SECONDS.
}
try { locked = lock.tryLock( timeout, TimeUnit.SECONDS ) ; count++ ; } catch (InterruptedException exc) { interrupted = true ; }
}

// check and restore out of the while-loop (!!!):
if (interrupted) { // restore thread interrupted status Thread.currentThread().interrupt() ; }
}
...


szehau added a comment - 11/Jun/12 02:34 AM - edited

makiey,

I think the interrupt statement should be moved out from the "while" loop instead of comment out the line. This is so that other source code that waiting for interrupt signal can wake up.

May I know how I can custom build this file. I can't find this source file in Glassfish 3.1.2.

By the way, is there any plan on fixing this bug and include it in next release. My program hung every single day!


bebbo added a comment - 11/Jun/12 07:57 AM

see also http://java.net/jira/browse/GLASSFISH_CORBA-11

The suggested patch won't help much, because the InterruptedException is still caught and eaten.

The real problem is: thread.interrupt() does not throw an InterruptedException again! Thus the caller will continue to call this function since no error/exception was received.

=> remove the try catch block and don't touch the InterruptedException. Then everything works.

@szehau: I also don't know where to obtain the source. Jad produces not working code for this class. The best way is to use jasmin and patch the java assembly code.


szehau added a comment - 02/Aug/12 02:48 AM

Is there any update on this issue?


bebbo added a comment - 02/Aug/12 05:43 AM - edited

You can try the POAImpl attachment I made on http://java.net/jira/browse/GLASSFISH_CORBA-11.


Harshad Vilekar added a comment - 14/Aug/12 06:42 AM

Please try attached patch - drop attached glassfish-corba-orb.jar in glassfish/modules directory and restart glassfish.

6822370 was fixed in jdk6u21. So use newer JDK6 update, or use JDK7.

The patch updates com.sun.corba.ee.impl.oa.poa.POAImpl.acquireLock().


ywaka added a comment - 16/Aug/12 09:25 AM

Harshad,

I tested the glassfish-corba-orb.jar you attached.

I interrupted the thread while it was running in the try block of the com.sun.corba.ee.impl.oa.poa.POAImpl.acquireLock as shown below.
When the patch is applied, I didn't see any high CPU usage. (without the patch I can see the high CPU usage with the same step.)

  1. /usr/java/jdk1.6.0_33/bin/jdb -connect com.sun.jdi.SocketAttach:hostname=xx.xx.xx.xx,port=9009
    Set uncaught java.lang.Throwable
    Set deferred uncaught java.lang.Throwable
    Initializing jdb ...
    > stop at com.sun.corba.ee.impl.oa.poa.POAImpl:377
    Set breakpoint com.sun.corba.ee.impl.oa.poa.POAImpl:377
    >
    Breakpoint hit: "thread=http-thread-pool-8080(1)", com.sun.corba.ee.impl.oa.poa.POAImpl.acquireLock(), line=377 bci=29

http-thread-pool-8080(1)[1] where
[1] com.sun.corba.ee.impl.oa.poa.POAImpl.acquireLock (POAImpl.java:377)
[2] com.sun.corba.ee.impl.oa.poa.POAImpl.readLock (POAImpl.java:427)
[3] com.sun.corba.ee.impl.oa.poa.POAImpl.find_POA (POAImpl.java:1,000)
[4] com.sun.corba.ee.impl.oa.poa.POAFactory.find (POAFactory.java:224)
...
http-thread-pool-8080(1)[1] threads
...
Group grizzly-kernel:
(com.sun.grizzly.util.WorkerThreadImpl)0x21d3 Grizzly-kernel-thread(1) running
(com.sun.grizzly.http.HttpWorkerThread)0x21d7 http-thread-pool-8080(1) running (at breakpoint)
(com.sun.grizzly.http.HttpWorkerThread)0x21db http-thread-pool-8080(2) cond. waiting
(com.sun.grizzly.http.HttpWorkerThread)0x21da http-thread-pool-8080(3) cond. waiting
(com.sun.grizzly.http.HttpWorkerThread)0x21d9 http-thread-pool-8080(4) cond. waiting
(com.sun.grizzly.http.HttpWorkerThread)0x21d8 http-thread-pool-8080(5) cond. waiting
...
http-thread-pool-8080(1)[1] interrupt 0x21d7
http-thread-pool-8080(1)[1] clear
Breakpoints set:
breakpoint com.sun.corba.ee.impl.oa.poa.POAImpl:377
http-thread-pool-8080(1)[1] clear com.sun.corba.ee.impl.oa.poa.POAImpl:377
Removed: breakpoint com.sun.corba.ee.impl.oa.poa.POAImpl:377
http-thread-pool-8080(1)[1] cont
> exit

  1. top

top - 12:23:47 up 6 days, 1:28, 1 user, load average: 0.70, 0.61, 0.35
Tasks: 147 total, 2 running, 145 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 3590144k total, 3489320k used, 100824k free, 240948k buffers
Swap: 5996536k total, 184k used, 5996352k free, 2111624k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 15 0 10368 672 564 S 0.0 0.0 0:04.44 init
2 root RT -5 0 0 0 S 0.0 0.0 0:00.12 migration/0
3 root 34 19 0 0 0 S 0.0 0.0 0:00.02 ksoftirqd/0
4 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/0
5 root RT -5 0 0 0 S 0.0 0.0 0:00.10 migration/1
6 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/1
7 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/1
8 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 events/0
9 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 events/1
10 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 khelper
11 root 11 -5 0 0 0 S 0.0 0.0 0:00.00 kthread
13 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 xenwatch
14 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 xenbus
17 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 kblockd/0
18 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 kblockd/1
19 root 17 -5 0 0 0 S 0.0 0.0 0:00.00 kacpid
98 root 11 -5 0 0 0 S 0.0 0.0 0:00.00 cqueue/0
99 root 17 -5 0 0 0 S 0.0 0.0 0:00.00 cqueue/1
103 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 khubd
105 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 kseriod
173 root 15 0 0 0 0 S 0.0 0.0 0:00.00 khungtaskd
174 root 15 0 0 0 0 S 0.0 0.0 0:00.74 pdflush
176 root 10 -5 0 0 0 S 0.0 0.0 0:00.40 kswapd0

I hope "line=377 bci=29" is the right place to put the breakpoint at.

>c:\Java\jdk1.6.0_31\bin\javap.exe -classpath glassfish-corba-orb.jar -c -s -verbose -l -private com.sun.corba.ee.impl.oa.poa.POAImpl

...
private void acquireLock(java.util.concurrent.locks.Lock);
Signature: (Ljava/util/concurrent/locks/Lock;)V
LineNumberTable:
line 377: 29
line 406: 35

LocalVariableTable:
Start Length Slot Name Signature
29 18 0 this Lcom/sun/corba/ee/impl/oa/poa/POAImpl;
29 18 1 lock Ljava/util/concurrent/locks/Lock;
29 18 2 _$mm$_ Lcom/sun/corba/ee/spi/orbutil/tf/MethodMonitor;
47 0 3 _$exc$_ Ljava/lang/Throwable;

Code:
Stack=6, Locals=4, Args_size=2
0: getstatic #179; //Field _$mm$_0:Lcom/sun/corba/ee/spi/orbutil/generic/Holder;
3: invokevirtual #185; //Method com/sun/corba/ee/spi/orbutil/generic/Holder.content)Ljava/lang/Object;
6: checkcast #187; //class com/sun/corba/ee/spi/orbutil/tf/MethodMonitor
9: astore_2
10: aload_2
11: ifnull 29
14: aload_2
15: iconst_1
16: iconst_1
17: anewarray #115; //class java/lang/Object
20: dup
21: iconst_0
22: aload_1
23: aastore
24: invokeinterface #192, 3; //InterfaceMethod com/sun/corba/ee/spi/orbutil/tf/MethodMonitor.enter:(I[Ljava/lang/Object;)V
29: aload_1
30: invokeinterface #268, 1; //InterfaceMethod java/util/concurrent/locks/Lock.lock)V
35: aload_2
36: ifnull 46
39: aload_2
40: iconst_1
41: invokeinterface #208, 2; //InterfaceMethod com/sun/corba/ee/spi/orbutil/tf/MethodMonitor.exit:(I)V
46: return
47: astore_3
48: aload_2
49: ifnull 59
52: aload_2
53: iconst_1
54: invokeinterface #208, 2; //InterfaceMethod com/sun/corba/ee/spi/orbutil/tf/MethodMonitor.exit:(I)V
59: aload_3
60: athrow
Exception table:
from to target type
29 47 47 any
47 48 47 any
LocalVariableTable:
Start Length Slot Name Signature
29 18 0 this Lcom/sun/corba/ee/impl/oa/poa/POAImpl;
29 18 1 lock Ljava/util/concurrent/locks/Lock;
29 18 2 _$mm$_ Lcom/sun/corba/ee/spi/orbutil/tf/MethodMonitor;
47 0 3 _$exc$_ Ljava/lang/Throwable;

LineNumberTable:
line 377: 29
line 406: 35

RuntimeVisibleAnnotations: length = 0x6
00 01 00 08 00 00
...


Harshad Vilekar added a comment - 16/Aug/12 04:22 PM

Thanks Yu for the confirmation. Yes, you set the break point at the right place.


bantunes added a comment - 21/Aug/12 10:13 AM

Can this patch be applied to GlassFish 3.1.2 or 3.1.2.2 ?

We are observing this same problem with GlassFish 3.1.2 and Java SE Update 6 update 33

CPU Load for thread 0x112a: 63.1

Thread Dump:

"http-thread-pool-8080(25)" daemon prio=10 tid=0x00007f5f04036800 nid=0x112a runnable [0x00007f5ec84f7000]
java.lang.Thread.State: RUNNABLE
at java.lang.Thread.interrupt0(Native Method)
at java.lang.Thread.interrupt(Thread.java:895)
at com.sun.corba.ee.impl.oa.poa.POAImpl.acquireLock(POAImpl.java:398)
at com.sun.corba.ee.impl.oa.poa.POAImpl.readLock(POAImpl.java:422)
at com.sun.corba.ee.impl.oa.poa.POAImpl.exit(POAImpl.java:1788)
at com.sun.corba.ee.impl.protocol.CorbaMessageMediatorImpl.runServantPostInvoke(CorbaMessageMediatorImpl.java:2173)
at com.sun.corba.ee.impl.protocol.CorbaMessageMediatorImpl.createResponseHelper(CorbaMessageMediatorImpl.java:2100)
at com.sun.corba.ee.impl.protocol.CorbaMessageMediatorImpl.createResponseHelper(CorbaMessageMediatorImpl.java:2072)
at com.sun.corba.ee.impl.protocol.CorbaMessageMediatorImpl.createResponse(CorbaMessageMediatorImpl.java:1921)
at com.sun.corba.ee.impl.protocol.CorbaMessageMediatorImpl.createReply(CorbaMessageMediatorImpl.java:666)
at com.sun.corba.ee.impl.presentation.rmi.ReflectiveTie._invoke(ReflectiveTie.java:176)
at com.sun.corba.ee.impl.protocol.CorbaServerRequestDispatcherImpl.dispatchToServant(CorbaServerRequestDispatcherImpl.java:528)
at com.sun.corba.ee.impl.protocol.CorbaServerRequestDispatcherImpl.dispatch(CorbaServerRequestDispatcherImpl.java:199)
at com.sun.corba.ee.impl.protocol.CorbaMessageMediatorImpl.handleRequestRequest(CorbaMessageMediatorImpl.java:1624)
at com.sun.corba.ee.impl.protocol.SharedCDRClientRequestDispatcherImpl.marshalingComplete(SharedCDRClientRequestDispatcherImpl.java:126)
at com.sun.corba.ee.impl.protocol.CorbaClientDelegateImpl.invoke(CorbaClientDelegateImpl.java:273)
at com.sun.corba.ee.impl.presentation.rmi.StubInvocationHandlerImpl.privateInvoke(StubInvocationHandlerImpl.java:200)
at com.sun.corba.ee.impl.presentation.rmi.StubInvocationHandlerImpl.invoke(StubInvocationHandlerImpl.java:152)
at com.sun.corba.ee.impl.presentation.rmi.codegen.CodegenStubBase.invoke(CodegenStubBase.java:227)
...


Harshad Vilekar added a comment - 21/Aug/12 04:32 PM

Bantunes, The attached patch is for 3.1.2.2. Please confirm if it works for you.


Harshad Vilekar added a comment - 29/Aug/12 01:54 PM

Fixed in GF version 3.1.2.3.
ORB Workspace gf-corba-v3-mirror~gfv31-master revision 694, ORB release 3.1.0-b034.


Harshad Vilekar added a comment - 09/Oct/12 12:23 AM

Verification done by GrhrdBlmln. Adding the relevant information from thread http://forums.java.net/node/889790 Updated:October 8, 2012 - 02:36 #9
-----------------
The patch ... works fine for our problem with GF.
.. the GF detects an idle thread and tries to interrupt it. … Now with success !!! Obviously the patched GF is now able to interrupt the faulty thread.
-----------------


Harshad Vilekar added a comment - 25/Feb/13 06:14 PM

Fixed: ORB 4.0 hg revision 832.

Integrated with GlassFish: glassfish-corba-4.0.0-b003


seanespn added a comment - 18/Apr/13 06:52 PM

Is it possible to apply this patch to a single domain and not at the glassfish root modules level?

That is, we have a shared installed of GF running a bunch of instances and we'd like to test this patch on just a single instance / domain first (w/o forcing it ALL the instances / domain that share the bits).

That is, can i put this in the domain's lib dir or some other place to accomplish the same thing?

Once we validate there are no side effects I don't have a problem replacing the one $GF_HOME/modules but would like to be a little more cautious introducing this to our prod environment.


djgerbavore added a comment - 16/Aug/13 03:50 AM

what code can I use to reproduce this issue in Glassfish 3.1.2.2 (build 5). I want to see how this bug gets triggered. And after applying the orb module fix, verify that the issue did indeed go away.

I believe we are running into this issue currently, but we are having a hard time reproducing it in our test environment. We are only seeing this in our production environment. So it would be nice see this error in test before we make blind changes to production.

Thanks,


dlaudams added a comment - 16/Aug/13 04:09 AM

djgerbavore:

Try disabling debug mode in your dev environment.

This is the scenario that worked for me:

1) Set a small HTTP request timeout.
2) Servlet makes an EJB request to backend
3) EJB backend calls Thread.sleep() for longer than the timeout
4) HTTP listener times out. If debug=false, HTTP timeout will call Thread.interrupt()
5) The interrupt will trigger the bug.

I guess that the HTTP timeout is disabled in debug mode to prevent it from triggering during debug sessions.


djgerbavore added a comment - 16/Aug/13 02:13 PM - edited

dlaudams,

Thanks. With debug mode turned off and I set a small Request timeout, I'm able to reproduce this issue without adding any sleeps. You are a life savor! Now I'm going to patch our test environment and see if these interrupting idle threads go aways more gracefully.

I owe you a beer or 10.

Edit: applying the new jar does indeed successfully interrupt the faulty threads without taking down the whole system. Thanks again you made this an easy Friday for me.