GLASSFISH-16217

Hanging threads caused by POAImpl.acquireLock(...)

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1
    • Fix Version/s: None
    • Component/s: orb
    • Labels:
      None
    • Environment:

      Linux SLES 10.2 x86_64 2 Dual-Core CPUs, JDK build 1.6.0_24-b07, 64 bit

      Description

      Threads hanging in POAImpl.acquireLock(...) busy-wait, and the whole machine gradually becomes unresponsive (very high CPU load); eventually the affected GF instance may run out of threads. See also the attached thread dump.

      It seems we are dealing with JDK Bug 6822370 here. Judging from the code, POAImpl.acquireLock(...) should already be prepared for a "lost-wakeup" situation.

      However, the implemented workaround seems to be partially broken. In our opinion, the "Thread.currentThread().interrupt();" call should be moved out of the while loop. The proposed change has been tested successfully on our side.
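
      The effect of that placement can be illustrated with a minimal sketch. This is not the POAImpl source: the class name, the method names, the use of ReentrantLock.tryLock with a timeout, and the maxAttempts cap (added only so the demo terminates) are all assumptions made for illustration. It shows that re-asserting the interrupt inside the retry loop makes every following timed lock attempt throw immediately, so the loop spins, whereas remembering the interrupt and restoring it after the loop lets the next attempt proceed.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; not the GlassFish ORB code.
final class AcquireLockSketch {

    // Problematic pattern: the interrupt flag is set again inside the loop.
    // Returns the attempt number on which the lock was acquired, or -1 if
    // maxAttempts was exhausted (the cap exists only to end the demo).
    static int acquireReinterruptInsideLoop(ReentrantLock lock, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (lock.tryLock(1, TimeUnit.SECONDS)) {
                    return attempt;
                }
            } catch (InterruptedException e) {
                // The flag is set again, so the next tryLock throws at once
                // without waiting: the loop degenerates into a busy spin.
                Thread.currentThread().interrupt();
            }
        }
        return -1;
    }

    // Proposed pattern: remember the interrupt, restore it after the loop.
    static int acquireReinterruptAfterLoop(ReentrantLock lock, int maxAttempts) {
        boolean interrupted = false;
        try {
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    if (lock.tryLock(1, TimeUnit.SECONDS)) {
                        return attempt;
                    }
                } catch (InterruptedException e) {
                    interrupted = true; // flag stays cleared while retrying
                }
            }
            return -1;
        } finally {
            if (interrupted) {
                Thread.currentThread().interrupt(); // restore for the caller
            }
        }
    }

    public static void main(String[] args) {
        ReentrantLock lock = new ReentrantLock();

        Thread.currentThread().interrupt();
        int inside = acquireReinterruptInsideLoop(lock, 1000);
        Thread.interrupted(); // clear the flag left behind by the first variant
        System.out.println("re-interrupt inside loop: " + inside);

        Thread.currentThread().interrupt();
        int after = acquireReinterruptAfterLoop(lock, 1000);
        boolean restored = Thread.interrupted();
        System.out.println("re-interrupt after loop: " + after + ", flag restored: " + restored);
        lock.unlock();
    }
}
```

      With an uncontended lock and the interrupt flag set on entry, the first variant never acquires the lock within the cap, while the second acquires it on the attempt after the interrupt and still hands the interrupt back to the caller.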

      Related Issue: http://java.net/jira/browse/GLASSFISH-14348

      Standalone reproducer unfortunately does not exist.

      1. ThreadDump_POAImpl.txt
        1.18 MB
        makiey

        Activity

        Harshad Vilekar added a comment -

        Fixed: ORB 4.0 hg revision 832.

        Integrated with GlassFish: glassfish-corba-4.0.0-b003

        seanespn added a comment -

        Is it possible to apply this patch to a single domain and not at the GlassFish root modules level?

        That is, we have a shared install of GF running a bunch of instances, and we'd like to test this patch on just a single instance / domain first (w/o forcing it on ALL the instances / domains that share the bits).

        That is, can I put this in the domain's lib dir or some other place to accomplish the same thing?

        Once we validate there are no side effects, I don't have a problem replacing the one in $GF_HOME/modules, but I would like to be a little more cautious introducing this to our prod environment.

        djgerbavore added a comment -

        What code can I use to reproduce this issue in GlassFish 3.1.2.2 (build 5)? I want to see how this bug gets triggered and, after applying the ORB module fix, verify that the issue did indeed go away.

        I believe we are running into this issue currently, but we are having a hard time reproducing it in our test environment. We are only seeing this in our production environment, so it would be nice to see this error in test before we make blind changes to production.

        Thanks,

        dlaudams added a comment -

        djgerbavore:

        Try disabling debug mode in your dev environment.

        This is the scenario that worked for me:

        1) Set a small HTTP request timeout.
        2) The servlet makes an EJB request to the backend.
        3) The EJB backend calls Thread.sleep() for longer than the timeout.
        4) The HTTP listener times out. If debug=false, the HTTP timeout will call Thread.interrupt().
        5) The interrupt will trigger the bug.

        I guess that the HTTP timeout is disabled in debug mode to prevent it from triggering during debug sessions.
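
        Steps 3 to 5 can be mimicked outside the server with a rough plain-Java sketch. Nothing here is GlassFish, servlet, or EJB API: the class name, the method name, and the watchdog logic are assumptions standing in for the HTTP listener timeout, and the sketch only shows the interrupt reaching a thread that is blocked longer than the timeout, not the ORB bug itself.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for "request thread blocked in a slow backend call".
final class TimeoutInterruptSketch {

    // Runs a worker that sleeps for workMillis; if it is still running after
    // timeoutMillis (must be > 0), the caller interrupts it, as a request
    // timeout would. Returns true if the worker observed the interrupt.
    static boolean runWithTimeout(long workMillis, long timeoutMillis)
            throws InterruptedException {
        AtomicBoolean sawInterrupt = new AtomicBoolean(false);

        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(workMillis); // step 3: backend call outlasts the timeout
            } catch (InterruptedException e) {
                sawInterrupt.set(true);   // step 5: the interrupt arrives mid-call
            }
        });
        worker.start();

        worker.join(timeoutMillis);       // step 4: wait up to the timeout
        if (worker.isAlive()) {
            worker.interrupt();           // timeout expired: interrupt the thread
        }
        worker.join();
        return sawInterrupt.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("slow call interrupted: " + runWithTimeout(5000, 100));
        System.out.println("fast call interrupted: " + runWithTimeout(10, 2000));
    }
}
```

        In the server, the interrupted thread would then be the one that enters the lock-acquisition loop described in the issue.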

        djgerbavore added a comment (edited) -

        dlaudams,

        Thanks. With debug mode turned off and a small request timeout set, I'm able to reproduce this issue without adding any sleeps. You are a lifesaver! Now I'm going to patch our test environment and see if these interrupted idle threads go away more gracefully.

        I owe you a beer or 10.

        Edit: applying the new jar does indeed successfully interrupt the faulty threads without taking down the whole system. Thanks again, you made this an easy Friday for me.


          People

          • Assignee:
            Harshad Vilekar
            Reporter:
            makiey
          • Votes:
            5
            Watchers:
            8
