glassfish / GLASSFISH-18763

EJB bundle hangs on stopping when the bundle is updated

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.2
    • Fix Version/s: None
    • Component/s: OSGi-JavaEE
    • Labels:
      None

      Description

      Bundles that contain EJBs hang in the "stopping" state when the bundle is restarted as part of an update. GlassFish has to be killed (a normal shutdown doesn't work) and the bundle cache cleaned to recover.
      It is probably not related, but the update is triggered by DeploymentAdmin (Apache ACE). Other (non-EJB) bundles update without problems.

      The attached bundle is an example of a bundle that always hangs when updated.

        Activity

        pbakker added a comment -

        I took a thread dump of the GlassFish process and found the following, which might be related. I also attached the full thread dump.

        "FelixFrameworkWiring" daemon prio=5 tid=0000000001f00000 nid=0xb4e1d000 waiting for monitor entry [00000000b4e1c000]
           java.lang.Thread.State: BLOCKED (on object monitor)
                at org.glassfish.osgijavaeebase.OSGiContainer.isDeployed(OSGiContainer.java:218)
                - waiting to lock <00000000142e6890> (a org.glassfish.osgijavaeebase.OSGiContainer)
                at org.glassfish.osgijavaeebase.JavaEEExtender$HybridBundleTrackerCustomizer.removedBundle(JavaEEExtender.java:186)
                at org.osgi.util.tracker.BundleTracker$Tracked.customizerRemoved(BundleTracker.java:508)
                at org.osgi.util.tracker.BundleTracker$Tracked.customizerRemoved(BundleTracker.java:424)
                at org.osgi.util.tracker.AbstractTracked.untrack(AbstractTracked.java:352)
                at org.osgi.util.tracker.BundleTracker$Tracked.bundleChanged(BundleTracker.java:464)
                at org.apache.felix.framework.util.EventDispatcher.invokeBundleListenerCallback(EventDispatcher.java:868)
                at org.apache.felix.framework.util.EventDispatcher.fireEventImmediately(EventDispatcher.java:789)
                at org.apache.felix.framework.util.EventDispatcher.fireBundleEvent(EventDispatcher.java:514)
                at org.apache.felix.framework.Felix.fireBundleEvent(Felix.java:4244)
                at org.apache.felix.framework.Felix.stopBundle(Felix.java:2351)
                at org.apache.felix.framework.Felix$RefreshHelper.stop(Felix.java:4629)
                at org.apache.felix.framework.Felix.refreshPackages(Felix.java:3951)
                at org.apache.felix.framework.FrameworkWiringImpl.run(FrameworkWiringImpl.java:172)
                at java.lang.Thread.run(Thread.java:680)

           Locked ownable synchronizers:
                - None
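        To make the suspected failure mode concrete, here is a minimal, self-contained Java model of a lock-order inversion. It is hypothetical: `container` stands in for the OSGiContainer monitor, and `framework` stands in for whatever lock the framework holds while dispatching the bundle event. The excerpt above only shows the FelixFrameworkWiring side, so the model assumes that the thread owning the container monitor is in turn waiting on the framework. ThreadMXBean.findDeadlockedThreads() detects this cycle only because both locks here are Java monitors. If the framework side were a hand-rolled lock built on wait()/notify(), the same hang would not be reported as a deadlock.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

final class LockOrderDeadlockDemo {
    private LockOrderDeadlockDemo() {}

    /** Starts two daemon threads that take two monitors in opposite order; true if the JVM reports a deadlock. */
    static boolean reproduce(long timeoutMillis) {
        final Object container = new Object(); // stand-in for the OSGiContainer monitor
        final Object framework = new Object(); // hypothetical stand-in for a framework-internal lock
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread deployer = daemon("deployer", () -> {
            synchronized (container) {          // e.g. a synchronized deploy()/undeploy()...
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (framework) { }    // ...that calls into the framework
            }
        });
        Thread dispatcher = daemon("FelixFrameworkWiring", () -> {
            synchronized (framework) {          // the framework fires a bundle event...
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (container) { }    // ...to a listener that calls isDeployed()
            }
        });
        deployer.start();
        dispatcher.start();

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (deadline - System.nanoTime() > 0) {
            if (threads.findDeadlockedThreads() != null) {
                return true; // both threads are BLOCKED on each other's monitor
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    private static Thread daemon(String name, Runnable body) {
        Thread t = new Thread(body, name);
        t.setDaemon(true); // deadlocked threads must not keep the JVM alive
        return t;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

        The latch guarantees that each thread holds its first lock before asking for the second, so the hang is deterministic in this model. In the real system, the ordering depends on timing, which would explain why the problem only shows up intermittently.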
        Sanjeeb Sahoo added a comment -

        This may be related to GLASSFISH-18159. Can you try the following:

        1. Download http://search.maven.org/remotecontent?filepath=org/glassfish/fighterfish/osgi-javaee-base/1.0.2/osgi-javaee-base-1.0.2.jar
        2. Copy it over glassfish/modules/osgi-javaee-base.jar (you may want to back up the original file first).
        3. Retry the operation that causes the hang.

        Thanks much for reporting,
        Sahoo

        pbakker added a comment -

        After updating the osgi-javaee-base bundle the problem is slightly different.

        The EJB bundle now restarts correctly, but a plain OSGi bundle that uses the service published by the EJB bundle now hangs in "stopping", with exactly the same thread dump.
        I also attached the bundle that now deadlocks.

        Sanjeeb Sahoo added a comment -

        Please list the exact steps needed to reproduce the problem on a GlassFish installation. Thanks much.

        pbakker added a comment -

        We had a look at the code that deadlocks and found some serious issues that can easily lead to deadlocks, in the org.glassfish.osgijavaeebase.OSGiContainer class.

        The problem is that the deploy/undeploy methods are synchronized, so the container's monitor is held while services are registered and looked up. The OSGi spec also advises against this.

        Chapter 4.7.3:

        "Synchronization Pitfalls
        Generally, a bundle that calls a listener should not hold any Java monitors. This means that neither the Framework nor the originator of a synchronous event should be in a monitor when a callback is initiated.
        The purpose of a Java monitor is to protect the update of data structures. This should be a small region of code that does not call any code the effect of which cannot be overseen. Calling the OSGi Framework from synchronized code can cause unexpected side effects. One of these side effects might be deadlock. A deadlock is the situation where two threads are blocked because they are waiting for each other.
        Time-outs can be used to break deadlocks, but Java monitors do not have time-outs. Therefore, the code will hang forever until the system is reset (Java has deprecated all methods that can stop a thread). This type of deadlock is prevented by not calling the Framework (or other code that might cause callbacks) in a synchronized block.
        If locks are necessary when calling other code, use the Java monitor to create semaphores that can time-out and thus provide an opportunity to escape a deadlocked situation."
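        The last paragraph of the quote suggests using a Java monitor to build a semaphore that can time out. Here is a minimal sketch of that idea, assuming a simple non-reentrant mutual-exclusion lock is enough. TimedMutex is a hypothetical name, not a GlassFish or OSGi class. The monitor is held only while the owner field is checked and updated, never while the caller runs framework code. tryAcquire gives up after the timeout instead of blocking forever.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical non-reentrant mutex built on a Java monitor; acquisition can time out. */
final class TimedMutex {
    private Thread owner; // guarded by "this"

    /** Returns true if acquired within timeoutMillis, false on timeout or interruption. */
    synchronized boolean tryAcquire(long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (owner != null) {
            long remainingMillis = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMillis <= 0) {
                return false; // give up instead of hanging forever
            }
            try {
                wait(remainingMillis); // releases the monitor while waiting
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        owner = Thread.currentThread();
        return true;
    }

    synchronized void release() {
        if (owner != Thread.currentThread()) {
            throw new IllegalMonitorStateException("current thread does not hold this mutex");
        }
        owner = null;
        notifyAll(); // wake any threads waiting in tryAcquire
    }
}
```

        A caller would wrap the framework call in `if (mutex.tryAcquire(timeout)) { try { ... } finally { mutex.release(); } }` and back off (log, retry, or fail the operation) when acquisition times out. On Java 5 and later, java.util.concurrent.locks.ReentrantLock.tryLock(long, TimeUnit) provides the same timed acquisition without hand-written wait/notify.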

        Sanjeeb Sahoo added a comment -

        Yes, I know the OSGiContainer methods are synchronized, but I didn't expect them to deadlock during the normal course of action. It's a different matter if multiple management agents are managing the life cycle of bundles.
        BTW, I am unable to deploy your bundles because you have not attached the bundle that provides the agenda.api package. Please provide the complete set of bundles I need, along with instructions to reproduce. Thanks.

        marrs added a comment -

        Calling services or the OSGi framework itself while holding locks is a very bad idea in general, because you do not know the exact consequences of such calls, and I have seen many cases where this caused deadlocks. Here it is particularly bad because it hangs the whole framework. If you ask me, this has nothing to do with having multiple management agents, and I would advise you to refactor the code so it no longer holds any locks while calling the framework. In this case I don't think a test is that important, as unit and integration tests are generally very bad at spotting concurrency issues anyway. These are best resolved by code reviews (at least that's my opinion). However, if you insist, I'm sure Paul can come up with a working test case.
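        One way to follow this advice, sketched under stated assumptions: keep the monitor only around the container's own bookkeeping. Then make the outside call (service registration, lookups, or anything else that may trigger listener callbacks) after leaving the synchronized block. SafeContainer and its Consumer<String> callback are hypothetical stand-ins for OSGiContainer and the framework. This is not the actual GlassFish code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical container: the monitor guards only the map, never the outside call. */
final class SafeContainer {
    private final Map<Long, String> deployed = new HashMap<>(); // guarded by "this"
    private final Consumer<String> frameworkCall; // stand-in for e.g. (un)registering services

    SafeContainer(Consumer<String> frameworkCall) {
        this.frameworkCall = frameworkCall;
    }

    boolean deploy(long bundleId, String app) {
        synchronized (this) {
            if (deployed.containsKey(bundleId)) {
                return false; // already deployed
            }
            deployed.put(bundleId, app); // claim the bundle while holding the lock
        }
        frameworkCall.accept(app); // no monitor held: callbacks may safely call isDeployed()
        return true;
    }

    void undeploy(long bundleId) {
        String app;
        synchronized (this) {
            app = deployed.remove(bundleId);
        }
        if (app != null) {
            frameworkCall.accept(app); // again outside the monitor
        }
    }

    synchronized boolean isDeployed(long bundleId) {
        return deployed.containsKey(bundleId);
    }
}
```

        The trade-off is that another thread can see the bundle as deployed before its services are registered. A real implementation would probably need an explicit intermediate state (for example, "deploying") to handle that window.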

        Sanjeeb Sahoo added a comment -

        Yes, please provide a complete test case as earlier requested. Thanks.

        pbakker added a comment -

        I have spent a few hours trying to create a test case that is as simple as possible. The problem is that I keep seeing different results: it often doesn't break (but sometimes it does). It is not the simplest way, but the most effective way to test this is to deploy the bundles with Apache ACE.

        1) Download and install ACE (ace.apache.org).
        2) Add two system properties to GlassFish (e.g. in domain.xml):
        <system-property name="discovery" value="http://localhost:8080"></system-property>
        <system-property name="identification" value="glassfish"></system-property>
        The discovery URL is the URL where ACE is running.
        3) Upload an EJB bundle to ACE and deploy it to GF
        4) Upload a new version of the EJB bundle (just change the filename and version in the manifest)
        5) "Save" the new version in the ACE UI so it will be pushed to GF
        6) In most cases the EJB bundle is now stuck in "stopping"; if not, just update it again.

        Sorry that the test case isn't easy to execute, but as marrs said, concurrency issues are often hard to spot with automated tests because the timings are very different.
        You can also use the ACE management agent without an ACE server by putting a file URL in the discovery property, but this fails much less often (on my machine), so it may be hard to reproduce the issue that way. The file URL should point to a directory where you "install" bundles. Start with one EJB bundle and let it deploy, then add a second bundle to the directory with a higher version in the manifest.
        <system-property name="discovery" value="file:///Users/paul/Desktop/glassfish/"></system-property>

        Sanjeeb Sahoo added a comment -

        Thanks for these instructions. I will try them out. Although I have many OSGi/EJB bundles of my own, I would rather reproduce with the ones you used. As I mentioned in my comment on 25 May, the set of bundles attached to this issue does not include some required bundles, and I had asked for the missing bundles to be attached. Could you attach them? Thanks again.

        pbakker added a comment -

        I have attached the agenda.api bundle, you should now be able to deploy them. Thanks for looking into it!

        Sanjeeb Sahoo added a comment -

        Please accept my apologies for not having investigated yet. I am busy with another higher-priority task and hope to get back to this early next week. Thanks for your patience.

        Sanjeeb Sahoo added a comment -

        I am using osgi-javaee-base 1.0.2, which includes a deadlock fix. I have tried deploying and updating your EJB bundle several times and could not reproduce the problem. Given that this is a timing issue, I am not surprised.

        I am afraid I am not very inclined to change the locking model unless I know exactly what is going on. The lack of any framework API to lock a bundle does not make things easier. The WAB specification and the other enterprise application specifications require that, when an enterprise bundle is stopped, the extender undeploy it synchronously upon receiving the BundleEvent.STOPPING event. This means some amount of locking has to happen in a synchronous listener. On the other hand, the specs allow a bundle to be deployed asynchronously. As a result, we had seen some deadlocks when bundles were started and stopped in quick succession. Those deadlocks were (successfully) broken by introducing a timeout and cancellation facility in our undeployer thread. We did that in osgi-javaee-base:1.0.2 when we fixed GLASSFISH-18159. The original thread dump you noticed should not occur after our fix. You also mentioned that after upgrading to osgi-javaee-base 1.0.2, the behavior changed slightly. Could you tell me exactly how it changed, and what the new thread dump is? Part of the reason I am reluctant to make drastic changes is that other people are using this code and it seems to work for them. So I would like to get to the root of the current problem before doing something of that sort.
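        Here is a rough sketch of the "timeout and cancellation" idea described above. It assumes that the synchronous STOPPING listener hands the undeploy work to a dedicated undeployer thread and waits for it only for a bounded time. Undeployer and undeploySync are hypothetical names, and the actual fix for GLASSFISH-18159 in osgi-javaee-base 1.0.2 may be structured quite differently.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical undeployer: runs undeploy work on its own thread and waits only a bounded time. */
final class Undeployer {
    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "undeployer");
        t.setDaemon(true); // a stuck undeploy must not keep the JVM alive
        return t;
    });

    /**
     * Called from the synchronous STOPPING listener. Returns true if the undeploy finished within
     * timeoutMillis, false if it timed out (or the caller was interrupted) and was cancelled.
     */
    boolean undeploySync(Runnable undeployTask, long timeoutMillis) {
        Future<?> result = worker.submit(undeployTask);
        try {
            result.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            result.cancel(true); // interrupt the worker so the listener thread can return
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            result.cancel(true);
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException("undeploy failed", e.getCause());
        }
    }

    void shutdown() {
        worker.shutdownNow();
    }
}
```

        Future.cancel(true) only interrupts the worker, so this approach relies on the undeploy task responding to interruption. A thread that is BLOCKED entering a synchronized block cannot be interrupted. That is one reason the earlier comments argue against holding monitors while calling the framework in the first place.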


          People

          • Assignee:
            Sanjeeb Sahoo
            Reporter:
            pbakker
          • Votes:
            3
            Watchers:
            3

            Dates

            • Created:
              Updated: