GLASSFISH-20934

"org.osgi.framework.BundleException: Unable to acquire global lock for resolve" happened while executing "asadmin start-cluster"

    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 3.1.2, 3.1.2.2, 4.0
    • Fix Version/s: None
    • Component/s: hk2, jms, jts, OSGi
    • Labels:
      None

      Description

      While executing "asadmin start-cluster" in a heavily loaded environment, the following exception sometimes appears in server.log:

      ...
      [#|2013-12-04T16:46:56.553+0900|SEVERE|||_ThreadID=68;_ThreadName=Recovery Helper Thread;|Exception in thread "Recovery Helper Thread" |#]

      [#|2013-12-04T16:46:56.558+0900|SEVERE|||_ThreadID=68;_ThreadName=Recovery Helper Thread;|com.sun.enterprise.module.ResolveError: Failed to start Bundle Id [40] State [INSTALLED] [org.glassfish.main.connectors.inbound-runtime(Connectors Inbound Support):3.1.2.2-XXX-SNAPSHOT]
      at org.jvnet.hk2.osgiadapter.OSGiModuleImpl.start(OSGiModuleImpl.java:177)
      at org.jvnet.hk2.osgiadapter.OSGiModuleImpl$2$1$1.loadClass(OSGiModuleImpl.java:344)
      at com.sun.hk2.component.LazyInhabitant.loadClass(LazyInhabitant.java:124)
      at com.sun.hk2.component.LazyInhabitant.fetch(LazyInhabitant.java:111)
      at com.sun.hk2.component.EventPublishingInhabitant.get(EventPublishingInhabitant.java:135)
      at com.sun.hk2.component.AbstractInhabitantImpl.get(AbstractInhabitantImpl.java:78)
      at org.jvnet.hk2.component.Habitat$5.get(Habitat.java:703)
      at java.util.AbstractList$Itr.next(AbstractList.java:358)
      at java.util.AbstractCollection.toArray(AbstractCollection.java:141)
      at java.util.ArrayList.<init>(ArrayList.java:164)
      at com.sun.enterprise.transaction.jts.ResourceRecoveryManagerImpl.configure(ResourceRecoveryManagerImpl.java:339)
      at com.sun.enterprise.transaction.jts.ResourceRecoveryManagerImpl.recoverXAResources(ResourceRecoveryManagerImpl.java:231)
      at com.sun.enterprise.transaction.jts.ResourceRecoveryManagerImpl.recoverXAResources(ResourceRecoveryManagerImpl.java:331)
      at com.sun.enterprise.transaction.jts.ResourceRecoveryManagerImpl.postConstruct(ResourceRecoveryManagerImpl.java:106)
      at com.sun.hk2.component.AbstractCreatorImpl.inject(AbstractCreatorImpl.java:131)
      at com.sun.hk2.component.ConstructorCreator.initialize(ConstructorCreator.java:91)
      at com.sun.hk2.component.AbstractCreatorImpl.get(AbstractCreatorImpl.java:82)
      at com.sun.hk2.component.SingletonInhabitant.get(SingletonInhabitant.java:67)
      at com.sun.hk2.component.EventPublishingInhabitant.get(EventPublishingInhabitant.java:139)
      at com.sun.hk2.component.AbstractInhabitantImpl.get(AbstractInhabitantImpl.java:78)
      at org.jvnet.hk2.component.Habitat.getByContract(Habitat.java:1050)
      at com.sun.jts.jta.TransactionServiceProperties$RecoveryHelperThread.run(TransactionServiceProperties.java:358)
      Caused by: org.osgi.framework.BundleException: Unable to acquire global lock for resolve.
      at org.apache.felix.framework.Felix.resolveBundleRevision(Felix.java:3832)
      at org.apache.felix.framework.Felix.startBundle(Felix.java:1868)
      at org.apache.felix.framework.BundleImpl.start(BundleImpl.java:944)
      at org.jvnet.hk2.osgiadapter.OSGiModuleImpl.start(OSGiModuleImpl.java:169)
      ... 21 more

      #]
      ...

      In the log above, bundle [40] is the org.glassfish.main.connectors.inbound-runtime bundle.

      Based on my analysis of the source code, this issue may also occur on 4.0.

      Regarding "Component/s": I first selected HK2, OSGi, and JMS. The reason for selecting JMS is given in the comment below. I am initially assigning the issue to Sahoo for evaluation.

        Activity

        TangYong added a comment -

        [My investigation]

        During cluster startup, two threads both affect the state of the org.glassfish.main.connectors.inbound-runtime module:

        1) Recovery Helper Thread
        This thread auto-starts the org.glassfish.main.connectors.inbound-runtime module when the following code from the org.glassfish.main.transaction.jts module is executed:

        ResourceRecoveryManager recoveryManager = serviceLocator.getService(ResourceRecoveryManager.class)

        Through a series of calls, execution eventually looks up the implementations (services) of the com.sun.enterprise.transaction.spi.RecoveryResourceHandler interface, and the InboundRecoveryHandler class from the org.glassfish.main.connectors.inbound-runtime module is one of the candidates.

        2) RunLevelController Thread
        This thread comes from HK2. It is created because, while the GlassFish kernel is starting, the following code from com.sun.enterprise.v3.server.AppServerStartup takes effect:

        if (!proceedTo(PostStartupRunLevel.VAL))

        { … }

        HK2 then selects all descriptors whose run level equals PostStartupRunLevel.VAL (20) and starts the corresponding modules. Because the JmsProviderLifecycle class from org.glassfish.main.jms.core is annotated with @RunLevel(value=PostStartupRunLevel.VAL, mode=RunLevel.RUNLEVEL_MODE_NON_VALIDATING), it is one of the candidates. Furthermore, because org.glassfish.main.jms.core depends on the org.glassfish.main.connectors.inbound-runtime module [1], the OSGi runtime first resolves org.glassfish.main.connectors.inbound-runtime when starting org.glassfish.main.jms.core.

        [1]: Only the ActiveJmsResourceAdapter class depends on org.glassfish.main.connectors.inbound-runtime.

        Based on the above analysis, if the two threads interleave, Felix interrupts the Recovery Helper Thread because it cannot obtain the global lock. For details on this behavior, see [2].

        Richard says:
        "This is not necessarily a bug. That can happen if your bundle is holding a bundle lock and you try to acquire the global lock, which in this case the thread holds the bundle lock for the bundle it is trying to start and it needs the global lock to resolve it. If someone else already has the global lock and needs a bundle lock, then it will interrupt the other thread only holding the bundle lock. This avoids deadlocks."

        [2]:http://mail-archives.apache.org/mod_mbox/felix-users/201102.mbox/%3C4D498058.7080104@ungoverned.org%3E

        So, this is why I selected HK2, OSGi and JMS.
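
The lock interaction Richard describes can be illustrated with a minimal, self-contained Java sketch. It is not Felix code: the class name GlobalLockConflictSketch, the two ReentrantLock objects, and the latches are assumptions made purely for illustration, and the thread names are borrowed from the analysis above. One thread holds the bundle lock and waits for the global lock. The other holds the global lock, interrupts the first, and then takes the bundle lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

class GlobalLockConflictSketch {

    /** Runs the scenario once and returns {starterOutcome, resolverOutcome}. */
    static String[] simulate() {
        ReentrantLock globalLock = new ReentrantLock(); // stand-in for the framework-wide lock
        ReentrantLock bundleLock = new ReentrantLock(); // stand-in for the lock of bundle [40]
        CountDownLatch globalHeld = new CountDownLatch(1);
        CountDownLatch bundleHeld = new CountDownLatch(1);
        String[] outcome = new String[2];

        // Holds the bundle lock (it is starting the bundle) and needs the global lock to resolve it.
        Thread starter = new Thread(() -> {
            bundleLock.lock();
            bundleHeld.countDown();
            try {
                globalLock.lockInterruptibly();
                try {
                    outcome[0] = "resolved";
                } finally {
                    globalLock.unlock();
                }
            } catch (InterruptedException e) {
                // Interrupted by the holder of the global lock: give up instead of deadlocking.
                outcome[0] = "Unable to acquire global lock for resolve";
            } finally {
                bundleLock.unlock();
            }
        }, "Recovery Helper Thread");

        // Holds the global lock and needs the bundle lock held by the starter.
        Thread resolver = new Thread(() -> {
            globalLock.lock();
            globalHeld.countDown();
            try {
                bundleHeld.await();   // wait until the starter owns the bundle lock
                starter.interrupt();  // break the cycle by interrupting the bundle-lock holder
                bundleLock.lock();    // succeeds once the starter has backed off
                try {
                    outcome[1] = "resolved";
                } finally {
                    bundleLock.unlock();
                }
            } catch (InterruptedException e) {
                outcome[1] = "interrupted";
            } finally {
                globalLock.unlock();
            }
        }, "RunLevelControllerThread");

        try {
            resolver.start();
            globalHeld.await();       // make sure the global lock is taken first
            starter.start();
            starter.join();
            resolver.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return outcome;
    }

    public static void main(String[] args) {
        for (String line : simulate()) {
            System.out.println(line);
        }
    }
}
```

In this sketch the thread that only holds the bundle lock is the one that fails, which mirrors the role the Recovery Helper Thread plays in the stack trace in the description.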

        TangYong added a comment -

        Looking deeper, I have a new finding:

        The Recovery Helper Thread is triggered by the following code in com.sun.enterprise.v3.server.AppServerStartup:

        if (!postStartupJob())

        { ... }

        The RunLevelControllerThread that runs PostStartupRunLevel.VAL (20) is triggered by the following:

        if (!proceedTo(PostStartupRunLevel.VAL))

        { ... }

        Evidently, if postStartupJob has not actually finished when the RunLevelControllerThread for PostStartupRunLevel.VAL (20) starts to run, this issue, or similar issues involving module state changes, can occur.

        Tang

        TangYong added a comment -

        One way to fix this is to delay the Recovery Helper Thread: once PostStartupRunLevel has finished, an event is sent, and only then does the Recovery Helper Thread run.

        For the details of the fix, please review and confirm the patch.

        Thanks
        Tang
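
The idea of deferring recovery until the run level has completed can be sketched as follows. This is not the attached patch, whose contents are not shown here. It is a minimal illustration that uses a java.util.concurrent.CountDownLatch as a stand-in for the event, and the class name DeferredRecoverySketch and the event strings are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

class DeferredRecoverySketch {

    /** Runs the startup sequence once and returns the events in the order they occurred. */
    static List<String> run() {
        List<String> events = Collections.synchronizedList(new ArrayList<>());
        // Stand-in for the "post-startup run level finished" event.
        CountDownLatch postStartupFinished = new CountDownLatch(1);

        // The recovery thread is created early but does no work until the event arrives.
        Thread recovery = new Thread(() -> {
            try {
                postStartupFinished.await();
                events.add("recovery started");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "Recovery Helper Thread");
        recovery.start();

        // Stand-in for the run level work that starts and resolves modules.
        events.add("post-startup run level started");
        events.add("post-startup run level finished");
        postStartupFinished.countDown(); // "send the event"

        try {
            recovery.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Because the recovery work only begins after the latch is released, it can no longer overlap with the module starts performed during the run level, which is the overlap the earlier comments identify as the trigger.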

        TangYong added a comment -

        I have added the JTS component and request that the JTS lead review the patch and evaluate the issue.


          People

          • Assignee: paul_parkinson
          • Reporter: TangYong
          • Votes: 0
          • Watchers: 2
