glassfish
  1. glassfish
  2. GLASSFISH-15537

restart-instance takes a long time on Solaris 11 (sun.security.pkcs11.SunPKCS11)

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1
    • Fix Version/s: future release
    • Component/s: admin
    • Labels:
      None
    • Environment:

      Oracle Solaris 11 Express snv_151a X86
      java version "1.6.0_23"
      Private build based on r44408

      Description

      restart-instance often takes minutes to execute on Solaris 11. To reproduce:

      1. Start the DAS. On the DAS machine:
        asadmin create-local-instance i1
        asadmin start-local-instance i1
        asadmin stop-local-instance i1
        asadmin start-local-instance i1
      1. So far, so good. This all works great. I can stop/start
      2. many times and I don't see the problem. Then do:

      asadmin restart-instance i1

      Sometimes this returns promptly. But if I run restart repeatedly
      often it takes a while to return (one time 9 minutes).
      The instance does start eventually. If I do a jstack of
      the instance process while it is "hung" I see consistently this
      thread which appears to be blocked in a pkcs11 call (see stack
      trace below).

      I have not been able to reproduce this on Linux.

      "FelixStartLevel" daemon prio=3 tid=0x089a5400 nid=0xe runnable [0xcd17e000]
      java.lang.Thread.State: RUNNABLE
      at sun.security.pkcs11.wrapper.PKCS11.C_Initialize(Native Method)
      at sun.security.pkcs11.wrapper.PKCS11.getInstance(PKCS11.java:160)

      • locked <0xcebc0c58> (a java.lang.Class for sun.security.pkcs11.wrapper.PKCS11)
        at sun.security.pkcs11.SunPKCS11.<init>(SunPKCS11.java:281)
        at sun.security.pkcs11.SunPKCS11.<init>(SunPKCS11.java:86)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at sun.security.jca.ProviderConfig$3.run(ProviderConfig.java:243)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.security.jca.ProviderConfig.doLoadProvider(ProviderConfig.java:225)
        at sun.security.jca.ProviderConfig.getProvider(ProviderConfig.java:205)
      • locked <0xf1e8f838> (a sun.misc.Launcher$AppClassLoader)
        at sun.security.jca.ProviderList.getProvider(ProviderList.java:215)
        at sun.security.jca.ProviderList.getService(ProviderList.java:313)
        at sun.security.jca.GetInstance.getInstance(GetInstance.java:140)
        at java.security.Security.getImpl(Security.java:659)
        at java.security.MessageDigest.getInstance(MessageDigest.java:129)
        at java.io.ObjectStreamClass.computeDefaultSUID(ObjectStreamClass.java:1759)
        at java.io.ObjectStreamClass.access$100(ObjectStreamClass.java:52)
        at java.io.ObjectStreamClass$1.run(ObjectStreamClass.java:205)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.io.ObjectStreamClass.getSerialVersionUID(ObjectStreamClass.java:202)
        at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:558)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1583)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1496)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1732)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
        at java.util.HashMap.readObject(HashMap.java:1030)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1849)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1753)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
        at org.jvnet.hk2.osgiadapter.OSGiModulesRegistryImpl.loadCachedData(OSGiModulesRegistryImpl.java:195)
        at org.jvnet.hk2.osgiadapter.OSGiModulesRegistryImpl.<init>(OSGiModulesRegistryImpl.java:91)
        at org.jvnet.hk2.osgiadapter.OSGiFactoryImpl.createModulesRegistry(OSGiFactoryImpl.java:70)
        at org.jvnet.hk2.osgiadapter.OSGiFactoryImpl.createModulesRegistry(OSGiFactoryImpl.java:52)
        at org.jvnet.hk2.osgiadapter.HK2Main.createModulesRegistry(HK2Main.java:185)
        at org.jvnet.hk2.osgiadapter.HK2Main.start(HK2Main.java:142)
        at org.apache.felix.framework.util.SecureAction.startActivator(SecureAction.java:629)
        at org.apache.felix.framework.Felix.activateBundle(Felix.java:1827)
        at org.apache.felix.framework.Felix.startBundle(Felix.java:1744)
        at org.apache.felix.framework.BundleImpl.start(BundleImpl.java:922)
        at org.glassfish.kernel.GlassFishActivator.startBundle(GlassFishActivator.java:239)
        at org.glassfish.kernel.GlassFishActivator.startBundles(GlassFishActivator.java:194)
        at org.glassfish.kernel.GlassFishActivator.start(GlassFishActivator.java:80)
        at org.apache.felix.framework.util.SecureAction.startActivator(SecureAction.java:629)
        at org.apache.felix.framework.Felix.activateBundle(Felix.java:1827)
        at org.apache.felix.framework.Felix.startBundle(Felix.java:1744)
        at org.apache.felix.framework.BundleImpl.start(BundleImpl.java:922)
        at org.jvnet.hk2.osgimain.Main.start(Main.java:154)
        at org.apache.felix.framework.util.SecureAction.startActivator(SecureAction.java:629)
        at org.apache.felix.framework.Felix.activateBundle(Felix.java:1827)
        at org.apache.felix.framework.Felix.startBundle(Felix.java:1744)
        at org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1148)
        at org.apache.felix.framework.StartLevelImpl.run(StartLevelImpl.java:264)
        at java.lang.Thread.run(Thread.java:619)

      The instance log file has nothing of interest. The message before the "hang" is:

      Jan 11, 2011 5:21:30 PM com.sun.enterprise.admin.launcher.GFLauncherLogger info
      INFO: Successfully launched in 39 msec.

      and the first message after the "hang" is:

      Jan 11, 2011 5:25:02 PM null
      INFO: Running GlassFish Version: GlassFish Server Open Source Edition 3.1-SNAPSHOT (build dipol-private)
      [#|2011-01-11T17:25:02.483-0800|INFO|glassfish3.1|null|_ThreadID=12;_ThreadName=Thread-1;|Grizzly Framework 1.9.28 started in: 89ms - bound to [0.0.0.0:28080]|#]

      1. rpc.c
        15 kB
        Joe Di Pol

        Activity

        Hide
        Scott Fordin added a comment -

        Added issue to 3.1.1 Release Notes.

        Show
        Scott Fordin added a comment - Added issue to 3.1.1 Release Notes.
        Hide
        Paul Davies added a comment -

        The Release Notes should just mention the problem of intermittent slow start on Solaris 11. There is no workaround (except, perhaps, to wait).

        Show
        Paul Davies added a comment - The Release Notes should just mention the problem of intermittent slow start on Solaris 11. There is no workaround (except, perhaps, to wait).
        Hide
        Byron Nevins added a comment -

        In 3.1 this will be documented.
        It should be fixed for 3.2

        Show
        Byron Nevins added a comment - In 3.1 this will be documented. It should be fixed for 3.2
        Hide
        Joe Di Pol added a comment - - edited

        sendTCSDPacket() and send_init() are in the Trouser 0.3.4 client code (0.3.4 looks to be the version on my Solaris 11 box). The Trouser 0.3.4 source can be found at: http://sourceforge.net/projects/trousers/files/trousers/0.3.4/. I've attached rpc.c that contains sendTCSDPacket() and send_init(). Nothing seems obviously wrong. Note in particular that send_init() does close the socket on an error.

        Show
        Joe Di Pol added a comment - - edited sendTCSDPacket() and send_init() are in the Trouser 0.3.4 client code (0.3.4 looks to be the version on my Solaris 11 box). The Trouser 0.3.4 source can be found at: http://sourceforge.net/projects/trousers/files/trousers/0.3.4/ . I've attached rpc.c that contains sendTCSDPacket() and send_init(). Nothing seems obviously wrong. Note in particular that send_init() does close the socket on an error.
        Hide
        Joe Di Pol added a comment - - edited

        More data:

        I ran "truss -f -t connect -v connect -p 13064", where 13064 was the pid of the instance I was going to restart. Here is the interesting part:

        When things work fine I see:

        13076/2: connect(50, 0xFE12D2A0, 16, SOV_DEFAULT) Err#146 ECONNREFUSED
        13076/2: AF_INET name = 127.0.0.1 port = 30003
        13076/2: connect(50, 0xFE12D220, 16, SOV_DEFAULT) Err#146 ECONNREFUSED
        13076/2: AF_INET name = 127.0.0.1 port = 30003
        13076/2: connect(50, 0xFE12D2C0, 16, SOV_DEFAULT) Err#146 ECONNREFUSED
        13076/2: AF_INET name = 127.0.0.1 port = 30003

        When I get the hang I see:

        13078/14: connect(280, 0xCD1906A0, 16, SOV_DEFAULT) (sleeping...)
        13078/14: AF_INET name = 127.0.0.1 port = 30003
        13076/2: Received signal #13, SIGPIPE [caught]
        . . .
        13076/2: Received signal #13, SIGPIPE [caught]
        13078/14: connect(280, 0xCD1906A0, 16, SOV_DEFAULT) Err#145 ETIMEDOUT
        13078/14: AF_INET name = 127.0.0.1 port = 30003
        13076/2: Received signal #13, SIGPIPE [caught]
        13078/14: connect(280, 0xCD190620, 16, SOV_DEFAULT) (sleeping...)
        13078/14: AF_INET name = 127.0.0.1 port = 30003
        13076/2: Received signal #13, SIGPIPE [caught]
        13076/2: Received signal #13, SIGPIPE [caught]
        13078/14: connect(280, 0xCD190620, 16, SOV_DEFAULT) Err#145 ETIMEDOUT
        13078/14: AF_INET name = 127.0.0.1 port = 30003
        13078/14: connect(280, 0xCD1906C0, 16, SOV_DEFAULT) Err#146 ECONNREFUSED
        13078/14: AF_INET name = 127.0.0.1 port = 30003

        Observations:

        • The connect is trying to connect to port 30003. That's the default tcsd(8) port.
        • When connect is hanging it fails with ETIMEDOUT instead of failing immediately with ECONNREFUSED.
        • It looks like the connect in question is called 3 times.
        • During a "hang" netstat -a shows:
          localhost.59145 localhost.30003 0 0 131072 0 SYN_SENT
        • Netstat shows nothing listening on port 30003
        Show
        Joe Di Pol added a comment - - edited More data: I ran "truss -f -t connect -v connect -p 13064", where 13064 was the pid of the instance I was going to restart. Here is the interesting part: When things work fine I see: 13076/2: connect(50, 0xFE12D2A0, 16, SOV_DEFAULT) Err#146 ECONNREFUSED 13076/2: AF_INET name = 127.0.0.1 port = 30003 13076/2: connect(50, 0xFE12D220, 16, SOV_DEFAULT) Err#146 ECONNREFUSED 13076/2: AF_INET name = 127.0.0.1 port = 30003 13076/2: connect(50, 0xFE12D2C0, 16, SOV_DEFAULT) Err#146 ECONNREFUSED 13076/2: AF_INET name = 127.0.0.1 port = 30003 When I get the hang I see: 13078/14: connect(280, 0xCD1906A0, 16, SOV_DEFAULT) (sleeping...) 13078/14: AF_INET name = 127.0.0.1 port = 30003 13076/2: Received signal #13, SIGPIPE [caught] . . . 13076/2: Received signal #13, SIGPIPE [caught] 13078/14: connect(280, 0xCD1906A0, 16, SOV_DEFAULT) Err#145 ETIMEDOUT 13078/14: AF_INET name = 127.0.0.1 port = 30003 13076/2: Received signal #13, SIGPIPE [caught] 13078/14: connect(280, 0xCD190620, 16, SOV_DEFAULT) (sleeping...) 13078/14: AF_INET name = 127.0.0.1 port = 30003 13076/2: Received signal #13, SIGPIPE [caught] 13076/2: Received signal #13, SIGPIPE [caught] 13078/14: connect(280, 0xCD190620, 16, SOV_DEFAULT) Err#145 ETIMEDOUT 13078/14: AF_INET name = 127.0.0.1 port = 30003 13078/14: connect(280, 0xCD1906C0, 16, SOV_DEFAULT) Err#146 ECONNREFUSED 13078/14: AF_INET name = 127.0.0.1 port = 30003 Observations: The connect is trying to connect to port 30003. That's the default tcsd(8) port. When connect is hanging it fails with ETIMEDOUT instead of failing immediately with ECONNREFUSED. It looks like the connect in question is called 3 times. During a "hang" netstat -a shows: localhost.59145 localhost.30003 0 0 131072 0 SYN_SENT Netstat shows nothing listening on port 30003

          People

          • Assignee:
            Byron Nevins
            Reporter:
            Joe Di Pol
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Due:
              Created:
              Updated: