glassfish
  1. glassfish
  2. GLASSFISH-18267

Traffic loss during instance start between the time 8080 is up and application is loaded

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 3.1.2
    • Fix Version/s: 4.1
    • Component/s: web_container
    • Labels:
      None
    • Environment:

      OEL6 + JRockit

      Description

      Build : 3.1.2 build 17 , JRockit VM 6u29., OEL6

      When running HA Stress tests we are observing traffic loss when a cluster instance is restarting. There is a time gap between the time when port 8080 is up and application loading is done. Loadbalancer probably detects http port is up and starts forwarding traffic. In this particular case, the time is approx. 11 sec and the request rate is 25 reqs/instance. 325 requests failed.

      The client receives a 404 during this time :

      SEVERE: 4xx or 5xx Response code - 404 HttpClientSession-47926
      Jan 17, 2012 10:58:01 PM com.sun.dft.glassfish.stress.ReadWriteTest$MyTestListener postInvoke
      SEVERE: <Unable to render embedded object: File (//www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>GlassFish v3 - Error report</title><style type="text/css"><) not found.--H1

      {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;}

      H2

      {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;}

      H3

      {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;}

      BODY

      {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;}

      B

      {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;}

      P

      {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;}

      A

      {color : black;}

      HR

      {color : #525D76;}

      --></style> </head><body><h1>HTTP Status 404 - </h1><hr/><p><b>type</b> Status report</p><p><b>message</b></p><p><b>description</b>The requested resource () is not available.</p><hr/><h3>GlassFish Server Open Source Edition 3.1.2-b17</h3></body></html>

      Server log :

      http://agni-1.us.oracle.com/net/asqe-logs.us.oracle.com/export1/3.1.2/Results/build17/ha/oel6_stress/stress/com.sun.dft.glassfish.stress.ReadWriteTest/testReadWrite/logs/st-cluster/instance101/logs/server.log

      [#|2012-01-17T22:58:00.873-0800|INFO|glassfish3.1.2|javax.enterprise.system.container.web.com.sun.enterprise.web|_ThreadID=10;_ThreadName=Thread-2;|WEB0169: Created HTTP listener [http-listener-1] on host/port [0.0.0.0:28080]|#]

      [#|2012-01-17T22:58:00.994-0800|INFO|glassfish3.1.2|javax.enterprise.system.container.web.com.sun.enterprise.web|_ThreadID=10;_ThreadName=Thread-2;|WEB0169: Created HTTP listener [http-listener-2] on host/port [0.0.0.0:28181]|#]

      [#|2012-01-17T22:58:01.025-0800|INFO|glassfish3.1.2|javax.enterprise.system.container.web.com.sun.enterprise.web|_ThreadID=10;_ThreadName=Thread-2;|WEB0169: Created HTTP listener [admin-listener] on host/port [0.0.0.0:24848]|#]

      [#|2012-01-17T22:58:01.509-0800|INFO|glassfish3.1.2|javax.enterprise.system.container.web.com.sun.enterprise.web|_ThreadID=10;_ThreadName=Thread-2;|WEB0171: Created virtual server [server]|#]

      [#|2012-01-17T22:58:01.518-0800|INFO|glassfish3.1.2|javax.enterprise.system.container.web.com.sun.enterprise.web|_ThreadID=10;_ThreadName=Thread-2;|WEB0171: Created virtual server [__asadmin]|#]

      [#|2012-01-17T22:58:05.306-0800|INFO|glassfish3.1.2|javax.enterprise.system.container.web.com.sun.enterprise.web|_ThreadID=10;_ThreadName=Thread-2;|WEB0172: Virtual server [server] loaded default web module []|#]

      [#|2012-01-17T22:58:08.619-0800|INFO|glassfish3.1.2|javax.enterprise.system.core.com.sun.enterprise.v3.services.impl|_ThreadID=10;_ThreadName=Thread-2;|core.start_container_done|#]

      [#|2012-01-17T22:58:09.940-0800|INFO|glassfish3.1.2|javax.enterprise.system.core.security.com.sun.enterprise.security|_ThreadID=10;_ThreadName=Thread-2;|SEC1002: Security Manager is OFF.|#]

      [#|2012-01-17T22:58:10.115-0800|INFO|glassfish3.1.2|javax.enterprise.system.core.security.com.sun.enterprise.security|_ThreadID=10;_ThreadName=Thread-2;|SEC1010: Entering Security Startup Service|#]

      [#|2012-01-17T22:58:10.147-0800|INFO|glassfish3.1.2|javax.enterprise.system.core.security.com.sun.enterprise.security|_ThreadID=10;_ThreadName=Thread-2;|SEC1143: Loading policy provider com.sun.enterprise.security.provider.PolicyWrapper.|#]

      [#|2012-01-17T22:58:10.405-0800|INFO|glassfish3.1.2|javax.enterprise.system.core.security.com.sun.enterprise.security.auth.realm|_ThreadID=10;_ThreadName=Thread-2;|SEC1115: Realm [admin-realm] of classtype [com.sun.enterprise.security.auth.realm.file.FileRealm] successfully created.|#]

      [#|2012-01-17T22:58:10.417-0800|INFO|glassfish3.1.2|javax.enterprise.system.core.security.com.sun.enterprise.security.auth.realm|_ThreadID=10;_ThreadName=Thread-2;|SEC1115: Realm [file] of classtype [com.sun.enterprise.security.auth.realm.file.FileRealm] successfully created.|#]

      [#|2012-01-17T22:58:10.539-0800|INFO|glassfish3.1.2|javax.enterprise.system.core.security.com.sun.enterprise.security.auth.realm|_ThreadID=10;_ThreadName=Thread-2;|SEC1115: Realm [certificate] of classtype [com.sun.enterprise.security.auth.realm.certificate.CertificateRealm] successfully created.|#]

      [#|2012-01-17T22:58:10.591-0800|INFO|glassfish3.1.2|javax.enterprise.system.core.security.com.sun.enterprise.security|_ThreadID=10;_ThreadName=Thread-2;|SEC1011: Security Service(s) Started Successfully|#]

      [#|2012-01-17T22:58:13.758-0800|INFO|glassfish3.1.2|ShoalLogger|_ThreadID=10;_ThreadName=Thread-2;|**GroupServiceProvider:: REGISTERED member event listeners for <group, instance> => <st-cluster, instance101>|#]

      [#|2012-01-17T22:58:14.061-0800|INFO|glassfish3.1.2|javax.enterprise.system.container.web.com.sun.enterprise.web|_ThreadID=10;_ThreadName=Thread-2;|WEB0671: Loading application [ReadWriteServletTest] at [/ReadWriteServletTest]|#]

      [#|2012-01-17T22:58:14.076-0800|INFO|glassfish3.1.2|javax.enterprise.system.core.com.sun.enterprise.v3.server|_ThreadID=10;_ThreadName=Thread-2;|CORE10010: Loading application ReadWriteServletTest done in 5,380 ms|#]

      [#|2012-01-17T22:58:14.085-0800|INFO|glassfish3.1.2|javax.enterprise.system.core.com.sun.enterprise.v3.server|_ThreadID=10;_ThreadName=Thread-2;|GlassFish Server Open Source Edition 3.1.2-b17 (17) startup time : Felix (8,857ms), startup services(21,300ms), total(30,157ms)|#]

      1. framework.log.1
        631 kB
        sonymanuel
      2. server.log
        245 kB
        sonymanuel

        Issue Links

          Activity

          Hide
          Shing Wai Chan added a comment -

          I notice the following in the server.log:

          exception or error occurred in the container during the request processing
          java.lang.NullPointerException
          at org.shoal.adapter.store.ReplicatedBackingStore.load(ReplicatedBackingStore.java:97)
          at org.glassfish.web.ha.session.management.ReplicationStore.loadFromBackingStore(ReplicationStore.java:407)
          at org.glassfish.web.ha.session.management.ReplicationStore.load(ReplicationStore.java:396)
          at org.apache.catalina.session.PersistentManagerBase.doSwapIn(PersistentManagerBase.java:1052)
          at org.apache.catalina.session.PersistentManagerBase.swapIn(PersistentManagerBase.java:1015)
          at org.glassfish.web.ha.session.management.ReplicationManagerBase.findSession(ReplicationManagerBase.java:151)
          at org.apache.catalina.connector.Request.doGetSession(Request.java:2852)
          at org.apache.catalina.connector.Request.getSessionInternal(Request.java:2777)
          at org.apache.catalina.connector.Request.unlockSession(Request.java:4201)
          at org.apache.catalina.connector.CoyoteAdapter.doService(CoyoteAdapter.java:342)
          at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:231)
          at com.sun.enterprise.v3.services.impl.ContainerMapper.service(ContainerMapper.java:174)
          at com.sun.grizzly.http.ProcessorTask.invokeAdapter(ProcessorTask.java:849)
          at com.sun.grizzly.http.ProcessorTask.doProcess(ProcessorTask.java:746)
          at com.sun.grizzly.http.ProcessorTask.process(ProcessorTask.java:1045)
          at com.sun.grizzly.http.DefaultProtocolFilter.execute(DefaultProtocolFilter.java:228)
          at com.sun.grizzly.DefaultProtocolChain.executeProtocolFilter(DefaultProtocolChain.java:137)
          at com.sun.grizzly.DefaultProtocolChain.execute(DefaultProtocolChain.java:104)
          at com.sun.grizzly.DefaultProtocolChain.execute(DefaultProtocolChain.java:90)
          at com.sun.grizzly.http.HttpProtocolChain.execute(HttpProtocolChain.java:79)
          at com.sun.grizzly.ProtocolChainContextTask.doCall(ProtocolChainContextTask.java:54)
          at com.sun.grizzly.SelectionKeyContextTask.call(SelectionKeyContextTask.java:59)
          at com.sun.grizzly.ContextTask.run(ContextTask.java:71)
          at com.sun.grizzly.util.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:532)
          at com.sun.grizzly.util.AbstractThreadPool$Worker.run(AbstractThreadPool.java:513)
          at java.lang.Thread.run(Thread.java:662)

          Show
          Shing Wai Chan added a comment - I notice the following in the server.log: exception or error occurred in the container during the request processing java.lang.NullPointerException at org.shoal.adapter.store.ReplicatedBackingStore.load(ReplicatedBackingStore.java:97) at org.glassfish.web.ha.session.management.ReplicationStore.loadFromBackingStore(ReplicationStore.java:407) at org.glassfish.web.ha.session.management.ReplicationStore.load(ReplicationStore.java:396) at org.apache.catalina.session.PersistentManagerBase.doSwapIn(PersistentManagerBase.java:1052) at org.apache.catalina.session.PersistentManagerBase.swapIn(PersistentManagerBase.java:1015) at org.glassfish.web.ha.session.management.ReplicationManagerBase.findSession(ReplicationManagerBase.java:151) at org.apache.catalina.connector.Request.doGetSession(Request.java:2852) at org.apache.catalina.connector.Request.getSessionInternal(Request.java:2777) at org.apache.catalina.connector.Request.unlockSession(Request.java:4201) at org.apache.catalina.connector.CoyoteAdapter.doService(CoyoteAdapter.java:342) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:231) at com.sun.enterprise.v3.services.impl.ContainerMapper.service(ContainerMapper.java:174) at com.sun.grizzly.http.ProcessorTask.invokeAdapter(ProcessorTask.java:849) at com.sun.grizzly.http.ProcessorTask.doProcess(ProcessorTask.java:746) at com.sun.grizzly.http.ProcessorTask.process(ProcessorTask.java:1045) at com.sun.grizzly.http.DefaultProtocolFilter.execute(DefaultProtocolFilter.java:228) at com.sun.grizzly.DefaultProtocolChain.executeProtocolFilter(DefaultProtocolChain.java:137) at com.sun.grizzly.DefaultProtocolChain.execute(DefaultProtocolChain.java:104) at com.sun.grizzly.DefaultProtocolChain.execute(DefaultProtocolChain.java:90) at com.sun.grizzly.http.HttpProtocolChain.execute(HttpProtocolChain.java:79) at com.sun.grizzly.ProtocolChainContextTask.doCall(ProtocolChainContextTask.java:54) at com.sun.grizzly.SelectionKeyContextTask.call(SelectionKeyContextTask.java:59) at com.sun.grizzly.ContextTask.run(ContextTask.java:71) at com.sun.grizzly.util.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:532) at com.sun.grizzly.util.AbstractThreadPool$Worker.run(AbstractThreadPool.java:513) at java.lang.Thread.run(Thread.java:662)
          Hide
          Shing Wai Chan added a comment -

          Assign to shoal team to investigate the NPE from org.shoal.adapter.store.ReplicatedBackingStore.load(ReplicatedBackingStore.java:97).

          Show
          Shing Wai Chan added a comment - Assign to shoal team to investigate the NPE from org.shoal.adapter.store.ReplicatedBackingStore.load(ReplicatedBackingStore.java:97).
          Hide
          Joe Fialli added a comment -

          The reported NPE in shoal is happening AFTER the clustered instance is started and immediately stops. Here are relevant log messages. Shoal GMS has an glassfish PREPARE_SHUTDOWN event handler. As part of that handling,
          GMS leaves the cluster. The server initiated shutdown is at 22:52:23.747. I can not tell if someone ran
          "asadmin stop-instance" on the instance or if something failed in the startup of the server and then glassfish server
          shutdown was initiated. The NPE is not causing the server to shutdown, the NPE is occurring after the instance leaves the gms cluster. The shoal cache code is not handling quick shutdown properly but I do not think that is the main
          issue for this bug.

          [#|2012-01-17T22:44:08.990-0800|INFO|glassfish3.1.2|javax.enterprise.system.core.security.com.sun.enterprise.security|_ThreadID=21;_ThreadName=Thread-2;|SEC1011: Security Service(s) Started Successfully|#]

          [#|2012-01-17T22:44:10.170-0800|INFO|glassfish3.1.2|ShoalLogger|_ThreadID=21;_ThreadName=Thread-2;|**GroupServiceProvider:: REGISTERED member event listeners for <group, instance> => <st-cluster, instance101>|#]

          [#|2012-01-17T22:44:10.361-0800|INFO|glassfish3.1.2|javax.enterprise.system.container.web.com.sun.enterprise.web|_ThreadID=21;_ThreadName=Thread-2;|WEB0671: Loading application [ReadWriteServletTest] at [/ReadWriteServletTest]|#]

          [#|2012-01-17T22:44:10.568-0800|INFO|glassfish3.1.2|javax.enterprise.system.tools.admin.org.glassfish.deployment.admin|_ThreadID=21;_ThreadName=Thread-2;|ReadWriteServletTest was successfully deployed in 2,513 milliseconds.|#]

          [#|2012-01-17T22:52:23.747-0800|INFO|glassfish3.1.2|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin.cluster|_ThreadID=22;_ThreadName=Thread-2;|Server shutdown initiated|#]

          [#|2012-01-17T22:52:23.752-0800|INFO|glassfish3.1.2|javax.org.glassfish.gms.org.glassfish.gms|_ThreadID=22;_ThreadName=Thread-2;|GMSAD1008: GMSAdapter for member: instance101 group: st-cluster received GlassfishEventType: prepare_shutdown|#]

          [#|2012-01-17T22:52:23.757-0800|INFO|glassfish3.1.2|ShoalLogger|_ThreadID=22;_ThreadName=Thread-2;|GMS1096: member: instance101 is leaving group: st-cluster|#]

          [#|2012-01-17T22:52:23.759-0800|INFO|glassfish3.1.2|ShoalLogger|_ThreadID=22;_ThreadName=Thread-2;|GMS1010: Leaving GMS group: st-cluster with shutdown type set to InstanceShutdown|#]

          [#|2012-01-17T22:52:23.774-0800|CONFIG|glassfish3.1.2|ShoalLogger|_ThreadID=23;_ThreadName=Thread-2;|GMS1065: Completed processing outstanding master node messages for member: instance101 group: st-cluster outstandingMessages to process: 1|#]

          <deleted some log messages>

          [#|2012-01-17T22:52:25.453-0800|SEVERE|glassfish3.1.2|org.apache.catalina.connector.CoyoteAdapter|_ThreadID=29;_ThreadName=Thread-2;|PWC3989: An exception or error occurred in the container during the request processing
          java.lang.NullPointerException
          at org.shoal.adapter.store.ReplicatedBackingStore.load(ReplicatedBackingStore.java:97)
          at org.glassfish.web.ha.session.management.ReplicationStore.loadFromBackingStore(ReplicationStore.java:407)
          at org.glassfish.web.ha.session.management.ReplicationStore.load(ReplicationStore.java:396)
          at org.apache.catalina.session.PersistentManagerBase.doSwapIn(PersistentManagerBase.java:1052)
          at org.apache.catalina.session.PersistentManagerBase.swapIn(PersistentManagerBase.java:1015)
          at org.glassfish.web.ha.session.management.ReplicationManagerBase.findSession(ReplicationManagerBase.java:151)
          at org.apache.catalina.connector.Request.doGetSession(Request.java:2852)
          at org.apache.catalina.connector.Request.getSessionInternal(Request.java:2777)
          at org.apache.catalina.connector.Request.unlockSession(Request.java:4201)
          at org.apache.catalina.connector.CoyoteAdapter.doService(CoyoteAdapter.java:342)
          at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:231)
          at com.sun.enterprise.v3.services.impl.ContainerMapper.service(ContainerMapper.java:174)
          at com.sun.grizzly.http.ProcessorTask.invokeAdapter(ProcessorTask.ja3DThread-2;|PWC3989: An exception or error occurred in the container during the request processing
          java.lang.NullPointerException
          at org.shoal.adapter.store.ReplicatedBackingStore.load(ReplicatedBackingStore.java:97)
          at org.glassfish.web.ha.session.management.ReplicationStore.loadFromBackingStore(ReplicationStore.java:407)
          at org.glassfish.web.ha.session.management.ReplicationStore.load(ReplicationStore.java:396)
          at org.apache.catalina.session.PersistentManagerBase.doSwapIn(PersistentManagerBase.java:1052)
          at org.apache.catalina.session.PersistentManagerBase.swapIn(PersistentManagerBase.java:1015)
          at org.glassfish.web.ha.session.management.ReplicationManagerBase.findSession(ReplicationManagerBase.java:151)
          at org.apache.catalina.connector.Request.doGetSession(Request.java:2852)
          at org.apache.catalina.connector.Request.getSessionInternal(Request.java:2777)
          at org.apache.catalina.connector.Request.unlockSession(Request.java:4201)
          at org.apache.catalina.connector.CoyoteAdapter.doService(CoyoteAdapter.java:342)
          at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter
          Bug tracking, issue tracking and project management software powered by Atlassian JIRA (v4.0.2#472) | Report a problem | Request a feature | Contact administrators

          Show
          Joe Fialli added a comment - The reported NPE in shoal is happening AFTER the clustered instance is started and immediately stops. Here are relevant log messages. Shoal GMS has an glassfish PREPARE_SHUTDOWN event handler. As part of that handling, GMS leaves the cluster. The server initiated shutdown is at 22:52:23.747. I can not tell if someone ran "asadmin stop-instance" on the instance or if something failed in the startup of the server and then glassfish server shutdown was initiated. The NPE is not causing the server to shutdown, the NPE is occurring after the instance leaves the gms cluster. The shoal cache code is not handling quick shutdown properly but I do not think that is the main issue for this bug. [#|2012-01-17T22:44:08.990-0800|INFO|glassfish3.1.2|javax.enterprise.system.core.security.com.sun.enterprise.security|_ThreadID=21;_ThreadName=Thread-2;|SEC1011: Security Service(s) Started Successfully|#] [#|2012-01-17T22:44:10.170-0800|INFO|glassfish3.1.2|ShoalLogger|_ThreadID=21;_ThreadName=Thread-2;|**GroupServiceProvider:: REGISTERED member event listeners for <group, instance> => <st-cluster, instance101>|#] [#|2012-01-17T22:44:10.361-0800|INFO|glassfish3.1.2|javax.enterprise.system.container.web.com.sun.enterprise.web|_ThreadID=21;_ThreadName=Thread-2;|WEB0671: Loading application [ReadWriteServletTest] at [/ReadWriteServletTest] |#] [#|2012-01-17T22:44:10.568-0800|INFO|glassfish3.1.2|javax.enterprise.system.tools.admin.org.glassfish.deployment.admin|_ThreadID=21;_ThreadName=Thread-2;|ReadWriteServletTest was successfully deployed in 2,513 milliseconds.|#] [#|2012-01-17T22:52:23.747-0800|INFO|glassfish3.1.2|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin.cluster|_ThreadID=22;_ThreadName=Thread-2;|Server shutdown initiated|#] [#|2012-01-17T22:52:23.752-0800|INFO|glassfish3.1.2|javax.org.glassfish.gms.org.glassfish.gms|_ThreadID=22;_ThreadName=Thread-2;|GMSAD1008: GMSAdapter for member: instance101 group: st-cluster received GlassfishEventType: prepare_shutdown|#] [#|2012-01-17T22:52:23.757-0800|INFO|glassfish3.1.2|ShoalLogger|_ThreadID=22;_ThreadName=Thread-2;|GMS1096: member: instance101 is leaving group: st-cluster|#] [#|2012-01-17T22:52:23.759-0800|INFO|glassfish3.1.2|ShoalLogger|_ThreadID=22;_ThreadName=Thread-2;|GMS1010: Leaving GMS group: st-cluster with shutdown type set to InstanceShutdown|#] [#|2012-01-17T22:52:23.774-0800|CONFIG|glassfish3.1.2|ShoalLogger|_ThreadID=23;_ThreadName=Thread-2;|GMS1065: Completed processing outstanding master node messages for member: instance101 group: st-cluster outstandingMessages to process: 1|#] <deleted some log messages> [#|2012-01-17T22:52:25.453-0800|SEVERE|glassfish3.1.2|org.apache.catalina.connector.CoyoteAdapter|_ThreadID=29;_ThreadName=Thread-2;|PWC3989: An exception or error occurred in the container during the request processing java.lang.NullPointerException at org.shoal.adapter.store.ReplicatedBackingStore.load(ReplicatedBackingStore.java:97) at org.glassfish.web.ha.session.management.ReplicationStore.loadFromBackingStore(ReplicationStore.java:407) at org.glassfish.web.ha.session.management.ReplicationStore.load(ReplicationStore.java:396) at org.apache.catalina.session.PersistentManagerBase.doSwapIn(PersistentManagerBase.java:1052) at org.apache.catalina.session.PersistentManagerBase.swapIn(PersistentManagerBase.java:1015) at org.glassfish.web.ha.session.management.ReplicationManagerBase.findSession(ReplicationManagerBase.java:151) at org.apache.catalina.connector.Request.doGetSession(Request.java:2852) at org.apache.catalina.connector.Request.getSessionInternal(Request.java:2777) at org.apache.catalina.connector.Request.unlockSession(Request.java:4201) at org.apache.catalina.connector.CoyoteAdapter.doService(CoyoteAdapter.java:342) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:231) at com.sun.enterprise.v3.services.impl.ContainerMapper.service(ContainerMapper.java:174) at com.sun.grizzly.http.ProcessorTask.invokeAdapter(ProcessorTask.ja3DThread-2;|PWC3989: An exception or error occurred in the container during the request processing java.lang.NullPointerException at org.shoal.adapter.store.ReplicatedBackingStore.load(ReplicatedBackingStore.java:97) at org.glassfish.web.ha.session.management.ReplicationStore.loadFromBackingStore(ReplicationStore.java:407) at org.glassfish.web.ha.session.management.ReplicationStore.load(ReplicationStore.java:396) at org.apache.catalina.session.PersistentManagerBase.doSwapIn(PersistentManagerBase.java:1052) at org.apache.catalina.session.PersistentManagerBase.swapIn(PersistentManagerBase.java:1015) at org.glassfish.web.ha.session.management.ReplicationManagerBase.findSession(ReplicationManagerBase.java:151) at org.apache.catalina.connector.Request.doGetSession(Request.java:2852) at org.apache.catalina.connector.Request.getSessionInternal(Request.java:2777) at org.apache.catalina.connector.Request.unlockSession(Request.java:4201) at org.apache.catalina.connector.CoyoteAdapter.doService(CoyoteAdapter.java:342) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter Bug tracking, issue tracking and project management software powered by Atlassian JIRA (v4.0.2#472) | Report a problem | Request a feature | Contact administrators
          Hide
          Joe Fialli added a comment -

          The NPE in shoal cache in this bug report duplicates GLASSFISH-18268.
          The NPE is happening AFTER the instance is shutdown.

          [#|2012-01-17T22:52:23.747-0800|INFO|glassfish3.1.2|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin.cluster|_ThreadID=22;_ThreadName=Thread-2;|Server shutdown initiated|#]

          [#|2012-01-17T22:52:23.752-0800|INFO|glassfish3.1.2|javax.org.glassfish.gms.org.glassfish.gms|_ThreadID=22;_ThreadName=Thread-2;|GMSAD1008: GMSAdapter for member: instance101 group: st-cluster received GlassfishEventType: prepare_shutdown|#]

          [#|2012-01-17T22:52:23.757-0800|INFO|glassfish3.1.2|ShoalLogger|_ThreadID=22;_ThreadName=Thread-2;|GMS1096: member: instance101 is leaving group: st-cluster|#]

          [#|2012-01-17T22:52:23.759-0800|INFO|glassfish3.1.2|ShoalLogger|_ThreadID=22;_ThreadName=Thread-2;|GMS1010: Leaving GMS group: st-cluster with shutdown type set to InstanceShutdown|#]

          [#|2012-01-17T22:52:23.774-0800|CONFIG|glassfish3.1.2|ShoalLogger|_ThreadID=23;_ThreadName=Thread-2;|GMS1065: Completed processing outstanding master node messages for member: instance101 group: st-cluster outstandingMessages to process: 1|#]

          [#|2012-01-17T22:52:25.453-0800|SEVERE|glassfish3.1.2|org.apache.catalina.connector.CoyoteAdapter|_ThreadID=29;_ThreadName=Thread-2;|PWC3989: An exception or error occurred in the container during the request processing
          java.lang.NullPointerException
          at org.shoal.adapter.store.ReplicatedBackingStore.load(ReplicatedBackingStore.java:97)

          Show
          Joe Fialli added a comment - The NPE in shoal cache in this bug report duplicates GLASSFISH-18268 . The NPE is happening AFTER the instance is shutdown. [#|2012-01-17T22:52:23.747-0800|INFO|glassfish3.1.2|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin.cluster|_ThreadID=22;_ThreadName=Thread-2;|Server shutdown initiated|#] [#|2012-01-17T22:52:23.752-0800|INFO|glassfish3.1.2|javax.org.glassfish.gms.org.glassfish.gms|_ThreadID=22;_ThreadName=Thread-2;|GMSAD1008: GMSAdapter for member: instance101 group: st-cluster received GlassfishEventType: prepare_shutdown|#] [#|2012-01-17T22:52:23.757-0800|INFO|glassfish3.1.2|ShoalLogger|_ThreadID=22;_ThreadName=Thread-2;|GMS1096: member: instance101 is leaving group: st-cluster|#] [#|2012-01-17T22:52:23.759-0800|INFO|glassfish3.1.2|ShoalLogger|_ThreadID=22;_ThreadName=Thread-2;|GMS1010: Leaving GMS group: st-cluster with shutdown type set to InstanceShutdown|#] [#|2012-01-17T22:52:23.774-0800|CONFIG|glassfish3.1.2|ShoalLogger|_ThreadID=23;_ThreadName=Thread-2;|GMS1065: Completed processing outstanding master node messages for member: instance101 group: st-cluster outstandingMessages to process: 1|#] [#|2012-01-17T22:52:25.453-0800|SEVERE|glassfish3.1.2|org.apache.catalina.connector.CoyoteAdapter|_ThreadID=29;_ThreadName=Thread-2;|PWC3989: An exception or error occurred in the container during the request processing java.lang.NullPointerException at org.shoal.adapter.store.ReplicatedBackingStore.load(ReplicatedBackingStore.java:97)
          Hide
          sonymanuel added a comment -

          To clarify NPE is reported as a separate bug 18268. It has not connection with this particular issue.

          Test scenario is as follows :

          1. Deploy app.
          2. Start http traffic (75 new requests per sec)
          3. After 5 minutes, stop a cluster instance (asadmin stop-instance)

          • This shows the NPE in server logs (18268)
            4. After 5 minutes, start instance. During the instance start there is traffic loss (this issue). LB probably detects http port is up and starts forwarding traffic. But the application loading is not complete.
            5. After startup is complete test runs fine for few hours.
          Show
          sonymanuel added a comment - To clarify NPE is reported as a separate bug 18268. It has not connection with this particular issue. Test scenario is as follows : 1. Deploy app. 2. Start http traffic (75 new requests per sec) 3. After 5 minutes, stop a cluster instance (asadmin stop-instance) This shows the NPE in server logs (18268) 4. After 5 minutes, start instance. During the instance start there is traffic loss (this issue). LB probably detects http port is up and starts forwarding traffic. But the application loading is not complete. 5. After startup is complete test runs fine for few hours.
          Hide
          Joe Fialli added a comment -

          this bug is concerning the http traffic loss.
          The NPE in shoal cache is occurring after the server instance is being shutdown.
          This issue is already reported as GF-18268 and has nothing to do with the traffic loss.
          The instance is already shutdown when the NPE is occurring (as described in 18268).

          The NPE is 2 seconds after server shutdown was initiated. This is exactly what GLASSFISH-18268 in the server.log attached to this issue.

          To summarize, see that server shutdown is initiated (probably by asdmin stop-instance) and
          then the NPE happens 2 seconds later.

          #|2012-01-17T22:52:23.747-0800|INFO|glassfish3.1.2|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin.cluster|_ThreadID=22;_ThreadName=Thread-2;|
          Server shutdown initiated|#]

          [#|2012-01-17T22:52:23.752-0800|INFO|glassfish3.1.2|javax.org.glassfish.gms.org.glassfish.gms|_ThreadID=22;_ThreadName=Thread-2;|GMSAD1008: GMSAdapter for member: instance101 group: st-cluster received GlassfishEventType: prepare_shutdown|#]

          [#|2012-01-17T22:52:23.759-0800|INFO|glassfish3.1.2|ShoalLogger|_ThreadID=22;_ThreadName=Thread-2;|GMS1010: Leaving GMS group: st-cluster with shutdown type set to InstanceShutdown|#]

          [#|2012-01-17T22:52:23.774-0800|CONFIG|glassfish3.1.2|ShoalLogger|_ThreadID=23;_ThreadName=Thread-2;|GMS1065: Completed processing outstanding master node messages for member: instance101 group: st-cluster outstandingMessages to process: 1|#]

          [#|2012-01-17T22:52:25.453-0800|SEVERE|glassfish3.1.2|org.apache.catalina.connector.CoyoteAdapter|_ThreadID=29;_ThreadName=Thread-2;|PWC3989: An exception or error occurred in the container during the request processing
          java.lang.NullPointerException
          at org.shoal.adapter.store.ReplicatedBackingStore.load(ReplicatedBackingStore.java:97)

          Show
          Joe Fialli added a comment - this bug is concerning the http traffic loss. The NPE in shoal cache is occurring after the server instance is being shutdown. This issue is already reported as GF-18268 and has nothing to do with the traffic loss. The instance is already shutdown when the NPE is occurring (as described in 18268). The NPE is 2 seconds after server shutdown was initiated. This is exactly what GLASSFISH-18268 in the server.log attached to this issue. To summarize, see that server shutdown is initiated (probably by asdmin stop-instance) and then the NPE happens 2 seconds later. #|2012-01-17T22:52:23.747-0800|INFO|glassfish3.1.2|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin.cluster|_ThreadID=22;_ThreadName=Thread-2;| Server shutdown initiated|#] [#|2012-01-17T22:52:23.752-0800|INFO|glassfish3.1.2|javax.org.glassfish.gms.org.glassfish.gms|_ThreadID=22;_ThreadName=Thread-2;|GMSAD1008: GMSAdapter for member: instance101 group: st-cluster received GlassfishEventType: prepare_shutdown|#] [#|2012-01-17T22:52:23.759-0800|INFO|glassfish3.1.2|ShoalLogger|_ThreadID=22;_ThreadName=Thread-2;|GMS1010: Leaving GMS group: st-cluster with shutdown type set to InstanceShutdown|#] [#|2012-01-17T22:52:23.774-0800|CONFIG|glassfish3.1.2|ShoalLogger|_ThreadID=23;_ThreadName=Thread-2;|GMS1065: Completed processing outstanding master node messages for member: instance101 group: st-cluster outstandingMessages to process: 1|#] [#|2012-01-17T22:52:25.453-0800|SEVERE|glassfish3.1.2|org.apache.catalina.connector.CoyoteAdapter|_ThreadID=29;_ThreadName=Thread-2;|PWC3989: An exception or error occurred in the container during the request processing java.lang.NullPointerException at org.shoal.adapter.store.ReplicatedBackingStore.load(ReplicatedBackingStore.java:97)
          Hide
          sb110099 added a comment -

          Upgrading to P2 as discussed with Sony .

          Show
          sb110099 added a comment - Upgrading to P2 as discussed with Sony .
          Hide
          Shing Wai Chan added a comment -

          While AppServerStartup.run, the following is called in order:
          (a) GrizzlyService create a GrizzlyProxy, which creates ContainerMapper and Mapper.
          (b) WebContainer.postConstruct is invoked and then calls Connector.start and the above mapper is set.
          (c) WebApplication.start is called for each web application

          If the request comes before (b) or (c) are done, then ContainerMapper will process the request and hence 404 in this case.

          The fix should be to let (b) and (c) finish. Then the request will be handled by CoyoteAdapter. One may need a mechanism to notify that the loading is started and done.

          Show
          Shing Wai Chan added a comment - While AppServerStartup.run, the following is called in order: (a) GrizzlyService create a GrizzlyProxy, which creates ContainerMapper and Mapper. (b) WebContainer.postConstruct is invoked and then calls Connector.start and the above mapper is set. (c) WebApplication.start is called for each web application If the request comes before (b) or (c) are done, then ContainerMapper will process the request and hence 404 in this case. The fix should be to let (b) and (c) finish. Then the request will be handled by CoyoteAdapter. One may need a mechanism to notify that the loading is started and done.
          Hide
          Shing Wai Chan added a comment -

          In this case, Grizzly should not process the request before (b) and (c) are completed.

          Show
          Shing Wai Chan added a comment - In this case, Grizzly should not process the request before (b) and (c) are completed.
          Hide
          Joe Di Pol added a comment -

          As discussed in Bug Swat we will not stop the 3.1.2 release to fix this bug, therefore I am excluding it from the release. Note that these tests (HA stress) were not run on 3.1.1 so it is difficult to know if this is a regression or not, but engineering thinks that the same design problem exists in 3.1.1.

          Show
          Joe Di Pol added a comment - As discussed in Bug Swat we will not stop the 3.1.2 release to fix this bug, therefore I am excluding it from the release. Note that these tests (HA stress) were not run on 3.1.1 so it is difficult to know if this is a regression or not, but engineering thinks that the same design problem exists in 3.1.1.
          Hide
          oleksiys added a comment -

          not sure I understand why it's assigned to Grizzly...
          Grizzly has no idea if any other container is going or not going to be started as part of GF startup.

          Show
          oleksiys added a comment - not sure I understand why it's assigned to Grizzly... Grizzly has no idea if any other container is going or not going to be started as part of GF startup.
          Hide
          Shing Wai Chan added a comment -

          One can know when other containers are started in AppServerStartup.run.
          The issue is in GrizzlyService (a) above. (a) has done the following:
          set up mapper
          (ii) creates ContainerMapper
          (iii) ContainerMapper serves requests.

          need to happens before (b) above.
          (iii) need to happens after other containers are started.

          The fix need to separate and (iii) within (a). That is why I assign the issue to Grizzly team.

          Show
          Shing Wai Chan added a comment - One can know when other containers are started in AppServerStartup.run. The issue is in GrizzlyService (a) above. (a) has done the following: set up mapper (ii) creates ContainerMapper (iii) ContainerMapper serves requests. need to happens before (b) above. (iii) need to happens after other containers are started. The fix need to separate and (iii) within (a). That is why I assign the issue to Grizzly team.
          Hide
          Rebecca Parks added a comment -

          Added to the 3.1.2 Release Notes:

          Description

          A traffic loss occurs when a clustered server instance is restarting. There is a time gap of a few seconds between when port 8080 is running and when application loading is complete. During this gap client requests are denied with a 404 error.

          Workaround

          None. The client must retry the request after application loading is complete.

          Show
          Rebecca Parks added a comment - Added to the 3.1.2 Release Notes: Description A traffic loss occurs when a clustered server instance is restarting. There is a time gap of a few seconds between when port 8080 is running and when application loading is complete. During this gap client requests are denied with a 404 error. Workaround None. The client must retry the request after application loading is complete.
          Hide
          oleksiys added a comment - - edited

          Should be fixed as part of GLASSFISH-18211 fix.

          Show
          oleksiys added a comment - - edited Should be fixed as part of GLASSFISH-18211 fix.
          Hide
          Tom Mueller added a comment -

          Has this issue been resolved in 4.0?

          Show
          Tom Mueller added a comment - Has this issue been resolved in 4.0?
          Hide
          oleksiys added a comment -

          It hasn't. This issue and GLASSFISH-18211 have to be revisited for Glassfish v4.

          Show
          oleksiys added a comment - It hasn't. This issue and GLASSFISH-18211 have to be revisited for Glassfish v4.
          Hide
          oleksiys added a comment -

          fixed

          Project: glassfish
          Revision: 62129
          Author: oleksiys
          Date: 2013-05-30 01:27:59 UTC

          Log Message:
          ------------
          + fix issue #18267
          https://java.net/jira/browse/GLASSFISH-18267

          "Traffic loss during instance start between the time 8080 is up and application is loaded"

          Startup AJP/JK listener only after receiving Glassfish SERVER_READY event

          Show
          oleksiys added a comment - fixed Project: glassfish Revision: 62129 Author: oleksiys Date: 2013-05-30 01:27:59 UTC Log Message: ------------ + fix issue #18267 https://java.net/jira/browse/GLASSFISH-18267 "Traffic loss during instance start between the time 8080 is up and application is loaded" Startup AJP/JK listener only after receiving Glassfish SERVER_READY event
          Hide
          Gail Risdal added a comment -

          The following has been added to the release notes:

          Traffic loss during instance start between the time 8080 is up and application is loaded (18267)

          Description
          A traffic loss occurs when a clustered server instance is restarting. There is a time gap of a few seconds between when port 8080 is running and when application loading is complete. During this gap client requests are denied with a 404 error.

          Workaround
          None. The client must retry the request after application loading is complete.

          Show
          Gail Risdal added a comment - The following has been added to the release notes: Traffic loss during instance start between the time 8080 is up and application is loaded (18267) Description A traffic loss occurs when a clustered server instance is restarting. There is a time gap of a few seconds between when port 8080 is running and when application loading is complete. During this gap client requests are denied with a 404 error. Workaround None. The client must retry the request after application loading is complete.
          Hide
          Gail Risdal added a comment -

          Actually, looks like the write-up was carried forward from the 3.1.2 Release Notes.

          Show
          Gail Risdal added a comment - Actually, looks like the write-up was carried forward from the 3.1.2 Release Notes.
          Hide
          oleksiys added a comment -

          @Gail

          can we change the description a bit like:

          Traffic loss during instance start between the time AJP (Apache) connector port is up and application is loaded (18267)

          Description
          A traffic loss occurs when a clustered server instance is restarting. There is a time gap of a few seconds between when AJP (Apache) connector port is running and when application loading is complete. During this gap client requests are denied with a 404 error.

          Show
          oleksiys added a comment - @Gail can we change the description a bit like: Traffic loss during instance start between the time AJP (Apache) connector port is up and application is loaded (18267) Description A traffic loss occurs when a clustered server instance is restarting. There is a time gap of a few seconds between when AJP (Apache) connector port is running and when application loading is complete. During this gap client requests are denied with a 404 error.
          Hide
          dlaudams added a comment - - edited

          A workaround:

          GlassFish will not start without an enabled listener. To get around this I add a secondary listener on a different port, disable the primary listener, restart/deploy, and then enabled the primary listener when running again.

          My deployment sequence looks like this:

          asadmin set server.network-config.network-listeners.network-listener.http-listener-1.enabled=false
          asadmin undeploy <appname>
          asadmin stop-domain domain1
          asadmin start-domain domain1
          asadmin deploy --name <appname> <earfile>
          asadmin set server.network-config.network-listeners.network-listener.http-listener-1.enabled=true

          Another benefit is that post-deployment checkout can be done against the secondary port before enabling the primary port.

          Show
          dlaudams added a comment - - edited A workaround: GlassFish will not start without an enabled listener. To get around this I add a secondary listener on a different port, disable the primary listener, restart/deploy, and then enabled the primary listener when running again. My deployment sequence looks like this: asadmin set server.network-config.network-listeners.network-listener.http-listener-1.enabled=false asadmin undeploy <appname> asadmin stop-domain domain1 asadmin start-domain domain1 asadmin deploy --name <appname> <earfile> asadmin set server.network-config.network-listeners.network-listener.http-listener-1.enabled=true Another benefit is that post-deployment checkout can be done against the secondary port before enabling the primary port.
          Hide
          Gail Risdal added a comment -

          Revised the Description in the release notes write-up to now read:

          A traffic loss occurs when a clustered server instance is restarting. There is a time gap of a few seconds between when the AJP (Apache) connector port is running and when application loading is complete. During this gap client requests are denied with a 404 error.

          I left the title as it was because that reflects the actual title of the JIRA issue.

          Show
          Gail Risdal added a comment - Revised the Description in the release notes write-up to now read: A traffic loss occurs when a clustered server instance is restarting. There is a time gap of a few seconds between when the AJP (Apache) connector port is running and when application loading is complete. During this gap client requests are denied with a 404 error. I left the title as it was because that reflects the actual title of the JIRA issue.
          Hide
          oleksiys added a comment -

          Ok, thank you Gail!

          Show
          oleksiys added a comment - Ok, thank you Gail!

            People

            • Assignee:
              oleksiys
              Reporter:
              sonymanuel
            • Votes:
              1 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: