glassfish / GLASSFISH-15425

[STRESS][umbrella] 24x7 RichAccess run on OEL with JRockit-jdk1.6.0_22 failed.

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 3.1_b34
    • Fix Version/s: None
    • Component/s: other
    • Labels:
      None

      Description

      This is an umbrella issue opened to track the BigApps RichAccess run on OEL.

      ************
      Build : GF nightly build 35 from 28-Dec-2010.
      Platform : OEL
      JDK: JRockit-jdk 1.6.0_22
      Setup : 3 Instance Cluster
      Application: RichAccess
      Target Duration: 24x7
      Completed Duration: 24x4
      Client Settings: call_rate 20

      Observations: The run saw an exception from the grizzly layer and a crash on all the instances. Some other exceptions were also observed. Separate issues will be filed for each observed issue.
      ************

        Issue Links

          Activity

          Joe Fialli added a comment -

          It appears that instance03 stopped responding to the GMS heartbeat, was reported as suspect, and then failed.
          This can happen when GC runs for long periods of time and prevents an instance from responding to its
          heartbeat within 6 seconds. However, since the instance was also confirmed failed, GMS was evidently unable
          to create a new TCP connection to the instance and ping it within 10 seconds of the missed heartbeats.
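
The suspect-then-failed sequence described above can be sketched roughly as follows. This is only an illustration of the reasoning, not the GMS implementation: the class, the `classify` method, and the way the 6-second and 10-second values are combined are all assumptions made for the example.

```java
import java.time.Duration;

/** Hypothetical sketch of heartbeat-based failure detection; not the actual GMS code. */
public class HeartbeatSketch {
    enum State { ALIVE, SUSPECT, FAILED }

    // Timeouts quoted in the comment above; treated here as fixed constants (assumption).
    static final Duration HEARTBEAT_TIMEOUT = Duration.ofSeconds(6);
    static final Duration VERIFY_TIMEOUT = Duration.ofSeconds(10);

    /**
     * @param silence       time elapsed since the last heartbeat was received
     * @param pingSucceeded whether a new TCP connection and ping to the instance succeeded
     */
    static State classify(Duration silence, boolean pingSucceeded) {
        if (silence.compareTo(HEARTBEAT_TIMEOUT) <= 0) {
            return State.ALIVE;            // heartbeat arrived in time
        }
        if (pingSucceeded) {
            return State.SUSPECT;          // silent but reachable: failure not confirmed
        }
        Duration deadline = HEARTBEAT_TIMEOUT.plus(VERIFY_TIMEOUT);
        return silence.compareTo(deadline) <= 0
                ? State.SUSPECT            // verification window still open
                : State.FAILED;            // no heartbeat and no successful ping in time
    }

    public static void main(String[] args) {
        System.out.println(classify(Duration.ofSeconds(3), false));   // ALIVE
        System.out.println(classify(Duration.ofSeconds(8), false));   // SUSPECT
        System.out.println(classify(Duration.ofSeconds(20), false));  // FAILED
    }
}
```

In this model a long GC pause alone would only produce SUSPECT, as long as the instance still accepts a TCP connection; FAILED requires the ping to fail as well, which matches the distinction drawn above.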

          It would help the evaluation if the DAS server log were attached, since it provides an overview of unexpected
          failures. The server log and GC log of the instance that failed first should then be inspected to
          see whether long GC pauses and/or OutOfMemory issues are making that instance appear failed, or whether a fatal
          exception shows that the instance has crashed.

          The NoClassDefFoundError reported in http://java.net/jira/browse/GLASSFISH-15428 seems more likely to occur on a
          JVM that is failing due to OutOfMemory issues.

          varunrupela added a comment - - edited

          Started a re-run to collect more information.
          Logs location has been sent by e-mail to Joe.

          Nazrul added a comment -

          Umbrella bug; excluding from un-scrubbed list

          varunrupela added a comment - - edited

          The re-run of this scenario, done to debug the initial failure (where the instances crashed), led to the following observations:

          [Filed Bugster Issue: 7011216 for the below problem]

          • Both the initial run and the re-run show the stack trace below almost exactly 4 days and 6 hours into the run:
            [#|2011-01-09T23:20:02.662+0530|SEVERE|glassfish3.1|grizzly|_ThreadID=13;_ThreadName=Thread-3;|doSelect exception
            java.lang.NoClassDefFoundError: java/lang/String
            at java.lang.Class.getDeclaredMethods0(Native Method)
            at java.lang.Class.privateGetDeclaredMethods(Class.java:2427)
            at java.lang.Class.getMethod0(Class.java:2670)
            at java.lang.Class.getMethod(Class.java:1603)
            at com.sun.enterprise.connectors.jms.system.ActiveJmsResourceAdapter.handleRequest(ActiveJmsResourceAdapter.java:2196)
            at com.sun.enterprise.v3.services.impl.ServiceInitializerHandler.onAcceptInterest(ServiceInitializerHandler.java:114)
            at com.sun.grizzly.SelectorHandlerRunner.handleSelectedKey(SelectorHandlerRunner.java:301)
            at com.sun.grizzly.SelectorHandlerRunner.handleSelectedKeys(SelectorHandlerRunner.java:263)
            at com.sun.grizzly.SelectorHandlerRunner.doSelect(SelectorHandlerRunner.java:200)
            at com.sun.grizzly.SelectorHandlerRunner.run(SelectorHandlerRunner.java:132)
            at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
            at java.lang.Thread.run(Thread.java:662)
            #]
          • 15 minutes after the above stack trace, Grizzly logs the following:
            [#|2011-01-09T23:35:02.944+0530|WARNING|glassfish3.1|com.sun.grizzly.config.GrizzlyServiceListener|_ThreadID=14;_ThreadName=Thread-3;|GRIZZLY0023: Interrupting idle Thread: http-thread-pool-28080(3).|#]
            [#|2011-01-09T23:35:02.944+0530|WARNING|glassfish3.1|com.sun.grizzly.config.GrizzlyServiceListener|_ThreadID=14;_ThreadName=Thread-3;|GRIZZLY0023: Interrupting idle Thread: http-thread-pool-28080(5).|#]
            [#|2011-01-09T23:35:02.944+0530|WARNING|glassfish3.1|com.sun.grizzly.config.GrizzlyServiceListener|_ThreadID=14;_ThreadName=Thread-3;|GRIZZLY0023: Interrupting idle Thread: http-thread-pool-28080(1).|#]
            [#|2011-01-09T23:35:02.944+0530|WARNING|glassfish3.1|com.sun.grizzly.config.GrizzlyServiceListener|_ThreadID=14;_ThreadName=Thread-3;|GRIZZLY0023: Interrupting idle Thread: http-thread-pool-28080(4).|#]
            [#|2011-01-09T23:35:02.944+0530|WARNING|glassfish3.1|com.sun.grizzly.config.GrizzlyServiceListener|_ThreadID=14;_ThreadName=Thread-3;|GRIZZLY0023: Interrupting idle Thread: http-thread-pool-28080(2).|#]
          • One instance survived (i.e. did not crash), so we were able to take a jstack dump of it. The jstack output shows the attached deadlock, but we do not yet know exactly when the deadlock appeared.
          • Two of the instances crashed at different times: instance102 crashed on 10th Jan at 02:07, while instance103 crashed on 10th Jan at 00:24.
            [Filed Bugster Issue: 7011219]
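
The 15-minute gap between the SEVERE record and the first GRIZZLY0023 warning can be checked directly from the log timestamps. Below is a minimal sketch, assuming the `[#|timestamp|level|...|#]` field order seen in the records above; the class and method names are made up for illustration.

```java
import java.time.Duration;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper for reading "[#|...|#]" server log records; names are illustrative. */
public class LogGapSketch {
    // Timestamp layout as it appears in the records above, e.g. 2011-01-09T23:20:02.662+0530
    static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss.SSSZ");

    /** Timestamp is the second '|'-separated field of the record's first line. */
    static OffsetDateTime timestampOf(String record) {
        return OffsetDateTime.parse(record.split("\\|")[1], TS);
    }

    /** Log level is the third '|'-separated field. */
    static String levelOf(String record) {
        return record.split("\\|")[2];
    }

    public static void main(String[] args) {
        String severe = "[#|2011-01-09T23:20:02.662+0530|SEVERE|glassfish3.1|grizzly|"
                + "_ThreadID=13;_ThreadName=Thread-3;|doSelect exception";
        String warning = "[#|2011-01-09T23:35:02.944+0530|WARNING|glassfish3.1|"
                + "com.sun.grizzly.config.GrizzlyServiceListener|_ThreadID=14;_ThreadName=Thread-3;|"
                + "GRIZZLY0023: Interrupting idle Thread: http-thread-pool-28080(3).|#]";

        Duration gap = Duration.between(timestampOf(severe), timestampOf(warning));
        System.out.println(levelOf(severe) + " -> " + levelOf(warning)
                + " after " + gap.toMillis() + " ms");
    }
}
```

The same helpers could be run over the server logs of both runs to confirm that the SEVERE record appears at the same elapsed time in each.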

          Separate issues will be filed for all of the above and linked to this issue.

          varunrupela added a comment -

          Attached logs for the re-run.

          varunrupela added a comment - - edited

          Here's the netstat output for instance101's JMS port. instance101 did not crash.

          [root@sf-x2200-21 log]# netstat -an | grep 27676
          tcp 0 0 ::ffff:127.0.0.1:15400 ::ffff:127.0.0.1:27676 ESTABLISHED
          tcp 4 0 ::ffff:127.0.0.1:27676 ::ffff:127.0.0.1:15400 ESTABLISHED
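
For reading the output above: on Linux, the columns of a `netstat -an` TCP line are Proto, Recv-Q, Send-Q, Local Address, Foreign Address and State (the header is lost because of the `grep`). The socket bound to port 27676 therefore has 4 bytes received but not yet read by its owning process. Whether that is related to the deadlock is not established in this thread. A small sketch for pulling out those fields, with made-up class and method names:

```java
/** Hypothetical parser for one TCP line of Linux `netstat -an` output. */
public class NetstatSketch {
    // Assumed column order: Proto Recv-Q Send-Q Local-Address Foreign-Address State
    static String[] columns(String line) {
        return line.trim().split("\\s+");
    }

    /** Bytes received by the kernel but not yet read by the application. */
    static int recvQ(String line) {
        return Integer.parseInt(columns(line)[1]);
    }

    /** Bytes sent but not yet acknowledged by the remote end. */
    static int sendQ(String line) {
        return Integer.parseInt(columns(line)[2]);
    }

    static String localAddress(String line) {
        return columns(line)[3];
    }

    public static void main(String[] args) {
        String[] lines = {
            "tcp 0 0 ::ffff:127.0.0.1:15400 ::ffff:127.0.0.1:27676 ESTABLISHED",
            "tcp 4 0 ::ffff:127.0.0.1:27676 ::ffff:127.0.0.1:15400 ESTABLISHED"
        };
        for (String line : lines) {
            if (recvQ(line) > 0) {
                // Flag sockets whose owner has unread data pending
                System.out.println(localAddress(line) + " has " + recvQ(line) + " unread bytes");
            }
        }
    }
}
```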

          Nazrul added a comment -

          In the release note, you may refer to the following two JRockit issues:
          https://bug.oraclecorp.com/pls/bug/webbug_print.show?c_rptno=11070311
          https://bug.oraclecorp.com/pls/bug/webbug_print.show?c_rptno=11070336

          Nazrul added a comment -

          We need to provide the necessary information to JRockit team. Getting help from Chris.

          Scott Fordin added a comment -

          Need more info to add issue to 3.1 Release Notes.

          Chris Kasso added a comment -

          This item does not need to be covered in the Release Notes, since we have documented that there is no support for JRockit in 3.1.

          Chris Kasso added a comment -

          Transferring ownership of this umbrella issue to Sathyan because he owns the 3.1.1 release.

          scatari added a comment -

          Required tracking bug for 3.1.1 support for JRockit.

          scatari added a comment -

          JRockit support has been deferred to next release.

          scatari added a comment -

          JRockit support has been deferred out of 3.1.1.


            People

            • Assignee: scatari
            • Reporter: varunrupela
            • Votes: 0
            • Watchers: 1