[GLASSFISH-15425] [STRESS][umbrella] 24x7 RichAccess run on OEL with JRockit-jdk1.6.0_22 failed. Created: 04/Jan/11  Updated: 05/Jul/11  Due: 18/Jan/11

Status: Open
Project: glassfish
Component/s: other
Affects Version/s: 3.1_b34
Fix Version/s: None

Type: New Feature Priority: Critical
Reporter: varunrupela Assignee: scatari
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Zip Archive re-run-logs.zip    
Issue Links:
Dependency
depends on GLASSFISH-16568 GMS can select incorrect network inte... In Progress
depends on GLASSFISH-15377 [STRESS] java.lang.ArrayIndexOutOfBou... Reopened
depends on GLASSFISH-15376 [STRESS] java.lang.ArrayIndexOutOfBou... Resolved
depends on GLASSFISH-15503 [STRESS] JRockit: Deadlock observed f... Resolved
depends on GLASSFISH-15426 [STRESS] "java.util.concurrent.Reject... Closed
depends on GLASSFISH-15427 [STRESS] ClassFormatError observed fr... Closed
depends on GLASSFISH-15428 [STRESS] NoClassDefFoundError from Sh... Closed
Tags: 3_1-next, 3_1_1-scrubbed

 Description   

This is an umbrella issue opened to track the BigApps - RichAccess - run on OEL.

************
Build : GF nightly build 35 from 28-Dec-2010.
Platform : OEL
JDK: JRockit-jdk 1.6.0_22
Setup : 3 Instance Cluster
Application: RichAccess
Target Duration: 24x7
Completed Duration: 24x4
Client Settings: call_rate 20

Observations: The run saw an exception from the grizzly layer and a crash on all the instances. Some other exceptions were also observed. Separate issues will be filed for each observed issue.
************



 Comments   
Comment by Joe Fialli [ 04/Jan/11 ]

It appears that instance03 stopped responding to GMS heartbeat and was reported as suspect and then failed.
This can happen when the GC runs for long periods of time and does not allow an instance to respond to its
heartbeat within 6 seconds. However, since the instance was also confirmed failed, GMS was evidently not able
to create a new TCP connection to the instance and ping it within 10 seconds of the missed heartbeats.
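
The two-stage check described above (a missed heartbeat marks an instance as suspect, and a failed attempt to open a new TCP connection confirms it as failed) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual GMS code: the class HeartbeatCheck, its methods, and the State enum are hypothetical names, and only the 6-second and 10-second values come from the comment above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Illustrative two-stage failure check; NOT the actual GMS implementation. */
final class HeartbeatCheck {
    enum State { ALIVE, SUSPECT, FAILED }

    // Values taken from the comment above; the constant names are hypothetical.
    static final long HEARTBEAT_TIMEOUT_MS = 6_000;
    static final int VERIFY_TIMEOUT_MS = 10_000;

    /** Stage 1: true if no heartbeat has arrived within the heartbeat window. */
    static boolean heartbeatMissed(long lastHeartbeatMs, long nowMs) {
        return nowMs - lastHeartbeatMs > HEARTBEAT_TIMEOUT_MS;
    }

    /** Stage 2: true if a fresh TCP connection can be opened within timeoutMs. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts, and unresolved hosts.
            return false;
        }
    }

    /**
     * Combines both stages. Assumption: an instance that missed its heartbeat
     * but is still reachable stays SUSPECT rather than returning to ALIVE.
     */
    static State classify(boolean heartbeatMissed, boolean reachable) {
        if (!heartbeatMissed) {
            return State.ALIVE;
        }
        return reachable ? State.SUSPECT : State.FAILED;
    }
}
```

For example, classify(heartbeatMissed(last, now), canConnect(host, port, VERIFY_TIMEOUT_MS)) yields FAILED only when both stages fail, which matches the sequence reported for instance03.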

It would assist the evaluation if the DAS server log were attached, since it provides an overview of unexpected
failures. Then the server log and GC log of the instance that failed first should be inspected to
see whether long GC pauses and/or OutOfMemory issues are causing that instance to appear failed, or whether a fatal
exception shows that the instance has crashed.

The NoClassDefFoundError reported in http://java.net/jira/browse/GLASSFISH-15428 seems more likely to occur on a
JVM that is failing due to OutOfMemory issues.

Comment by varunrupela [ 05/Jan/11 ]

Started a re-run to collect more information.
The log location has been sent to Joe by e-mail.

Comment by Nazrul [ 05/Jan/11 ]

Umbrella bug; excluding from un-scrubbed list

Comment by varunrupela [ 10/Jan/11 ]

The re-run of this scenario to debug an initial failure (where the instances crashed) led to the following observations:

[Filed Bugster Issue: 7011216 for the below problem]

  • Both the initial run and the re-run show the stack trace below almost exactly 4 days and 6 hours into the run:
    [#|2011-01-09T23:20:02.662+0530|SEVERE|glassfish3.1|grizzly|_ThreadID=13;_ThreadName=Thread-3;|doSelect exception
    java.lang.NoClassDefFoundError: java/lang/String
    at java.lang.Class.getDeclaredMethods0(Native Method)
    at java.lang.Class.privateGetDeclaredMethods(Class.java:2427)
    at java.lang.Class.getMethod0(Class.java:2670)
    at java.lang.Class.getMethod(Class.java:1603)
    at com.sun.enterprise.connectors.jms.system.ActiveJmsResourceAdapter.handleRequest(ActiveJmsResourceAdapter.java:2196)
    at com.sun.enterprise.v3.services.impl.ServiceInitializerHandler.onAcceptInterest(ServiceInitializerHandler.java:114)
    at com.sun.grizzly.SelectorHandlerRunner.handleSelectedKey(SelectorHandlerRunner.java:301)
    at com.sun.grizzly.SelectorHandlerRunner.handleSelectedKeys(SelectorHandlerRunner.java:263)
    at com.sun.grizzly.SelectorHandlerRunner.doSelect(SelectorHandlerRunner.java:200)
    at com.sun.grizzly.SelectorHandlerRunner.run(SelectorHandlerRunner.java:132)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
    #]
  • 15 minutes after the above stack trace, Grizzly logs the following:
    [#|2011-01-09T23:35:02.944+0530|WARNING|glassfish3.1|com.sun.grizzly.config.GrizzlyServiceListener|_ThreadID=14;_ThreadName=Thread-3;|GRIZZLY0023: Interrupting idle Thread: http-thread-pool-28080(3).|#]
    [#|2011-01-09T23:35:02.944+0530|WARNING|glassfish3.1|com.sun.grizzly.config.GrizzlyServiceListener|_ThreadID=14;_ThreadName=Thread-3;|GRIZZLY0023: Interrupting idle Thread: http-thread-pool-28080(5).|#]
    [#|2011-01-09T23:35:02.944+0530|WARNING|glassfish3.1|com.sun.grizzly.config.GrizzlyServiceListener|_ThreadID=14;_ThreadName=Thread-3;|GRIZZLY0023: Interrupting idle Thread: http-thread-pool-28080(1).|#]
    [#|2011-01-09T23:35:02.944+0530|WARNING|glassfish3.1|com.sun.grizzly.config.GrizzlyServiceListener|_ThreadID=14;_ThreadName=Thread-3;|GRIZZLY0023: Interrupting idle Thread: http-thread-pool-28080(4).|#]
    [#|2011-01-09T23:35:02.944+0530|WARNING|glassfish3.1|com.sun.grizzly.config.GrizzlyServiceListener|_ThreadID=14;_ThreadName=Thread-3;|GRIZZLY0023: Interrupting idle Thread: http-thread-pool-28080(2).|#]
  • One instance survived (i.e. did not crash) and so we were able to take a jstack on it. The jstack shows the attached deadlock, but we do not yet know the exact time the deadlock appeared.
  • 2 of the instances crashed at different times. instance102 crashed on 10th Jan, 02:07, while instance103 crashed on 10th Jan, 00:24.
    [Filed Bugster Issue: 7011219]
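
To check intervals such as the roughly 15-minute gap between the doSelect exception and the GRIZZLY0023 warnings, the timestamps can be pulled out of the log records quoted above. The following is a minimal sketch, assuming each record starts with `[#|timestamp|level|...` as in those excerpts. The class LogGap and its methods are hypothetical helper names, not part of GlassFish.

```java
import java.time.Duration;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper for comparing timestamps of two server.log records. */
final class LogGap {
    // Matches timestamps such as 2011-01-09T23:20:02.662+0530.
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss.SSSZ");

    /** Splits a record into marker, timestamp, level, and the remainder. */
    private static String[] fields(String record) {
        String[] f = record.trim().split("\\|", 4);
        if (f.length < 4 || !f[0].equals("[#")) {
            throw new IllegalArgumentException("Not a log record: " + record);
        }
        return f;
    }

    /** Returns the timestamp field of a record. */
    static OffsetDateTime timestampOf(String record) {
        return OffsetDateTime.parse(fields(record)[1], TS);
    }

    /** Returns the level field of a record, for example SEVERE or WARNING. */
    static String levelOf(String record) {
        return fields(record)[2];
    }

    /** Returns the time elapsed between two records. */
    static Duration gap(String earlier, String later) {
        return Duration.between(timestampOf(earlier), timestampOf(later));
    }
}
```

Passing the SEVERE doSelect record and one of the GRIZZLY0023 records to gap(...) gives a Duration whose toMinutes() is 15 for the timestamps quoted above.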

Separate issues for all of the above problems will be filed and linked to this issue.

Comment by varunrupela [ 10/Jan/11 ]

Attached logs for the re-run.

Comment by varunrupela [ 10/Jan/11 ]

Here's the netstat output for instance101's JMS port. instance101 did not crash.

[root@sf-x2200-21 log]# netstat -an | grep 27676
tcp 0 0 ::ffff:127.0.0.1:15400 ::ffff:127.0.0.1:27676 ESTABLISHED
tcp 4 0 ::ffff:127.0.0.1:27676 ::ffff:127.0.0.1:15400 ESTABLISHED
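
In the output above, the second and third columns are Recv-Q and Send-Q, so the socket bound to port 27676 has 4 bytes that the owning process has not yet read. This may be worth watching on an instance suspected of being stuck. A minimal sketch for flagging such lines, assuming the usual Linux `netstat -an` column order (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); the class NetstatQueues is a hypothetical helper, not an existing tool.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that flags TCP sockets with queued, unprocessed bytes. */
final class NetstatQueues {
    /** Returns the TCP lines whose Recv-Q or Send-Q column is non-zero. */
    static List<String> backlogged(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            String[] cols = line.trim().split("\\s+");
            // Expect: Proto Recv-Q Send-Q Local Foreign State
            if (cols.length < 6 || !cols[0].startsWith("tcp")) {
                continue;
            }
            try {
                long recvQ = Long.parseLong(cols[1]);
                long sendQ = Long.parseLong(cols[2]);
                if (recvQ > 0 || sendQ > 0) {
                    result.add(line.trim());
                }
            } catch (NumberFormatException e) {
                // Skip lines whose queue columns are not numeric.
            }
        }
        return result;
    }
}
```

Applied to the two lines above, only the line whose local address ends in 27676 is returned.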

Comment by Nazrul [ 18/Jan/11 ]

In the release note, you may refer to the following two JRockit issues:

https://bug.oraclecorp.com/pls/bug/webbug_print.show?c_rptno=11070311
https://bug.oraclecorp.com/pls/bug/webbug_print.show?c_rptno=11070336

Comment by Nazrul [ 08/Mar/11 ]

We need to provide the necessary information to the JRockit team. Getting help from Chris.

Comment by Scott Fordin [ 23/Mar/11 ]

Need more info to add issue to 3.1 Release Notes.

Comment by Chris Kasso [ 07/Apr/11 ]

This item does not need to be covered in the Release Notes since we have documented that there is no support for JRockit in 3.1.

Comment by Chris Kasso [ 23/May/11 ]

Transferring ownership of this umbrella issue to Sathyan because he owns the 3.1.1 release.

Comment by scatari [ 02/Jun/11 ]

Required tracking bug for 3.1.1 support for JRockit.

Comment by scatari [ 25/Jun/11 ]

JRockit support has been deferred to next release.

Comment by scatari [ 05/Jul/11 ]

JRockit support has been deferred out of 3.1.1.

Generated at Sun Apr 19 14:57:45 UTC 2015 using JIRA 6.2.3#6260-sha1:63ef1d6dac3f4f4d7db4c1effd405ba38ccdc558.