[GLASSFISH-15347] java.lang.OutOfMemoryError: Java heap space and other failures during Nile Book Store longevity run. Created: 25/Dec/10  Updated: 28/Dec/10  Resolved: 27/Dec/10

Status: Closed
Project: glassfish
Component/s: group_management_service
Affects Version/s: 3.1_b33
Fix Version/s: 3.1_b34

Type: Bug Priority: Blocker
Reporter: zorro Assignee: Mahesh Kannan
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Solaris Sparc jdk 6u22


Issue Links:
Duplicate
is duplicated by GLASSFISH-15357 Command stop-cluster failed after 600... Resolved
Tags: 3_1-blocking

 Description   

Build b33: started a 7-day longevity run of the NileBookStore big app against a 3-node cluster on 4 Solaris SPARC machines.

Bug:
After a few hours of running, the following exceptions were thrown in large numbers.
Note: transactions still succeed intermittently.

[#|2010-12-25T13:58:04.991-0800|SEVERE|oracle-glassfish3.1|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=16;_ThreadName=Thread-1;|java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3209)
at java.lang.String.<init>(String.java:215)
at java.nio.HeapCharBuffer.toString(HeapCharBuffer.java:542)
at java.nio.CharBuffer.toString(CharBuffer.java:1157)
at com.sun.enterprise.web.PEAccessLogValve.log(PEAccessLogValve.java:652)
at com.sun.enterprise.web.PEAccessLogValve.run(PEAccessLogValve.java:1122)
at java.lang.Thread.run(Thread.java:662)

#]

[#|2010-12-24T22:35:15.600-0800|WARNING|oracle-glassfish3.1|org.shoal.ha.cache.command.load_request|_ThreadID=16;_ThreadName=Thread-1;|LoadRequestCommand timed out while waiting for result java.util.concurrent.TimeoutException|#]

[#|2010-12-24T22:35:15.900-0800|WARNING|oracle-glassfish3.1|org.shoal.ha.cache.command.load_request|_ThreadID=16;_ThreadName=Thread-1;|LoadRequestCommand timed out while waiting for result java.util.concurrent.TimeoutException|#]

[#|2010-12-24T22:35:17.000-0800|WARNING|oracle-glassfish3.1|org.shoal.ha.cache.command.load_request|_ThreadID=16;_ThreadName=Thread-1;|LoadRequestCommand timed out while waiting for result java.util.concurrent.TimeoutException|#]

[#|2010-12-24T22:35:18.530-0800|WARNING|oracle-glassfish3.1|org.shoal.ha.cache.command.load_request|_ThreadID=16;_ThreadName=Thread-1;|LoadRequestCommand timed out while waiting for result java.util.concurrent.TimeoutException|#]

org.shoal.ha.cache.command.save|_ThreadID=16;_ThreadName=Thread-1;|Aborting command transmission for ReplicationFramePayloadCommand:1 because beforeTransmit returned false|#]
[#|2010-12-25T13:41:39.169-0800|WARNING|oracle-glassfish3.1|ShoalLogger|_ThreadID=16;_ThreadName=Thread-1;|Error during groupHandle.sendMessage(null, /NileBookStore; size=287193|#]

java.net.SocketException: Invalid argument
at sun.nio.ch.Net.setIntOption0(Native Method)
at sun.nio.ch.Net.setIntOption(Net.java:157)
at sun.nio.ch.SocketChannelImpl$1.setInt(SocketChannelImpl.java:406)
at sun.nio.ch.SocketOptsImpl.setBoolean(SocketOptsImpl.java:38)
at sun.nio.ch.SocketOptsImpl$IP$TCP.noDelay(SocketOptsImpl.java:284)
at sun.nio.ch.OptionAdaptor.setTcpNoDelay(OptionAdaptor.java:48)
at sun.nio.ch.SocketAdaptor.setTcpNoDelay(SocketAdaptor.java:268)
at com.sun.grizzly.http.SelectorThread.setSocketOptions(SelectorThread.java:1490)
at com.sun.grizzly.http.SelectorThreadHandler.configureChannel(SelectorThreadHandler.java:91)
at com.sun.grizzly.http.SelectorThreadHandler.onAcceptInterest(SelectorThreadHandler.java:102)
at com.sun.grizzly.SelectorHandlerRunner.handleSelectedKey(SelectorHandlerRunner.java:300)
at com.sun.grizzly.SelectorHandlerRunner.handleSelectedKeys(SelectorHandlerRunner.java:263)
at com.sun.grizzly.SelectorHandlerRunner.doSelect(SelectorHandlerRunner.java:200)
at com.sun.grizzly.SelectorHandlerRunner.run(SelectorHandlerRunner.java:132)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

#]

All logs:
http://aras2.us.oracle.com:8080/logs/gf31/gms/set_12_25_10_t_14_13_41/scenario_0001_Sat_Dec_25_14_14_09_PST_2010/

Physical location:
/net/asqe-logs.us.oracle.com/export1/gms/gf31/gms/set_12_25_10_t_14_13_41/scenario_0001_Sat_Dec_25_14_14_09_PST_2010/
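To quantify how often each failure appears in a server.log from the location above, the records can be tallied by level and logger. The following is only a minimal sketch, not part of the original report: the class name UlfLogSummary and its method are hypothetical, it assumes a modern JDK (8 or later) rather than the jdk 6u22 used in the run, and it assumes every record starts with the "[#|timestamp|level|product|logger|thread info|" header seen in the excerpts. Records whose header is missing are not counted.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts GlassFish log records per "LEVEL logger" pair.
public class UlfLogSummary {

    // Matches the record header: [#|timestamp|level|product|logger|thread info|
    private static final Pattern HEADER = Pattern.compile(
            "\\[#\\|([^|]*)\\|([^|]*)\\|([^|]*)\\|([^|]*)\\|([^|]*)\\|");

    /** Returns a sorted map from "LEVEL logger" to the number of matching records. */
    public static Map<String, Integer> countByLevelAndLogger(String logText) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = HEADER.matcher(logText);
        while (m.find()) {
            // group(2) is the level, group(4) is the logger name
            String key = m.group(2) + " " + m.group(4);
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java UlfLogSummary <path-to-server.log>");
            return;
        }
        // Reads the whole file into memory; acceptable for a sketch,
        // but a large server.log would be better processed as a stream.
        String text = new String(
                Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        for (Map.Entry<String, Integer> e : countByLevelAndLogger(text).entrySet()) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

Running it against each instance's server.log would show, for example, whether the load_request timeouts started before the first OutOfMemoryError.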



 Comments   
Comment by zorro [ 27/Dec/10 ]

The 7-day run against nightly build 33 stopped after 2 days with the failures described above.
Stopping the cluster failed with:
asadmin stop-cluster clusterz1
No response from Domain Admin Server after 600 seconds.
The command is either taking too long to complete or the server has failed.
Please see the server log files for command status.
Command stop-cluster failed.

All logs:
http://aras2.us.oracle.com:8080/logs/gf31/gms/set_12_27_10_t_12_14_01/scenario_0001_Mon_Dec_27_12_31_14_PST_2010/

Physical location:
/net/asqe-logs.us.oracle.com/export1/gms/gf31/gms/set_12_27_10_t_12_14_01/scenario_0001_Mon_Dec_27_12_31_14_PST_2010/

Comment by shreedhar_ganapathy [ 27/Dec/10 ]

Based on feedback from Rajiv and Sony, this appears to be the same issue as the one reported in 15231, which was seen in b33 and fixed in b34.

Also, the heap size for the Nile app should be -Xmx1024m, based on Sony's input from runs in prior releases. The domain.xml shows the run was set to 512m.

Please rerun with b34, and reopen this issue if the problem recurs.
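
Before rerunning, it may help to confirm that an instance really picked up the larger heap. The following is a minimal sketch, not something from the original runs: the class HeapCheck and the 90% tolerance are assumptions for illustration. Runtime.maxMemory() can report somewhat less than the -Xmx value on some JVMs, so the check allows a margin instead of requiring an exact match.

```java
// Hypothetical helper: reports the JVM's maximum heap and flags an undersized setting.
public class HeapCheck {

    private static final long MIB = 1024L * 1024L;

    /** Maximum amount of memory the running JVM will attempt to use, in MiB. */
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    /**
     * True if the reported heap is clearly below the expected size.
     * The 90% threshold is an assumed tolerance, not a GlassFish rule.
     */
    public static boolean looksUndersized(long reportedMiB, long expectedMiB) {
        return reportedMiB < expectedMiB * 0.9;
    }

    public static void main(String[] args) {
        long expectedMiB = 1024; // the -Xmx1024m value suggested in this issue
        long reported = maxHeapMiB();
        System.out.println("Max heap: " + reported + " MiB");
        if (looksUndersized(reported, expectedMiB)) {
            System.out.println("Heap looks smaller than the expected " + expectedMiB + " MiB");
        }
    }
}
```

The check only reflects the JVM it runs in, so it would have to be executed with the same JVM options as the instance (or from code deployed in it) to say anything about the cluster.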

Comment by Mahesh Kannan [ 27/Dec/10 ]

Closing this based on Shreedhar's comment.

Generated at Fri Dec 02 23:38:48 UTC 2016 using JIRA 6.2.3#6260-sha1:63ef1d6dac3f4f4d7db4c1effd405ba38ccdc558.