[GLASSFISH-15017] cpu 100% after grizzly-comet sends/receives large chunks of data Created: 07/Dec/10  Updated: 10/Dec/10  Resolved: 10/Dec/10

Status: Resolved
Project: glassfish
Component/s: grizzly-kernel
Affects Version/s: 3.1_b30
Fix Version/s: 3.1_ms07

Type: Bug Priority: Major
Reporter: johanvos Assignee: oleksiys
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux marc 2.6.32-26-generic-pae #48-Ubuntu SMP Wed Nov 24 10:31:20 UTC 2010 i686 GNU/Linux


Attachments: comettest.zip (Zip Archive), grizzly-comet-1.9.24.jar (Java Archive), server.log.zip (Zip Archive)

Description

I have a problem when sending/receiving large Strings to/from grizzly-comet. I've uploaded a simple example at http://lodgon.com/johan/comettest.zip
The example consists of 2 parts:

1. CometTestServer: a WAR with 2 servlets and a CometHandler.

2. CometTestClient: a Java client with a Main class and an AsyncRedReader class that calls a servlet and receives input.

The basic idea is to send a String from the client to the server, and the server uses Comet to send it to all registered clients.
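The push model described above can be sketched generically. This deliberately avoids the Grizzly API; the `Handler`/`Context` interfaces below are invented for illustration only (in the real test they correspond to a CometHandler registered with a CometContext).

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Generic sketch of the test's push model: one servlet receives a String
// and a comet-style context broadcasts it to every registered handler.
// These interfaces are illustrative, not the Grizzly Comet API.
public class BroadcastSketch {
    interface Handler { void onEvent(String message); }

    static class Context {
        private final List<Handler> handlers = new CopyOnWriteArrayList<>();
        void register(Handler h) { handlers.add(h); }
        // Analogous to notifying all comet handlers: push to every client.
        void broadcast(String msg) { for (Handler h : handlers) h.onEvent(msg); }
    }

    public static void main(String[] args) {
        Context ctx = new Context();
        ctx.register(m -> System.out.println("client1 got " + m.length() + " chars"));
        ctx.register(m -> System.out.println("client2 got " + m.length() + " chars"));
        // A large payload like the reporter's >100K-character message:
        ctx.broadcast("x".repeat(100_000));
    }
}
```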

In the Main class on the client, I can set the size of the message. If the message is small (< 100K characters), it works fine. If the message is large (e.g. > 100K characters), no input is received. Sometimes this already happens with 1 client, while in other attempts I have to start e.g. 4 clients before it stops working.

The Glassfish server uses 100% CPU once the problem occurs: a Grizzly TCP Selector thread keeps running and consumes a full CPU. The stack trace below is a snapshot generated by

asadmin generate-jvm-report --type=thread

Thread "Thread-41" thread-id: 92 thread-state: RUNNABLE
at: sun.nio.ch.FileDispatcher.read0(Native Method)
at: sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at: sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:237)
at: sun.nio.ch.IOUtil.read(IOUtil.java:210)
at: sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
at: com.sun.grizzly.comet.CometTask.checkIfClientClosedConnection(CometTask.java:192)
at: com.sun.grizzly.comet.CometTask.handleSelectedKey(CometTask.java:178)
at: com.sun.grizzly.SelectorHandlerRunner.handleSelectedKey(SelectorHandlerRunner.java:274)
at: com.sun.grizzly.SelectorHandlerRunner.handleSelectedKeys(SelectorHandlerRunner.java:258)
at: com.sun.grizzly.SelectorHandlerRunner.doSelect(SelectorHandlerRunner.java:195)
at: com.sun.grizzly.SelectorHandlerRunner.run(SelectorHandlerRunner.java:130)
at: java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at: java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at: java.lang.Thread.run(Thread.java:619)

Most of the snapshots I take show this thread in CometTask.checkIfClientClosedConnection, but it seems to be looping forever somewhere. The accumulated execution time of this thread equals the time elapsed since the client hang occurred.

Note that CometHandler.onEvent() reports that everything has been sent to the client, but the client didn't receive anything.
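The spin in checkIfClientClosedConnection suggests a close-detection read on a selected key that never makes progress. The usual pattern for such a check is a non-blocking probe read whose return value distinguishes "no data yet" (0) from "peer closed" (-1); a selector loop that keeps getting the key back as ready without consuming or cancelling it will burn a CPU. The sketch below shows the probe-read semantics with plain java.nio; the class and method names are illustrative, not Grizzly's actual code.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Illustrates a close-detection probe read on a non-blocking channel.
// Names are illustrative, not Grizzly's actual implementation.
public class ClientCloseCheck {

    /** Returns true if the peer has closed the connection. */
    static boolean clientClosed(SocketChannel channel) throws IOException {
        ByteBuffer probe = ByteBuffer.allocate(1);
        int n = channel.read(probe); // non-blocking: 0 = no data yet
        return n == -1;              // -1 = orderly close by the peer
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));

        SocketChannel client = SocketChannel.open(server.getLocalAddress());
        SocketChannel accepted = server.accept();
        accepted.configureBlocking(false);

        System.out.println("closed before: " + clientClosed(accepted));

        client.close(); // peer closes the connection
        // Wait for the FIN to arrive; on loopback this is quick. Once the
        // peer has closed, read() returns -1 on every call - a selector
        // loop that ignores this keeps seeing the key as ready and spins.
        boolean closedAfter = false;
        long deadline = System.currentTimeMillis() + 2000;
        while (!closedAfter && System.currentTimeMillis() < deadline) {
            closedAfter = clientClosed(accepted);
            Thread.sleep(10);
        }
        System.out.println("closed after: " + closedAfter);

        accepted.close();
        server.close();
    }
}
```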

I added some debug info to Comet (grizzly-comet 1.9.18o) and I have more info:

When sending small messages, CometTask.handleSelectedKey(SelectionKey selectionKey) only gets called when the client closes the connection.

With large messages, this method is called infinitely after Comet has written the message to the output:

[#|2010-12-06T21:17:07.897+0100|SEVERE|glassfish3.0.1|javax.enterprise.system.std.com.sun.enterprise.v3.services.impl|_ThreadID=31;_ThreadName=Thread-1;|java.lang.Exception: Stack trace
at java.lang.Thread.dumpStack(Thread.java:1206)
at com.sun.grizzly.comet.CometTask.handleSelectedKey(CometTask.java:163)
at com.sun.grizzly.SelectorHandlerRunner.handleSelectedKey(SelectorHandlerRunner.java:274)
at com.sun.grizzly.SelectorHandlerRunner.handleSelectedKeys(SelectorHandlerRunner.java:258)
at com.sun.grizzly.SelectorHandlerRunner.doSelect(SelectorHandlerRunner.java:195)
at com.sun.grizzly.SelectorHandlerRunner.run(SelectorHandlerRunner.java:130)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)



Comments
Comment by oleksiys [ 07/Dec/10 ]

Hi,

can I ask you to try the same testcase with GF3.1 latest promoted?

Thanks.

Comment by oleksiys [ 07/Dec/10 ]

One more thing to try...
can you pls. disable HTTP keep-alive by setting the HTTP system property [1]
-Dhttp.keepAlive=false

Thanks.

[1] http://download.oracle.com/javase/1.4.2/docs/guide/net/properties.html

Comment by Joeri Sykora [ 08/Dec/10 ]

I've downloaded the latest promoted build (glassfish-3.1-b32.zip), enabled comet-support on http-listener-1 and added the following system properties to the server-config:

-Dhttp.keepAlive=false
-Dcom.sun.grizzly.http.asyncwrite.enabled=true

The problem still occurred with this configuration. One of the Grizzly-kernel-thread(1) threads was using 100% of the CPU.

Comment by oleksiys [ 09/Dec/10 ]

please replace the comet jar in GF/glassfish/modules with the attached one (rename it to grizzly-comet.jar) and reproduce the problem.
then pls. send me the server.log

Thank you.

Comment by Joeri Sykora [ 09/Dec/10 ]

Glassfish server.log after running test attached. The things I did were:

  • asadmin start-domain
  • run comettestclient
  • wait a few seconds
  • kill hanging comettestclient
  • asadmin stop-domain
Comment by oleksiys [ 09/Dec/10 ]

Joeri, can you pls. disable async write and retry?

Comment by Joeri Sykora [ 10/Dec/10 ]

With asyncwrite disabled it works. However, we explicitly set this property to true because of another bug: in some specific cases when a client disconnects, the CometWorker thread hangs in epollWait:

Thread "CometWorker2" thread-id: 79 thread-state: RUNNABLE Running in native
at: sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at: sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
at: sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
at: sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
at: sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
at: com.sun.grizzly.util.OutputWriter.flushChannel(OutputWriter.java:124)
at: com.sun.grizzly.util.OutputWriter.flushChannel(OutputWriter.java:74)
at: com.sun.grizzly.http.SocketChannelOutputBuffer.flushChannel(SocketChannelOutputBuffer.java:321)
at: com.sun.grizzly.http.SocketChannelOutputBuffer.flushBuffer(SocketChannelOutputBuffer.java:392)
at: com.sun.grizzly.http.SocketChannelOutputBuffer.flush(SocketChannelOutputBuffer.java:370)
at: com.sun.grizzly.http.ProcessorTask.action(ProcessorTask.java:1174)
at: com.sun.grizzly.tcp.Response.action(Response.java:266)
at: org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:378)
at: org.apache.catalina.connector.OutputBuffer.flush(OutputBuffer.java:351)
at: org.apache.catalina.connector.CoyoteOutputStream.flush(CoyoteOutputStream.java:176)
at: java.io.ObjectOutputStream$BlockDataOutputStream.flush(ObjectOutputStream.java:1784)
at: java.io.ObjectOutputStream.flush(ObjectOutputStream.java:691)
at: org.redfx.server.comet.DincCometHandler.onEvent(DincCometHandler.java:61)
at: com.sun.grizzly.comet.DefaultNotificationHandler.notify0(DefaultNotificationHandler.java:189)
at: com.sun.grizzly.comet.DefaultNotificationHandler$1.run(DefaultNotificationHandler.java:133)
at: com.sun.grizzly.util.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:330)
at: com.sun.grizzly.util.AbstractThreadPool$Worker.run(AbstractThreadPool.java:309)
at: java.lang.Thread.run(Thread.java:619)
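The trace above shows the flush blocking in select() on a temporary selector while waiting for the channel to become writable; if the peer is gone and the wait has no effective bound, the worker thread never returns. For contrast, a wait that combines a select timeout with an overall deadline looks roughly like this. This is a self-contained sketch with plain java.nio, not Grizzly's actual OutputWriter code; it registers OP_READ on a channel that never receives data so the bounded wait demonstrably gives up.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Sketch of a bounded wait on a temporary selector. Names and structure
// are illustrative, not Grizzly's actual implementation.
public class BoundedSelect {
    public static void main(String[] args) throws IOException {
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));
        SocketChannel client = SocketChannel.open(server.getLocalAddress());
        client.configureBlocking(false);

        try (Selector tmp = Selector.open()) {
            client.register(tmp, SelectionKey.OP_READ); // no data will ever arrive
            long deadline = System.currentTimeMillis() + 1000;
            int ready = 0;
            while (ready == 0 && System.currentTimeMillis() < deadline) {
                // Bounded select: wake up after at most 300 ms instead of
                // blocking indefinitely on a connection that may be dead.
                ready = tmp.select(300);
            }
            System.out.println("ready channels: " + ready); // 0 = gave up at deadline
        }
        client.close();
        server.close();
    }
}
```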

Comment by oleksiys [ 10/Dec/10 ]

fixed on Grizzly trunk.
will be a part of GF with the next Grizzly integration.

Generated at Fri Dec 09 22:15:59 UTC 2016 using JIRA 6.2.3#6260-sha1:63ef1d6dac3f4f4d7db4c1effd405ba38ccdc558.