GLASSFISH-15017

cpu 100% after grizzly-comet sends/receives large chunks of data

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1_b30
    • Fix Version/s: 3.1_ms07
    • Component/s: grizzly-kernel
    • Labels:
      None
    • Environment:

      Linux marc 2.6.32-26-generic-pae #48-Ubuntu SMP Wed Nov 24 10:31:20 UTC 2010 i686 GNU/Linux

      Description

      I have a problem when sending/receiving large Strings to/from grizzly-comet. I've uploaded a simple example at http://lodgon.com/johan/comettest.zip
      The example consists of 2 parts:

      1. CometTestServer: a WAR with 2 servlets and a CometHandler.

      2. CometTestClient: a Java client with a Main class and an AsyncRedReader class that calls a servlet and receives input.

      The basic idea is to send a String from the client to the server, and the server uses Comet to send it to all registered clients.

      In the Main class on the client, I can set the size of the message. When the message is small (< 100K characters), it works fine. When the message is large (e.g. > 100K characters), no input is received. Sometimes this already happens with 1 client, while during other attempts I have to start e.g. 4 clients before it stops working.
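
      For reference, the server side of such a setup typically boils down to a CometHandler whose onEvent() writes the notified message to each suspended response. Below is a minimal sketch of that pattern against the com.sun.grizzly.comet API; the class name and the PrintWriter attachment are illustrative, not the actual testcase code.

      import java.io.IOException;
      import java.io.PrintWriter;
      import com.sun.grizzly.comet.CometEvent;
      import com.sun.grizzly.comet.CometHandler;

      // Illustrative sketch only; the real testcase classes differ.
      public class BroadcastHandler implements CometHandler<PrintWriter> {
          private PrintWriter writer;

          public void attach(PrintWriter writer) {
              this.writer = writer;
          }

          // Invoked for every CometContext.notify(message); this is where the
          // large String gets written to each registered client.
          public void onEvent(CometEvent event) throws IOException {
              if (event.getType() == CometEvent.NOTIFY) {
                  writer.println(event.attachment());
                  writer.flush();
              }
          }

          public void onInitialize(CometEvent event) throws IOException { }

          public void onInterrupt(CometEvent event) throws IOException {
              event.getCometContext().removeCometHandler(this);
          }

          public void onTerminate(CometEvent event) throws IOException {
              event.getCometContext().removeCometHandler(this);
          }
      }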

      The Glassfish server uses 100% CPU once the problem occurs: a Grizzly TCP Selector thread keeps running and consumes a full CPU. The stack trace below is a snapshot generated by

      asadmin generate-jvm-report --type=thread

      Thread "Thread-41" thread-id: 92 thread-state: RUNNABLE
      at: sun.nio.ch.FileDispatcher.read0(Native Method)
      at: sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
      at: sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:237)
      at: sun.nio.ch.IOUtil.read(IOUtil.java:210)
      at: sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
      at: com.sun.grizzly.comet.CometTask.checkIfClientClosedConnection(CometTask.java:192)
      at: com.sun.grizzly.comet.CometTask.handleSelectedKey(CometTask.java:178)
      at: com.sun.grizzly.SelectorHandlerRunner.handleSelectedKey(SelectorHandlerRunner.java:274)
      at: com.sun.grizzly.SelectorHandlerRunner.handleSelectedKeys(SelectorHandlerRunner.java:258)
      at: com.sun.grizzly.SelectorHandlerRunner.doSelect(SelectorHandlerRunner.java:195)
      at: com.sun.grizzly.SelectorHandlerRunner.run(SelectorHandlerRunner.java:130)
      at: java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at: java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at: java.lang.Thread.run(Thread.java:619)

      Most of the snapshots I take show this thread in CometTask.checkIfClientClosedConnection, but it seems to be looping forever somewhere. The execution time of this thread equals the time elapsed since the client hang occurred.
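
      For what it's worth, that symptom matches a level-triggered Selector busy loop: if a selected key keeps reporting OP_READ but the handler neither drains the pending bytes nor cancels the key, select() returns immediately on every pass. A generic NIO illustration of the suspected mechanism (not the Grizzly source):

      import java.io.IOException;
      import java.nio.ByteBuffer;
      import java.nio.channels.SelectionKey;
      import java.nio.channels.Selector;
      import java.nio.channels.SocketChannel;
      import java.util.Iterator;

      public class SpinIllustration {
          // If the probe never drains all pending bytes and the key is never
          // cancelled, select() keeps returning the same key without blocking,
          // pinning one CPU; exactly the 100% selector-thread symptom above.
          static void selectLoop(Selector selector, ByteBuffer probe) throws IOException {
              while (true) {
                  selector.select();                  // returns immediately while a key is ready
                  Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                  while (it.hasNext()) {
                      SelectionKey key = it.next();
                      it.remove();
                      if (key.isReadable()) {
                          SocketChannel ch = (SocketChannel) key.channel();
                          probe.clear();
                          int n = ch.read(probe);     // probe for a client close
                          if (n == -1) {              // client really closed
                              key.cancel();
                              ch.close();
                          }
                          // otherwise: bytes left in the socket keep the key ready
                      }
                  }
              }
          }
      }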

      Note that CometHandler.onEvent() reports that everything has been sent to the client, but the client didn't receive anything.

      I added some debug output to Comet (grizzly-comet 1.9.18o) and have more info:

      When sending small messages, CometTask.handleSelectedKey(SelectionKey selectionKey) only gets called when the client closes the connection.

      With large messages, this method is called endlessly after Comet has written the message to the output:

      [#|2010-12-06T21:17:07.897+0100|SEVERE|glassfish3.0.1|javax.enterprise.system.std.com.sun.enterprise.v3.services.impl|_ThreadID=31;_ThreadName=Thread-1;|java.lang.Exception: Stack trace
      at java.lang.Thread.dumpStack(Thread.java:1206)
      at com.sun.grizzly.comet.CometTask.handleSelectedKey(CometTask.java:163)
      at com.sun.grizzly.SelectorHandlerRunner.handleSelectedKey(SelectorHandlerRunner.java:274)
      at com.sun.grizzly.SelectorHandlerRunner.handleSelectedKeys(SelectorHandlerRunner.java:258)
      at com.sun.grizzly.SelectorHandlerRunner.doSelect(SelectorHandlerRunner.java:195)
      at com.sun.grizzly.SelectorHandlerRunner.run(SelectorHandlerRunner.java:130)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at java.lang.Thread.run(Thread.java:619)

        Activity

        oleksiys added a comment -

        Hi,

        Can I ask you to try the same testcase with the latest promoted GF 3.1 build?

        Thanks.

        oleksiys added a comment -

        One more thing to try...
        Can you pls. disable HTTP keep-alive by setting the HTTP system property [1]:
        -Dhttp.keepAlive=false

        Thanks.

        [1] http://download.oracle.com/javase/1.4.2/docs/guide/net/properties.html
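
        Note that the property only takes effect if it is set before the JVM opens its first HTTP connection; in the testcase client it can equally be set programmatically. A minimal sketch (the servlet URL is hypothetical):

        import java.net.HttpURLConnection;
        import java.net.URL;

        public class NoKeepAliveClient {
            public static void main(String[] args) throws Exception {
                // Equivalent to launching with -Dhttp.keepAlive=false; must run
                // before the first HttpURLConnection is created in this JVM.
                System.setProperty("http.keepAlive", "false");

                URL url = new URL("http://localhost:8080/CometTestServer/test"); // hypothetical URL
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                System.out.println("HTTP " + conn.getResponseCode());
                conn.disconnect();
            }
        }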

        Joeri Sykora added a comment -

        I've downloaded the latest promoted build (glassfish-3.1-b32.zip), enabled comet-support on http-listener-1 and added the following system properties to the server-config:

        -Dhttp.keepAlive=false
        -Dcom.sun.grizzly.http.asyncwrite.enabled=true

        The problem still occurred with this configuration. One of the Grizzly-kernel-thread(1) threads was using 100% of the CPU.

        oleksiys added a comment -

        Please replace the comet jar in GF/glassfish/modules with the attached one (rename it to grizzly-comet.jar) and reproduce the problem.
        Then pls. send me the server.log.

        Thank you.

        Joeri Sykora added a comment - edited

        Glassfish server.log after running the test is attached. The steps I took were:

        • asadmin start-domain
        • run comettestclient
        • wait a few seconds
        • kill hanging comettestclient
        • asadmin stop-domain
        oleksiys added a comment -

        Joeri, can you pls. disable async write and retry?

        Joeri Sykora added a comment -

        With asyncwrite disabled it works. However, we explicitly set this property to true because of another bug. In some specific cases when a client disconnects, the CometWorker thread hangs in epollWait:

        Thread "CometWorker2" thread-id: 79 thread-state: RUNNABLE Running in native
        at: sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at: sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
        at: sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
        at: sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
        at: sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
        at: com.sun.grizzly.util.OutputWriter.flushChannel(OutputWriter.java:124)
        at: com.sun.grizzly.util.OutputWriter.flushChannel(OutputWriter.java:74)
        at: com.sun.grizzly.http.SocketChannelOutputBuffer.flushChannel(SocketChannelOutputBuffer.java:321)
        at: com.sun.grizzly.http.SocketChannelOutputBuffer.flushBuffer(SocketChannelOutputBuffer.java:392)
        at: com.sun.grizzly.http.SocketChannelOutputBuffer.flush(SocketChannelOutputBuffer.java:370)
        at: com.sun.grizzly.http.ProcessorTask.action(ProcessorTask.java:1174)
        at: com.sun.grizzly.tcp.Response.action(Response.java:266)
        at: org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:378)
        at: org.apache.catalina.connector.OutputBuffer.flush(OutputBuffer.java:351)
        at: org.apache.catalina.connector.CoyoteOutputStream.flush(CoyoteOutputStream.java:176)
        at: java.io.ObjectOutputStream$BlockDataOutputStream.flush(ObjectOutputStream.java:1784)
        at: java.io.ObjectOutputStream.flush(ObjectOutputStream.java:691)
        at: org.redfx.server.comet.DincCometHandler.onEvent(DincCometHandler.java:61)
        at: com.sun.grizzly.comet.DefaultNotificationHandler.notify0(DefaultNotificationHandler.java:189)
        at: com.sun.grizzly.comet.DefaultNotificationHandler$1.run(DefaultNotificationHandler.java:133)
        at: com.sun.grizzly.util.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:330)
        at: com.sun.grizzly.util.AbstractThreadPool$Worker.run(AbstractThreadPool.java:309)
        at: java.lang.Thread.run(Thread.java:619)
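
        The top of that trace is the usual temporary-selector blocking-write pattern: when a non-blocking write() makes no progress, the writer parks on a throwaway Selector until the channel becomes writable; if the peer disappears without the socket ever signalling writability or an error, the thread stays in epollWait. A generic sketch of the pattern (not the actual OutputWriter source):

        import java.io.IOException;
        import java.nio.ByteBuffer;
        import java.nio.channels.SelectionKey;
        import java.nio.channels.Selector;
        import java.nio.channels.SocketChannel;

        public class BlockingFlushSketch {
            // Assumes ch is already in non-blocking mode.
            static void flushChannel(SocketChannel ch, ByteBuffer buf, long timeoutMs)
                    throws IOException {
                Selector tmp = null;
                try {
                    while (buf.hasRemaining()) {
                        if (ch.write(buf) > 0) continue;   // made progress
                        if (tmp == null) {                 // kernel send buffer is full
                            tmp = Selector.open();
                            ch.register(tmp, SelectionKey.OP_WRITE);
                        }
                        if (tmp.select(timeoutMs) == 0) {  // parks in epollWait here
                            throw new IOException("client not reading; write timed out");
                        }
                        tmp.selectedKeys().clear();
                    }
                } finally {
                    if (tmp != null) tmp.close();
                }
            }
        }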

        oleksiys added a comment -

        Fixed on Grizzly trunk.
        It will be part of GF with the next Grizzly integration.


          People

          • Assignee:
            oleksiys
          • Reporter:
            johanvos
          • Votes:
            0
          • Watchers:
            2

            Dates

            • Created:
            • Updated:
            • Resolved: