glassfish / GLASSFISH-15474

[Stress][Blocking] RichAccess failing on OEL because of OOM or "message count limit has been reached"

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1_b38
    • Component/s: jms
    • Labels:
      None

      Description

      Build 37 nightly, Jan. 05. OEL5 machines.

      The build was installed on three machines: the DAS plus one instance on asqe-x2250-st5, and the other two instances on asqe-x2250-st6 and asqe-x2250-st7. The Stress application, RichAccess, was deployed to the cluster. Against each instance in the cluster, a test client was running and sending HTTP requests at a rate equivalent to 100 simultaneous users per instance. MQ was set to Embedded mode. The heap size was 768 MB.
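
      For illustration only, here is a minimal sketch of the kind of publish path such a test drives. The servlet name, URL mapping and JNDI names are assumptions (this is not the actual RichAccess source); it only shows that each HTTP request results in one message being published to richAppTopic, which is why the client load translates directly into the broker's inbound message rate.

      // Hypothetical sketch of the kind of publish path the RichAccess test drives
      // (not the actual application source): one message per HTTP request, so 100
      // simultaneous users per instance map directly onto the broker's inbound rate.
      import java.io.IOException;

      import javax.annotation.Resource;
      import javax.jms.Connection;
      import javax.jms.ConnectionFactory;
      import javax.jms.MessageProducer;
      import javax.jms.Session;
      import javax.jms.Topic;
      import javax.servlet.ServletException;
      import javax.servlet.annotation.WebServlet;
      import javax.servlet.http.HttpServlet;
      import javax.servlet.http.HttpServletRequest;
      import javax.servlet.http.HttpServletResponse;

      @WebServlet("/richaccess")                                 // assumed URL mapping
      public class RichAccessPublishServlet extends HttpServlet {

          @Resource(mappedName = "jms/richAppConnectionFactory") // assumed JNDI name
          private ConnectionFactory connectionFactory;

          @Resource(mappedName = "jms/richAppTopic")             // assumed JNDI name
          private Topic richAppTopic;

          @Override
          protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                  throws ServletException, IOException {
              try {
                  Connection connection = connectionFactory.createConnection();
                  try {
                      Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                      MessageProducer producer = session.createProducer(richAppTopic);
                      // Nothing here waits for the subscribers, so a slow consumer lets
                      // richAppTopic fill up to its maxNumMsgs limit.
                      producer.send(session.createTextMessage("request from " + req.getRemoteAddr()));
                  } finally {
                      connection.close();
                  }
                  resp.setStatus(HttpServletResponse.SC_OK);
              } catch (Exception e) {
                  throw new ServletException(e);
              }
          }
      }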

      After about one hour of running, either an Out of Memory (OOM) error occurred or the following messages were posted in the MQ log:

      [05/Jan/2011:07:31:19 PST] WARNING [B2011]: Storing of JMS message from IMQConn[AUTHENTICATED,guest@127.0.0.1:22300,null] failed:
      com.sun.messaging.jmq.jmsserver.util.BrokerException: [B4120]: Can not store message 9048323-10.133.184.161(ca:3d:25:9a:7a:46
      )-22300-1294241479417 on destination richAppTopic [Topic]. The destination message count limit (maxNumMsgs) of 100000 has been reached.

      It looks like the OOM condition and the "message count limit has been reached" messages were caused by the same underlying problem: the MQ messages were observed to be consumed only slowly.

      For this reason, we believe, the stress test crashed on this setup.

        Issue Links

          Activity

          Satish Kumar added a comment -

          The strange part about this bug is that the stress tests seem to report this warning only on this particular set of machines. Sony and Varun have confirmed that the tests have run for several days on other machines, including OEL, and the memory usage did not seem abnormal in any of those setups.

          As has been pointed out in a previous comment, the message consumption rate appears to be too slow, causing the heap size to grow steadily and eventually resulting in MQ reporting this warning. Stopping the client from sending new messages does result in the messages eventually being consumed, but at a very slow pace.

          One issue that was noticed while debugging this setup is that imq.cluster.brokerlist is not being populated correctly. Although this does not appear to be directly related to this issue, it could cause an imbalance in the client connections to the brokers.
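
          For context, the sketch below shows what a fully populated broker address list looks like from the MQ client API's point of view. GlassFish normally derives the equivalent list for the JMS RA automatically, so this standalone-client form, and the default port used in it, are assumptions made purely for illustration.

          // Illustrative only: shows a fully populated broker address list from the MQ
          // client API's point of view. In GlassFish the JMS RA normally derives the
          // equivalent of imq.cluster.brokerlist / AddressList itself; the port below is
          // the MQ default and is an assumption for this setup.
          import javax.jms.Connection;
          import javax.jms.JMSException;

          import com.sun.messaging.ConnectionConfiguration;
          import com.sun.messaging.ConnectionFactory;

          public class AddressListCheck {
              public static void main(String[] args) throws JMSException {
                  ConnectionFactory cf = new ConnectionFactory();
                  // All three hosts listed; an incomplete list concentrates client
                  // connections on whichever brokers happen to be named.
                  cf.setProperty(ConnectionConfiguration.imqAddressList,
                          "mq://asqe-x2250-st5:7676,mq://asqe-x2250-st6:7676,mq://asqe-x2250-st7:7676");
                  // Spread connections across the list instead of pinning to the first entry.
                  cf.setProperty(ConnectionConfiguration.imqAddressListBehavior, "RANDOM");

                  Connection c = cf.createConnection();
                  System.out.println("Connected via: " + c.getMetaData().getJMSProviderName());
                  c.close();
              }
          }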

          sb110099 added a comment -

          This bug was filed after a joint debugging session with Mahesh, Amy and Sony.

          Here are Amy's follow-up notes from after that session:

          "Thanks to Mahesh for spending time yesterday examining the GlassFish servers, and thanks to everyone for yesterday's joint debug session.

          • The message production rate is greater than the message consumption rate, as observed yesterday on Elena's setup.
            Suggestions:
            • reduce the load to lower the message production rate
            • increase the MDB pool size to speed up message consumption (see the consumer sketch after these notes)
            • lower the max. message limit (maxNumMsgs) on the destination richAppTopic
            • increase the Java heap size (EMBEDDED mode needs more heap than LOCAL/REMOTE)
            • use FLOW_CONTROL as the destination limitBehavior (throttles the message producer)
          • The test, as it is set up now, needs to run with balanced message production and consumption rates.
            • To monitor message in/out rates: imqcmd metrics dst -t t -n richAppTopic -m rts
          • I've filed the following MQ issue for the unlimited flow control issue noticed in the RichAccess test: MQ-75 http://java.net/jira/browse/MQ-75
          • The OOM shown in the broker log was "java.lang.OutOfMemoryError: GC overhead limit exceeded",
            which explains why the JVM threw OOM even though the broker was rejecting new messages once the destination was full.
            Suggestions:
            • configure the test to run at a balanced message production/consumption rate so the destination does not fill up
            • increase the heap size, or
            • configure the test to use the FLOW_CONTROL destination limitBehavior (the default is REJECT_NEW)
          • Satish found an issue in the broker addressList in GlassFish/JMS last night which could cause an uneven distribution of client connections to the brokers.

          Additional info for Sony when he experiments more with the test: an embedded broker in a standalone instance uses ra-direct mode, whereas an embedded broker in a 1-instance cluster uses tcp mode, as it does in an n-instance cluster.

          The following 2 bugs were noticed in the broker logs (no functional impact):
          7010855 - NPEs logged when GlassFish EMBEDDED broker shutdown due to unrecoverable OOM
          MQ-74 - broker logs NPE when debug dump broker in GlassFish EMBEDDED JMS mode
          "

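          To illustrate the consumption side referred to in the suggestions above, a topic-subscriber MDB might look like the sketch below. The class name and activation config are assumptions; in GlassFish the pool size itself is tuned via bean-pool/max-pool-size in glassfish-ejb-jar.xml or the container-wide mdb-container settings, not in the Java source.

          // Hypothetical consumer sketch: a non-durable topic subscriber MDB for richAppTopic.
          // Raising the MDB pool size (bean-pool/max-pool-size in glassfish-ejb-jar.xml, or
          // the container-wide mdb-container settings) lets more of these instances run
          // concurrently, which is what the "increase MDB pool size" suggestion is about.
          import javax.ejb.ActivationConfigProperty;
          import javax.ejb.MessageDriven;
          import javax.jms.Message;
          import javax.jms.MessageListener;
          import javax.jms.TextMessage;

          @MessageDriven(
              mappedName = "jms/richAppTopic",      // assumed JNDI name of the topic
              activationConfig = {
                  @ActivationConfigProperty(propertyName = "destinationType",
                                            propertyValue = "javax.jms.Topic"),
                  @ActivationConfigProperty(propertyName = "acknowledgeMode",
                                            propertyValue = "Auto-acknowledge")
              })
          public class RichAppTopicListenerBean implements MessageListener {

              @Override
              public void onMessage(Message message) {
                  try {
                      if (message instanceof TextMessage) {
                          // Keep per-message work cheap: any slowness here, multiplied by the
                          // inbound rate, is exactly what lets richAppTopic back up.
                          String body = ((TextMessage) message).getText();
                          // ... application-specific handling of 'body' ...
                      }
                  } catch (Exception e) {
                      throw new RuntimeException(e);
                  }
              }
          }

          The destination-side knobs mentioned in the notes (maxNumMsgs and the limitBehavior values FLOW_CONTROL / REJECT_NEW) are broker settings and are adjusted with imqcmd rather than in application code.
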
          Nazrul added a comment -

          This is blocking stress testing.

          Nazrul added a comment -

          Amy is working on the MQ-75 issue, so assigning to her.

          amyk added a comment - edited

          MQ 4.5 build 25, which contains fixes for MQ-75 (7011163), 7011169, and MQ-74, has been integrated and should be in the GlassFish 3.1 Jan 13 nightly build and the next promoted build.


            People

            • Assignee:
              amyk
              Reporter:
              easarina
            • Votes:
              0
              Watchers:
              1
