SAILFIN-1790

b2bua, SSR enabled: a lot of error messages.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.0
    • Fix Version/s: milestone 1
    • Component/s: session_replication
    • Labels:
      None
    • Environment:

      Operating System: All
      Platform: All

      Description

      ********************************************************************************

      • Template v0.1 ( 05/01/08 )
      • Sailfin Stress test issue
        ********************************************************************************
        Sailfin Build :15
        Cluster size : 10 instances
        Happens in a single instance (y/n) : n/a
        Test id : st6_1_b2bua
        Location of the test : as-telco-sqe/stress-ws/b2bua
        JDK version : 1.6.0_12 64bits
        CLB used : Yes
        HW LB used : NO
        **********************************************************************

      The cluster ran on 10 SuSE machines with SSR enabled. TCP (t1) was used. I
      tried a load of 300 cps (run 1) and a load of 200 cps (run 2). In both cases I
      saw a lot of unexpected errors on the sipp screens (about 30000 unexpected
      messages per screen) and many different error messages in the server.log
      files. For example, see the logs from the second run:
      http://agni-1.sfbay.sun.com/net/asqe-logs/export1/SailFin/Results/build15/b2bua_ssr_2

      The following error messages were seen:
      ===========================================================
      1)
      sun-glassfish-comms-server1.5|javax.enterprise.system.container.sip|_ThreadID=40;_ThreadName=Thread-76;|Problem in servlet.
      java.lang.NullPointerException
      at com.sun.asqe.systemtests.b2bua.B2BUAServlet.doResponse(B2BUAServlet.java:69)
      at javax.servlet.sip.SipServlet.service(SipServlet.java:48)
      at com.ericsson.ssa.container.sim.SipServletFacade.service(SipServletFacade.java:121)
      at com.ericsson.ssa.sip.INVITESession.dispatch(INVITESession.java:839)
      at com.ericsson.ssa.sip.UA.dispatch(UA.java:708)
      at com.ericsson.ssa.sip.transaction.Transaction.dispatchInUOW(Transaction.java:234)
      at com.ericsson.ssa.sip.transaction.Transaction.dispatchInUOW(Transaction.java:208)
      at com.ericsson.ssa.sip.transaction.NonInviteClientTransaction.timeout(NonInviteClientTransaction.java:230)
      at com.ericsson.ssa.sip.timer.GeneralTimerImpl.timeout(GeneralTimerImpl.java:189)
      at com.ericsson.ssa.sip.timer.GeneralTimerBase.call(GeneralTimerBase.java:72)
      at com.ericsson.ssa.container.SipContainerThreadPool$SipContainerThreadPoolThread.run(SipContainerThreadPool.java:295)

      #]

      (See SAILFIN-1787, but in this case thousands of such messages were seen.)
      ======================================

      2) [#|2009-05-28T17:34:56.608-0700|INFO|sun-glassfish-comms-server1.5|javax.enterprise.system.container.sip|_ThreadID=40;_ThreadName=Thread-76;|Problem in servlet.
      java.lang.NullPointerException
      ================================================
      3) SEVERE|sun-glassfish-comms-server1.5|javax.enterprise.system.container.clb|_ThreadID=48;_ThreadName=SipContainer-serversWorkerThread-5060-7;_RequestID=e1a751d0-460d-41ee-a5f6-b59631075fdd;|WorkerThreadImpl unexpected exception:
      java.lang.NullPointerException
      at org.jvnet.glassfish.comms.clb.core.common.chr.StickyHashKeyExtractor.encodeHashKeyToBeKey(StickyHashKeyExtractor.java:245)
      at org.jvnet.glassfish.comms.clb.core.sip.SipLoadBalancerBackend.handleOutgoingRequest(SipLoadBalancerBackend.java:175)
      at org.jvnet.glassfish.comms.clb.core.sip.SipLoadBalancerManagerBackEnd.dispatch(SipLoadBalancerManagerBackEnd.java:230)
      at com.ericsson.ssa.sip.transaction.TransactionManager.dispatch(TransactionManager.java:456)
      at com.ericsson.ssa.sip.persistence.ReplicationManager.dispatch(ReplicationManager.java:163)
      at com.ericsson.ssa.sip.dns.ResolverManager.dispatch(ResolverManager.java:219)
      at com.ericsson.ssa.sip.DialogManager.dispatch(DialogManager.java:808)
      at com.ericsson.ssa.sip.LocalRouteManager.dispatch(LocalRouteManager.java:148)
      at com.ericsson.ssa.container.sim.ApplicationDispatcher.dispatchViaStatelessProxy(ApplicationDispatcher.java:626)
      at com.ericsson.ssa.container.sim.ApplicationDispatcher.dispatch(ApplicationDispatcher.java:177)
      at com.ericsson.ssa.sip.FSM$1.call(FSM.java:141)
      at com.sun.grizzly.util.WorkerThreadImpl.processTask(WorkerThreadImpl.java:325)
      at com.sun.grizzly.util.WorkerThreadImpl.run(WorkerThreadImpl.java:184)

      #]

      =============================================================

      4) INFO|sun-glassfish-comms-server1.5|javax.enterprise.system.container.sip|_ThreadID=34;_ThreadName=Thread-77;|Problem in servlet.
      java.lang.IllegalStateException: The request has already responded with a final response.
      at com.ericsson.ssa.sip.SipServletRequestImpl.checkResponseCode(SipServletRequestImpl.java:1164)
      at com.ericsson.ssa.sip.SipServletRequestImpl.createResponse(SipServletRequestImpl.java:492)
      at com.sun.asqe.systemtests.b2bua.B2BUAServlet.doResponse(B2BUAServlet.java:75)
      at javax.servlet.sip.SipServlet.service(SipServlet.java:48)
      at com.ericsson.ssa.container.sim.SipServletFacade.service(SipServletFacade.java:121)
      at com.ericsson.ssa.sip.INVITESession.dispatch(INVITESession.java:839)
      at com.ericsson.ssa.sip.UA.dispatch(UA.java:708)
      at com.ericsson.ssa.sip.transaction.Transaction.dispatchInUOW(Transaction.java:234)
      at com.ericsson.ssa.sip.transaction.Transaction.dispatchInUOW(Transaction.java:208)
      at com.ericsson.ssa.sip.transaction.InviteClientTransaction.timeout(InviteClientTransaction.java:254)
      at com.ericsson.ssa.sip.timer.GeneralTimerImpl.timeout(GeneralTimerImpl.java:189)
      at com.ericsson.ssa.sip.timer.GeneralTimerBase.call(GeneralTimerBase.java:72)
      at com.ericsson.ssa.container.SipContainerThreadPool$SipContainerThreadPoolThread.run(SipContainerThreadPool.java:295)

      #]
      ===================================================

      5) [#|2009-05-28T18:02:24.183-0700|SEVERE|sun-glassfish-comms-server1.5|javax.enterprise.system.container.sip|_ThreadID=36;_ThreadName=Thread-79;_RequestID=aef4d723-6859-4883-be15-fbbdd2268fd7;|"Caught an error while executing task."
      java.lang.NullPointerException
      at com.ericsson.ssa.sip.transaction.InviteClientTransaction.timeout(InviteClientTransaction.java:267)
      at com.ericsson.ssa.sip.timer.GeneralTimerImpl.timeout(GeneralTimerImpl.java:189)
      at com.ericsson.ssa.sip.timer.GeneralTimerBase.call(GeneralTimerBase.java:72)
      at com.ericsson.ssa.container.SipContainerThreadPool$SipContainerThreadPoolThread.run(SipContainerThreadPool.java:295)

      #]
      ============================================================

      6) Finally, the following messages were seen:
      -----------------------------------------------------------
      [#|2009-05-28T18:42:31.728-0700|INFO|sun-glassfish-comms-server1.5|javax.enterprise.system.container.sip|_ThreadID=41;_ThreadName=Thread-83;|Number of overdue ServerTransactions removed:1118|#]

      [#|2009-05-28T19:02:33.344-0700|INFO|sun-glassfish-comms-server1.5|javax.enterprise.system.container.sip|_ThreadID=39;_ThreadName=Thread-74;|Number of overdue ServerTransactions removed:9590|#]

      [#|2009-05-28T19:22:34.924-0700|INFO|sun-glassfish-comms-server1.5|javax.enterprise.system.container.sip|_ThreadID=36;_ThreadName=Thread-79;|Number of overdue ServerTransactions removed:225|#]
      ----------------------------------------------------
      After that, no more error messages were seen.

      ==========================================================

      I want to add that I also tried the same run on x86. That run produced a lot
      of errors in the server.log files and on the sipp screens, and then all
      communication simply stopped.
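
To put numbers on "a lot of error messages", the records in a server.log file could be tallied by log level and exception class. The following is only a minimal sketch in Python and is not part of the original report. The function name `tally` is made up for illustration, and the code assumes the record layout seen in the excerpts above, `[#|timestamp|level|product|logger|thread info|message|#]`, with every record closed by `|#]`.

```python
import re
from collections import Counter

# Assumed record layout (taken from the excerpts in this report):
#   [#|timestamp|level|product|logger|thread info|message|#]
RECORD = re.compile(r"\[#\|(.*?)\|#\]", re.DOTALL)

# A line that starts with a fully qualified class name ending in
# "Exception" or "Error", e.g. java.lang.NullPointerException.
EXCEPTION = re.compile(
    r"^\s*((?:[\w$]+\.)+[\w$]*(?:Exception|Error))\b", re.MULTILINE
)


def tally(log_text):
    """Count log records per (level, exception class or normalized message)."""
    counts = Counter()
    for match in RECORD.finditer(log_text):
        fields = match.group(1).split("|", 5)
        if len(fields) < 6:
            continue  # not the six-field layout assumed above
        level, message = fields[1], fields[5].strip()
        exc = EXCEPTION.search(message)
        if exc:
            key = exc.group(1)
        elif message:
            # No exception: use the first message line, with numbers
            # replaced by "N" so that varying counters group together.
            key = re.sub(r"\d+", "N", message.splitlines()[0])
        else:
            key = ""
        counts[(level, key)] += 1
    return counts
```

For example, `tally(open("server.log").read()).most_common(10)` would list the ten most frequent (level, message) pairs for one instance. Grouping by exception class alone does not separate entries 1 and 5 if they were logged at the same level, so the key may need to include the first stack frame as well.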

          Activity

          easarina added a comment -

          I just want to say that each of my SUSE machines has 16 GB RAM and 4 processors:
          Dual-Core AMD Opteron (tm) Processor 2218
          cpu MHz: 2593.136

          Scott Oaks added a comment -

          The perf team has filed issue 1798, which shows the third exception under load
          without SSR (but it likely has a different root cause than 1797).

          Scott Oaks added a comment -

          Adding dependent bugs

          easarina added a comment -

          The keyword system-test was added.

          Scott Oaks added a comment -

          Based on the analysis in 1787, these are the same issues: the UAS gets more
          invites than the UAC, and the application doesn't handle that correctly. We see
          more of them in the SSR case than the non-SSR case because SSR is slower, and so
          the timeout described in 1787 is more likely to occur.

          Additionally, the SSR goal for this test is only 100 CPS, so the overload is
          another reason there were more timeouts/failures at this rate.

          *** This issue has been marked as a duplicate of 1787 ***

            People

            • Assignee: Scott Oaks
            • Reporter: easarina
            • Votes: 0
            • Watchers: 0
