SAILFIN-1866

Under load, CANCEL retransmissions not sent, and NPE

    Details

    • Issuezilla Id: 1866

      Description

      A load test with CANCEL causes incorrect call flows and NullPointerExceptions.

      My servlet is a B2BUA whose main job, apart from a few 3PCC tasks, is to monitor
      SIP traffic, and therefore it attempts to be as transparent as possible most of
      the time; messages are usually just created with B2buaHelper and forwarded.

      The load test "client" sends a basic INVITE and then waits up to about a
      minute for a provisional response. Once it receives one, it sends a CANCEL
      (the client is written with JAIN SIP, and all it does is a simple
      ClientTransaction.createCancel()) and waits for a final response to both
      requests. If it doesn't get a provisional response within about a minute,
      it forcefully terminates the client transaction; sometimes the INVITE
      client transaction also times out locally.

      The load test "server" waits for an INVITE, then sends a 180 response and
      retransmits the 180 every 5 seconds for as long as it has not seen a
      CANCEL. Once it receives a CANCEL, it replies to the CANCEL with a 200 and
      to the INVITE with a 487. If it doesn't receive a CANCEL within about a
      minute, it replies to the INVITE with a 408 response.

      We are using Sun GlassFish Communications Server 1.5 (9.1.1) (build b60g-fcs) on
      Linux 2.6.28-11-server #42-Ubuntu SMP Fri Apr 17 02:48:10 UTC 2009 i686 GNU/Linux.

      Unfortunately, because it's a load test with a lot of traffic, and because
      the SIP messages that cause the problem never reach my servlet, it's
      difficult to diagnose, and I am not sure whether the two problems (the NPE
      and the incorrect CANCEL call flow) are actually related.

      Here is a call flow I was able to capture with Wireshark. It looks like the
      server transaction that is handling the CANCEL terminates at about the same time
      the INVITE transaction terminates, instead of staying alive to handle
      retransmissions of the CANCEL request.

      __Client_______Servlet__

       
      --INVITE---->
      <-100--------
      <-180--------
      --CANCEL---->
      x-487/INV-----
      x-200/CAN-----
      --CANCEL(r)-?
      --CANCEL(r)-?
      <-487/INV(r)-
      --ACK------->
      --CANCEL(r)->
      <-481--------
       

      Note: (r) are retransmissions, correctly timed. The entire flow only spans a few
      seconds. The ? indicates that I can't tell whether the CANCEL was received by
      the container and there was no reaction, or that the CANCEL packet was lost (I
      think the latter is more likely).
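For reference, the behaviour expected of the CANCEL server transaction can be sketched as a small state model. This is only an illustration of RFC 3261 section 17.2.2 (non-INVITE server transaction over an unreliable transport), not SailFin code; the class name, method names, and the explicit `nowMs` clock parameter are hypothetical. The point it shows is that after the 200 is sent, the transaction should linger for Timer J (64*T1, 32 seconds with the default T1) and answer every retransmitted CANCEL with the cached 200, regardless of what happens to the INVITE transaction.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical model of a non-INVITE (e.g. CANCEL) server transaction over UDP. */
final class CancelServerTransactionSketch {
    // RFC 3261 distinguishes Trying and Proceeding; they are merged here for brevity.
    enum State { TRYING, COMPLETED, TERMINATED }

    static final long T1_MS = 500;              // RFC 3261 default RTT estimate
    static final long TIMER_J_MS = 64 * T1_MS;  // linger time in COMPLETED on unreliable transports

    private State state = State.TRYING;
    private int finalStatus;
    private long completedAtMs;
    /** Status codes this model "put on the wire", in order. */
    final List<Integer> sentResponses = new ArrayList<>();

    State state() {
        return state;
    }

    /** The application answers the request with a final response at time nowMs. */
    void sendFinalResponse(int status, long nowMs) {
        if (state != State.TRYING) {
            throw new IllegalStateException("final response already sent");
        }
        finalStatus = status;
        completedAtMs = nowMs;
        state = State.COMPLETED;
        sentResponses.add(status);
    }

    /**
     * A retransmission of the same request arrives at time nowMs.
     * Returns true if this transaction absorbed it, false if the transaction
     * is gone and the request would be treated as a new one.
     */
    boolean onRequestRetransmission(long nowMs) {
        if (state == State.COMPLETED && nowMs - completedAtMs >= TIMER_J_MS) {
            state = State.TERMINATED; // Timer J fired
        }
        if (state == State.COMPLETED) {
            sentResponses.add(finalStatus); // resend the cached final response
            return true;
        }
        return state == State.TRYING; // absorbed silently while no final response exists
    }
}
```

In the captured flow, the last CANCEL retransmission arrives only a few seconds after the first one, well inside Timer J, so under this model it would be answered with a retransmitted 200 rather than a 481.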

      I am also seeing the log message:
      WARNING|sun-glassfish-comms-server1.5|javax.enterprise.system.container.sip|_ThreadID=145;_ThreadName=Thread-61;_RequestID=e38d2208-d070-4d9c-9b13-2c5ae40ba38f;|Transaction was null: z9hG4bK76aaff7d8ab84e9e834f5c5d6d252c9f
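Since the value after "Transaction was null:" looks like a Via branch ID (it starts with the RFC 3261 magic cookie z9hG4bK), one way to correlate such warnings with a packet capture is to extract the branch from each log record and filter the capture on it. The following is a minimal sketch that assumes the log format shown above; the class and method names are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: pulls the branch ID out of a "Transaction was null" log record. */
final class BranchFromLogSketch {
    // \s+ tolerates the record being wrapped between "Transaction" and "was";
    // the branch stops at whitespace or at the '|' field separator of the log format.
    private static final Pattern NULL_TX =
            Pattern.compile("Transaction\\s+was null:\\s*(z9hG4bK[^\\s|]+)");

    static Optional<String> branchOf(String logRecord) {
        Matcher m = NULL_TX.matcher(logRecord);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

The extracted branch can then be matched against the branch parameter of the Via header in the captured INVITE and CANCEL requests.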

      Lastly, here is the NPE from the server logs. Both this and the "Transaction
      was null" warning above have apparently been fixed by the patch from Binod.

      java.lang.NullPointerException
          at com.ericsson.ssa.sip.transaction.InviteServerTransaction.handleCancel(InviteServerTransaction.java:427)
          at com.ericsson.ssa.sip.transaction.TransactionManager.invokeCreatedOrFetchedServerTransaction(TransactionManager.java:220)
          at com.ericsson.ssa.sip.transaction.TransactionManager.next(TransactionManager.java:275)
          at com.ericsson.ssa.sip.LayerHelper.next(LayerHelper.java:59)
          at com.ericsson.ssa.container.OutboundFlowManager.processOutboundRequest(OutboundFlowManager.java:183)
          at com.ericsson.ssa.container.OutboundFlowManager.next(OutboundFlowManager.java:98)
          at com.ericsson.ssa.sip.LayerHelper.next(LayerHelper.java:59)
          at com.ericsson.ssa.container.GrizzlyNetworkManager.next(GrizzlyNetworkManager.java:1266)
          at com.ericsson.ssa.sip.LayerHelper.next(LayerHelper.java:59)
          at com.ericsson.ssa.container.MessageProcessorFilter.processMessage(MessageProcessorFilter.java:406)
          at com.ericsson.ssa.container.MessageProcessorFilter.execute(MessageProcessorFilter.java:300)
          at com.sun.grizzly.DefaultProtocolChain.executeProtocolFilter(DefaultProtocolChain.java:136)
          at com.sun.grizzly.DefaultProtocolChain.execute(DefaultProtocolChain.java:103)
          at com.sun.grizzly.DefaultProtocolChain.execute(DefaultProtocolChain.java:89)
          at com.ericsson.ssa.container.GrizzlyNetworkManager$SharedCallbackHandler.onRead(GrizzlyNetworkManager.java:2749)
          at com.sun.grizzly.CallbackHandlerContextTask.doCall(CallbackHandlerContextTask.java:76)
          at com.sun.grizzly.SelectionKeyContextTask.call(SelectionKeyContextTask.java:56)
          at com.sun.grizzly.util.WorkerThreadImpl.processTask(WorkerThreadImpl.java:325)
          at com.sun.grizzly.util.WorkerThreadImpl.run(WorkerThreadImpl.java:184)

        Activity

        haukex added a comment -

        Created an attachment (id=1049)
        NPE patch by Binod, seems to fix NPE described in issue

        sankara added a comment -

        Reassigning the issue to Binod. Binod is working on 1805, which is similar to
        this issue.

        binod added a comment -

        CANCEL retransmissions were not handled properly: their handling was tied to the lifecycle of the
        INVITE transaction. The fix is to make sure that CANCEL retransmissions are handled by the CANCEL
        transaction, without touching the INVITE transaction.

        Checking in the fix. The fix worked for me in my environment. However, I will mark the bug as fixed
        only once Hauke comes back and confirms that the fix works in his environment.

        bash-3.2$ cvs -e vi commit TransactionManager.java
        Checking in TransactionManager.java;
        /cvs/sailfin/sip-stack/src/java/com/ericsson/ssa/sip/transaction/TransactionManager.java,v  <--  TransactionManager.java
        new revision: 1.41; previous revision: 1.40
        done
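The idea behind the fix can be illustrated with a small lookup table. This is a hedged sketch, not the actual change to TransactionManager.java (which is not shown in this issue); the class and method names are hypothetical. It follows the matching rule of RFC 3261 section 17.2.3: a CANCEL carries the same branch as the INVITE it cancels, but it forms its own transaction because the method is part of the match, so a retransmitted CANCEL can be answered from the CANCEL transaction even after the INVITE transaction has terminated.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical server transaction table keyed by branch and method. */
final class ServerTransactionTableSketch {
    /** Cached final response status per transaction. */
    private final Map<String, Integer> finalResponses = new HashMap<>();

    private static String key(String branch, String method) {
        // An ACK for a non-2xx final response belongs to the INVITE transaction.
        // The sent-by value of the top Via is also part of the real matching rule
        // and is left out here for brevity.
        String m = method.equals("ACK") ? "INVITE" : method;
        return branch + "|" + m;
    }

    void recordFinalResponse(String branch, String method, int status) {
        finalResponses.put(key(branch, method), status);
    }

    void terminate(String branch, String method) {
        finalResponses.remove(key(branch, method));
    }

    /** Status to retransmit for a repeated request, or null if no transaction matches. */
    Integer onRetransmission(String branch, String method) {
        return finalResponses.get(key(branch, method));
    }
}
```

With this structure, terminating the entry for ("branch", "INVITE") leaves the entry for ("branch", "CANCEL") in place, so a late CANCEL retransmission still finds its cached 200.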

        binod added a comment -

        Marking it as fixed, since hauke confirmed that his test is working now.

        binod added a comment -

        Corrected the version where the bug is fixed.


          People

          • Assignee:
            binod
            Reporter:
            haukex
          • Votes:
            0
            Watchers:
            1

            Dates

            • Created:
              Updated:
              Resolved: