SAILFIN-1929

Memory leak when running B2BUA scenario with SSR enabled

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.0
    • Fix Version/s: milestone 1
    • Component/s: session_replication
    • Labels:
      None
    • Environment:

      Operating System: Linux
      Platform: Other

      Description

      ********************************************************************************

      • Template v0.1 ( 05/01/08 )
      • Sailfin Stress test issue
        ********************************************************************************
        Sailfin Build :28
        Cluster size : 9 instances
        Happens in a single instance (y/n) n/a: NA
        Test id : st6_1_b2bua
        Location of the test : as-telco-sqe/stress-ws/b2bua
        JDK version : 1.6.0_16 64 bit
        CLB used : Yes
        HW LB used : NO
        SSR enabled : Yes
        SIPp cps : 225 on an 8-core machine
        **********************************************************************

      Opening an issue to track the memory leak reported with the b2bua scenario
      when running with SSR enabled. Mahesh and Bhavani are already working on this.

        Activity

        sonymanuel added a comment -

        Add system-test keyword.

        Bhavanishankar added a comment -

        Check-in summaries of some of the common fixes for this (issue 1926) and for
        the b2bua scenario (issue 1929) are at:

        https://sailfin.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=7432
        https://glassfish.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=31094
        https://glassfish.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=31092
        https://glassfish.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=31093

        These fixes address the functional issues in b2bua and conference, and the fixes
        will be available in b30.

        shreedhar_ganapathy added a comment -

        Transferring to Mahesh since he is looking into it

        Mahesh Kannan added a comment -

        The current run has been going fine for 72 hours without any OOM. No instances
        were shut down during this run. We are still awaiting confirmation from QE that
        this test indeed doesn't require any failure.

        However, we started a run last Thursday (08/27) and introduced a failure on
        08/28. For a period of one hour we saw a lot of traffic, as explained by Scott:

        <scott>
        The sipp UAC clients all report receiving ~677K bad messages (which works out to
        166 messages/second, which is the call rate – so it's as if the server is
        handling 2x the call rate during this period).

        The cause of this is from this scenario:
        uac sends invite
        uas receives invite and sends back 200 OK
        uac sends ACK routed through the failed instance and gets an I/O error (481 return)
        uac terminates the call
        uas resends the 200 OK every 500 milliseconds
        uac discards all these messages because it can't map them to a known call

        </scott>
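
As a rough cross-check of the figures quoted above, the sketch below works out how long a burst of ~677K unmatched messages would last at the stated rate of 166 messages/second. It assumes the 677K count and the 166/s rate refer to the same stream of messages; the function name is made up for illustration and is not part of SIPp or Sailfin.

```python
# Figures quoted in the comment above (assumed to describe the same message stream).
BAD_MESSAGES = 677_000        # "~677K bad messages" reported by the SIPp UAC clients
RATE_PER_SECOND = 166         # "166 messages/second, which is the call rate"


def implied_duration_seconds(messages: int, rate_per_second: float) -> float:
    """Return how long it takes to receive `messages` at a constant rate."""
    return messages / rate_per_second


duration_s = implied_duration_seconds(BAD_MESSAGES, RATE_PER_SECOND)
duration_min = duration_s / 60

# Prints: 4078 s (~68 min)
print(f"{duration_s:.0f} s (~{duration_min:.0f} min)")
```

Under that assumption the burst lasts a little over an hour, which is in line with the "period of one hour" mentioned in the comment.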

        However, we did notice that memory on the server itself was well under control
        and it didn't exhibit any OOM.

        So, if QE agrees that we can run B2BUA as a happy scenario, we will mark this as
        fixed as the current run has already run for 72 hours.

        Mahesh Kannan added a comment -

        This leak is not observed during our B2BUA runs, which have been running for
        close to 96 hours without OOM.

        We are still thinking about introducing a transient failure into either the
        B2BUA or the Subscribe-Refresh system tests.

        Closing this issue since this bug is for the OOM errors.


          People

          • Assignee:
            Mahesh Kannan
            Reporter:
            sonymanuel
          • Votes:
            0
            Watchers:
            1
