sailfin
  1. sailfin
  2. SAILFIN-2016

High traffic loss (SessionExpired messages) observed in the converged app system test

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Minor Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.0
    • Fix Version/s: milestone 1
    • Component/s: session_replication
    • Labels:
      None
    • Environment:

      Operating System: All
      Platform: All

      Description

      Build 31d
      Parent Issue - 1983

      Session loss for the converged app scenario is partly indicated by the number of
      "SessionExpired" messages in the server logs. On loss of SIP or HTTP requests,
      the calls do not complete which in turn leads to sessions not invalidating
      correctly and that causes the session to expire. The "SessionExpired" message is
      being printed from the applications call back listener.

      ~2300 "SessionExpired" messages were observed in the cluster. This number is
      high with respect to the total number of active sessions, which is ~30K
      (call-rate * call-length * no-of-clients = 37 * 90s * 9 = 29970)

      Some of the items to look into that may help with debugging the issue further:
      1. SessionExpired messages can be co-related to 3 kinds of errors on the SIPp
      clients: 503 responses, 481 response and timeouts.
      2. The SIPp client error logs show that some of the the BYE requests timeout
      upto a minute after the first one that times-out. We expect to only see a few
      BYEs timing out 30 seconds after the failure (since the SIPp scenario has a 30
      second timeout on the receive of 200 OK for the BYE). Only a few should timeout
      as a timeout would be expected if the BYE request reached the failed instance
      just before it failed or was lost in the pipeline...

      Note that the SessionExpired message appear on all instance (other than the
      failed/restarted one) for about 10 minutes after failure detection. This is
      because of the session-timeout value being set to 10 min for the app.

      Some Numbers:

      • "SessionExpired" messages in the cluster = 2286
      • SIPp client errors = 1078 (no-of-481-responses - 142, no-of-503-responses -
        479, no-of-timeouts - 437)
      • HTTP client failed sessions = 854
      • Total reported client errors = 1078 + 854 = 1932

      Now, some remove-replica messages are also expected to be lost. For this run the
      FLUSH_INTERVAL_FOR_REMOVAL_MILLIS was set to 1000 (i.e. 1 second). If we
      consider 2 second worth of remove-replica's being lost, there should be a
      maximum of ((37 SIP + 37 HTTP) * 2) = 148 SessionExpired messages caused by this.

        Issue Links

          Activity

            People

            • Assignee:
              Mahesh Kannan
              Reporter:
              varunrupela
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated: