sailfin / SAILFIN-1927

[blocking] Disable/Enable of an instance causing uneven traffic distribution

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.0
    • Fix Version/s: b30
    • Component/s: session_replication
    • Labels:
      None
    • Environment:

      Operating System: All
      Platform: All

      Description

      ******************************************************************************************************

      • Template v0.1 ( 05/01/08 )
      • Sailfin Stress test issue
        ******************************************************************************************************
        Sailfin Build : 28
        Cluster size : 9
        Happens in a single instance (y/n) ? : NA
        Test id : st2_4_presence_subscribe-refresh-failure
        Location of the test : as-telco-sqe/stress-ws/presence
        JDK version : 1.6.0_16
        CLB used : Yes
        HW LB used : Yes.
        SSR: Enabled

      A 9-instance cluster was used in this test. Four instances (instance103,
      instance104, instance105 and instance109) were used as FE/BE, and the other
      five as BE only. Four SIPp instances were started with the following command,
      giving an effective call rate of 140 cps and a load of 150k sessions per
      instance:

      sipp -t t1 -sf st2_4_presence_subscribe-refresh-failure.xml -r 315 -l 338000 \
        -d 1073000 -nd -trace_err -trace_screen -trace_logs -buff_size 33554433 \
        -reconnect_close false -max_reconnect 10 -reconnect_sleep 3000 \
        <instance-host>:35060
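      The load figures quoted above follow from the command-line values. The short
      Python sketch below re-derives them, assuming the four SIPp instances spread
      their load evenly over all nine cluster instances. It also applies Little's law
      (concurrent calls = rate × call duration), assuming each session lasts roughly
      for the -d pause; under that assumption the -l 338000 limit appears to match
      the steady-state number of open calls per SIPp instance.

```python
# Re-derive the per-instance load from the sipp command line above.
# Assumption: the 4 SIPp instances spread their load evenly over the
# 9 cluster instances (the load balancers' actual behaviour may differ).

SIPP_INSTANCES = 4
CLUSTER_SIZE = 9
RATE_CPS = 315          # -r: new calls per second, per SIPp instance
CALL_LIMIT = 338_000    # -l: max simultaneous calls, per SIPp instance
PAUSE_MS = 1_073_000    # -d: pause duration in milliseconds


def per_instance_rate():
    """Aggregate call rate divided over the cluster (calls per second)."""
    return SIPP_INSTANCES * RATE_CPS / CLUSTER_SIZE


def per_instance_sessions():
    """Aggregate simultaneous-call limit divided over the cluster."""
    return SIPP_INSTANCES * CALL_LIMIT / CLUSTER_SIZE


def littles_law_concurrency():
    """Steady-state open calls per SIPp instance (L = rate * duration).

    Assumption: each session lives roughly for the -d pause.
    """
    return RATE_CPS * PAUSE_MS / 1000


print(f"{per_instance_rate():.0f} cps per instance")
print(f"{per_instance_sessions():.0f} sessions per instance")
print(f"{littles_law_concurrency():.0f} concurrent calls per SIPp instance "
      f"(vs -l {CALL_LIMIT})")
```

      Running it prints 140 cps and 150222 sessions per instance, and 337995
      concurrent calls per SIPp instance against the configured limit of 338000.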

      Rolling Upgrade steps were performed on instance101.
      http://wiki.glassfish.java.net/attach/SFv2FunctionalSpecs/rolling_upgrade_one_pager_ver2.html

      Logs for the run are available at:
      sf-x2200-11:/space/sony/logs/2.0/b28/subscribe-refresh-RU/IEC1-SuSE/

      Issue:
      After running disable-converged-lb-server instance101 (at 11:42:51):
      a. All other instances saw a drop in traffic (see the presence-stats.txt file
      under each instance's logs).
      b. instance101 still showed up as a healthy instance in the server logs of the
      other instances.

      After running enable-converged-lb-server instance101 (at 11:54:28), traffic
      distribution remained quite uneven for an extended period. On some instances,
      traffic restarted only after the reconcile step had completed.

      This caused SIPp to back up calls and send them later in larger bursts, which
      produced "Cant find matching transaction" errors in the logs, along with some
      JXTA errors.
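      The backlog-and-burst effect can be illustrated with a simplified, hypothetical
      model of a rate-driven load generator: on every tick it opens enough new calls
      to bring its running total up to rate × elapsed time, so calls it could not
      send while traffic was held back all go out on the next tick it is able to
      send. This is not SIPp's actual scheduler; calls_per_tick, the one-second tick
      and the blocked ticks are assumptions chosen only for illustration.

```python
# Hypothetical model of a load generator that "catches up" after being
# held back. This is NOT SIPp source code; it only illustrates how a
# fixed target rate turns a stall into a later burst.


def calls_per_tick(rate_cps, ticks, tick_ms, blocked=frozenset()):
    """Return the number of new calls opened on each tick.

    rate_cps -- target call rate in calls per second
    ticks    -- number of scheduler ticks to simulate
    tick_ms  -- length of one tick in milliseconds
    blocked  -- tick indices on which no calls can be sent
                (e.g. while the target is not accepting traffic)
    """
    opened_total = 0
    schedule = []
    for tick in range(ticks):
        if tick in blocked:
            # Nothing can be sent; the backlog keeps growing.
            schedule.append(0)
            continue
        elapsed_s = (tick + 1) * tick_ms / 1000
        # Open enough calls to reach rate * elapsed time, which sends
        # the whole backlog from blocked ticks in one burst.
        due = int(rate_cps * elapsed_s) - opened_total
        schedule.append(due)
        opened_total += due
    return schedule


print(calls_per_tick(315, 5, 1000, blocked={1, 2}))
```

      With a 315 cps target and the second and third seconds blocked, the call
      returns [315, 0, 0, 945, 315]: the two stalled seconds are sent as a single
      945-call burst.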

      See the file
      sf-x2200-11:/space/sony/logs/2.0/b28/subscribe-refresh-RU/IEC1-SuSE/rolling-upgrade-sift-logs/config.RollingUpgrade_testRollingUpgrade/sift/controller.log
      for the exact time at which each Rolling Upgrade step completed (search for the
      exact admin command that was used).
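      A minimal Python sketch for locating those commands in controller.log, assuming
      only that each admin command appears verbatim on a log line (the log's line
      format is not reproduced here, so the helper simply returns matching lines with
      their line numbers). find_admin_commands and scan_file are hypothetical helpers,
      and the command list holds only the two commands named in this report; the
      other Rolling Upgrade commands would need to be added.

```python
from typing import Iterable, Iterator, List, Tuple

# Only the admin commands named in this report; extend with the other
# Rolling Upgrade commands as needed (the full list is not given here).
DEFAULT_COMMANDS = ("disable-converged-lb-server", "enable-converged-lb-server")


def find_admin_commands(
    lines: Iterable[str], commands=DEFAULT_COMMANDS
) -> Iterator[Tuple[int, str, str]]:
    """Yield (line_number, command, line) for each line naming a command.

    Assumption: the command text appears verbatim somewhere on the line.
    """
    for number, line in enumerate(lines, start=1):
        for command in commands:
            if command in line:
                yield number, command, line.rstrip("\n")
                break  # report each line once


def scan_file(path: str, commands=DEFAULT_COMMANDS) -> List[Tuple[int, str, str]]:
    """Scan a log file (e.g. controller.log) and return all matches."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return list(find_admin_commands(log, commands))
```

      For example, scan_file("controller.log") returns (line number, command, line)
      tuples whose timestamps can then be lined up against the presence-stats files.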

      1. presence-stats_instance10_24x1.txt (554 kB, Bhavanishankar)
      2. presence-stats.txt (304 kB, Bhavanishankar)
      3. sipp_screen_log_24x1.log (7 kB, Bhavanishankar)
      4. subscribe_refresh.log.txt (17 kB, Bhavanishankar)

        Activity

        Bhavanishankar added a comment -

        Created an attachment (id=1090)
        Attaching the presence-stats file from the 12-hour rolling upgrade run.

        varunrupela added a comment -

        The uneven distribution still appears to be a problem on one of the setups
        (8-core). Bhavani is looking into the root cause.

        Bhavanishankar added a comment -

        With my previous fix, I had made sure that the RU worked well on the 4-core
        setup, so I marked this issue as fixed.

        However, I later realized that a thread-blocking issue, seen more frequently
        on the 8-core setup (and only very rarely on 4-core setups), was causing the
        uneven traffic, the "can't find matching" errors, call backups, etc.

        I have filed and fixed the threading issue as part of issue 1943. Please refer
        to it for the complete details.

        With the 1943 fix, I verified that the RU works fine on both 4-core and 8-core
        setups. Hence, marking this issue as fixed.

        Bhavanishankar added a comment -

        Created an attachment (id=1097)
        One of the SIPp screen logs from the 24x1 RU verification run (there were 4 SIPp instances in total).

        Bhavanishankar added a comment -

        Created an attachment (id=1098)
        24x1 traffic distribution on one of the rolled instances in the 8-core, 10-instance cluster (6 instances were rolled).


          People

          • Assignee:
            Bhavanishankar
            Reporter:
            varunrupela
          • Votes:
            0
            Watchers:
            0

            Dates

            • Created:
              Updated:
              Resolved: