SAILFIN-1927

[blocking] Disable/Enable of an instance causing uneven traffic distribution

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.0
    • Fix Version/s: b30
    • Component/s: session_replication
    • Labels:
      None
    • Environment:

      Operating System: All
      Platform: All

      Description

      ******************************************************************************************************

      • Template v0.1 ( 05/01/08 )
      • Sailfin Stress test issue
        ******************************************************************************************************
        Sailfin Build : 28
        Cluster size : 9
        Happens in a single instance (y/n) ? : NA
        Test id : st2_4_presence_subscribe-refresh-failure
        Location of the test : as-telco-sqe/stress-ws/presence
        JDK version : 1.6.0_16
        CLB used : Yes
        HW LB used : Yes.
        SSR: Enabled

      A 9-instance cluster was used in this test. 4 instances were used as FE/BE
      (instance103, instance104, instance105 and instance109), the other 5 as BE
      only. 4 SIPp instances were started with the following command to reach an
      effective call rate of 140 cps and a load of 150k sessions per instance:

      sipp -t t1 -sf st2_4_presence_subscribe-refresh-failure.xml -r 315 -l 338000 -d
      1073000 -nd -trace_err -trace_screen -trace_logs -buff_size 33554433
      -reconnect_close false -max_reconnect 10 -reconnect_sleep 3000
      <instance-host>:35060
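
The per-instance figures quoted above can be cross-checked from the SIPp options. The sketch below is only a sanity check: it assumes the load from all 4 SIPp instances is spread evenly over the 9 cluster instances, which the report does not state explicitly. The constant and function names are illustrative.

```python
# Values taken from the report: 4 SIPp instances, each started with
# -r 315 (call rate) and -l 338000 (call limit), against a 9-instance cluster.
SIPP_INSTANCES = 4
RATE_PER_SIPP = 315
LIMIT_PER_SIPP = 338_000
CLUSTER_SIZE = 9


def per_cluster_instance(value_per_sipp: int,
                         sipp_instances: int = SIPP_INSTANCES,
                         cluster_size: int = CLUSTER_SIZE) -> float:
    """Share of the total load per cluster instance, ASSUMING an even spread."""
    return value_per_sipp * sipp_instances / cluster_size


call_rate = per_cluster_instance(RATE_PER_SIPP)    # calls per second
sessions = per_cluster_instance(LIMIT_PER_SIPP)    # concurrent sessions

print(f"{call_rate:.0f} cps per instance, about {sessions:,.0f} sessions per instance")
```

Under that assumption the numbers come out at 140 cps and roughly 150k sessions per instance, matching the description.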

      Rolling Upgrade steps were performed on instance101.
      http://wiki.glassfish.java.net/attach/SFv2FunctionalSpecs/rolling_upgrade_one_pager_ver2.html

      Logs for the run are available at:
      sf-x2200-11:/space/sony/logs/2.0/b28/subscribe-refresh-RU/IEC1-SuSE/

      Issue:
      After running disable-converged-lb-server instance101 (at 11:42:51):
      a. All other instances saw a drop in traffic (see the presence-stats.txt file
      under the instance logs).
      b. instance101 still showed up as a healthy instance in the server logs of
      the other instances.

      After running enable-converged-lb-server instance101 (at 11:54:28),
      traffic distribution was quite uneven for an extended period of time. Traffic
      to some instances restarted only after the reconcile step was completed.
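
When reading the stats files, it may help to know how long instance101 stayed disabled. A minimal sketch, assuming both timestamps quoted in this report fall on the same day (the helper name is illustrative):

```python
from datetime import datetime

TIME_FORMAT = "%H:%M:%S"


def window_seconds(start: str, end: str) -> int:
    """Seconds between two same-day HH:MM:SS timestamps."""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return int(delta.total_seconds())


# Times quoted in this report for the disable and enable commands.
disabled_for = window_seconds("11:42:51", "11:54:28")
minutes, seconds = divmod(disabled_for, 60)
print(f"instance101 was disabled for {minutes} min {seconds} s")
```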

      This caused SIPp to back up calls and send them later in larger bursts,
      causing "Cant find matching transaction" errors in the logs and some JXTA errors.

      See file
      sf-x2200-11:/space/sony/logs/2.0/b28/subscribe-refresh-RU/IEC1-SuSE/rolling-upgrade-sift-logs/config.RollingUpgrade_testRollingUpgrade/sift/controller.log
      for the exact times at which each Rolling Upgrade step was completed (search
      for the exact admin command that was used).
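
The search described above can be scripted. This is a rough sketch only: the layout of controller.log is not shown in this report, so the code simply returns every line that mentions one of the two admin commands named earlier, and makes no attempt to parse timestamps. The function name is illustrative.

```python
from typing import Iterable, List, Sequence

# Admin commands named in this report.
ADMIN_COMMANDS = ("disable-converged-lb-server", "enable-converged-lb-server")


def find_admin_steps(lines: Iterable[str],
                     commands: Sequence[str] = ADMIN_COMMANDS) -> List[str]:
    """Return the lines that mention any of the given admin commands."""
    return [line.rstrip("\n") for line in lines
            if any(command in line for command in commands)]


# Usage, assuming a local copy of the log file:
#     with open("controller.log") as log:
#         for hit in find_admin_steps(log):
#             print(hit)
```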

      1. presence-stats_instance10_24x1.txt
        554 kB
        Bhavanishankar
      2. presence-stats.txt
        304 kB
        Bhavanishankar
      3. sipp_screen_log_24x1.log
        7 kB
        Bhavanishankar
      4. subscribe_refresh.log.txt
        17 kB
        Bhavanishankar

          People

          • Assignee:
            Bhavanishankar
            Reporter:
            varunrupela