SAILFIN-1821

SSR drops some dialog fragment saves

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.0
    • Fix Version/s: milestone 1
    • Component/s: session_replication
    • Labels:
      None
    • Environment:

      Operating System: All
      Platform: All

    • Issuezilla Id:
      1,821

      Description

      I am testing a 3 instance cluster with this model:

      start cluster
      start 100 sip sessions, pause sipp
      kill instance 1
      let all 100 sip sessions refresh, pause sipp
      restart instance 1
      let all 100 sip sessions refresh, pause sipp
      kill instance 1

      At this point, I notice that the DF expat list has fewer entries than the SAS
      or SS expat lists, and the reason always turns out to be that the replica
      caches on the surviving instances are out of sync: for example, instance 3's
      replica cache might have a SAS for 37-1809 but no DF for 37-1809.
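
      The check described above can be sketched as a set difference over the ids
      held in one replica cache. This is only an illustration: the function name
      and the sample ids other than 37-1809 are hypothetical, and it does not use
      any actual Sailfin API for reading the cache.

```python
def missing_fragments(sas_ids, df_ids):
    """Return ids that have a replicated SAS but no replicated DF."""
    return sorted(set(sas_ids) - set(df_ids))


# Hypothetical contents of instance 3's replica cache after the test run.
replica_sas = {"37-1809", "37-1810", "37-1811"}
replica_df = {"37-1810", "37-1811"}

# Ids listed here are the ones whose DF was never saved to this replica.
out_of_sync = missing_fragments(replica_sas, replica_df)
print(out_of_sync)
```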

      Here's what occurs: 37-1809 is begun on instance 2 and replicated to instance 1
      – i.e. the instance that fails (if it replicated to a surviving instance, we
      never have a problem with the replication). Then when 37-1809 is refreshed on
      instance 2, it will save the SAS – but not the DF – to instance 3 (since
      instance 1 is down).

      But this occurs only rarely; most of the time both the SAS and DF are replicated
      correctly to instance 3.

      It seems that the DF itself is never asked to replicate. I see the
      ReplicationState for the savesas and savedialogfragment commands being created
      for everything that does get replicated, but no savedialogfragment is created
      for the request in question. So the message is apparently not getting lost
      along the way; rather, it is never created in the first place. [Plus, if this
      were an issue at a lower level, we'd see dropped messages for savesas or other
      commands; we only ever see this for the DF.]

        Activity

        sonymanuel added a comment -

        From the description, this problem seems similar to the previously fixed
        issue 1500.

        Scott Oaks added a comment -

        When a refresh comes in, it updates the expiration, but it can happen (due to
        code in the servlet) that refreshes 90 seconds apart specify the same
        expiration (which is minute-based) because of the way the servlet rounds. And
        of course, 90 seconds is what I was testing...

        Hence, the DF is not actually changing in this test scenario and doesn't need to
        be replicated, so it isn't – leading to the difference in the expat lists.
        Because the DF is available in the active cache, we have no actual errors.
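
        The behaviour can be pictured with a small change-tracking sketch. It is not
        the Sailfin implementation: the class, the dirty flag, and the helper below
        are hypothetical names, and the only assumption is that a save command is
        created only when the fragment's state actually changed.

```python
class DialogFragment:
    """Hypothetical stand-in for a DF whose only state is its expiration."""

    def __init__(self, session_id, expiration_minute):
        self.session_id = session_id
        self.expiration_minute = expiration_minute
        self.dirty = True  # a new fragment always needs its first save

    def set_expiration(self, expiration_minute):
        # Mark the fragment dirty only when the value really changes.
        if expiration_minute != self.expiration_minute:
            self.expiration_minute = expiration_minute
            self.dirty = True


def replicate_if_dirty(fragment, sent_commands):
    """Append a save command only for a changed fragment; report if one was sent."""
    if not fragment.dirty:
        return False
    sent_commands.append(("savedialogfragment", fragment.session_id))
    fragment.dirty = False
    return True


sent = []
df = DialogFragment("37-1809", expiration_minute=10)
replicate_if_dirty(df, sent)                 # initial save goes out
df.set_expiration(10)                        # refresh yields the same minute
skipped = not replicate_if_dirty(df, sent)   # nothing changed, nothing is sent
df.set_expiration(12)                        # a later refresh moves the expiration
replicate_if_dirty(df, sent)                 # this one is replicated
```

        In this sketch the second refresh produces no savedialogfragment entry, which
        is the same pattern seen in the test: a replica that only came into play
        after the first save never receives the unchanged DF.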

        Because we replicate lazily, there is a chance that multiple failures will lead
        to an error if they occur in just the right order. That is also true for similar
        tests that use modified_attribute semantics. However, the accepted errors for
        the product include those possibilities.


          People

          • Assignee:
            Scott Oaks
            Reporter:
            Scott Oaks
          • Votes:
            0
            Watchers:
            0
