glassfish / GLASSFISH-15252

Incorrect request to resend a GMS broadcast notification when an instance transitions from being master to not being master

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.1_ms06
    • Fix Version/s: 3.1
    • Labels:
      None

      Description

      The failure impacts the GMS QE tests that stop or kill the DAS: scenarios 8, 10 and 11.
      The issue occurs in every run; it is not intermittent.

      The complete server log is attached, but the following log events capture the issue.
      They appear in every run of scenarios 8, 10 and 11.

      The following messages occur at the end of the test, when the DAS is restarted so that "asadmin stop-cluster" can be run.
      When "stop-cluster" is run, the DAS takes over GroupLeadership again (otherwise there would be
      cascading group leadership changes during the entire shutdown process). The instance that had been the master
      incorrectly requested resends of messages that it had itself sent out to the group while it was the master.

      These log events show the resend requests for the supposedly missed events.

      [#|2010-12-15T01:24:48.382-0800|INFO|glassfish3.1|ShoalLogger|_ThreadID=16;_ThreadName=Thread-1;|GMS1093: adding GroupLeadershipNotification signal leadermember: server of group: clusterz1|#]

      [#|2010-12-15T01:24:48.384-0800|INFO|glassfish3.1|ShoalLogger|_ThreadID=16;_ThreadName=Thread-1;|GMS1057: Announcing Master Node designation for member: server of group: clusterz1. Local view contains 10 entries|#]

      [#|2010-12-15T01:24:49.217-0800|INFO|glassfish3.1|ShoalLogger.mcast|_ThreadID=16;_ThreadName=Thread-1;|GMS1112: unable to find message to resend broadcast event with masterViewId: 21 to member: n1c1m1 of group: clusterz1|#]

      [#|2010-12-15T01:24:49.218-0800|INFO|glassfish3.1|ShoalLogger.mcast|_ThreadID=16;_ThreadName=Thread-1;|GMS1112: unable to find message to resend broadcast event with masterViewId: 22 to member: n1c1m1 of group: clusterz1|#]

      [#|2010-12-15T01:24:49.218-0800|INFO|glassfish3.1|ShoalLogger.mcast|_ThreadID=16;_ThreadName=Thread-1;|GMS1112: unable to find message to resend broadcast event with masterViewId: 23 to member: n1c1m1 of group: clusterz1|#]

      [#|2010-12-15T01:24:49.219-0800|INFO|glassfish3.1|ShoalLogger.mcast|_ThreadID=16;_ThreadName=Thread-1;|GMS1112: unable to find message to resend broadcast event with masterViewId: 24 to member: n1c1m1 of group: clusterz1|#]

      [#|2010-12-15T01:24:49.231-0800|INFO|glassfish3.1|ShoalLogger.mcast|_ThreadID=16;_ThreadName=Thread-1;|GMS1111: resend broadcast event with masterViewId: 25 to member: n1c1m1 of group: clusterz1 resends=1 broadcast seq id:25 viewChangeEvent:MASTER_CHANGE_EVENT member:server peerId:10.133.184.226:9116:228.9.32.97:5229:clusterz1:server|#]

      [#|2010-12-15T01:24:50.394-0800|INFO|glassfish3.1|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin.cluster|_ThreadID=16;_ThreadName=Thread-1;|Stopping cluster clusterz1|#]
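
      When checking whether a run still shows the problem, the GMS1112 records above can be
      pulled out of a server log mechanically. The following is a minimal Java sketch and is not
      part of Shoal or GlassFish: the class name GmsResendLogScan and the method
      failedResendViewIds are hypothetical, and the pattern assumes that each log record sits on
      one line and that the message text is exactly as in the excerpt above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not a Shoal or GlassFish class.
final class GmsResendLogScan {

    // Assumes the GMS1112 message wording shown in the log excerpt of this issue.
    private static final Pattern FAILED_RESEND =
            Pattern.compile("GMS1112: .*?masterViewId: (\\d+)");

    private GmsResendLogScan() {
    }

    /** Returns the masterViewId of every GMS1112 record, in log order. */
    static List<Integer> failedResendViewIds(List<String> logLines) {
        List<Integer> ids = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = FAILED_RESEND.matcher(line);
            if (m.find()) {
                ids.add(Integer.parseInt(m.group(1)));
            }
        }
        return ids;
    }
}
```

      Applied to the excerpt above, the method would return the ids 21 through 24. The GMS1111
      record for masterViewId 25 reports a resend that was carried out, so it is not counted.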

      The fix is already known and is quite minor.

        Activity

        Joe Fialli added a comment -

        How bad is its impact? (Severity)

        The impact of this issue is unnecessary network traffic
        when the DAS was not the Master and stop-cluster is called.
        There are then 3 to 4 log events reporting that an instance requested
        a resend of missed MasterChangeEvents that were not really missed.

        *******
        How often does it happen? Will many users see this problem? (Frequency)

        For this issue, it happens at stop-cluster time.
        Based on the fix that is already known, there is a potential for
        issues whenever the GroupLeader of the cluster changes because
        the current one is stopped or killed.

        ******

        How much effort is required to fix it? (Cost)
        Fix is already done. No further cost.

        ******

        What is the risk of fixing it and how will the risk be mitigated? (Risk)
        It is riskier not to fix this issue, since the fix is so straightforward.
        There was an obvious bug in the code (no idea why): a boolean return value was
        obviously incorrect.

        Chris Kasso added a comment -

        Approved for 3.1

        Joe Fialli added a comment -

        This was fixed; I overlooked closing the issue. Verified that the reported message is no longer generated in any
        Shoal GlassFish QE test runs.


          People

          • Assignee:
            Joe Fialli
            Reporter:
            Joe Fialli
          • Votes:
            0
            Watchers:
            0

            Dates

            • Created:
              Updated:
              Resolved: