SHOAL-81

Propagate sender's HM.Entry seqid in sent HealthMessage

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: current
    • Fix Version/s: 1.1
    • Component/s: GMS
    • Labels:
      None
    • Environment:

      Operating System: All
      Platform: All

    • Issuezilla Id:
      81

      Description

      Requires changes in HealthMessage.initialize() and HealthMessage.getDocument().

      HealthMessage.getDocument() should write HealthMessage.Entry.sequenceId into
      its XML document representation.
      HealthMessage.initialize() should read the sender's sequence id for the
      HealthMessage.Entry from that XML document representation.
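
A minimal sketch of the round trip described above, assuming a simplified
entry type. EntrySketch, the element names "Entry", "state" and "sequenceId",
and the use of the JDK DOM API are all illustrative assumptions; this is not
Shoal's actual HealthMessage code or document format.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical stand-in for HealthMessage.Entry (not the real Shoal class).
final class EntrySketch {
    final String state;
    final long sequenceId; // assigned by the sender, not by the receiver

    EntrySketch(String state, long sequenceId) {
        this.state = state;
        this.sequenceId = sequenceId;
    }

    // Analogue of getDocument(): write the sender's sequence id into the XML.
    String toXml() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element entry = doc.createElement("Entry"); // assumed element name
            doc.appendChild(entry);
            Element st = doc.createElement("state");
            st.setTextContent(state);
            entry.appendChild(st);
            Element seq = doc.createElement("sequenceId");
            seq.setTextContent(Long.toString(sequenceId));
            entry.appendChild(seq);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize entry", e);
        }
    }

    // Analogue of initialize(): read the sender's sequence id back from the XML
    // instead of generating a new one on the receiving side.
    static EntrySketch fromXml(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            String state = root.getElementsByTagName("state")
                    .item(0).getTextContent();
            long seq = Long.parseLong(root.getElementsByTagName("sequenceId")
                    .item(0).getTextContent().trim());
            return new EntrySketch(state, seq);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse entry", e);
        }
    }
}
```

With this shape, EntrySketch.fromXml(entry.toXml()) yields an entry carrying
the same sequence id that the sender assigned.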

      Currently, the receiving side just creates a sequence id based on the
      order in which messages are received. The JXTA messaging protocol does not
      guarantee that messages are received in the exact order in which they were
      sent, so the current sequencing mechanism can result in out-of-order
      processing of health messages. This can leave the master node with an
      incorrectly computed cache state for an instance.

        Activity

        Joe Fialli created issue -
        Joe Fialli added a comment -

        Created an attachment (id=12)
        server log summarizing out of order message processing

        Joe Fialli added a comment -

        https://shoal.dev.java.net/nonav/issues/showattachment.cgi/12/unexpectedfailure.log

        The attachment above summarizes a failure caused by this defect.
        The instance sends its messages in the following order:
        aliveandready
        clusterstopping
        stopping

        The DAS (master node) receives the messages in the following order:
        stopping (receiving side seqid 960)
        clusterstopping (receiving side seqid 961)
        aliveandready (receiving side seqid 963)

        The DAS processes the messages in the following order:
        clusterstopping (961)
        stopping (960)
        aliveandready (963)

        Because the aliveandready message is processed last, a stopped instance
        appears to come back to life as far as the master is concerned.
        The master then marks it INDOUBT and subsequently verifies it as FAILED.
        The ordering issue must be corrected to fix this.
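
The scenario above can be replayed in a small sketch, assuming each message
carries a sender-assigned sequence id. ReorderSketch, Msg and the sender-side
ids 1 to 3 are hypothetical; the log only records the receiving-side ids.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ReorderSketch {
    // Hypothetical message: state name plus the sequence id set by the sender.
    static final class Msg {
        final String state;
        final long senderSeqId;

        Msg(String state, long senderSeqId) {
            this.state = state;
            this.senderSeqId = senderSeqId;
        }
    }

    // Returns the states in the order the sender emitted them,
    // regardless of the order in which they arrived.
    static List<String> inSendOrder(List<Msg> arrived) {
        List<Msg> sorted = new ArrayList<>(arrived);
        sorted.sort(Comparator.comparingLong((Msg m) -> m.senderSeqId));
        List<String> states = new ArrayList<>();
        for (Msg m : sorted) {
            states.add(m.state);
        }
        return states;
    }
}
```

Feeding in the arrival order from the log (stopping, clusterstopping,
aliveandready) with assumed sender ids 3, 2 and 1 puts stopping last, so the
instance no longer appears to come back to life.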

        Joe Fialli added a comment -

        Fix delivered. The sender's sequence id is now propagated.

        Also, the member's start time is now used together with the sequence id
        to order messages across one invocation of a server instance and a
        restarted invocation.
        (The node agent can restart a failed instance quickly, so this can happen.)
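
One way to picture ordering by start time and sequence id is a two-key
comparison. This is a sketch under assumptions: HealthOrderSketch, Stamp and
the field names are hypothetical, and the sequence id is assumed to restart
from a low value when an instance restarts. The delivered fix may be
structured differently.

```java
import java.util.Comparator;

final class HealthOrderSketch {
    // Hypothetical ordering key carried with each health message.
    static final class Stamp {
        final long memberStartTime; // start time of the sending member (assumed epoch millis)
        final long senderSeqId;     // sequence id assigned by the sender

        Stamp(long memberStartTime, long senderSeqId) {
            this.memberStartTime = memberStartTime;
            this.senderSeqId = senderSeqId;
        }
    }

    // Compare by start time first, so any message from a restarted invocation
    // sorts after every message from the earlier invocation; within one
    // invocation, fall back to the sender's sequence id.
    static final Comparator<Stamp> ORDER =
            Comparator.comparingLong((Stamp s) -> s.memberStartTime)
                      .thenComparingLong(s -> s.senderSeqId);

    // True if the candidate should replace the currently cached state.
    static boolean isNewer(Stamp candidate, Stamp current) {
        return ORDER.compare(candidate, current) > 0;
    }
}
```

Under these assumptions, a message with sequence id 1 from the restarted
invocation is still treated as newer than a message with sequence id 500 from
the invocation that failed.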

        kenaiadmin made changes -
        Field                          Original Value  New Value
        issue.field.bugzillaimportkey  81              22009

          People

          • Assignee:
            Joe Fialli
          • Reporter:
            Joe Fialli
          • Votes:
            0
          • Watchers:
            0

            Dates

            • Created:
              Updated:
              Resolved: