shoal / SHOAL-107

MasterNode should ensure delivery of GMS notifications over UDP

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: current
    • Fix Version/s: 1.1
    • Component/s: GMS
    • Labels:
      None
    • Environment:

      Operating System: All
      Platform: All

    • Issuezilla Id:
      107

      Description

      GMS membership notifications such as JOIN, JOIN_AND_READY, FAILURE,
      PLANNED_SHUTDOWN, FAILURE_SUSPECTED, and GroupLeadership are broadcast from the
      MasterNode over UDP. A protocol will be developed between the master and the other
      members of the group to ensure that each event is delivered. If it is not, the master
      will resend the event to ALIVE instances that have not acknowledged the notification.

      Currently, dropped UDP messages have been avoided by keeping heavy application
      load off the MasterNode (for example, the Domain Application Server in GlassFish,
      which does not run apps) and by tuning the OS UDP buffers. Addressing this issue
      will make event delivery robust without requiring OS tuning or separating
      application load from the Shoal GMS MasterNode.

        Activity

        Joe Fialli added a comment -

        Fix checked in.

        The master sends its latest MasterViewID along with every heartbeat message it
        broadcasts. Each GMS group member records the MasterViewID of every
        MasterChangeEvent it has received. When a member detects that it has not received
        a specific MasterViewID, it asks the master to resend that event to just itself
        (via the more reliable TCP).
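
The member-side gap detection described above can be sketched roughly as follows. This is an illustration only, not the code that was checked in: the class name ViewGapDetector and its methods are hypothetical, and the sketch assumes that MasterViewIDs are consecutive integers starting at 1, which the comment does not state. The list returned by missingUpTo is what a member would ask the master to resend over TCP.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical member-side tracker of received master view IDs (not a Shoal class). */
class ViewGapDetector {
    // View IDs received out of order, above the contiguous prefix.
    private final Set<Long> received = new HashSet<>();
    // Every view ID <= this value has been received (assumption: IDs start at 1).
    private long contiguousUpTo = 0;

    /** Record the view ID carried by a master change event this member received. */
    public synchronized void recordReceived(long viewId) {
        if (viewId > contiguousUpTo) {
            received.add(viewId);
        }
        // Advance the contiguous prefix as far as the received IDs allow.
        while (received.remove(contiguousUpTo + 1)) {
            contiguousUpTo++;
        }
    }

    /**
     * Called with the latest view ID advertised in a heartbeat.
     * Returns the view IDs this member has not seen and should request again.
     */
    public synchronized List<Long> missingUpTo(long latestMasterViewId) {
        List<Long> missing = new ArrayList<>();
        for (long id = contiguousUpTo + 1; id <= latestMasterViewId; id++) {
            if (!received.contains(id)) {
                missing.add(id);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        ViewGapDetector detector = new ViewGapDetector();
        detector.recordReceived(1);
        detector.recordReceived(2);
        detector.recordReceived(4);              // event 3 was dropped over UDP
        // Heartbeat advertises view 5: events 3 and 5 are missing.
        System.out.println(detector.missingUpTo(5)); // prints [3, 5]
    }
}
```

Because the member initiates the resend request, the master does not need to track per-member acknowledgements in this sketch; it only needs to be able to replay an event for a given view ID.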

        Tested this with simulated failure injection.
        Wrote a ReliableMulticast JUnit test.


          People

          • Assignee:
            Joe Fialli
            Reporter:
            Joe Fialli
          • Votes:
            0
            Watchers:
            0

            Dates

            • Created:
              Updated:
              Resolved: