shoal / SHOAL-83

When the group leader fails, no member receives the FailureRecovery notification

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: current
    • Fix Version/s: 1.1
    • Component/s: GMS
    • Labels:
      None
    • Environment:

      Operating System: All
      Platform: Windows

    • Issuezilla Id:
      83

      Description

      When the group leader fails, no member receives the FailureRecovery
      notification, even though every member has registered
      FailureRecoveryActionFactoryImpl and its callbacks with GMS.
      If the failed member is not the group leader, the other members receive the
      FailureRecovery notification as expected.

      Here are two logs.
      --------------------
      Case 1) The failed member is the group leader.

      2008. 11. 12 PM 9:43:28 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
      getMemberTokens
      INFO: GMS View Change Received for group DemoGroup : Members in view for
      (before change analysis) are :
      1: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE, Address:
      urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03

      2008. 11. 12 PM 9:43:28 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
      newViewObserved
      INFO: Analyzing new membership snapshot received as part of event :
      MASTER_CHANGE_EVENT
      2008. 11. 12 PM 9:43:28 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
      getMemberTokens
      INFO: GMS View Change Received for group DemoGroup : Members in view for
      (before change analysis) are :
      1: MemberId: b6663a51-9b79-43e2-92dd-41899c907383, MemberType: CORE, Address:
      urn:jxta:uuid-59616261646162614A787461503250331DA08A66D0554F138E75E74AA363FC9E03
      2: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE, Address:
      urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03

      2008. 11. 12 PM 9:43:28 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
      newViewObserved
      INFO: Analyzing new membership snapshot received as part of event :
      MASTER_CHANGE_EVENT
      2008. 11. 12 PM 9:43:28 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
      getMemberTokens
      INFO: GMS View Change Received for group DemoGroup : Members in view for
      (before change analysis) are :
      1: MemberId: b6663a51-9b79-43e2-92dd-41899c907383, MemberType: CORE, Address:
      urn:jxta:uuid-59616261646162614A787461503250331DA08A66D0554F138E75E74AA363FC9E03
      2: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE, Address:
      urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03

      2008. 11. 12 PM 9:43:28 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
      newViewObserved
      INFO: Analyzing new membership snapshot received as part of event : ADD_EVENT
      2008. 11. 12 PM 9:43:53 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
      getMemberTokens
      INFO: GMS View Change Received for group DemoGroup : Members in view for
      (before change analysis) are :
      1: MemberId: b6663a51-9b79-43e2-92dd-41899c907383, MemberType: CORE, Address:
      urn:jxta:uuid-59616261646162614A787461503250331DA08A66D0554F138E75E74AA363FC9E03
      2: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE, Address:
      urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03

      2008. 11. 12 PM 9:43:53 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
      newViewObserved
      INFO: Analyzing new membership snapshot received as part of event :
      IN_DOUBT_EVENT
      2008. 11. 12 PM 9:43:53 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
      addInDoubtMemberSignals
      INFO: gms.failureSuspectedEventReceived
      2008. 11. 12 PM 9:43:53 com.sun.enterprise.ee.cms.impl.common.Router
      notifyFailureSuspectedAction
      INFO: Sending FailureSuspectedSignals to registered Actions.
      Member:b6663a51-9b79-43e2-92dd-41899c907383...
      2008. 11. 12 PM 9:43:57 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
      getMemberTokens
      INFO: GMS View Change Received for group DemoGroup : Members in view for
      (before change analysis) are :
      1: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE, Address:
      urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03

      2008. 11. 12 PM 9:43:57 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
      newViewObserved
      INFO: Analyzing new membership snapshot received as part of event :
      MASTER_CHANGE_EVENT
      2008. 11. 12 PM 9:43:57 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
      getMemberTokens
      INFO: GMS View Change Received for group DemoGroup : Members in view for
      (before change analysis) are :
      1: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE, Address:
      urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03

      2008. 11. 12 PM 9:43:57 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
      newViewObserved
      INFO: Analyzing new membership snapshot received as part of event :
      FAILURE_EVENT
      2008. 11. 12 PM 9:43:57 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
      addFailureSignals
      INFO: The following member has failed: b6663a51-9b79-43e2-92dd-41899c907383

      Case 2) The failed member is not the group leader.

      2008. 11. 12 PM 9:40:03 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
      getMemberTokens
      INFO: GMS View Change Received for group DemoGroup : Members in view for
      (before change analysis) are :
      1: MemberId: 96438e75-740c-4613-af8d-6b2ab8ea4727, MemberType: CORE, Address:
      urn:jxta:uuid-59616261646162614A78746150325033376CC0C6DAB74C2BA6FAF9C6648D77BC03

      2008. 11. 12 PM 9:40:03 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
      newViewObserved
      INFO: Analyzing new membership snapshot received as part of event :
      MASTER_CHANGE_EVENT
      2008. 11. 12 PM 9:40:14 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
      getMemberTokens
      INFO: GMS View Change Received for group DemoGroup : Members in view for
      (before change analysis) are :
      1: MemberId: 96438e75-740c-4613-af8d-6b2ab8ea4727, MemberType: CORE, Address:
      urn:jxta:uuid-59616261646162614A78746150325033376CC0C6DAB74C2BA6FAF9C6648D77BC03
      2: MemberId: b77af0d3-581c-4392-89cf-6a06d736c90f, MemberType: CORE, Address:
      urn:jxta:uuid-59616261646162614A78746150325033EBEBAC9321A742D0B319D3F89446E0B103

      2008. 11. 12 PM 9:40:14 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
      newViewObserved
      INFO: Analyzing new membership snapshot received as part of event : ADD_EVENT
      2008. 11. 12 PM 9:40:43 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
      getMemberTokens
      INFO: GMS View Change Received for group DemoGroup : Members in view for
      (before change analysis) are :
      1: MemberId: 96438e75-740c-4613-af8d-6b2ab8ea4727, MemberType: CORE, Address:
      urn:jxta:uuid-59616261646162614A78746150325033376CC0C6DAB74C2BA6FAF9C6648D77BC03
      2: MemberId: b77af0d3-581c-4392-89cf-6a06d736c90f, MemberType: CORE, Address:
      urn:jxta:uuid-59616261646162614A78746150325033EBEBAC9321A742D0B319D3F89446E0B103

      2008. 11. 12 PM 9:40:49 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
      newViewObserved
      INFO: Analyzing new membership snapshot received as part of event :
      IN_DOUBT_EVENT
      2008. 11. 12 PM 9:41:07 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
      addInDoubtMemberSignals
      INFO: gms.failureSuspectedEventReceived
      2008. 11. 12 PM 9:41:12 com.sun.enterprise.ee.cms.impl.common.Router
      notifyFailureSuspectedAction
      INFO: Sending FailureSuspectedSignals to registered Actions.
      Member:b77af0d3-581c-4392-89cf-6a06d736c90f...
      2008. 11. 12 PM 9:41:29 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
      getMemberTokens
      INFO: GMS View Change Received for group DemoGroup : Members in view for
      (before change analysis) are :
      1: MemberId: 96438e75-740c-4613-af8d-6b2ab8ea4727, MemberType: CORE, Address:
      urn:jxta:uuid-59616261646162614A78746150325033376CC0C6DAB74C2BA6FAF9C6648D77BC03

      2008. 11. 12 PM 9:41:41 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
      newViewObserved
      INFO: Analyzing new membership snapshot received as part of event :
      FAILURE_EVENT
      2008. 11. 12 PM 9:41:42 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
      addFailureSignals
      INFO: The following member has failed: b77af0d3-581c-4392-89cf-6a06d736c90f
      2008. 11. 12 PM 9:42:19
      com.sun.enterprise.ee.cms.impl.common.RecoveryTargetSelector
      setRecoverySelectionState
      INFO: Appointed Recovery Server:96438e75-740c-4613-af8d-6b2ab8ea4727:for failed
      member:b77af0d3-581c-4392-89cf-6a06d736c90f:for group:DemoGroup
      2008. 11. 12 PM 9:42:19 com.sun.enterprise.ee.cms.impl.common.Router
      notifyFailureRecoveryAction
      INFO: Sending FailureRecoveryNotification to component service
      --------------------

      In case 1 (the abnormal case):
      group leader failed -> IN_DOUBT_EVENT -> MASTER_CHANGE_EVENT (because a new
      master was selected) -> FAILURE_EVENT

      In case 2 (the normal case):
      member failed -> IN_DOUBT_EVENT -> FAILURE_EVENT

      For the FailureRecovery notification to be delivered, a recovery target must
      first be resolved. The selection algorithm for the recovery target uses the
      previous membership view.

      Assume that "A" and "B" are members of the same group and "A" is the group
      leader.

      [case 1: "B"'s view history]
      ... --> (A, B) --> A failed -> B became the new master with a master change
      event -> (B)[previous view] -> failure event -> (B)[current view]

      [case 2: "A"'s view history]
      ... --> (A, B)[previous view] --> B failed -> failure event -> (A)[current view]

      In other words, the previous view in case 1 does not contain "A" (the failed
      member), so the default algorithm (SimpleSelectionAlgorithm) cannot find a
      proper recovery target. The previous view in case 2 contains "B" (the failed
      member), so the default algorithm can select "A" as the recovery target.
      (I assume that you already know SimpleSelectionAlgorithm.)
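
      The difference between the two cases can be sketched in a few lines of Java.
      This is only an illustration under stated assumptions, not Shoal's actual
      SimpleSelectionAlgorithm: the class name, the method selectFromPreviousView,
      and the "pick the member after the failed one" rule are hypothetical
      stand-ins for any rule that has to locate the failed member in the previous
      view before it can choose a target.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; none of these names come from the Shoal code base.
final class PreviousViewSelectionSketch {

    // Assumed rule: locate the failed member in the previous view and pick
    // the next member in that ordered view as the recovery target.
    static Optional<String> selectFromPreviousView(List<String> previousView,
                                                   String failedMember) {
        int failedIndex = previousView.indexOf(failedMember);
        if (failedIndex < 0 || previousView.size() < 2) {
            // The failed member is not in the previous view (case 1),
            // or nobody else was in it: no target can be resolved.
            return Optional.empty();
        }
        int targetIndex = (failedIndex + 1) % previousView.size();
        return Optional.of(previousView.get(targetIndex));
    }

    public static void main(String[] args) {
        // Case 1: the master change already replaced (A, B) with (B)
        // before the failure event for "A" is analyzed.
        System.out.println(selectFromPreviousView(List.of("B"), "A"));
        // prints "Optional.empty"

        // Case 2: the previous view is still (A, B) when "B" fails.
        System.out.println(selectFromPreviousView(List.of("A", "B"), "B"));
        // prints "Optional[A]"
    }
}
```

      With such a rule, the extra view produced by MASTER_CHANGE_EVENT is enough
      to drop the failed leader from the previous view, and no recovery target is
      appointed.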

      So I think this issue lies in the selection algorithm for the recovery
      target.

      One way to resolve it would be a different simple algorithm that does not
      depend on the previous view,
      e.g. always selecting the first core member in the live cache.
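
      A rough sketch of that suggestion follows. It is an assumption-laden
      illustration and not the fix that was integrated: the live cache is modeled
      as a plain Map from member id to member type, the local MemberType enum and
      the method selectFirstLiveCore are hypothetical, and "first" is taken to mean
      the smallest member id so that every surviving member would pick the same
      target.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; the live cache is modeled as a simple map.
final class FirstLiveCoreMemberSketch {

    // Local stand-in for the member types that appear in the logs.
    enum MemberType { CORE, SPECTATOR }

    // Pick the first CORE member that is currently alive. Sorting by member
    // id is an assumption made here so the choice is deterministic on
    // every member of the group.
    static Optional<String> selectFirstLiveCore(Map<String, MemberType> liveCache) {
        return liveCache.entrySet().stream()
                .filter(entry -> entry.getValue() == MemberType.CORE)
                .map(Map.Entry::getKey)
                .sorted()
                .findFirst();
    }

    public static void main(String[] args) {
        // Case 1 after the leader "A" has failed: only "B" is still alive,
        // so "B" is selected without consulting the previous view.
        Map<String, MemberType> liveCache = Map.of("B", MemberType.CORE);
        System.out.println(selectFirstLiveCore(liveCache));
        // prints "Optional[B]"
    }
}
```

      Because the choice depends only on who is alive now, the extra view created
      by MASTER_CHANGE_EVENT no longer affects the result.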

        Activity

        shreedhar_ganapathy added a comment -

        ..

        Joe Fialli added a comment -

        Shoal test scenario 14 verifies that the fix for this has been integrated.


          People

          • Assignee:
            Joe Fialli
            Reporter:
            carryel
          • Votes:
            0
            Watchers:
            0
