[SHOAL-112] ability to configure GMS member to use SSL Created: 12/Nov/10  Updated: 31/Oct/12

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Improvement Priority: Critical
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 112

 Description   

Provide a GMS property that allows a GMS member to be configured to use SSL for
its TCP communications. Both supported transports, Grizzly and JXTA, can enable
SSL for point-to-point communication.
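
A minimal usage sketch of how such a property could be supplied at join time. The property key SSL_ENABLED is hypothetical (this improvement does not define a key yet); GMSFactory.startGMSModule and join() are the usual Shoal entry points.

    import java.util.Properties;
    import com.sun.enterprise.ee.cms.core.GMSFactory;
    import com.sun.enterprise.ee.cms.core.GroupManagementService;

    public class SslMemberSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            // Hypothetical key; the real name would be defined by this RFE.
            props.put("SSL_ENABLED", "true");

            GroupManagementService gms = (GroupManagementService) GMSFactory.startGMSModule(
                    "instance1", "TestGroup", GroupManagementService.MemberType.CORE, props);
            gms.join();
        }
    }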






[SHOAL-118] regression in cluster starting time. Created: 07/Jan/12  Updated: 09/Jan/12  Resolved: 09/Jan/12

Status: Closed
Project: shoal
Component/s: GMS
Affects Version/s: None
Fix Version/s: current

Type: Bug Priority: Major
Reporter: zorro Assignee: Joe Fialli
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

linux



 Description   

build 16.

Cluster startup takes longer than expected intermittently.

Results:
http://aras2.us.oracle.com:8080/logs/gf31/gms//set_01_05_12_t_17_12_28/scenario_0002_Thu_Jan__5_17_21_17_PST_2012.html



 Comments   
Comment by Joe Fialli [ 09/Jan/12 ]

Fix committed in svn 1736.

Comment by Joe Fialli [ 09/Jan/12 ]

A patch run of the GlassFish Shoal GMS SQE tests confirmed this fix in the shoal-gms libraries.





[SHOAL-117] Support multiple shoal instances in a single JVM Created: 25/Nov/11  Updated: 25/Nov/11

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major
Reporter: Arul Dhesiaseelan Assignee: shreedhar_ganapathy
Resolution: Unresolved Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

We have a requirement to run multiple Shoal instances per JVM. We believe Shoal does not support this because the GMSContext is assigned per group, not per server. We have implemented support for a GMSContext per server within the same group, allowing multiple contexts to coexist in the same JVM. We would be happy to contribute this patch to the Shoal project. Would anyone be interested in it?
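
A minimal sketch of the idea (names are illustrative, not Shoal's actual internals): keying the context registry by both group and server token, instead of by group alone, lets several members of the same group coexist in one JVM.

    // Hypothetical composite key for a per-(group, server) context registry.
    final class ContextKey {
        private final String groupName;
        private final String serverToken;

        ContextKey(String groupName, String serverToken) {
            this.groupName = groupName;
            this.serverToken = serverToken;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof ContextKey)) return false;
            ContextKey k = (ContextKey) o;
            return groupName.equals(k.groupName) && serverToken.equals(k.serverToken);
        }

        @Override
        public int hashCode() {
            return 31 * groupName.hashCode() + serverToken.hashCode();
        }
    }

    // e.g. Map<ContextKey, GMSContext> contexts = new ConcurrentHashMap<ContextKey, GMSContext>();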






[SHOAL-89] Improved concurrency for sendMessage Created: 18/Jun/09  Updated: 09/Nov/11

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Improvement Priority: Major
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 89

 Description   

RFE, related to the fix for Shoal issue 88, to change that synchronization
solution to a more performant pool of OutputPipes (one pipe used by one thread
at any point in time).
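
A minimal sketch of the proposed pooling approach, assuming JXTA's net.jxta.pipe.OutputPipe; pipe creation is elided and this class is illustrative, not the actual Shoal implementation. Each sender borrows a pipe, sends, and returns it, so no two threads ever share a pipe at the same time.

    import java.util.Collection;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import net.jxta.pipe.OutputPipe;

    final class OutputPipePool {
        private final BlockingQueue<OutputPipe> pool;

        OutputPipePool(Collection<OutputPipe> pipes) {
            this.pool = new ArrayBlockingQueue<OutputPipe>(pipes.size(), false, pipes);
        }

        OutputPipe borrow() throws InterruptedException {
            return pool.take();   // blocks until a pipe is free
        }

        void release(OutputPipe pipe) {
            pool.offer(pipe);     // hand the pipe back for the next sender
        }
    }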



 Comments   
Comment by shreedhar_ganapathy [ 09/Nov/11 ]

Transferring to Joe for eval and closure.

Comment by Joe Fialli [ 09/Nov/11 ]

There are trade-offs to a concurrent sendMessage when relying on NIO as the
underlying transport, so this RFE was considered and postponed because of those
trade-offs.

Concurrent processing meant the same deserialized output stream could not be
shared across all send messages; each concurrent send therefore needs additional
space and/or its own deserialization.

With regular multicast, only one deserialization of the message to be sent
occurred, and with the current implementation there is still only one.

With concurrent send the trade-offs were not so obvious, so this RFE is on hold
for now while the trade-offs are sorted through.





[SHOAL-113] Dangling Threads Prevent Graceful JVM Shutdown Created: 15/Dec/10  Updated: 09/Nov/11  Resolved: 09/Nov/11

Status: Closed
Project: shoal
Component/s: GMS
Affects Version/s: 1.1
Fix Version/s: None

Type: Bug Priority: Major
Reporter: erich_liebmann Assignee: Joe Fialli
Resolution: Fixed Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

We are having trouble shutting the JVM down gracefully because Shoal does not terminate all of its non-daemon threads on shutdown.

We initiate Shoal shutdown via the following API call, but it does not terminate the GMSContext "viewWindowThread" and "messageWindowThread" threads or the Router "signalHandlerThread" thread (all non-daemon).

haGroupManagementService.shutdown(shutdownType.INSTANCE_SHUTDOWN);

To work around this problem we had to implement the following hack:

haGroupManagementService.shutdown(shutdownType.INSTANCE_SHUTDOWN);
DirectFieldAccessor gmsContextDirectFieldAccessor = new DirectFieldAccessor(gmsContext);
gmsContextDirectFieldAccessor.setPropertyValue("shuttingDown", true);
((Thread)gmsContextDirectFieldAccessor.getPropertyValue("viewWindowThread")).interrupt();
((Thread)gmsContextDirectFieldAccessor.getPropertyValue("messageWindowThread")).interrupt();

Router router = gmsContext.getRouter();
DirectFieldAccessor routerDirectFieldAccessor = new DirectFieldAccessor(router);
((Thread)routerDirectFieldAccessor.getPropertyValue("signalHandlerThread")).interrupt();

Kindly fix this shutdown issue. Please let me know should you require a proper source code patch for this.
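
For illustration only, a minimal sketch of the cooperative shutdown the fix implies (field and method names are hypothetical): shutdown() should interrupt and briefly join its worker threads instead of leaving non-daemon threads running.

    class WorkerShutdownSketch {
        private volatile boolean shuttingDown = false;   // checked by the worker loop
        private Thread viewWindowThread;                 // started elsewhere, non-daemon

        void shutdown() {
            shuttingDown = true;
            viewWindowThread.interrupt();      // wake the thread if it is blocked
            try {
                viewWindowThread.join(5000);   // give it a moment to exit
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        }
    }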



 Comments   
Comment by shreedhar_ganapathy [ 09/Nov/11 ]

Transferring to Joe for eval and closure.

Comment by Joe Fialli [ 09/Nov/11 ]

This Shoal GMS issue could prevent the GlassFish v2 and v3.1 (and higher) application server from exiting.
It has been fixed in all versions of Shoal GMS.





[SHOAL-76] DSC logging performance improvements Created: 13/Sep/08  Updated: 09/Nov/11

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Task Priority: Trivial
Reporter: mbien Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Text File DSC.patch    
Issuezilla Id: 76

 Description   

Wrapped logging in potentially hot or concurrent code paths in a guard:

if (logger.isLoggable(level)) {
    log(...);
}

to prevent unnecessary synchronization and logging overhead.
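
A concrete instance of the pattern with java.util.logging; the logger name and message variables are placeholders.

    import java.util.logging.Level;
    import java.util.logging.Logger;

    class LogGuardExample {
        private static final Logger LOG = Logger.getLogger("ShoalLogger");

        static void logViewChange(String groupName, int viewSize) {
            // The string concatenation is only paid for when FINE is enabled.
            if (LOG.isLoggable(Level.FINE)) {
                LOG.fine("GMS view change for group " + groupName + ", size=" + viewSize);
            }
        }
    }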



 Comments   
Comment by mbien [ 13/Sep/08 ]

Created an attachment (id=9)
diff patch

Comment by shreedhar_ganapathy [ 09/Nov/11 ]

Transferring to Joe for eval and closure.





[SHOAL-80] Accessing system property in a rt.jar specific way Created: 23/Sep/08  Updated: 09/Nov/11

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Minor
Reporter: okrische Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 80

 Description   

Watch out in line 99 of com.sun.enterprise.jxtamgmt.NiceLogFormatter:

@SuppressWarnings("unchecked")
private static final String LINE_SEPARATOR =
(String) java.security.AccessController.doPrivileged(
new sun.security.action.GetPropertyAction("line.separator"));

Why not just use:

  • System.getProperty("line.separator")

instead?

Eclipse flags the code above as an error, probably because it calls into rt.jar
internals directly instead of going through the public API.
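
If the privileged read is worth keeping, the same behaviour can be expressed with public API only; a sketch:

    import java.security.AccessController;
    import java.security.PrivilegedAction;

    class NiceLogFormatterSketch {
        private static final String LINE_SEPARATOR =
                AccessController.doPrivileged(new PrivilegedAction<String>() {
                    public String run() {
                        return System.getProperty("line.separator");
                    }
                });
    }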



 Comments   
Comment by Joe Fialli [ 27/Oct/08 ]

Does not impact the running system, only compile time.

Comment by shreedhar_ganapathy [ 09/Nov/11 ]

Transferring to Joe for eval and closure.





[SHOAL-116] rejoin subevent is null in JoinedAndReadyNotificationSignal Created: 28/Feb/11  Updated: 20/Apr/11  Resolved: 20/Apr/11

Status: Closed
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: Bobby Bissett Assignee: Joe Fialli
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Dependency
blocks GLASSFISH-16109 get-health command not showing rejoins Resolved

 Description   

Found this while recreating a rejoin for blogging. The rejoin happens and shows up in the log, but the rejoin subevent from JoinedAndReadyNotificationSignal.getRejoinSubevent() is null, which affects the output of the GlassFish 'asadmin get-health' command.

Joe already has a fix for this; I'll test it soon and commit it to the trunk.



 Comments   
Comment by Joe Fialli [ 01/Mar/11 ]

committed fix in trunk.

Comment by Bobby Bissett [ 02/Mar/11 ]

Verified in trunk, rev 1543. Will mark the GF issue fixed when we integrate next.

Comment by Bobby Bissett [ 20/Apr/11 ]

Just opening so I can mark fixed with a specific version.





[SHOAL-114] validate-multicast tool should handle unexpected data in receiver thread Created: 01/Feb/11  Updated: 03/Mar/11  Resolved: 03/Mar/11

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: 1.1
Fix Version/s: current

Type: Improvement Priority: Minor
Reporter: Bobby Bissett Assignee: Bobby Bissett
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

If the validate-multicast tool is run at the same time as another process (e.g. the GlassFish DAS) using the same port, it chokes on the received data because the format is unexpected. The code in MulticastTester#trimDataString should first check the format of what it is parsing; if it is not something expected, a warning should be logged saying that the tool received unexpected information.
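
A minimal sketch of the kind of guard intended; the expected prefix and method shape here are hypothetical, not the tool's actual wire format.

    import java.util.logging.Level;
    import java.util.logging.Logger;

    class TrimDataSketch {
        private static final Logger LOG = Logger.getLogger("MulticastTester");
        // Hypothetical marker; the real tool would check whatever prefix it actually sends.
        private static final String EXPECTED_PREFIX = "mcastTester|";

        static String trimDataString(String received) {
            if (received == null || !received.startsWith(EXPECTED_PREFIX)) {
                if (LOG.isLoggable(Level.WARNING)) {
                    LOG.warning("Received unexpected multicast data, ignoring: " + received);
                }
                return null;
            }
            return received.substring(EXPECTED_PREFIX.length()).trim();
        }
    }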



 Comments   
Comment by Bobby Bissett [ 03/Mar/11 ]

Fixed in revision 1544.





[SHOAL-115] MultiCastReceiverThread is not clearing buffer in DatagramPacket Created: 28/Feb/11  Updated: 03/Mar/11  Resolved: 03/Mar/11

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: current

Type: Bug Priority: Minor
Reporter: Bobby Bissett Assignee: Bobby Bissett
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Dependency
blocks GLASSFISH-16108 validate-multicast tool can give dupl... Resolved

 Description   

In the thread's run method, the same byte buffer is reused for every DatagramPacket created, so the data returned from the receive method can contain leftover text, which throws off the set of host strings. As a result, hosts can show up more than once in the output of the validate-multicast tool.

The simple fix is to create a new byte array for each packet (or clear the old one). Without this fix the tool still gives accurate results; it just can include extra copies of previous entries. I have a GF issue filed for the integration, and this one for the actual fix.
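
A sketch of the receive-loop fix: allocate a fresh buffer per packet (and decode only packet.getLength() bytes) so stale bytes from a previous datagram cannot leak into the parsed host string. The loop shape and buffer size are illustrative.

    import java.io.IOException;
    import java.net.DatagramPacket;
    import java.net.DatagramSocket;

    class ReceiverSketch {
        void receiveLoop(DatagramSocket socket) throws IOException {
            while (!socket.isClosed()) {
                byte[] buf = new byte[8192];                        // new buffer each iteration
                DatagramPacket packet = new DatagramPacket(buf, buf.length);
                socket.receive(packet);
                String data = new String(packet.getData(), 0, packet.getLength(), "UTF-8");
                // ... parse the host entry out of 'data' and add it to the result set ...
            }
        }
    }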



 Comments   
Comment by Bobby Bissett [ 03/Mar/11 ]

Fixed in revision 1544.





[SHOAL-50] Expose MASTER_CHANGE_EVENT Created: 14/Apr/08  Updated: 25/Nov/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Improvement Priority: Major
Reporter: krijnschaap Assignee: carryel
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Java Archive File group_leader_notification_patch.jar     Java Archive File group_leader_notification_patch.jar    
Issue Links:
Dependency
depends on SHOAL-61 when members join the group concurren... Open
Issuezilla Id: 50

 Description   

Create a signal whenever the master for the cluster has changed: either add
promoted-to-master/demoted-from-master signals for the involved nodes, or a
global signal that simply notifies all nodes of the new master. The latter is
probably the most flexible solution.



 Comments   
Comment by shreedhar_ganapathy [ 24/Jun/08 ]

Reassigning the enhancement request to community member Bongjae

Comment by carryel [ 14/May/09 ]

This issue could depend on issue #61, because issue #61 could change
MASTER_CHANGE_EVENT's notification timing.

But, in any case, I made a patch by which a user can be made aware of
MASTER_CHANGE_EVENT.

A user can receive a GroupLeaderShipNotificationSignal when MASTER_CHANGE_EVENT
occurs by adding a GroupLeaderShipNotificationActionFactory to gms like this:

gms.addActionFactory( new GroupLeaderShipNotificationActionFactoryImpl( callback ) );

I will attach the proposed patch, diff, and test app.
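
For illustration, a slightly fuller registration sketch using the class names as proposed in this patch (the impl package is assumed to match the other ActionFactory impls); gms is the already-joined GroupManagementService.

    import com.sun.enterprise.ee.cms.core.CallBack;
    import com.sun.enterprise.ee.cms.core.GroupManagementService;
    import com.sun.enterprise.ee.cms.core.Signal;
    import com.sun.enterprise.ee.cms.impl.client.GroupLeaderShipNotificationActionFactoryImpl;

    class LeadershipRegistrationSketch {
        static void register(GroupManagementService gms) {
            CallBack callback = new CallBack() {
                public void processNotification(Signal signal) {
                    // Fired on MASTER_CHANGE_EVENT for the group.
                    System.out.println("Group leadership change reported by: " + signal.getMemberToken());
                }
            };
            gms.addActionFactory(new GroupLeaderShipNotificationActionFactoryImpl(callback));
        }
    }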

Comment by carryel [ 14/May/09 ]

Created an attachment (id=16)
attached the patch and diff

Comment by carryel [ 17/May/09 ]

I received feedback from Joe and Shreedhar.

So I modified the proposed patch further.

Here are the changes.

  • GroupLeaderShip ===> GroupLeadership
  • All member types, not only CORE, can receive the group leadership event.
  • The log message is modified for debugging when a group leadership event is
    fired.
  • GroupLeadershipNotificationSignal has the following API:

    /**
     * Provides a snapshot of the previous view at the time the signal arrives.
     *
     * @return List containing the list of <code>GMSMember</code>s which
     *         correspond to the view
     */
    List<GMSMember> getPreviousView();

    /**
     * Provides a snapshot of the current view at the time the signal arrives.
     *
     * @return List containing the list of <code>GMSMember</code>s which
     *         correspond to the view
     */
    List<GMSMember> getCurrentView();

    /**
     * Provides a list of all live and current CORE designated members.
     *
     * @return List containing the list of member token ids of core members
     */
    List<String> getCurrentCoreMembers();

    /**
     * Provides a list of all live members, i.e. CORE and SPECTATOR members.
     *
     * @return List containing the list of member token ids of all members
     */
    List<String> getAllCurrentMembers();

Comment by carryel [ 17/May/09 ]

Created an attachment (id=17)
I attached the proposed patch and diff again

Comment by carryel [ 19/May/09 ]

I received feedback from Joe.

Joe wrote:
> A WATCHDOG is not allowed to be a GroupLeader, and at this time I don't
> envision it needing to know the master or of a master change event.
> The WATCHDOG currently broadcasts failure notifications to all members of the
> cluster, so it does not assume it knows which instance is the Master.
> So let's not send group leadership events to the WATCHDOG.

So the recent patch should be modified further, like this.

In ViewWindow.java:

private void analyzeMasterChangeView(final EventPacket packet) {
    ...
    if ( !getGMSContext().isWatchdog() ) {
        addGroupLeadershipNotificationSignal( token, member.getGroupName(), member.getStartTime() );
    }
}

Thanks to Joe.

Comment by Joe Fialli [ 18/Jun/09 ]

Proposed changes are being regression tested by Shoal QA test.

Comment by Joe Fialli [ 05/Feb/10 ]

The fix has been integrated previously.





[SHOAL-61] when members join the group concurrently, join notifications of some members are often duplicated or missed Created: 10/Jun/08  Updated: 25/Nov/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: carryel Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Text File shoal_issue61_2009_06_11.txt     Java Source File SimpleJoinTest.java    
Issue Links:
Dependency
blocks SHOAL-50 Expose MASTER_CHANGE_EVENT Resolved
Issuezilla Id: 61
Status Whiteboard:

shoal-shark-na


 Description   

This issue is similar to issue #60.
(https://shoal.dev.java.net/issues/show_bug.cgi?id=60)

In issue #60 the members joined the group in a particular order; in this issue
the members join the group concurrently.

If all members join the group concurrently at the start, they do not know who
the group leader is and must negotiate the leader. In that case, the join
notifications of some members are often duplicated or missed.

Here is the log of the duplicated case. Assume that "A" and "B" are members of
"TestGroup".
["A"'s log]
------------------------------------------------------------------------
2008. 6. 10 11:04:58 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest runSimpleSample
INFO: Starting SimpleJoinTest....
2008. 6. 10 11:04:58 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest initializeGMS
INFO: Initializing Shoal for member: a2ed5cb6-3cc7-4060-91d6-3fc8b6854049 group:TestGroup
2008. 6. 10 11:04:58 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest runSimpleSample
INFO: Registering for group event notifications
2008. 6. 10 11:04:58 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest runSimpleSample
INFO: Joining Group TestGroup
2008. 6. 10 11:04:59 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group TestGroup : Members in view for (before change analysis) are :
1: MemberId: a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CEC3B82201E1B545DB8E9ECF621244468F03

2008. 6. 10 11:04:59 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT
2008. 6. 10 11:05:10 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group TestGroup : Members in view for (before change analysis) are :
1: MemberId: e9d80499-0f8b-4e2d-8856-3f31dcc25f96, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CE2E6E148CC1DB479EA7D0C6A0AF50B5A103
2: MemberId: a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CEC3B82201E1B545DB8E9ECF621244468F03

2008. 6. 10 11:05:10 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT
2008. 6. 10 11:05:10 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack processNotification
INFO: ***JoinNotification received: GroupLeader = false, Signal.getMemberToken() = e9d80499-0f8b-4e2d-8856-3f31dcc25f96, ServerName = a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, Leader = e9d80499-0f8b-4e2d-8856-3f31dcc25f96
2008. 6. 10 11:05:10 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group TestGroup : Members in view for (before change analysis) are :
1: MemberId: e9d80499-0f8b-4e2d-8856-3f31dcc25f96, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CE2E6E148CC1DB479EA7D0C6A0AF50B5A103
2: MemberId: a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CEC3B82201E1B545DB8E9ECF621244468F03

2008. 6. 10 11:05:10 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 6. 10 11:05:10 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack processNotification
INFO: ***JoinNotification received: GroupLeader = false, Signal.getMemberToken() = a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, ServerName = a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, Leader = e9d80499-0f8b-4e2d-8856-3f31dcc25f96
2008. 6. 10 11:05:10 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group TestGroup : Members in view for (before change analysis) are :
1: MemberId: e9d80499-0f8b-4e2d-8856-3f31dcc25f96, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CE2E6E148CC1DB479EA7D0C6A0AF50B5A103
2: MemberId: a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CEC3B82201E1B545DB8E9ECF621244468F03

2008. 6. 10 11:05:10 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 6. 10 11:05:10 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack processNotification
INFO: ***JoinNotification received: GroupLeader = false, Signal.getMemberToken() = a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, ServerName = a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, Leader = e9d80499-0f8b-4e2d-8856-3f31dcc25f96
------------------------------------------------------------------------
"A" received duplicated JoinNotifications(a2ed5cb6-3cc7-4060-91d6-3fc8b6854049).

["B"'s log]
------------------------------------------------------------------------
2008. 6. 10 11:04:54 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest runSimpleSample
INFO: Starting SimpleJoinTest....
2008. 6. 10 11:04:54 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest initializeGMS
INFO: Initializing Shoal for member: e9d80499-0f8b-4e2d-8856-3f31dcc25f96 group:TestGroup
2008. 6. 10 11:04:55 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest runSimpleSample
INFO: Registering for group event notifications
2008. 6. 10 11:04:55 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest runSimpleSample
INFO: Joining Group TestGroup
2008. 6. 10 11:04:56 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group TestGroup : Members in view for (before change analysis) are :
1: MemberId: e9d80499-0f8b-4e2d-8856-3f31dcc25f96, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CE2E6E148CC1DB479EA7D0C6A0AF50B5A103

2008. 6. 10 11:04:56 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT
2008. 6. 10 11:05:01 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group TestGroup : Members in view for (before change analysis) are :
1: MemberId: e9d80499-0f8b-4e2d-8856-3f31dcc25f96, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CE2E6E148CC1DB479EA7D0C6A0AF50B5A103
2: MemberId: a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CEC3B82201E1B545DB8E9ECF621244468F03

2008. 6. 10 11:05:01 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT
2008. 6. 10 11:05:04 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack processNotification
INFO: ***JoinNotification received: GroupLeader = true, Signal.getMemberToken() = a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, ServerName = e9d80499-0f8b-4e2d-8856-3f31dcc25f96, Leader = e9d80499-0f8b-4e2d-8856-3f31dcc25f96
2008. 6. 10 11:05:09 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group TestGroup : Members in view for (before change analysis) are :
1: MemberId: e9d80499-0f8b-4e2d-8856-3f31dcc25f96, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CE2E6E148CC1DB479EA7D0C6A0AF50B5A103
2: MemberId: a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CEC3B82201E1B545DB8E9ECF621244468F03

2008. 6. 10 11:05:09 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 6. 10 11:05:12 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack processNotification
INFO: ***JoinNotification received: GroupLeader = true, Signal.getMemberToken() = a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, ServerName = e9d80499-0f8b-4e2d-8856-3f31dcc25f96, Leader = e9d80499-0f8b-4e2d-8856-3f31dcc25f96
2008. 6. 10 11:05:12 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group TestGroup : Members in view for (before change analysis) are :
1: MemberId: e9d80499-0f8b-4e2d-8856-3f31dcc25f96, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CE2E6E148CC1DB479EA7D0C6A0AF50B5A103
2: MemberId: a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CEC3B82201E1B545DB8E9ECF621244468F03

2008. 6. 10 11:05:12 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 6. 10 11:05:15 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack processNotification
INFO: ***JoinNotification received: GroupLeader = true, Signal.getMemberToken() = a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, ServerName = e9d80499-0f8b-4e2d-8856-3f31dcc25f96, Leader = e9d80499-0f8b-4e2d-8856-3f31dcc25f96

------------------------------------------------------------------------
"B" also received duplicated JoinNotifications(a2ed5cb6-3cc7-4060-91d6-
3fc8b6854049).
And because "B" is group leader, "B" don't receive own join notification.

Here is another log, for the missed case. Assume that "A", "B" and "C" are
members of "TestGroup".
["A"'s log]
------------------------------------------------------------------------
2008. 6. 10 10:17:41 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest runSimpleSample
INFO: Starting SimpleJoinTest....
2008. 6. 10 10:17:41 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest initializeGMS
INFO: Initializing Shoal for member: 197c66d7-f56c-4119-8b1e-18dc330e39d3 group:TestGroup
2008. 6. 10 10:17:41 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest runSimpleSample
INFO: Registering for group event notifications
2008. 6. 10 10:17:41 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest runSimpleSample
INFO: Joining Group TestGroup
2008. 6. 10 10:17:42 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group TestGroup : Members in view for (before change analysis) are :
1: MemberId: 197c66d7-f56c-4119-8b1e-18dc330e39d3, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CE7D33395A50CC479CAA67ACEEEBD3BDDC03

2008. 6. 10 10:17:42 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT
2008. 6. 10 10:17:47 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group TestGroup : Members in view for (before change analysis) are :
1: MemberId: 197c66d7-f56c-4119-8b1e-18dc330e39d3, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CE7D33395A50CC479CAA67ACEEEBD3BDDC03
2: MemberId: 0c3c5b33-9a7d-4d85-ba1d-7a09a52d4e4b, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CE9D818A456DA94CB5B53CB024FD26DA8B03
3: MemberId: 468996ee-2d54-4c58-af46-72d903154e31, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CEBCDB7FE3F44D47DE944388A09B8081BE03

2008. 6. 10 10:17:47 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT
2008. 6. 10 10:17:53 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack processNotification
INFO: ***JoinNotification received: GroupLeader = true, Signal.getMemberToken() = 0c3c5b33-9a7d-4d85-ba1d-7a09a52d4e4b, ServerName = 197c66d7-f56c-4119-8b1e-18dc330e39d3, Leader = 197c66d7-f56c-4119-8b1e-18dc330e39d3
2008. 6. 10 10:17:53 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack processNotification
INFO: ***JoinNotification received: GroupLeader = true, Signal.getMemberToken() = 468996ee-2d54-4c58-af46-72d903154e31, ServerName = 197c66d7-f56c-4119-8b1e-18dc330e39d3, Leader = 197c66d7-f56c-4119-8b1e-18dc330e39d3
------------------------------------------------------------------------

["B"'s log]
------------------------------------------------------------------------
2008. 6. 10 10:17:41 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest runSimpleSample
INFO: Starting SimpleJoinTest....
2008. 6. 10 10:17:41 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest initializeGMS
INFO: Initializing Shoal for member: 0c3c5b33-9a7d-4d85-ba1d-7a09a52d4e4b group:TestGroup
2008. 6. 10 10:17:41 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest runSimpleSample
INFO: Registering for group event notifications
2008. 6. 10 10:17:41 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest runSimpleSample
INFO: Joining Group TestGroup
2008. 6. 10 10:17:42 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group TestGroup : Members in view for (before change analysis) are :
1: MemberId: 0c3c5b33-9a7d-4d85-ba1d-7a09a52d4e4b, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CE9D818A456DA94CB5B53CB024FD26DA8B03

2008. 6. 10 10:17:42 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT
2008. 6. 10 10:17:47 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group TestGroup : Members in view for (before change analysis) are :
1: MemberId: 197c66d7-f56c-4119-8b1e-18dc330e39d3, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CE7D33395A50CC479CAA67ACEEEBD3BDDC03
2: MemberId: 0c3c5b33-9a7d-4d85-ba1d-7a09a52d4e4b, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CE9D818A456DA94CB5B53CB024FD26DA8B03

2008. 6. 10 10:17:47 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT
2008. 6. 10 10:17:47 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack processNotification
INFO: ***JoinNotification received: GroupLeader = false, Signal.getMemberToken() = 197c66d7-f56c-4119-8b1e-18dc330e39d3, ServerName = 0c3c5b33-9a7d-4d85-ba1d-7a09a52d4e4b, Leader = 197c66d7-f56c-4119-8b1e-18dc330e39d3
2008. 6. 10 10:17:47 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group TestGroup : Members in view for (before change analysis) are :
1: MemberId: 197c66d7-f56c-4119-8b1e-18dc330e39d3, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CE7D33395A50CC479CAA67ACEEEBD3BDDC03
2: MemberId: 0c3c5b33-9a7d-4d85-ba1d-7a09a52d4e4b, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CE9D818A456DA94CB5B53CB024FD26DA8B03
3: MemberId: 468996ee-2d54-4c58-af46-72d903154e31, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CEBCDB7FE3F44D47DE944388A09B8081BE03

2008. 6. 10 10:17:47 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT
2008. 6. 10 10:17:47 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack processNotification
INFO: ***JoinNotification received: GroupLeader = false, Signal.getMemberToken() = 468996ee-2d54-4c58-af46-72d903154e31, ServerName = 0c3c5b33-9a7d-4d85-ba1d-7a09a52d4e4b, Leader = 197c66d7-f56c-4119-8b1e-18dc330e39d3
------------------------------------------------------------------------

["C"'s log]
------------------------------------------------------------------------
2008. 6. 10 10:17:42 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest runSimpleSample
INFO: Starting SimpleJoinTest....
2008. 6. 10 10:17:42 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest initializeGMS
INFO: Initializing Shoal for member: 468996ee-2d54-4c58-af46-72d903154e31 group:TestGroup
2008. 6. 10 10:17:42 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest runSimpleSample
INFO: Registering for group event notifications
2008. 6. 10 10:17:42 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest runSimpleSample
INFO: Joining Group TestGroup
2008. 6. 10 10:17:43 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group TestGroup : Members in view for (before change analysis) are :
1: MemberId: 468996ee-2d54-4c58-af46-72d903154e31, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CEBCDB7FE3F44D47DE944388A09B8081BE03

2008. 6. 10 10:17:43 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT
2008. 6. 10 10:17:47 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group TestGroup : Members in view for (before change analysis) are :
1: MemberId: 197c66d7-f56c-4119-8b1e-18dc330e39d3, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CE7D33395A50CC479CAA67ACEEEBD3BDDC03
2: MemberId: 468996ee-2d54-4c58-af46-72d903154e31, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CEBCDB7FE3F44D47DE944388A09B8081BE03

2008. 6. 10 10:17:47 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT
2008. 6. 10 10:17:47 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack processNotification
INFO: ***JoinNotification received: GroupLeader = false, Signal.getMemberToken() = 197c66d7-f56c-4119-8b1e-18dc330e39d3, ServerName = 468996ee-2d54-4c58-af46-72d903154e31, Leader = 197c66d7-f56c-4119-8b1e-18dc330e39d3
2008. 6. 10 10:17:47 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group TestGroup : Members in view for (before change analysis) are :
1: MemberId: 197c66d7-f56c-4119-8b1e-18dc330e39d3, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CE7D33395A50CC479CAA67ACEEEBD3BDDC03
2: MemberId: 0c3c5b33-9a7d-4d85-ba1d-7a09a52d4e4b, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CE9D818A456DA94CB5B53CB024FD26DA8B03
3: MemberId: 468996ee-2d54-4c58-af46-72d903154e31, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CEBCDB7FE3F44D47DE944388A09B8081BE03

2008. 6. 10 10:17:47 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT
2008. 6. 10 10:17:47 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack processNotification
INFO: ***JoinNotification received: GroupLeader = false, Signal.getMemberToken() = 0c3c5b33-9a7d-4d85-ba1d-7a09a52d4e4b, ServerName = 468996ee-2d54-4c58-af46-72d903154e31, Leader = 197c66d7-f56c-4119-8b1e-18dc330e39d3
------------------------------------------------------------------------
All members missed some join notifications.

Whenever you test concurrent joins, which notifications are duplicated or missed
can vary.

In any case, in the concurrent-join case all members should receive join
notifications for one another, and join notifications should not be duplicated
if all members are healthy.



 Comments   
Comment by carryel [ 10/Jun/08 ]

Created an attachment (id=8)
I attached a simple test code

Comment by carryel [ 29/Jun/08 ]

1. Testing scenarios
The testing scenario is simple. Shoal (with JXTA) does not support multiple
members of the same group in the same JVM, so each member should join the group
from a separate process (JVM). You can test this manually by running the
"SimpleJoinTest" I attached earlier; each time you run "SimpleJoinTest", a new
member (node) joins "TestGroup".

I tested this by starting multiple "SimpleJoinTest"s. You may need 3 or 4
"SimpleJoinTest" processes.
a) In the beginning there is no member and no group.
b) I started multiple "SimpleJoinTest"s in separate processes (JVMs) concurrently.
c) I inspected each log, looking at "Signal.getMemberToken()" in particular.
ex) "***JoinNotification received: GroupLeader = false, Signal.getMemberToken()
= e9d80499-0f8b-4e2d-8856-3f31dcc25f96, ServerName = a2ed5cb6-3cc7-4060-91d6-3fc8b6854049,
Leader = e9d80499-0f8b-4e2d-8856-3f31dcc25f96"

Strictly speaking, we cannot start multiple processes at exactly the same time,
but because each member has a discovery timeout this is an acceptable margin of
error. In other words, if you start a "SimpleJoinTest" while other
"SimpleJoinTest"s are still waiting out the discovery timeout, you can reproduce
the strange results.

2. How does the new code behave during the discovery phase?
Assume "A", "B" and "C" will become members of the group.
In my scenario, "A", "B" and "C" all wait for the discovery timeout because
there is no master in the group. Before they enter the discovery phase, they
first set the master advertisement to their own advertisement, but
masterAssigned is usually still false at that point.
masterAssigned is mostly set to true by one of the following methods:

  • MasterNode.appointMasterNode()
  • MasterNode.processMasterNodeResponse()
  • MasterNode.processMasterNodeAnnouncement()

a) In MasterNode.appointMasterNode()
This is the case where no master has been assigned when the discovery timeout
expires, e.g. there is no master in the group. We then use the discovery view,
which contains any other members that sent us messages, to put up a candidate
as master. Because the discovery view always contains our own advertisement,
our own advertisement can become the candidate.

a-1) When our own advertisement becomes the master
First, if the candidate is our own advertisement and the discovery view contains
other members, clusterViewManager.setMaster() is called with the discovery
view's snapshot. The original code calls clusterViewManager.setMaster() with
only our own view snapshot, but because the master has already been determined
to be our own advertisement, I think calling clusterViewManager.setMaster() with
the discovery view's snapshot is better than calling it with only our own
view's snapshot.
Of course, calling clusterViewManager.setMaster() without the discovery view's
snapshot is not a problem either, because when the other members receive
processMasterNodeAnnouncement() from the master's announceMaster(), they can
call sendSelfNodeAdvertisement(). But if the discovery view already contains
them and setMaster() is called with the discovery view, sendSelfNodeAdvertisement()
is unnecessary in this case because the master view already has them, so they
can set the master directly without sendSelfNodeAdvertisement().

And about calling announceMaster():
-----------------------------------------------------
[original appointMasterNode() in MasterNode.java]
...
if (madv.getID().equals(localNodeID)) {
    ...
    if (clusterViewManager.getViewSize() > 1) {
        announceMaster(manager.getSystemAdvertisement());
    }
    ...
}
-----------------------------------------------------

It can be edited to the following:
-----------------------------------------------------
if (madv.getID().equals(localNodeID)) {
    ...
    //if (clusterViewManager.getViewSize() > 1) {
        announceMaster(manager.getSystemAdvertisement());
    //}
    ...
}
-----------------------------------------------------
In other words, if our own advertisement becomes the master, announceMaster() is
always called. While I was debugging this, even though one more member had
joined the group, clusterViewManager.getViewSize() could sometimes still equal 1
for a short time, so I think it is safer to make this edit. Even if
announceMaster() is called while clusterViewManager.getViewSize() equals 1, it
is not a problem because we do not receive our own message.

a-2) When another member's advertisement becomes the master
The original code always sets the master without notification, so sometimes the
master's view cannot be updated. See the following code:
-----------------------------------------------------
[appointMasterNode() method in MasterNode.java]
...
clusterViewManager.setMaster(madv, false);
...
-----------------------------------------------------

-----------------------------------------------------
[setMaster(advertisement, notify) method in ClusterViewManager.java]

if ( !advertisement.equals(masterAdvertisement)) {
    ... // notify
}
-----------------------------------------------------
As you can see, if the current member has already set the master, the
notification is not made. If we already called setMaster(advertisement, false)
in MasterNode.appointMasterNode(), then when the master later sends us a new
view and we receive it through processMasterNodeAnnouncement() or
processMasterNodeResponse(), the new view is not notified, even though
setMaster() may be called with the new view, because the current
masterAdvertisement is already the same as the master's advertisement.
So I think this should also be edited: if the candidate is another member, I do
not call setMaster(advertisement, false). Even though we do not set the master
right away, we can still receive the master change event later through
processMasterNodeAnnouncement() or processMasterNodeResponse().

b) In MasterNode.processMasterNodeResponse():
MASTER_CHANGE_EVENT is notified with the master view's snapshot per issue #60
(https://shoal.dev.java.net/issues/show_bug.cgi?id=60).
The additional patch is that when sendSelfNodeAdvertisement() is called,
MASTER_CHANGE_EVENT is also notified with the master view's snapshot.

c) In MasterNode.processMasterNodeAnnouncement():
This is very similar to b) above and should be edited in the same way.

So now I want to describe how the new code behaves during the discovery phase.
Actually, the new code behaves the way the old code originally intended; there
are no big changes.

1) If "A", "B" and "C" joined the group concurrently and when all member are
waiting for the discovery timeout.

1-1) If all members receive no other member's message and discovery view
doesn't have any members, all members try to become the master.
So all members call announceMaster(). Then All members receive master's
announcements and become aware of master's collision through checkMaster().
Master's collision can be resolved by ID. When the member affirms master node
role or resign master node role, the member notify MASTER_CHANGE_EVENT.
Though original code didn't notify MASTER_CHANGE_EVENT when the member affirms
master node role, I think that it should be edited.
Above a-1) though the member already called setMaster() and notified
MASTER_CHANGE_EVENT and master was not changed, we should notify
MASTER_CHANGE_EVENT because master's view already was changed by collision. If
we don't notify the event, we can't become aware of view changes quickly in
the collision case. Of course, if another event will be occurred later, this
member(master) can become aware of view changes. But I think view changes
should be applied as soon as possible.

1-2) If all members receive each other member's message and discovery view has
all members, candidate is selected from discovery view by TreeMap's ordering
sort.
If all members select the same cadidate, the cadidate member will send master
announcement. other members will process processMasterNodeAnnouncement() and
set the master with current master's view snapshot.

If some members receive each other member's message and some members don't
receive, 1-1) and 1-2) are mixing.

2) If some nodes joined the group late
If some members join the group and there is already master,
new members will send master node query to all members and the master node will
process processMasterNodeQuery(). Then the master node will send master
response with master view's snapshot and new members will process
processMasterNodeResponse() and set the master with current master's view.

3. How the code behaves when a node is shut down or restarted:
I think my changes do not affect the shutdown algorithm. I know the shutdown and
failure cases are handled together with the HealthMonitor. But I think some of
the startup logic in the HealthMonitor should be edited. When a node is starting
and the HealthMonitor has started, MasterNode.probeNode() can be called by the
HealthMonitor. In the case from 1-1), where no member receives any other
member's message and the discovery view has no other members so all members try
to become the master (the master collision case), if MasterNode.probeNode() is
called by the HealthMonitor, processMasterNodeResponse() can be processed.
Because processMasterNodeResponse() does not account for the collision case,
unexpected results can sometimes occur in the master selection algorithm. So I
think the health monitor should only start after master discovery has finished.
With that, this change does not affect shutdown.

When a node which is not the master restarts before it has been declared failed,
the master's view is unchanged, so members which already joined the group do not
receive any changes. The restarted node receives all members' join notifications
from the master's response.
When a node which is not the master restarts after it has been ejected from the
cluster, the master's view has changed, so members which already joined the
group receive only the failed node's join notification, because the master has
already removed that node from its view. The restarted node receives all
members' join notifications from the master's response.

When a node which is the master restarts before it has been declared failed, the
node which was the master sends discovery messages to all members and waits for
the discovery timeout. Because the other members are not the master, the
restarting node receives no messages, so it sends a master announcement that
includes only its own advertisement. Then, because the members see that the
master's view contains only the master's advertisement, they call
sendSelfNodeAdvertisement(), and the master becomes aware of the existing
members through processNodeResponse(). The master node receives join
notifications for all members. The other members do not receive any changes
because they first call sendSelfNodeAdvertisement() and return before
setMaster().

When a node which is the master restarts after it has been ejected from the
cluster, the members have already elected a new master. When the old master
failed and the new master was elected, the members' views gained no additional
member, so the members receive no join events. But when the node which was the
master restarts, it sends a master discovery message and receives the new
master's response, so it receives join notifications for all existing members
from the new master, and the other members receive only the failed member's
join notification.
Comment by sheetalv [ 09/Jul/08 ]

NA for Sailfin 1.0

Comment by sheetalv [ 09/Jul/08 ]

assigning to self

Comment by Joe Fialli [ 06/Feb/09 ]

Reviewing carryel's submitted fix for this issue.
Have already checked in submitted test case and it can be run
via "ant simplejointest".

Comment by carryel [ 22/Jun/09 ]

Created an attachment (id=18)
I attached the proposed patch for history





[SHOAL-55] more reliable failure notification Created: 09/May/08  Updated: 25/Nov/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: sheetalv Assignee: sheetalv
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: OpenSolaris


Issue Links:
Dependency
blocks SHOAL-58 One instance was killed, other instan... Resolved
Issuezilla Id: 55
Status Whiteboard:

shoal-shark-na


 Description   

Instance A is either going down or under load, so instance B starts to retry its
connection to instance A. Before instance B can deem instance A dead or alive,
there needs to be an intermediate state, e.g. "in_retry_mode", that can help GMS
clients.
For example, the CLB can use this state to ping instance A again after a little
while. The in-memory replication code can also use this intermediate state to
determine that instance A is in "in_retry_mode" and then, if the pipe close
event has occurred, create a new pipe once instance A is alive again.


 Comments   
Comment by sheetalv [ 09/Jul/08 ]

NA for Sailfin 1.0

Comment by Joe Fialli [ 27/Aug/08 ]

2 cases to address:

1. False positives occur when 3 heartbeats are missed from an instance that is
in the middle of a full GC (a full GC can take 12 to 15 seconds). The other
instances in the cluster incorrectly receive a FAILURE_NOTIFICATION even though
the instance is still running once the full GC completes.

2. The nodeagent detects a failed instance and restarts it before Shoal can
detect that the instance has failed and notify the others in the cluster. This
happens on faster, newer machines.

Comment by sheetalv [ 27/Aug/08 ]

*** Issue 58 has been marked as a duplicate of this issue. ***

Comment by sheetalv [ 27/Oct/08 ]

too big of an architecture change for Sailfin 1.5. NA for Sailfin 1.5.

Comment by sheetalv [ 31/Jul/09 ]

WatchDog notification implementation has been added to Shoal. This takes care of
case 2 (DAS restart) of what Joe has mentioned above.





[SHOAL-58] One instance was killed, other instances were not properly notified about that event. Created: 13/May/08  Updated: 25/Nov/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: easarina Assignee: sheetalv
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Text File server.log.051208_13_08.asqe-sblade-5.n2c1m5     Text File server.log.051208_13_08.asqe-sblade-5.n2c1m5_short    
Issue Links:
Dependency
depends on SHOAL-55 more reliable failure notification Open
Issuezilla Id: 58
Status Whiteboard:

shoal-shark-na


 Description   

Sailfin 1.0 b32, SuSE and Solaris 10 Sparc machines. On SuSE I had a cluster
with three instances; on Solaris I had a cluster with 10 instances. In both
cases I tried to kill one instance under the following conditions:
1) Without SIP traffic.
2) With "low" SIP traffic.
3) With "big" SIP traffic.

Not always, but in many cases (most of them under the "no traffic" and
especially the "big traffic" conditions), when an instance was killed I did not
see FAILURE_EVENT or "Assigned recovery server" messages in the log files of the
other instances, and in many cases no IN_DOUBT_EVENT either. But I always saw
ADD_EVENT and JOINED_AND_READY_EVENT.

When the events were absent I saw warnings like this in the logs:

[#|2008-05-07T14:31:05.324-0700|WARNING|sun-comms-appserver1.0|ShoalLogger|_ThreadID=24;_ThreadName=pool-2-thread-19;_RequestID=0d9efdac-4f92-4e0c-bb87-6107fc42ed1e;|
Could not send the LWR Multicast message to get the member state of
urn:jxta:uuid-C9C9584023FA421AA3F0A79F128543642168480919FC4885BAA1EF5F3ED12B2D03
IOException : Unable to create a messenger to
jxta://uuid-C9C9584023FA421AA3F0A79F128543642168480919FC4885BAA1EF5F3ED12B2D03/PipeService/urn:jxta:uuid-C9C9584023FA421AA3F0A79F128543647F57B1A44FDB469D9F94BE4B0722C28904|#]

I've turned the ShoalLogger to FINEST and collected logs.

In the logs I saw that, for example, for the 10-instance cluster, while one
instance was killed the "true View Size" was always 11. It looks like the other
instances were not properly notified that one instance had been killed. I have
attached one log in two formats: the original log and the same log with shorter
lines, to make it more readable. In this case the n1c1m4 instance was killed.



 Comments   
Comment by easarina [ 13/May/08 ]

Created an attachment (id=5)
server.log

Comment by easarina [ 13/May/08 ]

Created an attachment (id=6)
short server.log

Comment by sheetalv [ 09/Jul/08 ]

NA for Sailfin 1.0

Comment by sheetalv [ 09/Jul/08 ]

NA for Sailfin 1.0

Comment by sheetalv [ 27/Aug/08 ]

This issue is similar to issue 55.

*** This issue has been marked as a duplicate of issue 55. ***




[SHOAL-111] capability to configure requirement for authentication for GMS member to join group Created: 12/Nov/10  Updated: 12/Nov/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Improvement Priority: Critical
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 111

 Description   

Leverage certificate-based authentication (JAAS) to validate whether a GMS
member should be allowed to join a GMS group.
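
A minimal sketch of how a join-time check could leverage JAAS; the login configuration entry name "GMSMember" and the surrounding method are hypothetical, and the certificate validation itself would live in the configured LoginModule.

    import javax.security.auth.callback.CallbackHandler;
    import javax.security.auth.login.LoginContext;
    import javax.security.auth.login.LoginException;

    class JoinAuthenticationSketch {
        static boolean authenticateJoiner(CallbackHandler handler) {
            try {
                LoginContext lc = new LoginContext("GMSMember", handler);
                lc.login();           // the certificate-based LoginModule decides
                return true;          // member may proceed to join the group
            } catch (LoginException le) {
                return false;         // reject the join request
            }
        }
    }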



 Comments   
Comment by Joe Fialli [ 12/Nov/10 ]

Adjusted the subject title to state that there needs to be a configuration
capability to require authentication for GMS join.





[SHOAL-78] add isLoggable around logging that is lower than warning Created: 19/Sep/08  Updated: 12/Nov/10  Resolved: 12/Nov/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Minor
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 78

 Description   

Performance optimization:

Logging messages that concatenate strings as parameters waste processing time.
All logging calls that concatenate strings at a level lower than WARNING need to
follow this pattern:

if (logger.isLoggable(Level.XXX)) {
    logger.XXX("..." + "...." + ...);
}

where XXX is a logging level lower than WARNING.



 Comments   
Comment by Joe Fialli [ 19/Sep/08 ]

take ownership of task

Comment by Joe Fialli [ 19/Sep/08 ]

Accept task. Given that it is not high priority, it will be placed in queue.
Should be completed before Sailfin 1.5 ships.

Comment by Joe Fialli [ 01/Oct/08 ]

Partially fixed for distributed state cache logging messages.

Comment by Joe Fialli [ 12/Nov/10 ]

fixes checked in





[SHOAL-108] setting Bind Interface Address in grizzly transport Created: 16/Jul/10  Updated: 27/Oct/10  Resolved: 27/Oct/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: Stephen DiMilla Assignee: Bobby Bissett
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 108

 Description   

GMS does not detect an invalid GMS_BIND_INTERFACE_ADDRESS, which results in a
"No available ports exist" message. This needs to be fixed by failing with a
SEVERE message about the bad BIND_INTERFACE_ADDRESS.

The log messages generated are:
#|2010-07-16T09:02:32.689-0700|SEVERE|Shoal|ShoalLogger|_ThreadID=10;_ThreadName=main;ClassName=NetworkUtility;MethodName=getAvailableTCPPort;|Fatal
error. No available ports exist for 10.5.217.120 in range 9090 to 9120|#]

[#|2010-07-16T09:02:32.691-0700|SEVERE|Shoal|GMSAdminCLI|_ThreadID=10;_ThreadName=main;ClassName=GMSAdminCLI;MethodName=registerAndJoinCluster;|Exception
occured :com.sun.enterprise.ee.cms.core.GMSException: failed to join group
testgroup|#]



 Comments   
Comment by Joe Fialli [ 07/Oct/10 ]

*** Issue 105 has been marked as a duplicate of this issue. ***

Comment by Bobby Bissett [ 27/Oct/10 ]

Taking this one.

Comment by Bobby Bissett [ 27/Oct/10 ]

This is fixed in Shoal revision 1325 by adding a method
NetworkUtility#isBindAddressValid. However, it won't be called from GlassFish
until the next integration. See GF issue
https://glassfish.dev.java.net/issues/show_bug.cgi?id=14006 for information on
the integration.
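
A sketch of the kind of check such a validation can perform (the actual NetworkUtility#isBindAddressValid may differ): a bind address is usable only if it is the wildcard address or is owned by a local network interface.

    import java.net.InetAddress;
    import java.net.NetworkInterface;

    class BindAddressCheckSketch {
        static boolean isBindAddressValid(String addr) {
            try {
                InetAddress ia = InetAddress.getByName(addr);
                // 0.0.0.0 (or ::) is always bindable; otherwise some local interface must own the address.
                return ia.isAnyLocalAddress() || NetworkInterface.getByInetAddress(ia) != null;
            } catch (Exception e) {
                return false;   // unresolvable or malformed address
            }
        }
    }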





[SHOAL-110] enabling a second network interface in the bios causes view change issues Created: 19/Oct/10  Updated: 20/Oct/10  Resolved: 20/Oct/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: Rajiv Mordani Assignee: Joe Fialli
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Windows


Issuezilla Id: 110

 Description   

I enabled my second interface in the BIOS (didn't assign an IP or anything to
the interface, just enabled it in the BIOS), and then when I start a cluster GMS
does not seem to work well, which causes the Shoal cache not to work well
either. Once I disabled the interface in the BIOS, everything started working.



 Comments   
Comment by Joe Fialli [ 19/Oct/10 ]

Request for steps to recreate this issue.

Specifically, what was the order of the following operations:

  • asadmin start-domain
  • asadmin create-cluster
  • asadmin start-cluster
  • enabling the 2nd network interface

We cannot support changes to the network interfaces in the middle of the first
three operations above when GMS-BIND-INTERFACE-ADDRESS-cluster-name is not being
used on the DAS and all instances in the cluster.

The dynamically discovered first network address can change between the DAS
joining the cluster and the instances being started via start-cluster if a
network interface change is made in between.

Comment by Joe Fialli [ 19/Oct/10 ]

Additionally, we need to know whether GMS-BIND-INTERFACE-ADDRESS is being set for
the clustered instances and/or the DAS. (GMS-BIND-INTERFACE-ADDRESS should be set for
all clustered instances AND the DAS.)

Comment by Joe Fialli [ 20/Oct/10 ]

The enhancement scheduled for glassfish 3.2, described by
https://glassfish.dev.java.net/issues/show_bug.cgi?id=13056, would enable
detection of a misconfigured network and/or glassfish cluster.

It is not possible to diagnose this in gms itself, since the configuration is dynamic
at runtime and there is no single place that defines all instances in a cluster.
In glassfish, however, domain.xml holds the static configuration of the clustered
instances and the DAS. That info can be used by a tool (a 3.2 extension of "asadmin
validate-multicast") to validate that multicast is working between all members of the
glassfish cluster. If an issue is detected, the user is required to figure
out why multicast is not working for the cluster as configured.





[SHOAL-51] signal.getMemberDetails().get(WAITTIMEBEFORESTARTINGRECOVERY))) returns null Created: 16/Apr/08  Updated: 07/Oct/10  Resolved: 07/Oct/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Minor
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 51
Status Whiteboard:

shoal-shark-na


 Description   

The following stack trace happened 112 times in Steve's SIFT functional run.

java.lang.NumberFormatException: null
at java.lang.Integer.parseInt(Integer.java:415)
at java.lang.Integer.parseInt(Integer.java:497)
at
com.sun.enterprise.ee.server.autotxrecovery.core.TxnFailureRecoveryActionImpl.co
nsumeSignal(TxnFailureRecoveryActionImpl.java:99)
at com.sun.enterprise.ee.cms.impl.common.Router$CallableAction.call
(Router.java:509)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
at java.util.concurrent.FutureTask.run(FutureTask.java:123)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask
(ThreadPoolExecutor.java:650)
at java.util.concurrent.ThreadPoolExecutor$Worker.run
(ThreadPoolExecutor.java:675)
at java.lang.Thread.run(Thread.java:595)

#]

Here is source code line with NPE.
waitTime = Integer.parseInt((String)(signal.getMemberDetails().get
(WAITTIMEBEFORESTARTINGRECOVERY)));

The code in com.sun.enterprise.ee.cms.core.TxnFailureRecoveryActionImpl.java is
not written to allow for WAITTIMEBEFORESTARTINGRECOVERY to not be set. I am
assuming this constant should always be set and am filing the missing value as a
bug against shoal. If my assumption is incorrect, then this bug should be
reassigned to the glassfish issue tracker so the Failure Recovery handler can be fixed
to accommodate this scenario.
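For illustration, a defensive version of the failing consumer line could look like the sketch below (DEFAULT_WAIT_TIME is a hypothetical constant; only the getMemberDetails() lookup comes from the stack trace above):

// Sketch: tolerate the value being absent from the member details map.
String raw = (String) signal.getMemberDetails().get(WAITTIMEBEFORESTARTINGRECOVERY);
int waitTime = (raw != null) ? Integer.parseInt(raw) : DEFAULT_WAIT_TIME;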



 Comments   
Comment by Joe Fialli [ 16/Apr/08 ]

Initial submit was premature. made summary more specific.

Comment by shreedhar_ganapathy [ 16/Apr/08 ]

..

Comment by sheetalv [ 09/Jul/08 ]

Assigning to Joe.

Comment by sheetalv [ 28/Jul/08 ]

not important for Sailfin 0.5

Comment by Joe Fialli [ 27/Aug/08 ]

has not occurred recently so downgrading it.

Comment by Joe Fialli [ 07/Oct/10 ]

Values are not guaranteed to be in the distributed state cache for a failed member.
The member may have failed during initialization, which happened frequently with
port-in-use failures on startup in the glassfish v2.1 time frame.

The WAITFORTIME is no longer stored in the distributed state cache by the glassfish
v3.1 transaction service, but the TX_LOG_DIR property is. The new v3.1 code does
account for the value not being set.

Closing this issue. Functional testing of distributed state cache is desirable
to verify that all is working.





[SHOAL-105] enhance validation of GMS configuration property BIND_INTERFACE_ADDRESS Created: 31/Mar/10  Updated: 07/Oct/10  Resolved: 07/Oct/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 105

 Description   

Configuration of BIND_INTERFACE_ADDRESS should validate that provided
IP-ADDRESS/HOSTNAME is a local IP address.



 Comments   
Comment by Joe Fialli [ 31/Mar/10 ]

Accepting the error check. The system misbehaves and cannot find a valid port for the
server socket when BIND_INTERFACE_ADDRESS is accidentally set to a non-local IP
address. Need to address this for better usability.

Comment by Joe Fialli [ 07/Oct/10 ]

duplicate

      *** This issue has been marked as a duplicate of 108 ***




[SHOAL-107] MasterNode ensure delivery of GMS notifications over UDP Created: 15/Jun/10  Updated: 07/Oct/10  Resolved: 07/Oct/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 107

 Description   

GMS membership notifications such as JOIN, JOIN_AND_READY, FAILURE,
PLANNED_SHUTDOWN, FAILURE_SUSPECTED, and GroupLeadership are broadcast from the
MasterNode over UDP. A protocol will be developed between the master and the other
members of the group to ensure that each such event is delivered; if it is not, the
master will resend the event to ALIVE instances that have not acked receiving the
notification.

Currently, ensuring that the MasterNode is not a heavily loaded application
(such as the Domain Application Server in Glassfish, which does not run apps) and
tuning UDP buffers at the OS level has kept UDP messages from being dropped.
Addressing this issue will provide robust event delivery without requiring OS
tuning or partitioning of application load away from the Shoal GMS MasterNode.



 Comments   
Comment by Joe Fialli [ 07/Oct/10 ]

Fix checked in.

The master sends along the latest MasterViewID with every heartbeat message it
broadcasts. Each gms group member records the MasterViewID of every MasterChangeEvent it
has received. When a gms group member detects that it has not received a specific
masterViewID, it requests that the master resend it to just that member (via the more reliable TCP).

Tested this with simulated failure injection.
Wrote ReliableMulticast junit test.
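A rough sketch of the gap-detection logic described above (field and method names are assumptions, not the actual Shoal code):

// Sketch: every heartbeat carries the master's latest view sequence id; a gap
// means a MasterChangeEvent broadcast was missed and should be re-requested
// from the master over TCP.
void onMasterHeartbeat(long masterViewIdInHeartbeat) {
    if (masterViewIdInHeartbeat > lastMasterViewIdReceived) {
        requestResendFromMaster(lastMasterViewIdReceived + 1, masterViewIdInHeartbeat);
    }
}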





[SHOAL-65] add cluster name to HealthMonitor thread descriptions and log messages Created: 26/Jun/08  Updated: 07/Oct/10  Resolved: 07/Oct/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Task Priority: Trivial
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 65

 Description   

For each cluster/group, the following 3 threads get created in HealthMonitor.java.

"HealthMonitor", "InDoubtPeerDetector Thread" and "FailureVerifier Thread"

It would assist in investigating stack traces and logs with multiple clusters
if the cluster/group name were integrated into these names.

For example, if one has clusters cluster1, cluster2 and cluster3, these
names would be appended to the above thread names AND also be
included in relevant log messages to provide more complete context.



 Comments   
Comment by Joe Fialli [ 07/Oct/10 ]

Fixed.

this.healthMonitorThread =
new Thread(this, "HealthMonitor for Group:" + manager.getGroupName());





[SHOAL-106] Grizzly transport: sendMessage gets a NPE in NIOContext.configureOpType Created: 30/Apr/10  Updated: 07/Oct/10  Resolved: 07/Oct/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Critical
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 106

 Description   

The following occurred in a gms developer test: a gms client sent
a message to another instance, and when the other instance attempted to send
a reply back it received the error below. We have a workaround
for the failure BUT it is just a short-term hack to get testers past the
issue. We will be consulting with Grizzly developers on this issue to identify whether
it is a misuse by Shoal or an issue needing resolution in grizzly.

Here is the stack trace.

java.lang.NullPointerException when calling registered application callback
method com.sun.enterprise.ee.cms.tests.GMSAdminAgent.processNotification. The
method should have handled this exception.
java.lang.NullPointerException
at com.sun.grizzly.NIOContext.configureOpType(NIOContext.java:431)
at
com.sun.grizzly.connectioncache.client.CacheableConnectorHandler.notifyCallbackHandlerPseudoConnect(CacheableConnectorHandler.java:221)
at
com.sun.grizzly.connectioncache.client.CacheableConnectorHandler.doConnect(CacheableConnectorHandler.java:168)
at
com.sun.grizzly.connectioncache.client.CacheableConnectorHandler.connect(CacheableConnectorHandler.java:122)
at
com.sun.enterprise.mgmt.transport.grizzly.GrizzlyTCPConnectorWrapper.send(GrizzlyTCPConnectorWrapper.java:104)
at
com.sun.enterprise.mgmt.transport.grizzly.GrizzlyTCPConnectorWrapper.doSend(GrizzlyTCPConnectorWrapper.java:86)
at
com.sun.enterprise.mgmt.transport.AbstractMessageSender.send(AbstractMessageSender.java:34)
at
com.sun.enterprise.mgmt.transport.grizzly.GrizzlyNetworkManager.send(GrizzlyNetworkManager.java:478)
at com.sun.enterprise.mgmt.ClusterManager.send(ClusterManager.java:458)
at
com.sun.enterprise.ee.cms.impl.base.GroupCommunicationProviderImpl.sendMessage(GroupCommunicationProviderImpl.java:316)
at
com.sun.enterprise.ee.cms.impl.base.GroupHandleImpl.sendMessage(GroupHandleImpl.java:128)
at
com.sun.enterprise.ee.cms.tests.GMSAdminAgent.processNotification(GMSAdminAgent.java:449)
at
com.sun.enterprise.ee.cms.impl.client.MessageActionImpl.processMessage(MessageActionImpl.java:86)
at
com.sun.enterprise.ee.cms.impl.client.MessageActionImpl.consumeSignal(MessageActionImpl.java:69)
at
com.sun.enterprise.ee.cms.impl.common.Router.notifyMessageAction(Router.java:377)
at
com.sun.enterprise.ee.cms.impl.common.Router.notifyMessageAction(Router.java:402)
at
com.sun.enterprise.ee.cms.impl.common.SignalHandler.analyzeSignal(SignalHandler.java:128)
at
com.sun.enterprise.ee.cms.impl.common.SignalHandler.handleSignal(SignalHandler.java:106)
at
com.sun.enterprise.ee.cms.impl.common.SignalHandler.run(SignalHandler.java:91)
at java.lang.Thread.run(Thread.java:637)



 Comments   
Comment by Joe Fialli [ 30/Apr/10 ]

Was using Grizzly 1.9.19 beta2 when this occurred.

Comment by Joe Fialli [ 07/Oct/10 ]

NPE is fixed in Grizzly transport.





[SHOAL-101] very intermittent - ABSTRACT_TRANSPORT BRANCH: dropped Shoal message(using Grizzly transport) in distributed system testing Created: 18/Mar/10  Updated: 07/Oct/10  Resolved: 07/Oct/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Critical
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Text File shoal_bug_101_instance105.log     Text File shoal_bug_101_instance106.log    
Issuezilla Id: 101

 Description   

Running HAMessageBuddyReplicationSimulator (see the shoal workspace developer test
script runHAMessageBuddyReplicationSimulator.sh) on a distributed group of 9
instances, roughly 1 out of 5 full test runs shows the
following message drop.

The test confirms a dropped message when the 2 exceptions below occur in the
server logs.

Test output detecting a dropped message:

Never received objectId:45 msgId:248, from:106
---------------------------------------------------------------
106: FAILED. Confirmed (1) messages were dropped

Here is the matching exception.

[#|2010-03-18T11:18:41.831-0700|WARNING|Shoal|ShoalLogger|_ThreadID=26;_ThreadName=-WorkerThread(31);ClassName=NetworkUtility;MethodName=deserialize;|NetworkUtility.deserialized
current objects:
messages=

{NAD=com.sun.enterprise.ee.cms.impl.base.SystemAdvertisementImpl@e8f7fdef, targetPeerId=192.168.46.109:9130:2299:cluster1:n1c1m9, sourcePeerId=192.168.46.108:9130:2299:cluster1:n1c1m8}

failed while deserializing
name=APPMESSAGE
java.io.StreamCorruptedException: invalid type code: 58
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1356)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1947)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1871)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1753)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
at
com.sun.enterprise.mgmt.transport.NetworkUtility.deserialize(NetworkUtility.java:419)
at
com.sun.enterprise.mgmt.transport.MessageImpl.readMessagesFromBytes(MessageImpl.java:233)
at
com.sun.enterprise.mgmt.transport.MessageImpl.parseMessage(MessageImpl.java:214)
at
com.sun.enterprise.mgmt.transport.grizzly.GrizzlyMessageProtocolParser.hasNextMessage(GrizzlyMessageProtocolParser.java:140)
at
com.sun.grizzly.filter.ParserProtocolFilter.execute(ParserProtocolFilter.java:139)
at
com.sun.grizzly.DefaultProtocolChain.executeProtocolFilter(DefaultProtocolChain.java:135)
at com.sun.grizzly.DefaultProtocolChain.execute(DefaultProtocolChain.java:102)
at com.sun.grizzly.DefaultProtocolChain.execute(DefaultProtocolChain.java:88)
at
com.sun.grizzly.ProtocolChainContextTask.doCall(ProtocolChainContextTask.java:53)
at com.sun.grizzly.SelectionKeyContextTask.call(SelectionKeyContextTask.java:57)
at com.sun.grizzly.NIOContext.execute(NIOContext.java:510)
at
com.sun.grizzly.SelectorHandlerRunner.handleSelectedKey(SelectorHandlerRunner.java:357)
at
com.sun.grizzly.SelectorHandlerRunner.handleSelectedKeys(SelectorHandlerRunner.java:257)
at
com.sun.grizzly.SelectorHandlerRunner.doSelect(SelectorHandlerRunner.java:194)
at com.sun.grizzly.SelectorHandlerRunner.run(SelectorHandlerRunner.java:129)
at
com.sun.grizzly.util.FixedThreadPool$BasicWorker.dowork(FixedThreadPool.java:379)
at
com.sun.grizzly.util.FixedThreadPool$BasicWorker.run(FixedThreadPool.java:360)
at java.lang.Thread.run(Thread.java:619)

#]

Mar 18, 2010 11:18:41 AM
com.sun.enterprise.mgmt.transport.grizzly.GrizzlyMessageProtocolParser
hasNextMessage
WARNING: hasNextMessage()
Thread:-WorkerThread(31),position:6744,nextMsgStartPos:0,expectingMoreData:true,hasMoreBytesToParse:false,error:false,msg
size:5405,message: MessageImpl[v1:CLUSTER_MANAGER_MESSAGE:NAD, Target:
192.168.46.109:9130:2299:cluster1:n1c1m9 , Source:
192.168.46.108:9130:2299:cluster1:n1c1m8,
com.sun.enterprise.mgmt.transport.MessageIOException: failed to deserialize a
message : name = APPMESSAGE

Have not seen this issue occur running the checked-in
runHAMessageBuddyReplicationSimulator.sh on a single machine
with 10 instances in the cluster. Will double-check this by running it several times.

Also verifying that no messages are dropped when running shoal over the jxta
transport.



 Comments   
Comment by Joe Fialli [ 18/Mar/10 ]

accepting issue.

Comment by Joe Fialli [ 18/Mar/10 ]

Created an attachment (id=22)
shoal logs running test runHAMessageBuddyReplicationSimulator with exception in grizzly transport layer receiving the message

Comment by Joe Fialli [ 18/Mar/10 ]

Created an attachment (id=23)
server log of instance sending message that was lost on instance106 - nothing in log that is helpful. Just added for completeness that issue is only showing on receiving side, no send error noted.

Comment by Joe Fialli [ 07/Oct/10 ]

The NPE is fixed in grizzly.





[SHOAL-32] test for getCurrentAliveOrReadyMembers Created: 23/Jan/08  Updated: 07/Oct/10  Resolved: 07/Oct/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Task Priority: Major
Reporter: sheetalv Assignee: sheetalv
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: OpenSolaris


Issuezilla Id: 32

 Description   

Need to test the above API from a performance and correctness perspective.



 Comments   
Comment by Joe Fialli [ 07/Oct/10 ]

method is deprecated.

replaced by GroupHandle.getCurrentAliveAndReadyCoreView().

There is a dev test already written for it and it is run nightly.





[SHOAL-96] Re-joining a group throws a NPE exception. Created: 19/Jan/10  Updated: 07/Oct/10  Resolved: 07/Oct/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: dhcavalcanti Assignee: shreedhar_ganapathy
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Linux


Issuezilla Id: 96

 Description   

If a peer joins a group, then leaves the group, and then joins the group again,
the shoal framework throws a NullPointerException.

Here is the code that produces the problem:

package sample.shoal;

import com.sun.enterprise.ee.cms.core.GMSConstants.shutdownType;
import com.sun.enterprise.ee.cms.core.GMSFactory;
import com.sun.enterprise.ee.cms.core.GroupManagementService;

public class Sample {

private static final String GROUP_NAME = "MyGroup";
private static final String PEER_NAME = "Peer";
private static final int WAIT_PERIOD = 7000;

    public static void main(String[] args) throws Exception {
        GroupManagementService gms = (GroupManagementService) GMSFactory.startGMSModule(
                PEER_NAME, GROUP_NAME, GroupManagementService.MemberType.CORE, null);
        gms.join();
        Thread.sleep(WAIT_PERIOD);
        gms.shutdown(shutdownType.INSTANCE_SHUTDOWN);
        Thread.sleep(WAIT_PERIOD);
        gms.join();
        Thread.sleep(WAIT_PERIOD);
        gms.shutdown(shutdownType.INSTANCE_SHUTDOWN);
        Thread.sleep(WAIT_PERIOD);
        System.exit(0);
    }

}

And here is the output log:

Jan 19, 2010 3:52:50 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group MyGroup : Members in view for
MASTER_CHANGE_EVENT(before change analysis) are :
1: MemberId: Peer, MemberType: CORE, Address: urn:jxta:uuid-
59616261646162614A7874615032503384B6AF8DCB384ABA9DB5B80617CF675D03

Jan 19, 2010 3:52:50 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT for Member: Peer of Group: MyGroup
Jan 19, 2010 3:52:50 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
addGroupLeadershipNotificationSignal
INFO: adding GroupLeadershipNotification signal leaderMember: Peer of group:
MyGroup
Jan 19, 2010 3:52:55 PM com.sun.enterprise.jxtamgmt.ClusterViewManager addToView
WARNING: no changes from previous view, skipping notification of listeners for
cluster view event MASTER_CHANGE_EVENT from member: Peer group: MyGroup
Jan 19, 2010 3:52:55 PM com.sun.enterprise.jxtamgmt.MasterNode appointMasterNode
INFO: Assuming Master Node designation member:Peer for group:MyGroup
Jan 19, 2010 3:52:57 PM com.sun.enterprise.ee.cms.impl.jxta.GMSContext leave
INFO: Leaving GMS group MyGroup with shutdown type set to InstanceShutdown
Exception in thread "MessageWindowThread:MyGroup" java.lang.NullPointerException
at
com.sun.enterprise.ee.cms.impl.jxta.MessageWindow.run(MessageWindow.java:86)
at java.lang.Thread.run(Thread.java:619)
Jan 19, 2010 3:53:04 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group MyGroup : Members in view for
MASTER_CHANGE_EVENT(before change analysis) are :
1: MemberId: Peer, MemberType: CORE, Address: urn:jxta:uuid-
59616261646162614A7874615032503384B6AF8DCB384ABA9DB5B80617CF675D03

Jan 19, 2010 3:53:04 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT for Member: Peer of Group: MyGroup
Jan 19, 2010 3:53:04 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
addGroupLeadershipNotificationSignal
INFO: adding GroupLeadershipNotification signal leaderMember: Peer of group:
MyGroup
Jan 19, 2010 3:53:09 PM com.sun.enterprise.jxtamgmt.ClusterViewManager addToView
WARNING: no changes from previous view, skipping notification of listeners for
cluster view event MASTER_CHANGE_EVENT from member: Peer group: MyGroup
Jan 19, 2010 3:53:09 PM com.sun.enterprise.jxtamgmt.MasterNode appointMasterNode
INFO: Assuming Master Node designation member:Peer for group:MyGroup
Jan 19, 2010 3:53:11 PM com.sun.enterprise.ee.cms.impl.jxta.GMSContext leave
INFO: Leaving GMS group MyGroup with shutdown type set to InstanceShutdown



 Comments   
Comment by Joe Fialli [ 07/Oct/10 ]

Fixed.

Added a junit regression test in GroupManagementServiceImplTest for this scenario.





[SHOAL-109] optimize virtual broadcast message send Created: 19/Aug/10  Updated: 19/Aug/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 109

 Description   

The virtual broadcast that iterates over each active instance and sends over TCP
inefficiently re-serializes the payload for every instance it sends to.

When udp broadcast is used, the payload of a gms send message is serialized once
and then broadcast to all instances in the cluster. Correct this inefficiency:
DistributedStateCache and GroupHandle.sendMessage(String targetComponent,
byte[]) currently serialize the
GMSMessage object FOR EACH INSTANCE in the cluster.

This change will not impact GMS notifications or heartbeats since they rely on
udp broadcast of gms sendMessage.
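The intended shape of the optimization is sketched below (names are illustrative; the point is that serialization happens once, outside the per-instance loop):

// Sketch: serialize the GMSMessage once, then reuse the same bytes for every
// point-to-point send instead of re-serializing per instance.
byte[] payload = serialize(gmsMessage);        // done once
for (PeerID target : aliveCoreMembers) {
    sender.send(target, payload);              // reuse the bytes
}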






[SHOAL-21] When MasterNode is killed, then other surviving member does not assume master role Created: 11/May/07  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Critical
Reporter: shreedhar_ganapathy Assignee: shreedhar_ganapathy
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 21

 Description   

Reported by user Kapucin
https://shoal.dev.java.net/servlets/ReadMsg?list=users&msgNo=21

When the Master Node is killed, another master does not get selected. Also,
many probeNode() messages are seen on the surviving member.



 Comments   
Comment by shreedhar_ganapathy [ 11/May/07 ]

Marked as a P2

Comment by shreedhar_ganapathy [ 11/May/07 ]

Fix has been checked in per following cvs message:
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=318





[SHOAL-22] HealthMonitor refactoring to improve node state maintenance and detection Created: 24/May/07  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Task Priority: Critical
Reporter: hamada Assignee: shreedhar_ganapathy
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Text File issue22.patch    
Issuezilla Id: 22
Status Whiteboard:

Review requested


 Description   

Currently the HealthMonitor module uses 2 collections and maps to maintain and
detect cluster member states. Maintaining states in multiple collections and
maps requires proper locking on the affected objects during critical operations,
which can lead to long backlogs and eventually to false failure detections.

The patch removes all synchronization, uses a single hashtable to maintain
member states, and detects failures from it. Missed heartbeats are computed from an
entry's timestamp and the expected heartbeat timeout, thus eliminating the need to
maintain a retries table.
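The missed-heartbeat computation the patch relies on is effectively the following (a sketch with illustrative names, not the patch itself):

// Sketch: derive missed heartbeats from the age of a member's last entry
// instead of maintaining a separate retries table.
long age = System.currentTimeMillis() - entry.timestamp;
long missed = age / heartbeatTimeoutMillis;
if (missed >= maxMissedHeartbeats) {
    markInDoubt(entry.memberId);
}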



 Comments   
Comment by hamada [ 24/May/07 ]

Created an attachment (id=2)
refactors HealthMonitor to utilize a single table for state maintainance and failure detection

Comment by hamada [ 24/May/07 ]

Review requested

Comment by shreedhar_ganapathy [ 09/Jun/07 ]

Reviewed and integrated changes in last week of May 07.





[SHOAL-28] Add multicluster support Created: 10/Nov/07  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: New Feature Priority: Critical
Reporter: hamada Assignee: hamada
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Text File issue28.patch     Java Source File NetworkManager.java    
Issuezilla Id: 28

 Description   

The following patch adds multi-cluster support by separating multicast out of
the world group and into the net group, thus allowing total isolation down to
the network layer.



 Comments   
Comment by hamada [ 10/Nov/07 ]

Created an attachment (id=3)
Allows for multi cluster creation

Comment by shreedhar_ganapathy [ 28/Jan/08 ]

The patch supporting multi-cluster did not work after the netPeerGroup, started
and stopped fields in NetworkManager, which were earlier static, were made non-static.
The real issue with Jxta then surfaces: the WorldPeerGroup does not allow
multiple infrastructure peer groups, each with their own multicast address
space.

Also see the related issue filed in Sailfin that is affecting a CLB requirement. An
early fix for this issue would help them proceed with their module's testing.

Comment by shreedhar_ganapathy [ 28/Jan/08 ]

Related Sailfin Issue
https://sailfin.dev.java.net/issues/show_bug.cgi?id=446

Comment by sheetalv [ 30/Jan/08 ]

Created an attachment (id=4)
updated NetworkManager.java

Comment by sheetalv [ 30/Jan/08 ]

I tried running rungmsdemo.sh with the updated NetworkManager class. Made "netPeerGroup",
"started" and "stopped" non-static variables.
Still no luck. I see the following exception while running the test:
[#|2008-01-30T10:01:24.871-0800|WARNING|Shoal|ShoalLogger|
_ThreadID=25;_ThreadName=pool-1-
thread-4;ClassName=DistributedStateCacheImpl;MethodName=getFromCacheForPattern;|
GMSException during DistributedStateCache Sync....com.sun.enterprise.ee.cms.core.GMSException:
java.io.IOException: Unable to create a messenger to jxta://
uuid-1070C0A4278E4D80AA34A4BA3D45F734CBEC5C6CD8414CB09EF1AC46AC045C7D03/
PipeService/
urn:jxta:uuid-1070C0A4278E4D80AA34A4BA3D45F7346521C3C52E4443928082812BCDC1E25B04|#]

I see the above exception stemming from some other APIs while running the multi-group test.
In the multi-group test, there are 2 instances running in separate VMs trying to join 2 groups.
Instances A and B join groups G1 and G2. A and B send messages to both groups. B receives the
messages for group1 but not for group2. Similarly, A receives messages for group1 but not for group2.
The following exception is seen when B is trying to send a message to group2.

Jan 30, 2008 2:35:55 PM com.sun.enterprise.jxtamgmt.HealthMonitor send
WARNING: Failed to send message
java.io.IOException: Unable to create a messenger to jxta://
uuid-45716DDDE2C34663A79D9C808283F839117F856F1C3A4CC083A5D3A0BC2CCD7F03/
PipeService/
urn:jxta:uuid-45716DDDE2C34663A79D9C808283F8397F57B1A44FDB469D9F94BE4B0722C28904
at net.jxta.impl.pipe.BlockingWireOutputPipe.checkMessenger(BlockingWireOutputPipe.java:221)
at net.jxta.impl.pipe.BlockingWireOutputPipe.send(BlockingWireOutputPipe.java:245)
at com.sun.enterprise.jxtamgmt.HealthMonitor.send(HealthMonitor.java:426)
at com.sun.enterprise.jxtamgmt.HealthMonitor.reportMyState(HealthMonitor.java:359)
at com.sun.enterprise.jxtamgmt.HealthMonitor.process(HealthMonitor.java:289)
at com.sun.enterprise.jxtamgmt.HealthMonitor.pipeMsgEvent(HealthMonitor.java:217)
at net.jxta.impl.pipe.InputPipeImpl.processIncomingMessage(InputPipeImpl.java:219)
at net.jxta.impl.pipe.WirePipe.callLocalListeners(WirePipe.java:374)
at net.jxta.impl.pipe.WirePipe.processIncomingMessage(WirePipe.java:350)
at net.jxta.impl.pipe.WirePipeImpl.processIncomingMessage(WirePipeImpl.java:338)
at net.jxta.impl.endpoint.EndpointServiceImpl.processIncomingMessage(EndpointServiceImpl.java:
989)
at net.jxta.impl.endpoint.EndpointServiceInterface.processIncomingMessage
(EndpointServiceInterface.java:352)
at net.jxta.impl.rendezvous.RendezVousServiceProvider.processReceivedMessage
(RendezVousServiceProvider.java:502)
at net.jxta.impl.rendezvous.StdRendezVousService.processReceivedMessage
(StdRendezVousService.java:240)
at net.jxta.impl.rendezvous.RendezVousServiceProvider.processIncomingMessage
(RendezVousServiceProvider.java:159)
at net.jxta.impl.endpoint.EndpointServiceImpl.processIncomingMessage(EndpointServiceImpl.java:
989)
at net.jxta.impl.endpoint.EndpointServiceInterface.processIncomingMessage
(EndpointServiceInterface.java:352)
at net.jxta.impl.endpoint.mcast.McastTransport.processMulticast(McastTransport.java:752)
at net.jxta.impl.endpoint.mcast.McastTransport$DatagramProcessor.run(McastTransport.java:874)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
at java.lang.Thread.run(Thread.java:613)
Jan 30, 2008 2:35:55 PM com.sun.enterprise.ee.cms.tests.multigroupjoin.MultiGroupJoinTest main
SEVERE: Exception occured while joining group:com.sun.enterprise.ee.cms.core.GMSException:
java.io.IOException: Unable to create a messenger to jxta://
uuid-45716DDDE2C34663A79D9C808283F839117F856F1C3A4CC083A5D3A0BC2CCD7F03/
PipeService/
urn:jxta:uuid-45716DDDE2C34663A79D9C808283F8396521C3C52E4443928082812BCDC1E25B04

Comment by sheetalv [ 28/Feb/08 ]

Mo has fixed this issue. The MultiGroupJoinTest shows that an instance can be
part of 2 groups and send and receive messages in both.

change log :
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=525
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=526
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=527





[SHOAL-42] masterNode.getRouteControl().isConnected returns false intermittently or all the time Created: 16/Feb/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Critical
Reporter: sheetalv Assignee: hamada
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Macintosh


Issuezilla Id: 42

 Description   

Hi Mo,
I added a log message to print out the value of isConnected() in the HealthMonitor class. I ran the
rungmsdemo.sh test on 2 terminals and I see the value printed as false all the time.
Please comment. This is crucial since the "false" value would mean that our logic to determine
IN_DOUBT state would need to be altered.
Thanks
Sheetal

On Feb 14, 2008, at 4:27 PM, Sheetal Vartak wrote:

Hi Mo,
As you are aware, one of the instances that is started while running my MultiGroupTest suddenly
decides to go into IN_DOUBT state. I was looking at the values computed for
masterNode.getRouteControl().isConnected(entry.id). I found that the value is sometimes false and
sometimes true. What I don't understand is how the value can be false at one point (time = t) and then
true at some other point (time t+delta).
Can you please shed some light on this?
Thanks
Sheetal



 Comments   
Comment by sheetalv [ 28/Feb/08 ]

Mo has fixed this issue in the single cluster scenario. It still does not work
correctly in a multi-cluster environment. The MultiGroupJoinTest produces false
failures due to isConnected() returning false intermittently.
Test can be run in 2 terminals as follows :
sh runmultigroupjointest.sh C1
sh runmultigroupjointest.sh C2

change log :
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=531

Comment by sheetalv [ 04/Mar/08 ]

This issue is now resolved. Added a fix in HealthMonitor to not check
isConnected for the instance in whose VM the check is running. The HealthMonitor's
InDoubtPeerDetector thread now iterates through all the entries but skips its
own entry, since isConnected() obviously returns false for the member itself.

https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=536
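The fix amounts to a guard of roughly this shape inside the detector loop (a sketch; only the isConnected() call comes from the discussion above, the other names are assumptions):

for (HealthMessage.Entry entry : entriesSnapshot) {
    if (entry.id.equals(localPeerID)) {
        continue; // isConnected() always reports false for the local member
    }
    if (!masterNode.getRouteControl().isConnected(entry.id)) {
        // proceed with in-doubt determination for this remote member
    }
}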





[SHOAL-43] unable to create messenger exception from JXTA Created: 16/Feb/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Critical
Reporter: sheetalv Assignee: hamada
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Macintosh


Issuezilla Id: 43

 Description   

Hi Mo,
I still keep seeing exceptions like the following, even when just running rungmsdemo.sh:

[#|2008-02-15T16:57:14.511-0800|WARNING|Shoal|ShoalLogger|_ThreadID=14;_ThreadName=pool-
1-thread-
3;ClassName=DistributedStateCacheImpl;MethodName=getFromCacheForPattern;|GMSException during
DistributedStateCache Sync....com.sun.enterprise.ee.cms.core.GMSException: java.io.IOException:
Unable to create a messenger to jxta://uuid-
1070C0A4278E4D80AA34A4BA3D45F734CBEC5C6CD8414CB09EF1AC46AC045C7D03/PipeService/urn
:jxta:uuid-1070C0A4278E4D80AA34A4BA3D45F7346521C3C52E4443928082812BCDC1E25B04|#]

There are no changes in my workspace except for some minor log messages. This should be
reproducible just by running the ApplicationServer test.

Can you please look into this? I have noticed it after turning the log level to INFO. It probably got lost
when the log level was set to FINEST.

Simply start the test in 2 terminals as follows :

<terminal 1> sh rungmsdemo.sh C1 G1 CORE 200000 INFO
<terminal 2> sh rungmsdemo.sh C2 G1 CORE 200000 INFO

Thanks
Sheetal



 Comments   
Comment by sheetalv [ 28/Feb/08 ]

This issue has been fixed. Please see

https://shoal.dev.java.net/issues/show_bug.cgi?id=28

Comment by shreedhar_ganapathy [ 25/Mar/08 ]

Vivek reports seeing this with GF v2.1 build 24c which has Jxta jar svn version
537.

[#|2008-03-25T17:41:49.101-0700|WARNING|sun-appserver9.1|ShoalLogger|_ThreadID=20;_ThreadName=pool-2-thread-4;_RequestID=378e023c-9136-4a10-b128-278092756024;|GMSException
during DistributedStateCache
Sync....com.sun.enterprise.ee.cms.core.GMSException: java.io.IOException: Unable
to create a messenger to
jxta://uuid-D061310EC6A64B22A06AB63D5D1A4DC47987FC1134E54090AB24B0C9E01AD7DF03/PipeService/urn:jxta:uuid-D061310EC6A64B22A06AB63D5D1A4DC46521C3C52E4443928082812BCDC1E25B04|#]

[#|2008-03-25T17:41:49.103-0700|SEVERE|sun-appserver9.1|javax.enterprise.resource.corba|_ThreadID=20;_ThreadName=pool-2-thread-4;_RequestID=378e023c-9136-4a10-b128-278092756024;|The
log message is null.
java.lang.NullPointerException
at
com.sun.enterprise.ee.ejb.iiop.IiopFolbGmsClient.addMember(IiopFolbGmsClient.java:359)
at
com.sun.enterprise.ee.ejb.iiop.IiopFolbGmsClient.handleSignal(IiopFolbGmsClient.java:286)
at
com.sun.enterprise.ee.ejb.iiop.IiopFolbGmsClient.consumeSignal(IiopFolbGmsClient.java:174)
at
com.sun.enterprise.ee.cms.impl.common.Router$CallableAction.call(Router.java:509)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
at java.util.concurrent.FutureTask.run(FutureTask.java:123)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
at java.lang.Thread.run(Thread.java:595)

Comment by sheetalv [ 25/Apr/08 ]

This issue has been fixed and is available in promoted build b31 of
SJSAS91_FCS_BRANCH.





[SHOAL-68] HealthMonitor's getMemberState should not make a network roundtrip for a peer's own health state Created: 16/Jul/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Critical
Reporter: shreedhar_ganapathy Assignee: sheetalv
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Windows


Issuezilla Id: 68
Status Whiteboard:

shoal-shark-na


 Description   

When a client component in the same VM calls
JoinNotificationSignal.getMemberState(), the eventual call results in HM making
a network call. This is okay for other peers but when the member involved is the
VM itself, the call should check for that and consult the local health state
cache and return that state.

This is critical for Sailfin's CLB



 Comments   
Comment by sheetalv [ 28/Jul/08 ]

will fix for Sailfin 1.5.

Comment by sheetalv [ 27/Aug/08 ]

needs to be a P2 since it needs to be fixed for Sailfin 1.5

Comment by sheetalv [ 23/Oct/08 ]

This issue has been fixed. A check to see if the member is asking for its own state has already been
checked in.





[SHOAL-77] CMS SIGSEGV error generated by the JVM Created: 16/Sep/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Closed
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Critical
Reporter: andbur Assignee: sheetalv
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Text File PL5_hserr_pid2847.log    
Issuezilla Id: 77

 Description   

Couldn't change two of the header fields above...
"Found in version" should be 1.0 and
subcomponent = server_lifecycle

Operating System : Linux, CXP9013152/1 R2B02
Sailfin version : v5 b37g

# An unexpected error has been detected by Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00002b5f56ecc650, pid=2847, tid=1098402112
#
# Java VM: Java HotSpot(TM) 64-Bit Server VM (10.0-b22 mixed mode linux-amd64)
# Problematic frame:
# C  [libc.so.6+0x72650]  strlen+0x20
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

--------------- T H R E A D ---------------

Current thread (0x00002aab77ecc400):
JavaThread "com.sun.enterprise.ee.cms.impl.common.Router Thread"
[_thread_in_native, id=2934, stack(0x0000000041764000,0x0000000041785000)]

siginfo:si_signo=SIGSEGV: si_errno=0, si_code=1 (SEGV_MAPERR),
si_addr=0x0000000000000009

Registers:
RAX=0x0000000000000000, RBX=0x000000004177d250, RCX=0x000000004177d250,
RDX=0x000000004177d330
RSP=0x000000004177caa8, RBP=0x000000004177d120, RSI=0x0000000000000009,
RDI=0xfffffffffffffff7
R8 =0xfffffffffffffff9, R9 =0x00002b5f56f5b8c0, R10=0x0000000000000000,
R11=0x0000000000000000
R12=0x0000000000000009, R13=0x0000000000000000, R14=0x000000004177cff4,
R15=0x0000000000000000
RIP=0x00002b5f56ecc650, EFL=0x0000000000010297, CSGSFS=0x0000000000000033,
ERR=0x0000000000000004
TRAPNO=0x000000000000000e

Top of Stack: (sp=0x000000004177caa8)
0x000000004177caa8: 00002b5f56e9e98a 00000000000005e8
0x000000004177cab8: 0000000000000000 000000004177d0e0



 Comments   
Comment by andbur [ 16/Sep/08 ]

Created an attachment (id=10)
hserr_pid2847 log file found on payload node no 5 (PL5)

Comment by sheetalv [ 27/Oct/08 ]

Need more information on how to reproduce this issue. Does this issue still occur with the latest Sailfin
1.5 promoted build?

Comment by andbur [ 28/Oct/08 ]

We have only seen this once, on SGCS 1.0 build 36g, and have not been able to
reproduce it since. I think it's okay to close this issue unless you can see
some obvious fault based on the SIGSEGV dump; if we see it again in 1.5 or
1.0 we'll open the issue again to help you troubleshoot.

Comment by sheetalv [ 31/Jul/09 ]

not enough information.





[SHOAL-81] Propagate Senders HM.Entry seqid in sent HealthMessage Created: 02/Oct/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Critical
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Text File unexpectedfailure.log    
Issuezilla Id: 81

 Description   

Requires changes in HealthMessage.initialize() and getDocument().

HealthMessage.getDocument() should write the HealthMessage.Entry.sequenceId into
its XML document representation.
HealthMessage.initialize() should read the sender's sequence id for the health
message entry from the XML document representation.

Currently, the receiver side is just creating a sequence id based on the
order in which messages are received. The Jxta messaging protocol does not guarantee
that messages are received in the precise order that they were sent, so the
current sequencing mechanism can result in out-of-order processing
of health messages. This could result in an incorrectly computed cache state for an
instance in the master node.



 Comments   
Comment by Joe Fialli [ 06/Oct/08 ]

Created an attachment (id=12)
server log summarizing out of order message processing

Comment by Joe Fialli [ 06/Oct/08 ]

https://shoal.dev.java.net/nonav/issues/showattachment.cgi/12/unexpectedfailure.log

Following attachment summarizes a failure that occurs due to this defect.
Messages are sent by instance in following order:
aliveandready
clusterstopping
stopping

The DAS (master node) receives the messages in the following order:
stopping (receiving side seqid 960)
clusterstopping (receiving side seqid 961)
aliveandready (receiving side seqid 963)

The DAS processes the message in following order:
clusterstopping (961)
stopping(960)
aliveandready (963)

The aliveandready message being processed last makes a stopped instance
appear to come back to life as far as Master is concerned.
It is then marked as INDOUBT by master and then verified FAILED.
Must correct this ordering issue to fix this.

Comment by Joe Fialli [ 11/Nov/08 ]

Fix delivered. The sender's sequence id is now propagated.

Also, the member's start time and the sequence id are used together to order messages across
one invocation and a restart invocation of a server instance.
(The Nodeagent can restart a failed instance quickly, so this can happen.)
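Conceptually, the receiver now orders entries by sender-supplied values rather than by arrival order; a sketch of that comparison (the type and field names are illustrative assumptions):

// Sketch: an entry is newer if it comes from a later incarnation of the member
// (start time), or from the same incarnation with a higher sender-assigned
// sequence id.
boolean isNewer(Entry candidate, Entry last) {
    if (candidate.memberStartTime != last.memberStartTime) {
        return candidate.memberStartTime > last.memberStartTime;
    }
    return candidate.sequenceId > last.sequenceId;
}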





[SHOAL-3] GMS SPI does not expose receive capability Created: 12/Nov/06  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Improvement Priority: Major
Reporter: shreedharganapathy Assignee: sheetalv
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 3

 Description   

The GMS SPI GroupCommunicationProvider.java does not expose a receive() method
so that SPI implementations may handle messages received from the underlying
Group Communication Provider.
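Purely as an illustration of the request, such a hook could take a shape like the sketch below; this is a hypothetical addition, not an existing Shoal API:

// Hypothetical SPI addition: the provider hands each received group message
// back to the GMS layer for dispatch to registered message actions.
void receiveMessage(Object message, String senderMemberToken);

Existing provider implementations would then need to supply this method, possibly as a no-op.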



 Comments   
Comment by shreedharganapathy [ 17/Jan/07 ]

..

Comment by shreedhar_ganapathy [ 16/Feb/07 ]

..

Comment by sheetalv [ 09/Jul/08 ]

assigning to self





[SHOAL-4] GMS SPI does not expose Send to more than one member Created: 12/Nov/06  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: shreedharganapathy Assignee: shreedhar_ganapathy
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 4

 Description   

GMS SPI GroupCommunicationProvider.java does not expose a sendMessage() method
that allows for sending messages to all or more than one member.



 Comments   
Comment by shreedharganapathy [ 17/Jan/07 ]

..

Comment by shreedhar_ganapathy [ 31/Jan/07 ]

Fixes Checked in per cvs log message below:
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=198
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=197





[SHOAL-5] GMS SPI should use more that a String to identify members Created: 12/Nov/06  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Improvement Priority: Major
Reporter: shreedharganapathy Assignee: sheetalv
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 5

 Description   

The GMS SPI currently uses a String to identify a member, which may work in many
cases. We need to consider a better way to represent a member such that
implementations can easily map such a member to the underlying Group
Communication Provider's member representation.

We may also want to allow for the possibility that applications using GMS may
not want to identify their members as a String.



 Comments   
Comment by shreedharganapathy [ 17/Jan/07 ]

..

Comment by sheetalv [ 09/Jul/08 ]

assigning to self





[SHOAL-6] GroupHandle needs to expose API to sendMessage to more than one member Created: 12/Nov/06  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: shreedharganapathy Assignee: shreedhar_ganapathy
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 6

 Description   

GroupHandle, which is the client-side API for interacting with the group, does
not currently expose a method to send messages to all or more than one member.



 Comments   
Comment by shreedharganapathy [ 17/Jan/07 ]

..

Comment by shreedhar_ganapathy [ 31/Jan/07 ]

Actually it does. This is a filing error.





[SHOAL-7] DistributedStateCacheImpl should not send messages to members individually Created: 12/Nov/06  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: shreedharganapathy Assignee: shreedhar_ganapathy
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 7

 Description   

DistributedStateCacheImpl currently interprets sendMessage(List<String>, ...)
as sending to each member in the list one by one. This does not take advantage
of Group Comm libraries that allow messages to be sent to a group or a list of members.
<code>
private synchronized void sendMessage(final List<String> members,
                                      final DSCMessage msg)
        throws GMSException {
    if (members != null && !members.isEmpty()) {
        for (String member : members) {
            sendMessage(member, msg);
        }
    }
}
</code>
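The eventual fix (visible in the checked-in diff quoted in the comment below) replaces these per-member loops with a single group-level send, roughly:

// Sketch of the direction taken by the fix: a null member target means
// "send to the whole group", so the payload goes out once instead of once
// per member.
sendMessage(null, msg);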



 Comments   
Comment by shreedharganapathy [ 17/Jan/07 ]

..

Comment by shreedhar_ganapathy [ 01/Feb/07 ]

User: shreedhar_ganapathy
Date: 2007/02/01 19:22:31

Modified:

shoal/gms/src/java/com/sun/enterprise/ee/cms/impl/jxta/DistributedStateCacheImpl.java

Log:
Fix for Issue # 7
https://shoal.dev.java.net/issues/show_bug.cgi?id=7
DSC impl now sends it to all members where appropriate.

File Changes:

Directory: /shoal/gms/src/java/com/sun/enterprise/ee/cms/impl/jxta/
===================================================================

File [changed]: DistributedStateCacheImpl.java
Url:
https://shoal.dev.java.net/source/browse/shoal/gms/src/java/com/sun/enterprise/ee/cms/impl/jxta/DistributedStateCacheImpl.java?r1=1.3&r2=1.4
Delta lines: +5 -46
--------------------
— DistributedStateCacheImpl.java 19 Jan 2007 01:50:12 -0000 1.3
+++ DistributedStateCacheImpl.java 2 Feb 2007 03:22:28 -0000 1.4
@@ -72,7 +72,7 @@
*

  • @author Shreedhar Ganapathy
  • Date: June 20, 2006
  • * @version $Revision: 1.3 $
    + * @version $Revision: 1.4 $
    */
    public class DistributedStateCacheImpl implements DistributedStateCache { private final ConcurrentHashMap<GMSCacheable, Object> cache = @@ -137,23 +137,6 @@ addToRemoteCache(cKey, state); }
  • public void addToCache(final String componentName,
  • final String memberTokenId,
  • final Serializable key,
  • final byte[] state,
  • final List<String> targetReplicantMembers)
  • throws GMSException { - logger.log(Level.FINER, "Adding to DSC by local Member:" + memberTokenId + - ",Component:" + componentName + ",key:" + key + - ",State:" + state + - ", Replicants:" + targetReplicantMembers.toString()); - final GMSCacheable cKey = createCompositeKey(componentName, - memberTokenId, - key); - addToLocalCache(cKey, state); - addToReplicants(cKey, state, targetReplicantMembers); - addToRemoteCache(cKey, targetReplicantMembers); - }

public void addToLocalCache(
final String componentName, final String memberTokenId,
@@ -182,6 +165,7 @@
final Object state)

{ cKey = getTrueKey(cKey); cache.put(cKey, state); + printDSCContents(); }

private void printDSCContents () {
@@ -215,19 +199,8 @@
throws GMSException

{ final DSCMessage msg = new DSCMessage(cKey, state, DSCMessage.OPERATION.ADD.toString()); - final List<String> members = getGMSContext().getViewWindow().getAllCurrentMembers(); - - sendMessage(members, msg); - }

-

  • private void addToReplicants(final GMSCacheable cKey,
  • final Object state,
  • final List<String> targetReplicantMembers)
  • throws GMSException { - final DSCMessage msg = new DSCMessage(cKey, state, - DSCMessage.OPERATION.ADD.toString()); - sendMessage(targetReplicantMembers, msg); + sendMessage(null, msg); }

/*
@@ -253,8 +226,7 @@
cKey = getTrueKey(cKey);
final DSCMessage msg = new DSCMessage(cKey, null,
DSCMessage.OPERATION.REMOVE.toString());

  • final List<String> members =
    getGMSContext().getViewWindow().getAllCurrentMembers();
  • sendMessage(members, msg);
    + sendMessage(null, msg);
    }

/*
@@ -394,10 +366,7 @@
final DSCMessage msg = new DSCMessage(cache,
DSCMessage.OPERATION.ADDALLREMOTE.toString(),
false);

  • final List<String> members = getGMSContext().getViewWindow()
  • .getAllCurrentMembers();
    -
  • sendMessage(members, msg);
    + sendMessage(null, msg);
    }

/**
@@ -459,16 +428,6 @@
final String componentName, final String memberTokenId,
final Object key)

{ return new GMSCacheable(componentName, memberTokenId, key); - }

-

  • private synchronized void sendMessage(final List<String> members,
  • final DSCMessage msg)
  • throws GMSException {
  • if (members != null && !members.isEmpty()) {
  • for (String member : members) { - sendMessage(member, msg); - }
  • }
    }

private synchronized void sendMessage(final String member,





[SHOAL-8] Add a client API to expose Group's leader Created: 07/Dec/06  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Improvement Priority: Major
Reporter: shreedharganapathy Assignee: shreedhar_ganapathy
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 8

 Description   

Shoal clients should be able to determine who the group's leader is and whether
the current member is a group leader.
Thanks to Rob Beazizo for pointing this out.



 Comments   
Comment by shreedharganapathy [ 17/Jan/07 ]

..

Comment by shreedhar_ganapathy [ 19/Jan/07 ]

Checked in fix to support this enhancement request
CVS log URL:
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=190





[SHOAL-9] Need Shoal User Guide Created: 08/Dec/06  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Improvement Priority: Major
Reporter: shreedharganapathy Assignee: sheetalv
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 9

 Description   

A Shoal User Guide is needed to guide users to easily integrate the library into
their applications.



 Comments   
Comment by shreedharganapathy [ 17/Jan/07 ]

..

Comment by sheetalv [ 09/Jul/08 ]

assigning to self





[SHOAL-10] stack trace during appserver shutdown Created: 12/Dec/06  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: lwhite Assignee: hamada
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 10

 Description   

read;|Sending FailureSuspectedSignals to registered Actions...|#]

[#|2006-12-12T12:33:09.648-0800|INFO|sun-appserver-ee9.1|javax.ee.enterprise.system.gms|_ThreadID=22;_ThreadName=com.sun.enterprise.ee.cms.impl.common.Router Thread;|Sending FailureSuspectedSignals to registered Actions...|#]

[#|2006-12-12T12:33:13.112-0800|SEVERE|sun-appserver-ee9.1|net.jxta.impl.pipe.NonBlockingWireOutputPipe|_ThreadID=29;_ThreadName=Worker Thread for NonBlockingWireOutputPipe : urn:jxta:uuid-59313231343132314A484431544E50477F57B1A44FDB469D9F94BE4B0722C28904;_RequestID=5f970310-0c5f-4b23-84ea-0019bb3db5f3;|Failed sending net.jxta.endpoint.Message@21719901(1) {30415,30414} on urn:jxta:uuid-59313231343132314A484431544E50477F57B1A44FDB469D9F94BE4B0722C28904
java.io.IOException: No RDV provider
        at net.jxta.impl.rendezvous.RendezVousServiceImpl.propagateToNeighbors(RendezVousServiceImpl.java:927)
        at net.jxta.impl.rendezvous.RendezVousServiceInterface.propagateToNeighbors(RendezVousServiceInterface.java:345)
        at net.jxta.impl.pipe.WirePipe.sendMessage(WirePipe.java:497)
        at net.jxta.impl.pipe.NonBlockingWireOutputPipe.run(NonBlockingWireOutputPipe.java:391)
        at java.lang.Thread.run(Thread.java:595)
#]

[#|2006-12-12T12:33:13.114-0800|INFO|sun-appserver-ee9.1|javax.ee.enterprise.system.gms|_ThreadID=12;_ThreadName=ViewWindowThread;|GMS View Change Received: Members in view (before change analysis) are :
1: MemberId: i2, MemberType: CORE, Address: urn:jxta:uuid-59313231343132314A484431544E5047A3FCFCE465FB451999D4E9EC39A9467B03
#]

[#|2006-12-12T12:33:13.116-0800|SEVERE|sun-appserver-ee9.1|net.jxta.impl.pipe.NonBlockingWireOutputPipe|_ThreadID=29;_ThreadName=Worker Thread for NonBlockingWireOutputPipe : urn:jxta:uuid-59313231343132314A484431544E50477F57B1A44FDB469D9F94BE4B0722C28904;_RequestID=5f970310-0c5f-4b23-84ea-0019bb3db5f3;|Failed sending net.jxta.endpoint.Message@7723154(1) {30418,30417} on urn:jxta:uuid-59313231343132314A484431544E50477F57B1A44FDB469D9F94BE4B0722C28904
java.io.IOException: No RDV provider
        at net.jxta.impl.rendezvous.RendezVousServiceImpl.propagateToNeighbors(RendezVousServiceImpl.java:927)
        at net.jxta.impl.rendezvous.RendezVousServiceInterface.propagateToNeighbors(RendezVousServiceInterface.java:345)
        at net.jxta.impl.pipe.WirePipe.sendMessage(WirePipe.java:497)
        at net.jxta.impl.pipe.NonBlockingWireOutputPipe.run(NonBlockingWireOutputPipe.java:391)
        at java.lang.Thread.run(Thread.java:595)
#]

[#|2006-12-12T12:33:13.116-0800|INFO|sun-appserver-ee9.1|javax.ee.enterprise.system.gms|_ThreadID=12;_ThreadName=ViewWindowThread;|Analyzing new membership snapshot received as part of event : FAILURE_EVENT|#]



 Comments   
Comment by shreedharganapathy [ 26/Dec/06 ]

I think this may have already been fixed. Reassigning to hamada to verify

Comment by shreedharganapathy [ 12/Jan/07 ]

Mo fixed this in the Jxta platform.





[SHOAL-11] GMS doesn't see nodes via GroupHandle that have been started later Created: 26/Dec/06  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: xmart Assignee: shreedharganapathy
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 11

 Description   

slightly modify ApplicationServer.java:

===

public void run() {
    startGMS();
    addMemberDetails();
    startClientServices();
    long till = System.currentTimeMillis()
            + Integer.parseInt(System.getProperty("LIFEINMILLIS", "15000"));
    try {
        do {
            for (String details : gms.getGroupHandle()
                    .getAllCurrentMembersWithStartTimes()) {
                logger.log(Level.INFO, details);
            }
            Thread.sleep(3000);
        } while (System.currentTimeMillis() < till);
    } catch (InterruptedException e) {
        logger.log(Level.SEVERE, e.getLocalizedMessage());
    }
    stopClientServices();
    stopGMS();
    System.exit(0);
}

===

Try this in separate consoles:

./rungmsdemo.sh inst1 grp0 CORE 60000 INFO
./rungmsdemo.sh inst2 grp0 CORE 60000 INFO

My output is (only relevant snippets here):

#1:

[#|2006-12-26T21:10:35.840+0300|INFO|JxtaMgmt|javax.ee.enterprise.system.gms|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=com.sun.enterprise.ee.cms.tests.ApplicationServer;MethodName=run;RecordNumber=90;|inst1::1167156573930|#]

[#|2006-12-26T21:10:35.937+0300|FINER|JxtaMgmt|javax.ee.enterprise.system.gms|_ThreadID=18;_ThreadName=HealthMonitor Thread interval : 3000;ClassName=com.sun.enterprise.jxtamgmt.MasterNode;MethodName=isMaster;RecordNumber=91;|isMaster :true MasterAssigned :true View Size :2|#]

#2:

[#|2006-12-26T21:10:36.529+0300|FINER|JxtaMgmt|javax.ee.enterprise.system.gms|_ThreadID=22;_ThreadName=HealthMonitor Thread interval : 3000;ClassName=com.sun.enterprise.jxtamgmt.MasterNode;MethodName=isMaster;RecordNumber=113;|isMaster :false MasterAssigned :true View Size :2|#]

[#|2006-12-26T21:10:36.540+0300|INFO|JxtaMgmt|javax.ee.enterprise.system.gms|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=com.sun.enterprise.ee.cms.tests.ApplicationServer;MethodName=run;RecordNumber=114;|inst2::1167156586641|#]

[#|2006-12-26T21:10:36.540+0300|INFO|JxtaMgmt|javax.ee.enterprise.system.gms|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=com.sun.enterprise.ee.cms.tests.ApplicationServer;MethodName=run;RecordNumber=115;|inst1::1167156573930|#]

--------------------------

Why does #1 only see itself, while both of them say that the View Size is 2?



 Comments   
Comment by shreedharganapathy [ 26/Dec/06 ]

Thanks for this submission. For others seeing this issue, note that the test has to
be run at the FINER log level to see it.

I also notice that in FINER mode the second member's joining is not conveyed
to the GMS layer from the JxtaMgmt layer. This does not happen when the log level is
set to INFO.

Comment by shreedharganapathy [ 26/Dec/06 ]

..

Comment by shreedharganapathy [ 26/Dec/06 ]

inadvertent assignment to hamada thinking this is issue 10

Comment by xmart [ 13/Jan/07 ]

Looks like a bug in the logging system.

After updating tonight I got this stack trace:

Jan 14, 2007 2:40:41 AM com.sun.enterprise.ee.cms.impl.jxta.GMSContext <init>
INFO: Initialized Group Communication System....
Jan 14, 2007 2:40:41 AM
com.sun.enterprise.ee.cms.impl.common.GroupManagementServiceImpl startup
INFO: gms.joinMessage
Exception in thread "GMS" java.lang.NoClassDefFoundError: org/apache/log4j/Priority
at com.sun.enterprise.jxtamgmt.NetworkManager.getSocketID(NetworkManager.java:179)
at com.sun.enterprise.jxtamgmt.NetworkManager.<init>(NetworkManager.java:122)
at com.sun.enterprise.jxtamgmt.ClusterManager.<init>(ClusterManager.java:112)
at
com.sun.enterprise.ee.cms.impl.jxta.GroupCommunicationProviderImpl.initializeGroupCommunicationProvider(GroupCommunicationProviderImpl.java:114)
at com.sun.enterprise.ee.cms.impl.jxta.GMSContext.join(GMSContext.java:108)
at
com.sun.enterprise.ee.cms.impl.common.GroupManagementServiceImpl.join(GroupManagementServiceImpl.java:299)
at
com.sun.enterprise.ee.cms.impl.common.GroupManagementServiceImpl.startup(GroupManagementServiceImpl.java:79)
at
com.sun.enterprise.ee.cms.impl.common.GroupManagementServiceImpl.run(GroupManagementServiceImpl.java:73)
at java.lang.Thread.run(Thread.java:619)

When I put log4j-1.2.14.jar on the classpath, everything worked OK.
It looks like commons-logging 1.1 has support for the new log4j TRACE logging level,
and while checking for it, it doesn't catch runtime exceptions [somewhere].

Comment by shreedharganapathy [ 13/Jan/07 ]

Are you using a freshly built jar or are you using the downloaded one?

Please let us know.
I tried both on a fresh workspace and could not reproduce this issue. Is it
possible you have an older set of jars? We removed the dependency on log4j a
couple of months ago, and all logging is based on java.util.logging.

Comment by xmart [ 13/Jan/07 ]

Yes, I am using a jar that I built at most an hour before the last post.

There is a problem at the log4j/commons-logging junction: commons-logging tries
to work with both log4j 1.2 and 1.3, and it doesn't work well without log4j on the classpath.

Comment by shreedharganapathy [ 13/Jan/07 ]

One more question:
Are you using the jxta jar that is supplied with the build (provided in lib dir)?

Comment by xmart [ 14/Jan/07 ]

Both the bundled jxta.jar and jxta-2.4.1b.jar work.
However, jxta-2.4.1b.jar generates substantially more log output.

Could you describe the differences between these? Are they just different versions,
or is the bundled one patched?

Comment by shreedharganapathy [ 14/Jan/07 ]

I looked at the JXTA sources and, needless to say, IDFactory does use log4j
classes. I suppose we may have a patched version of JXTA that removes this
dependency, since with the version of the jxta jar we have in Shoal I don't see the
stack trace you reported.

I will let hamada respond on that, as he is also the JXTA architect.

Comment by hamada [ 14/Jan/07 ]

The bundled jxta.jar is based on 2.4.1 with patch
http://platform.jxta.org/issues/show_bug.cgi?id=1537, and some select fixes from
the upcoming jxta release, as we try to limit the number of changes introduced.
Once issue 1537 is applied you can expect alignment with the JXTA stable releases.





[SHOAL-12] Startup time needs improvement Created: 27/Dec/06  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: shreedharganapathy Assignee: hamada
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 12

 Description   

Noticed the following when running jxtamgmt module test using run_client.
C:\shoaln\shoal\gms>run_client.bat s1

[#|2006-12-27T10:00:42.640-0800|FINER|JxtaMgmt|JxtaMgmt|_ThreadID=10;_ThreadName=main;ClassName=com.sun.enterprise.jxtamgmt.ClusterManager;MethodName=main;RecordNumber=0;|Instance Name :clients1|#]

[#|2006-12-27T10:00:53.375-0800|FINER|JxtaMgmt|JxtaMgmt|_ThreadID=10;_ThreadName=main;ClassName=com.sun.enterprise.jxtamgmt.ClusterManager;MethodName=<init>;RecordNumber=1;|Instance ID :urn:jxta:uuid-8C677192023A48AE9201BE8E2B2A84E44F1E44C3D2B94DCC91BDE9E6EE77374903|#]

The time taken from the first log entry to the second is about 11 seconds
(initial entry at 10:00:42 and second entry at 10:00:53).

Instance name s1 is used for the first time here, so I thought maybe there is some
JXTA platform initialization cost. But the second run has the same startup time
issue. It might be worth profiling this to find out where most of the time is being
spent and whether some optimizations are possible.

Second run (about 10 sec):
C:\shoaln\shoal\gms>run_client.bat s1
[#|2006-12-27T10:54:38.015-0800|FINER|JxtaMgmt|JxtaMgmt|_ThreadID=10;_ThreadName=main;ClassName=com.sun.enterprise.jxtamgmt.ClusterManager;MethodName=main;RecordNumber=0;|Instance Name :clients1|#]

[#|2006-12-27T10:54:48.781-0800|FINER|JxtaMgmt|JxtaMgmt|_ThreadID=10;_ThreadName=main;ClassName=com.sun.enterprise.jxtamgmt.ClusterManager;MethodName=<init>;RecordNumber=1;|Instance ID :urn:jxta:uuid-8C677192023A48AE9201BE8E2B2A84E44F1E44C3D2B94DCC91BDE9E6EE77374903|#]



 Comments   
Comment by shreedhar_ganapathy [ 06/Mar/07 ]

Latest checkin of jxta.jar from hamada addressed this issue. Brought down
startup time from 10 secs on my machine to 1 sec.





[SHOAL-13] MasterNode not notifying add event to own listeners when responding with MasterNodeResponse Created: 12/Jan/07  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: shreedharganapathy Assignee: shreedharganapathy
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 13

 Description   

Issue summary says it all.
Restating it here if it gets truncated.
"MasterNode not notifying add event to own listeners when responding with
MasterNodeResponse"



 Comments   
Comment by shreedharganapathy [ 12/Jan/07 ]

Checked in fix by adding clusterViewManager.notifyListeners(cvEvent);





[SHOAL-14] MasterNode treats both master announcement and master response with the same message type Created: 12/Jan/07  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: shreedharganapathy Assignee: hamada
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 14

 Description   

This causes master node responses to be treated as announcements by recipients.



 Comments   
Comment by shreedharganapathy [ 12/Jan/07 ]

Mo Checked in the following fix for this issue:

User: hamada
Date: 2007/01/12 12:32:30

Modified:
shoal/gms/src/java/com/sun/enterprise/jxtamgmt/MasterNode.java

Log:
Fixes a bug where a master node response was typed as an announcement
Hardens seq ID parsing to avoid potential bugs

File Changes:

Directory: /shoal/gms/src/java/com/sun/enterprise/jxtamgmt/
===========================================================

File [changed]: MasterNode.java
Url:
https://shoal.dev.java.net/source/browse/shoal/gms/src/java/com/sun/enterprise/jxtamgmt/MasterNode.java?r1=1.14&r2=1.15
Delta lines: +24 -17
---------------------
--- MasterNode.java 12 Jan 2007 19:24:13 -0000 1.14
+++ MasterNode.java 12 Jan 2007 20:32:28 -0000 1.15
@@ -95,7 +95,7 @@
     private static final String CCNTL = "CCNTL";
     private static final String MASTERNODE = "MN";
     private static final String MASTERQUERY = "MQ";
-    private static final String NODERESPONSE = "NR";
+    private static final String MASTERNODERESPONSE = "NR";
     private static final String NAMESPACE = "MASTER";
     private static final String NODEADV = "NAD";
     private static final String ROUTEADV = "ROUTE";
@@ -237,10 +237,15 @@
      *
      * @param masterID the MasterNode ID
      * @return a message containing a MasterResponse element
+     * @param announcement if true, creates an anouncement type message, otherwise it creates a response type.
      */
-    private Message createMasterResponse(final ID masterID) {
+    private Message createMasterResponse(boolean announcement, final ID masterID) {
         final Message msg = createSelfNodeAdvertisement();
-        final MessageElement el = new StringMessageElement(MASTERNODE, masterID.toString(), null);
+        String type = MASTERNODE;
+        if (!announcement) {
+            type = MASTERNODERESPONSE;
+        }
+        final MessageElement el = new StringMessageElement(type, masterID.toString(), null);
         msg.addMessageElement(NAMESPACE, el);
         LOG.log(Level.FINER, "Created a Master Response Message with masterId = " + masterID.toString());
         return msg;
@@ -389,7 +394,8 @@
         msgElement = msg.getMessageElement(NAMESPACE, VIEW_CHANGE_EVENT);
         if (msgElement != null) {
             if (seqID <= clusterViewManager.getMasterViewID()) {
-                LOG.log(Level.FINER, "Received an older clusterView sequence. discarding old view");
+                LOG.log(Level.FINER, MessageFormat.format("Received an older clusterView sequence {0}." +
+                        " Current sequence :{1} discarding out of sequence view", seqID, clusterViewManager.getMasterViewID()));
                 return true;
             }
             final ClusterViewEvent cvEvent =
@@ -427,7 +433,7 @@
      */
     boolean processMasterNodeResponse(final Message msg,
                                       final SystemAdvertisement source) throws IOException {
-        MessageElement msgElement = msg.getMessageElement(NAMESPACE, NODERESPONSE);
+        MessageElement msgElement = msg.getMessageElement(NAMESPACE, MASTERNODERESPONSE);
         if (msgElement != null) {
             LOG.log(Level.FINE, "Received a MasterNode Response from Name :" + source.getName());
             clusterViewManager.setMaster(source, true);
@@ -443,7 +449,8 @@
         if (msgElement != null) {
             long seqID = getLongFromMessage(msg, NAMESPACE, MASTERVIEWSEQ);
             if (seqID <= clusterViewManager.getMasterViewID()) {
-                LOG.log(Level.FINER, "Received an older clusterView sequence. discarding old view");
+                LOG.log(Level.FINER, MessageFormat.format("Received an older clusterView sequence {0}." +
+                        " Current sequence :{1} discarding out of sequence view", seqID, clusterViewManager.getMasterViewID()));
                 return true;
             }
             final ClusterViewEvent cvEvent = (ClusterViewEvent)
@@ -486,7 +493,8 @@
         if (msgElement != null && cvEvent != null) {
             long seqID = getLongFromMessage(msg, NAMESPACE, MASTERVIEWSEQ);
             if (seqID <= clusterViewManager.getMasterViewID()) {
-                LOG.log(Level.FINER, "Received an older clusterView sequence. discarding old view");
+                LOG.log(Level.FINER, MessageFormat.format("Received an older clusterView sequence {0}." +
+                        " Current sequence :{1} discarding out of sequence view", seqID, clusterViewManager.getMasterViewID()));
                 return true;
             }
             final ArrayList<SystemAdvertisement> newLocalView =
@@ -538,7 +546,7 @@
         if (isMaster() && masterAssigned) {
             final ClusterViewEvent cvEvent = new ClusterViewEvent(ADD_EVENT, adv);
-            sendNewView(cvEvent, createMasterResponse(myID), true);
+            sendNewView(cvEvent, createMasterResponse(false, myID), true);
             clusterViewManager.notifyListeners(cvEvent);
         }
         return true;
@@ -620,12 +628,6 @@
             if (processChangeEvent(msg, adv)) {
                 return;
             }
-
-            // generate the node add event
-            if (isMaster() && masterAssigned) {
-                final ClusterViewEvent cvEvent = new ClusterViewEvent(ADD_EVENT, adv);
-                sendNewView(cvEvent, createMasterResponse(myID), true);
-            }
         } catch (IOException e) {
             e.printStackTrace();
             LOG.log(Level.WARNING, e.getLocalizedMessage());
@@ -637,7 +639,7 @@
     }

     private void announceMaster(SystemAdvertisement adv) {
-        final Message msg = createMasterResponse(adv.getID());
+        final Message msg = createMasterResponse(true, adv.getID());
         final ClusterViewEvent cvEvent = new ClusterViewEvent(
                 ClusterViewEvents.MASTER_CHANGE_EVENT,
                 adv);
@@ -873,11 +875,16 @@
      * @param message The message to retrieve from
      * @param nameSpace The namespace of the element to get.
      * @param elemName Name of the Element.
-     * @return The long value
+     * @return The long value, -1 if element does not exist in the message
      * @throws NumberFormatException If the String does not contain a parsable int.
      */
     private static long getLongFromMessage(Message message, String nameSpace,
                                            String elemName) throws NumberFormatException {
+        String seqStr = message.getMessageElement(nameSpace, elemName).toString();
+        if (seqStr != null) {
             return Long.parseLong(message.getMessageElement(nameSpace, elemName).toString());
+        } else {
+            return -1;
+        }
     }





[SHOAL-15] MasterNode sends unnecessary ADD_EVENT notifs to its listeners Created: 12/Jan/07  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: shreedharganapathy Assignee: hamada
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 15

 Description   

This does not respect the mutually exclusive processing order in the
pipeMsgEvent() method, and caused numerous Join notifications for the same members.



 Comments   
Comment by shreedharganapathy [ 12/Jan/07 ]

Mo Fixed this issue with this checkin :
User: hamada
Date: 2007/01/12 12:32:30

Modified:
shoal/gms/src/java/com/sun/enterprise/jxtamgmt/MasterNode.java

Log:
Fixes a bug where a master node response was typed as an announcement
Hardens seq ID parsing to avoid potential bugs

File Changes: identical to the checkin recorded under SHOAL-14 above
(MasterNode.java in /shoal/gms/src/java/com/sun/enterprise/jxtamgmt/, revision 1.14 to 1.15, delta +24 -17);
see that issue for the full diff.





[SHOAL-16] Fix FindBugs issues raised in HealthMessage and NetworkManager Created: 12/Jan/07  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: shreedharganapathy Assignee: hamada
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 16

 Description   

FindBugs 1.1.3 pointed out the following issues:
Variables that were not marked final, and
Classes that define an equals() method but don't define a hashCode() method.
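
For reference, a generic illustration of the equals()/hashCode() contract that FindBugs enforces here. This is not the actual HealthMessage or NetworkManager code, just a minimal sketch of the pattern the fix applies:

// Illustration only; not a Shoal class.
public final class MemberId {
    private final String token;
    private final int port;

    public MemberId(final String token, final int port) {
        this.token = token;
        this.port = port;
    }

    public boolean equals(final Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof MemberId)) {
            return false;
        }
        final MemberId other = (MemberId) o;
        return port == other.port && token.equals(other.token);
    }

    // FindBugs flags classes that override equals() without hashCode():
    // equal objects must produce equal hash codes.
    public int hashCode() {
        return 31 * token.hashCode() + port;
    }
}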



 Comments   
Comment by shreedharganapathy [ 12/Jan/07 ]

Fixes are checked in as below:
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=160
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=159





[SHOAL-17] Shoal Does not provide ability to set configuration properties Created: 31/Jan/07  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: shreedhar_ganapathy Assignee: shreedhar_ganapathy
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 17

 Description   

Although some work has been done on this, the implementation is not complete.
Hence applications cannot set their own multicast address, port, failure detection
timeout, failure detection retries, discovery timeout, or failure verification
timeout.

This issue corresponds to issue raised in GlassFish 2265:
https://glassfish.dev.java.net/issues/show_bug.cgi?id=2265



 Comments   
Comment by shreedhar_ganapathy [ 31/Jan/07 ]

about to check in fixes.

Comment by shreedhar_ganapathy [ 31/Jan/07 ]

checked in fix per cvs log :
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=196
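
For reference, a minimal sketch of how an application can now supply these settings, assuming the ServiceProviderConfigurationKeys constant names shown below and the four-argument GMSFactory.startGMSModule(...) signature; the addresses, ports and timeouts are illustrative values only:

import java.util.Properties;

import com.sun.enterprise.ee.cms.core.GMSException;
import com.sun.enterprise.ee.cms.core.GMSFactory;
import com.sun.enterprise.ee.cms.core.GroupManagementService;
import com.sun.enterprise.ee.cms.core.ServiceProviderConfigurationKeys;

public class ConfiguredMember {
    public static void main(final String[] args) throws GMSException {
        final Properties props = new Properties();
        // Group transport settings (illustrative values).
        props.put(ServiceProviderConfigurationKeys.MULTICASTADDRESS.toString(), "229.9.1.1");
        props.put(ServiceProviderConfigurationKeys.MULTICASTPORT.toString(), "2299");
        // Failure detection tuning: timeouts in milliseconds, retries as a count.
        props.put(ServiceProviderConfigurationKeys.FAILURE_DETECTION_TIMEOUT.toString(), "3000");
        props.put(ServiceProviderConfigurationKeys.FAILURE_DETECTION_RETRIES.toString(), "3");
        props.put(ServiceProviderConfigurationKeys.FAILURE_VERIFICATION_TIMEOUT.toString(), "2000");
        props.put(ServiceProviderConfigurationKeys.DISCOVERY_TIMEOUT.toString(), "5000");

        final GroupManagementService gms = (GroupManagementService) GMSFactory.startGMSModule(
                "instance1", "cluster1", GroupManagementService.MemberType.CORE, props);
        gms.join();
    }
}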





[SHOAL-18] Request from JOnAS : Make app messages visible to senders Created: 06/Feb/07  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: shreedhar_ganapathy Assignee: shreedhar_ganapathy
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 18

 Description   

I'm a student working on a Brazilian research project about clustering with
JOnAS. We are studying some frameworks for group communication, and one of them
is Shoal. This is probably a silly question, but I would appreciate it if you
could answer me. I would like to know if I can send a message to myself. I tried
to connect two hosts and exchange messages between them, but I could only
receive messages from the other machine, not from myself. I looked at the source
code and found that there is a call to local listeners in JXTA before it sends
a message, but when this call is passed to Shoal it is stopped (if the message
received is from the same machine that sent it). I tried to look at the rest of the
code, but it got complicated for me. I read the documentation on your
website and the examples that come with the source code, but I couldn't find
a way to solve my problem.

Thank you in advance


Rafael Garcia Barbosa



 Comments   
Comment by shreedhar_ganapathy [ 06/Feb/07 ]

We will make this configurable. Shoal exposes a configuration keys object
called ServiceProviderConfigurationKeys.java.

We specify a key called LOOPBACK which lets the client specify a boolean value.

The default is false, and a null value is also treated as false. If true,
application-level messages will be received by the sender in addition to the group.

The Properties object passed into GroupManagementService's constructor can be used to
populate this key/value pair.
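
As a sketch of the resulting usage, assuming the LOOPBACK constant in ServiceProviderConfigurationKeys and the Properties-based startGMSModule(...) signature (verify against the checkin referenced in the next comment):

import java.util.Properties;

import com.sun.enterprise.ee.cms.core.GMSException;
import com.sun.enterprise.ee.cms.core.GMSFactory;
import com.sun.enterprise.ee.cms.core.GroupManagementService;
import com.sun.enterprise.ee.cms.core.ServiceProviderConfigurationKeys;

public class LoopbackMember {
    public static void main(final String[] args) throws GMSException {
        final Properties props = new Properties();
        // LOOPBACK=true: the sender also receives its own group-wide messages.
        props.put(ServiceProviderConfigurationKeys.LOOPBACK.toString(), "true");
        final GroupManagementService gms = (GroupManagementService) GMSFactory.startGMSModule(
                "instance1", "grp0", GroupManagementService.MemberType.CORE, props);
        gms.join();
    }
}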

Comment by shreedhar_ganapathy [ 06/Feb/07 ]

Checked in fix for this request:
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=219





[SHOAL-19] Publish JavaDoc of the APIs Created: 20/Feb/07  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Task Priority: Major
Reporter: pinus Assignee: shreedhar_ganapathy
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 19

 Description   

Publish JavaDoc of the APIs on the web site to get an impression of the API.



 Comments   
Comment by shreedhar_ganapathy [ 20/Feb/07 ]

The javadocs are indeed published. On the right hand side of the home page there
is a link for javadocs.
Here is the actual link:
https://shoal.dev.java.net/nonav/docs/api/





[SHOAL-20] Document ".shoal" directory Created: 03/Mar/07  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Task Priority: Major
Reporter: hamada Assignee: shreedhar_ganapathy
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 20

 Description   

This task documents the creation and use of a temporary runtime directory ".shoal"

Shoal relies on JXTA for its group communication and organization; as a result,
JXTA utilizes a persistent store for each group (cluster) created, for node
routes and other resources.

The directory is created at startup and contains a hierarchical representation of
the groups (clusters) created (joined). These directories contain a BTree store of
XML documents describing physical routes, communication channels, and other
documents that facilitate cross-network connectivity/deployment.
In support of large deployments, JXTA utilizes a persistent BTree store
with limited caching to achieve an optimized balance between memory usage and
in-memory caching of frequently used resources. Note: when deploying a large
cluster, ensure sufficient disk space is available, roughly 10-40 KB per node.

Since Shoal defines a deterministic method to create identifiers for groups,
nodes, and communication channels, the persistent store is only utilized during the
lifetime of an instance, and the store is expunged during shutdown.
The ".shoal" directory should be treated as transient, and therefore there is
no need to back up or replicate such data.



 Comments   
Comment by hamada [ 03/Mar/07 ]

previous description documented uses of ".shoal".
Closing issue





[SHOAL-23] Occasional NPE seen in DistributedStateCache sync() Created: 02/Jun/07  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: shreedhar_ganapathy Assignee: shreedhar_ganapathy
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 23

 Description   

While running com.sun.enterprise.shoal.ShoalMessagingTest, the NPE below is
being seen. This is not happening when the rungmsdemo.sh(bat) test is run.
Possible issue with some object's init not happening in time.

Jun 2, 2007 9:00:32 AM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow syncDSC
WARNING: Exception during DSC sync:java.lang.NullPointerException
java.lang.NullPointerException
        at com.sun.enterprise.ee.cms.impl.jxta.DistributedStateCacheImpl.syncCache(DistributedStateCacheImpl.java:423)
        at com.sun.enterprise.ee.cms.impl.jxta.ViewWindow.syncDSC(ViewWindow.java:519)
        at com.sun.enterprise.ee.cms.impl.jxta.ViewWindow.determineAndAddNewMemberJoins(ViewWindow.java:236)
        at com.sun.enterprise.ee.cms.impl.jxta.ViewWindow.analyzeMasterChangeView(ViewWindow.java:209)
        at com.sun.enterprise.ee.cms.impl.jxta.ViewWindow.analyzeViewChange(ViewWindow.java:193)
        at com.sun.enterprise.ee.cms.impl.jxta.ViewWindow.newViewObserved(ViewWindow.java:101)
        at com.sun.enterprise.ee.cms.impl.jxta.ViewWindow.run(ViewWindow.java:85)
        at java.lang.Thread.run(Thread.java:595)



 Comments   
Comment by shreedhar_ganapathy [ 02/Jun/07 ]

User: shreedhar_ganapathy
Date: 2007-06-02 17:33:31+0000
Log:
Fix for Issue 23: NPE seen occasionally in DSC syncCache().
GMSContext is not yet initialized; instead of calling the getGMSContext() method, which
does the right thing, the code in question relied on the ctx variable directly.

File Changes:

Directory: /shoal/gms/src/java/com/sun/enterprise/ee/cms/impl/jxta/
===================================================================

File [changed]: DistributedStateCacheImpl.java
Url:
https://shoal.dev.java.net/source/browse/shoal/gms/src/java/com/sun/enterprise/ee/cms/impl/jxta/DistributedStateCacheImpl.java?r1=1.13&r2=1.14
Delta lines: +4 -4
-------------------
--- DistributedStateCacheImpl.java 2007-05-30 22:46:21+0000 1.13
+++ DistributedStateCacheImpl.java 2007-06-02 17:33:28+0000 1.14
@@ -73,7 +73,7 @@
  *
  * @author Shreedhar Ganapathy
  * Date: June 20, 2006
- * @version $Revision: 1.13 $
+ * @version $Revision: 1.14 $
  */
 public class DistributedStateCacheImpl implements DistributedStateCache {
     private final ConcurrentHashMap<GMSCacheable, Object> cache =
@@ -277,8 +277,8 @@
             return retval;
         } else {
-            if(!memberToken.equals(ctx.getServerIdentityToken())){
-                MemberStates state = ctx.getGroupCommunicationProvider().getMemberState(memberToken);
+            if(!memberToken.equals(getGMSContext().getServerIdentityToken())){
+                MemberStates state = getGMSContext().getGroupCommunicationProvider().getMemberState(memberToken);
                 if(state.equals(MemberStates.ALIVE)) {
                     ConcurrentHashMap<GMSCacheable, Object> temp = new ConcurrentHashMap<GMSCacheable, Object>(cache);
@@ -420,7 +420,7 @@
         final DSCMessage msg = new DSCMessage(temp,
                 DSCMessage.OPERATION.ADDALLLOCAL.toString(),
                 isCoordinator);
-        if(!memberToken.equals(ctx.getServerIdentityToken())){
+        if(!memberToken.equals(getGMSContext().getServerIdentityToken())){
             logger.log(Level.FINER, "Sending sync message from DistributedStateCache " +
                     "to member " + memberToken);





[SHOAL-24] Incorrect handling of Master Change Events in ViewWindow Created: 06/Jun/07  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: shreedhar_ganapathy Assignee: shreedhar_ganapathy
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 24

 Description   

On occurrence of a Master Change Event, ViewWindow compares the new view with the
previous one and, if the new view contains fewer members, assumes the missing
members have failed. This can cause false failures to be reported to GMS clients
when in fact the Master may not yet have discovered the missing member.

Failure determination should be left to the underlying provider's health
monitoring service to report.



 Comments   
Comment by shreedhar_ganapathy [ 06/Jun/07 ]

User: shreedhar_ganapathy
Date: 2007-06-07 00:29:09+0000
Log:
Fix for Issue 24: Incorrect handling of master change event for failures
View Window now does not assume that when a Master Change Event's view contains
less number of members than prior view, the missing members have failed. It
leaves the failure determination to the underlying providers to notify failure.
This should address the false failure notifications we see in large clusters.

File Changes:

Directory: /shoal/gms/src/java/com/sun/enterprise/ee/cms/impl/jxta/
===================================================================

File [changed]: ViewWindow.java
Url:
https://shoal.dev.java.net/source/browse/shoal/gms/src/java/com/sun/enterprise/ee/cms/impl/jxta/ViewWindow.java?r1=1.15&r2=1.16
Delta lines: +2 -26
--------------------
--- ViewWindow.java 2007-05-01 19:22:52+0000 1.15
+++ ViewWindow.java 2007-06-07 00:29:07+0000 1.16
@@ -30,7 +30,6 @@
 import com.sun.enterprise.jxtamgmt.SystemAdvertisement;

 import java.io.Serializable;
-import java.text.MessageFormat;
 import java.util.ArrayList;
 import java.util.List;
 import java.util.Map;
@@ -44,7 +43,7 @@
 /**
  * @author Shreedhar Ganapathy
  * Date: Jun 26, 2006
- * @version $Revision: 1.15 $
+ * @version $Revision: 1.16 $
  */
 class ViewWindow implements com.sun.enterprise.ee.cms.impl.common.ViewWindow, Runnable {
     private GMSContext ctx;
@@ -207,7 +206,6 @@
                 packet.getClusterView().getSize() != views.get(views.size() - 2).size()) {
             determineAndAddNewMemberJoins();
-            determineAndAddFailureSignals(packet);
         }
     }
@@ -250,29 +248,6 @@
         return tokens;
     }

-    private void determineAndAddFailureSignals(final EventPacket packet) {
-        if (views.size() < 2) {
-            return;
-        }
-        final List<GMSMember> oldMembership = views.get(views.size() - 2);
-        String token;
-        for (GMSMember member : oldMembership) {
-            token = member.getMemberToken();
-            analyzeAndFireFailureSignals(member, token, packet);
-        }
-    }
-
-    private void analyzeAndFireFailureSignals(final GMSMember member,
-                                              final String token,
-                                              final EventPacket packet) {
-
-        if (member.getMemberType().equalsIgnoreCase(CORETYPE) &&
-                !getCurrentCoreMembers().contains(token)) {
-            logger.log(Level.INFO, "gms.failureEventReceived", token);
-            addFailureSignals(packet);
-            getGMSContext().removeFromSuspectList(token);
-        }
-    }
-
     private void addPlannedShutdownSignals(final EventPacket packet) {
         final SystemAdvertisement advert = packet.getSystemAdvertisement();
         final String token = advert.getName();
@@ -524,6 +499,7 @@
             logger.log(Level.WARNING, e.getLocalizedMessage());
         } catch (Exception e) {
             logger.log(Level.WARNING, "Exception during DSC sync:" + e);
+            e.printStackTrace();
         }
     }
 }





[SHOAL-26] Threads don't shutdown Created: 06/Aug/07  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: bryon Assignee: Joe Fialli
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: Windows XP
Platform: Windows


Issuezilla Id: 26
Status Whiteboard:

as91ur1-na, shoal-shark-na


 Description   

It appears that there are three threads that don't shut down when trying to shut down
Shoal; they are: ViewWindowThread, MessageWindowThread, and
com.sun.enterprise.ee.cms.impl.common.Router. These threads do not terminate,
and so the process does not die when performing a shutdown. After examining the
code, it doesn't appear that the shutdown flag is ever changed to signal the
threads that a shutdown is in progress. Also, two of the threads don't appear to
ever be interrupted.



 Comments   
Comment by shreedhar_ganapathy [ 11/Aug/07 ]

reassigning to Sheetal to address this issue.

Comment by shreedhar_ganapathy [ 21/Aug/07 ]

Lowering to P4 as it's not a release stopper for GlassFish, which is tracking P1s,
P2s, and P3s for v2's FCS.

Comment by shreedhar_ganapathy [ 06/Nov/07 ]

Will not make it into GlassFish v2 update release.

Sheetal could you take a look at this issue and address it in time for SailFin's
feature freeze?

Comment by sheetalv [ 09/Jul/08 ]

NA for Sailfin 1.0

Comment by sheetalv [ 27/Aug/08 ]

Need to interrupt the ViewWindow and MessageWindow threads in GMSContext.leave(), since GMSContext
starts those threads.
Router starts the SignalHandler thread and needs to interrupt it during shutdown.
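
A minimal sketch of the interrupt-on-leave wiring described above; the field and method names are illustrative, not the actual Shoal classes:

// Illustration only: shows the shutdown-flag-plus-interrupt pattern described in this comment.
class GMSContextShutdownSketch {
    private volatile boolean shuttingDown = false;    // checked by the worker run() loops
    private Thread viewWindowThread;                   // started by the context at join time
    private Thread messageWindowThread;

    void leave() {
        shuttingDown = true;                           // tell the loops a shutdown is in progress
        viewWindowThread.interrupt();                  // unblock threads waiting on their queues
        messageWindowThread.interrupt();
        try {
            viewWindowThread.join(5000);               // give each thread a bounded time to exit
            messageWindowThread.join(5000);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
    }
}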

Comment by sheetalv [ 27/Oct/08 ]

re-assigning to Joe.

Comment by Joe Fialli [ 09/Sep/09 ]

tested and integrated patch submitted by Bongjae.





[SHOAL-27] java.lang.IllegalStateException is thrown with default config Created: 06/Nov/07  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: mbien Assignee: hamada
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All
URL: https://shoal.dev.java.net/servlets/ReadMsg?list=users&msgNo=55


Issuezilla Id: 27
Status Whiteboard:

as91ur1-na


 Description   

GMSFactory.startGMSModule(nodeName, groupName,
GroupManagementService.MemberType.CORE, null)

throws:

Exception in thread "main" java.lang.IllegalStateException: Must specify
rendezvous if 'useOnlySeeds' is enabled and configured as client
at net.jxta.impl.protocol.RdvConfigAdv.getDocument(RdvConfigAdv.java:523)
at
net.jxta.platform.NetworkConfigurator.getPlatformConfig(NetworkConfigurator.java:1778)
at com.sun.enterprise.jxtamgmt.NetworkManager.start(NetworkManager.java:397)
at com.sun.enterprise.jxtamgmt.ClusterManager.<init>(ClusterManager.java:145)
at
com.sun.enterprise.ee.cms.impl.jxta.GroupCommunicationProviderImpl.initializeGroupCommunicationProvider(GroupCommunicationProviderImpl.java:129)
at com.sun.enterprise.ee.cms.impl.jxta.GMSContext.join(GMSContext.java:122)
at
com.sun.enterprise.ee.cms.impl.common.GroupManagementServiceImpl.join(GroupManagementServiceImpl.java:309)
at net.java.fishfarm.GridNodeController.startNode(GridNodeController.java:89)
...

this is a regression over Shoal 1.0



 Comments   
Comment by shreedhar_ganapathy [ 06/Nov/07 ]

assigning it hamada for better handling.

Comment by shreedhar_ganapathy [ 06/Nov/07 ]

Will not make it into GlassFish v2 Update release.

Comment by shreedhar_ganapathy [ 06/Nov/07 ]

..

Comment by sheetalv [ 01/Feb/08 ]

Tried passing null for the properties in the GMSFactory.startGMSModule() API, as
shown in the description, in the ApplicationServer test. Could not reproduce the
exception reported in the description.





[SHOAL-29] API to Signal to group that each member's application is ready to start operations Created: 15/Jan/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: New Feature Priority: Major
Reporter: shreedhar_ganapathy Assignee: sheetalv
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 29

 Description   

Parent applications employing Shoal, typically through the
GMSFactory.startGMSModule(...) API, might find it limiting that Shoal's
JoinNotificationSignal is taken to mean that the application employing it is ready to
start its operations. This is particularly the case for products that have
sequences of services to start: they need to be part of a group
early on, but also need a way to know when the application is ready to be operated
upon.
For instance, where a load balancer employs Shoal as its health monitoring
system by participating in it as a SPECTATOR member, and the cluster it is load
balancing also uses Shoal, the LB needs to know when the instances are ready to
accept requests.

Further, appserver instances in the cluster may want to know when an instance is
in the ready state so that operations such as data replication can occur.
This RFE is for a JoinedAndReadyNotificationSignal, which would signify to all
members of the group that the member has not only joined the group but is also
ready to start operations.

Additionally, the JoinNotificationSignal requires an additional API, getMemberState(),
to return the joined member's health state. This would
cover cases where an instance has already sent out a
JoinedAndReadyNotificationSignal and another instance needs to know the health
state of that member.
The state machine of an instance's startup should be starting, ready, and then
alive. Both ready and alive signify an instance's ready and available state.
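
A sketch of the intended usage, assuming the JoinedAndReadyNotificationSignal, its client-side action factory, and the reportJoinedAndReadyState(groupName) method as they appear in the checkins listed in the comment below; the class and method names should be verified against the current API:

import com.sun.enterprise.ee.cms.core.CallBack;
import com.sun.enterprise.ee.cms.core.GMSException;
import com.sun.enterprise.ee.cms.core.GMSFactory;
import com.sun.enterprise.ee.cms.core.GroupManagementService;
import com.sun.enterprise.ee.cms.core.Signal;
import com.sun.enterprise.ee.cms.impl.client.JoinedAndReadyNotificationActionFactoryImpl;

public class ReadyAwareMember {
    public static void main(final String[] args) throws GMSException {
        final GroupManagementService gms = (GroupManagementService) GMSFactory.startGMSModule(
                "instance1", "cluster1", GroupManagementService.MemberType.CORE, null);

        // Be told when other members report that they are joined AND ready.
        gms.addActionFactory(new JoinedAndReadyNotificationActionFactoryImpl(new CallBack() {
            public void processNotification(final Signal signal) {
                System.out.println(signal.getMemberToken() + " is joined and ready");
            }
        }));

        gms.join();
        // ... start this member's own services here ...
        gms.reportJoinedAndReadyState("cluster1");   // announce readiness to the group
    }
}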



 Comments   
Comment by shreedhar_ganapathy [ 15/Jan/08 ]

Sheetal and I have checked in this feature through various cvs checkins.
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=503
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=497
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=496
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=495
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=494
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=492
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=485
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=469

We may need to expose an API that allows applications to query the state of
members through GroupHandle.





[SHOAL-30] Health Monitor's getState returns state of members as DEAD if health messages have not yet started Created: 19/Jan/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: shreedhar_ganapathy Assignee: shreedhar_ganapathy
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 30

 Description   

getState in HealthMonitor needs to make appropriate assumptions regarding
members who are in clusterViewManager but have not yet sent out a health message.
Although this is a small window of time, a member's state is shown as DEAD during
that window. This happens even for the local peer's getState call.
The fix is understood.



 Comments   
Comment by shreedhar_ganapathy [ 19/Jan/08 ]

fix checked in
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=514





[SHOAL-31] Health Monitor's reportJoinedAndReadyState doesnt send local cluster view event in master node Created: 19/Jan/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: shreedhar_ganapathy Assignee: shreedhar_ganapathy
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 31

 Description   

In HealthMonitor, the reportJoinedAndReadyState method does not send out a
local clusterViewEvent notification when the sender is the assigned Master. This
notification is useful even on the master node, whose consuming application may
also require it.

The fix is understood.



 Comments   
Comment by shreedhar_ganapathy [ 19/Jan/08 ]

Fix checked in
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=514





[SHOAL-33] test for DSCMessage Created: 23/Jan/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Task Priority: Major
Reporter: sheetalv Assignee: sheetalv
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: OpenSolaris


Issuezilla Id: 33

 Description   

DSCMessages should also be sent P2P. This needs to be checked.






[SHOAL-34] Join notif states different for master and other members Created: 23/Jan/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: sheetalv Assignee: shreedhar_ganapathy
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: OpenSolaris


Issuezilla Id: 34

 Description   

(Reported in SailFin) When an instance comes up, the master instance gets a JOIN
notification with the member's state as ALIVE. However, other instances get the JOIN
notification with the member's state as STARTING.
This needs to be looked into.



 Comments   
Comment by sheetalv [ 01/Feb/08 ]

There is an issue in Sailfin for the same :
https://sailfin.dev.java.net/issues/show_bug.cgi?id=420

Comment by sheetalv [ 28/Feb/08 ]

It is OK for instance A to see instance B's state as ALIVE while instance C sees
instance B's state as STARTING or READY. All these 3 states are considered
healthy for an instance for which a Join notif is sent out.
A fix has gone into Shoal workspace to make sure that the most up-to-date state
is returned back via LWRMulticast. A Shoal integration will make the fix
available in Sailfin.

change log at :
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=535





[SHOAL-35] Member state should not be returned as DEAD for a member not in view Created: 23/Jan/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Task Priority: Major
Reporter: sheetalv Assignee: sheetalv
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: OpenSolaris


Issuezilla Id: 35

 Description   

If a member is not in the view, the state should be returned as UNKNOWN and not
as DEAD.
Also, one last check of the state needs to be done before returning (the state could
have changed just before returning) so that the right state is returned.



 Comments   
Comment by sheetalv [ 01/Feb/08 ]

The HealthMonitor.getState() has been modified to fix this.





[SHOAL-37] expose API to determine if group is shutting down Created: 30/Jan/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Improvement Priority: Major
Reporter: sheetalv Assignee: sheetalv
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Macintosh


Issuezilla Id: 37

 Description   

Need to add an API:

isGroupShuttingDown() in GroupManagementServiceImpl, which will call
ShutdownHelper.isGroupBeingShutdown().

This API can be used by GMS clients such as in-memory replication, so that replication can be done before
the instance shuts down.

Add a test to query this API (it should return the right info when the group is not shutting down).
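
A sketch of how an in-memory replication client might consume the proposed API; the query interface below is a stand-in for whatever the method ends up being named on the GMS side, and replicateToPartner() is a hypothetical placeholder:

// Hypothetical usage pattern only; not the actual Shoal or replication code.
final class ReplicationOnShutdownSketch {

    interface GroupShutdownQuery {            // stand-in for the proposed isGroupShuttingDown() API
        boolean isGroupShuttingDown();
    }

    void onInstanceShutdown(final GroupShutdownQuery gms) {
        if (gms.isGroupShuttingDown()) {
            // The whole group is going down; there is no survivor to replicate to, so skip it.
            return;
        }
        replicateToPartner();                  // only this instance is stopping; push state first
    }

    private void replicateToPartner() {
        // placeholder for the replication client's own logic
    }
}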



 Comments   
Comment by sheetalv [ 01/Feb/08 ]

This has been fixed.
See following check ins :

https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=519
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=520
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=521
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=522
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=523
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=524

A test has been added as well.





[SHOAL-38] HealthMonitoring support for hardware/network failures avoiding TCP timeouts Created: 01/Feb/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: sheetalv Assignee: hamada
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: OpenSolaris


Issuezilla Id: 38

 Description   

With a hardware or network failure, the current JxtaMgmt provider's HealthMonitor
will go into a TCP timeout, which on certain systems can be as long as 10 minutes.
We need a timeout-based mechanism that allows applications to configure a timeout
after which a TCP-socket-based liveness check terminates and
marks the member as failed. This is needed to provide robustness in the face of
hardware failures.

The fix for this needs to come from JXTA for SailFin, as it is a critical requirement for
Ericsson.



 Comments   
Comment by shreedhar_ganapathy [ 24/Jun/08 ]

Sheetal has integrated a fix into the trunk wrt this feature. The feature allows
health monitoring to report a failure when a failure detection related tcp
connection is blocked for a configured timeout (set to 30 seconds default).

The timeout is configured using the FAILURE_DETECTION_TCP_RETRANSMIT_TIMEOUT and
FAILURE_DETECTION_TCP_RETRANSMIT_PORT properties specified in
ServiceProviderConfigurationKeys.java.

The Javadoc corresponding to these properties is as follows:
FAILURE_DETECTION_TCP_RETRANSMIT_PORT
The value of this key is a port, common to all cluster members, on which
a socket connection will be attempted when a particular instance's configured
periodic heartbeats have been missed the maximum number of retry times.

FAILURE_DETECTION_TCP_RETRANSMIT_TIMEOUT
The maximum time that the health monitoring protocol will wait for a
reachability query to block for a response.
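
A minimal sketch of setting these two properties when starting a member, assuming the four-argument GMSFactory.startGMSModule(...) signature; the port and timeout values are illustrative:

import java.util.Properties;

import com.sun.enterprise.ee.cms.core.GMSException;
import com.sun.enterprise.ee.cms.core.GMSFactory;
import com.sun.enterprise.ee.cms.core.GroupManagementService;
import com.sun.enterprise.ee.cms.core.ServiceProviderConfigurationKeys;

public class HardwareFailureTuning {
    public static void main(final String[] args) throws GMSException {
        final Properties props = new Properties();
        // Bound the liveness check instead of waiting for the OS-level TCP timeout.
        props.put(ServiceProviderConfigurationKeys.FAILURE_DETECTION_TCP_RETRANSMIT_TIMEOUT.toString(), "30000");
        // Port common to all cluster members, used for the reachability probe.
        props.put(ServiceProviderConfigurationKeys.FAILURE_DETECTION_TCP_RETRANSMIT_PORT.toString(), "9200");
        final GroupManagementService gms = (GroupManagementService) GMSFactory.startGMSModule(
                "instance1", "cluster1", GroupManagementService.MemberType.CORE, props);
        gms.join();
    }
}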





[SHOAL-39] HealthMonitor should report if the Network Interface of the local peer is down Created: 01/Feb/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Improvement Priority: Major
Reporter: sheetalv Assignee: sheetalv
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: OpenSolaris


Issuezilla Id: 39

 Description   

This will provide an additional layer of failure reporting, which will help
diagnose problems for customers.

JDK 6 provides a facility for this in the NetworkInterface API.
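
For reference, a small standalone check of the kind this enhancement suggests, using java.net.NetworkInterface (available since JDK 6); the interface name used here is illustrative:

import java.net.NetworkInterface;
import java.net.SocketException;

public class NicCheck {
    public static void main(final String[] args) throws SocketException {
        // Check whether the interface carrying group traffic is up (interface name is illustrative).
        final String name = args.length > 0 ? args[0] : "eth0";
        final NetworkInterface nic = NetworkInterface.getByName(name);
        if (nic == null || !nic.isUp()) {
            System.out.println("Interface " + name + " is down or missing; local failure should be reported");
        } else {
            System.out.println("Interface " + nic.getDisplayName() + " is up");
        }
    }
}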



 Comments   
Comment by sheetalv [ 09/Jul/08 ]

Changing to Enhancement





[SHOAL-40] (User Feedback) : provide ability to choose network interface on which to have group communication Created: 01/Feb/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: sheetalv Assignee: hamada
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: OpenSolaris


Issuezilla Id: 40

 Description   

As requested by John Kim in Shreedhar's blog entry :
http://blogs.sun.com/shreedhar/entry/sailfin_drives_a_new_feature

Need to expose a configuration in Shoal and support in JXTA.



 Comments   
Comment by hamada [ 01/Feb/08 ]

This is already supported in JXTA. A new property constant, TCPADDRESS, needs to be defined
in JxtaConfigConstants; it is in turn passed to NetworkManager and set
in NetworkManager.startDomain():

config.setTcpInterfaceAddress(TCPADDRESS);
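
Expanded slightly, a sketch of that wiring (not the actual Shoal code; props is the Properties supplied by the application and config is the JXTA NetworkConfigurator):

// Sketch of the change described above, inside NetworkManager.startDomain().
final String tcpAddress = (String) props.get(JxtaConfigConstants.TCPADDRESS.toString()); // proposed constant
if (tcpAddress != null) {
    config.setTcpInterfaceAddress(tcpAddress); // bind group communication to the chosen interface
}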

Comment by sheetalv [ 11/Feb/08 ]

Fix for this issue has been checked in.
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=530





[SHOAL-41] Add support in Shoal for passing in cert stores Created: 11/Feb/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Improvement Priority: Major
Reporter: shreedhar_ganapathy Assignee: sheetalv
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 41

 Description   

JXTA and other service providers provide a notion of encryption through various
means. The Properties object that passes configuration data to the service
provider backends should be able to pass a cert store to the JXTA service provider so
that end-to-end security can optionally be provided.






[SHOAL-44] accessing JXTA's System ADV information or equivalent Created: 19/Feb/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Improvement Priority: Major
Reporter: mbien Assignee: sheetalv
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 44

 Description   

JXTA stores in the system advertisement a lot of useful, never-changing
information about the node's runtime environment. It would be great if Shoal
provided this kind of immutable "node info" in addition to the mutable "node
details" (DistributedStateCache).

proposed public API changes (a rough sketch follows below):
-node info getter in the GMS
-node info getter in Signal
-mechanism for adding custom values on node join

a workaround with DistributedStateCache is possible,
but:
-it requires redundant communication
-values are not guaranteed to arrive at the same time
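
For illustration, a rough sketch of what the proposed additions could look like; everything
below is hypothetical and not part of the current Shoal API:

// Hypothetical "node info" API sketch -- none of this exists in Shoal today.
public interface NodeInfo {
    String getMemberToken();
    // Immutable facts taken from the JXTA system advertisement at join time,
    // e.g. OS name, architecture, JVM version, plus custom values added on join.
    java.util.Map<String, String> getProperties();
}

// Proposed getters (hypothetical):
//   GroupManagementService: NodeInfo getNodeInfo(String memberToken);
//   Signal:                 NodeInfo getNodeInfo();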



 Comments   
Comment by sheetalv [ 09/Jul/08 ]

assigning to self





[SHOAL-45] MessageWindow NullPointerException handling GMSMessage Created: 17/Mar/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: garyfeltham Assignee: sheetalv
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 45
Status Whiteboard:

shoal-shark-na


 Description   

GroupHandle#sendMessage(String targetComponentName, byte[] message)'s contract
states that "Specifying a null component name would result in the message being
delivered to all registered components". On sending a message such as

sendMessage(null, "Example message".getBytes());

a NullPointerException is thrown from MessageWindow#handleGMSMessage, where

if(gMsg.getComponentName().equals(GMSConstants.shutdownType.GROUP_SHUTDOWN.toString()))

fails because gMsg.getComponentName() is null.

A fix is:

if (gMsg.getComponentName() != null &&
    gMsg.getComponentName().equals(GMSConstants.shutdownType.GROUP_SHUTDOWN.toString()))
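
An equivalent null-safe form, for reference (same check, with the constant on the left so the
null case is handled implicitly):

if (GMSConstants.shutdownType.GROUP_SHUTDOWN.toString().equals(gMsg.getComponentName()))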



 Comments   
Comment by shreedhar_ganapathy [ 18/Mar/08 ]

Thanks for bringing this to our attention. We will add a test case for this
situation.
Reassigning to Sheetal for fix.

Comment by sheetalv [ 28/Jul/08 ]

not important for Sailfin 0.5

Comment by sheetalv [ 25/Sep/08 ]

Fixed in Shoal trunk.
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=646





[SHOAL-46] .shoal directory size increases dangerously Created: 18/Mar/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: babbisx Assignee: sheetalv
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: Linux
Platform: OpenSolaris


Issuezilla Id: 46

 Description   

Hi from Sweden,

After sending a number of GMS messages, I found that there is a directory
named .shoal in the location where I run the JVM.
In several cases this directory exceeded 300 MB in size, which may eventually fill
the file system.

Please note that I am testing Shoal on Java 6 patch 4, using the libraries from
SailFin v1 b22, not SailFin itself.

BR

Babbis



 Comments   
Comment by sheetalv [ 18/Mar/08 ]

Thanks for filing this issue. I will look into this issue asap.

Comment by shreedhar_ganapathy [ 18/Mar/08 ]

Hi Babbis,
While we investigate this issue, could you try the latest shoal and jxta jars
that have now been integrated into Sailfin build 24a. For your reference, the
following contains those specific jars :
https://shoal.dev.java.net/files/documents/5135/89898/shoal-1.1_03132008.zip

Thanks
Shreedhar

Comment by shreedhar_ganapathy [ 20/Mar/08 ]

Based on the following email from Babbis, the issue is resolved with the 1.1 bits
==
Hi,

This fix worked fine! No file leaks were seen after heavy traffic.
I have tested the fix on v1-b25.

Thanks

Babbis
==





[SHOAL-47] Gracefully shutdown does not work, but also creates sideeffects Created: 18/Mar/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: babbisx Assignee: shreedhar_ganapathy
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: Linux
Platform: OpenSolaris


Issuezilla Id: 47

 Description   

Hi from Sweden,

I found out that GMS graceful shutdown,

//leaves the group gracefully
gms.shutdown(GMSConstants.shutdownType.INSTANCE_SHUTDOWN);

does not work, and it also has some serious side effects:

  • first I get an exception

Exception in thread "MessageWindowThread" java.lang.NullPointerException
at
com.sun.enterprise.ee.cms.impl.jxta.MessageWindow.run(MessageWindow.java:86)
at java.lang.Thread.run(Thread.java:619)

  • afterwards, all other group members will see the member that called shutdown()
    permanently in the group member list. This means that the member will appear
    in the member list forever (until all members and their processes terminate).

This has been verified both through printouts and by calling

List<String> members = groupHandle.getAllCurrentMembers();

On the other hand, if the member's JVM terminates (not a graceful shutdown), the
member is correctly removed.

Please note that this is verified on Java 6 patch 4 and the libraries from
SailFin v1 b22. Please also note that I am not running SailFin/GlassFish, just a
plain JVM.

BR

Babbis



 Comments   
Comment by sheetalv [ 18/Mar/08 ]

This issue has already been filed under issue 48.

      *** This issue has been marked as a duplicate of 48 ***
Comment by sheetalv [ 18/Mar/08 ]

This issue has been reopened to track the second part of the description.

Comment by shreedhar_ganapathy [ 20/Mar/08 ]

Based on the following email snippet from Babbis, the issue is resolved with the
1.1 bits
==
The problem with graceful shutdown was apparent in pure v1-b25, but was fixed
when the patch was applied.
(This was actually a big issue when we e.g. wanted to replace cards in a cluster
and the subscriber on the removed server instance was never removed from the
group.)
So you can remove issue 48 also.

==





[SHOAL-48] Gracefully shutdown does not work, but also creates sideeffects Created: 18/Mar/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: babbisx Assignee: sheetalv
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: Linux
Platform: OpenSolaris


Issuezilla Id: 48

 Description   

Hi from Sweden,

I found out that GMS graceful shutdown,

//leaves the group gracefully
gms.shutdown(GMSConstants.shutdownType.INSTANCE_SHUTDOWN);

does not work, and it also has some serious side effects:

  • first I get an exception

Exception in thread "MessageWindowThread" java.lang.NullPointerException
at
com.sun.enterprise.ee.cms.impl.jxta.MessageWindow.run(MessageWindow.java:86)
at java.lang.Thread.run(Thread.java:619)

  • afterwards, all other group members will see the member that called shutdown()
    permanently in the group member list. This means that the member will appear
    in the member list forever (until all members and their processes terminate).

This has been verified both through printouts and by calling

List<String> members = groupHandle.getAllCurrentMembers();

On the other hand, if the member's JVM terminates (not a graceful shutdown), the
member is correctly removed.

Please note that this is verified on Java 6 patch 4 and the libraries from
SailFin v1 b22. Please also note that I am not running SailFin/GlassFish, just a
plain JVM.

BR

Babbis



 Comments   
Comment by sheetalv [ 18/Mar/08 ]

Thanks for filing this issue. I will look into it.

Comment by sheetalv [ 18/Mar/08 ]
      *** Issue 47 has been marked as a duplicate of this issue. ***
Comment by shreedhar_ganapathy [ 20/Mar/08 ]

The problem with graceful shutdown was apparent in pure v1-b25, but was fixed
when the patch was applied.
(This was actually a big issue when we e.g. wanted to replace cards in a cluster
and the subscriber on the removed server instance was never removed from the
group.)
So you can remove issue 48 also.





[SHOAL-49] Provide a converse to JoinedAndReady when consuming app did not get ready Created: 25/Mar/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Improvement Priority: Major
Reporter: shreedhar_ganapathy Assignee: sheetalv
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 49

 Description   

When the consuming application or product is ready to process its operations, it
can use Shoal's new JoinedAndReady reporting facility to let group members know
of this state.
The converse state of this situation may be a valuable piece of information for
administrative or monitoring applications.

If the application could not get into the joined and ready state for any reason
(for instance, an application server consuming Shoal could not complete its
startup and failed midway), then such an unready state can be conveyed through a
notification that specifically identifies this state.

Need an appropriate name for such a notification so it is meaningful.



 Comments   
Comment by shreedhar_ganapathy [ 25/Mar/08 ]

..





[SHOAL-53] MSG LOSS: MISSING Failure Events in GMS/glassfish Created: 30/Apr/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: sviveka Assignee: sheetalv
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: OpenSolaris


Issuezilla Id: 53

 Description   

Build : b31
setup : 9-instance cluster

Kill one of the 9 instances randomly,
sleep for 30 seconds,
then search the individual instance logs for FAILURE_EVENT.
In roughly one out of four failures, one or more instances failed to receive the FAILURE
notification.



 Comments   
Comment by sviveka [ 30/Apr/08 ]

ccing Sheetal

Comment by shreedhar_ganapathy [ 30/Apr/08 ]

..

Comment by sviveka [ 09/May/08 ]

Not reproducible





[SHOAL-54] gms.getGroupHandle().getGroupLeader() throws NullPointerException Created: 03/May/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: leehui Assignee: sheetalv
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: Windows XP
Platform: Windows


Issuezilla Id: 54
Status Whiteboard:

shoal-shark-na


 Description   

If gms.getGroupHandle().getGroupLeader() is invoked immediately after gms.join(), the
application occasionally throws a NullPointerException, especially when the
application is run in a console from the command line.
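
For illustration, a minimal client-side workaround sketch until this is fixed (the retry bound
and sleep interval are arbitrary assumptions):

// Assumes gms.join() has already been called.
static String getGroupLeaderWithRetry(GroupManagementService gms) throws InterruptedException {
    GroupHandle handle = gms.getGroupHandle();
    for (int i = 0; i < 50; i++) {
        try {
            return handle.getGroupLeader();
        } catch (NullPointerException npe) {
            Thread.sleep(100); // the view may not be formed yet; back off and retry
        }
    }
    return null; // caller decides how to treat "no leader known yet"
}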



 Comments   
Comment by sheetalv [ 09/Jul/08 ]

NA for Sailfin 1.0

Comment by sheetalv [ 09/Jul/08 ]

assigning to myself





[SHOAL-56] Document Configuration Settings Available to users Created: 12/May/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Improvement Priority: Major
Reporter: shreedhar_ganapathy Assignee: sheetalv
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 56

 Description   

We need to document configuration settings that are available to users with
clear explanations on what they are and in some cases under what circumstances
these should be used.



 Comments   
Comment by sheetalv [ 09/Jul/08 ]

marking as Enhancement

Comment by sheetalv [ 09/Jul/08 ]

assigning to self





[SHOAL-57] Provide a JMX MBean to list and configure configuration settings Created: 12/May/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: New Feature Priority: Major
Reporter: shreedhar_ganapathy Assignee: sheetalv
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 57

 Description   

A JMX MBean to list and configure Shoal's providers would be very useful from a
management standpoint.

Additionally, this MBean could also provide runtime statistics, ranging from the
number of views and the current view to request/response metrics.
Adding a placeholder RFE for this purpose.
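
For illustration, a rough sketch of what such a management interface might look like; the
interface name and every method on it are hypothetical:

// Hypothetical MBean interface -- a sketch for this RFE, not existing Shoal API.
public interface GMSConfigMBean {
    // configuration
    java.util.Map<String, String> getConfigurationProperties();
    void setConfigurationProperty(String key, String value);

    // runtime statistics
    long getViewCount();
    java.util.List<String> getCurrentViewMembers();
    long getRequestCount();
    long getResponseCount();
}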



 Comments   
Comment by sheetalv [ 09/Jul/08 ]

assigning to self





[SHOAL-60] when new member joins existing group, this member can't receive join notifications of others that already joined Created: 10/Jun/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: carryel Assignee: carryel
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Java Source File SimpleJoinTest.java    
Issuezilla Id: 60

 Description   

Assuming that "A", "B" and "C" are members in "TestGroup". Sometimes when new
member join, this member can't receive join notifications of others that
already joined.

This scenario is following(Assume that members will join the same group
according to the order, not concurrently).

1. First, "A" joined and became a group leader.
2. after 1, "B" joined. Then "B" received "A"'s a join notification and own
("C") join notification in "B". No problem.
3. after 2, "C" joined. At this time, "C" must receive "A", "B" and "C" join
notifications in "C". But "C" didn't receive "B"'s a join notification.

Like above, assuming that "A", "B", "C" and "D" are members in "TestGroup", "D"
didn't receive "B" and "C"'s join notifications.

above 3, when new member joined, this member don't receive some members' join
notifications.(In other words, this member receives only own notification and
group leader's notification)

You can also see this result from following logs.

"A"(the group leader): member id="6a92713c-d83e-49a8-8aaa-ad12046a1acb"
"B": member id="77ff0a1c-b9a1-417a-b04c-0028ef6da921"
"C": member id="6a8e7161-92ef-4b9e-a5e1-d9a8c7665b4a"
When memebers receive a join notification, "***JoinNotification received:
ServerName = [MY_MEMBER_ID], Signal.getMemberToken() = [MEMBER_ID]" printed.

["A"'s log]
------------------------------------------------------------------------
2008. 6. 5 오후 1:36:17 com.sun.enterprise.shoal.jointest.SimpleJoinTest
runSimpleSample
ì •ë³´: Starting SimpleJoinTest....
2008. 6. 5 오후 1:36:18 com.sun.enterprise.shoal.jointest.SimpleJoinTest
initializeGMS
ì •ë³´: Initializing Shoal for member: 6a92713c-d83e-49a8-8aaa-ad12046a1acb
group:TestGroup
2008. 6. 5 오후 1:36:18 com.sun.enterprise.shoal.jointest.SimpleJoinTest
runSimpleSample
ì •ë³´: Registering for group event notifications
2008. 6. 5 오후 1:36:18 com.sun.enterprise.shoal.jointest.SimpleJoinTest
runSimpleSample
ì •ë³´: Joining Group TestGroup
2008. 6. 5 오후 1:36:18 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
ì •ë³´: GMS View Change Received for group TestGroup : Members in view for
(before change analysis) are :
1: MemberId: 6a92713c-d83e-49a8-8aaa-ad12046a1acb, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE15F3706F0E794BF595FCEE9EEA90FCE103

2008. 6. 5 오후 1:36:18 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
ì •ë³´: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT
2008. 6. 5 오후 1:36:41 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
ì •ë³´: GMS View Change Received for group TestGroup : Members in view for
(before change analysis) are :
1: MemberId: 6a92713c-d83e-49a8-8aaa-ad12046a1acb, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE15F3706F0E794BF595FCEE9EEA90FCE103
2: MemberId: 77ff0a1c-b9a1-417a-b04c-0028ef6da921, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CEC9482BF0C6A44D55B407E7E3A8D1339803

2008. 6. 5 오후 1:36:41 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
ì •ë³´: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 6. 5 오후 1:36:44
com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack
processNotification
ì •ë³´: ***JoinNotification received: ServerName = 6a92713c-d83e-49a8-8aaa-
ad12046a1acb, Signal.getMemberToken() = 77ff0a1c-b9a1-417a-b04c-0028ef6da921
2008. 6. 5 오후 1:37:00 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
ì •ë³´: GMS View Change Received for group TestGroup : Members in view for
(before change analysis) are :
1: MemberId: 6a8e7161-92ef-4b9e-a5e1-d9a8c7665b4a, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE0F6B7D5CD8CC447180F2D059E273AD5103
2: MemberId: 6a92713c-d83e-49a8-8aaa-ad12046a1acb, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE15F3706F0E794BF595FCEE9EEA90FCE103
3: MemberId: 77ff0a1c-b9a1-417a-b04c-0028ef6da921, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CEC9482BF0C6A44D55B407E7E3A8D1339803

2008. 6. 5 오후 1:37:00 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
ì •ë³´: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 6. 5 오후 1:37:03
com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack
processNotification
ì •ë³´: ***JoinNotification received: ServerName = 6a92713c-d83e-49a8-8aaa-
ad12046a1acb, Signal.getMemberToken() = 6a8e7161-92ef-4b9e-a5e1-d9a8c7665b4a
------------------------------------------------------------------------

["B"'s log]
------------------------------------------------------------------------
2008. 6. 5 오후 1:36:40 com.sun.enterprise.shoal.jointest.SimpleJoinTest
runSimpleSample
ì •ë³´: Starting SimpleJoinTest....
2008. 6. 5 오후 1:36:40 com.sun.enterprise.shoal.jointest.SimpleJoinTest
initializeGMS
ì •ë³´: Initializing Shoal for member: 77ff0a1c-b9a1-417a-b04c-0028ef6da921
group:TestGroup
2008. 6. 5 오후 1:36:40 com.sun.enterprise.shoal.jointest.SimpleJoinTest
runSimpleSample
ì •ë³´: Registering for group event notifications
2008. 6. 5 오후 1:36:40 com.sun.enterprise.shoal.jointest.SimpleJoinTest
runSimpleSample
ì •ë³´: Joining Group TestGroup
2008. 6. 5 오후 1:36:41 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
ì •ë³´: GMS View Change Received for group TestGroup : Members in view for
(before change analysis) are :
1: MemberId: 77ff0a1c-b9a1-417a-b04c-0028ef6da921, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CEC9482BF0C6A44D55B407E7E3A8D1339803

2008. 6. 5 오후 1:36:41 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
ì •ë³´: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT
2008. 6. 5 오후 1:36:41 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
ì •ë³´: GMS View Change Received for group TestGroup : Members in view for
(before change analysis) are :
1: MemberId: 6a92713c-d83e-49a8-8aaa-ad12046a1acb, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE15F3706F0E794BF595FCEE9EEA90FCE103
2: MemberId: 77ff0a1c-b9a1-417a-b04c-0028ef6da921, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CEC9482BF0C6A44D55B407E7E3A8D1339803

2008. 6. 5 오후 1:36:41 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
ì •ë³´: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT
2008. 6. 5 오후 1:36:41
com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack
processNotification
ì •ë³´: ***JoinNotification received: ServerName = 77ff0a1c-b9a1-417a-b04c-
0028ef6da921, Signal.getMemberToken() = 6a92713c-d83e-49a8-8aaa-ad12046a1acb
2008. 6. 5 오후 1:36:41 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
ì •ë³´: GMS View Change Received for group TestGroup : Members in view for
(before change analysis) are :
1: MemberId: 6a92713c-d83e-49a8-8aaa-ad12046a1acb, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE15F3706F0E794BF595FCEE9EEA90FCE103
2: MemberId: 77ff0a1c-b9a1-417a-b04c-0028ef6da921, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CEC9482BF0C6A44D55B407E7E3A8D1339803

2008. 6. 5 오후 1:36:41 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
ì •ë³´: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 6. 5 오후 1:36:41
com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack
processNotification
ì •ë³´: ***JoinNotification received: ServerName = 77ff0a1c-b9a1-417a-b04c-
0028ef6da921, Signal.getMemberToken() = 77ff0a1c-b9a1-417a-b04c-0028ef6da921
2008. 6. 5 오후 1:37:00 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
ì •ë³´: GMS View Change Received for group TestGroup : Members in view for
(before change analysis) are :
1: MemberId: 6a8e7161-92ef-4b9e-a5e1-d9a8c7665b4a, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE0F6B7D5CD8CC447180F2D059E273AD5103
2: MemberId: 6a92713c-d83e-49a8-8aaa-ad12046a1acb, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE15F3706F0E794BF595FCEE9EEA90FCE103
3: MemberId: 77ff0a1c-b9a1-417a-b04c-0028ef6da921, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CEC9482BF0C6A44D55B407E7E3A8D1339803

2008. 6. 5 오후 1:37:00 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
ì •ë³´: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 6. 5 오후 1:37:00
com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack
processNotification
ì •ë³´: ***JoinNotification received: ServerName = 77ff0a1c-b9a1-417a-b04c-
0028ef6da921, Signal.getMemberToken() = 6a8e7161-92ef-4b9e-a5e1-d9a8c7665b4a
------------------------------------------------------------------------

["C"'s log]
------------------------------------------------------------------------
2008. 6. 5 오후 1:36:59 com.sun.enterprise.shoal.jointest.SimpleJoinTest
runSimpleSample
ì •ë³´: Starting SimpleJoinTest....
2008. 6. 5 오후 1:36:59 com.sun.enterprise.shoal.jointest.SimpleJoinTest
initializeGMS
ì •ë³´: Initializing Shoal for member: 6a8e7161-92ef-4b9e-a5e1-d9a8c7665b4a
group:TestGroup
2008. 6. 5 오후 1:36:59 com.sun.enterprise.shoal.jointest.SimpleJoinTest
runSimpleSample
ì •ë³´: Registering for group event notifications
2008. 6. 5 오후 1:36:59 com.sun.enterprise.shoal.jointest.SimpleJoinTest
runSimpleSample
ì •ë³´: Joining Group TestGroup
2008. 6. 5 오후 1:37:00 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
ì •ë³´: GMS View Change Received for group TestGroup : Members in view for
(before change analysis) are :
1: MemberId: 6a8e7161-92ef-4b9e-a5e1-d9a8c7665b4a, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE0F6B7D5CD8CC447180F2D059E273AD5103

2008. 6. 5 오후 1:37:00 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
ì •ë³´: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT
2008. 6. 5 오후 1:37:00 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
ì •ë³´: GMS View Change Received for group TestGroup : Members in view for
(before change analysis) are :
1: MemberId: 6a8e7161-92ef-4b9e-a5e1-d9a8c7665b4a, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE0F6B7D5CD8CC447180F2D059E273AD5103
2: MemberId: 6a92713c-d83e-49a8-8aaa-ad12046a1acb, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE15F3706F0E794BF595FCEE9EEA90FCE103

2008. 6. 5 오후 1:37:00 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
ì •ë³´: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT
2008. 6. 5 오후 1:37:00
com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack
processNotification
ì •ë³´: ***JoinNotification received: ServerName = 6a8e7161-92ef-4b9e-a5e1-
d9a8c7665b4a, Signal.getMemberToken() = 6a92713c-d83e-49a8-8aaa-ad12046a1acb
2008. 6. 5 오후 1:37:00 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
ì •ë³´: GMS View Change Received for group TestGroup : Members in view for
(before change analysis) are :
1: MemberId: 6a8e7161-92ef-4b9e-a5e1-d9a8c7665b4a, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE0F6B7D5CD8CC447180F2D059E273AD5103
2: MemberId: 6a92713c-d83e-49a8-8aaa-ad12046a1acb, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE15F3706F0E794BF595FCEE9EEA90FCE103
3: MemberId: 77ff0a1c-b9a1-417a-b04c-0028ef6da921, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CEC9482BF0C6A44D55B407E7E3A8D1339803

2008. 6. 5 오후 1:37:00 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
ì •ë³´: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 6. 5 오후 1:37:00
com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack
processNotification
ì •ë³´: ***JoinNotification received: ServerName = 6a8e7161-92ef-4b9e-a5e1-
d9a8c7665b4a, Signal.getMemberToken() = 6a8e7161-92ef-4b9e-a5e1-d9a8c7665b4a
------------------------------------------------------------------------
"C"'s log don't have "B"'s join notification(77ff0a1c-b9a1-417a-b04c-
0028ef6da921).



 Comments   
Comment by carryel [ 10/Jun/08 ]

Created an attachment (id=7)
I attached a simple test code

Comment by carryel [ 10/Jun/08 ]

This is the normal case, e.g. member "B"'s behavior:
1. When a new member ("C") joins the group, the group leader (master) finally sends
MASTERNODERESPONSE to the group members with ADD_EVENT (about "C") and its own
view's snapshot.
2. Members receive MASTERNODERESPONSE and process it in processMasterNodeResponse().
3. In processMasterNodeResponse(), ADD_EVENT is notified with the master view's
snapshot by ClusterViewManager.
4. Then, ViewWindow analyzes the event packet (ADD_EVENT).
5. Finally, members receive a join notification about the new member ("C").

But in the new member ("C") itself, a problem occurs: there is no logic for
notifying it of the other members' ADD_EVENTs (about "B").
1. When the new member ("C") joins the group, the group leader (master) finally sends
MASTERNODERESPONSE to the group members with ADD_EVENT and its own view's snapshot.
[same as above]
2. "C" receives MASTERNODERESPONSE and processes processMasterNodeResponse(). [same
as above]
3. In processMasterNodeResponse(), MASTER_CHANGE_EVENT is notified without the master
view's snapshot because the current master is itself.
4. Then, ViewWindow analyzes the event packet (MASTER_CHANGE_EVENT). Of course,
when ViewWindow receives MASTER_CHANGE_EVENT, it notifies join
notifications based on the view history if the previous view doesn't have any members.
Maybe this is the logic for notifying the new member ("C") of the other members' join
notifications. But the current view based on the event packet (MASTER_CHANGE_EVENT) is,
unfortunately, not the master view. The current view only holds "C"'s local view
(currently only the master member and its own member added). So only the master's join
notification occurs.
5. In processMasterNodeResponse(), ADD_EVENT is notified with the master view's
snapshot by ClusterViewManager. [same as above]
6. Then, ViewWindow analyzes the event packet (ADD_EVENT). [same as above]
7. The new member ("C") receives its own join notification. [same as above]

So, I think this problem can be fixed if MASTER_CHANGE_EVENT is notified with the
master view's snapshot in step 3 above. Then, in step 4, ViewWindow can find that the
previous view doesn't have the other members as well as the master member. And then in
step 5, in processMasterNodeResponse(), ClusterViewManager can notify only ADD_EVENT
without the master view's snapshot, because MASTER_CHANGE_EVENT, which included the
master view's snapshot, was already notified.

Comment by carryel [ 10/Jun/08 ]

This is now resolved.
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=595





[SHOAL-62] sometimes, sending messages to a member failed though the member is still alive. Created: 12/Jun/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: leehui Assignee: sheetalv
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 62
Status Whiteboard:

shoal-shark-na


 Description   

Assuming that there are two nodes "A" and "B": "A" starts multiple threads to
send messages to "B", while "B" just receives messages from "A". Sometimes "A"
throws an ArrayIndexOutOfBoundsException and reports that "B" is not in its
view anymore even though "B" is still alive. Use
com.sun.enterprise.shoal.multithreadmessagesendertest.MultiThreadMessageSender
to start two instances. After a little while, "A" prints:

java.lang.ArrayIndexOutOfBoundsException
at java.lang.System.arraycopy(Native Method)
at sun.security.provider.DigestBase.engineUpdate
(DigestBase.java:102)
at sun.security.provider.SHA.implDigest(SHA.java:94)
at sun.security.provider.DigestBase.engineDigest
(DigestBase.java:161)
at sun.security.provider.DigestBase.engineDigest
(DigestBase.java:140)
at java.security.MessageDigest$Delegate.engineDigest
(MessageDigest.java:531)
at java.security.MessageDigest.digest(MessageDigest.java:309)
at java.security.MessageDigest.digest(MessageDigest.java:355)
at com.sun.enterprise.jxtamgmt.NetworkManager.hash
(NetworkManager.java:222)
at com.sun.enterprise.jxtamgmt.NetworkManager.getPeerGroupID
(NetworkManager.java:272)
at
com.sun.enterprise.jxtamgmt.NetworkManager.getInfraPeerGroupID
(NetworkManager.java:362)
at com.sun.enterprise.jxtamgmt.NetworkManager.getPeerID
(NetworkManager.java:261)
at com.sun.enterprise.jxtamgmt.ClusterManager.getID
(ClusterManager.java:662)
at
com.sun.enterprise.ee.cms.impl.jxta.GroupCommunicationProviderImpl.sendMessage
(GroupCommunicationProviderImpl.java:226)
at
com.sun.enterprise.ee.cms.impl.jxta.GroupHandleImpl.sendMessage
(GroupHandleImpl.java:131)
at MultiThreadMessageSender$1.run
(MultiThreadMessageSender.java:52)
at java.lang.Thread.run(Thread.java:595)
2008-6-12 11:20:21
com.sun.enterprise.ee.cms.impl.jxta.GroupHandleImpl sendMessage
警告: GroupHandleImpl.sendMessage : Could not send message :
Member B is not in the View anymore. Hence not performing sendMessage operation



 Comments   
Comment by leehui [ 19/Jun/08 ]

The root cause, please see
https://shoal.dev.java.net/servlets/ReadMsg?list=dev&msgNo=79

And some fix suggestions, please see
https://shoal.dev.java.net/servlets/ReadMsg?list=dev&msgNo=81
https://shoal.dev.java.net/servlets/ReadMsg?list=dev&msgNo=84

Comment by sheetalv [ 09/Jul/08 ]

NA for Sailfin 1.0

Comment by Joe Fialli [ 27/Aug/08 ]

Added a synchronized block in NetworkManager.hash(), as recommended
in the July timeframe.
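
For illustration, a minimal sketch of the kind of fix described, i.e. serializing access to a
shared MessageDigest (class and field names are simplified, not the actual NetworkManager code;
MessageDigest instances are not thread-safe, which is what produced the
ArrayIndexOutOfBoundsException above when several sender threads raced through hash()):

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class DigestHasher {
    private final MessageDigest digest;

    DigestHasher() throws NoSuchAlgorithmException {
        digest = MessageDigest.getInstance("SHA");
    }

    // Synchronize so concurrent callers cannot interleave reset/digest calls
    // on the shared digest instance.
    byte[] hash(byte[] input) {
        synchronized (digest) {
            digest.reset();
            return digest.digest(input);
        }
    }
}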





[SHOAL-63] When I invoke GMSFactory.startGMSModule(...), some NPEs are occurred Created: 26/Jun/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: carryel Assignee: carryel
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 63

 Description   

When I invoke GMSFactory.startGMSModule(...), several NPEs occur.

  • NPE List
    1. When group name is null
    -------------------------
    Exception in thread "main" java.lang.NullPointerException
    at com.sun.enterprise.jxtamgmt.NetworkManager.getPeerGroupID
    (NetworkManager.java:272)
    at com.sun.enterprise.jxtamgmt.NetworkManager.getInfraPeerGroupID
    (NetworkManager.java:362)
    at com.sun.enterprise.jxtamgmt.NetworkManager.getPeerID
    (NetworkManager.java:261)
    at com.sun.enterprise.jxtamgmt.NetworkManager.initWPGF(NetworkManager.java:562)
    at com.sun.enterprise.jxtamgmt.NetworkManager.<init>(NetworkManager.java:194)
    at com.sun.enterprise.jxtamgmt.ClusterManager.<init>(ClusterManager.java:151)
    at
    com.sun.enterprise.ee.cms.impl.jxta.GroupCommunicationProviderImpl.initializeGro
    upCommunicationProvider(GroupCommunicationProviderImpl.java:138)
    at com.sun.enterprise.ee.cms.impl.jxta.GMSContext.join(GMSContext.java:122)
    at com.sun.enterprise.ee.cms.impl.common.GroupManagementServiceImpl.join
    (GroupManagementServiceImpl.java:339)
    at com.sun.enterprise.shoal.groupleadertest.GroupLeaderTest.main
    (GroupLeaderTest.java:67)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:90)
    -------------------------

2. When server token is null
-------------------------
Exception in thread "main" java.lang.NullPointerException
at com.sun.enterprise.jxtamgmt.NetworkManager.getPeerID
(NetworkManager.java:261)
at com.sun.enterprise.jxtamgmt.NetworkManager.initWPGF(NetworkManager.java:562)
at com.sun.enterprise.jxtamgmt.NetworkManager.<init>(NetworkManager.java:194)
at com.sun.enterprise.jxtamgmt.ClusterManager.<init>(ClusterManager.java:151)
at
com.sun.enterprise.ee.cms.impl.jxta.GroupCommunicationProviderImpl.initializeGro
upCommunicationProvider(GroupCommunicationProviderImpl.java:138)
at com.sun.enterprise.ee.cms.impl.jxta.GMSContext.join(GMSContext.java:122)
at com.sun.enterprise.ee.cms.impl.common.GroupManagementServiceImpl.join
(GroupManagementServiceImpl.java:339)
at com.sun.enterprise.shoal.groupleadertest.GroupLeaderTest.main
(GroupLeaderTest.java:67)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:90)
-------------------------

3. When properties is null
-------------------------
Exception in thread "main" java.lang.NullPointerException
at com.sun.enterprise.jxtamgmt.ClusterManager.<init>(ClusterManager.java:161)
at
com.sun.enterprise.ee.cms.impl.jxta.GroupCommunicationProviderImpl.initializeGro
upCommunicationProvider(GroupCommunicationProviderImpl.java:138)
at com.sun.enterprise.ee.cms.impl.jxta.GMSContext.join(GMSContext.java:122)
at com.sun.enterprise.ee.cms.impl.common.GroupManagementServiceImpl.join
(GroupManagementServiceImpl.java:331)
at com.sun.enterprise.shoal.jointest.SimpleJoinTest.runSimpleSample
(SimpleJoinTest.java:40)
at com.sun.enterprise.shoal.jointest.SimpleJoinTest.main
(SimpleJoinTest.java:20)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:90)
-------------------------
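
For illustration, a minimal sketch of the kind of argument validation the fix adds before the
JXTA layer is touched (the method name and messages are illustrative, not the actual GMSFactory
code):

static void validateStartArguments(String serverToken, String groupName,
                                   java.util.Properties properties) {
    if (serverToken == null || serverToken.length() == 0) {
        throw new IllegalArgumentException("server token must not be null or empty");
    }
    if (groupName == null || groupName.length() == 0) {
        throw new IllegalArgumentException("group name must not be null or empty");
    }
    if (properties == null) {
        throw new IllegalArgumentException("properties must not be null; pass an empty Properties object instead");
    }
}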



 Comments   
Comment by carryel [ 26/Jun/08 ]

These are resolved.

https://shoal.dev.java.net/source/browse/shoal/gms/src/java/com/sun/enterprise/e
e/cms/core/GMSFactory.java?r1=1.5&r2=1.6

https://shoal.dev.java.net/source/browse/shoal/gms/src/java/com/sun/enterprise/j
xtamgmt/ClusterManager.java?r1=1.37&r2=1.38





[SHOAL-64] add AtomicBoolean for controlling the started variable Created: 26/Jun/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Task Priority: Major
Reporter: sheetalv Assignee: sheetalv
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Macintosh


Issuezilla Id: 64

 Description   

The same problem exists in both ClusterManager's and HealthMonitor's start().
Make sure that the AtomicBoolean is set at the beginning of the start() method.
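
For illustration, a minimal sketch of the pattern being asked for (names simplified; not the
actual ClusterManager/HealthMonitor code):

import java.util.concurrent.atomic.AtomicBoolean;

class StartableService {
    private final AtomicBoolean started = new AtomicBoolean(false);

    void start() {
        // Flip the flag first, atomically, so two concurrent callers cannot both start the service.
        if (!started.compareAndSet(false, true)) {
            return; // already started
        }
        // ... start threads, open pipes, etc.
    }

    void stop() {
        if (!started.compareAndSet(true, false)) {
            return; // not running
        }
        // ... release resources
    }
}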






[SHOAL-66] Join Notification Signals of own join is not notified Created: 27/Jun/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: shreedhar_ganapathy Assignee: sheetalv
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 66

 Description   

Mike Wannamaker reported the following :
==========
I believe it's a bug then, on both accounts. My SERVER-1 only gets a SERVER-1
message and SERVER-2 only gets a SERVER-1 message.

So when I say it only gets a SERVER-1 message, I mean that the method

public void processNotification(Signal p_Signal)

is only being called with that message.

So if I start just SERVER-1, I see the GMS View Changed message, with just
SERVER-1 in it, but my processNotification(…) is not called. Not until I start
SERVER-2 does it get called.

On SERVER-2, I see the original GMS View Changed with just SERVER-2, and then
GMS View Changed with SERVER-1,SERVER-2, but only get one processNotification(…)
call.

I will investigate further next week, but if you could have a look that would be
great. Is no one else seeing this?

This is my processNotification() method:

public void processNotification(Signal p_Signal)
{
    try {
        p_Signal.acquire();
        SignalLogger log = new SignalLogger(p_Signal);
        log.logIt();
        if (p_Signal instanceof MessageSignal) {
            MessageSignal msgSig = (MessageSignal) p_Signal;
            String sMember = msgSig.getMemberToken();
            Object o = ObjectUtil.toObject(msgSig.getMessage());
            if (o instanceof SMessage) {
                SMessage smsg = (SMessage) o;
                InetAddress sender = m_hmMembers.get(sMember).address;
                smsg.setSender(sender);
                SMessageLogger.log.systemInfo(getClass(), "FireMessage: " + smsg);
                fireMessageReceived(smsg); //fireMessageReceived(smsg);
            } else {
                SMessageLogger.log.systemInfo(getClass(), "Message is NOT SMessage??");
            }
        }
        else if (p_Signal instanceof JoinNotificationSignal) {
            JoinNotificationSignal joinSig = (JoinNotificationSignal) p_Signal;
            processClusterNotification();
        }
        else if (p_Signal instanceof JoinedAndReadyNotificationSignal) {
            JoinedAndReadyNotificationSignal joinSig = (JoinedAndReadyNotificationSignal) p_Signal;
            processClusterNotification();
        }
        else if (p_Signal instanceof FailureSuspectedSignal) {
            FailureSuspectedSignal suspectSig = (FailureSuspectedSignal) p_Signal;
            processClusterNotification();
        }
        else if (p_Signal instanceof FailureRecoverySignal) {
            FailureRecoverySignal failureSig = (FailureRecoverySignal) p_Signal;
            processClusterNotification();
        }
        else if (p_Signal instanceof FailureNotificationSignal) {
            FailureNotificationSignal failureSig = (FailureNotificationSignal) p_Signal;
            processClusterNotification();
        }
        else if (p_Signal instanceof PlannedShutdownSignal) {
            PlannedShutdownSignal shutdownSig = (PlannedShutdownSignal) p_Signal;
            processClusterNotification();
        }
        else {
            SMessageLogger.log.debug(getClass(), "Received Notification of type : "
                    + p_Signal.getClass().getName() + " Server: " + p_Signal.getMemberToken());
        }
    }
    catch (SignalAcquireException e) {
        SMessageLogger.log.fatal(getClass(), "Exception occured while acquiring signal", e);
    }
    finally {
        try {
            p_Signal.release();
        }
        catch (SignalReleaseException e) {
            SMessageLogger.log.warn(getClass(), "Exception occured while releasing signal", e);
        }
    }
}
From: Shreedhar.Ganapathy@Sun.COM
Sent: June 27, 2008 11:21 AM
To: users@shoal.dev.java.net
Subject: Re: [Shoal-Users] Still not sure it's working

Hi Mike
The expected behavior is that as each server starts, its registered GMS client
components will be notified of the server's own joining the group and any
subsequent joins of other members.
So in essence, server-1 GMS clients should see a JoinNotificationSignal for
server-1, and another for server-2
and in server-2, GMS clients should see a JoinNotificationSignal for server-2
and another for server-1.
The order here does not matter, but correctness is important; if not, it's a bug
to be fixed.

In the log below, Server-1 seems to be getting its own JoinNotificationSignal
which is correct. Does it ever get the JoinNotificationSignal for server-2?
On server-2, I am seeing correct behavior.

(Ignore the log statements that show the view contents, as that is an event
coming from the provider implementation - GMS notification signals are the ones
that GMS clients should look in for correctness).

Let me know.
Thanks
Shreedhar

Mike Wannamaker wrote:

Hi Guys,

I’m still not sure it’s working as it’s supposed to? But maybe it is?

Start SERVER-1

Start SERVER-2

On SERVER-1 I get a JoinMessage but it is from SERVER-1?

On SERVER-2 I get a Join Message from SERVER-1, which is what I
would expect?

Is this correct? This depends on when the two servers are started. If I wait
for a period between startups I get SERVER-2 startup message on SERVER-1 and
SERVER-1 startup message on SERVER-2. But if I start them both at the same time
I get the above behaviour?

Starting both at virtually the same time I get …

SEVER-1 Output:

27-Jun-2008 12:40:29 AM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens

INFO: GMS View Change Received for group RCS_CLUSTER : Members in view for
(before change analysis) are :

1: MemberId: SERVER-1, MemberType: CORE, Address:
urn:jxta:uuid-2F39FF376B6A43E3905DAFC81B7D02FDB946A28335F0413BBF73B77CCC8BFEC603

27-Jun-2008 12:40:29 AM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved

INFO: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT

27-Jun-2008 12:40:38 AM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens

INFO: GMS View Change Received for group RCS_CLUSTER : Members in view for
(before change analysis) are :

1: MemberId: SERVER-2, MemberType: CORE, Address:
urn:jxta:uuid-2F39FF376B6A43E3905DAFC81B7D02FD0D4B867250FF460C9B539A161779845B03

2: MemberId: SERVER-1, MemberType: CORE, Address:
urn:jxta:uuid-2F39FF376B6A43E3905DAFC81B7D02FDB946A28335F0413BBF73B77CCC8BFEC603

27-Jun-2008 12:40:38 AM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved

INFO: Analyzing new membership snapshot received as part of event : ADD_EVENT

27-Jun-2008 12:40:38 AM DEBUG [pool-1-thread-1]
com.opentext.ecm.services.smessage.impl.shoal.SignalLogger - - SERVER-1 >>
JoinNotificationSignalImpl @ 27/06/08 12:40 AM - [RCS_CLUSTER]:
(Hashtable:[(String:server.name)<-->(String:SERVER-1),
(String:local.host)<-->(Inet4Address:mwana0061/10.6.2.89)])

Server-2 Output

27-Jun-2008 12:40:30 AM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens

INFO: GMS View Change Received for group RCS_CLUSTER : Members in view for
(before change analysis) are :

1: MemberId: SERVER-2, MemberType: CORE, Address:
urn:jxta:uuid-2F39FF376B6A43E3905DAFC81B7D02FD0D4B867250FF460C9B539A161779845B03

27-Jun-2008 12:40:30 AM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved

INFO: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT

27-Jun-2008 12:40:38 AM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens

INFO: GMS View Change Received for group RCS_CLUSTER : Members in view for
(before change analysis) are :

1: MemberId: SERVER-2, MemberType: CORE, Address:
urn:jxta:uuid-2F39FF376B6A43E3905DAFC81B7D02FD0D4B867250FF460C9B539A161779845B03

2: MemberId: SERVER-1, MemberType: CORE, Address:
urn:jxta:uuid-2F39FF376B6A43E3905DAFC81B7D02FDB946A28335F0413BBF73B77CCC8BFEC603

27-Jun-2008 12:40:38 AM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved

INFO: Analyzing new membership snapshot received as part of event : ADD_EVENT

27-Jun-2008 12:40:44 AM DEBUG [pool-1-thread-1]
com.opentext.ecm.services.smessage.impl.shoal.SignalLogger - - SERVER-1 >>
JoinNotificationSignalImpl @ 27/06/08 12:40 AM - [RCS_CLUSTER]:
(Hashtable:[(String:server.name)<-->(String:SERVER-1),
(String:local.host)<-->(Inet4Address:mwana0061/10.6.2.89)])
============



 Comments   
Comment by shreedhar_ganapathy [ 27/Jun/08 ]

Awaiting Mike's confirmation to see if the fix is good.

Comment by sheetalv [ 09/Jul/08 ]

assigning to self

Comment by sheetalv [ 28/Jul/08 ]

Shreedhar fixed the issue in Shoal trunk. Now integrated into Sailfin 0.5.

Comment by sheetalv [ 28/Jul/08 ]

cvs message entry :
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=612





[SHOAL-67] NPE seen in ClusterManager Line 161 while executing Shoal Sample Created: 28/Jun/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: shreedhar_ganapathy Assignee: sheetalv
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 67

 Description   

Multiple users have pointed out this NPE. The cause is known and a fix is ready to
be checked in.
Thanks to David Taylor for the following stack trace :
init:
deps-jar:
compile:
run:
Jun 27, 2008 2:59:38 PM SimpleGMSSample runSimpleSample
INFO: Starting SimpleGMSSample....
Jun 27, 2008 2:59:38 PM SimpleGMSSample initializeGMS
INFO: Initializing Shoal for member: server1214603978093 group:Group1
Jun 27, 2008 2:59:38 PM SimpleGMSSample registerForGroupEvents
INFO: Registering for group event notifications
Jun 27, 2008 2:59:38 PM SimpleGMSSample joinGMSGroup
INFO: Joining Group Group1
Exception in thread "main" java.lang.NullPointerException
at
com.sun.enterprise.jxtamgmt.ClusterManager.<init>(ClusterManager.java:162)
at
com.sun.enterprise.ee.cms.impl.jxta.GroupCommunicationProviderImpl.initializ
eGroupCommunicationProvider(GroupCommunicationProviderImpl.java:138)
at
com.sun.enterprise.ee.cms.impl.jxta.GMSContext.join(GMSContext.java:122)
at
com.sun.enterprise.ee.cms.impl.common.GroupManagementServiceImpl.join(GroupM
anagementServiceImpl.java:331)
at SimpleGMSSample.joinGMSGroup(SimpleGMSSample.java:76)
at SimpleGMSSample.runSimpleSample(SimpleGMSSample.java:46)
at SimpleGMSSample.main(SimpleGMSSample.java:25)

Also, Mike W reported the same NPE in his email :
Also, has anyone run the tests lately? GroupLeaderTest fails.

It looks as though ClusterManager line 161

this.bindInterfaceAddress =
(String)props.get(JxtaConfigConstants.BIND_INTERFACE_ADDRESS.toString());

requires the BIND_INTERFACE_ADDRESS property to be given, or it throws a null pointer?

--ekiM



 Comments   
Comment by shreedhar_ganapathy [ 28/Jun/08 ]

Relevant Checkins
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=611

And

User: shreedhar_ganapathy
Date: 2008-06-28 15:06:07+0000
Modified:
shoal/gms/src/java/com/sun/enterprise/jxtamgmt/ClusterManager.java

Log:
Fix for issue 67 : NPE in setting Bind interface address in ClusterManager Line
161: Added check for empty Properties object.

File Changes:

Directory: /shoal/gms/src/java/com/sun/enterprise/jxtamgmt/
===========================================================

File [changed]: ClusterManager.java
Url:
https://shoal.dev.java.net/source/browse/shoal/gms/src/java/com/sun/enterprise/jxtamgmt/ClusterManager.java?r1=1.38&r2=1.39
Delta lines: +7 -24
--------------------
— ClusterManager.java 2008-06-26 12:56:11+0000 1.38
+++ ClusterManager.java 2008-06-28 15:06:05+0000 1.39
@@ -36,41 +36,23 @@

package com.sun.enterprise.jxtamgmt;

-import static com.sun.enterprise.jxtamgmt.JxtaUtil.getObjectFromByteArray;
import com.sun.enterprise.ee.cms.core.MemberNotInViewException;
-import net.jxta.document.AdvertisementFactory;
-import net.jxta.document.MimeMediaType;
-import net.jxta.document.StructuredDocument;
-import net.jxta.document.StructuredDocumentFactory;
-import net.jxta.document.XMLDocument;
-import net.jxta.endpoint.ByteArrayMessageElement;
-import net.jxta.endpoint.EndpointAddress;
-import net.jxta.endpoint.Message;
-import net.jxta.endpoint.MessageElement;
-import net.jxta.endpoint.TextDocumentMessageElement;
+import static com.sun.enterprise.jxtamgmt.JxtaUtil.getObjectFromByteArray;
+import net.jxta.document.*;
+import net.jxta.endpoint.*;
import net.jxta.exception.PeerGroupException;
import net.jxta.id.ID;
import net.jxta.impl.endpoint.tcp.TcpTransport;
import net.jxta.impl.pipe.BlockingWireOutputPipe;
import net.jxta.peer.PeerID;
import net.jxta.peergroup.PeerGroup;
-import net.jxta.pipe.InputPipe;
-import net.jxta.pipe.OutputPipe;
-import net.jxta.pipe.PipeMsgEvent;
-import net.jxta.pipe.PipeMsgListener;
-import net.jxta.pipe.PipeService;
+import net.jxta.pipe.*;
import net.jxta.protocol.PipeAdvertisement;
import net.jxta.protocol.RouteAdvertisement;

import java.io.IOException;
import java.io.Serializable;
-import java.util.ArrayList;
-import java.util.Collections;
-import java.util.HashMap;
-import java.util.Hashtable;
-import java.util.Iterator;
-import java.util.List;
-import java.util.Map;
+import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Level;
import java.util.logging.Logger;
@@ -158,8 +140,9 @@
LOG.log(Level.WARNING, ioe.getLocalizedMessage());
}
NetworkManagerRegistry.add(groupName, netManager);

-       if ( props != null )
+       if(props !=null && !props.isEmpty()) {
            this.bindInterfaceAddress = (String)props.get(JxtaConfigConstants.BIND_INTERFACE_ADDRESS.toString());
+       }

        systemAdv = createSystemAdv(netManager.getNetPeerGroup(), instanceName,
            identityMap, bindInterfaceAddress);
        LOG.log(Level.FINER, "Instance ID :" + getSystemAdvertisement().getID());
        this.clusterViewManager = new
            ClusterViewManager(getSystemAdvertisement(), this, viewListeners);





[SHOAL-69] GroupHandle.raiseFence() needs to throw exception if fence is already raised. Created: 20/Jul/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: shreedhar_ganapathy Assignee: shreedhar_ganapathy
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Windows


Issuezilla Id: 69

 Description   

Based on Bongjae Chang's email to dev alias:
https://shoal.dev.java.net/servlets/ReadMsg?list=dev&msgNo=103

The raiseFence() method in GroupHandle does not throw an exception if the fence is
already raised; it quietly returns, and only succeeds when a fence has not already
been raised.


 Comments   
Comment by shreedhar_ganapathy [ 20/Jul/08 ]

Fixed per cvs message
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=630





[SHOAL-70] Exception occurs showing Member no longer in the group, when sending messages to an alive member Created: 20/Jul/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: shreedhar_ganapathy Assignee: shreedhar_ganapathy
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Windows


Issuezilla Id: 70

 Description   

When sending messages, the call to GroupCommunicationProviderImpl.sendMessage()
with a specified target member token checks whether this member still exists
in the ClusterViewManager's ClusterView. This check is done by getting the
memberToken string's corresponding peer ID by calling
clusterManager.getID(targetMemberIdentityToken)

This results in a call to NetworkManager.getPeerID() each time.
Multiple calls to NetworkManager.getPeerID() passing the exact same instanceName
parameter return a different JXTA UUID over a period of time.

This is a problem in itself, as the consistent hashing algorithm is expected to
guarantee that the exact same UUID is generated for a given constant seed.

Using Leehui's MultiThreadMessageSender test, it is easy to see this problem: the
test frequently reports that a target instance is no longer in the group while
sending a message, because the call has produced a new UUID that does not exist in
the ClusterViewManager for that known target member token.

To work around this issue, calls to getPeerID() should return a cached PeerID
that was originally generated during the first call that established the peer in
the PeerGroup.
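
For illustration, a minimal sketch of the caching approach described (names simplified; the
real fix lives in NetworkManager and clears the cache in stop()):

import java.util.concurrent.ConcurrentHashMap;
import net.jxta.peer.PeerID;

class PeerIdCache {
    private final ConcurrentHashMap<String, PeerID> instanceToPeerID =
            new ConcurrentHashMap<String, PeerID>();

    // Always hand back the PeerID created on the first call for this instance name,
    // so repeated lookups can never disagree with the cluster view.
    PeerID getPeerID(String instanceName) {
        PeerID id = instanceToPeerID.get(instanceName);
        if (id == null) {
            id = createPeerID(instanceName);                     // hash-based creation, as today
            PeerID existing = instanceToPeerID.putIfAbsent(instanceName, id);
            if (existing != null) {
                id = existing;                                   // another thread won the race
            }
        }
        return id;
    }

    void stop() {
        instanceToPeerID.clear();                                // mirrors NetworkManager.stop()
    }

    private PeerID createPeerID(String instanceName) {
        throw new UnsupportedOperationException("placeholder for NetworkManager's ID generation");
    }
}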



 Comments   
Comment by shreedhar_ganapathy [ 20/Jul/08 ]

Fix checked in per CVS check messages :
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=631
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=632

getPeerID() now consults an instanceToPeerID hashtable for the pre-existence of the
peer ID for a given instanceName token string. If none is present, it creates one and adds
it to the hash table. The hash table is cleared during NetworkManager.stop().

Comment by shreedhar_ganapathy [ 20/Jul/08 ]

Sheetal has already integrated this fix into Shoal branch for Sailfin 0.5 (SGCS
1.0).





[SHOAL-71] HealthMonitor.isConnected : for loop only looks at 1 network interface Created: 23/Jul/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: sheetalv Assignee: sheetalv
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Macintosh


Issuezilla Id: 71
Status Whiteboard:

shoal-shark-na


 Description   

The power outage related code in the IndoubtPeerDetector thread's isConnected() method iterates over
only 1 Future task and returns false if the future task is not done. The "return false" statement needs to be
inside the catch block for InterruptedException. Otherwise the following 2 lines of code will never get
executed in the case where the future task is not complete:

fine("Peer Machine for " + entry.adv.getName() + " is down!");
return false;
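
A rough sketch of the intended control flow, heavily simplified and not the actual
HealthMonitor code (the timeout value is an assumption):

// Only report the peer machine as down after every connection-check task has had a
// chance to complete; a pending or interrupted task alone should not end the loop early.
boolean isConnected(java.util.List<java.util.concurrent.Future<Boolean>> checks) {
    for (java.util.concurrent.Future<Boolean> check : checks) {
        try {
            if (Boolean.TRUE.equals(check.get(2, java.util.concurrent.TimeUnit.SECONDS))) {
                return true;        // at least one interface reached the peer
            }
        } catch (Exception e) {
            // interrupted, timed out or failed: fall through to the next check
        }
    }
    // log at FINE that the peer machine is down (as in the lines quoted above)
    return false;
}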



 Comments   
Comment by sheetalv [ 28/Jul/08 ]

issue fixed in the trunk.

https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=636





[SHOAL-72] need a fix for the "unable to create messenger" IOException Created: 24/Jul/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: sheetalv Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 72
Status Whiteboard:

shoal-shark-na


 Description   

The "unable to create messenger" IOException occurs in different scenarios. One of the scenarions is when
an instance is killed. before instance B can know that the instance A has been killed, it tries to send a
message via ClusterManager.send() (could be to sync the DSC or for some other reason).

When such an IOException occurs, the Shoal code should check which instance is supposedly down. Then
the code wait for a little while before finding the state that that instance is in. If the state is
alive/aliveandready, the message should be sent again as a retry. If the instance is in in_retry_mode (i.e. it
has'nt been deemed in_doubt/failed yet), then the right way of dealing with this should be decided.
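
For illustration, a rough sketch of the retry described above (the state names mirror the
description; send(), checkMemberState(), the delay and the retry count are illustrative
assumptions, not existing Shoal behavior):

// Hypothetical retry wrapper; send() and checkMemberState() stand in for real Shoal calls.
abstract class RetrySender {
    abstract void send(String targetMember, byte[] payload) throws java.io.IOException;
    abstract String checkMemberState(String targetMember);

    void sendWithRetry(String targetMember, byte[] payload)
            throws java.io.IOException, InterruptedException {
        final int maxAttempts = 3;                     // illustrative bound
        for (int attempt = 1; ; attempt++) {
            try {
                send(targetMember, payload);
                return;
            } catch (java.io.IOException ioe) {
                if (attempt >= maxAttempts) {
                    throw ioe;                         // give up; let failure detection take over
                }
                Thread.sleep(500);                     // wait before re-checking the member's state
                String state = checkMemberState(targetMember);
                if (!"alive".equals(state) && !"aliveandready".equals(state)) {
                    throw ioe;                         // member really is going down; do not retry
                }
            }
        }
    }
}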



 Comments   
Comment by Joe Fialli [ 28/Jul/08 ]

Short term solution described in shoal issue 73.

Change platform to ALL since issue is not specific to MAC os.

Comment by sheetalv [ 28/Jul/08 ]

short term solution in issue 73 has been added to Sailfin 0.5.

Comment by sheetalv [ 31/Jul/08 ]

assigning to Joe.





[SHOAL-73] Change ShoalLogger WARNING for IOException to FINE Created: 24/Jul/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 73

 Description   

This is a short-term change just before a shoal release.
The longer-term solution is documented in a separate shoal issue (will link when it
is available).

Currently, when there is an IOException from the DAS to a server instance that was
either killed or failed to start for some reason (such as the ORB bind address
still being in use), the result is a WARNING that does not contain sufficient
information for an administrator to know which server instance the message could
not be sent to. Since the WARNING was occurring for non-error cases and did not
contain enough information for an administrator to easily decide whether it
requires attention, this log event is being reduced to FINE.

When such a message does occur at FINE, here is how one can correlate the
failure with a server instance name.

The following server.log event indicates a failure to send to another
server instance in the cluster. From the jxta://uuid-XXX address, take the last six
characters and search the log for an entry that has a server instance name in it.

[#|2008-07-23T19:44:31.665-0700|WARNING|sun-glassfish-comms-server1.0|javax.enterprise.system.stream.err|_ThreadID=29;_ThreadName=MessageWindowThread;_RequestID=80b25066-a911-401d-b38b-b99a0c3aecc2;|java.io.IOException:
Unable to create a messenger to
jxta://uuid-0CEC11B5D9E64303A621B9B272CD0439FC9C0AEFDE264179A314B0C0C01C0BF803/PipeService/urn:jxta:uuid-0CEC11B5D9E64303A621B9B272CD04396521C3C52E4443928082812BCDC1E25B04
at
net.jxta.impl.pipe.BlockingWireOutputPipe.checkMessenger(BlockingWireOutputPipe.java:238)
at net.jxta.impl.pipe.BlockingWireOutputPipe.send(BlockingWireOutputPipe.java:264)
at com.sun.enterprise.jxtamgmt.ClusterManager.send(ClusterManager.java:495)
at
com.sun.enterprise.ee.cms.impl.jxta.GroupCommunicationProviderImpl.sendMessage(GroupCommunicationProviderImpl.java:217)
at
com.sun.enterprise.ee.cms.impl.jxta.DistributedStateCacheImpl.sendMessage(DistributedStateCacheImpl.java:458)
at
com.sun.enterprise.ee.cms.impl.jxta.DistributedStateCacheImpl.addAllToRemoteCache(DistributedStateCacheImpl.java:388)
at
com.sun.enterprise.ee.cms.impl.jxta.MessageWindow.handleDSCMessage(MessageWindow.java:127)
at
com.sun.enterprise.ee.cms.impl.jxta.MessageWindow.newMessageReceived(MessageWindow.java:107)
at com.sun.enterprise.ee.cms.impl.jxta.MessageWindow.run(MessageWindow.java:91)
at java.lang.Thread.run(Thread.java:619)

#]

Searching for BF803 finds the following log entry, which shows the server instance
that the sendMessage was addressed to:

9: MemberId: n2c1m4, MemberType: CORE, Address:
urn:jxta:uuid-0CEC11B5D9E64303A621B9B272CD0439FC9C0AEFDE264179A314B0C0C01C0BF803

For the test in question, server instance n2c1m4 was killed to test the
FAILURE notification. Thus, the log event does not capture anything that should
be viewed as a failure. The log message needs to be improved to state specifically
which server instance the send was going to when it failed.
A future fix will make sure that send failures to a FAILING instance are only
reported once in the server log, and reported with the actual server instance name.



 Comments   
Comment by Joe Fialli [ 28/Jul/08 ]

Long term solution for this issue is described in
https://shoal.dev.java.net/issues/show_bug.cgi?id=72

Fix checked into shoal for sailfin 0.5 branch.





[SHOAL-75] messages not being delivered over jxta OutputPipe.send Created: 20/Aug/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 75

 Description   

This issue was reported by shoal developer forum post at
https://shoal.dev.java.net/servlets/ReadMsg?list=dev&msgNo=111

To summarize the issue: a common place needs to be added in shoal
that checks the boolean result of calling net.jxta.pipe.OutputPipe.send().
When the method returns false, the caller should wait a small amount of time and then try to send again.
The send returning false means the send could not be attempted because the system
resources needed to perform it were exhausted; trying again will work.

So all places where Shoal calls OutputPipe.send() should be altered to call
this common method.

The forum email confirms that it is possible for OutputPipe.send() to return
false; when this happens, a message that could have been delivered simply does not
get sent.

The proposed fix is to add a method in com.sun.enterprise.jxtamgmt.JxtaUtil
that all Shoal callers of net.jxta.pipe.OutputPipe.send() would use, so
that the resend logic lives in one source code location.

Here is a first pass on that method that will be tried soon.

public static boolean sendMessage(OutputPipe pipe, PeerID peerId, Message message)
        throws IOException {
    boolean result = false;
    final int MAX_SEND_ATTEMPTS = 3; // is this right amount of retries.
    final int RETRY_DELAY = XXX;     // in milliseconds find out what this should be
    result = pipe.send(message);
    int sendAttempts = 1;
    while (!result && sendAttempts <= MAX_SEND_ATTEMPTS) {
        try {
            Thread.sleep(RETRY_DELAY);
        } catch (InterruptedException ie) {
        }
        result = pipe.send(message);
        sendAttempts++;
    }
    if (!result) {
        if (LOG.isLoggable(Level.FINE)) {
            final String to = peerId == null ? "<broadcast to cluster>" : peerId.toString();
            LOG.fine("unable to send message " + message.toString() + " to " + to
                    + " after " + sendAttempts);
        }
    }
    return result;
}



 Comments   
Comment by shreedhar_ganapathy [ 26/Aug/08 ]

I think the proposed fix can be addressed by the LWRMulticast class. For p2p
messages, the recipient list can be a set of one member. I am not sure if it
specifically uses a propagate pipe or a blocking wire output pipe (bwop). It
should preferably use a bwop for reliability, retransmission and flow control.

The retry logic within LWRMulticast should be wary of failures such as network
failures or hardware failures of the recipient, so that it can come out of the
TCP close wait. A send message operation should therefore not block for the
duration of the TCP retransmission timeout, and once it comes out of such a case,
it should not retry. Such protections may be necessary to make it more robust.

Comment by Joe Fialli [ 30/Oct/08 ]

fix checked into shoal trunk and integrated into sailfin communication as 1.5
nightly





[SHOAL-79] DistributedStateCacheImpl not thread safe? Created: 22/Sep/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: okrische Assignee: Joe Fialli
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Text File DistributedStateCacheImpl-Diff.txt    
Issuezilla Id: 79

 Description   

Hello,

though I see several issues, I will concentrate on only one, the most obvious:

private static final Map<String, DistributedStateCacheImpl> ctxCache =
        new HashMap<String, DistributedStateCacheImpl>();

// return the only instance we want to return
static DistributedStateCache getInstance(final String groupName) {
    DistributedStateCacheImpl instance;
    if (ctxCache.get(groupName) == null) {
        instance = new DistributedStateCacheImpl(groupName);
        ctxCache.put(groupName, instance);
    } else {
        instance = ctxCache.get(groupName);
    }
    return instance;
}

I think shoal should take care of concurrency issues on ctxCache as well.

Why not use a ConcurrentMap here? Maybe like this (which works fine, as long as
instantiating an instance is not a heavy operation):

ConcurrentMap<String, DistributedStateCacheImpl> ctxCache = ...;

static DistributedStateCache getInstance(final String groupName) {
    DistributedStateCacheImpl instance = ctxCache.get(groupName);
    if (instance == null) {
        instance = new DistributedStateCacheImpl(groupName);

        // put our mapping only if no other mapping has been put already
        DistributedStateCacheImpl otherInstance =
                ctxCache.putIfAbsent(groupName, instance);

        // there was another mapping, use that one instead of ours
        if (otherInstance != null) {
            instance = otherInstance;
        }
    }
    return instance;
}

Other issues:

  • GMSContext ctx is not guarded
  • firstSyncDone is not guarded

Right now it seems that callers have to synchronize on their own. At the very least this
should be reflected in the javadoc:

"The implementation itself is not thread-safe"

What do you think?



 Comments   
Comment by shreedhar_ganapathy [ 22/Sep/08 ]

Excellent observation!
Could you also file issues for the other problems you see with DistributedCache?
Are you interested in contributing fixes? We will always welcome those.

Comment by okrische [ 23/Sep/08 ]

You want me to submit a patch for this issue? I can do that; will you then apply it
to the branch?

Comment by shreedhar_ganapathy [ 23/Sep/08 ]

We'd be happy to do that. If you send the contributor agreement, you will have
commit access to check in your patch after review and some testing.

Thanks
Shreedhar

Comment by okrische [ 25/Sep/08 ]

Created an attachment (id=11)
patch to fix concurrency issues for this class only

Comment by okrische [ 25/Sep/08 ]

Okay, here are some comments on the patch (see the field-level sketch below):

  • Logger is static final instead of just final. This saves one reference per created
    instance of DistributedStateCacheImpl.
  • cache changed from Map to ConcurrentHashMap to fix the concurrency issue.
  • ctx changed to an AtomicReference ctxRef, since ctx is set at runtime by the
    first thread that enters a method to read ctx -> concurrency issue.
  • firstSyncDone changed to volatile, since it can be changed at runtime by the
    first thread that does the sync -> concurrency issue.
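
A field-level sketch of the kinds of declarations these comments describe (illustrative
only, not a copy of the committed patch; Object stands in for the real GMSContext and
DistributedStateCacheImpl types):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicReference;
import java.util.logging.Logger;

class DistributedStateCacheFieldsSketch {
    // static final: one Logger shared by all instances instead of one per instance
    private static final Logger LOG = Logger.getLogger("ShoalLogger");

    // ConcurrentHashMap so concurrent getInstance()/put() calls need no external locking
    private static final ConcurrentMap<String, Object> ctxCache =
            new ConcurrentHashMap<String, Object>();

    // AtomicReference because ctx is lazily set by whichever thread reads it first
    private final AtomicReference<Object> ctxRef = new AtomicReference<Object>();

    // volatile so the first-sync flag written by one thread is visible to all others
    private volatile boolean firstSyncDone = false;
}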
Comment by okrische [ 25/Sep/08 ]

Oops.

-> I meant "ctxCache", not cache.

Comment by Joe Fialli [ 25/Sep/08 ]

Thanks for the patch. I will verify the changes against our internal Shoal tests.
If all checks out, I will update this issue, requesting you to check the change
in.

I will update my status on confirming this patch by Monday.

Comment by Joe Fialli [ 27/Oct/08 ]

Patch checked in.
Will be included in next shoal-integration into application server.





[SHOAL-82] notifying cluster view event is not thread safe Created: 12/Nov/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: carryel Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Windows


Issuezilla Id: 82

 Description   

ClusterViewManager.notifyListeners() can be executed by multiple threads when many
members join the same group concurrently.

Even though there are no member failures, you can see the following log.

------------------------------------
2008. 11. 12 오후 5:44:00 com.sun.enterprise.shoal.jointest.SimpleJoinTest
initializeGMS
INFO: Initializing Shoal for member: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f
group:TestGroup
2008. 11. 12 오후 5:44:00 com.sun.enterprise.shoal.jointest.SimpleJoinTest
runSimpleSample
INFO: Registering for group event notifications
2008. 11. 12 오후 5:44:00 com.sun.enterprise.shoal.jointest.SimpleJoinTest
runSimpleSample
INFO: Joining Group TestGroup
2008. 11. 12 오후 5:44:07 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group TestGroup (5d3280a2-a0c5-4ae2-8d41-
d59b57400b8f) : Members in view for (before change analysis) are :
1: MemberId: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033090183254F6D47E7B235BC8D656194FA03

2008. 11. 12 오후 5:44:07 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT
2008. 11. 12 오후 5:44:08 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group TestGroup (aeea918f-571b-463b-bfa6-
55c536df0d11) : Members in view for (before change analysis) are :
(a)
1: MemberId: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033090183254F6D47E7B235BC8D656194FA03
2: MemberId: addb1dbe-06cf-43b8-8903-78605f29091f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A787461503250336C047E2077544A5692C1EA21407A886303
3: MemberId: aeea918f-571b-463b-bfa6-55c536df0d11, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033DBAE9788614944F8A40ED352C8E7A03B03
4: MemberId: fae1414d-702a-42fd-8c7d-6ffabe8b2e69, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033EF69FCF215DE43038FD0C3AA0535A08203

2008. 11. 12 오후 5:44:08 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 11. 12 오후 5:44:17 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group TestGroup (addb1dbe-06cf-43b8-8903-
78605f29091f) : Members in view for (before change analysis) are :
(b)
1: MemberId: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033090183254F6D47E7B235BC8D656194FA03
2: MemberId: addb1dbe-06cf-43b8-8903-78605f29091f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A787461503250336C047E2077544A5692C1EA21407A886303

2008. 11. 12 오후 5:44:17 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 11. 12 오후 5:44:17 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group TestGroup (fae1414d-702a-42fd-8c7d-
6ffabe8b2e69) : Members in view for (before change analysis) are :
(c)
1: MemberId: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033090183254F6D47E7B235BC8D656194FA03
2: MemberId: addb1dbe-06cf-43b8-8903-78605f29091f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A787461503250336C047E2077544A5692C1EA21407A886303
3: MemberId: fae1414d-702a-42fd-8c7d-6ffabe8b2e69, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033EF69FCF215DE43038FD0C3AA0535A08203

2008. 11. 12 오후 5:44:17 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 11. 12 오후 5:44:20 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group TestGroup (42b22147-7683-481f-a9f4-
85ba5a2b847f) : Members in view for (before change analysis) are :
1: MemberId: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033090183254F6D47E7B235BC8D656194FA03
2: MemberId: 42b22147-7683-481f-a9f4-85ba5a2b847f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A787461503250334501FF701A644877A4B4C65068965F3403
3: MemberId: addb1dbe-06cf-43b8-8903-78605f29091f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A787461503250336C047E2077544A5692C1EA21407A886303
4: MemberId: aeea918f-571b-463b-bfa6-55c536df0d11, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033DBAE9788614944F8A40ED352C8E7A03B03
5: MemberId: fae1414d-702a-42fd-8c7d-6ffabe8b2e69, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033EF69FCF215DE43038FD0C3AA0535A08203

...
------------------------------------

This log shows that five members joined "TestGroup":

1: MemberId: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033090183254F6D47E7B235BC8D656194FA03
2: MemberId: 42b22147-7683-481f-a9f4-85ba5a2b847f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A787461503250334501FF701A644877A4B4C65068965F3403
3: MemberId: addb1dbe-06cf-43b8-8903-78605f29091f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A787461503250336C047E2077544A5692C1EA21407A886303
4: MemberId: aeea918f-571b-463b-bfa6-55c536df0d11, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033DBAE9788614944F8A40ED352C8E7A03B03
5: MemberId: fae1414d-702a-42fd-8c7d-6ffabe8b2e69, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033EF69FCF215DE43038FD0C3AA0535A08203

This log is printed in ViewWindow, based on the viewQueue, when a new view is
observed.

But in the log messages above, you can see that the order of (a), (b) and (c) is strange.

Because there are no failures, I think the member count should increase
gradually (i.e. (a)'s count <= (b)'s count <= (c)'s count).
The following code is ClusterViewManager's notifyListeners() method.


void notifyListeners(final ClusterViewEvent event) {
    Log.log(...);
    for (ClusterViewEventListener elem : cvListeners) {
        elem.clusterViewEvent(event, getLocalView());
    }
}


getLocalView() is thread safe with viewLock but ClusterViewEventListener's
clusterViewEvent() is not thread safe.

The following code is GroupCommunicationProviderImpl's clusterViewEvent()
method, which implements the ClusterViewEventListener interface.

public void clusterViewEvent(final ClusterViewEvent clusterViewEvent,
                             final ClusterView clusterView) {
    ...
    final EventPacket ePacket = new EventPacket(clusterViewEvent.getEvent(),
            clusterViewEvent.getAdvertisement(), clusterView);
    final ArrayBlockingQueue<EventPacket> viewQueue = getGMSContext().getViewQueue();
    try {
        viewQueue.put(ePacket);
    } catch (InterruptedException e) {
        ...
    }
}
-----

I think that the local view's snapshot (getLocalView()'s return value) and
viewQueue.put() should be atomic, like this:
-----
void notifyListeners(final ClusterViewEvent event) {
    Log.log(...);
    for (ClusterViewEventListener elem : cvListeners) {
        synchronized (elem) {
            elem.clusterViewEvent(event, getLocalView());
        }
    }
}

or

public synchronized void clusterViewEvent(final ClusterViewEvent clusterViewEvent,
                                          final ClusterView clusterView) {
    ...
    final EventPacket ePacket = new EventPacket(clusterViewEvent.getEvent(),
            clusterViewEvent.getAdvertisement(), clusterView);
    final ArrayBlockingQueue<EventPacket> viewQueue = getGMSContext().getViewQueue();
    try {
        viewQueue.put(ePacket);
    } catch (InterruptedException e) {
        ...
    }
}

(In my opinion, the former is better because clusterViewEvent() can be
implemented in various ways.)


In other words,
-------------------------------------------------------------------
getLocalView() --> local view's snapshot --> (hole) --> insert view queue
-------------------------------------------------------------------

As you can see above, between taking the local view snapshot and inserting the
EventPacket into the view queue there is a window (a "hole"). We can remove the hole
with a synchronized block or an individual lock object.
If the hole is removed, I think ViewWindow can receive the local view captures
from the queue in the correct order.



 Comments   
Comment by shreedhar_ganapathy [ 22/Nov/08 ]

..





[SHOAL-83] When group leader failed, any member couldn't receive FailureRecovery notification Created: 12/Nov/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: carryel Assignee: Joe Fialli
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Windows


Issuezilla Id: 83

 Description   

When the group leader failed, no member could receive the FailureRecovery
notification, even though members had registered FailureRecoveryActionFactoryImpl
and their callbacks with GMS.
If the failed member was not the group leader, the other members received the
FailureRecovery notification successfully.

Here are two logs.
--------------------
case 1) When the failed member is the group leader.

2008. 11. 12 오후 9:43:28 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group DemoGroup : Members in view for
(before change analysis) are :
1: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03

2008. 11. 12 오후 9:43:28 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT
2008. 11. 12 오후 9:43:28 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group DemoGroup : Members in view for
(before change analysis) are :
1: MemberId: b6663a51-9b79-43e2-92dd-41899c907383, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A787461503250331DA08A66D0554F138E75E74AA363FC9E03
2: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03

2008. 11. 12 오후 9:43:28 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT
2008. 11. 12 오후 9:43:28 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group DemoGroup : Members in view for
(before change analysis) are :
1: MemberId: b6663a51-9b79-43e2-92dd-41899c907383, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A787461503250331DA08A66D0554F138E75E74AA363FC9E03
2: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03

2008. 11. 12 오후 9:43:28 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 11. 12 오후 9:43:53 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group DemoGroup : Members in view for
(before change analysis) are :
1: MemberId: b6663a51-9b79-43e2-92dd-41899c907383, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A787461503250331DA08A66D0554F138E75E74AA363FC9E03
2: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03

2008. 11. 12 오후 9:43:53 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event :
IN_DOUBT_EVENT
2008. 11. 12 오후 9:43:53 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
addInDoubtMemberSignals
INFO: gms.failureSuspectedEventReceived
2008. 11. 12 오후 9:43:53 com.sun.enterprise.ee.cms.impl.common.Router
notifyFailureSuspectedAction
INFO: Sending FailureSuspectedSignals to registered Actions. Member:b6663a51-
9b79-43e2-92dd-41899c907383...
2008. 11. 12 오후 9:43:57 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group DemoGroup : Members in view for
(before change analysis) are :
1: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03

2008. 11. 12 오후 9:43:57 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT
2008. 11. 12 오후 9:43:57 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group DemoGroup : Members in view for
(before change analysis) are :
1: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03

2008. 11. 12 오후 9:43:57 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event :
FAILURE_EVENT
2008. 11. 12 오후 9:43:57 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
addFailureSignals
INFO: The following member has failed: b6663a51-9b79-43e2-92dd-41899c907383

case 2) When the failed member is not the group leader

2008. 11. 12 오후 9:40:03 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group DemoGroup : Members in view for
(before change analysis) are :
1: MemberId: 96438e75-740c-4613-af8d-6b2ab8ea4727, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033376CC0C6DAB74C2BA6FAF9C6648D77BC03

2008. 11. 12 오후 9:40:03 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT
2008. 11. 12 오후 9:40:14 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group DemoGroup : Members in view for
(before change analysis) are :
1: MemberId: 96438e75-740c-4613-af8d-6b2ab8ea4727, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033376CC0C6DAB74C2BA6FAF9C6648D77BC03
2: MemberId: b77af0d3-581c-4392-89cf-6a06d736c90f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033EBEBAC9321A742D0B319D3F89446E0B103

2008. 11. 12 오후 9:40:14 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 11. 12 오후 9:40:43 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group DemoGroup : Members in view for
(before change analysis) are :
1: MemberId: 96438e75-740c-4613-af8d-6b2ab8ea4727, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033376CC0C6DAB74C2BA6FAF9C6648D77BC03
2: MemberId: b77af0d3-581c-4392-89cf-6a06d736c90f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033EBEBAC9321A742D0B319D3F89446E0B103

2008. 11. 12 오후 9:40:49 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event :
IN_DOUBT_EVENT
2008. 11. 12 오후 9:41:07 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
addInDoubtMemberSignals
INFO: gms.failureSuspectedEventReceived
2008. 11. 12 오후 9:41:12 com.sun.enterprise.ee.cms.impl.common.Router
notifyFailureSuspectedAction
INFO: Sending FailureSuspectedSignals to registered Actions. Member:b77af0d3-
581c-4392-89cf-6a06d736c90f...
2008. 11. 12 오후 9:41:29 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group DemoGroup : Members in view for
(before change analysis) are :
1: MemberId: 96438e75-740c-4613-af8d-6b2ab8ea4727, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033376CC0C6DAB74C2BA6FAF9C6648D77BC03

2008. 11. 12 오후 9:41:41 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event :
FAILURE_EVENT
2008. 11. 12 오후 9:41:42 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
addFailureSignals
INFO: The following member has failed: b77af0d3-581c-4392-89cf-6a06d736c90f
2008. 11. 12 오후 9:42:19
com.sun.enterprise.ee.cms.impl.common.RecoveryTargetSelector
setRecoverySelectionState
INFO: Appointed Recovery Server:96438e75-740c-4613-af8d-6b2ab8ea4727:for failed
member:b77af0d3-581c-4392-89cf-6a06d736c90f:for group:DemoGroup
2008. 11. 12 오후 9:42:19 com.sun.enterprise.ee.cms.impl.common.Router
notifyFailureRecoveryAction
INFO: Sending FailureRecoveryNotification to component service
--------------------

In case 1 (the abnormal case):
group leader failed -> IN_DOUBT_EVENT -> MASTER_CHANGE_EVENT (because a new master
was selected) -> FAILURE_EVENT

In case 2 (the normal case):
member failed -> IN_DOUBT_EVENT -> FAILURE_EVENT

To receive the FailureRecovery notification, the recovery target must be resolved,
and the selection algorithm for the recovery target uses the previous members' view.

Assume that "A" and "B" are members of the same group and "A" is the group leader.

[case 1: "B"'s view history]
... --> (A, B) --> A failed -> B became the new master with a master change
event -> (B)[previous view] -> failure event -> (B)[current view]

[case 2: "A"'s view history]
... --> (A, B)[previous view] --> B failed -> failure event -> (A)[current view]

In other words,
case 1's previous view does not contain "A" (the failed member), so the default
algorithm (SimpleSelectionAlgorithm) can't find a proper recovery target.
case 2's previous view does contain "B" (the failed member), so the default algorithm
can select "A" as the recovery target.
(I assume that you already know SimpleSelectionAlgorithm.)

So I think this issue comes down to the selection algorithm for the recovery
target.

I think another simple algorithm could resolve this issue, for example:
always selecting the first core member in the live cache (sketched below).
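
As a hedged illustration of that alternative (not Shoal's actual SimpleSelectionAlgorithm):
given a stable ordering of live CORE members that all survivors share, every member
deterministically appoints the same recovery target, regardless of what the previous view
contained. The liveCoreMembers parameter is an assumption standing in for the live cache.

import java.util.List;

class FirstCoreMemberSelectorSketch {
    // Picks the first surviving core member as the recovery target for failedMember.
    String selectRecoveryTarget(final List<String> liveCoreMembers, final String failedMember) {
        for (String member : liveCoreMembers) {
            if (!member.equals(failedMember)) {
                return member;   // same answer on every member, since the ordering is shared
            }
        }
        return null;             // no surviving core member to appoint
    }
}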



 Comments   
Comment by shreedhar_ganapathy [ 22/Nov/08 ]

..

Comment by Joe Fialli [ 21/Aug/09 ]

Shoal test scenario 14 verifies that the fix for this has been integrated.





[SHOAL-84] JXTA Exception on network disconnect Created: 18/Nov/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: alireza2008 Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Windows


Issuezilla Id: 84

 Description   

I encountered the exception below during network disconnection tests. I had
two members in a group on separate hosts within the same subnet (all default
JXTA parameters); then I unplugged the network connection from one of the hosts,
where I received the following exception:

Nov 13, 2008 11:51:45 AM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event : IN_DOUBT_EVENT
Nov 13, 2008 11:51:45 AM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
addInDoubtMemberSignals
INFO: gms.failureSuspectedEventReceived
Nov 13, 2008 11:51:45 AM com.sun.enterprise.ee.cms.impl.common.Router
notifyFailureSuspectedAction
INFO: Sending FailureSuspectedSignals to registered Actions.
Member:GMSTestMonitor...
Nov 13, 2008 11:51:45 AM net.jxta.endpoint.ThreadedMessenger run
SEVERE: Uncaught throwable in background thread
java.lang.NoClassDefFoundError: net/jxta/impl/endpoint/router/RouterMessenger
at
net.jxta.impl.endpoint.router.EndpointRouter.getMessenger(EndpointRouter.java:2336)
at
net.jxta.impl.endpoint.EndpointServiceImpl.getLocalTransportMessenger(EndpointServiceImpl.java:1566)
at
net.jxta.impl.endpoint.EndpointServiceImpl.access$200(EndpointServiceImpl.java:106)
at
net.jxta.impl.endpoint.EndpointServiceImpl$CanonicalMessenger.connectImpl(EndpointServiceImpl.java:380)
at net.jxta.endpoint.ThreadedMessenger.connect(ThreadedMessenger.java:551)
at net.jxta.endpoint.ThreadedMessenger.run(ThreadedMessenger.java:389)
at java.lang.Thread.run(Unknown Source)
Nov 13, 2008 11:51:48 AM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group GMSTestGroup : Members in view for
(before change analysis) are :
1: MemberId: GMSTestResource, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033520D314DBB264715B
E83E86B57A610F803



 Comments   
Comment by shreedhar_ganapathy [ 22/Nov/08 ]

reassigned to Joe for fixing post HCF





[SHOAL-85] message not processed/received when GroupHandle.sendMessage with null component name is specified Created: 09/Feb/09  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: Stephen DiMilla Assignee: Joe Fialli
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: Solaris
Platform: Solaris


Attachments: Java Source File ApplicationServer1.java     Java Source File GMSClientService1.java    
Issuezilla Id: 85

 Description   

I modified the test classes:
com.sun.enterprise.ee.cms.tests.ApplicationServer
com.sun.enterprise.ee.cms.tests.GMSClientService

to send and receive a message. I've attached both those classes to this issue.

Based on the javadoc for GroupHandle:

"... Specifying a null component name would result in the message being
delivered to all registered components in the target member instance."

Therefore using the method:

gh.sendMessage((String)null, null, message.getBytes());

should result in the EJBContainer and TransactionService each receiving the
message passed to sendMessage, but based on the testing I've done that is not
happening. The messages are sent but never dispatched to either service.
If you set the component name to be non-null:

gh.sendMessage((String)null, "Transaction", message.getBytes());

then the message is received by that component.

It appears that either the documentation is wrong or there is a
bug in the distribution of the received message to the components.



 Comments   
Comment by Stephen DiMilla [ 09/Feb/09 ]

Created an attachment (id=13)
ApplicationServer java file

Comment by Stephen DiMilla [ 09/Feb/09 ]

Created an attachment (id=14)
GMSClientServer java file

Comment by Joe Fialli [ 05/Feb/10 ]

duplicate of issue 97. already fixed.

*** This issue has been marked as a duplicate of 97 ***




[SHOAL-86] Graceful handling of unexpected exceptions(NPEs) when GMS failed to join the group Created: 30/Mar/09  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: carryel Assignee: shreedhar_ganapathy
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Java Source File SimpleShoalAPITest.java    
Issuezilla Id: 86

 Description   

When GMS failed to join the group, GMS didn't throw a GMSException but an
unexpected exception such as an NPE.

There are two issues.

1) The GroupManagementService#join() API should throw a GMSException instead of
an NPE on an unexpected error.
Here is the log.


D:\shoal\gms>rungmsdemo.bat testServer testGroup CORE 30000 INFO
D:\ibm_sdk60\bin
[#|2009-03-
31T10:32:01.677+0900|SEVERE|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=Applicatio
nServer;C
lassName=NetworkManager;MethodName=<init>;|Could not locate World PeerGroup
Module Implementation.
net.jxta.exception.PeerGroupException: Could not locate World PeerGroup Module
Implementation.
at
net.jxta.peergroup.WorldPeerGroupFactory.getDefaultWorldPeerGroupClass
(WorldPeerGroupFact
ory.java:244)
at net.jxta.peergroup.WorldPeerGroupFactory.<init>
(WorldPeerGroupFactory.java:178)
at com.sun.enterprise.jxtamgmt.NetworkManager.initWPGF
(NetworkManager.java:623)
at com.sun.enterprise.jxtamgmt.NetworkManager.<init>
(NetworkManager.java:213)
at com.sun.enterprise.jxtamgmt.ClusterManager.<init>
(ClusterManager.java:133)
at
com.sun.enterprise.ee.cms.impl.jxta.GroupCommunicationProviderImpl.initializeGro
upCommuni
cationProvider(GroupCommunicationProviderImpl.java:138)
at com.sun.enterprise.ee.cms.impl.jxta.GMSContext.join
(GMSContext.java:123)
at com.sun.enterprise.ee.cms.impl.common.GroupManagementServiceImpl.join
(GroupManagementServ
iceImpl.java:339)
at com.sun.enterprise.ee.cms.tests.ApplicationServer.startGMS
(ApplicationServer.java:156)
at com.sun.enterprise.ee.cms.tests.ApplicationServer.run
(ApplicationServer.java:107)
at java.lang.Thread.run(Thread.java:735)

#]

Exception in thread "ApplicationServer" java.lang.NullPointerException
at com.sun.enterprise.jxtamgmt.NetworkManager.getWorldPeerGroup
(NetworkManager.java:725)
at com.sun.enterprise.jxtamgmt.NetworkManager.startDomain
(NetworkManager.java:696)
at com.sun.enterprise.jxtamgmt.NetworkManager.start
(NetworkManager.java:401)
at com.sun.enterprise.jxtamgmt.ClusterManager.<init>
(ClusterManager.java:136)
at
com.sun.enterprise.ee.cms.impl.jxta.GroupCommunicationProviderImpl.initializeGro
upCommuni
cationProvider(GroupCommunicationProviderImpl.java:138)
at com.sun.enterprise.ee.cms.impl.jxta.GMSContext.join
(GMSContext.java:123)
at com.sun.enterprise.ee.cms.impl.common.GroupManagementServiceImpl.join
(GroupManagementServ
iceImpl.java:339)
at com.sun.enterprise.ee.cms.tests.ApplicationServer.startGMS
(ApplicationServer.java:156)
at com.sun.enterprise.ee.cms.tests.ApplicationServer.run
(ApplicationServer.java:107)
at java.lang.Thread.run(Thread.java:735)


When you try to run rungmsdemo.bat with IBM JDK6, you can see NPEs from
GroupManagementService#join().

2) In the case described in 1), GMS's other APIs, such as the GroupHandle methods,
need to handle this problem gracefully.

I wrote some code for this test (SimpleShoalAPITest.java).

The test code is simple.


try {
    gms.join();
} catch( GMSException e ) {
    // It's OK.
    throw e;
} catch( Throwable t ) {
    // unexpected error.
    List<String> exceptions = testSimpleAPIsWithUnexpectedException( gms );
    // print unexpected exceptions
    // ...
}

private List<String> testSimpleAPIsWithUnexpectedException(
        GroupManagementService gms ) {
    if( gms == null )
        return null; // It's OK.
    List<String> unexpectedExceptions = new Vector<String>();
    String dummyString = "";
    byte[] dummyBytes = new byte[0];

    GroupHandle gh = gms.getGroupHandle();
    if( gh == null )
        return null; // It's OK.
    // test APIs
    // ...
}


When GMS failed to join the group but the GMS instance and the GroupHandle were
not null, I checked all of GroupHandle's APIs with a dummy String and dummy bytes.
One such per-API probe is illustrated below.
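
For illustration, one probe might look like the following (hypothetical code, not the
attached SimpleShoalAPITest itself; it reuses gh, dummyString, dummyBytes and
unexpectedExceptions from the snippet above and only shows the pattern of treating a
GMSException as acceptable while recording anything else):

try {
    gh.sendMessage( dummyString, dummyBytes );
} catch( GMSException expected ) {
    // a checked GMS failure is acceptable here
} catch( Throwable unexpected ) {
    unexpectedExceptions.add( "GroupHandle#sendMessage( String, byte[] ): " + unexpected );
}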

Here is the full log


D:\shoal\gms>java -classpath classes;lib\jxta.jar
com.sun.enterprise.shoal.carryel.SimpleShoalAPITes
t
[#|2009-03-
31T11:32:01.458+0900|INFO|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;ClassNa
me=Simple
ShoalAPITest;MethodName=runSimpleSample;|Starting SimpleShoalAPITest....|#]

[#|2009-03-
31T11:32:02.052+0900|INFO|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;ClassNa
me=Simple
ShoalAPITest;MethodName=initializeGMS;|Initializing Shoal for member: 67fbe786-
ff24-4a1f-81d2-d795bc
b9dd16 group:TestGroup|#]

[#|2009-03-
31T11:32:02.068+0900|FINE|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;ClassNa
me=GMSCon
text;MethodName=<init>;|Initialized Group Communication System....|#]

[#|2009-03-
31T11:32:02.068+0900|INFO|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;ClassNa
me=Simple
ShoalAPITest;MethodName=runSimpleSample;|Joining Group TestGroup|#]

[#|2009-03-
31T11:32:02.068+0900|FINE|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;ClassNa
me=GroupM
anagementServiceImpl;MethodName=join;|Connecting to group......|#]

[#|2009-03-
31T11:32:02.130+0900|CONFIG|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;Class
Name=Jxta
Util;MethodName=configureJxtaLogging;|gms configureJxtaLogging: set jxta
logging to default of SEVER
E|#]

[#|2009-03-
31T11:32:02.208+0900|CONFIG|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;Class
Name=Netw
orkManager;MethodName=initWPGF;|initWPGF storeHome=.shoal\67fbe786-ff24-4a1f-
81d2-d795bcb9dd16|#]

[#|2009-03-
31T11:32:02.208+0900|FINER|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;ClassN
ame=Netwo
rkManager;MethodName=clearCache;|clearCache(.shoal\67fbe786-ff24-4a1f-81d2-
d795bcb9dd16) on non-exsi
stent directory|#]

[#|2009-03-
31T11:32:02.443+0900|SEVERE|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;Class
Name=Netw
orkManager;MethodName=<init>;|Could not locate World PeerGroup Module
Implementation.
net.jxta.exception.PeerGroupException: Could not locate World PeerGroup Module
Implementation.
at
net.jxta.peergroup.WorldPeerGroupFactory.getDefaultWorldPeerGroupClass
(WorldPeerGroupFact
ory.java:244)
at net.jxta.peergroup.WorldPeerGroupFactory.<init>
(WorldPeerGroupFactory.java:178)
at com.sun.enterprise.jxtamgmt.NetworkManager.initWPGF
(NetworkManager.java:623)
at com.sun.enterprise.jxtamgmt.NetworkManager.<init>
(NetworkManager.java:213)
at com.sun.enterprise.jxtamgmt.ClusterManager.<init>
(ClusterManager.java:133)
at
com.sun.enterprise.ee.cms.impl.jxta.GroupCommunicationProviderImpl.initializeGro
upCommuni
cationProvider(GroupCommunicationProviderImpl.java:138)
at com.sun.enterprise.ee.cms.impl.jxta.GMSContext.join
(GMSContext.java:123)
at com.sun.enterprise.ee.cms.impl.common.GroupManagementServiceImpl.join
(GroupManagementServ
iceImpl.java:339)
at com.sun.enterprise.shoal.carryel.SimpleShoalAPITest.runSimpleSample
(SimpleShoalAPITest.ja
va:42)
at com.sun.enterprise.shoal.carryel.SimpleShoalAPITest.main
(SimpleShoalAPITest.java:25)

#]

[#|2009-03-
31T11:32:02.443+0900|FINE|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;ClassNa
me=Networ
kManager;MethodName=startDomain;|Rendezvous seed?:false|#]

[#|2009-03-
31T11:32:02.443+0900|CONFIG|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;Class
Name=Netw
orkManager;MethodName=startDomain;|set jxta Multicast Poolsize to 300|#]

[#|2009-03-
31T11:32:02.458+0900|CONFIG|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;Class
Name=Netw
orkManager;MethodName=startDomain;|node config adv = <?xml version="1.0"
encoding="UTF-8"?>
<!DOCTYPE jxta:CP>
<jxta:CP xml:space="default" type="jxta:PlatformConfig"
xmlns:jxta="http://jxta.org">
<PID>
urn:jxta:uuid-
59616261646162614A7874615032503363A4CD95BF504B68B35687BA4517337A03
</PID>
<Name>
67fbe786-ff24-4a1f-81d2-d795bcb9dd16
</Name>
<Desc>
Created by Jxta Cluster Management NetworkManager
</Desc>
<Svc>
<MCID>
urn:jxta:uuid-DEADBEEFDEAFBABAFEEDBABE0000000A05
</MCID>
<Parm>
<jxta:TransportAdvertisement
xmlns:jxta="http://jxta.org" xml:space="preserv
e" type="jxta:HTTPTransportAdvertisement">
<Protocol>http</Protocol><ConfigMode>auto</ConfigMode><Port>9700</Port><ServerOf
f/>
</jxta:TransportAdvertisement>
</Parm>
</Svc>
<Svc>
<MCID>
urn:jxta:uuid-DEADBEEFDEAFBABAFEEDBABE0000000905
</MCID>
<Parm>
<jxta:TransportAdvertisement
xmlns:jxta="http://jxta.org" xml:space="preserv
e" type="jxta:TCPTransportAdvertisement">
<Protocol>tcp</Protocol><ConfigMode>auto</ConfigMode><Port start="9701"
end="9999">9701</Port><Multi
castAddr>224.0.1.85</MulticastAddr><MulticastPort>1234</MulticastPort><Mcast_Poo
l_Size>300</Mcast_Po
ol_Size><MulticastSize>65536</MulticastSize>
</jxta:TransportAdvertisement>
</Parm>
</Svc>
<Svc>
<MCID>
urn:jxta:uuid-DEADBEEFDEAFBABAFEEDBABE0000000105
</MCID>
<Parm type="jxta:PeerGroupConfigAdv"
xmlns:jxta="http://jxta.org" xml:space="preserv
e">
<PeerGroupID>urn:jxta:uuid-
157B8869F02A4210BE61AA03D81ECC6659616261646162614E5047205032503302</PeerG
roupID><PeerGroupName>TestGroup</PeerGroupName><PeerGroupDesc>TestGroup
Infrastructure Group Name</P
eerGroupDesc> </Parm>
</Svc>
<Svc>
<MCID>
urn:jxta:uuid-DEADBEEFDEAFBABAFEEDBABE0000000F05
</MCID>
<Parm type="jxta:RelayConfig" xmlns:jxta="http://jxta.org"
xml:space="preserve" clie
nt="true">
<client/><server/> </Parm>
</Svc>
<Svc>
<MCID>
urn:jxta:uuid-DEADBEEFDEAFBABAFEEDBABE0000000605
</MCID>
<Parm type="jxta:RdvConfig" xmlns:jxta="http://jxta.org"
xml:space="preserve" config
="client"/>
</Svc>
<Svc>
<MCID>
urn:jxta:uuid-DEADBEEFDEAFBABAFEEDBABE0000000505
</MCID>
<Parm type="jxta:PSEConfig" xmlns:jxta="http://jxta.org"
xml:space="preserve"/>
</Svc>
</jxta:CP>

#]

[#|2009-03-
31T11:32:02.505+0900|SEVERE|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;Class
Name=Simp
leShoalAPITest;MethodName=runSimpleSample;|Unexpected exception occured while
joining group:
java.lang.NullPointerException
at com.sun.enterprise.jxtamgmt.NetworkManager.getWorldPeerGroup
(NetworkManager.java:725)
at com.sun.enterprise.jxtamgmt.NetworkManager.startDomain
(NetworkManager.java:696)
at com.sun.enterprise.jxtamgmt.NetworkManager.start
(NetworkManager.java:401)
at com.sun.enterprise.jxtamgmt.ClusterManager.<init>
(ClusterManager.java:136)
at
com.sun.enterprise.ee.cms.impl.jxta.GroupCommunicationProviderImpl.initializeGro
upCommuni
cationProvider(GroupCommunicationProviderImpl.java:138)
at com.sun.enterprise.ee.cms.impl.jxta.GMSContext.join
(GMSContext.java:123)
at com.sun.enterprise.ee.cms.impl.common.GroupManagementServiceImpl.join
(GroupManagementServ
iceImpl.java:339)
at com.sun.enterprise.shoal.carryel.SimpleShoalAPITest.runSimpleSample
(SimpleShoalAPITest.ja
va:42)
at com.sun.enterprise.shoal.carryel.SimpleShoalAPITest.main
(SimpleShoalAPITest.java:25)

#]

[#|2009-03-
31T11:32:02.505+0900|FINER|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;ClassN
ame=Distr
ibutedStateCacheImpl;MethodName=addToCache;|Adding to DSC by local
Member:67fbe786-ff24-4a1f-81d2-d7
95bcb9dd16,Component:,key:,State:RECOVERY_IN_PROGRESS|1238466722505|#]

[#|2009-03-
31T11:32:02.505+0900|FINEST|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;Class
Name=Dist
ributedStateCacheImpl;MethodName=addToLocalCache;|Adding
cKey=GMSMember:67fbe786-ff24-4a1f-81d2-d795
bcb9dd16:Component::key: state=RECOVERY_IN_PROGRESS|1238466722505|#]

[#|2009-03-
31T11:32:02.505+0900|FINER|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;ClassN
ame=Distr
ibutedStateCacheImpl;MethodName=printDSCContents;|67fbe786-ff24-4a1f-81d2-
d795bcb9dd16:DSC now conta
ins ---------
209999666 key=GMSMember:67fbe786-ff24-4a1f-81d2-d795bcb9dd16:Component::key: :
value=RECOVERY_IN_PRO
GRESS|1238466722505

#]

[#|2009-03-
31T11:32:02.521+0900|FINER|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;ClassN
ame=Group
HandleImpl;MethodName=isFenced;|GMSMember:67fbe786-ff24-4a1f-81d2-
d795bcb9dd16:Component::key: value
:RECOVERY_IN_PROGRESS|1238466722505|#]

[#|2009-03-
31T11:32:02.521+0900|FINER|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;ClassN
ame=Group
HandleImpl;MethodName=isFenced;|Returning true for isFenced query|#]

[#|2009-03-
31T11:32:02.521+0900|SEVERE|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;Class
Name=Simp
leShoalAPITest;MethodName=runSimpleSample;|Unexpected exceptions occured:
GroupHandle#sendMessage( String, byte[] ): java.lang.NullPointerException
GroupHandle#sendMessage( String, String, byte[] ):
java.lang.NullPointerException
GroupHandle#sendMessage( String, String, byte[] ):
java.lang.NullPointerException
GroupHandle#raiseFence( String, String ): java.lang.NullPointerException
GroupHandle#lowerFence( String, String ): java.lang.NullPointerException
GroupHandle#getMemberState( String ): java.lang.NullPointerException
GroupHandle#getMemberState( String, long, long ): java.lang.NullPointerException
GroupHandle#getGroupLeader(): java.lang.NullPointerException
GroupHandle#isGroupLeader(): java.lang.NullPointerException

#]

[#|2009-03-
31T11:32:02.536+0900|SEVERE|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;Class
Name=Simp
leShoalAPITest;MethodName=main;|Exception occured while testing some
APIs:com.sun.enterprise.ee.cms.
core.GMSException: java.lang.NullPointerException|#]


As you can see, the following APIs (9 methods) are not safe.

  • GroupHandle#sendMessage( String, byte[] ): java.lang.NullPointerException
  • GroupHandle#sendMessage( String, String, byte[] ):
    java.lang.NullPointerException
  • GroupHandle#sendMessage( String, String, byte[] ):
    java.lang.NullPointerException
  • GroupHandle#raiseFence( String, String ): java.lang.NullPointerException
  • GroupHandle#lowerFence( String, String ): java.lang.NullPointerException
  • GroupHandle#getMemberState( String ): java.lang.NullPointerException
  • GroupHandle#getMemberState( String, long, long ):
    java.lang.NullPointerException
  • GroupHandle#getGroupLeader(): java.lang.NullPointerException
  • GroupHandle#isGroupLeader(): java.lang.NullPointerException

Shoal should fix these NPEs.

I attached my test code(SimpleShoalAPITest.java)



 Comments   
Comment by carryel [ 30/Mar/09 ]

Created an attachment (id=15)
GroupHanlde API test code

Comment by Joe Fialli [ 17/Apr/09 ]

agree that GMS API methods should not be throwing NPE

Comment by Joe Fialli [ 05/Feb/10 ]

Fixed.

Regression test is com.sun.enterprise.ee.cms.tests.core.GroupHandleTest.java.
Shell script runAPITests.sh.

Verified that the test originally filed with this bug also runs okay.
The fix throws IllegalArgumentException when null is passed as a parameter and it is not
allowed.





[SHOAL-88] GroupHandle.sendMessage fails frequently when too many concurrent threads sending at same time Created: 21/May/09  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 88

 Description   

Reported initially by Bongjae Chang.
The following is extracted from his emails.


To reproduce the issue on the same machine:

one command is
java -cp xxx
com.sun.enterprise.shoal.multithreadmessagesendertest.MultiThreadMessageSender
server1 server2 100

another command is
java -cp xxx
com.sun.enterprise.shoal.multithreadmessagesendertest.MultiThreadMessageSender
server2 server1 0

One will notice many failures in the log indicating that a message was not sent.

com.sun.enterprise.ee.cms.core.GMSException: message
com.sun.enterprise.ee.cms.spi.GMSMessage@1b6c03
6 not sent to
urn:jxta:uuid-59616261646162614A78746150325033A113B2FFB4B64F038C858B9EB8FC413803, send
returned false
at
com.sun.enterprise.ee.cms.impl.jxta.GroupCommunicationProviderImpl.sendMessage(GroupCommu
nicationProviderImpl.java:291)
at
com.sun.enterprise.ee.cms.impl.jxta.GroupHandleImpl.sendMessage(GroupHandleImpl.java:133)

at
com.sun.enterprise.shoal.jointest.MultiThreadSenderTest$1.run(MultiThreadSenderTest.java:
103)
at java.lang.Thread.run(Thread.java:717)


It seems that JXTA's OutputPipe.send() returns false continuously because of
overflow. Shoal already retries the send, with MAX_SEND_RETRIES set to 4
in JxtaUtil#send().

But it seems that this MAX_SEND_RETRIES value is not enough in my test, which has
over 100 sender threads running simultaneously.

When I experimentally set MAX_SEND_RETRIES to over 1000, I found that all
packets could be sent to the remote server successfully, but there was a marked
decline in sending performance. So I think giving MAX_SEND_RETRIES a very large
value is not a good solution for my test.



 Comments   
Comment by Joe Fialli [ 21/May/09 ]

Bongae confirmed that putting a synchronized block on OutputPipe corrected
the issue. There will be an RFE submitted to change this synchronization
solution to a more performant pool of OutputPipes (one pipe to be used by one
thread at any point in time).
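
A hedged sketch of that synchronized-send workaround (illustrative only, not the committed
Shoal fix; OutputPipe and Message are the real jxta types, while the wrapper class itself
is an assumption):

import java.io.IOException;

import net.jxta.endpoint.Message;
import net.jxta.pipe.OutputPipe;

class SynchronizedPipeSenderSketch {
    private final OutputPipe pipe;

    SynchronizedPipeSenderSketch(final OutputPipe pipe) {
        this.pipe = pipe;
    }

    boolean send(final Message message) throws IOException {
        synchronized (pipe) {
            // only one thread at a time hits the pipe, so send() is far less likely
            // to return false because of resource exhaustion
            return pipe.send(message);
        }
    }
}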

Comment by Joe Fialli [ 21/May/09 ]

fix checked into shoal on May 21.





[SHOAL-91] jxta class loading issue using IBM JDK 1.6 Created: 20/Aug/09  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: Linux
Platform: Linux


Issuezilla Id: 91

 Description   

The reported issue is with IBM JDK 1.6 on Linux, AIX and Windows.
One can work around this issue by using IBM JDK 1.5.

Executive summary of the bug in IBM JDK 1.6 from Bongjae (does not include how this
breaks jxta):

Given the following test code fragment:

Package javaLangPackage = Package.getPackage( "java.lang" );
System.out.println( javaLangPackage.getSpecificationVersion() );
---------------------

In Sun JDK6
---------------------
1.6
---------------------

In IBM JDK6
---------------------
null
---------------------

So Package#isCompatibleWith() can throw a "java.lang.NumberFormatException: Empty
version string" exception in IBM JDK6, because specVersion can be null.

  • current cvs jxta.jar works well in Sun JDK1.5, Sun JDK1.6 and IBM JDK1.5.
  • I tested this case in Windows and Linux. Both Windows and Linux returned same
    error when I used IBM JDK1.6.

Complete details provided by Bongjae:

I tried to test current shoal version in IBM JDK 1.6.

But GMS failed to join the group in IBM JDK 1.6.

Here is error log.
---------------------
2009. 3. 21 오후 6:21:17 com.sun.enterprise.jxtamgmt.JxtaUtil configureJxtaLogging
CONFIG: gms configureJxtaLogging: set jxta logging to default of SEVERE
2009. 3. 21 오후 6:21:18 com.sun.enterprise.jxtamgmt.NetworkManager initWPGF
CONFIG: initWPGF
storeHome=/home/bluewolf/project/jeus7trunk/target/jeus/domains/dvt/data/gms/dvt
2009. 3. 21 오후 6:21:18 com.sun.enterprise.jxtamgmt.NetworkManager <init>
SEVERE: Could not locate World PeerGroup Module Implementation.
Throwable occurred: net.jxta.exception.PeerGroupException: Could not locate
World PeerGroup Module Implementation.
at
net.jxta.peergroup.WorldPeerGroupFactory.getDefaultWorldPeerGroupClass(WorldPeerGroupFactory.java:244)
at net.jxta.peergroup.WorldPeerGroupFactory.<init>(WorldPeerGroupFactory.java:178)
at com.sun.enterprise.jxtamgmt.NetworkManager.initWPGF(NetworkManager.java:623)
at com.sun.enterprise.jxtamgmt.NetworkManager.<init>(NetworkManager.java:213)
at com.sun.enterprise.jxtamgmt.ClusterManager.<init>(ClusterManager.java:133)
at
com.sun.enterprise.ee.cms.impl.jxta.GroupCommunicationProviderImpl.initializeGroupCommunicationProvider(GroupCommunicationProviderImpl.java:138)
at com.sun.enterprise.ee.cms.impl.jxta.GMSContext.join(GMSContext.java:123)
at
com.sun.enterprise.ee.cms.impl.common.GroupManagementServiceImpl.join(GroupManagementServiceImpl.java:347)
...
---------------------

I could know that "Could not locate World PeerGroup Module Implementation"
message was concerned with JDK version and jxta platform when I reviewed
[Shoal-Users] mailing list.

So I tried to test jxta.jar simply.
---------------------
D:\>java -version
java version "1.6.0"
Java(TM) SE Runtime Environment (build pwi3260sr1-20080416_01(SR1))
IBM J9 VM (build 2.4, J2RE 1.6.0 IBM J9 2.4 Windows XP x86-32
jvmwi3260-20080415_18762 (JIT enabled,
AOT enabled)
J9VM - 20080415_018762_lHdSMr
JIT - r9_20080415_1520
GC - 20080415_AA)
JCL - 20080412_01

D:\>java -jar jxta.jar
Starting the JXTA platform in mode : EDGE
2009. 3. 23 오전 11:22:28 net.jxta.platform.NetworkManager configure
INFO: Created new configuration. mode = EDGE
2009. 3. 23 오전 11:22:28 net.jxta.platform.NetworkManager startNetwork
INFO: Starting JXTA Network! MODE = EDGE, HOME = file:/D:/.cache/BootEdge/
2009. 3. 23 오전 11:22:28 net.jxta.impl.peergroup.StdPeerGroup isCompatible
WARNING: Failure handling compatibility statement
Throwable occurred: java.lang.NumberFormatException: Empty version string
at java.lang.Package.isCompatibleWith(Package.java:223)
at net.jxta.impl.peergroup.StdPeerGroup.isCompatible(StdPeerGroup.java:414)
at
net.jxta.impl.peergroup.GenericPeerGroup$1.compatible(GenericPeerGroup.java:131)
at net.jxta.impl.loader.RefJxtaLoader.findClass(RefJxtaLoader.java:254)
at
net.jxta.impl.loader.RefJxtaLoader.findModuleImplAdvertisement(RefJxtaLoader.java:350)
at
net.jxta.peergroup.WorldPeerGroupFactory.getDefaultWorldPeerGroupClass(WorldPeerGroupFact
ory.java:241)
at
net.jxta.peergroup.WorldPeerGroupFactory.<init>(WorldPeerGroupFactory.java:178)
at
net.jxta.peergroup.NetPeerGroupFactory.<init>(NetPeerGroupFactory.java:204)
at net.jxta.platform.NetworkManager.startNetwork(NetworkManager.java:410)
at net.jxta.impl.peergroup.Boot.main(Boot.java:139)
Uncaught Throwable caught by 'main':
net.jxta.exception.PeerGroupException: Could not locate World PeerGroup Module
Implementation.
at
net.jxta.peergroup.WorldPeerGroupFactory.getDefaultWorldPeerGroupClass(WorldPeerGroupFact
ory.java:244)
at
net.jxta.peergroup.WorldPeerGroupFactory.<init>(WorldPeerGroupFactory.java:178)
at
net.jxta.peergroup.NetPeerGroupFactory.<init>(NetPeerGroupFactory.java:204)
at net.jxta.platform.NetworkManager.startNetwork(NetworkManager.java:410)
at net.jxta.impl.peergroup.Boot.main(Boot.java:139)

D:\>
---------------------

I could see that this error was related to the Package.isCompatibleWith() method.

Here is my test code.

---------------------
Package javaLangPackage = Package.getPackage( "java.lang" );
System.out.println( javaLangPackage.getSpecificationVersion() );
---------------------

In Sun JDK6
---------------------
1.6
---------------------

In IBM JDK6
---------------------
null
---------------------

So Package#isCompatibleWith() can return "java.lang.NumberFormatException: Empty
version string" exception in IBM JDK6 because specVersion can be null.

  • current cvs jxta.jar works well in Sun JDK1.5, Sun JDK1.6 and IBM JDK1.5.
  • I tested this case in Windows and Linux. Both Windows and Linux returned same
    error when I used IBM JDK1.6.


 Comments   
Comment by Joe Fialli [ 21/Aug/09 ]
*** Issue 92 has been marked as a duplicate of this issue. ***
Comment by Joe Fialli [ 25/Aug/09 ]

I downloaded IBM JDK 6 as part of Eclipse for Windows.
The manifest file for rt.jar is incorrectly configured: it is missing the
package specification and version info for the java package.

Comment by shreedhar_ganapathy [ 18/Sep/09 ]

Working with IBM under support arrangement, they will make a patch over JDK 6 SR5 available to Sun's
customers on AIX platform for this issue. The patch is expected to be made available in about a month
from now.
They do not plan to have this patch available for IBM JDK 6 SR5 on other platforms. This patch should
also work with SR6 since SR 6 is already in freeze so the fix did not make it there. We hope that SR 7
would have the fix incorporated.

Comment by Joe Fialli [ 05/Feb/10 ]

No fix on the shoal side. An updated version of IBM JDK 6 fixes this issue.





[SHOAL-92] jxta class loading issue using IBM JDK 1.6 Created: 20/Aug/09  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time