
[SHOAL-112] ability to configure GMS member to use SSL Created: 12/Nov/10  Updated: 31/Oct/12

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Improvement Priority: Critical
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 112
Tags:
Participants: Joe Fialli

 Description   

Provide a GMS property that enables one to configure a GMS member to use SSL for
its TCP communications. Both supported transports, Grizzly and JXTA, have the
ability to enable SSL for point-to-point communication.
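A rough sketch of how such a property might be consumed, assuming hypothetical property names (SSL_KEYSTORE, SSL_KEYSTORE_PASSWORD) and using only standard JDK APIs; wiring the resulting context into the grizzly/jxta transports is what this issue actually asks for:

import java.io.FileInputStream;
import java.security.KeyStore;
import java.util.Properties;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;

// Hypothetical GMS property keys -- the real names would be defined by this improvement.
static SSLContext sslContextFor(Properties gmsProps) throws Exception {
    char[] password = gmsProps.getProperty("SSL_KEYSTORE_PASSWORD").toCharArray();
    KeyStore ks = KeyStore.getInstance("JKS");
    FileInputStream in = new FileInputStream(gmsProps.getProperty("SSL_KEYSTORE"));
    ks.load(in, password);
    in.close();
    KeyManagerFactory kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
    kmf.init(ks, password);
    SSLContext ctx = SSLContext.getInstance("TLS");
    ctx.init(kmf.getKeyManagers(), null, null);
    return ctx;   // the member's TCP sockets would then be created from this context's factories
}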






[SHOAL-117] Support multiple shoal instances in a single JVM Created: 25/Nov/11  Updated: 25/Nov/11

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major
Reporter: Arul Dhesiaseelan Assignee: shreedhar_ganapathy
Resolution: Unresolved Votes: 2
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Tags:
Participants: Arul Dhesiaseelan and shreedhar_ganapathy

 Description   

We have a requirement to run multiple Shoal instances per JVM. We believe Shoal does not support this, as the GMSContext is assigned per group rather than per server. We have implemented support for a GMSContext per server in the same group, allowing multiple contexts to coexist in the same JVM. We would be happy to contribute this patch to the Shoal project. Would anyone be interested in this patch?
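A minimal sketch of the kind of change being described, with a hypothetical registry class; the point is simply that the context lookup key becomes (serverToken, groupName) instead of groupName alone. GMSContext is the existing Shoal type; everything else here is illustrative:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public final class GMSContextRegistry {
    // Key on server + group so two members of the same group can coexist in one JVM.
    private static final ConcurrentMap<String, GMSContext> contexts =
            new ConcurrentHashMap<String, GMSContext>();

    public static GMSContext getContext(String serverToken, String groupName) {
        return contexts.get(serverToken + ":" + groupName);
    }

    public static void register(String serverToken, String groupName, GMSContext ctx) {
        contexts.putIfAbsent(serverToken + ":" + groupName, ctx);
    }
}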






[SHOAL-89] Improved concurrency for sendMessage Created: 18/Jun/09  Updated: 09/Nov/11

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Improvement Priority: Major
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 89
Tags:
Participants: Joe Fialli and shreedhar_ganapathy

 Description   

RFE related to the fix for Shoal issue 88: change that synchronization solution to a
higher-performance pool of OutputPipes (each pipe used by at most one thread at any
point in time).
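A minimal sketch of the pooling idea, assuming a generic pipe type; OutputPipe stands in for the JXTA output pipe and the sizing/creation of pipes is left out:

import java.util.Collection;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative pool: each pipe is used by at most one thread at a time.
final class OutputPipePool<P> {
    private final BlockingQueue<P> pipes;

    OutputPipePool(Collection<P> initialPipes) {
        this.pipes = new LinkedBlockingQueue<P>(initialPipes);
    }

    P borrow() throws InterruptedException {
        return pipes.take();     // blocks until a pipe is free
    }

    void release(P pipe) {
        pipes.offer(pipe);       // return the pipe for reuse
    }
}

A sender would borrow() a pipe, send, and release() it in a finally block, so concurrent sendMessage calls no longer contend on a single synchronized pipe.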



 Comments   
Comment by shreedhar_ganapathy [ 09/Nov/11 08:45 PM ]

Transferring to Joe for eval and closure.

Comment by Joe Fialli [ 09/Nov/11 09:11 PM ]

There are trade-offs for concurrent sendMessage when relying on NIO as the underlying transport,
so this RFE was considered and then postponed because of those trade-offs.

Concurrent processing meant the same serialized output could not be shared across all sends of a
message, so there is additional space usage and/or a separate serialization for each concurrent send.

With regular multicast, the message being sent is serialized only once, and the current
implementation still serializes it only once.
With concurrent send the trade-offs were less obvious,
so this RFE is on hold for now while those trade-offs are sorted through.





[SHOAL-76] DSC logging performance improvements Created: 13/Sep/08  Updated: 09/Nov/11

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Task Priority: Trivial
Reporter: mbien Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


File Attachments: Text File DSC.patch    
Issuezilla Id: 76
Tags:
Participants: Joe Fialli, mbien and shreedhar_ganapathy

 Description   

Wrapped logging in potentially hot or concurrent code paths in a guard of the form
if (isLoggable(level)) { log(...); }
to prevent unnecessary synchronization and logging overhead.
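The idiom in question, with java.util.logging; the logger name and message are illustrative:

import java.util.logging.Level;
import java.util.logging.Logger;

class ViewLogging {
    private static final Logger LOG = Logger.getLogger("ShoalLogger");

    static void logViewChange(Object viewSnapshot) {
        // Only build the message (string concatenation) when FINE is actually enabled.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("analyzing view change: " + viewSnapshot);
        }
    }
}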



 Comments   
Comment by mbien [ 13/Sep/08 10:14 AM ]

Created an attachment (id=9)
diff patch

Comment by shreedhar_ganapathy [ 09/Nov/11 08:46 PM ]

Transferring to Joe for eval and closure.





[SHOAL-80] Accessing system property in a rt.jar specific way Created: 23/Sep/08  Updated: 09/Nov/11

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Minor
Reporter: okrische Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 80
Tags:
Participants: Joe Fialli, okrische and shreedhar_ganapathy

 Description   

Watch out in line 99 of com.sun.enterprise.jxtamgmt.NiceLogFormatter:

@SuppressWarnings("unchecked")
private static final String LINE_SEPARATOR =
(String) java.security.AccessController.doPrivileged(
new sun.security.action.GetPropertyAction("line.separator"));

Why not just use:

  • System.getProperty("line.separator")

instead?

The code above is flagged as an error in Eclipse, probably because it uses a class from
rt.jar directly (sun.security.action) rather than the public API.
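A sketch of the suggested replacement using only public APIs; if the privileged block is still wanted (e.g. under a security manager), java.security.PrivilegedAction works without touching sun.security.action:

import java.security.AccessController;
import java.security.PrivilegedAction;

// Simple form:
private static final String LINE_SEPARATOR = System.getProperty("line.separator");

// Or, keeping the privileged block but via the public API only:
private static final String LINE_SEPARATOR_PRIVILEGED =
        AccessController.doPrivileged(new PrivilegedAction<String>() {
            public String run() {
                return System.getProperty("line.separator");
            }
        });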



 Comments   
Comment by Joe Fialli [ 27/Oct/08 12:51 PM ]

Does not impact the running system, only compile time.

Comment by shreedhar_ganapathy [ 09/Nov/11 08:46 PM ]

Transferring to Joe for eval and closure.





[SHOAL-61] when members join the group concurrently, join notifications of some members are often duplicated or missed Created: 10/Jun/08  Updated: 25/Nov/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: carryel Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


File Attachments: Text File shoal_issue61_2009_06_11.txt     Java Source File SimpleJoinTest.java    
Issue Links:
Dependency
blocks SHOAL-50 Expose MASTER_CHANGE_EVENT Resolved
Issuezilla Id: 61
Status Whiteboard:

shoal-shark-na

Tags:
Participants: carryel, Joe Fialli and sheetalv

 Description   

This issue is similar to issue #60.
(https://shoal.dev.java.net/issues/show_bug.cgi?id=60)

In issue #60 the members joined the group in a particular order; in this issue,
the members join the group concurrently.

When all members join the group concurrently at startup, they do not know who the
group leader is and must negotiate one. In this case, the join notifications of
some members are often duplicated or missed.

Here is the log of the duplicated case. Assume that "A" and "B" are members of
"TestGroup".
["A"'s log]
------------------------------------------------------------------------
2008. 6. 10 11:04:58 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest runSimpleSample
INFO: Starting SimpleJoinTest....
2008. 6. 10 11:04:58 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest initializeGMS
INFO: Initializing Shoal for member: a2ed5cb6-3cc7-4060-91d6-3fc8b6854049 group:TestGroup
2008. 6. 10 11:04:58 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest runSimpleSample
INFO: Registering for group event notifications
2008. 6. 10 11:04:58 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest runSimpleSample
INFO: Joining Group TestGroup
2008. 6. 10 11:04:59 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group TestGroup : Members in view for (before change analysis) are :
1: MemberId: a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CEC3B82201E1B545DB8E9ECF621244468F03

2008. 6. 10 11:04:59 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT
2008. 6. 10 11:05:10 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group TestGroup : Members in view for (before change analysis) are :
1: MemberId: e9d80499-0f8b-4e2d-8856-3f31dcc25f96, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CE2E6E148CC1DB479EA7D0C6A0AF50B5A103
2: MemberId: a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CEC3B82201E1B545DB8E9ECF621244468F03

2008. 6. 10 11:05:10 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT
2008. 6. 10 11:05:10 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack processNotification
INFO: ***JoinNotification received: GroupLeader = false, Signal.getMemberToken() = e9d80499-0f8b-4e2d-8856-3f31dcc25f96, ServerName = a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, Leader = e9d80499-0f8b-4e2d-8856-3f31dcc25f96
2008. 6. 10 11:05:10 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group TestGroup : Members in view for (before change analysis) are :
1: MemberId: e9d80499-0f8b-4e2d-8856-3f31dcc25f96, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CE2E6E148CC1DB479EA7D0C6A0AF50B5A103
2: MemberId: a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CEC3B82201E1B545DB8E9ECF621244468F03

2008. 6. 10 11:05:10 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 6. 10 11:05:10 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack processNotification
INFO: ***JoinNotification received: GroupLeader = false, Signal.getMemberToken() = a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, ServerName = a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, Leader = e9d80499-0f8b-4e2d-8856-3f31dcc25f96
2008. 6. 10 11:05:10 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group TestGroup : Members in view for (before change analysis) are :
1: MemberId: e9d80499-0f8b-4e2d-8856-3f31dcc25f96, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CE2E6E148CC1DB479EA7D0C6A0AF50B5A103
2: MemberId: a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CEC3B82201E1B545DB8E9ECF621244468F03

2008. 6. 10 11:05:10 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 6. 10 11:05:10 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack processNotification
INFO: ***JoinNotification received: GroupLeader = false, Signal.getMemberToken() = a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, ServerName = a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, Leader = e9d80499-0f8b-4e2d-8856-3f31dcc25f96
------------------------------------------------------------------------
"A" received duplicated JoinNotifications(a2ed5cb6-3cc7-4060-91d6-3fc8b6854049).

["B"'s log]
------------------------------------------------------------------------
2008. 6. 10 11:04:54 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest runSimpleSample
INFO: Starting SimpleJoinTest....
2008. 6. 10 11:04:54 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest initializeGMS
INFO: Initializing Shoal for member: e9d80499-0f8b-4e2d-8856-3f31dcc25f96 group:TestGroup
2008. 6. 10 11:04:55 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest runSimpleSample
INFO: Registering for group event notifications
2008. 6. 10 11:04:55 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest runSimpleSample
INFO: Joining Group TestGroup
2008. 6. 10 11:04:56 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group TestGroup : Members in view for (before change analysis) are :
1: MemberId: e9d80499-0f8b-4e2d-8856-3f31dcc25f96, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CE2E6E148CC1DB479EA7D0C6A0AF50B5A103

2008. 6. 10 11:04:56 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT
2008. 6. 10 11:05:01 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group TestGroup : Members in view for (before change analysis) are :
1: MemberId: e9d80499-0f8b-4e2d-8856-3f31dcc25f96, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CE2E6E148CC1DB479EA7D0C6A0AF50B5A103
2: MemberId: a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CEC3B82201E1B545DB8E9ECF621244468F03

2008. 6. 10 11:05:01 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT
2008. 6. 10 11:05:04 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack processNotification
INFO: ***JoinNotification received: GroupLeader = true, Signal.getMemberToken() = a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, ServerName = e9d80499-0f8b-4e2d-8856-3f31dcc25f96, Leader = e9d80499-0f8b-4e2d-8856-3f31dcc25f96
2008. 6. 10 11:05:09 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group TestGroup : Members in view for (before change analysis) are :
1: MemberId: e9d80499-0f8b-4e2d-8856-3f31dcc25f96, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CE2E6E148CC1DB479EA7D0C6A0AF50B5A103
2: MemberId: a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CEC3B82201E1B545DB8E9ECF621244468F03

2008. 6. 10 11:05:09 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 6. 10 11:05:12 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack processNotification
INFO: ***JoinNotification received: GroupLeader = true, Signal.getMemberToken() = a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, ServerName = e9d80499-0f8b-4e2d-8856-3f31dcc25f96, Leader = e9d80499-0f8b-4e2d-8856-3f31dcc25f96
2008. 6. 10 11:05:12 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group TestGroup : Members in view for (before change analysis) are :
1: MemberId: e9d80499-0f8b-4e2d-8856-3f31dcc25f96, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CE2E6E148CC1DB479EA7D0C6A0AF50B5A103
2: MemberId: a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CEC3B82201E1B545DB8E9ECF621244468F03

2008. 6. 10 11:05:12 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 6. 10 11:05:15 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack processNotification
INFO: ***JoinNotification received: GroupLeader = true, Signal.getMemberToken() = a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, ServerName = e9d80499-0f8b-4e2d-8856-3f31dcc25f96, Leader = e9d80499-0f8b-4e2d-8856-3f31dcc25f96

------------------------------------------------------------------------
"B" also received duplicated JoinNotifications(a2ed5cb6-3cc7-4060-91d6-
3fc8b6854049).
And because "B" is group leader, "B" don't receive own join notification.

Here is another log, showing the missed case. Assume that "A", "B" and "C"
are members of "TestGroup".
["A"'s log]
------------------------------------------------------------------------
2008. 6. 10 10:17:41 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest runSimpleSample
INFO: Starting SimpleJoinTest....
2008. 6. 10 10:17:41 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest initializeGMS
INFO: Initializing Shoal for member: 197c66d7-f56c-4119-8b1e-18dc330e39d3 group:TestGroup
2008. 6. 10 10:17:41 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest runSimpleSample
INFO: Registering for group event notifications
2008. 6. 10 10:17:41 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest runSimpleSample
INFO: Joining Group TestGroup
2008. 6. 10 10:17:42 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group TestGroup : Members in view for (before change analysis) are :
1: MemberId: 197c66d7-f56c-4119-8b1e-18dc330e39d3, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CE7D33395A50CC479CAA67ACEEEBD3BDDC03

2008. 6. 10 10:17:42 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT
2008. 6. 10 10:17:47 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group TestGroup : Members in view for (before change analysis) are :
1: MemberId: 197c66d7-f56c-4119-8b1e-18dc330e39d3, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CE7D33395A50CC479CAA67ACEEEBD3BDDC03
2: MemberId: 0c3c5b33-9a7d-4d85-ba1d-7a09a52d4e4b, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CE9D818A456DA94CB5B53CB024FD26DA8B03
3: MemberId: 468996ee-2d54-4c58-af46-72d903154e31, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CEBCDB7FE3F44D47DE944388A09B8081BE03

2008. 6. 10 10:17:47 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT
2008. 6. 10 10:17:53 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack processNotification
INFO: ***JoinNotification received: GroupLeader = true, Signal.getMemberToken() = 0c3c5b33-9a7d-4d85-ba1d-7a09a52d4e4b, ServerName = 197c66d7-f56c-4119-8b1e-18dc330e39d3, Leader = 197c66d7-f56c-4119-8b1e-18dc330e39d3
2008. 6. 10 10:17:53 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack processNotification
INFO: ***JoinNotification received: GroupLeader = true, Signal.getMemberToken() = 468996ee-2d54-4c58-af46-72d903154e31, ServerName = 197c66d7-f56c-4119-8b1e-18dc330e39d3, Leader = 197c66d7-f56c-4119-8b1e-18dc330e39d3
------------------------------------------------------------------------

["B"'s log]
------------------------------------------------------------------------
2008. 6. 10 10:17:41 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest runSimpleSample
INFO: Starting SimpleJoinTest....
2008. 6. 10 10:17:41 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest initializeGMS
INFO: Initializing Shoal for member: 0c3c5b33-9a7d-4d85-ba1d-7a09a52d4e4b group:TestGroup
2008. 6. 10 10:17:41 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest runSimpleSample
INFO: Registering for group event notifications
2008. 6. 10 10:17:41 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest runSimpleSample
INFO: Joining Group TestGroup
2008. 6. 10 10:17:42 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group TestGroup : Members in view for (before change analysis) are :
1: MemberId: 0c3c5b33-9a7d-4d85-ba1d-7a09a52d4e4b, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CE9D818A456DA94CB5B53CB024FD26DA8B03

2008. 6. 10 10:17:42 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT
2008. 6. 10 10:17:47 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group TestGroup : Members in view for (before change analysis) are :
1: MemberId: 197c66d7-f56c-4119-8b1e-18dc330e39d3, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CE7D33395A50CC479CAA67ACEEEBD3BDDC03
2: MemberId: 0c3c5b33-9a7d-4d85-ba1d-7a09a52d4e4b, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CE9D818A456DA94CB5B53CB024FD26DA8B03

2008. 6. 10 10:17:47 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT
2008. 6. 10 10:17:47 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack processNotification
INFO: ***JoinNotification received: GroupLeader = false, Signal.getMemberToken() = 197c66d7-f56c-4119-8b1e-18dc330e39d3, ServerName = 0c3c5b33-9a7d-4d85-ba1d-7a09a52d4e4b, Leader = 197c66d7-f56c-4119-8b1e-18dc330e39d3
2008. 6. 10 10:17:47 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group TestGroup : Members in view for (before change analysis) are :
1: MemberId: 197c66d7-f56c-4119-8b1e-18dc330e39d3, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CE7D33395A50CC479CAA67ACEEEBD3BDDC03
2: MemberId: 0c3c5b33-9a7d-4d85-ba1d-7a09a52d4e4b, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CE9D818A456DA94CB5B53CB024FD26DA8B03
3: MemberId: 468996ee-2d54-4c58-af46-72d903154e31, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CEBCDB7FE3F44D47DE944388A09B8081BE03

2008. 6. 10 10:17:47 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT
2008. 6. 10 10:17:47 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack processNotification
INFO: ***JoinNotification received: GroupLeader = false, Signal.getMemberToken() = 468996ee-2d54-4c58-af46-72d903154e31, ServerName = 0c3c5b33-9a7d-4d85-ba1d-7a09a52d4e4b, Leader = 197c66d7-f56c-4119-8b1e-18dc330e39d3
------------------------------------------------------------------------

["C"'s log]
------------------------------------------------------------------------
2008. 6. 10 10:17:42 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest runSimpleSample
INFO: Starting SimpleJoinTest....
2008. 6. 10 10:17:42 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest initializeGMS
INFO: Initializing Shoal for member: 468996ee-2d54-4c58-af46-72d903154e31 group:TestGroup
2008. 6. 10 10:17:42 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest runSimpleSample
INFO: Registering for group event notifications
2008. 6. 10 10:17:42 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest runSimpleSample
INFO: Joining Group TestGroup
2008. 6. 10 10:17:43 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group TestGroup : Members in view for (before change analysis) are :
1: MemberId: 468996ee-2d54-4c58-af46-72d903154e31, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CEBCDB7FE3F44D47DE944388A09B8081BE03

2008. 6. 10 10:17:43 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT
2008. 6. 10 10:17:47 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group TestGroup : Members in view for (before change analysis) are :
1: MemberId: 197c66d7-f56c-4119-8b1e-18dc330e39d3, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CE7D33395A50CC479CAA67ACEEEBD3BDDC03
2: MemberId: 468996ee-2d54-4c58-af46-72d903154e31, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CEBCDB7FE3F44D47DE944388A09B8081BE03

2008. 6. 10 10:17:47 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT
2008. 6. 10 10:17:47 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack processNotification
INFO: ***JoinNotification received: GroupLeader = false, Signal.getMemberToken() = 197c66d7-f56c-4119-8b1e-18dc330e39d3, ServerName = 468996ee-2d54-4c58-af46-72d903154e31, Leader = 197c66d7-f56c-4119-8b1e-18dc330e39d3
2008. 6. 10 10:17:47 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group TestGroup : Members in view for (before change analysis) are :
1: MemberId: 197c66d7-f56c-4119-8b1e-18dc330e39d3, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CE7D33395A50CC479CAA67ACEEEBD3BDDC03
2: MemberId: 0c3c5b33-9a7d-4d85-ba1d-7a09a52d4e4b, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CE9D818A456DA94CB5B53CB024FD26DA8B03
3: MemberId: 468996ee-2d54-4c58-af46-72d903154e31, MemberType: CORE, Address: urn:jxta:uuid-0836778E36C54F728D5B934A965395CEBCDB7FE3F44D47DE944388A09B8081BE03

2008. 6. 10 10:17:47 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT
2008. 6. 10 10:17:47 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack processNotification
INFO: ***JoinNotification received: GroupLeader = false, Signal.getMemberToken() = 0c3c5b33-9a7d-4d85-ba1d-7a09a52d4e4b, ServerName = 468996ee-2d54-4c58-af46-72d903154e31, Leader = 197c66d7-f56c-4119-8b1e-18dc330e39d3
------------------------------------------------------------------------
All members missed some join notifications.

Whenever you test concurrent joins, which notifications are duplicated or missed varies
from run to run.

In any case, when members join concurrently, every member should receive a join
notification for every other member, and no join notification should be duplicated,
provided all members are healthy.



 Comments   
Comment by carryel [ 10/Jun/08 11:11 PM ]

Created an attachment (id=8)
I attached a simple test code

Comment by carryel [ 29/Jun/08 09:05 PM ]

1. Testing scenarios
The testing scenario is simple. Shoal (with JXTA) doesn't support multiple members of the
same group in a single JVM, so each member must join the group from a separate process (JVM).
You can test this manually by executing the "SimpleJoinTest" I attached earlier. Each time
you execute "SimpleJoinTest", a new member (node) joins the "TestGroup".

I tested this by starting multiple "SimpleJoinTest" processes; you may need 3 or 4 of them.
a) In the beginning there is no member and no group.
b) I executed multiple "SimpleJoinTest"s concurrently, each in a separate process (JVM).
c) I inspected each log, looking in particular at "Signal.getMemberToken()".
ex) "***JoinNotification received: GroupLeader = false, Signal.getMemberToken() = e9d80499-0f8b-4e2d-8856-3f31dcc25f96, ServerName = a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, Leader = e9d80499-0f8b-4e2d-8856-3f31dcc25f96"

Strictly speaking, we cannot start multiple processes at exactly the same instant, but
because each member waits for the discovery timeout, this is an acceptable margin of error.
In other words, if you start a "SimpleJoinTest" while the other "SimpleJoinTest"s are still
waiting for the discovery timeout, you can reproduce the strange results.
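For reference, the core of such a join test looks roughly like the following; this is a sketch from memory of the Shoal 1.x client API used by the attached SimpleJoinTest, so class names and signatures should be checked against the attachment:

final GroupManagementService gms = (GroupManagementService) GMSFactory.startGMSModule(
        serverName, "TestGroup", GroupManagementService.MemberType.CORE, new Properties());
gms.addActionFactory(new JoinNotificationActionFactoryImpl(new CallBack() {
    public void processNotification(Signal signal) {
        System.out.println("JoinNotification received for " + signal.getMemberToken());
    }
}));
gms.join();   // join() throws GMSException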

2. How does the new code behave during the discovery phase?
Assume "A", "B" and "C" will become members of the group. In my scenario, "A", "B" and "C"
will all wait for the discovery timeout because there is no master in the group yet.
Before they enter the discovery phase, they first set the master advertisement to their own
advertisement, but masterAssigned is usually still false at this point. masterAssigned is
normally set to true in one of the following methods:

  • MasterNode.appointMasterNode()
  • MasterNode.processMasterNodeResponse()
  • MasterNode.processMasterNodeAnnouncement()

a) In MasterNode.appointMasterNode()
This handles the case where no master has been assigned once the discovery timeout expires,
e.g. there is no master in the group. We then use the discovery view, which contains other
members if they sent this node any messages, to put up a candidate as master. Of course,
because the discovery view always contains the node's own advertisement, that advertisement
can become the candidate.

a-1) When the node's own advertisement becomes the master
First, if the candidate is the node's own advertisement and the discovery view contains
other members, clusterViewManager.setMaster() is called with a snapshot of the discovery
view. The original code calls clusterViewManager.setMaster() with only its own view
snapshot, but because the master has already been determined to be the node's own
advertisement, I think calling clusterViewManager.setMaster() with the discovery view's
snapshot is better than calling it with only the node's own view snapshot.
Of course, calling clusterViewManager.setMaster() without the discovery view's snapshot
also works, because when other members receive processMasterNodeAnnouncement() as a result
of the master's announceMaster(), they can call sendSelfNodeAdvertisement(). But if the
discovery view already contains them and setMaster() is called with the discovery view,
sendSelfNodeAdvertisement() is unnecessary in this case because the master view already
contains them, so they can set the master directly without sendSelfNodeAdvertisement().

And about calling announceMaster():
-----------------------------------------------------
[original appointMasterNode() in MasterNode.java]
...
if (madv.getID().equals(localNodeID)) {
    ...
    if (clusterViewManager.getViewSize() > 1) {
        announceMaster(manager.getSystemAdvertisement());
    }
    ...
}
-----------------------------------------------------

It can be edited as follows:
-----------------------------------------------------
if (madv.getID().equals(localNodeID)) {
    ...
    // announce unconditionally, regardless of view size
    announceMaster(manager.getSystemAdvertisement());
    ...
}
-----------------------------------------------------
In other words, if the node's own advertisement becomes the master, announceMaster() is
always called. While debugging I sometimes saw clusterViewManager.getViewSize() still equal
to 1 for a short time even though one more member had already joined the group, so for
safety I think this edit is better. Calling announceMaster() when
clusterViewManager.getViewSize() is equal to 1 is harmless because a node does not receive
its own message.

a-2) When another member's advertisement becomes the master
The original code always sets the master without notification, so sometimes the master's
view cannot be updated. See the following code:
-----------------------------------------------------
[appointMasterNode() method in MasterNode.java]
...
clusterViewManager.setMaster(madv, false);
...
-----------------------------------------------------

-----------------------------------------------------
[setMaster(advertisement, notify) method in ClusterViewManager.java]

if (!advertisement.equals(masterAdvertisement)) {
    ...
    // notify
}
-----------------------------------------------------
As you can see, if the current member has already set the master, notify is not called.
If we already called setMaster(advertisement, false) in MasterNode.appointMasterNode(),
then when the master later sends me its new view and I receive it through
processMasterNodeAnnouncement() or processMasterNodeResponse(), the new view is not
notified, even though setMaster() is called with the new view, because the current
masterAdvertisement is already the same as the master's advertisement.
So I think this should also be edited: if the candidate is another member, I do not call
setMaster(advertisement, false). Even though we do not set the master right away, we will
receive the master change event later through processMasterNodeAnnouncement() or
processMasterNodeResponse().

b) In MasterNode.processMasterNodeResponse():
MASTER_CHANGE_EVENT is already notified with the master view's snapshot as part of issue #60
(https://shoal.dev.java.net/issues/show_bug.cgi?id=60). The additional patch is that when
sendSelfNodeAdvertisement() is called, MASTER_CHANGE_EVENT is also notified with the master
view's snapshot.

c) In MasterNode.processMasterNodeAnnouncement():
This is very similar to b) above and should be edited in the same way.

So now I want to describe how the new code behaves during the discovery phase. The new code
actually behaves the way the old code was originally intended to; there are no big changes.

1) "A", "B" and "C" join the group concurrently, and all members are waiting for the
discovery timeout.

1-1) If no member receives any other member's message and the discovery view has no other
members, all members try to become the master. So all members call announceMaster(), all
members receive the master announcements, and they become aware of the master collision
through checkMaster(). The collision is resolved by ID. When a member affirms the master
node role or resigns it, the member notifies MASTER_CHANGE_EVENT. The original code did not
notify MASTER_CHANGE_EVENT when the member affirms the master node role, and I think that
should be edited.
In case a-1) above, even though the member has already called setMaster() and notified
MASTER_CHANGE_EVENT, and the master itself has not changed, we should still notify
MASTER_CHANGE_EVENT because the master's view has already been changed by the collision. If
we do not notify the event, we cannot become aware of view changes quickly in the collision
case. Of course, if another event occurs later, this member (the master) becomes aware of
the view changes anyway, but I think view changes should be applied as soon as possible.

1-2) If all members receive each other's messages and the discovery view contains all
members, the candidate is selected from the discovery view by the TreeMap's sort order.
If all members select the same candidate, the candidate member sends a master announcement;
the other members process processMasterNodeAnnouncement() and set the master with the
current master view snapshot.

If some members receive each other's messages and some do not, cases 1-1) and 1-2) are
mixed.

2) If some nodes join the group late
If some members join the group when there is already a master, the new members send a
master node query to all members and the master node processes processMasterNodeQuery().
The master node then sends a master response with the master view's snapshot, and the new
members process processMasterNodeResponse() and set the master with the current master's
view.

3. How the code behaves when a node is shut down and restarted:
I think my changes have no effect on the shutdown algorithm. I know the shutdown and failure
cases are handled through the HealthMonitor, but I think some of the startup logic in
HealthMonitor should be edited. When a node is starting and the HealthMonitor has been
started, MasterNode.probeNode() can be called by the HealthMonitor. In the case of 1-1)
above (no member receives any other member's message and the discovery view has no other
members, so all members try to become the master) and in the master collision case, if
MasterNode.probeNode() is called by the HealthMonitor, processMasterNodeResponse() can be
processed. Because processMasterNodeResponse() does not take the collision case into
account, unexpected results can sometimes occur in the master selection algorithm.
So I think the health monitor should only start after master discovery has finished.
With that, this change has no effect on shutdown.

When a node which is not the master is restarted before it is determined to have failed,
the master's view is unchanged, so members which already joined the group do not receive
any changes. The restarted node receives all members' join notifications from the master's
response.
When a node which is not the master is restarted after it has been ejected from the
cluster, the master's view has changed, so members which already joined the group receive
only the failed node's join notification, because the master has already removed the node
from its view. The restarted node receives all members' join notifications from the
master's response.

When the node which is the master is restarted before it is determined to have failed, the
node which was the master sends discovery messages to all members and waits for the
discovery timeout. Because the other members are not the master, the restarted node
receives no messages, so it sends a master announcement that contains only its own
advertisement. Then, because the members see that the master's view contains only the
master advertisement, they call sendSelfNodeAdvertisement(), and the master becomes aware
of the existing members through processNodeResponse(). The master node thus receives join
notifications for all members; the other members do not receive any changes because they
call sendSelfNodeAdvertisement() and return before setMaster().

When the node which was the master is restarted after it has been ejected from the cluster,
the members have already elected a new master. When the old master failed and the new
master was elected, the members' views gained no additional member, so they receive no join
events. But when the node which was the master restarts, it sends a master discovery
message and receives the new master's response. So that node receives join notifications
for all existing members from the new master, and the other members receive only the failed
member's join notification.

Comment by sheetalv [ 09/Jul/08 02:57 PM ]

NA for Sailfin 1.0

Comment by sheetalv [ 09/Jul/08 03:00 PM ]

assigning to self

Comment by Joe Fialli [ 06/Feb/09 11:37 AM ]

Reviewing carryel's submitted fix for this issue.
Have already checked in submitted test case and it can be run
via "ant simplejointest".

Comment by carryel [ 22/Jun/09 09:39 PM ]

Created an attachment (id=18)
I attached the proposed patch for history





[SHOAL-55] more reliable failure notification Created: 09/May/08  Updated: 25/Nov/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: sheetalv Assignee: sheetalv
Resolution: Unresolved Votes: 0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: OpenSolaris


Issue Links:
Dependency
blocks SHOAL-58 One instance was killed, other instan... Resolved
Issuezilla Id: 55
Status Whiteboard:

shoal-shark-na

Tags:
Participants: Joe Fialli and sheetalv

 Description   

Instance A is either going down or under load, so instance B starts to retry its connection
to instance A. Before instance B can deem instance A dead or alive, there needs to be an
intermediate state, called "in_retry_mode", that can help the GMS clients.
For example, the CLB can make use of this state to ping instance A again after a little
while. In-memory replication code can also use this intermediate state to determine that
instance A is in "in_retry_mode"; then, if the pipe close event has occurred, a new pipe can
be created once instance A is alive again.
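A sketch of what surfacing such a state could look like; the enum and the usage below are hypothetical, not existing Shoal API:

// Hypothetical intermediate state; Shoal does not define this today.
public enum MemberHealthState {
    ALIVE,
    IN_RETRY_MODE,   // connection retries to the peer are still in progress
    SUSPECTED,
    FAILED
}

// A GMS client (e.g. CLB or in-memory replication) could then defer its decision:
//   if (stateOf(memberToken) == MemberHealthState.IN_RETRY_MODE) { ping again a little later }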



 Comments   
Comment by sheetalv [ 09/Jul/08 02:54 PM ]

NA for Sailfin 1.0

Comment by Joe Fialli [ 27/Aug/08 01:33 PM ]

2 cases to address:

1. False positives occur when 3 heartbeats are missed from an instance that is in the
middle of a full GC (a full GC can take 12 to 15 seconds). The other instances in the
cluster incorrectly receive FAILURE_NOTIFICATION even though the instance is still running
once the full GC completes.

2. The node agent detects a failed instance and restarts it before Shoal can detect that
the instance has failed and notify the others in the cluster. This happens on faster, newer
machines.

Comment by sheetalv [ 27/Aug/08 01:38 PM ]
*** Issue 58 has been marked as a duplicate of this issue. ***
Comment by sheetalv [ 27/Oct/08 12:33 PM ]

too big of an architecture change for Sailfin 1.5. NA for Sailfin 1.5.

Comment by sheetalv [ 31/Jul/09 11:02 AM ]

WatchDog notification implementation has been added to Shoal. This takes care of
case 2 (DAS restart) of what Joe has mentioned above.





[SHOAL-111] capability to configure requirement for authentication for GMS member to join group Created: 12/Nov/10  Updated: 12/Nov/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Improvement Priority: Critical
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 111
Tags:
Participants: Joe Fialli

 Description   

Leverage certificate based authentication (JAAS) to validate whether a GMS
member should be allowed to join a GMS group.
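A rough sketch of the idea using plain JAAS; the login-configuration name and the point where the check would hook into group join are assumptions:

import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.login.LoginContext;
import javax.security.auth.login.LoginException;

class JoinAuthenticator {
    // "GMSGroupJoin" is a hypothetical JAAS login configuration entry backed by a
    // certificate-based LoginModule; a member failing login would be refused admission.
    boolean allowedToJoin(CallbackHandler memberCredentials) {
        try {
            LoginContext lc = new LoginContext("GMSGroupJoin", memberCredentials);
            lc.login();
            return true;
        } catch (LoginException e) {
            return false;
        }
    }
}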



 Comments   
Comment by Joe Fialli [ 12/Nov/10 01:16 PM ]

adjustment to subject title to state that there needs to be a configuration
capability to require authentication for GMS join





[SHOAL-109] optimize virtual broadcast message send Created: 19/Aug/10  Updated: 19/Aug/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 109
Tags:
Participants: Joe Fialli

 Description   

Broadcast that iterates over each active instance and sends over TCP inefficiently
serializes the payload each time it sends to an instance.

When UDP broadcast is used, the payload of a GMS send message is serialized once and then
broadcast to all instances in the cluster. Correct this inefficiency, since
DistributedStateCache and GroupHandle.sendMessage(String targetComponent, byte[]) serialize
the GMSMessage object FOR EACH INSTANCE in the cluster.

This change will not impact GMS notifications or heartbeats, since those rely on UDP
broadcast of the GMS sendMessage.
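The intended fix, in outline, is to serialize the GMSMessage once and reuse the same bytes for every TCP send. A sketch using plain JDK serialization; the member list and transport-level send call are placeholders:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

// Serialize the payload exactly once and reuse the bytes for every TCP send.
static byte[] serializeOnce(Object gmsMessage) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    ObjectOutputStream oos = new ObjectOutputStream(bos);
    oos.writeObject(gmsMessage);
    oos.close();
    return bos.toByteArray();
}

// Caller sketch (memberList and sendOverTcp are placeholders for the real transport calls):
//   byte[] bytes = serializeOnce(gmsMessage);
//   for (String member : memberList) {
//       sendOverTcp(member, bytes);   // same bytes for every instance, no re-serialization
//   }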






[SHOAL-3] GMS SPI does not expose receive capability Created: 12/Nov/06  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Improvement Priority: Major
Reporter: shreedharganapathy Assignee: sheetalv
Resolution: Unresolved Votes: 0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 3
Tags:
Participants: sheetalv, shreedhar_ganapathy and shreedharganapathy

 Description   

The GMS SPI GroupCommunicationProvider.java does not expose a receive() method
so that SPI implementations may handle messages received from the underlying
Group Communication Provider.
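A sketch of one possible shape for such a hook on the SPI; the listener and method names here are illustrative, not existing API:

// Illustrative addition to the GroupCommunicationProvider SPI.
public interface MessageReceiver {
    void messageReceived(Object sender, byte[] payload);
}

public interface GroupCommunicationProvider {
    // ...existing methods...

    // Proposed: let the SPI implementation deliver inbound messages up to GMS.
    void setMessageReceiver(MessageReceiver receiver);
}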



 Comments   
Comment by shreedharganapathy [ 17/Jan/07 01:28 PM ]

..

Comment by shreedhar_ganapathy [ 16/Feb/07 03:24 PM ]

..

Comment by sheetalv [ 09/Jul/08 03:02 PM ]

assigning to self





[SHOAL-1] Possible to request same group for different server IDs in same JVM Created: 08/Nov/06  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Minor
Reporter: jstuyts Assignee: sheetalv
Resolution: Unresolved Votes: 0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


File Attachments: Text File shoal-allowed-group-combinations.patch    
Issuezilla Id: 1
Status Whiteboard:

shoal-shark-na

Tags:
Participants: jstuyts, sheetalv, shreedhar_ganapathy and shreedharganapathy

 Description   

By using the following sequence of parameters for startGMSModule it is possible
to join the same group with different server IDs in the same JVM:

  • server A, group foo
  • server B, group bar
  • server B, group foo

All three invocations create a module.

The expected behavior is that the third invocation throws an exception stating
that a module for group foo already exists but that its server ID (and/or role)
is different.



 Comments   
Comment by jstuyts [ 15/Nov/06 04:52 AM ]

Created an attachment (id=1)
Patch that will only allow one membership for a group per JVM

Comment by shreedharganapathy [ 17/Jan/07 01:28 PM ]

..

Comment by shreedhar_ganapathy [ 30/May/07 02:37 PM ]

Sorry for the loooong delay in getting to this bug.
Since this is considered a defect that is not a common use case based on various
inputs from external community members and from Sun-internal products, I am
lowering this priority to P4. Will review the patch for all impacts and then
integrate, in the next few days.

Comment by sheetalv [ 09/Jul/08 03:07 PM ]

assigning to self

Comment by shreedhar_ganapathy [ 09/Jul/08 05:52 PM ]

marked shark na i.e not for Sailfin 1.0





[SHOAL-2] New module create for existing group depending on module creation order Created: 08/Nov/06  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Minor
Reporter: jstuyts Assignee: sheetalv
Resolution: Unresolved Votes: 0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 2
Status Whiteboard:

shoal-shark-na

Tags:
Participants: jstuyts, sheetalv, shreedhar_ganapathy and shreedharganapathy

 Description   

By using the following sequence of parameters for startGMSModule a new module
will be created for server A, group foo on the third invocation:

  • server A, group foo
  • server B, group bar
  • server A, group foo

The expected behavior is that the module created on the first invocation is
returned by the third invocation.
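A sketch of the kind of lookup that would give the expected behavior, with hypothetical names; the key point is that the module registry is consulted before a new module is created:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical registry: return the existing module for (server, group) if there is one.
private static final ConcurrentMap<String, Object> modules =
        new ConcurrentHashMap<String, Object>();

static Object startGMSModule(String serverId, String groupName) {
    String key = serverId + ":" + groupName;
    Object existing = modules.get(key);
    if (existing != null) {
        return existing;   // the third invocation returns the module created by the first
    }
    Object created = createModule(serverId, groupName);   // placeholder for real creation
    Object raced = modules.putIfAbsent(key, created);
    return raced != null ? raced : created;
}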



 Comments   
Comment by jstuyts [ 15/Nov/06 04:54 AM ]

I have added a patch to issue 1 that will also fix this issue.

Comment by shreedharganapathy [ 17/Jan/07 01:28 PM ]

..

Comment by shreedhar_ganapathy [ 30/May/07 02:38 PM ]

Similar to issue 1, as this is not considered a critical requirement based on
various inputs from external community members and from Sun-internal products, I
am lowering this priority to P4.

I will review the patch for all impacts and then integrate.

Comment by sheetalv [ 09/Jul/08 03:07 PM ]

assigning to self

Comment by shreedhar_ganapathy [ 09/Jul/08 05:53 PM ]

marked shark na





[SHOAL-5] GMS SPI should use more than a String to identify members Created: 12/Nov/06  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Improvement Priority: Major
Reporter: shreedharganapathy Assignee: sheetalv
Resolution: Unresolved Votes: 0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 5
Tags:
Participants: sheetalv and shreedharganapathy

 Description   

The GMS SPI currently uses a String to identify a member, which may work in many cases.
We need to consider a better way to represent a member such that implementations can easily
map such a member to the underlying Group Communication Provider's member representation.

We may also want to open up the possibility that applications using GMS may
not want to identify their members as a String.
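One possible direction, sketched as a small value interface (illustrative only, not proposed API):

// Illustrative: an identity abstraction instead of a bare String token.
public interface MemberIdentifier extends java.io.Serializable {
    String getName();               // human-readable member token
    Object getProviderLocalId();    // provider-specific handle (e.g. a JXTA peer ID)
}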



 Comments   
Comment by shreedharganapathy [ 17/Jan/07 01:28 PM ]

..

Comment by sheetalv [ 09/Jul/08 03:03 PM ]

assigning to self





[SHOAL-9] Need Shoal User Guide Created: 08/Dec/06  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Improvement Priority: Major
Reporter: shreedharganapathy Assignee: sheetalv
Resolution: Unresolved Votes: 0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 9
Tags:
Participants: sheetalv and shreedharganapathy

 Description   

A Shoal User Guide is needed to guide users to easily integrate the library into
their applications.



 Comments   
Comment by shreedharganapathy [ 17/Jan/07 01:28 PM ]

..

Comment by sheetalv [ 09/Jul/08 03:03 PM ]

assigning to self





[SHOAL-33] test for DSCMessage Created: 23/Jan/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Task Priority: Major
Reporter: sheetalv Assignee: sheetalv
Resolution: Unresolved Votes: 0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: OpenSolaris


Issuezilla Id: 33
Tags:
Participants: sheetalv

 Description   

DSCMessages should also be sent P2P. This needs to be checked.






[SHOAL-36] messages received not in same order as when sent Created: 30/Jan/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Minor
Reporter: sheetalv Assignee: hamada
Resolution: Unresolved Votes: 0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 36
Status Whiteboard:

shoal-shark-na

Tags:
Participants: hamada and sheetalv

 Description   

I did some extensive testing to see whether the messages that are sent get received by the other instances in
the same order.

I wrote a simple test for point-to-point message send/receive: tests/com/sun/enterprise/ee/cms/tests/p2pmessagesend/P2PMessageSendAndReceive.java

This test uses the MessageAction Signal to receive messages. I started 2 instances (one is the sender
while the other is the receiver). 10 messages are sent by instance A in sequence. The receiver, i.e.
instance B, does not receive the messages in the same order that they were sent in.
The logs show that the same thread takes care of calling the processNotification() method for each
message received, so it's not a threading issue.

The main line of code is ClusterManager.send(id, message), which then calls outputPipe.send(message).

Here are the logs :

Sender :
INFO: Sending messages...
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive sendMessages
INFO: Message 0 sent from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive sendMessages
INFO: Message 1 sent from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive sendMessages
INFO: Message 2 sent from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive sendMessages
INFO: Message 3 sent from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive sendMessages
INFO: Message 4 sent from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive sendMessages
INFO: Message 5 sent from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive sendMessages
INFO: Message 6 sent from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive sendMessages
INFO: Message 7 sent from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive sendMessages
INFO: Message 8 sent from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive sendMessages
INFO: Message 9 sent from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive sendMessages
INFO: Message 10 sent from C1 to Group

Receiver :
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive processNotification
INFO: Message: P2PMsgSendReceive : message 0 from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive processNotification
INFO: Message: P2PMsgSendReceive : message 1 from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive processNotification
INFO: Message: P2PMsgSendReceive : message 3 from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive processNotification
INFO: Message: P2PMsgSendReceive : message 5 from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive processNotification
INFO: Message: P2PMsgSendReceive : message 4 from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive processNotification
INFO: Message: P2PMsgSendReceive : message 7 from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive processNotification
INFO: Message: P2PMsgSendReceive : message 2 from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive processNotification
INFO: Message: P2PMsgSendReceive : message 8 from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive processNotification
INFO: Message: P2PMsgSendReceive : message 6 from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive processNotification
INFO: Message: P2PMsgSendReceive : message 10 from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive processNotification
INFO: Message: P2PMsgSendReceive : message 9 from C1 to Group

I also tried another test. I added some code to ClusterManager.main() to send and receive messages
using the ClusterMessageListener model. The ClusterMessageListener's handleClusterMessage() method
is implemented in GroupCommunicationProviderImpl, which puts the incoming message into
MessageQueue, a FIFO queue. The MessageWindow then takes the message and goes through the
MessageAction API, which runs on a single thread.
In this case, the logs show a different thread calling ClusterManager.pipeMsgEvent() every
time, so it looks like the PipeMsgListener is spawning a new thread every time it calls pipeMsgEvent().

I also checked the order of the messages received via the listener model in the first case, as
mentioned above. The sequence of messages received by
ClusterMessageListener.handleClusterMessage() and by the MessageAction signal API is the same. So the
underlying JXTA layer appears to be sending the messages out of order.

main method changes to ClusterManager :

public static void main(final String[] argv) {
    JxtaUtil.setupLogHandler();
    LOG.setLevel(Level.INFO);
    final String name = System.getProperty("INAME", "instanceName");
    final String groupName = System.getProperty("GNAME", "groupName");
    LOG.log(Level.INFO, "Instance Name :" + name);
    final Map props = getPropsForTest();
    final Map<String, String> idMap = getIdMap();
    final List<ClusterViewEventListener> vListeners =
            new ArrayList<ClusterViewEventListener>();
    final List<ClusterMessageListener> mListeners =
            new ArrayList<ClusterMessageListener>();
    vListeners.add(
            new ClusterViewEventListener() {
                public void clusterViewEvent(final ClusterViewEvent event,
                                             final ClusterView view) {
                    //LOG.log(Level.INFO, "event.message", new Object[]{event.getEvent().toString()});
                    //LOG.log(Level.INFO, "peer.involved", new Object[]{event.getAdvertisement().toString()});
                    //LOG.log(Level.INFO, "view.message", new Object[]{view.getPeerNamesInView().toString()});
                }
            });
    mListeners.add(
            new ClusterMessageListener() {
                public void handleClusterMessage(final SystemAdvertisement id, final Object message) {
                    LOG.log(Level.INFO, id.getName());
                    LOG.log(Level.INFO, "SHEETAL : message received = "
                            + new String(((GMSMessage) message).getMessage()));
                }
            });
    final ClusterManager manager = new ClusterManager(groupName, name, idMap, props,
            vListeners, mListeners);
    manager.start();
    if (System.getProperty("TYPE").equals("sender")) {
        final Object waitLock = new Object();
        LOG.log(Level.INFO, "wait 10 secs to shutdown");
        synchronized (waitLock) {
            try {
                waitLock.wait(10000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        LOG.log(Level.INFO, "Sending messages...");
        final ID id = manager.getID("client2");
        for (int i = 0; i <= 10; i++) {
            final GMSMessage gMsg = new GMSMessage(name,
                    MessageFormat.format("P2PMsgSendReceive : message {0} from {1} to {2}",
                            i, name, groupName).getBytes(),
                    groupName, Long.getLong("10"));
            try {
                manager.send(id, gMsg);
                LOG.info("Message " + i + " sent from " + name + " to " + groupName);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        manager.waitForClose();
    } else if (System.getProperty("TYPE").equals("receiver")) {
        final Object waitLock = new Object();
        LOG.log(Level.INFO, "wait 30 secs to shutdown");
        synchronized (waitLock) {
            try {
                waitLock.wait(30000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        manager.waitForClose();
    }
    System.exit(0);
}

run_client.sh :

java -Dcom.sun.management.jmxremote -DINAME=client$1 -DTYPE=$2 -cp ./lib/jxta.jar:dist/shoal-gms.jar com.sun.enterprise.jxtamgmt.ClusterManager



 Comments   
Comment by sheetalv [ 09/Jul/08 02:50 PM ]

NA for Sailfin 1.0

Comment by sheetalv [ 27/Aug/08 12:55 PM ]

Requires more input from the JXTA team.





[SHOAL-39] HealthMonitor should report if the Network Interface of the local peer is down Created: 01/Feb/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Improvement Priority: Major
Reporter: sheetalv Assignee: sheetalv
Resolution: Unresolved Votes: 0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: OpenSolaris


Issuezilla Id: 39
Tags:
Participants: sheetalv

 Description   

This will provide an additional layer of failure reporting which will help
diagnose problems for customers.

JDK 6 provides facilities for this, such as the NetworkInterface API.
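For illustration, a minimal sketch of such a check with the JDK 6 NetworkInterface API; the class and method names below are assumptions for this sketch, not existing Shoal code, and a HealthMonitor could poll a check like this to report a local-network-down state:

import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Enumeration;

public final class LocalNicCheck {

    /**
     * Returns true if at least one non-loopback network interface is up.
     */
    public static boolean isLocalInterfaceUp() {
        try {
            Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
            while (nics != null && nics.hasMoreElements()) {
                NetworkInterface nic = nics.nextElement();
                if (nic.isUp() && !nic.isLoopback()) {
                    return true;
                }
            }
        } catch (SocketException e) {
            // treat inability to enumerate interfaces as "down"
        }
        return false;
    }
}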



 Comments   
Comment by sheetalv [ 09/Jul/08 02:51 PM ]

Changing to Enhancement





[SHOAL-41] Add support in Shoal for passing in cert stores Created: 11/Feb/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Improvement Priority: Major
Reporter: shreedhar_ganapathy Assignee: sheetalv
Resolution: Unresolved Votes: 0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 41
Tags:
Participants: sheetalv and shreedhar_ganapathy

 Description   

JXTA and other service providers support encryption through various means. The
Properties object that passes configuration data to the service provider backends
should be able to pass a cert store to the JXTA service provider so that end-to-end
security can optionally be provided.
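As a rough illustration of the kind of configuration this RFE asks for, a cert store could be handed to the provider backend through the same Properties object used at initialization time. The property keys below are hypothetical; nothing like them exists in Shoal today, and defining them is exactly what this issue requests:

import java.util.Properties;

public final class CertStoreConfigSketch {
    // Hypothetical property keys: the RFE is to define keys like these and have
    // the JXTA/Grizzly backends honor them.
    static final String CERT_STORE_PATH = "MY_CERT_STORE_PATH";
    static final String CERT_STORE_PASSWORD = "MY_CERT_STORE_PASSWORD";

    public static Properties withCertStore(Properties props, String path, String password) {
        props.put(CERT_STORE_PATH, path);
        props.put(CERT_STORE_PASSWORD, password);
        // The returned Properties would then be handed to the GMS/service-provider
        // initialization call the application already uses.
        return props;
    }
}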






[SHOAL-44] accessing JXTA's System ADV information or equivalent Created: 19/Feb/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Improvement Priority: Major
Reporter: mbien Assignee: sheetalv
Resolution: Unresolved Votes: 0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 44
Tags:
Participants: mbien and sheetalv

 Description   

JXTA stores a lot of useful, never-changing information about the node's runtime
environment in the system advertisement. It would be great if Shoal provided this
kind of immutable "node info" in addition to the mutable "node
details" (DistributedStateCache).

proposed public API changes (a rough sketch follows below):
-node info getter in the GMS
-node info getter in Signal
-mechanism for adding custom values on node join

workaround with DistributedStateCache possible
but:
-redundant communication
-values are not guaranteed to arrive at the same time
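A rough sketch of what the proposed addition might look like; the NodeInfo interface and the accessor names listed in the trailing comment are invented for illustration and are not existing Shoal APIs:

import java.util.Map;

// Illustrative only: NodeInfo and these getters do not exist in Shoal today.
public interface NodeInfo {

    /** Immutable facts about the member's runtime environment (OS, JVM, host, ...). */
    Map<String, String> getStaticProperties();
}

// Corresponding hypothetical accessors this RFE asks for:
//   GroupManagementService#getNodeInfo(String memberToken)
//   Signal#getNodeInfo()
// plus a way for the application to contribute custom values before join.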



 Comments   
Comment by sheetalv [ 09/Jul/08 03:04 PM ]

assigning to self





[SHOAL-49] Provide a converse to JoinedAndReady when consuming app did not get ready Created: 25/Mar/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Improvement Priority: Major
Reporter: shreedhar_ganapathy Assignee: sheetalv
Resolution: Unresolved Votes: 0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 49
Tags:
Participants: sheetalv and shreedhar_ganapathy

 Description   

When the consuming application or product is ready to process its operations, it
can use Shoal's new JoinedAndReady reporting facility to let group members know
of this state.
The converse state of this situation may be a valuable piece of information for
administrative or monitoring applications.

If the application could not get into the joined and ready state for any reason
(for instance, an application server consuming Shoal could not complete its
startup and failed midway), then such an unready state can be conveyed through a
notification that specifically identifies this state.

Need an appropriate name for such a notification so it is meaningful.



 Comments   
Comment by shreedhar_ganapathy [ 25/Mar/08 11:00 AM ]

..





[SHOAL-56] Document Configuration Settings Available to users Created: 12/May/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Improvement Priority: Major
Reporter: shreedhar_ganapathy Assignee: sheetalv
Resolution: Unresolved Votes: 0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 56
Tags:
Participants: sheetalv and shreedhar_ganapathy

 Description   

We need to document configuration settings that are available to users with
clear explanations on what they are and in some cases under what circumstances
these should be used.



 Comments   
Comment by sheetalv [ 09/Jul/08 02:55 PM ]

marking as Enhancement

Comment by sheetalv [ 09/Jul/08 03:05 PM ]

assigning to self





[SHOAL-54] gms.getGroupHandle().getGroupLeader() throws NullPointerException Created: 03/May/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: leehui Assignee: sheetalv
Resolution: Unresolved Votes: 0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: Windows XP
Platform: Windows


Issuezilla Id: 54
Status Whiteboard:

shoal-shark-na

Tags:
Participants: leehui and sheetalv

 Description   

If gms.getGroupHandle().getGroupLeader() is invoked immediately after gms.join(), the
application occasionally throws a NullPointerException, especially when the
application is run in a console from the command line.
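Until the underlying race is fixed, a caller-side workaround is to poll briefly after join() instead of reading the leader immediately. This is only a sketch of that workaround (assuming the GroupManagementService/GroupHandle APIs named in this report), not a fix in Shoal itself:

import com.sun.enterprise.ee.cms.core.GroupHandle;
import com.sun.enterprise.ee.cms.core.GroupManagementService;

public final class GroupLeaderPoll {

    /**
     * Polls for the group leader for up to timeoutMillis after gms.join(),
     * returning null if it never becomes available in that window.
     */
    public static String waitForGroupLeader(GroupManagementService gms, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            try {
                GroupHandle handle = gms.getGroupHandle();
                if (handle != null && handle.getGroupLeader() != null) {
                    return handle.getGroupLeader();
                }
            } catch (NullPointerException e) {
                // view not yet formed; fall through and retry
            }
            Thread.sleep(100);
        }
        return null;
    }
}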



 Comments   
Comment by sheetalv [ 09/Jul/08 02:53 PM ]

NA for Sailfin 1.0

Comment by sheetalv [ 09/Jul/08 03:00 PM ]

assigning to myself





[SHOAL-57] Provide a JMX MBean to list and configure configuration settings Created: 12/May/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: New Feature Priority: Major
Reporter: shreedhar_ganapathy Assignee: sheetalv
Resolution: Unresolved Votes: 0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 57
Tags:
Participants: sheetalv and shreedhar_ganapathy

 Description   

A JMX MBean to list and configure Shoal's providers would be very useful from a
management standpoint.

Additionally, this MBean could also provide runtime statistics, ranging from the
number of views and the current view to request/response metrics.
Adding a placeholder RFE for this purpose.
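A minimal sketch of the shape such an MBean could take; the interface name and attribute set below are invented for illustration and do not exist in Shoal:

// Illustrative only: this MBean interface does not exist in Shoal.
public interface GMSConfigurationMXBean {

    // configuration
    String getGroupName();
    long getFailureDetectionTimeoutMillis();
    void setFailureDetectionTimeoutMillis(long timeoutMillis);

    // runtime statistics mentioned in the RFE
    long getViewCount();
    String[] getCurrentViewMembers();
}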



 Comments   
Comment by sheetalv [ 09/Jul/08 03:05 PM ]

assigning to self





[SHOAL-59] when the node fails, the node details don't get removed from DSC. Created: 18/May/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Minor
Reporter: leehui Assignee: sheetalv
Resolution: Unresolved Votes: 0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Windows


Issuezilla Id: 59
Status Whiteboard:

shoal-shark-na

Tags:
Participants: Joe Fialli, leehui and sheetalv

 Description   

The API gms.getAllMemberDetails() should return the same information on all
nodes. But when a node fails, the node details don't get removed from the DSC. You
can still use gms.getAllMemberDetails() to get the failed node's details. The relevant
test code is in Shoal's test directory, named
com.sun.enterprise.shoal.memberdetailstest.MemberDetailsTest.



 Comments   
Comment by sheetalv [ 09/Jul/08 02:56 PM ]

NA for Sailfin 1.0

Comment by Joe Fialli [ 31/Jul/09 11:10 AM ]

Fix for this would be the following:

Register a shoal FAILURE event handler.
When a FAILURE is received, the distributed state cache should be flushed for
the failing instance.

*******

Possible bug if this is not fixed:
If a FENCE is left raised (an instance performing recovery for another instance
raises a FENCE while doing the repair) and the stale FENCE data is left in the
distributed state cache, the false information will prevent another instance from
recovering the instance that had the FENCE raised and then suffered a fatal failure.

Accessing stale DSC data in general does not cause bugs, but the above case would
be a bug.
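A sketch of the fix described above, assuming Shoal's FailureNotification registration API as used in its samples; the flushMemberEntries helper is hypothetical and stands in for whatever DistributedStateCache cleanup is actually implemented:

import com.sun.enterprise.ee.cms.core.CallBack;
import com.sun.enterprise.ee.cms.core.FailureNotificationSignal;
import com.sun.enterprise.ee.cms.core.GroupManagementService;
import com.sun.enterprise.ee.cms.core.Signal;
import com.sun.enterprise.ee.cms.impl.client.FailureNotificationActionFactoryImpl;

public final class DscFlushOnFailure implements CallBack {

    public static void register(GroupManagementService gms) {
        gms.addActionFactory(new FailureNotificationActionFactoryImpl(new DscFlushOnFailure()));
    }

    public void processNotification(Signal signal) {
        if (signal instanceof FailureNotificationSignal) {
            String failedMember = signal.getMemberToken();
            // Hypothetical helper: remove the failed member's entries (including any
            // stale FENCE data) from the distributed state cache.
            flushMemberEntries(failedMember);
        }
    }

    private void flushMemberEntries(String memberToken) {
        // implementation-specific DSC cleanup would go here
    }
}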





[SHOAL-64] add AtomicBoolean for controlling the started variable Created: 26/Jun/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Task Priority: Major
Reporter: sheetalv Assignee: sheetalv
Resolution: Unresolved Votes: 0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Macintosh


Issuezilla Id: 64
Tags:
Participants: sheetalv

 Description   

The same problem exists in both ClusterManager's and HealthMonitor's start().
Make sure that the AtomicBoolean is set at the beginning of the start() method.
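A minimal sketch of the intended pattern; the class and field names are illustrative, not the actual ClusterManager/HealthMonitor code:

import java.util.concurrent.atomic.AtomicBoolean;

public final class StartableComponent {

    private final AtomicBoolean started = new AtomicBoolean(false);

    public void start() {
        // compareAndSet at the very top makes concurrent start() calls safe:
        // only the first caller proceeds, every other caller returns immediately.
        if (!started.compareAndSet(false, true)) {
            return;
        }
        // ... actual startup work ...
    }

    public void stop() {
        if (!started.compareAndSet(true, false)) {
            return;
        }
        // ... actual shutdown work ...
    }
}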






[SHOAL-72] need a fix for the "unable to create messenger" IOException Created: 24/Jul/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: sheetalv Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 72
Status Whiteboard:

shoal-shark-na

Tags:
Participants: Joe Fialli and sheetalv

 Description   

The "unable to create messenger" IOException occurs in different scenarios. One of the scenarions is when
an instance is killed. before instance B can know that the instance A has been killed, it tries to send a
message via ClusterManager.send() (could be to sync the DSC or for some other reason).

When such an IOException occurs, the Shoal code should check which instance is supposedly down. Then
the code wait for a little while before finding the state that that instance is in. If the state is
alive/aliveandready, the message should be sent again as a retry. If the instance is in in_retry_mode (i.e. it
has'nt been deemed in_doubt/failed yet), then the right way of dealing with this should be decided.
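A sketch of the retry logic described above; the Sender interface, the health-state check and the retry policy constants are placeholders for whatever state query and policy Shoal actually ends up using:

import java.io.IOException;

public final class RetrySendSketch {

    private static final int MAX_RETRIES = 3;
    private static final long RETRY_DELAY_MS = 500;

    /**
     * Attempts a send; on IOExceptions it waits briefly, re-checks the peer's
     * health state and retries while the peer still appears alive.
     */
    static void sendWithRetry(Sender sender, String peerId, Object message) throws IOException {
        IOException last = null;
        for (int attempt = 0; attempt <= MAX_RETRIES; attempt++) {
            try {
                sender.send(peerId, message);
                return;
            } catch (IOException e) {
                last = e;
                if (!isAliveOrAliveAndReady(peerId)) {
                    break; // peer is in doubt/failed; give up and let failure handling run
                }
                try {
                    Thread.sleep(RETRY_DELAY_MS);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        throw last;
    }

    interface Sender {
        void send(String peerId, Object message) throws IOException;
    }

    private static boolean isAliveOrAliveAndReady(String peerId) {
        return true; // placeholder for a HealthMonitor state lookup
    }
}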



 Comments   
Comment by Joe Fialli [ 28/Jul/08 08:08 AM ]

Short term solution described in shoal issue 73.

Changed platform to ALL since the issue is not specific to Mac OS.

Comment by sheetalv [ 28/Jul/08 09:35 AM ]

short term solution in issue 73 has been added to Sailfin 0.5.

Comment by sheetalv [ 31/Jul/08 10:21 AM ]

assigning to Joe.





[SHOAL-74] potential to miss FAILURE_NOTIFICATION when multiple instances killed at same time Created: 19/Aug/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Minor
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 74
Tags:
Participants: Joe Fialli

 Description   

This bug was uncovered during a code review. The bug is that a FAILURE notification could
be missed when two or more instances are killed at the same time. (Note that,
given the race condition between a node agent restarting a killed instance and the
failure notification, only a test that kills the node agent and then kills
instances can be assured of seeing a FAILURE_NOTIFICATION for each server
instance killed. A node agent can restart a server instance before Shoal
reports it as FAILED.)

HealthMonitor.InDoubtPeerDetector.processCacheUpdate() iterates over all
instances in the cluster, checking if any are in doubt. If one instance is detected
to be in doubt, HealthMonitor.InDoubtPeerDetector.determineInDoubtPeers() notifies
the FailureVerifier thread to process the current cache, looking for in-doubt peers to
verify which instance should have a FAILURE_NOTIFICATION sent.

synchronized (verifierLock) {
    verifierLock.notify();
    LOG.log(Level.FINER, "Done Notifying FailureVerifier for " + entry.adv.getName());
}
The notification signal from the InDoubtPeerDetector thread to the FailureVerifier
thread is the weak link in this bug. When multiple failures happen at once, the
code currently acts on the first instance failure immediately. The
InDoubtPeerDetector should iterate over all instances and, if one or more
instances are in doubt, notify the FailureVerifier thread to run
over all instances in the cluster cache.

The bug could be that the InDoubtPeerDetector runs twice: the first run notifies the
FailureVerifier to run on the instance cache, and it detects the first killed instance.
The second time the InDoubtPeerDetector runs, it could notify the
FailureVerifier while it is still verifying the first failure (against a
snapshotted cache). The second notify to an already running FailureVerifier thread
has no effect, so the FAILURE_NOTIFICATION for the second killed server
instance is not detected until much later, when the next failure occurs or the client
is shut down.
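One way to avoid losing the second notification is to record a "work pending" state under the lock rather than relying on a bare notify(). The sketch below is illustrative only, with invented class, field and method names, and does not mirror the actual HealthMonitor code:

public final class FailureVerifierSignalSketch {

    private final Object verifierLock = new Object();
    private boolean failuresPending = false;

    /** Called by the InDoubtPeerDetector whenever one or more instances are in doubt. */
    void signalVerifier() {
        synchronized (verifierLock) {
            failuresPending = true;      // remembered even if the verifier is busy
            verifierLock.notify();
        }
    }

    /** FailureVerifier thread body. */
    void verifierLoop() throws InterruptedException {
        while (true) {
            synchronized (verifierLock) {
                while (!failuresPending) {
                    verifierLock.wait();
                }
                failuresPending = false; // claim the pending work before scanning
            }
            verifyAllInDoubtPeers();     // scans the whole cluster cache, not a single entry
        }
    }

    private void verifyAllInDoubtPeers() {
        // placeholder for iterating the cluster cache and reporting FAILURE events
    }
}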






[SHOAL-82] notifying cluster view event is not thread safe Created: 12/Nov/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: carryel Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Windows


Issuezilla Id: 82
Tags:
Participants: carryel, Joe Fialli and shreedhar_ganapathy

 Description   

ClusterViewManager.notifyListeners() can be executed on multiple threads when many
members join the same group concurrently.

Although there are no member failures, you can see the following log.

------------------------------------
2008. 11. 12 5:44:00 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest
initializeGMS
INFO: Initializing Shoal for member: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f
group:TestGroup
2008. 11. 12 5:44:00 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest
runSimpleSample
INFO: Registering for group event notifications
2008. 11. 12 5:44:00 PM com.sun.enterprise.shoal.jointest.SimpleJoinTest
runSimpleSample
INFO: Joining Group TestGroup
2008. 11. 12 5:44:07 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group TestGroup (5d3280a2-a0c5-4ae2-8d41-
d59b57400b8f) : Members in view for (before change analysis) are :
1: MemberId: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033090183254F6D47E7B235BC8D656194FA03

2008. 11. 12 5:44:07 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT
2008. 11. 12 5:44:08 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group TestGroup (aeea918f-571b-463b-bfa6-
55c536df0d11) : Members in view for (before change analysis) are :
(a)
1: MemberId: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033090183254F6D47E7B235BC8D656194FA03
2: MemberId: addb1dbe-06cf-43b8-8903-78605f29091f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A787461503250336C047E2077544A5692C1EA21407A886303
3: MemberId: aeea918f-571b-463b-bfa6-55c536df0d11, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033DBAE9788614944F8A40ED352C8E7A03B03
4: MemberId: fae1414d-702a-42fd-8c7d-6ffabe8b2e69, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033EF69FCF215DE43038FD0C3AA0535A08203

2008. 11. 12 5:44:08 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 11. 12 5:44:17 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group TestGroup (addb1dbe-06cf-43b8-8903-
78605f29091f) : Members in view for (before change analysis) are :
(b)
1: MemberId: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033090183254F6D47E7B235BC8D656194FA03
2: MemberId: addb1dbe-06cf-43b8-8903-78605f29091f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A787461503250336C047E2077544A5692C1EA21407A886303

2008. 11. 12 5:44:17 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 11. 12 5:44:17 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group TestGroup (fae1414d-702a-42fd-8c7d-
6ffabe8b2e69) : Members in view for (before change analysis) are :
(c)
1: MemberId: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033090183254F6D47E7B235BC8D656194FA03
2: MemberId: addb1dbe-06cf-43b8-8903-78605f29091f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A787461503250336C047E2077544A5692C1EA21407A886303
3: MemberId: fae1414d-702a-42fd-8c7d-6ffabe8b2e69, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033EF69FCF215DE43038FD0C3AA0535A08203

2008. 11. 12 5:44:17 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 11. 12 5:44:20 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group TestGroup (42b22147-7683-481f-a9f4-
85ba5a2b847f) : Members in view for (before change analysis) are :
1: MemberId: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033090183254F6D47E7B235BC8D656194FA03
2: MemberId: 42b22147-7683-481f-a9f4-85ba5a2b847f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A787461503250334501FF701A644877A4B4C65068965F3403
3: MemberId: addb1dbe-06cf-43b8-8903-78605f29091f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A787461503250336C047E2077544A5692C1EA21407A886303
4: MemberId: aeea918f-571b-463b-bfa6-55c536df0d11, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033DBAE9788614944F8A40ED352C8E7A03B03
5: MemberId: fae1414d-702a-42fd-8c7d-6ffabe8b2e69, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033EF69FCF215DE43038FD0C3AA0535A08203

...
------------------------------------

This log shows that five members joined "TestGroup":

1: MemberId: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033090183254F6D47E7B235BC8D656194FA03
2: MemberId: 42b22147-7683-481f-a9f4-85ba5a2b847f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A787461503250334501FF701A644877A4B4C65068965F3403
3: MemberId: addb1dbe-06cf-43b8-8903-78605f29091f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A787461503250336C047E2077544A5692C1EA21407A886303
4: MemberId: aeea918f-571b-463b-bfa6-55c536df0d11, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033DBAE9788614944F8A40ED352C8E7A03B03
5: MemberId: fae1414d-702a-42fd-8c7d-6ffabe8b2e69, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033EF69FCF215DE43038FD0C3AA0535A08203

This log is printed in ViewWindow, based on the viewQueue, when a new view is
observed.

But in the log messages above, you can see that the order of (a), (b) and (c) is strange.

Because there are no failures, I think the member count should increase
monotonically (that is, (a)'s count <= (b)'s count <= (c)'s count).

The following code is ClusterViewManager's notifyListeners() method.


void notifyListeners(final ClusterViewEvent event) {
    Log.log(...);
    for (ClusterViewEventListener elem : cvListeners) {
        elem.clusterViewEvent(event, getLocalView());
    }
}


getLocalView() is thread safe with viewLock but ClusterViewEventListener's
clusterViewEvent() is not thread safe.

The following code is GroupCommunicationProviderImpl's clusterViewEvent()
method, which implements the ClusterViewEventListener interface.


public void clusterViewEvent(final ClusterViewEvent clusterViewEvent,
                             final ClusterView clusterView) {
    ...
    final EventPacket ePacket = new EventPacket(clusterViewEvent.getEvent(),
            clusterViewEvent.getAdvertisement(), clusterView);
    final ArrayBlockingQueue<EventPacket> viewQueue = getGMSContext().getViewQueue();
    try {
        viewQueue.put(ePacket);
    } catch (InterruptedException e) {
        ...
    }
}
-----

I think that the local view's snapshot (getLocalView()'s return value) and
viewQueue.put() should be made atomic, like this:
-----
void notifyListeners(final ClusterViewEvent event) {
    Log.log(...);
    for (ClusterViewEventListener elem : cvListeners) {
        synchronized (elem) {
            elem.clusterViewEvent(event, getLocalView());
        }
    }
}

or

public synchronized void clusterViewEvent(final ClusterViewEvent clusterViewEvent,
                                          final ClusterView clusterView) {
    ...
    final EventPacket ePacket = new EventPacket(clusterViewEvent.getEvent(),
            clusterViewEvent.getAdvertisement(), clusterView);
    final ArrayBlockingQueue<EventPacket> viewQueue = getGMSContext().getViewQueue();
    try {
        viewQueue.put(ePacket);
    } catch (InterruptedException e) {
        ...
    }
}

(In my opinion, the former is better because clusterViewEvent() can be
implemented in various ways.)


In other words,
-------------------------------------------------------------------
getLocalView() --> local view's snapshot --> (hole) --> insert view queue
-------------------------------------------------------------------

As you can see above, there is a gap (the "hole") between taking the local view
snapshot and inserting the EventPacket into the view queue. We can close the hole
with a synchronized block or a dedicated lock object.
If the hole is removed, I think ViewWindow will receive the local view snapshots
from the queue in the correct order.



 Comments   
Comment by shreedhar_ganapathy [ 22/Nov/08 02:16 PM ]

..





[SHOAL-84] JXTA Exception on network disconnect Created: 18/Nov/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: alireza2008 Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Windows


Issuezilla Id: 84
Tags:
Participants: alireza2008, Joe Fialli and shreedhar_ganapathy

 Description   

I encountered the exception below during network disconnection tests. I had
two members in a group on separate hosts within the same subnet (all default
JXTA parameters); when I unplugged the network connection from one of the hosts,
I received the following exception:

Nov 13, 2008 11:51:45 AM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event : IN_DOUBT_EVENT
Nov 13, 2008 11:51:45 AM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
addInDoubtMemberSignals
INFO: gms.failureSuspectedEventReceived
Nov 13, 2008 11:51:45 AM com.sun.enterprise.ee.cms.impl.common.Router
notifyFailureSuspectedAction
INFO: Sending FailureSuspectedSignals to registered Actions.
Member:GMSTestMonitor...
Nov 13, 2008 11:51:45 AM net.jxta.endpoint.ThreadedMessenger run
SEVERE: Uncaught throwable in background thread
java.lang.NoClassDefFoundError: net/jxta/impl/endpoint/router/RouterMessenger
at
net.jxta.impl.endpoint.router.EndpointRouter.getMessenger(EndpointRouter.java:2336)
at
net.jxta.impl.endpoint.EndpointServiceImpl.getLocalTransportMessenger(EndpointServiceImpl.java:1566)
at
net.jxta.impl.endpoint.EndpointServiceImpl.access$200(EndpointServiceImpl.java:106)
at
net.jxta.impl.endpoint.EndpointServiceImpl$CanonicalMessenger.connectImpl(EndpointServiceImpl.java:380)
at net.jxta.endpoint.ThreadedMessenger.connect(ThreadedMessenger.java:551)
at net.jxta.endpoint.ThreadedMessenger.run(ThreadedMessenger.java:389)
at java.lang.Thread.run(Unknown Source)
Nov 13, 2008 11:51:48 AM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group GMSTestGroup : Members in view for
(before change analysis) are :
1: MemberId: GMSTestResource, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033520D314DBB264715B
E83E86B57A610F803



 Comments   
Comment by shreedhar_ganapathy [ 22/Nov/08 02:15 PM ]

reassigned to Joe for fixing post HCF





[SHOAL-90] SEVERE ShoalLogger msg: World Peer Group could not be instantiated Created: 20/Jul/09  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Minor
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


File Attachments: Text File instance1_secondtime.log    
Issuezilla Id: 90
Tags:
Participants: Joe Fialli

 Description   

A GMS client has a local cache stored in a directory called ".shoal".

If the file permissions of the directory or its contents do not
allow them to be deleted (cleared) when the GMS client starts up, one will
see the summary log message. Here are all the SEVERE messages one will see when
this occurs.

[#|2009-07-20T12:00:17.298-0400|SEVERE|Shoal|net.jxta.impl.cm.Cm|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=Cm;MethodName=<init>;|Unable
to Initialize databases
SEVERE: Unable to Initialize databases
[#|2009-07-20T12:00:17.327-0400|SEVERE|Shoal|net.jxta.impl.peergroup.StdPeerGroup|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=StdPeerGroup;MethodName=initFirst;|Error
during creation of local store
SEVERE: Error during creation of local store
[#|2009-07-20T12:00:17.328-0400|SEVERE|Shoal|net.jxta.peergroup.WorldPeerGroupFactory|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=WorldPeerGroupFactory;MethodName=newWorldPeerGroup;|World
Peer Group could not be instantiated.
SEVERE: World Peer Group could not be instantiated.
[#|2009-07-20T12:00:17.329-0400|SEVERE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=<init>;|World
Peer Group could not be instantiated.

These errors were generated with following file permissions for .shoal.
dhcp-ubur02-71-15:gms jf39279$ ls -lR .shoal
total 0
drwxr-xr-x 3 root admin 102 Jul 20 11:59 instance1

.shoal/instance1:
total 0
drwxr-xr-x 4 root admin 136 Jul 20 11:59 cm

.shoal/instance1/cm:
total 0
drwxr-xr-x 11 root admin 374 Jul 20 11:59 jxta-WorldGroup
drwxr-xr-x 9 root admin 306 Jul 20 11:59
uuid-C773444DBF054B18A31A9546EA5BB81559616261646162614E5047205032503302

.shoal/instance1/cm/jxta-WorldGroup:
total 3272
-rw-r--r--  1 root admin  12288 Jul 20 11:59 advertisements-AdvMSID.idx
-rw-r--r--  1 root admin  12288 Jul 20 11:59 advertisements-GroupsDesc.idx
-rw-r--r--  1 root admin  12288 Jul 20 11:59 advertisements-GroupsGID.idx
-rw-r--r--  1 root admin  12288 Jul 20 11:59 advertisements-GroupsMSID.idx
-rw-r--r--  1 root admin  12288 Jul 20 11:59 advertisements-GroupsName.idx
-rw-r--r--  1 root admin  12288 Jul 20 11:59 advertisements-PeersName.idx
-rw-r--r--  1 root admin  12288 Jul 20 11:59 advertisements-PeersPID.idx
-rw-r--r--  1 root admin 792576 Jul 20 11:59 advertisements-offsets.tbl
-rw-r--r--  1 root admin 792576 Jul 20 11:59 advertisements.tbl

.shoal/instance1/cm/uuid-C773444DBF054B18A31A9546EA5BB81559616261646162614E5047205032503302:
total 3200
-rw-r--r--  1 root admin  12288 Jul 20 11:59 advertisements-AdvDstPID.idx
-rw-r--r--  1 root admin  12288 Jul 20 11:59 advertisements-AdvMSID.idx
-rw-r--r--  1 root admin  12288 Jul 20 11:59 advertisements-PeersName.idx
-rw-r--r--  1 root admin  12288 Jul 20 11:59 advertisements-PeersPID.idx
-rw-r--r--  1 root admin 792576 Jul 20 11:59 advertisements-offsets.tbl
-rw-r--r--  1 root admin 793088 Jul 20 11:59 advertisements.tbl
drwxr-xr-x 7 root admin 238 Jul 20 11:59 srdi

.shoal/instance1/cm/uuid-C773444DBF054B18A31A9546EA5BB81559616261646162614E5047205032503302/srdi:
total 5208
-rw-r--r--  1 root admin  12288 Jul 20 11:59 pipeResolverSrdi-JxtaPropagateId.idx
-rw-r--r--  1 root admin 792576 Jul 20 11:59 pipeResolverSrdi-offsets.tbl
-rw-r--r--  1 root admin 792576 Jul 20 11:59 pipeResolverSrdi.tbl
-rw-r--r--  1 root admin 528896 Jul 20 11:59 routerSrdi-offsets.tbl
-rw-r--r--  1 root admin 528896 Jul 20 11:59 routerSrdi.tbl

****************



 Comments   
Comment by Joe Fialli [ 20/Jul/09 09:20 AM ]

This issue occurs when one runs a GMS client as user1 in a directory, then
logs in as user2 in the same directory, and user2 does not have permission to delete
the .shoal cache files created by user1's run of the GMS client.

WORKAROUND:
Remove the .shoal files created by user1 that user2 does not have permission to
delete. (The easiest case to picture is "user1" being root and
"user2" being a non-privileged user on the system.)

Comment by Joe Fialli [ 20/Jul/09 09:25 AM ]

The following log shows FINE-level logging noting that all attempts to
delete the .shoal cache files are failing. Better user feedback needs to be
provided that these deletes are failing and that, if not corrected by a system
administrator, the system will not be able to start properly. Startup should
end with a SEVERE error and an exception so that the GMS client does not attempt
to start up again until the file permission issue is resolved.

$ grep FINE instance1_secondtime.log | grep failed
[#|2009-07-20T12:00:17.031-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file
.shoal/instance1/cm/jxta-WorldGroup/advertisements-AdvMSID.idx|#]
[#|2009-07-20T12:00:17.035-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file
.shoal/instance1/cm/jxta-WorldGroup/advertisements-GroupsDesc.idx|#]
[#|2009-07-20T12:00:17.035-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file
.shoal/instance1/cm/jxta-WorldGroup/advertisements-GroupsGID.idx|#]
[#|2009-07-20T12:00:17.035-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file
.shoal/instance1/cm/jxta-WorldGroup/advertisements-GroupsMSID.idx|#]
[#|2009-07-20T12:00:17.035-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file
.shoal/instance1/cm/jxta-WorldGroup/advertisements-GroupsName.idx|#]
[#|2009-07-20T12:00:17.035-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file
.shoal/instance1/cm/jxta-WorldGroup/advertisements-offsets.tbl|#]
[#|2009-07-20T12:00:17.036-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file
.shoal/instance1/cm/jxta-WorldGroup/advertisements-PeersName.idx|#]
[#|2009-07-20T12:00:17.036-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file
.shoal/instance1/cm/jxta-WorldGroup/advertisements-PeersPID.idx|#]
[#|2009-07-20T12:00:17.037-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file .shoal/instance1/cm/jxta-WorldGroup/advertisements.tbl|#]
[#|2009-07-20T12:00:17.037-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file
.shoal/instance1/cm/uuid-C773444DBF054B18A31A9546EA5BB81559616261646162614E5047205032503302/advertisements-AdvDstPID.idx|#]
[#|2009-07-20T12:00:17.037-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file
.shoal/instance1/cm/uuid-C773444DBF054B18A31A9546EA5BB81559616261646162614E5047205032503302/advertisements-AdvMSID.idx|#]
[#|2009-07-20T12:00:17.037-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file
.shoal/instance1/cm/uuid-C773444DBF054B18A31A9546EA5BB81559616261646162614E5047205032503302/advertisements-offsets.tbl|#]
[#|2009-07-20T12:00:17.038-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file
.shoal/instance1/cm/uuid-C773444DBF054B18A31A9546EA5BB81559616261646162614E5047205032503302/advertisements-PeersName.idx|#]
[#|2009-07-20T12:00:17.038-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file
.shoal/instance1/cm/uuid-C773444DBF054B18A31A9546EA5BB81559616261646162614E5047205032503302/advertisements-PeersPID.idx|#]
[#|2009-07-20T12:00:17.038-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file
.shoal/instance1/cm/uuid-C773444DBF054B18A31A9546EA5BB81559616261646162614E5047205032503302/advertisements.tbl|#]
[#|2009-07-20T12:00:17.038-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file
.shoal/instance1/cm/uuid-C773444DBF054B18A31A9546EA5BB81559616261646162614E5047205032503302/srdi/pipeResolverSrdi-JxtaPropagateId.idx|#]
[#|2009-07-20T12:00:17.038-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file
.shoal/instance1/cm/uuid-C773444DBF054B18A31A9546EA5BB81559616261646162614E5047205032503302/srdi/pipeResolverSrdi-offsets.tbl|#]
[#|2009-07-20T12:00:17.039-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file
.shoal/instance1/cm/uuid-C773444DBF054B18A31A9546EA5BB81559616261646162614E5047205032503302/srdi/pipeResolverSrdi.tbl|#]
[#|2009-07-20T12:00:17.039-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file
.shoal/instance1/cm/uuid-C773444DBF054B18A31A9546EA5BB81559616261646162614E5047205032503302/srdi/routerSrdi-offsets.tbl|#]
[#|2009-07-20T12:00:17.039-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file
.shoal/instance1/cm/uuid-C773444DBF054B18A31A9546EA5BB81559616261646162614E5047205032503302/srdi/routerSrdi.tbl|#]

Comment by Joe Fialli [ 20/Jul/09 09:27 AM ]

Created an attachment (id=19)
rungmsdemo.sh instance1 log file illustrating .shoal cache with file permissions not allowing cache to be deleted

Comment by Joe Fialli [ 20/Jul/09 12:23 PM ]

Steps to recreate issue (forgot to add this in initial submission)

  • Log in as root.
  • run ./rungmsdemo.sh in shoal/gms.
  • log out as root
  • run ./rungmsdemo.sh in shoal/gms a second time.
    redirect output to a file. (should be similar to log file attached to this issue)
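A sketch of the fail-fast behavior suggested in the comments above: if any cache file cannot be deleted, log it at SEVERE and abort startup instead of continuing silently. The class below is an illustrative stand-in, not the actual NetworkManager.clearCache() code:

import java.io.File;
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

public final class CacheCleanupSketch {

    private static final Logger LOG = Logger.getLogger("ShoalLogger");

    /** Recursively deletes the .shoal cache; throws instead of limping along on failure. */
    static void clearCache(File dir) throws IOException {
        File[] children = dir.listFiles();
        if (children != null) {
            for (File child : children) {
                if (child.isDirectory()) {
                    clearCache(child);
                } else if (!child.delete()) {
                    LOG.log(Level.SEVERE, "failed to delete cache file " + child
                            + "; check ownership/permissions of the .shoal directory");
                    throw new IOException("unable to clear stale cache file: " + child);
                }
            }
        }
        if (dir.exists() && !dir.delete()) {
            throw new IOException("unable to remove cache directory: " + dir);
        }
    }
}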




[SHOAL-93] missing FailureNotificationSignal during network failure when non-master is isolated Created: 05/Sep/09  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: little_zizou Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Windows


File Attachments: Text File Scenario1.zip     Text File Scenario2.zip    
Issuezilla Id: 93
Tags:
Participants: Joe Fialli and little_zizou

 Description   

I have been trying to use Shoal with my application. Assume I have a cluster-like
setup with four nodes running on four different systems. If one of the nodes suddenly
goes off the network and it is not the master node, I get three
FailureSuspectedSignals but not all three FailureNotificationSignals. If the
node that went off the network was the master node, then I get three
FailureSuspectedSignals and three FailureNotificationSignals. Shouldn't it
behave that way in the first case as well?



 Comments   
Comment by Joe Fialli [ 08/Sep/09 01:17 PM ]

More information is necessary to research this issue.

1. Please describe what is meant by "out of network".
Is the network cable being pulled from the machine?

2. We have a shoal qe test that verifies that all failure notifications
are sent to surviving group members when a non-master node is killed
(via kill -9). The test verifies that all FAILURE notifications are sent.
(Tests are run on the main branch of shoal. Please confirm you are running
these tests.)

Please submit logs (by attaching a zip of log files) that illustrate your
issue. FINE-level logging would be sufficient to follow what is occurring.

Comment by little_zizou [ 14/Sep/09 03:28 AM ]

Created an attachment (id=20)
Scenario 1 Testcase

Comment by little_zizou [ 14/Sep/09 03:29 AM ]

> More information is necessary to research this issue.
>
> 1. Please describe what is meant by "out of network".
> Is the network cable being pulled from the machine?

I have disabled my LAN network to simulate network failure kind of scenario
(similar to unplugging network cable).

> 2. We have a shoal qe test that verifies that all failure notifications
> ... Please confirm you are running these tests.)

I have not run the tests which you have mentioned but, instead I have written my
own test cases to verify joining nodes to the network and processing failure
notifications.

TestCase Description:
We have 3 systems with 3 shoal clients (Client1, Client2 & Client3), each
client running on a different system with member token names as server1, server2
and server3 respectively, all in the same group.

Scenario 1:
server2 and server3 are started before server1. Now, when we disable the network on
server1, I see 2 FailureSuspectedSignals and 2 FailureNotificationSignals
(for server2 and server3 respectively), as expected.

Scenario 2:
Now we have 3 clients running on 3 different systems, but the member token that
joined the group as "server1" is renamed to "server5".

The systems are started just like in the previous case: server2 and server3 are
started before server5, and the LAN is then disabled on server5. This time I see 2
FailureSuspectedSignals, but only one FailureNotificationSignal.

I have attached the test sources and logs of both Scenario1 and Scenario2 for
your reference.

Comment by little_zizou [ 14/Sep/09 03:30 AM ]

Created an attachment (id=21)
Scenario 2 TestCase

Comment by Joe Fialli [ 14/Sep/09 12:01 PM ]

Issue understood.

The code in question handles a detected master failure (masterFailed), and the
constraint that only the new master is allowed to announce the failure.

private void assignAndReportFailure(final HealthMessage.Entry entry) {
    <deleted non-relevant code>
    final boolean masterFailed = (masterNode.getMasterNodeID()).equals(entry.id);
    if (masterNode.isMaster() && masterNode.isMasterAssigned()) {
        <deleted non-relevant code>
    } else if (masterFailed) {
        // remove the failed node
        LOG.log(Level.FINE, MessageFormat.format("Master Failed. Removing System Advertisement :{0} for master named {1}",
                entry.id.toString(), entry.adv.getName()));
        manager.getClusterViewManager().remove(entry.adv);
        masterNode.resetMaster();
        masterNode.appointMasterNode();
        if (masterNode.isMaster() && masterNode.isMasterAssigned()) {
            LOG.log(Level.FINE, MessageFormat.format("Announcing Failure Event of {0} for name {1}...",
                    entry.id, entry.adv.getName()));
            final ClusterViewEvent cvEvent = new ClusterViewEvent(ClusterViewEvents.FAILURE_EVENT, entry.adv);
            masterNode.viewChanged(cvEvent);
        }
    }
    cleanAllCaches(entry);
}

To avoid multiple reports of a FAILURE, only the master is typically allowed to
report a failure to the rest of the cluster. For Scenario 2, when the network LAN is
disabled on "server5", the reporter of this issue is looking for failure events for
both "server2" and "server3". In the submitted logs for Scenario 2, heartbeat
failure detection does detect that both server2 and server3 have failed from
server5's point of view (they are still running in their own subnet), but the
failure of server2 is not reported because server3 is calculated to be the new
master from server5's perspective. Unfortunately, server3 also cannot communicate
with "server5", hence the missing announcement of server2's failure. When "server3"
is then detected to have failed, server5 is the sole instance left in its subnet of
the cluster, so it becomes the master and reports that server3 has failed.

To summarize, heartbeat failure detection is working correctly and "server5"'s view
of the cluster is correct; only the failure notification for "server2" is missing in
this scenario. The reason is in the code fragment included above.

Comment by Joe Fialli [ 14/Sep/09 12:02 PM ]

started analysis of issue from submitted logs.
see previous comments made when reassigning issue to myself.

Comment by Joe Fialli [ 14/Sep/09 12:13 PM ]

Summary of issue reported for scenario 2 submitted on Sept 14th.

When the network LAN fails for a non-master instance of a group, the submitter of
this issue expects that instance to receive a FAILURE notification for each member
that is no longer reachable from its isolated subnet.

Shoal's heartbeat failure detection is working to detect that the instances are no
longer reachable; however, the isolated instance will not receive any failure
notifications about the unreachable group members until it finally makes itself
the master node.

For the submitted Scenario 1, "server1" becomes the master node after "server2"
is no longer reachable, so no FAILURE events are dropped in that scenario.
Even though "server1" was not the master before the LAN was disabled,
"server1" is made the master node for its subnet of one immediately, due to
name comparisons between it and the other remaining server names in the GMS group.




[SHOAL-104] [Javadocs] Add referential info to JoinedAndReadyNotificationSignal, and its related Action and ActionFactory on its use Created: 26/Mar/10  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: shreedhar_ganapathy Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Macintosh


Issuezilla Id: 104
Tags:
Participants: Joe Fialli and shreedhar_ganapathy

 Description   

The JoinedAndReadyNotificationSignal, Action and ActionFactory javadocs need referential information
that is mentioned in the javadoc for GroupManagementService#reportJoinedAndReadyState

This will help users understand and relate to how to use this construct in non-GlassFish server
applications.






[SHOAL-102] [Javadocs] Add reference info to JoinedAndReadyNotificationSignal, Action and ActionFactory javadocs Created: 26/Mar/10  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: Dhiru Pandey Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Macintosh


Issuezilla Id: 102
Tags:
Participants: Dhiru Pandey and Joe Fialli

 Description   

The JoinedAndReadyNotificationSignal, Action and ActionFactory javadocs need referential information
that is mentioned in the javadoc for GroupManagementService#reportJoinedAndReadyState

This will help users understand and relate to how to use this construct in non-GlassFish server
applications.



 Comments   
Comment by Joe Fialli [ 26/Mar/10 11:51 AM ]

agreed. will update javadoc accordingly.





[SHOAL-99] Add OSGi bundle headers to MANIFEST.MF Created: 09/Mar/10  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Improvement Priority: Major
Reporter: nickwi Assignee: sheetalv
Resolution: Unresolved Votes: 0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Windows


Issuezilla Id: 99
Tags:
Participants: nickwi and sheetalv

 Description   

It would be nice to be able to deploy the standard Shoal distribution as an OSGi
bundle. This would require the addition of OSGi headers to the Shoal MANIFEST.MF
file.

This will also require some changes to JXTA - (I got it working by removing the
dependency on javax.security.cert.CertificateException and addition of OSGi
headers), although it appears from here
(https://jxta.dev.java.net/servlets/ReadMsg?list=dev&msgNo=1384) that JXTA 2.6
may already have OSGi support.
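For illustration, the kind of headers involved would look roughly like the fragment below; the symbolic name, version and package lists are placeholders, not a tested manifest for the Shoal jar:

Bundle-ManifestVersion: 2
Bundle-SymbolicName: com.sun.enterprise.shoal
Bundle-Version: 1.1.0
Export-Package: com.sun.enterprise.ee.cms.core
Import-Package: net.jxta.peergroup, net.jxta.endpoint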



 Comments   
Comment by sheetalv [ 09/Mar/10 11:41 AM ]

In the branch SHOAL_1_1_ABSTRACTING_TRANSPORT, Shoal can be built as an OSGi module over Grizzly
as the transport layer. There were some issues with making Shoal OSGi-fied over JXTA. We are in the
evaluation stage for making Shoal over JXTA 2.6 OSGi-fied.




