[SHOAL-118] regression in cluster starting time. Created: 07/Jan/12  Updated: 09/Jan/12  Resolved: 09/Jan/12

Status: Closed
Project: shoal
Component/s: GMS
Affects Version/s: None
Fix Version/s: current

Type: Bug Priority: Major
Reporter: zorro Assignee: Joe Fialli
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

linux



 Description   

build 16.

Cluster startup takes longer than expected intermittently.

Results:
http://aras2.us.oracle.com:8080/logs/gf31/gms//set_01_05_12_t_17_12_28/scenario_0002_Thu_Jan__5_17_21_17_PST_2012.html



 Comments   
Comment by Joe Fialli [ 09/Jan/12 ]

Fix committed in svn 1736.

Comment by Joe Fialli [ 09/Jan/12 ]

A patch run of the GlassFish Shoal GMS SQE tests confirmed this fix in the shoal-gms libraries.





[SHOAL-117] Support multiple shoal instances in a single JVM Created: 25/Nov/11  Updated: 25/Nov/11

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major
Reporter: Arul Dhesiaseelan Assignee: shreedhar_ganapathy
Resolution: Unresolved Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

We have a requirement to run multiple Shoal instances per JVM. We believe Shoal does not support this, as the GMSContext is assigned per group rather than per server. We have implemented support for a GMSContext per server in the same group, allowing multiple contexts to coexist in the same JVM. We would be happy to contribute this patch to the Shoal project. Would anyone be interested in this patch?
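
The kind of usage the proposed patch would enable, as a sketch (whether GMSFactory.startGMSModule can be called twice for the same group in one JVM today is exactly what this issue is about; the member and group names are placeholders):

import com.sun.enterprise.ee.cms.core.GMSFactory;
import com.sun.enterprise.ee.cms.core.GroupManagementService;

public class TwoMembersOneJvm {
    public static void main(String[] args) throws Exception {
        // Two members of the same group started in one JVM. With the current
        // per-group GMSContext this is not supported; the proposed patch keys
        // the context per server instead.
        GroupManagementService memberA = (GroupManagementService) GMSFactory.startGMSModule(
                "serverA", "MyGroup", GroupManagementService.MemberType.CORE, null);
        GroupManagementService memberB = (GroupManagementService) GMSFactory.startGMSModule(
                "serverB", "MyGroup", GroupManagementService.MemberType.CORE, null);
        memberA.join();
        memberB.join();
    }
}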






[SHOAL-116] rejoin subevent is null in JoinedAndReadyNotificationSignal Created: 28/Feb/11  Updated: 20/Apr/11  Resolved: 20/Apr/11

Status: Closed
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: Bobby Bissett Assignee: Joe Fialli
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Dependency
blocks GLASSFISH-16109 get-health command not showing rejoins Resolved

 Description   

Found this while recreating a rejoin for blogging. The rejoin happens and shows up in the log, but the rejoin event from JoinedAndReadyNotificationSignal.getRejoinSubevent() is null, which affects the output of the GlassFish 'asadmin get-health' command.

Joe already has a fix for this that I'll test soon to commit to the trunk.
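
For context, a minimal sketch of where the null shows up in a registered callback (the handler wiring is illustrative; getRejoinSubevent is the accessor named above):

import com.sun.enterprise.ee.cms.core.CallBack;
import com.sun.enterprise.ee.cms.core.JoinedAndReadyNotificationSignal;
import com.sun.enterprise.ee.cms.core.Signal;

// Handler registered for joined-and-ready notifications; the get-health style
// check below is where the null subevent was observed.
public class JoinedAndReadyHandler implements CallBack {
    public void processNotification(Signal signal) {
        if (signal instanceof JoinedAndReadyNotificationSignal) {
            JoinedAndReadyNotificationSignal jr = (JoinedAndReadyNotificationSignal) signal;
            Object rejoin = jr.getRejoinSubevent();   // was null even for a rejoin before the fix
            if (rejoin != null) {
                System.out.println(jr.getMemberToken() + " REJOINED: " + rejoin);
            } else {
                System.out.println(jr.getMemberToken() + " joined and is ready");
            }
        }
    }
}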



 Comments   
Comment by Joe Fialli [ 01/Mar/11 ]

committed fix in trunk.

Comment by Bobby Bissett [ 02/Mar/11 ]

Verified in trunk, rev 1543. Will mark the GF issue fixed when we integrate next.

Comment by Bobby Bissett [ 20/Apr/11 ]

Reopening just so I can mark this fixed with a specific version.





[SHOAL-115] MultiCastReceiverThread is not clearing buffer in DatagramPacket Created: 28/Feb/11  Updated: 03/Mar/11  Resolved: 03/Mar/11

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: current

Type: Bug Priority: Minor
Reporter: Bobby Bissett Assignee: Bobby Bissett
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Dependency
blocks GLASSFISH-16108 validate-multicast tool can give dupl... Resolved

 Description   

In the thread's run method, the same byte buffer is reused for each DatagramPacket created. As a result, the data returned from the receive method can contain leftover text from a previous, longer packet, which throws off the set of host strings, so hosts can show up more than once in the output of the validate-multicast tool.

The simple fix is to create a new byte array for each packet (or clear the old one). Without this fix the tool still gives accurate results; it can just include extra copies of previous entries. I have a GF issue filed for the integration, and this one for the actual fix.
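
A sketch of that fix (illustrative only, not the actual MultiCastReceiverThread code):

import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.nio.charset.Charset;

// Allocate a fresh buffer per packet so a shorter datagram cannot carry stale
// bytes from the previous, longer one.
final class ReceiveLoopSketch {
    static void receiveLoop(DatagramSocket socket) throws IOException {
        while (!Thread.currentThread().isInterrupted()) {
            byte[] buffer = new byte[8192];                      // new array each iteration
            DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
            socket.receive(packet);
            // Use only the bytes actually received, not the whole backing array.
            String data = new String(packet.getData(), 0, packet.getLength(),
                    Charset.forName("UTF-8"));
            // ... add data to the set of host strings ...
        }
    }
}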



 Comments   
Comment by Bobby Bissett [ 03/Mar/11 ]

Fixed in revision 1544.





[SHOAL-114] validate-multicast tool should handle unexpected data in receiver thread Created: 01/Feb/11  Updated: 03/Mar/11  Resolved: 03/Mar/11

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: 1.1
Fix Version/s: current

Type: Improvement Priority: Minor
Reporter: Bobby Bissett Assignee: Bobby Bissett
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

If the validate-multicast tool is run at the same time as another process (e.g. the GlassFish DAS) using the same port, it will choke on the received data because the format is unexpected. The code in MulticastTester#trimDataString should first check the format of what it is parsing; if it is not in the expected format, it should log a warning that the tool received unexpected data.
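
A sketch of the kind of guard intended (the separator and warning text are assumptions; the actual format handled by MulticastTester#trimDataString is not shown here):

import java.util.logging.Logger;

// Illustrative only; the real field separator and layout are assumptions.
final class DataTrimmer {
    private static final Logger LOG = Logger.getLogger(DataTrimmer.class.getName());

    String trimDataString(String data) {
        int sep = data.indexOf('|');          // assumed field separator
        if (sep < 0) {
            LOG.warning("validate-multicast received unexpected data, ignoring: " + data);
            return null;                      // caller skips this datagram
        }
        return data.substring(0, sep).trim();
    }
}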



 Comments   
Comment by Bobby Bissett [ 03/Mar/11 ]

Fixed in revision 1544.





[SHOAL-113] Dangling Threads Prevent Graceful JVM Shutdown Created: 15/Dec/10  Updated: 09/Nov/11  Resolved: 09/Nov/11

Status: Closed
Project: shoal
Component/s: GMS
Affects Version/s: 1.1
Fix Version/s: None

Type: Bug Priority: Major
Reporter: erich_liebmann Assignee: Joe Fialli
Resolution: Fixed Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

We are experiencing problems shutting down the JVM gracefully because Shoal does not terminate all of its non-daemon threads on shutdown.

We initiate Shoal shutdown via the following API call, but this does not terminate the GMSContext "viewWindowThread" and "messageWindowThread" threads or the Router "signalHandlerThread" thread (all non-daemon):

haGroupManagementService.shutdown(shutdownType.INSTANCE_SHUTDOWN);

To work around this problem we had to implement the following hack:

haGroupManagementService.shutdown(shutdownType.INSTANCE_SHUTDOWN);
DirectFieldAccessor gmsContextDirectFieldAccessor = new DirectFieldAccessor(gmsContext);
gmsContextDirectFieldAccessor.setPropertyValue("shuttingDown", true);
((Thread)gmsContextDirectFieldAccessor.getPropertyValue("viewWindowThread")).interrupt();
((Thread)gmsContextDirectFieldAccessor.getPropertyValue("messageWindowThread")).interrupt();

Router router = gmsContext.getRouter();
DirectFieldAccessor routerDirectFieldAccessor = new DirectFieldAccessor(router);
((Thread)routerDirectFieldAccessor.getPropertyValue("signalHandlerThread")).interrupt();

Kindly fix this shutdown issue. Please let me know should you require a proper source code patch for this.



 Comments   
Comment by shreedhar_ganapathy [ 09/Nov/11 ]

Transferring to Joe for eval and closure.

Comment by Joe Fialli [ 09/Nov/11 ]

This Shoal GMS issue would prevent the GlassFish v2, v3.1, and later application servers from exiting.
It has been fixed in all versions of Shoal GMS.





[SHOAL-112] ability to configure GMS member to use SSL Created: 12/Nov/10  Updated: 31/Oct/12

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Improvement Priority: Critical
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 112

 Description   

Provide a GMS property that enables one to configure a GMS member to use SSL for
its TCP communications. Both supported transports, grizzly and jxta, have the
ability to enable SSL for point-to-point communication.
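
Purely as a sketch of what the requested configuration could look like from a member's point of view (the "SSL_ENABLED" property key is hypothetical; no such GMS property exists yet):

import java.util.Properties;

import com.sun.enterprise.ee.cms.core.GMSFactory;
import com.sun.enterprise.ee.cms.core.GroupManagementService;

public class SslMemberSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Hypothetical property key: the point of this issue is that no such
        // GMS property exists yet.
        props.put("SSL_ENABLED", "true");
        GroupManagementService gms = (GroupManagementService) GMSFactory.startGMSModule(
                "member1", "MyGroup", GroupManagementService.MemberType.CORE, props);
        gms.join();
    }
}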






[SHOAL-111] capability to configure requirement for authentication for GMS member to join group Created: 12/Nov/10  Updated: 12/Nov/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Improvement Priority: Critical
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 111

 Description   

Leverage certificate-based authentication (JAAS) to validate whether a GMS
member should be allowed to join a GMS group.



 Comments   
Comment by Joe Fialli [ 12/Nov/10 ]

Adjusted the subject title to state that there needs to be a configuration
capability to require authentication for a GMS member to join a group.





[SHOAL-110] enabling a second network interface in the bios causes view change issues Created: 19/Oct/10  Updated: 20/Oct/10  Resolved: 20/Oct/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: Rajiv Mordani Assignee: Joe Fialli
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Windows


Issuezilla Id: 110

 Description   

I enabled my second interface in the BIOS (didn't assign an IP or anything to the
interface, just enabled it in the BIOS). When I then start a cluster, GMS does not
seem to work well in this scenario, which in turn causes the Shoal cache to
misbehave. Once I disabled the interface in the BIOS, everything started working again.



 Comments   
Comment by Joe Fialli [ 19/Oct/10 ]

Requesting steps to recreate this issue.

Specifically, what was the order of the following operations:

  • asadmin start-domain
  • asadmin create-cluster
  • asadmin start-cluster
  • enabling the 2nd network interface

We cannot support changes to the network interfaces in the middle of the first
three operations above when GMS-BIND-INTERFACE-ADDRESS-cluster-name is not being
used on the DAS and all instances in the cluster.

The dynamically chosen first network address can change between the DAS joining
the cluster and the instances being started via start-cluster if a network
interface change is made in between.

Comment by Joe Fialli [ 19/Oct/10 ]

Additionally, we need to know whether GMS-BIND-INTERFACE-ADDRESS is being set for
the clustered instances and/or the DAS. (GMS-BIND-INTERFACE-ADDRESS should be set
for all clustered instances AND the DAS.)

Comment by Joe Fialli [ 20/Oct/10 ]

The enhancement scheduled for GlassFish 3.2, described by
https://glassfish.dev.java.net/issues/show_bug.cgi?id=13056, would enable
detection of a misconfigured network and/or GlassFish cluster.

It is not possible to diagnose this in GMS itself, since the configuration is
dynamic at runtime and there is no defined place listing all instances configured
in a cluster. In GlassFish, however, domain.xml holds the static configuration of
the clustered instances and the DAS. That information can be used by a tool (a 3.2
extension of "asadmin validate-multicast") to validate whether multicast is
working between all members of the GlassFish cluster. If an issue is detected, the
user is required to figure out why multicast is not working for the cluster as
configured.





[SHOAL-109] optimize virtual broadcast message send Created: 19/Aug/10  Updated: 19/Aug/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 109

 Description   

The broadcast path that iterates over each active instance and sends over TCP
inefficiently serializes the payload each time it sends to an instance.

When UDP broadcast is used, the payload of a GMS send is serialized once and then
broadcast to all instances in the cluster. Correct this inefficiency:
DistributedStateCache and GroupHandle.sendMessage(String targetComponent, byte[])
currently serialize the GMSMessage object FOR EACH INSTANCE in the cluster.

This change will not impact GMS notifications or heartbeats, since they rely on
UDP broadcast of the GMS sendMessage.





[SHOAL-108] setting Bind Interface Address in grizzly transport Created: 16/Jul/10  Updated: 27/Oct/10  Resolved: 27/Oct/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: Stephen DiMilla Assignee: Bobby Bissett
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 108

 Description   

GMS is not detecting an invalid GMS_BIND_INTERFACE_ADDRESS, which results in a
"No available ports exist" message. This needs to be fixed by reporting a SEVERE
failure for a bad BIND_INTERFACE_ADDRESS.

The log messages generated are:
#|2010-07-16T09:02:32.689-0700|SEVERE|Shoal|ShoalLogger|_ThreadID=10;_ThreadName=main;ClassName=NetworkUtility;MethodName=getAvailableTCPPort;|Fatal
error. No available ports exist for 10.5.217.120 in range 9090 to 9120|#]

[#|2010-07-16T09:02:32.691-0700|SEVERE|Shoal|GMSAdminCLI|_ThreadID=10;_ThreadName=main;ClassName=GMSAdminCLI;MethodName=registerAndJoinCluster;|Exception
occured :com.sun.enterprise.ee.cms.core.GMSException: failed to join group
testgroup|#]



 Comments   
Comment by Joe Fialli [ 07/Oct/10 ]

*** Issue 105 has been marked as a duplicate of this issue. ***
Comment by Bobby Bissett [ 27/Oct/10 ]

Taking this one.

Comment by Bobby Bissett [ 27/Oct/10 ]

This is fixed in Shoal in revision 1325 by adding a method
NetworkUtility#isBindAddressValid. However, it won't be called from GlassFish
until the next integration. See GF issue
https://glassfish.dev.java.net/issues/show_bug.cgi?id=14006 for information on
the integration.
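
One way such a validity check can be implemented, as a sketch (not necessarily the code added in revision 1325):

import java.io.IOException;
import java.net.InetAddress;
import java.net.NetworkInterface;

final class BindAddressCheck {
    // Valid only if the address resolves and belongs to a local network interface.
    static boolean isBindAddressValid(String addressString) {
        try {
            InetAddress address = InetAddress.getByName(addressString);
            return address.isAnyLocalAddress()
                    || address.isLoopbackAddress()
                    || NetworkInterface.getByInetAddress(address) != null;
        } catch (IOException e) {
            return false;   // unresolvable or not usable as a local bind address
        }
    }
}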





[SHOAL-107] MasterNode ensure delivery of GMS notifications over UDP Created: 15/Jun/10  Updated: 07/Oct/10  Resolved: 07/Oct/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 107

 Description   

GMS membership notifications such as JOIN, JOIN_AND_READY, FAILURE,
PLANNED_SHUTDOWN, FAILURE_SUSPECTED, and GroupLeadership are broadcast from the
MasterNode over UDP. A protocol will be developed between the master and the
other members of the group to ensure that each such event is delivered; if it is
not, the master will resend the event to ALIVE instances that have not
acknowledged receiving the notification.

Currently, ensuring that the MasterNode is not a heavily loaded application (such
as the Domain Administration Server in GlassFish, which does not run apps) and
tuning UDP buffers at the OS level has kept UDP messages from being dropped.
Addressing this issue will provide robust event delivery without requiring OS
tuning or partitioning application load away from the Shoal GMS MasterNode.



 Comments   
Comment by Joe Fialli [ 07/Oct/10 ]

Fix checked in.

The master sends along the latest MasterViewID with every heartbeat message it
broadcasts. Each GMS group member records the MasterViewID of every
MasterChangeEvent it has received. When a GMS group member detects that it has
not received a specific MasterViewID, it requests that the master resend it to
just that member (over more reliable TCP).

Tested this with simulated failure injection.
Wrote a ReliableMulticast junit test.
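
A sketch of that gap-detection idea (class, field, and method names here are placeholders, not the actual Shoal implementation):

final class MasterViewTracker {
    private long highestSeenMasterViewId = -1;

    // Called for each broadcast MasterChangeEvent actually received from the master.
    synchronized void onMasterChangeEvent(long masterViewId) {
        if (masterViewId > highestSeenMasterViewId) {
            highestSeenMasterViewId = masterViewId;
        }
    }

    // Called for each master heartbeat, which carries the master's latest view id.
    synchronized void onMasterHeartbeat(long latestMasterViewId, Runnable requestResendOverTcp) {
        if (highestSeenMasterViewId >= 0 && latestMasterViewId > highestSeenMasterViewId) {
            // At least one broadcast MasterChangeEvent was missed: ask the master
            // to resend it to just this member over the more reliable TCP channel.
            requestResendOverTcp.run();
        }
    }
}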





[SHOAL-106] Grizzly transport: sendMessage gets a NPE in NIOContext.configureOpType Created: 30/Apr/10  Updated: 07/Oct/10  Resolved: 07/Oct/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Critical
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 106

 Description   

The following occurred in a GMS developer test: a GMS client sent a message to
another instance, and when the other instance attempted to send a reply back it
received the error below. We have a workaround for the failure, BUT it is just a
short-term hack to get testers past the issue. We will be consulting with the
Grizzly developers on this issue to identify whether it is a misuse by Shoal or
an issue needing resolution in Grizzly.

Here is the stack trace.

java.lang.NullPointerException when calling registered application callback
method com.sun.enterprise.ee.cms.tests.GMSAdminAgent.processNotification. The
method should have handled this exception.
java.lang.NullPointerException
at com.sun.grizzly.NIOContext.configureOpType(NIOContext.java:431)
at
com.sun.grizzly.connectioncache.client.CacheableConnectorHandler.notifyCallbackHandlerPseudoConnect(CacheableConnectorHandler.java:221)
at
com.sun.grizzly.connectioncache.client.CacheableConnectorHandler.doConnect(CacheableConnectorHandler.java:168)
at
com.sun.grizzly.connectioncache.client.CacheableConnectorHandler.connect(CacheableConnectorHandler.java:122)
at
com.sun.enterprise.mgmt.transport.grizzly.GrizzlyTCPConnectorWrapper.send(GrizzlyTCPConnectorWrapper.java:104)
at
com.sun.enterprise.mgmt.transport.grizzly.GrizzlyTCPConnectorWrapper.doSend(GrizzlyTCPConnectorWrapper.java:86)
at
com.sun.enterprise.mgmt.transport.AbstractMessageSender.send(AbstractMessageSender.java:34)
at
com.sun.enterprise.mgmt.transport.grizzly.GrizzlyNetworkManager.send(GrizzlyNetworkManager.java:478)
at com.sun.enterprise.mgmt.ClusterManager.send(ClusterManager.java:458)
at
com.sun.enterprise.ee.cms.impl.base.GroupCommunicationProviderImpl.sendMessage(GroupCommunicationProviderImpl.java:316)
at
com.sun.enterprise.ee.cms.impl.base.GroupHandleImpl.sendMessage(GroupHandleImpl.java:128)
at
com.sun.enterprise.ee.cms.tests.GMSAdminAgent.processNotification(GMSAdminAgent.java:449)
at
com.sun.enterprise.ee.cms.impl.client.MessageActionImpl.processMessage(MessageActionImpl.java:86)
at
com.sun.enterprise.ee.cms.impl.client.MessageActionImpl.consumeSignal(MessageActionImpl.java:69)
at
com.sun.enterprise.ee.cms.impl.common.Router.notifyMessageAction(Router.java:377)
at
com.sun.enterprise.ee.cms.impl.common.Router.notifyMessageAction(Router.java:402)
at
com.sun.enterprise.ee.cms.impl.common.SignalHandler.analyzeSignal(SignalHandler.java:128)
at
com.sun.enterprise.ee.cms.impl.common.SignalHandler.handleSignal(SignalHandler.java:106)
at
com.sun.enterprise.ee.cms.impl.common.SignalHandler.run(SignalHandler.java:91)
at java.lang.Thread.run(Thread.java:637)



 Comments   
Comment by Joe Fialli [ 30/Apr/10 ]

Was using Grizzly 1.9.19 beta2 when this occurred.

Comment by Joe Fialli [ 07/Oct/10 ]

NPE is fixed in Grizzly transport.





[SHOAL-105] enhance validation of GMS configuration property BIND_INTERFACE_ADDRESS Created: 31/Mar/10  Updated: 07/Oct/10  Resolved: 07/Oct/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 105

 Description   

Configuration of BIND_INTERFACE_ADDRESS should validate that provided
IP-ADDRESS/HOSTNAME is a local IP address.



 Comments   
Comment by Joe Fialli [ 31/Mar/10 ]

Accepting this error check. The system misbehaves and cannot find a valid port
for the server socket when BIND_INTERFACE_ADDRESS is accidentally set to a
non-local IP address. Need to address this for better usability.

Comment by Joe Fialli [ 07/Oct/10 ]

duplicate

*** This issue has been marked as a duplicate of issue 108. ***




[SHOAL-104] [Javadocs] Add referential info to JoinedAndReadyNotificationSignal, and its related Action and ActionFactory on its use Created: 26/Mar/10  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: shreedhar_ganapathy Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Macintosh


Issuezilla Id: 104

 Description   

The JoinedAndReadyNotificationSignal, Action, and ActionFactory javadocs need referential information
that is mentioned in the javadoc for GroupManagementService#reportJoinedAndReadyState.

This will help users understand how to use these constructs in non-GlassFish server
applications.
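
For reference, the kind of end-to-end usage the javadoc cross-reference should point at, sketched for a standalone (non-GlassFish) member; the JoinedAndReadyNotificationActionFactoryImpl class name and the no-argument reportJoinedAndReadyState call are assumptions that may vary by release:

import com.sun.enterprise.ee.cms.core.CallBack;
import com.sun.enterprise.ee.cms.core.GMSFactory;
import com.sun.enterprise.ee.cms.core.GroupManagementService;
import com.sun.enterprise.ee.cms.core.JoinedAndReadyNotificationSignal;
import com.sun.enterprise.ee.cms.core.Signal;
import com.sun.enterprise.ee.cms.impl.client.JoinedAndReadyNotificationActionFactoryImpl;

public class ReadyExample {
    public static void main(String[] args) throws Exception {
        GroupManagementService gms = (GroupManagementService) GMSFactory.startGMSModule(
                "member1", "MyGroup", GroupManagementService.MemberType.CORE, null);

        CallBack onJoinedAndReady = new CallBack() {
            public void processNotification(Signal signal) {
                if (signal instanceof JoinedAndReadyNotificationSignal) {
                    System.out.println("Joined and ready: " + signal.getMemberToken());
                }
            }
        };
        // The ActionFactory/Action pair wraps this callback and hands it the signal.
        gms.addActionFactory(new JoinedAndReadyNotificationActionFactoryImpl(onJoinedAndReady));

        gms.join();
        // Once this member's own services are initialized, report readiness so the
        // other members receive a JoinedAndReadyNotificationSignal for it.
        gms.reportJoinedAndReadyState();
    }
}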






[SHOAL-103] [Javadocs] Add reference info to JoinedAndReadyNotificationSignal, Action and ActionFactory javadocs Created: 26/Mar/10  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: Dhiru Pandey Assignee: Joe Fialli
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Macintosh


Issuezilla Id: 103

 Description   

The JoinedAndReadyNotificationSignal, Action, and ActionFactory javadocs need referential information
that is mentioned in the javadoc for GroupManagementService#reportJoinedAndReadyState.

This will help users understand how to use these constructs in non-GlassFish server
applications.



 Comments   
Comment by shreedhar_ganapathy [ 26/Mar/10 ]

Sorry, I forgot to log out after Dhiru and I were looking into something under his user id, so this bug
is logged as coming from him, but I filed it.

Sorry, Dhiru.

Comment by shreedhar_ganapathy [ 26/Mar/10 ]

Closing as invalid. Will open a new issue under my user id.





[SHOAL-102] [Javadocs] Add reference info to JoinedAndReadyNotificationSignal, Action and ActionFactory javadocs Created: 26/Mar/10  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: Dhiru Pandey Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Macintosh


Issuezilla Id: 102

 Description   

The JoinedAndReadyNotificationSignal, Action, and ActionFactory javadocs need referential information
that is mentioned in the javadoc for GroupManagementService#reportJoinedAndReadyState.

This will help users understand how to use these constructs in non-GlassFish server
applications.



 Comments   
Comment by Joe Fialli [ 26/Mar/10 ]

Agreed. Will update the javadoc accordingly.





[SHOAL-101] very intermittent - ABSTRACT_TRANSPORT BRANCH: dropped Shoal message(using Grizzly transport) in distributed system testing Created: 18/Mar/10  Updated: 07/Oct/10  Resolved: 07/Oct/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Critical
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Text File shoal_bug_101_instance105.log     Text File shoal_bug_101_instance106.log    
Issuezilla Id: 101

 Description   

Running HAMessageBuddyReplicationSimulator (see the shoal workspace developer test
script runHAMessageBuddyReplicationSimulator.sh) on a distributed group of 9
instances, roughly 1 out of 5 full test runs shows the following message drop.

The test confirms a dropped message when the 2 exceptions below occur in the
server logs.

Message test output detecting a dropped message:

Never received objectId:45 msgId:248, from:106
---------------------------------------------------------------
106: FAILED. Confirmed (1) messages were dropped

Here is the matching exception.

[#|2010-03-18T11:18:41.831-0700|WARNING|Shoal|ShoalLogger|_ThreadID=26;_ThreadName=-WorkerThread(31);ClassName=NetworkUtility;MethodName=deserialize;|NetworkUtility.deserialized
current objects:
messages=

{NAD=com.sun.enterprise.ee.cms.impl.base.SystemAdvertisementImpl@e8f7fdef, targetPeerId=192.168.46.109:9130:2299:cluster1:n1c1m9, sourcePeerId=192.168.46.108:9130:2299:cluster1:n1c1m8}

failed while deserializing
name=APPMESSAGE
java.io.StreamCorruptedException: invalid type code: 58
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1356)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1947)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1871)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1753)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
at
com.sun.enterprise.mgmt.transport.NetworkUtility.deserialize(NetworkUtility.java:419)
at
com.sun.enterprise.mgmt.transport.MessageImpl.readMessagesFromBytes(MessageImpl.java:233)
at
com.sun.enterprise.mgmt.transport.MessageImpl.parseMessage(MessageImpl.java:214)
at
com.sun.enterprise.mgmt.transport.grizzly.GrizzlyMessageProtocolParser.hasNextMessage(GrizzlyMessageProtocolParser.java:140)
at
com.sun.grizzly.filter.ParserProtocolFilter.execute(ParserProtocolFilter.java:139)
at
com.sun.grizzly.DefaultProtocolChain.executeProtocolFilter(DefaultProtocolChain.java:135)
at com.sun.grizzly.DefaultProtocolChain.execute(DefaultProtocolChain.java:102)
at com.sun.grizzly.DefaultProtocolChain.execute(DefaultProtocolChain.java:88)
at
com.sun.grizzly.ProtocolChainContextTask.doCall(ProtocolChainContextTask.java:53)
at com.sun.grizzly.SelectionKeyContextTask.call(SelectionKeyContextTask.java:57)
at com.sun.grizzly.NIOContext.execute(NIOContext.java:510)
at
com.sun.grizzly.SelectorHandlerRunner.handleSelectedKey(SelectorHandlerRunner.java:357)
at
com.sun.grizzly.SelectorHandlerRunner.handleSelectedKeys(SelectorHandlerRunner.java:257)
at
com.sun.grizzly.SelectorHandlerRunner.doSelect(SelectorHandlerRunner.java:194)
at com.sun.grizzly.SelectorHandlerRunner.run(SelectorHandlerRunner.java:129)
at
com.sun.grizzly.util.FixedThreadPool$BasicWorker.dowork(FixedThreadPool.java:379)
at
com.sun.grizzly.util.FixedThreadPool$BasicWorker.run(FixedThreadPool.java:360)
at java.lang.Thread.run(Thread.java:619)

#]

Mar 18, 2010 11:18:41 AM
com.sun.enterprise.mgmt.transport.grizzly.GrizzlyMessageProtocolParser
hasNextMessage
WARNING: hasNextMessage()
Thread:-WorkerThread(31),position:6744,nextMsgStartPos:0,expectingMoreData:true,hasMoreBytesToParse:false,error:false,msg
size:5405,message: MessageImpl[v1:CLUSTER_MANAGER_MESSAGE:NAD, Target:
192.168.46.109:9130:2299:cluster1:n1c1m9 , Source:
192.168.46.108:9130:2299:cluster1:n1c1m8,
com.sun.enterprise.mgmt.transport.MessageIOException: failed to deserialize a
message : name = APPMESSAGE

This issue has not been seen when running the checked-in
runHAMessageBuddyReplicationSimulator.sh on a single machine with 10 instances in
the cluster. Will double-check this by running it several times.

Also verifying that there are no message drops when running Shoal over the jxta
transport.



 Comments   
Comment by Joe Fialli [ 18/Mar/10 ]

accepting issue.

Comment by Joe Fialli [ 18/Mar/10 ]

Created an attachment (id=22)
shoal logs running test runHAMessageBuddyReplicationSimulator with exception in grizzly transport layer receiving the message

Comment by Joe Fialli [ 18/Mar/10 ]

Created an attachment (id=23)
Server log of the instance that sent the message lost on instance106; nothing in the log is helpful. Added for completeness to show the issue only appears on the receiving side, with no send error noted.

Comment by Joe Fialli [ 07/Oct/10 ]

The NPE is fixed in grizzly.





[SHOAL-100] Sometimes FailureRecoverySignal is not notified Created: 09/Mar/10  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: carryel Assignee: carryel
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 100

 Description   

When I tested FailureRecovery with GroupLeadershipNotification, I found that
sometimes the signal was not notified.

For FailureRecovery, an appropriate recoverer should be selected. But sometimes
the recoverer was null when I debugged, because of a wrong previousView in
com.sun.enterprise.ee.cms.impl.jxta.ViewWindow.java.

The com.sun.enterprise.ee.cms.impl.common.ViewWindow interface has the following
methods, and com.sun.enterprise.ee.cms.impl.jxta.ViewWindow.java implements them.

public interface ViewWindow {
    List<GMSMember> getPreviousView();
    List<GMSMember> getCurrentView();
    ...
}

But com.sun.enterprise.ee.cms.impl.jxta.ViewWindow.java returns unsafe lists.

See the following code.

In ViewWindow.java:

public List<GMSMember> getPreviousView() {
    List<GMSMember> result = EMPTRY_GMS_MEMBER_LIST;
    synchronized (views) {
        final int INDEX = views.size() - 2;
        if (INDEX >= 0) {
            // views.get(INDEX) is a List and can be shared unexpectedly.
            result = views.get(INDEX);
        }
    }
    return result;
}
...
private void addGroupLeadershipNotificationSignal( ... ) {
    ...
    signals.add(new GroupLeadershipNotificationSignalImpl(token,
            getPreviousView(), getCurrentView(), ...);
}

In GroupLeadershipNotificationSignalImpl.java:

public void release() throws SignalReleaseException {
    if (previousView != null)
        previousView.clear();
    if (currentView != null)
        currentView.clear();
    ...
}

As you can see above, sometimes the view (previous or current) can be shared and
cleared unexpectedly by other components.

Joe proposed a patch which ensures that all manipulations of a view snapshot
after it is constructed are read-only, and this change enforces that. I also
removed the "List.clear()" code in GroupLeadershipNotificationSignalImpl.java
because I think that logic is unnecessary.
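
For illustration, a sketch of the read-only-snapshot approach (using java.util.ArrayList and Collections, and reusing the fields from the ViewWindow excerpt above; the committed patch may differ in detail):

// Sketch only, reusing the fields from the ViewWindow excerpt above.
public List<GMSMember> getPreviousView() {
    List<GMSMember> result = EMPTRY_GMS_MEMBER_LIST;
    synchronized (views) {
        final int INDEX = views.size() - 2;
        if (INDEX >= 0) {
            // Copy and wrap so a signal's release() (or any other caller) cannot
            // clear the list that ViewWindow still shares internally.
            result = Collections.unmodifiableList(new ArrayList<GMSMember>(views.get(INDEX)));
        }
    }
    return result;
}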



 Comments   
Comment by carryel [ 09/Mar/10 ]

This patch is applied to trunk/rev.825





[SHOAL-99] Add OSGi bundle headers to MANIFEST.MF Created: 09/Mar/10  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Improvement Priority: Major
Reporter: nickwi Assignee: sheetalv
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Windows


Issuezilla Id: 99

 Description   

It would be nice to be able to deploy the standard Shoal distribution as an OSGi
bundle. This would require the addition of OSGi headers to the Shoal MANIFEST.MF
file.

This will also require some changes to JXTA (I got it working by removing the
dependency on javax.security.cert.CertificateException and adding OSGi headers),
although it appears from here
(https://jxta.dev.java.net/servlets/ReadMsg?list=dev&msgNo=1384) that JXTA 2.6
may already have OSGi support.
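
For illustration, the kind of headers involved; the symbolic name, version, and package lists below are placeholders, not a vetted manifest for the Shoal jar:

Bundle-ManifestVersion: 2
Bundle-SymbolicName: com.sun.enterprise.shoal.gms
Bundle-Name: Shoal GMS
Bundle-Version: 1.1.0
Export-Package: com.sun.enterprise.ee.cms.core
Import-Package: net.jxta.peergroup;resolution:=optional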



 Comments   
Comment by sheetalv [ 09/Mar/10 ]

In the branch SHOAL_1_1_ABSTRACTING_TRANSPORT, Shoal can be built as an OSGi module over Grizzly
as the transport layer. There were some issues with making Shoal OSGi-fied over JXTA. We are in the
evaluation stage for making Shoal over JXTA 2.6 OSGi-fied.





[SHOAL-98] sending a message that exceeds message limit causes exception in receiver Created: 22/Jan/10  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: Stephen DiMilla Assignee: Joe Fialli
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 98

 Description   

Using the branch SHOAL_1_1_ABSTRACTING_TRANSPORT with grizzly as the transport,
if you send a message with a payload that exceeds the maximum message size, the
receiver of the message throws the following exception:

[#|2010-01-22T09:23:52.274-0500|INFO|Shoal|ShoalLogger|_ThreadID=16;_ThreadName=com.sun.enterprise.ee.cms.impl.common.Router
Thread;ClassName=SignalHandler;MethodName=run;|SignalHandler task named
com.sun.enterprise.ee.cms.impl.common.Router Thread exiting|#]

Jan 22, 2010 9:23:52 AM
com.sun.enterprise.mgmt.transport.grizzly.GrizzlyMessageProtocolParser
hasNextMessage
INFO: hasNextMessage()
Thread:-WorkerThread(6),position:8192,nextMsgStartPos:0,expectingMoreData:false,hasMoreBytesToParse:false,error:false,msg
size:9585,message: MessageImpl[v1:CLUSTER_MANAGER_MESSAGE:{}]
java.lang.Exception: too large message
at
com.sun.enterprise.mgmt.transport.grizzly.GrizzlyMessageProtocolParser.hasNextMessage(GrizzlyMessageProtocolParser.java:127)
at
com.sun.grizzly.filter.ParserProtocolFilter.execute(ParserProtocolFilter.java:139)
at
com.sun.grizzly.DefaultProtocolChain.executeProtocolFilter(DefaultProtocolChain.java:135)
at
com.sun.grizzly.DefaultProtocolChain.execute(DefaultProtocolChain.java:102)
at
com.sun.grizzly.DefaultProtocolChain.execute(DefaultProtocolChain.java:88)
at
com.sun.grizzly.ProtocolChainContextTask.doCall(ProtocolChainContextTask.java:53)
at
com.sun.grizzly.SelectionKeyContextTask.call(SelectionKeyContextTask.java:57)
at com.sun.grizzly.NIOContext.execute(NIOContext.java:510)
at
com.sun.grizzly.SelectorHandlerRunner.handleSelectedKey(SelectorHandlerRunner.java:357)
at
com.sun.grizzly.SelectorHandlerRunner.handleSelectedKeys(SelectorHandlerRunner.java:257)
at
com.sun.grizzly.SelectorHandlerRunner.doSelect(SelectorHandlerRunner.java:194)
at com.sun.grizzly.SelectorHandlerRunner.run(SelectorHandlerRunner.java:129)
at
com.sun.grizzly.util.FixedThreadPool$BasicWorker.dowork(FixedThreadPool.java:379)
at
com.sun.grizzly.util.FixedThreadPool$BasicWorker.run(FixedThreadPool.java:360)
at java.lang.Thread.run(Thread.java:637)

The code used to do this is:

int size = 65000;
byte[] bArray = new byte[size];
bArray[0] = 'a';
int k = 1;
for (; k < size - 1; k++) {
    bArray[k] = 'X';
}
bArray[k] = 'z';
try {
    gh.sendMessage("TestComponent", bArray);
} catch (GMSException ge) {
}
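
As a defensive sender-side sketch (the 8 KB limit is an assumption for illustration, not the actual transport maximum; the eventual fix, noted in the comments below, makes sendMessage itself reject oversized payloads):

import com.sun.enterprise.ee.cms.core.GMSException;
import com.sun.enterprise.ee.cms.core.GroupHandle;

final class PayloadGuard {
    // Assumed limit for illustration; the real limit depends on the transport configuration.
    static final int ASSUMED_MAX_PAYLOAD = 8192;

    static void sendIfItFits(GroupHandle gh, String targetComponent, byte[] payload)
            throws GMSException {
        if (payload.length > ASSUMED_MAX_PAYLOAD) {
            throw new IllegalArgumentException("payload of " + payload.length
                    + " bytes exceeds the assumed limit of " + ASSUMED_MAX_PAYLOAD);
        }
        gh.sendMessage(targetComponent, payload);
    }
}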



 Comments   
Comment by Joe Fialli [ 05/Feb/10 ]

Throw MessageIOException when member calls sendMessage on a message that exceeds
message limit.

Comment by Joe Fialli [ 05/Feb/10 ]

Fix checked into the abstract transport branch. This bug existed only in that branch.





[SHOAL-97] GroupHandle.sendMessage with Null targetComponentName results in exception being thrown Created: 21/Jan/10  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: Stephen DiMilla Assignee: Joe Fialli
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 97

 Description   

Based on the javadoc, "Specifying a null component name would result in the
message being delivered to all registered components in the target member instance."

When actually done via:
gh.sendMessage(null, "testMsg3".getBytes());

The receiving side gets the following exception:

[#|2010-01-21T14:55:23.953-0500|SEVERE|Shoal|ShoalLogger|_ThreadID=14;_ThreadName=com.sun.enterprise.ee.cms.impl.common.Router
Thread;ClassName=SignalHandler;MethodName=run;|exiting due unhandled exception
in thread com.sun.enterprise.ee.cms.impl.common.Router Thread
java.lang.NullPointerException
at java.util.Hashtable.get(Hashtable.java:334)
at
com.sun.enterprise.ee.cms.impl.common.Router.notifyMessageAction(Router.java:356)
at
com.sun.enterprise.ee.cms.impl.common.SignalHandler.analyzeSignal(SignalHandler.java:128)
at
com.sun.enterprise.ee.cms.impl.common.SignalHandler.handleSignal(SignalHandler.java:106)
at
com.sun.enterprise.ee.cms.impl.common.SignalHandler.run(SignalHandler.java:91)
at java.lang.Thread.run(Thread.java:637)

#]


 Comments   
Comment by Joe Fialli [ 05/Feb/10 ]

Fix checked into trunk and pluggable transport branch.

Sending to null targetComponent is no longer allowed.
Updated javadoc to reflect this change.

Comment by Joe Fialli [ 05/Feb/10 ]

*** Issue 85 has been marked as a duplicate of this issue. ***




[SHOAL-96] Re-joining a group throws a NPE exception. Created: 19/Jan/10  Updated: 07/Oct/10  Resolved: 07/Oct/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: dhcavalcanti Assignee: shreedhar_ganapathy
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Linux


Issuezilla Id: 96

 Description   

If a peer joins a group, then leaves the group, and then joins the group again,
the Shoal framework throws a NullPointerException.

Here is the code that produces the problem:

package sample.shoal;

import com.sun.enterprise.ee.cms.core.GMSConstants.shutdownType;
import com.sun.enterprise.ee.cms.core.GMSFactory;
import com.sun.enterprise.ee.cms.core.GroupManagementService;

public class Sample {

private static final String GROUP_NAME = "MyGroup";
private static final String PEER_NAME = "Peer";
private static final int WAIT_PERIOD = 7000;

public static void main(String[] args) throws Exception {
    GroupManagementService gms = (GroupManagementService) GMSFactory.startGMSModule(
            PEER_NAME, GROUP_NAME, GroupManagementService.MemberType.CORE, null);
    gms.join();
    Thread.sleep(WAIT_PERIOD);
    gms.shutdown(shutdownType.INSTANCE_SHUTDOWN);
    Thread.sleep(WAIT_PERIOD);
    gms.join();
    Thread.sleep(WAIT_PERIOD);
    gms.shutdown(shutdownType.INSTANCE_SHUTDOWN);
    Thread.sleep(WAIT_PERIOD);
    System.exit(0);
}

}

And here is the output log:

Jan 19, 2010 3:52:50 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group MyGroup : Members in view for
MASTER_CHANGE_EVENT(before change analysis) are :
1: MemberId: Peer, MemberType: CORE, Address: urn:jxta:uuid-
59616261646162614A7874615032503384B6AF8DCB384ABA9DB5B80617CF675D03

Jan 19, 2010 3:52:50 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT for Member: Peer of Group: MyGroup
Jan 19, 2010 3:52:50 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
addGroupLeadershipNotificationSignal
INFO: adding GroupLeadershipNotification signal leaderMember: Peer of group:
MyGroup
Jan 19, 2010 3:52:55 PM com.sun.enterprise.jxtamgmt.ClusterViewManager addToView
WARNING: no changes from previous view, skipping notification of listeners for
cluster view event MASTER_CHANGE_EVENT from member: Peer group: MyGroup
Jan 19, 2010 3:52:55 PM com.sun.enterprise.jxtamgmt.MasterNode appointMasterNode
INFO: Assuming Master Node designation member:Peer for group:MyGroup
Jan 19, 2010 3:52:57 PM com.sun.enterprise.ee.cms.impl.jxta.GMSContext leave
INFO: Leaving GMS group MyGroup with shutdown type set to InstanceShutdown
Exception in thread "MessageWindowThread:MyGroup" java.lang.NullPointerException
at
com.sun.enterprise.ee.cms.impl.jxta.MessageWindow.run(MessageWindow.java:86)
at java.lang.Thread.run(Thread.java:619)
Jan 19, 2010 3:53:04 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group MyGroup : Members in view for
MASTER_CHANGE_EVENT(before change analysis) are :
1: MemberId: Peer, MemberType: CORE, Address: urn:jxta:uuid-
59616261646162614A7874615032503384B6AF8DCB384ABA9DB5B80617CF675D03

Jan 19, 2010 3:53:04 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT for Member: Peer of Group: MyGroup
Jan 19, 2010 3:53:04 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
addGroupLeadershipNotificationSignal
INFO: adding GroupLeadershipNotification signal leaderMember: Peer of group:
MyGroup
Jan 19, 2010 3:53:09 PM com.sun.enterprise.jxtamgmt.ClusterViewManager addToView
WARNING: no changes from previous view, skipping notification of listeners for
cluster view event MASTER_CHANGE_EVENT from member: Peer group: MyGroup
Jan 19, 2010 3:53:09 PM com.sun.enterprise.jxtamgmt.MasterNode appointMasterNode
INFO: Assuming Master Node designation member:Peer for group:MyGroup
Jan 19, 2010 3:53:11 PM com.sun.enterprise.ee.cms.impl.jxta.GMSContext leave
INFO: Leaving GMS group MyGroup with shutdown type set to InstanceShutdown



 Comments   
Comment by Joe Fialli [ 07/Oct/10 ]

Fixed.

Added a junit regression test in GroupManagementServiceImplTest for this scenario.





[SHOAL-95] Stale ClusterView in Master due to no HeatlhMessages Created: 05/Jan/10  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 95

 Description   

Issue was initially reported to shoal.dev.java.net by CC'ed email address.
Related email: https://shoal.dev.java.net/servlets/ReadMsg?list=dev&msgNo=233
The issue is an edge case occurring during cluster startup.

The problem reported: when starting 20 instances for a cluster, it was sometimes
observed that an instance was in the Master's view but the instance did not
exist, and the instance was not in the HealthMonitor, so the Master never
attempted to check whether the instance existed. The Master continued to
propagate the existence of this non-existent member and never performed heartbeat
failure detection to verify its existence or non-existence. This is the bug that
will be addressed when this issue is resolved: the Master will always perform
heartbeat failure detection for all instances in its clusterview. This will be
done by synchronizing the HealthMonitor's knowledge of each instance in the
clusterview.

The hypothesis is that an instance failed during startup after sending out its
MasterNodeQuery but before sending its first HealthMessage of STARTING.

Here is an observed WARNING message in the server log when this occurs.
This error message is consistent with what one would see when sending to an
instance that just failed. Since Shoal heartbeat failure detection takes some
time to detect a FAILURE (about 7-8 seconds with default values), this is not by
itself a concern. However, if this message is seen beyond the roughly 8-second
window it should take to detect the failure, then it is a concern.
>> WARNING: ClusterManager.send : sending of message
>> net.jxta.endpoint.Message@11882231(2)

{270}

failed. Unable to create an
>> OutputPipe for
>> urn:jxta:uuid-59616261646162614A787461503250335FDDDB9470DA4390A3E692268159961303
>> route = null
>> java.io.IOException: Unable to create a messenger to
>>
jxta://uuid-59616261646162614A787461503250335FDDDB9470DA4390A3E692268159961303/PipeService/urn:jxta:uuid-63B5938B46F147609C1C998286EA5F3B6E0638B5DF604AEEAC09A3FAE829FBE804

>>
>> at
>>
net.jxta.impl.pipe.BlockingWireOutputPipe.checkMessenger(BlockingWireOutputPipe.java:238)

>>
>> at
>>
net.jxta.impl.pipe.BlockingWireOutputPipe.<init>(BlockingWireOutputPipe.java:154)
>>
>> at
>>
net.jxta.impl.pipe.BlockingWireOutputPipe.<init>(BlockingWireOutputPipe.java:135)
>>
>> at
>> net.jxta.impl.pipe.PipeServiceImpl.createOutputPipe(PipeServiceImpl.java:503)
>>
>> at
>> net.jxta.impl.pipe.PipeServiceImpl.createOutputPipe(PipeServiceImpl.java:435)
>>
>> at
>>
net.jxta.impl.pipe.PipeServiceInterface.createOutputPipe(PipeServiceInterface.java:170)

>>
>> at
>> com.sun.enterprise.jxtamgmt.ClusterManager.send(ClusterManager.java:505)
>> at
>>
com.sun.enterprise.ee.cms.impl.jxta.GroupCommunicationProviderImpl.sendMessage(GroupCommunicationProviderImpl.java:254)

>>
>> at
>>
com.sun.enterprise.ee.cms.impl.jxta.DistributedStateCacheImpl.sendMessage(DistributedStateCacheImpl.java:500)

>>
>> at
>>
com.sun.enterprise.ee.cms.impl.jxta.DistributedStateCacheImpl.addToRemoteCache(DistributedStateCacheImpl.java:234)

>>

Note: this issue does not impact Shoal/GMS in the GlassFish/Sailfin Application
Server. If the App Server fails during startup, it always catches the exception
and sends a planned shutdown notification to the cluster.



 Comments   
Comment by Joe Fialli [ 06/Jan/10 ]

Fix checked into the trunk and the transport branch.
A FINE log message in ClusterViewManager.add (addToView) verified the fix was working.





[SHOAL-94] IndexOutOfBoundException setting VIRTUAL_MULTICAST_URI_LIST to one uri Created: 14/Dec/09  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Minor
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 94

 Description   

An IndexOutOfBoundsException occurred on the following line in NetworkManager.java, line 184:

LOG.config("VIRTUAL_MULTICAST_URI_LIST=" + virtualMulticastURIList +
        " rendezvousSeedURIs.get(0)=" + rendezvousSeedURIs.get(1));

The workaround is to set VIRTUAL_MULTICAST_URI_LIST to have two items in it:
merely add a "," and then replicate the original URI.

The code fix is to change "1" to "0" in the rendezvousSeedURIs.get() call above.

[#|2009-12-11T23:28:40.949+0000|SEVERE|sun-glassfish-comms-server2.0|javax.enterprise.system.core|_ThreadID=10;_ThreadName=main;_RequestID=065c7edb-2ec7-4b93-aa04-92886980678e;|CORE5071:
An error occured during initializationjavax.management.RuntimeMBeanException:
java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrow(DefaultMBeanServerInterceptor.java:856)
at
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrowMaybeMBeanException(DefaultMBeanServerInterceptor.java:869)
at
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:838)
at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.sun.enterprise.admin.util.proxy.ProxyClass.invoke(ProxyClass.java:90)
at $Proxy1.invoke(Unknown Source)
at
com.sun.enterprise.admin.server.core.jmx.SunoneInterceptor.invoke(SunoneInterceptor.java:304)
at
com.sun.enterprise.interceptor.DynamicInterceptor.invoke(DynamicInterceptor.java:170)
at
com.sun.enterprise.ee.cms.lifecycle.GMSLifecycleImpl.onInitialization(GMSLifecycleImpl.java:123)
at
com.sun.enterprise.server.ApplicationServer.onInitialization(ApplicationServer.java:265)
at
com.sun.enterprise.server.ondemand.OnDemandServer.onInitialization(OnDemandServer.java:103)
at com.sun.enterprise.server.PEMain.run(PEMain.java:399)
at com.sun.enterprise.server.PEMain.main(PEMain.java:336)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597) at
com.sun.enterprise.server.PELaunch.main(PELaunch.java:415)Caused by:
java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at
java.util.ArrayList.RangeCheck(ArrayList.java:547) at
java.util.ArrayList.get(ArrayList.java:322) at
com.sun.enterprise.jxtamgmt.NetworkManager.<init>(NetworkManager.java:184)
at com.sun.enterprise.jxtamgmt.ClusterManager.<init>(ClusterManager.java:138)
at
com.sun.enterprise.ee.cms.impl.jxta.GroupCommunicationProviderImpl.initializeGroupCommunicationProvider(GroupCommunicationProviderImpl.java:154)
at com.sun.enterprise.ee.cms.impl.jxta.GMSContext.join(GMSContext.java:145)
at
com.sun.enterprise.ee.cms.impl.common.GroupManagementServiceImpl.join(GroupManagementServiceImpl.java:367)
at
com.sun.enterprise.ee.admin.mbeans.GMSClientMBeanHelper.initGMSGroupForNamedCluster(GMSClientMBeanHelper.java:123)
at
com.sun.enterprise.ee.admin.mbeans.GMSClientMBean.initGMSGroupForNamedCluster(GMSClientMBean.java:157)
at
com.sun.enterprise.ee.admin.mbeans.GMSClientMBean.initGMSGroupForAllClusters(GMSClientMBean.java:140)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597) at
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
at
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
at
com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
at javax.management.StandardMBean.invoke(StandardMBean.java:391) at
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
... 19 more



 Comments   
Comment by Joe Fialli [ 14/Dec/09 ]

The regression was introduced by a putback on 6/19/2009.
The fix is well understood and detailed in the initial bug submission.

Comment by Joe Fialli [ 14/Dec/09 ]

Fixes checked into main shoal trunk and into shoal abstract transport branch.

Still needs to be integrated into sailfin 2.0.x and glassfish v2.1.1.x, where x
is the next patch after current released bits.





[SHOAL-93] missing FailureNotificationSignal during network failure when non-master is isolated Created: 05/Sep/09  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: little_zizou Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Windows


Attachments: Text File Scenario1.zip     Text File Scenario2.zip    
Issuezilla Id: 93

 Description   

I have been trying to use Shoal with my application. Assume I have a cluster-like
setup with four nodes running on four different systems. If one of the nodes
suddenly goes out of the network and it is not the master node, I get three
FailureSuspectedSignals but not all three FailureNotificationSignals. If the node
which went out of the network was the master node, then I get three
FailureSuspectedSignals and three FailureNotificationSignals. Shouldn't it behave
that way in the first case as well?



 Comments   
Comment by Joe Fialli [ 08/Sep/09 ]

More information is necessary to research this issue.

1. Please describe what is meant by "out of network".
Is the network cable being pulled from the machine?

2. We have a shoal qe test that verifies that all failure notifications
are sent to surviving group members when a non-master node is killed
(via kill -9). The test verifies that all FAILURE notifications are sent.
(Tests are run on the main branch of shoal. Please confirm you are running
these tests.)

Please submit logs (by attaching a zip of log files) that illustrate your
issue. FINE-level logging would be sufficient to follow what is occurring.

Comment by little_zizou [ 14/Sep/09 ]

Created an attachment (id=20)
Scenario 1 Testcase

Comment by little_zizou [ 14/Sep/09 ]

> More information is necessary to research this issue.
>
> 1. Please describe what is meant by "out of network".
> Is the network cable being pulled from the machine?

I disabled my LAN connection to simulate a network-failure scenario
(similar to unplugging the network cable).

> 2. We have a shoal qe test that verifies that all failure notifications
> ... Please confirm you are running these tests.)

I have not run the tests which you have mentioned but, instead I have written my
own test cases to verify joining nodes to the network and processing failure
notifications.

TestCase Description:
We have 3 systems with 3 shoal clients (Client1, Client2 & Client3), each
client running on a different system with member token names as server1, server2
and server3 respectively, all in the same group.

Scenario 1:
server2 and server3 are started before server1. Now when we disable the network
on server1, I see 2 FailureSuspectedSignals and 2 FailureNotificationSignals
(for server2 and server3 respectively), as expected.

Scenario 2:
Now we have 3 clients running on 3 different systems, but the member token that
previously joined the group as "server1" is renamed to "server5".

The systems are started just like in the previous case: server2 and server3 are
started before server5, and then the LAN is disabled on server5. This time I see
2 FailureSuspectedSignals, but only one FailureNotificationSignal.

I have attached the test sources and logs of both Scenario1 and Scenario2 for
your reference.

Comment by little_zizou [ 14/Sep/09 ]

Created an attachment (id=21)
Scenario 2 TestCase

Comment by Joe Fialli [ 14/Sep/09 ]

Issue understood.

The code in question handles a detected master failure (masterFailed) and the
fact that only the new master is allowed to announce the failure.

private void assignAndReportFailure(final HealthMessage.Entry entry) {
    <deleted non-relevant code>
    final boolean masterFailed = (masterNode.getMasterNodeID()).equals(entry.id);
    if (masterNode.isMaster() && masterNode.isMasterAssigned()) {
        <deleted non-relevant code>
    } else if (masterFailed) {
        // remove the failed node
        LOG.log(Level.FINE, MessageFormat.format(
                "Master Failed. Removing System Advertisement : {0} for master named {1}",
                entry.id.toString(), entry.adv.getName()));
        manager.getClusterViewManager().remove(entry.adv);
        masterNode.resetMaster();
        masterNode.appointMasterNode();
        if (masterNode.isMaster() && masterNode.isMasterAssigned()) {
            LOG.log(Level.FINE, MessageFormat.format(
                    "Announcing Failure Event of {0} for name {1} ...",
                    entry.id, entry.adv.getName()));
            final ClusterViewEvent cvEvent =
                    new ClusterViewEvent(ClusterViewEvents.FAILURE_EVENT, entry.adv);
            masterNode.viewChanged(cvEvent);
        }
    }
    cleanAllCaches(entry);
}

To avoid multiple reports of a FAILURE, only the master is typically allowed to
report a failure to the rest of the cluster. For Scenario 2, when the network LAN
is disabled on "server5", the reporter of this issue is looking for failure
events for both "server2" and "server3". While heartbeat failure detection does
detect that both server2 and server3 have failed (from server5's point of view;
they are both still running in their own subnet) in the submitted logs for
Scenario 2, the failure is not reported for server2 since server3 is calculated
to be the new master for server5. Unfortunately, server3 also cannot communicate
with "server5", hence the missing announcement of server2's failure. When
"server3" is detected to have failed, server5 is the sole instance left in its
subnet of the cluster, so it becomes the master and reports that server3 has
failed.

To summarize, heartbeat failure detection is working correctly. "server5"'s view
of the cluster is correct; only the failure notification for "server2" is missing
in this scenario. The reason for the missing failure notification is in the code
fragment included above.

Comment by Joe Fialli [ 14/Sep/09 ]

started analysis of issue from submitted logs.
see previous comments made when reassigning issue to myself.

Comment by Joe Fialli [ 14/Sep/09 ]

Summary of the issue reported for Scenario 2, submitted on Sept 14th.

When the network LAN fails for a non-master instance of a group, the submitter of
this issue expects to receive a FAILURE notification for each instance on the
isolated subnet that is no longer reachable.

Shoal's heartbeat failure detection does detect that the instances no longer
exist; however, the isolated instance will not receive any failure notifications
about the no-longer-reachable members of the group until it finally makes itself
the master node.

For the submitted Scenario 1, "server1" becomes the master node after "server2"
is no longer reachable, so no FAILURE events are dropped for that scenario. Even
though "server1" was not the master before the LAN was disabled, "server1" is
made the master node for its subnet of one immediately, due to name comparisons
between it and the other remaining server names in the GMS group.




[SHOAL-92] jxta class loading issue using IBM JDK 1.6 Created: 20/Aug/09  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: Linux
Platform: Linux


Issuezilla Id: 92

 Description   

The reported issue occurs with IBM JDK 1.6 on Linux, AIX and Windows.
One can work around it by using IBM JDK 1.5.

Executive summary of the bug in IBM JDK 1.6 from Bongjae (it does not include how
this breaks jxta):

Given the following test code fragment:

Package javaLangPackage = Package.getPackage( "java.lang" );
System.out.println( javaLangPackage.getSpecificationVersion() );
---------------------

In Sun JDK6
---------------------
1.6
---------------------

In IBM JDK6
---------------------
null
---------------------

So Package#isCompatibleWith() can throw a "java.lang.NumberFormatException: Empty
version string" exception on IBM JDK 6, because the specification version can be null.

  • The current CVS jxta.jar works well with Sun JDK 1.5, Sun JDK 1.6 and IBM JDK 1.5.
  • I tested this case on Windows and Linux. Both returned the same error when I
    used IBM JDK 1.6.

Complete details provided by Bongjae:

I tried to test the current Shoal version with IBM JDK 1.6, but GMS failed to
join the group.

Here is the error log.
---------------------
2009. 3. 21 오후 6:21:17 com.sun.enterprise.jxtamgmt.JxtaUtil configureJxtaLogging
CONFIG: gms configureJxtaLogging: set jxta logging to default of SEVERE
2009. 3. 21 오후 6:21:18 com.sun.enterprise.jxtamgmt.NetworkManager initWPGF
CONFIG: initWPGF
storeHome=/home/bluewolf/project/jeus7trunk/target/jeus/domains/dvt/data/gms/dvt
2009. 3. 21 오후 6:21:18 com.sun.enterprise.jxtamgmt.NetworkManager <init>
SEVERE: Could not locate World PeerGroup Module Implementation.
Throwable occurred: net.jxta.exception.PeerGroupException: Could not locate
World PeerGroup Module Implementation.
at
net.jxta.peergroup.WorldPeerGroupFactory.getDefaultWorldPeerGroupClass(WorldPeerGroupFactory.java:244)
at net.jxta.peergroup.WorldPeerGroupFactory.<init>(WorldPeerGroupFactory.java:178)
at com.sun.enterprise.jxtamgmt.NetworkManager.initWPGF(NetworkManager.java:623)
at com.sun.enterprise.jxtamgmt.NetworkManager.<init>(NetworkManager.java:213)
at com.sun.enterprise.jxtamgmt.ClusterManager.<init>(ClusterManager.java:133)
at
com.sun.enterprise.ee.cms.impl.jxta.GroupCommunicationProviderImpl.initializeGroupCommunicationProvider(GroupCommunicationProviderImpl.java:138)
at com.sun.enterprise.ee.cms.impl.jxta.GMSContext.join(GMSContext.java:123)
at
com.sun.enterprise.ee.cms.impl.common.GroupManagementServiceImpl.join(GroupManagementServiceImpl.java:347)
...
---------------------

When I reviewed the [Shoal-Users] mailing list, I learned that the "Could not
locate World PeerGroup Module Implementation" message was related to the JDK
version and the jxta platform.

So I tried to test jxta.jar on its own.
---------------------
D:\>java -version
java version "1.6.0"
Java(TM) SE Runtime Environment (build pwi3260sr1-20080416_01(SR1))
IBM J9 VM (build 2.4, J2RE 1.6.0 IBM J9 2.4 Windows XP x86-32
jvmwi3260-20080415_18762 (JIT enabled,
AOT enabled)
J9VM - 20080415_018762_lHdSMr
JIT - r9_20080415_1520
GC - 20080415_AA)
JCL - 20080412_01

D:\>java -jar jxta.jar
Starting the JXTA platform in mode : EDGE
2009. 3. 23 오전 11:22:28 net.jxta.platform.NetworkManager configure
INFO: Created new configuration. mode = EDGE
2009. 3. 23 오전 11:22:28 net.jxta.platform.NetworkManager startNetwork
INFO: Starting JXTA Network! MODE = EDGE, HOME = file:/D:/.cache/BootEdge/
2009. 3. 23 오전 11:22:28 net.jxta.impl.peergroup.StdPeerGroup isCompatible
WARNING: Failure handling compatibility statement
Throwable occurred: java.lang.NumberFormatException: Empty version string
at java.lang.Package.isCompatibleWith(Package.java:223)
at net.jxta.impl.peergroup.StdPeerGroup.isCompatible(StdPeerGroup.java:414)
at
net.jxta.impl.peergroup.GenericPeerGroup$1.compatible(GenericPeerGroup.java:131)
at net.jxta.impl.loader.RefJxtaLoader.findClass(RefJxtaLoader.java:254)
at
net.jxta.impl.loader.RefJxtaLoader.findModuleImplAdvertisement(RefJxtaLoader.java:350)
at
net.jxta.peergroup.WorldPeerGroupFactory.getDefaultWorldPeerGroupClass(WorldPeerGroupFact
ory.java:241)
at
net.jxta.peergroup.WorldPeerGroupFactory.<init>(WorldPeerGroupFactory.java:178)
at
net.jxta.peergroup.NetPeerGroupFactory.<init>(NetPeerGroupFactory.java:204)
at net.jxta.platform.NetworkManager.startNetwork(NetworkManager.java:410)
at net.jxta.impl.peergroup.Boot.main(Boot.java:139)
Uncaught Throwable caught by 'main':
net.jxta.exception.PeerGroupException: Could not locate World PeerGroup Module
Implementation.
at
net.jxta.peergroup.WorldPeerGroupFactory.getDefaultWorldPeerGroupClass(WorldPeerGroupFact
ory.java:244)
at
net.jxta.peergroup.WorldPeerGroupFactory.<init>(WorldPeerGroupFactory.java:178)
at
net.jxta.peergroup.NetPeerGroupFactory.<init>(NetPeerGroupFactory.java:204)
at net.jxta.platform.NetworkManager.startNetwork(NetworkManager.java:410)
at net.jxta.impl.peergroup.Boot.main(Boot.java:139)

D:\>
---------------------

I could see that this error was related to the Package.isCompatibleWith() method.

Here is my test code.

---------------------
Package javaLangPackage = Package.getPackage( "java.lang" );
System.out.println( javaLangPackage.getSpecificationVersion() );
---------------------

In Sun JDK6
---------------------
1.6
---------------------

In IBM JDK6
---------------------
null
---------------------

So Package#isCompatibleWith() can throw a "java.lang.NumberFormatException: Empty
version string" exception on IBM JDK 6, because the specification version can be null.

  • The current CVS jxta.jar works well with Sun JDK 1.5, Sun JDK 1.6 and IBM JDK 1.5.
  • I tested this case on Windows and Linux. Both returned the same error when I
    used IBM JDK 1.6.


 Comments   
Comment by Joe Fialli [ 21/Aug/09 ]

duplicate of 91

*** This issue has been marked as a duplicate of 91 ***




[SHOAL-91] jxta class loading issue using IBM JDK 1.6 Created: 20/Aug/09  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: Linux
Platform: Linux


Issuezilla Id: 91

 Description   

The reported issue occurs with IBM JDK 1.6 on Linux, AIX and Windows.
One can work around it by using IBM JDK 1.5.

Executive summary of the bug in IBM JDK 1.6 from Bongjae (it does not include how
this breaks jxta):

Given the following test code fragment:

Package javaLangPackage = Package.getPackage( "java.lang" );
System.out.println( javaLangPackage.getSpecificationVersion() );
---------------------

In Sun JDK6
---------------------
1.6
---------------------

In IBM JDK6
---------------------
null
---------------------

So Package#isCompatibleWith() can throw a "java.lang.NumberFormatException: Empty
version string" exception on IBM JDK 6, because the specification version can be null.

  • The current CVS jxta.jar works well with Sun JDK 1.5, Sun JDK 1.6 and IBM JDK 1.5.
  • I tested this case on Windows and Linux. Both returned the same error when I
    used IBM JDK 1.6.

Complete details provided by Bongjae:

I tried to test the current Shoal version with IBM JDK 1.6, but GMS failed to
join the group.

Here is the error log.
---------------------
2009. 3. 21 오후 6:21:17 com.sun.enterprise.jxtamgmt.JxtaUtil configureJxtaLogging
CONFIG: gms configureJxtaLogging: set jxta logging to default of SEVERE
2009. 3. 21 오후 6:21:18 com.sun.enterprise.jxtamgmt.NetworkManager initWPGF
CONFIG: initWPGF
storeHome=/home/bluewolf/project/jeus7trunk/target/jeus/domains/dvt/data/gms/dvt
2009. 3. 21 오후 6:21:18 com.sun.enterprise.jxtamgmt.NetworkManager <init>
SEVERE: Could not locate World PeerGroup Module Implementation.
Throwable occurred: net.jxta.exception.PeerGroupException: Could not locate
World PeerGroup Module Implementation.
at
net.jxta.peergroup.WorldPeerGroupFactory.getDefaultWorldPeerGroupClass(WorldPeerGroupFactory.java:244)
at net.jxta.peergroup.WorldPeerGroupFactory.<init>(WorldPeerGroupFactory.java:178)
at com.sun.enterprise.jxtamgmt.NetworkManager.initWPGF(NetworkManager.java:623)
at com.sun.enterprise.jxtamgmt.NetworkManager.<init>(NetworkManager.java:213)
at com.sun.enterprise.jxtamgmt.ClusterManager.<init>(ClusterManager.java:133)
at
com.sun.enterprise.ee.cms.impl.jxta.GroupCommunicationProviderImpl.initializeGroupCommunicationProvider(GroupCommunicationProviderImpl.java:138)
at com.sun.enterprise.ee.cms.impl.jxta.GMSContext.join(GMSContext.java:123)
at
com.sun.enterprise.ee.cms.impl.common.GroupManagementServiceImpl.join(GroupManagementServiceImpl.java:347)
...
---------------------

When I reviewed the [Shoal-Users] mailing list, I learned that the "Could not
locate World PeerGroup Module Implementation" message was related to the JDK
version and the jxta platform.

So I tried to test jxta.jar on its own.
---------------------
D:\>java -version
java version "1.6.0"
Java(TM) SE Runtime Environment (build pwi3260sr1-20080416_01(SR1))
IBM J9 VM (build 2.4, J2RE 1.6.0 IBM J9 2.4 Windows XP x86-32
jvmwi3260-20080415_18762 (JIT enabled,
AOT enabled)
J9VM - 20080415_018762_lHdSMr
JIT - r9_20080415_1520
GC - 20080415_AA)
JCL - 20080412_01

D:\>java -jar jxta.jar
Starting the JXTA platform in mode : EDGE
2009. 3. 23 오전 11:22:28 net.jxta.platform.NetworkManager configure
INFO: Created new configuration. mode = EDGE
2009. 3. 23 오전 11:22:28 net.jxta.platform.NetworkManager startNetwork
INFO: Starting JXTA Network! MODE = EDGE, HOME = file:/D:/.cache/BootEdge/
2009. 3. 23 오전 11:22:28 net.jxta.impl.peergroup.StdPeerGroup isCompatible
WARNING: Failure handling compatibility statement
Throwable occurred: java.lang.NumberFormatException: Empty version string
at java.lang.Package.isCompatibleWith(Package.java:223)
at net.jxta.impl.peergroup.StdPeerGroup.isCompatible(StdPeerGroup.java:414)
at
net.jxta.impl.peergroup.GenericPeerGroup$1.compatible(GenericPeerGroup.java:131)
at net.jxta.impl.loader.RefJxtaLoader.findClass(RefJxtaLoader.java:254)
at
net.jxta.impl.loader.RefJxtaLoader.findModuleImplAdvertisement(RefJxtaLoader.java:350)
at
net.jxta.peergroup.WorldPeerGroupFactory.getDefaultWorldPeerGroupClass(WorldPeerGroupFact
ory.java:241)
at
net.jxta.peergroup.WorldPeerGroupFactory.<init>(WorldPeerGroupFactory.java:178)
at
net.jxta.peergroup.NetPeerGroupFactory.<init>(NetPeerGroupFactory.java:204)
at net.jxta.platform.NetworkManager.startNetwork(NetworkManager.java:410)
at net.jxta.impl.peergroup.Boot.main(Boot.java:139)
Uncaught Throwable caught by 'main':
net.jxta.exception.PeerGroupException: Could not locate World PeerGroup Module
Implementation.
at
net.jxta.peergroup.WorldPeerGroupFactory.getDefaultWorldPeerGroupClass(WorldPeerGroupFact
ory.java:244)
at
net.jxta.peergroup.WorldPeerGroupFactory.<init>(WorldPeerGroupFactory.java:178)
at
net.jxta.peergroup.NetPeerGroupFactory.<init>(NetPeerGroupFactory.java:204)
at net.jxta.platform.NetworkManager.startNetwork(NetworkManager.java:410)
at net.jxta.impl.peergroup.Boot.main(Boot.java:139)

D:\>
---------------------

I could see that this error was related to the Package.isCompatibleWith() method.

Here is my test code.

---------------------
Package javaLangPackage = Package.getPackage( "java.lang" );
System.out.println( javaLangPackage.getSpecificationVersion() );
---------------------

In Sun JDK6
---------------------
1.6
---------------------

In IBM JDK6
---------------------
null
---------------------

So Package#isCompatibleWith() can throw a "java.lang.NumberFormatException: Empty
version string" exception on IBM JDK 6, because the specification version can be null.

  • The current CVS jxta.jar works well with Sun JDK 1.5, Sun JDK 1.6 and IBM JDK 1.5.
  • I tested this case on Windows and Linux. Both returned the same error when I
    used IBM JDK 1.6.
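For illustration, here is a minimal, self-contained sketch of a defensive check around this JDK behavior (a hypothetical demo class, not part of Shoal or jxta; the "1.5" version argument is only an example):
---------------------
public class SpecVersionCheck {
    public static void main(String[] args) {
        Package javaLangPackage = Package.getPackage("java.lang");
        String specVersion = (javaLangPackage == null)
                ? null : javaLangPackage.getSpecificationVersion();
        if (specVersion == null || specVersion.trim().length() == 0) {
            // IBM JDK 6 path: rt.jar's manifest lacks the spec version, so calling
            // Package.isCompatibleWith(...) here would throw NumberFormatException.
            System.out.println("No specification version; java.specification.version = "
                    + System.getProperty("java.specification.version"));
        } else {
            // Sun JDK path: prints e.g. "1.6", and isCompatibleWith() works as expected.
            System.out.println("spec version = " + specVersion
                    + ", compatible with 1.5? " + javaLangPackage.isCompatibleWith("1.5"));
        }
    }
}
---------------------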


 Comments   
Comment by Joe Fialli [ 21/Aug/09 ]
*** Issue 92 has been marked as a duplicate of this issue. ***
Comment by Joe Fialli [ 25/Aug/09 ]

I downloaded IBM JDK 6 as part of Eclipse for Windows.
The manifest file for rt.jar is incorrectly configured: it is missing the package
specification and version info for package java.

Comment by shreedhar_ganapathy [ 18/Sep/09 ]

Working with IBM under a support arrangement, they will make a patch over JDK 6
SR5 available to Sun's customers on the AIX platform for this issue. The patch is
expected to be made available in about a month from now.
They do not plan to make this patch available for IBM JDK 6 SR5 on other
platforms. The patch should also work with SR6; since SR6 is already in freeze,
the fix did not make it there. We hope that SR7 will have the fix incorporated.

Comment by Joe Fialli [ 05/Feb/10 ]

No fix on the Shoal side. An updated version of IBM JDK 6 fixes this issue.





[SHOAL-90] SEVERE ShoalLogger msg: World Peer Group could not be instantiated Created: 20/Jul/09  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Minor
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Text File instance1_secondtime.log    
Issuezilla Id: 90

 Description   

A GMS client has a local cache stored in a directory called ".shoal".

If the file permissions on that directory or on its contents prevent them from
being deleted (cleared) when the GMS client starts up, one will see the summary
log message. Here are all the SEVERE messages one will see when this occurs.

[#|2009-07-20T12:00:17.298-0400|SEVERE|Shoal|net.jxta.impl.cm.Cm|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=Cm;MethodName=<init>;|Unable
to Initialize databases
SEVERE: Unable to Initialize databases
[#|2009-07-20T12:00:17.327-0400|SEVERE|Shoal|net.jxta.impl.peergroup.StdPeerGroup|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=StdPeerGroup;MethodName=initFirst;|Error
during creation of local store
SEVERE: Error during creation of local store
[#|2009-07-20T12:00:17.328-0400|SEVERE|Shoal|net.jxta.peergroup.WorldPeerGroupFactory|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=WorldPeerGroupFactory;MethodName=newWorldPeerGroup;|World
Peer Group could not be instantiated.
SEVERE: World Peer Group could not be instantiated.
[#|2009-07-20T12:00:17.329-0400|SEVERE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=<init>;|World
Peer Group could not be instantiated.

These errors were generated with the following file permissions for .shoal:
dhcp-ubur02-71-15:gms jf39279$ ls -lR .shoal
total 0
drwxr-xr-x 3 root admin 102 Jul 20 11:59 instance1

.shoal/instance1:
total 0
drwxr-xr-x 4 root admin 136 Jul 20 11:59 cm

.shoal/instance1/cm:
total 0
drwxr-xr-x 11 root admin 374 Jul 20 11:59 jxta-WorldGroup
drwxr-xr-x 9 root admin 306 Jul 20 11:59
uuid-C773444DBF054B18A31A9546EA5BB81559616261646162614E5047205032503302

.shoal/instance1/cm/jxta-WorldGroup:
total 3272
-rw-r--r-- 1 root admin 12288 Jul 20 11:59 advertisements-AdvMSID.idx
-rw-r--r-- 1 root admin 12288 Jul 20 11:59 advertisements-GroupsDesc.idx
-rw-r--r-- 1 root admin 12288 Jul 20 11:59 advertisements-GroupsGID.idx
-rw-r--r-- 1 root admin 12288 Jul 20 11:59 advertisements-GroupsMSID.idx
-rw-r--r-- 1 root admin 12288 Jul 20 11:59 advertisements-GroupsName.idx
-rw-r--r-- 1 root admin 12288 Jul 20 11:59 advertisements-PeersName.idx
-rw-r--r-- 1 root admin 12288 Jul 20 11:59 advertisements-PeersPID.idx
-rw-r--r-- 1 root admin 792576 Jul 20 11:59 advertisements-offsets.tbl
-rw-r--r-- 1 root admin 792576 Jul 20 11:59 advertisements.tbl

.shoal/instance1/cm/uuid-C773444DBF054B18A31A9546EA5BB81559616261646162614E5047205032503302:
total 3200
-rw-r--r-- 1 root admin 12288 Jul 20 11:59 advertisements-AdvDstPID.idx
-rw-r--r-- 1 root admin 12288 Jul 20 11:59 advertisements-AdvMSID.idx
-rw-r--r-- 1 root admin 12288 Jul 20 11:59 advertisements-PeersName.idx
-rw-r--r-- 1 root admin 12288 Jul 20 11:59 advertisements-PeersPID.idx
-rw-r--r-- 1 root admin 792576 Jul 20 11:59 advertisements-offsets.tbl
-rw-r--r-- 1 root admin 793088 Jul 20 11:59 advertisements.tbl
drwxr-xr-x 7 root admin 238 Jul 20 11:59 srdi

.shoal/instance1/cm/uuid-C773444DBF054B18A31A9546EA5BB81559616261646162614E5047205032503302/srdi:
total 5208
-rw-r--r-- 1 root admin 12288 Jul 20 11:59 pipeResolverSrdi-JxtaPropagateId.idx
-rw-r--r-- 1 root admin 792576 Jul 20 11:59 pipeResolverSrdi-offsets.tbl
-rw-r--r-- 1 root admin 792576 Jul 20 11:59 pipeResolverSrdi.tbl
-rw-r--r-- 1 root admin 528896 Jul 20 11:59 routerSrdi-offsets.tbl
-rw-r--r-- 1 root admin 528896 Jul 20 11:59 routerSrdi.tbl

****************



 Comments   
Comment by Joe Fialli [ 20/Jul/09 ]

This issue occurs when one runs a GMS client as user1 in a directory, then logs
in as user2 in the same directory, and user2 does not have permission to delete
the .shoal cache files created by user1's run of the GMS client.

WORKAROUND:
Remove the .shoal files created by user1 that user2 does not have permission to
delete. (The easiest case to think of is "user1" being root and "user2" being a
non-privileged user on the system.)

Comment by Joe Fialli [ 20/Jul/09 ]

The following log shows FINE-level logging noting that every attempt to delete
the .shoal cache files is failing. Better user feedback needs to be provided that
these deletes are failing and that, if this is not corrected by a system
administrator, the system will not be able to start properly. Startup should end
with a SEVERE error and an exception so that the GMS client does not attempt to
start up again until the file permission issue is resolved.

$ grep FINE instance1_secondtime.log | grep failed
[#|2009-07-20T12:00:17.031-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file
.shoal/instance1/cm/jxta-WorldGroup/advertisements-AdvMSID.idx|#]
[#|2009-07-20T12:00:17.035-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file
.shoal/instance1/cm/jxta-WorldGroup/advertisements-GroupsDesc.idx|#]
[#|2009-07-20T12:00:17.035-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file
.shoal/instance1/cm/jxta-WorldGroup/advertisements-GroupsGID.idx|#]
[#|2009-07-20T12:00:17.035-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file
.shoal/instance1/cm/jxta-WorldGroup/advertisements-GroupsMSID.idx|#]
[#|2009-07-20T12:00:17.035-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file
.shoal/instance1/cm/jxta-WorldGroup/advertisements-GroupsName.idx|#]
[#|2009-07-20T12:00:17.035-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file
.shoal/instance1/cm/jxta-WorldGroup/advertisements-offsets.tbl|#]
[#|2009-07-20T12:00:17.036-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file
.shoal/instance1/cm/jxta-WorldGroup/advertisements-PeersName.idx|#]
[#|2009-07-20T12:00:17.036-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file
.shoal/instance1/cm/jxta-WorldGroup/advertisements-PeersPID.idx|#]
[#|2009-07-20T12:00:17.037-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file .shoal/instance1/cm/jxta-WorldGroup/advertisements.tbl|#]
[#|2009-07-20T12:00:17.037-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file
.shoal/instance1/cm/uuid-C773444DBF054B18A31A9546EA5BB81559616261646162614E5047205032503302/advertisements-AdvDstPID.idx|#]
[#|2009-07-20T12:00:17.037-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file
.shoal/instance1/cm/uuid-C773444DBF054B18A31A9546EA5BB81559616261646162614E5047205032503302/advertisements-AdvMSID.idx|#]
[#|2009-07-20T12:00:17.037-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file
.shoal/instance1/cm/uuid-C773444DBF054B18A31A9546EA5BB81559616261646162614E5047205032503302/advertisements-offsets.tbl|#]
[#|2009-07-20T12:00:17.038-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file
.shoal/instance1/cm/uuid-C773444DBF054B18A31A9546EA5BB81559616261646162614E5047205032503302/advertisements-PeersName.idx|#]
[#|2009-07-20T12:00:17.038-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file
.shoal/instance1/cm/uuid-C773444DBF054B18A31A9546EA5BB81559616261646162614E5047205032503302/advertisements-PeersPID.idx|#]
[#|2009-07-20T12:00:17.038-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file
.shoal/instance1/cm/uuid-C773444DBF054B18A31A9546EA5BB81559616261646162614E5047205032503302/advertisements.tbl|#]
[#|2009-07-20T12:00:17.038-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file
.shoal/instance1/cm/uuid-C773444DBF054B18A31A9546EA5BB81559616261646162614E5047205032503302/srdi/pipeResolverSrdi-JxtaPropagateId.idx|#]
[#|2009-07-20T12:00:17.038-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file
.shoal/instance1/cm/uuid-C773444DBF054B18A31A9546EA5BB81559616261646162614E5047205032503302/srdi/pipeResolverSrdi-offsets.tbl|#]
[#|2009-07-20T12:00:17.039-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file
.shoal/instance1/cm/uuid-C773444DBF054B18A31A9546EA5BB81559616261646162614E5047205032503302/srdi/pipeResolverSrdi.tbl|#]
[#|2009-07-20T12:00:17.039-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file
.shoal/instance1/cm/uuid-C773444DBF054B18A31A9546EA5BB81559616261646162614E5047205032503302/srdi/routerSrdi-offsets.tbl|#]
[#|2009-07-20T12:00:17.039-0400|FINE|Shoal|ShoalLogger|_ThreadID=11;_ThreadName=ApplicationServer;ClassName=NetworkManager;MethodName=clearCache;|failed
to deleted cache file
.shoal/instance1/cm/uuid-C773444DBF054B18A31A9546EA5BB81559616261646162614E5047205032503302/srdi/routerSrdi.tbl|#]
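A minimal sketch of the stricter startup behavior suggested above (a hypothetical helper, not the actual NetworkManager.clearCache code):

import java.io.File;
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: delete the cache recursively and fail startup loudly
// (SEVERE + exception) instead of only logging each failed delete at FINE.
final class CacheCleaner {
    private static final Logger LOG = Logger.getLogger("ShoalLogger");

    static void clearCacheOrFail(File dir) throws IOException {
        File[] children = dir.listFiles();
        if (children != null) {
            for (File child : children) {
                clearCacheOrFail(child);    // delete contents before the directory itself
            }
        }
        if (dir.exists() && !dir.delete()) {
            LOG.log(Level.SEVERE, "failed to delete cache file " + dir.getPath()
                    + "; fix the file permissions before restarting this GMS client");
            throw new IOException("Unable to clear cache: " + dir.getPath());
        }
    }
}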

Comment by Joe Fialli [ 20/Jul/09 ]

Created an attachment (id=19)
rungmsdemo.sh instance1 log file illustrating .shoal cache with file permissions not allowing cache to be deleted

Comment by Joe Fialli [ 20/Jul/09 ]

Steps to recreate the issue (forgot to add this in the initial submission):

  • Log in as root.
  • Run ./rungmsdemo.sh in shoal/gms.
  • Log out as root.
  • Run ./rungmsdemo.sh in shoal/gms a second time, redirecting the output to a
    file (it should be similar to the log file attached to this issue).




[SHOAL-89] Improved concurrency for sendMessage Created: 18/Jun/09  Updated: 09/Nov/11

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Improvement Priority: Major
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 89

 Description   

RFE related to the fix for Shoal issue 88: change that synchronization solution
to a more performant pool of OutputPipes (one pipe used by only one thread at any
point in time).
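A minimal sketch of what such a pool could look like (hypothetical, not the Shoal implementation; the element type is left generic so it could hold jxta OutputPipe instances):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of the proposed pool: a sender thread borrows a pipe,
// sends, and returns it, so no two threads share one pipe concurrently while
// several sends can still proceed in parallel.
final class PipePool<P> {
    private final BlockingQueue<P> pipes;

    PipePool(Iterable<P> preCreatedPipes, int capacity) {
        this.pipes = new ArrayBlockingQueue<P>(capacity);
        for (P pipe : preCreatedPipes) {
            this.pipes.add(pipe);          // pool is seeded with pre-created pipes
        }
    }

    P borrow() throws InterruptedException {
        return pipes.take();               // blocks until some pipe is free
    }

    void release(P pipe) {
        pipes.offer(pipe);                 // hand the pipe back for the next sender
    }
}

A sender thread would then borrow a pipe, perform its send inside a try block, and release the pipe in a finally block.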



 Comments   
Comment by shreedhar_ganapathy [ 09/Nov/11 ]

Transferring to Joe for eval and closure.

Comment by Joe Fialli [ 09/Nov/11 ]

There are tradeoffs for concurrent sendMessage when relying on NIO as the
ultimate transport, so this RFE was considered and then postponed due to those
tradeoffs.

Concurrent processing meant the same deserialized output stream could not be
shared across all send messages; thus there is additional space usage and/or
multiple deserializations necessary for each concurrent send.

With regular multicast, only one deserialization of the message to be sent was
occurring. With the current implementation, there still is only one.

With concurrent send, the tradeoffs were not so obvious, so this RFE is on hold
for now while sorting through them.





[SHOAL-88] GroupHandle.sendMessage fails frequently when too many concurrent threads sending at same time Created: 21/May/09  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 88

 Description   

Reported initially by Bongjae Chang.
The following is extracted from his emails.


To reproduce issue on same machine,

one command is
java -cp xxx
com.sun.enterprise.shoal.multithreadmessagesendertest.MultiThreadMessageSender
server1 server2 100

another command is
java -cp xxx
com.sun.enterprise.shoal.multithreadmessagesendertest.MultiThreadMessageSender
server2 server1 0

One will notice many failures in log that message was not sent.

com.sun.enterprise.ee.cms.core.GMSException: message
com.sun.enterprise.ee.cms.spi.GMSMessage@1b6c03
6 not sent to
urn:jxta:uuid-59616261646162614A78746150325033A113B2FFB4B64F038C858B9EB8FC413803, send
returned false
at
com.sun.enterprise.ee.cms.impl.jxta.GroupCommunicationProviderImpl.sendMessage(GroupCommu
nicationProviderImpl.java:291)
at
com.sun.enterprise.ee.cms.impl.jxta.GroupHandleImpl.sendMessage(GroupHandleImpl.java:133)

at
com.sun.enterprise.shoal.jointest.MultiThreadSenderTest$1.run(MultiThreadSenderTest.java:
103)
at java.lang.Thread.run(Thread.java:717)


It seems that JXTA's OutputPipe.send() returns false continuously because of
overflow. Shoal already retries the send, governed by MAX_SEND_RETRIES, which is 4
in JxtaUtil#send().

But it seems that the MAX_SEND_RETRIES value is not enough in my test, which runs
over 100 sender threads simultaneously.

When I experimentally set MAX_SEND_RETRIES to over 1000, I found that all packets
could be sent to the remote server successfully, but there was a marked decline in
sending performance. So I think it is not a good idea for MAX_SEND_RETRIES to have
too large a value in my test.
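For reference, a minimal sketch of a bounded-retry send of this kind (a hypothetical helper, not the actual JxtaUtil#send code):

import java.io.IOException;

// Hypothetical sketch: when send() keeps returning false under heavy concurrent
// load, a small retry budget gives up quickly, while a very large budget mostly
// burns CPU retrying, which matches the performance drop observed above.
final class RetrySender {

    /** Narrow stand-in for the pipe abstraction in this sketch; send() returns false on overflow. */
    interface Pipe {
        boolean send(byte[] message) throws IOException;
    }

    static boolean sendWithRetries(Pipe pipe, byte[] message, int maxRetries) throws IOException {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (pipe.send(message)) {
                return true;    // transport accepted the message
            }
        }
        return false;           // caller surfaces this as a failed send (e.g. GMSException)
    }
}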



 Comments   
Comment by Joe Fialli [ 21/May/09 ]

Bongjae confirmed that putting a synchronized block around the OutputPipe
corrected the issue. An RFE will be submitted to change this synchronization
solution to a more performant pool of OutputPipes (one pipe used by only one
thread at any point in time).

Comment by Joe Fialli [ 21/May/09 ]

fix checked into shoal on May 21.





[SHOAL-87] (trivial) When a failure suspected event is received, the log message is printed inadequately Created: 08/May/09  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Trivial
Reporter: carryel Assignee: carryel
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 87

 Description   

When a failure suspected event occurs, an INFO-level log message is printed, but
the message is wrong.

Here is the log.


2009. 5. 9 오후 12:16:43 com.sun.enterprise.ee.cms.impl.jeus.ViewWindowImpl
addInDoubtMemberSignals
INFO: Received FailureSuspectedEvent for Member: b496383a-a1e0-48df-82f6-
3c1463129acf of Group:

{1}

The group name is missing.
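For illustration, a minimal sketch of this formatting mistake (hypothetical call sites, not the actual ViewWindowImpl code; "DemoGroup" is just an example group name). When only one argument is supplied, MessageFormat leaves the {1} placeholder verbatim in the output, which is exactly what the log above shows:

import java.text.MessageFormat;

public class MissingGroupNameDemo {
    public static void main(String[] args) {
        String pattern = "Received FailureSuspectedEvent for Member: {0} of Group: {1}";

        // Broken call: the group name argument is missing, so "{1}" survives untouched.
        System.out.println(MessageFormat.format(pattern,
                "b496383a-a1e0-48df-82f6-3c1463129acf"));

        // Fixed call: both arguments supplied.
        System.out.println(MessageFormat.format(pattern,
                "b496383a-a1e0-48df-82f6-3c1463129acf", "DemoGroup"));
    }
}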



 Comments   
Comment by carryel [ 08/May/09 ]

Fixed the log message.
The group name is added.





[SHOAL-86] Graceful handling of unexpected exceptions(NPEs) when GMS failed to join the group Created: 30/Mar/09  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: carryel Assignee: shreedhar_ganapathy
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Java Source File SimpleShoalAPITest.java    
Issuezilla Id: 86

 Description   

When GMS failed to join the group, GMS didn't throw a GMSException but an
unexpected exception such as an NPE.

There are two issues:

1) The GroupManagementService#join() API should throw a GMSException instead of an
NPE on an unexpected error.
Here is the log.


D:\shoal\gms>rungmsdemo.bat testServer testGroup CORE 30000 INFO
D:\ibm_sdk60\bin
[#|2009-03-
31T10:32:01.677+0900|SEVERE|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=Applicatio
nServer;C
lassName=NetworkManager;MethodName=<init>;|Could not locate World PeerGroup
Module Implementation.
net.jxta.exception.PeerGroupException: Could not locate World PeerGroup Module
Implementation.
at
net.jxta.peergroup.WorldPeerGroupFactory.getDefaultWorldPeerGroupClass
(WorldPeerGroupFact
ory.java:244)
at net.jxta.peergroup.WorldPeerGroupFactory.<init>
(WorldPeerGroupFactory.java:178)
at com.sun.enterprise.jxtamgmt.NetworkManager.initWPGF
(NetworkManager.java:623)
at com.sun.enterprise.jxtamgmt.NetworkManager.<init>
(NetworkManager.java:213)
at com.sun.enterprise.jxtamgmt.ClusterManager.<init>
(ClusterManager.java:133)
at
com.sun.enterprise.ee.cms.impl.jxta.GroupCommunicationProviderImpl.initializeGro
upCommuni
cationProvider(GroupCommunicationProviderImpl.java:138)
at com.sun.enterprise.ee.cms.impl.jxta.GMSContext.join
(GMSContext.java:123)
at com.sun.enterprise.ee.cms.impl.common.GroupManagementServiceImpl.join
(GroupManagementServ
iceImpl.java:339)
at com.sun.enterprise.ee.cms.tests.ApplicationServer.startGMS
(ApplicationServer.java:156)
at com.sun.enterprise.ee.cms.tests.ApplicationServer.run
(ApplicationServer.java:107)
at java.lang.Thread.run(Thread.java:735)

#]

Exception in thread "ApplicationServer" java.lang.NullPointerException
at com.sun.enterprise.jxtamgmt.NetworkManager.getWorldPeerGroup
(NetworkManager.java:725)
at com.sun.enterprise.jxtamgmt.NetworkManager.startDomain
(NetworkManager.java:696)
at com.sun.enterprise.jxtamgmt.NetworkManager.start
(NetworkManager.java:401)
at com.sun.enterprise.jxtamgmt.ClusterManager.<init>
(ClusterManager.java:136)
at
com.sun.enterprise.ee.cms.impl.jxta.GroupCommunicationProviderImpl.initializeGro
upCommuni
cationProvider(GroupCommunicationProviderImpl.java:138)
at com.sun.enterprise.ee.cms.impl.jxta.GMSContext.join
(GMSContext.java:123)
at com.sun.enterprise.ee.cms.impl.common.GroupManagementServiceImpl.join
(GroupManagementServ
iceImpl.java:339)
at com.sun.enterprise.ee.cms.tests.ApplicationServer.startGMS
(ApplicationServer.java:156)
at com.sun.enterprise.ee.cms.tests.ApplicationServer.run
(ApplicationServer.java:107)
at java.lang.Thread.run(Thread.java:735)


When you try to run rungmsdemo.bat with IBM JDK 6, you can see NPEs from
GroupManagementService#join().

2) For case 1) above, GMS's other APIs, such as the GroupHandle, need to handle
this problem gracefully.

I wrote some code for this test (SimpleShoalAPITest.java).

The test code is simple.


try {
    gms.join();
} catch( GMSException e ) {
    // It's OK.
    throw e;
} catch( Throwable t ) {
    // unexpected error.
    List<String> exceptions = testSimpleAPIsWithUnexpectedException( gms );
    // print unexpected exceptions
    // ...
}

private List<String> testSimpleAPIsWithUnexpectedException(
        GroupManagementService gms ) {
    if( gms == null )
        return null; // It's OK.
    List<String> unexpectedExceptions = new Vector<String>();
    String dummyString = "";
    byte[] dummyBytes = new byte[0];

    GroupHandle gh = gms.getGroupHandle();
    if( gh == null )
        return null; // It's OK.
    // test APIs
    // ...
}


When GMS failed to join the group, but the GMS instance was not null and the
GroupHandle was not null, I checked all of GroupHandle's APIs with a dummy String
and dummy bytes.

Here is the full log


D:\shoal\gms>java -classpath classes;lib\jxta.jar
com.sun.enterprise.shoal.carryel.SimpleShoalAPITes
t
[#|2009-03-
31T11:32:01.458+0900|INFO|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;ClassNa
me=Simple
ShoalAPITest;MethodName=runSimpleSample;|Starting SimpleShoalAPITest....|#]

[#|2009-03-
31T11:32:02.052+0900|INFO|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;ClassNa
me=Simple
ShoalAPITest;MethodName=initializeGMS;|Initializing Shoal for member: 67fbe786-
ff24-4a1f-81d2-d795bc
b9dd16 group:TestGroup|#]

[#|2009-03-
31T11:32:02.068+0900|FINE|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;ClassNa
me=GMSCon
text;MethodName=<init>;|Initialized Group Communication System....|#]

[#|2009-03-
31T11:32:02.068+0900|INFO|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;ClassNa
me=Simple
ShoalAPITest;MethodName=runSimpleSample;|Joining Group TestGroup|#]

[#|2009-03-
31T11:32:02.068+0900|FINE|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;ClassNa
me=GroupM
anagementServiceImpl;MethodName=join;|Connecting to group......|#]

[#|2009-03-
31T11:32:02.130+0900|CONFIG|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;Class
Name=Jxta
Util;MethodName=configureJxtaLogging;|gms configureJxtaLogging: set jxta
logging to default of SEVER
E|#]

[#|2009-03-
31T11:32:02.208+0900|CONFIG|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;Class
Name=Netw
orkManager;MethodName=initWPGF;|initWPGF storeHome=.shoal\67fbe786-ff24-4a1f-
81d2-d795bcb9dd16|#]

[#|2009-03-
31T11:32:02.208+0900|FINER|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;ClassN
ame=Netwo
rkManager;MethodName=clearCache;|clearCache(.shoal\67fbe786-ff24-4a1f-81d2-
d795bcb9dd16) on non-exsi
stent directory|#]

[#|2009-03-
31T11:32:02.443+0900|SEVERE|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;Class
Name=Netw
orkManager;MethodName=<init>;|Could not locate World PeerGroup Module
Implementation.
net.jxta.exception.PeerGroupException: Could not locate World PeerGroup Module
Implementation.
at
net.jxta.peergroup.WorldPeerGroupFactory.getDefaultWorldPeerGroupClass
(WorldPeerGroupFact
ory.java:244)
at net.jxta.peergroup.WorldPeerGroupFactory.<init>
(WorldPeerGroupFactory.java:178)
at com.sun.enterprise.jxtamgmt.NetworkManager.initWPGF
(NetworkManager.java:623)
at com.sun.enterprise.jxtamgmt.NetworkManager.<init>
(NetworkManager.java:213)
at com.sun.enterprise.jxtamgmt.ClusterManager.<init>
(ClusterManager.java:133)
at
com.sun.enterprise.ee.cms.impl.jxta.GroupCommunicationProviderImpl.initializeGro
upCommuni
cationProvider(GroupCommunicationProviderImpl.java:138)
at com.sun.enterprise.ee.cms.impl.jxta.GMSContext.join
(GMSContext.java:123)
at com.sun.enterprise.ee.cms.impl.common.GroupManagementServiceImpl.join
(GroupManagementServ
iceImpl.java:339)
at com.sun.enterprise.shoal.carryel.SimpleShoalAPITest.runSimpleSample
(SimpleShoalAPITest.ja
va:42)
at com.sun.enterprise.shoal.carryel.SimpleShoalAPITest.main
(SimpleShoalAPITest.java:25)

#]

[#|2009-03-
31T11:32:02.443+0900|FINE|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;ClassNa
me=Networ
kManager;MethodName=startDomain;|Rendezvous seed?:false|#]

[#|2009-03-
31T11:32:02.443+0900|CONFIG|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;Class
Name=Netw
orkManager;MethodName=startDomain;|set jxta Multicast Poolsize to 300|#]

[#|2009-03-
31T11:32:02.458+0900|CONFIG|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;Class
Name=Netw
orkManager;MethodName=startDomain;|node config adv = <?xml version="1.0"
encoding="UTF-8"?>
<!DOCTYPE jxta:CP>
<jxta:CP xml:space="default" type="jxta:PlatformConfig"
xmlns:jxta="http://jxta.org">
<PID>
urn:jxta:uuid-
59616261646162614A7874615032503363A4CD95BF504B68B35687BA4517337A03
</PID>
<Name>
67fbe786-ff24-4a1f-81d2-d795bcb9dd16
</Name>
<Desc>
Created by Jxta Cluster Management NetworkManager
</Desc>
<Svc>
<MCID>
urn:jxta:uuid-DEADBEEFDEAFBABAFEEDBABE0000000A05
</MCID>
<Parm>
<jxta:TransportAdvertisement
xmlns:jxta="http://jxta.org" xml:space="preserv
e" type="jxta:HTTPTransportAdvertisement">
<Protocol>http</Protocol><ConfigMode>auto</ConfigMode><Port>9700</Port><ServerOf
f/>
</jxta:TransportAdvertisement>
</Parm>
</Svc>
<Svc>
<MCID>
urn:jxta:uuid-DEADBEEFDEAFBABAFEEDBABE0000000905
</MCID>
<Parm>
<jxta:TransportAdvertisement
xmlns:jxta="http://jxta.org" xml:space="preserv
e" type="jxta:TCPTransportAdvertisement">
<Protocol>tcp</Protocol><ConfigMode>auto</ConfigMode><Port start="9701"
end="9999">9701</Port><Multi
castAddr>224.0.1.85</MulticastAddr><MulticastPort>1234</MulticastPort><Mcast_Poo
l_Size>300</Mcast_Po
ol_Size><MulticastSize>65536</MulticastSize>
</jxta:TransportAdvertisement>
</Parm>
</Svc>
<Svc>
<MCID>
urn:jxta:uuid-DEADBEEFDEAFBABAFEEDBABE0000000105
</MCID>
<Parm type="jxta:PeerGroupConfigAdv"
xmlns:jxta="http://jxta.org" xml:space="preserv
e">
<PeerGroupID>urn:jxta:uuid-
157B8869F02A4210BE61AA03D81ECC6659616261646162614E5047205032503302</PeerG
roupID><PeerGroupName>TestGroup</PeerGroupName><PeerGroupDesc>TestGroup
Infrastructure Group Name</P
eerGroupDesc> </Parm>
</Svc>
<Svc>
<MCID>
urn:jxta:uuid-DEADBEEFDEAFBABAFEEDBABE0000000F05
</MCID>
<Parm type="jxta:RelayConfig" xmlns:jxta="http://jxta.org"
xml:space="preserve" clie
nt="true">
<client/><server/> </Parm>
</Svc>
<Svc>
<MCID>
urn:jxta:uuid-DEADBEEFDEAFBABAFEEDBABE0000000605
</MCID>
<Parm type="jxta:RdvConfig" xmlns:jxta="http://jxta.org"
xml:space="preserve" config
="client"/>
</Svc>
<Svc>
<MCID>
urn:jxta:uuid-DEADBEEFDEAFBABAFEEDBABE0000000505
</MCID>
<Parm type="jxta:PSEConfig" xmlns:jxta="http://jxta.org"
xml:space="preserve"/>
</Svc>
</jxta:CP>

#]

[#|2009-03-
31T11:32:02.505+0900|SEVERE|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;Class
Name=Simp
leShoalAPITest;MethodName=runSimpleSample;|Unexpected exception occured while
joining group:
java.lang.NullPointerException
at com.sun.enterprise.jxtamgmt.NetworkManager.getWorldPeerGroup
(NetworkManager.java:725)
at com.sun.enterprise.jxtamgmt.NetworkManager.startDomain
(NetworkManager.java:696)
at com.sun.enterprise.jxtamgmt.NetworkManager.start
(NetworkManager.java:401)
at com.sun.enterprise.jxtamgmt.ClusterManager.<init>
(ClusterManager.java:136)
at
com.sun.enterprise.ee.cms.impl.jxta.GroupCommunicationProviderImpl.initializeGro
upCommuni
cationProvider(GroupCommunicationProviderImpl.java:138)
at com.sun.enterprise.ee.cms.impl.jxta.GMSContext.join
(GMSContext.java:123)
at com.sun.enterprise.ee.cms.impl.common.GroupManagementServiceImpl.join
(GroupManagementServ
iceImpl.java:339)
at com.sun.enterprise.shoal.carryel.SimpleShoalAPITest.runSimpleSample
(SimpleShoalAPITest.ja
va:42)
at com.sun.enterprise.shoal.carryel.SimpleShoalAPITest.main
(SimpleShoalAPITest.java:25)

#]

[#|2009-03-
31T11:32:02.505+0900|FINER|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;ClassN
ame=Distr
ibutedStateCacheImpl;MethodName=addToCache;|Adding to DSC by local
Member:67fbe786-ff24-4a1f-81d2-d7
95bcb9dd16,Component:,key:,State:RECOVERY_IN_PROGRESS|1238466722505|#]

[#|2009-03-
31T11:32:02.505+0900|FINEST|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;Class
Name=Dist
ributedStateCacheImpl;MethodName=addToLocalCache;|Adding
cKey=GMSMember:67fbe786-ff24-4a1f-81d2-d795
bcb9dd16:Component::key: state=RECOVERY_IN_PROGRESS|1238466722505|#]

[#|2009-03-
31T11:32:02.505+0900|FINER|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;ClassN
ame=Distr
ibutedStateCacheImpl;MethodName=printDSCContents;|67fbe786-ff24-4a1f-81d2-
d795bcb9dd16:DSC now conta
ins ---------
209999666 key=GMSMember:67fbe786-ff24-4a1f-81d2-d795bcb9dd16:Component::key: :
value=RECOVERY_IN_PRO
GRESS|1238466722505

#]

[#|2009-03-
31T11:32:02.521+0900|FINER|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;ClassN
ame=Group
HandleImpl;MethodName=isFenced;|GMSMember:67fbe786-ff24-4a1f-81d2-
d795bcb9dd16:Component::key: value
:RECOVERY_IN_PROGRESS|1238466722505|#]

[#|2009-03-
31T11:32:02.521+0900|FINER|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;ClassN
ame=Group
HandleImpl;MethodName=isFenced;|Returning true for isFenced query|#]

[#|2009-03-
31T11:32:02.521+0900|SEVERE|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;Class
Name=Simp
leShoalAPITest;MethodName=runSimpleSample;|Unexpected exceptions occured:
GroupHandle#sendMessage( String, byte[] ): java.lang.NullPointerException
GroupHandle#sendMessage( String, String, byte[] ):
java.lang.NullPointerException
GroupHandle#sendMessage( String, String, byte[] ):
java.lang.NullPointerException
GroupHandle#raiseFence( String, String ): java.lang.NullPointerException
GroupHandle#lowerFence( String, String ): java.lang.NullPointerException
GroupHandle#getMemberState( String ): java.lang.NullPointerException
GroupHandle#getMemberState( String, long, long ): java.lang.NullPointerException
GroupHandle#getGroupLeader(): java.lang.NullPointerException
GroupHandle#isGroupLeader(): java.lang.NullPointerException

#]

[#|2009-03-
31T11:32:02.536+0900|SEVERE|Shoal|ShoalLogger|_ThreadID=0;_ThreadName=main;Class
Name=Simp
leShoalAPITest;MethodName=main;|Exception occured while testing some
APIs:com.sun.enterprise.ee.cms.
core.GMSException: java.lang.NullPointerException|#]


As you can see, the following APIs (9 methods) are not safe:

  • GroupHandle#sendMessage( String, byte[] ): java.lang.NullPointerException
  • GroupHandle#sendMessage( String, String, byte[] ):
    java.lang.NullPointerException
  • GroupHandle#sendMessage( String, String, byte[] ):
    java.lang.NullPointerException
  • GroupHandle#raiseFence( String, String ): java.lang.NullPointerException
  • GroupHandle#lowerFence( String, String ): java.lang.NullPointerException
  • GroupHandle#getMemberState( String ): java.lang.NullPointerException
  • GroupHandle#getMemberState( String, long, long ):
    java.lang.NullPointerException
  • GroupHandle#getGroupLeader(): java.lang.NullPointerException
  • GroupHandle#isGroupLeader(): java.lang.NullPointerException

Shoal should fix these NPEs.

I attached my test code (SimpleShoalAPITest.java).



 Comments   
Comment by carryel [ 30/Mar/09 ]

Created an attachment (id=15)
GroupHandle API test code

Comment by Joe Fialli [ 17/Apr/09 ]

agree that GMS API methods should not be throwing NPE

Comment by Joe Fialli [ 05/Feb/10 ]

Fixed.

The regression test is com.sun.enterprise.ee.cms.tests.core.GroupHandleTest.java,
run via the shell script runAPITests.sh.

Verified that the test originally filed with this bug also runs okay.
An IllegalArgumentException is now thrown when null is passed for a parameter that
does not allow it.
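A minimal sketch of the argument validation described above (hypothetical helper, not the actual GroupHandleImpl code): reject null arguments up front with IllegalArgumentException instead of letting a NullPointerException escape from deeper layers.

final class ArgChecks {
    private ArgChecks() {
    }

    static void requireNonNull(Object value, String parameterName) {
        if (value == null) {
            throw new IllegalArgumentException(
                    "Parameter '" + parameterName + "' must not be null");
        }
    }
}

// Example use at the top of an API method such as GroupHandle#sendMessage(String, byte[]):
//     ArgChecks.requireNonNull(message, "message");
//     // (parameters documented as allowing null, such as a null component name, are not checked)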





[SHOAL-85] message not processed/received when GroupHandle.sendMessage with null component name is specified Created: 09/Feb/09  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: Stephen DiMilla Assignee: Joe Fialli
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: Solaris
Platform: Solaris


Attachments: Java Source File ApplicationServer1.java     Java Source File GMSClientService1.java    
Issuezilla Id: 85

 Description   

I modified the test classes:
com.sun.enterprise.ee.cms.tests.ApplicationServer
com.sun.enterprise.ee.cms.tests.GMSClientService

to send and receive a message. I've attached both those classes to this issue.

Based on the javadoc for GroupHandle:

    "... Specifying a null component name would result in the message being
    delivered to all registered components in the target member instance."

Therefore using the method:
gh.sendMessage((String)null, null, message.getBytes());

should result in the EJBContainer and TransactionService each receiving the
message passed to sendMessage, but based on the testing I've done that is not
happening.
The messages are sent but never dispatched to either service.
If you set the component name to be non-null:
gh.sendMessage((String)null,"Transaction", message.getBytes());

then the message is received by that component.

It appears that either the documentation is wrong or there is a bug in the
distribution of the received message to the components.



 Comments   
Comment by Stephen DiMilla [ 09/Feb/09 ]

Created an attachment (id=13)
ApplicationServer java file

Comment by Stephen DiMilla [ 09/Feb/09 ]

Created an attachment (id=14)
GMSClientServer java file

Comment by Joe Fialli [ 05/Feb/10 ]

duplicate of issue 97. already fixed.

*** This issue has been marked as a duplicate of 97 ***




[SHOAL-84] JXTA Exception on network disconnect Created: 18/Nov/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: alireza2008 Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Windows


Issuezilla Id: 84

 Description   

I encountered the exception below during network disconnection tests. I had two
members in a group on separate hosts within the same subnet (all default JXTA
parameters); when I unplugged the network connection from one of the hosts, I
received the following exception:

Nov 13, 2008 11:51:45 AM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event : IN_DOUBT_EVENT
Nov 13, 2008 11:51:45 AM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
addInDoubtMemberSignals
INFO: gms.failureSuspectedEventReceived
Nov 13, 2008 11:51:45 AM com.sun.enterprise.ee.cms.impl.common.Router
notifyFailureSuspectedAction
INFO: Sending FailureSuspectedSignals to registered Actions.
Member:GMSTestMonitor...
Nov 13, 2008 11:51:45 AM net.jxta.endpoint.ThreadedMessenger run
SEVERE: Uncaught throwable in background thread
java.lang.NoClassDefFoundError: net/jxta/impl/endpoint/router/RouterMessenger
at
net.jxta.impl.endpoint.router.EndpointRouter.getMessenger(EndpointRouter.java:2336)
at
net.jxta.impl.endpoint.EndpointServiceImpl.getLocalTransportMessenger(EndpointServiceImpl.java:1566)
at
net.jxta.impl.endpoint.EndpointServiceImpl.access$200(EndpointServiceImpl.java:106)
at
net.jxta.impl.endpoint.EndpointServiceImpl$CanonicalMessenger.connectImpl(EndpointServiceImpl.java:380)
at net.jxta.endpoint.ThreadedMessenger.connect(ThreadedMessenger.java:551)
at net.jxta.endpoint.ThreadedMessenger.run(ThreadedMessenger.java:389)
at java.lang.Thread.run(Unknown Source)
Nov 13, 2008 11:51:48 AM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group GMSTestGroup : Members in view for
(before change analysis) are :
1: MemberId: GMSTestResource, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033520D314DBB264715B
E83E86B57A610F803



 Comments   
Comment by shreedhar_ganapathy [ 22/Nov/08 ]

reassigned to Joe for fixing post HCF





[SHOAL-83] When group leader failed, any member couldn't receive FailureRecovery notification Created: 12/Nov/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: carryel Assignee: Joe Fialli
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Windows


Issuezilla Id: 83

 Description   

When the group leader failed, no member could receive a FailureRecovery
notification.
Of course, the members had registered FailureRecoveryActionFactoryImpl and their
callbacks with GMS.
But when the failed member was not the group leader, the other member received the
FailureRecovery notification successfully.

Here are two logs.
--------------------
case 1) When failure member is group leader.

2008. 11. 12 오후 9:43:28 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group DemoGroup : Members in view for
(before change analysis) are :
1: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03

2008. 11. 12 오후 9:43:28 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT
2008. 11. 12 오후 9:43:28 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group DemoGroup : Members in view for
(before change analysis) are :
1: MemberId: b6663a51-9b79-43e2-92dd-41899c907383, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A787461503250331DA08A66D0554F138E75E74AA363FC9E03
2: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03

2008. 11. 12 오후 9:43:28 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT
2008. 11. 12 오후 9:43:28 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group DemoGroup : Members in view for
(before change analysis) are :
1: MemberId: b6663a51-9b79-43e2-92dd-41899c907383, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A787461503250331DA08A66D0554F138E75E74AA363FC9E03
2: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03

2008. 11. 12 오후 9:43:28 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 11. 12 오후 9:43:53 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group DemoGroup : Members in view for
(before change analysis) are :
1: MemberId: b6663a51-9b79-43e2-92dd-41899c907383, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A787461503250331DA08A66D0554F138E75E74AA363FC9E03
2: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03

2008. 11. 12 오후 9:43:53 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event :
IN_DOUBT_EVENT
2008. 11. 12 오후 9:43:53 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
addInDoubtMemberSignals
INFO: gms.failureSuspectedEventReceived
2008. 11. 12 오후 9:43:53 com.sun.enterprise.ee.cms.impl.common.Router
notifyFailureSuspectedAction
INFO: Sending FailureSuspectedSignals to registered Actions. Member:b6663a51-
9b79-43e2-92dd-41899c907383...
2008. 11. 12 오후 9:43:57 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group DemoGroup : Members in view for
(before change analysis) are :
1: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03

2008. 11. 12 오후 9:43:57 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT
2008. 11. 12 오후 9:43:57 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group DemoGroup : Members in view for
(before change analysis) are :
1: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03

2008. 11. 12 오후 9:43:57 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event :
FAILURE_EVENT
2008. 11. 12 오후 9:43:57 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
addFailureSignals
INFO: The following member has failed: b6663a51-9b79-43e2-92dd-41899c907383

case 2) When failure member is not group leader

2008. 11. 12 오후 9:40:03 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group DemoGroup : Members in view for
(before change analysis) are :
1: MemberId: 96438e75-740c-4613-af8d-6b2ab8ea4727, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033376CC0C6DAB74C2BA6FAF9C6648D77BC03

2008. 11. 12 오후 9:40:03 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT
2008. 11. 12 오후 9:40:14 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group DemoGroup : Members in view for
(before change analysis) are :
1: MemberId: 96438e75-740c-4613-af8d-6b2ab8ea4727, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033376CC0C6DAB74C2BA6FAF9C6648D77BC03
2: MemberId: b77af0d3-581c-4392-89cf-6a06d736c90f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033EBEBAC9321A742D0B319D3F89446E0B103

2008. 11. 12 오후 9:40:14 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 11. 12 오후 9:40:43 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group DemoGroup : Members in view for
(before change analysis) are :
1: MemberId: 96438e75-740c-4613-af8d-6b2ab8ea4727, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033376CC0C6DAB74C2BA6FAF9C6648D77BC03
2: MemberId: b77af0d3-581c-4392-89cf-6a06d736c90f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033EBEBAC9321A742D0B319D3F89446E0B103

2008. 11. 12 오후 9:40:49 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event :
IN_DOUBT_EVENT
2008. 11. 12 오후 9:41:07 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
addInDoubtMemberSignals
INFO: gms.failureSuspectedEventReceived
2008. 11. 12 오후 9:41:12 com.sun.enterprise.ee.cms.impl.common.Router
notifyFailureSuspectedAction
INFO: Sending FailureSuspectedSignals to registered Actions. Member:b77af0d3-
581c-4392-89cf-6a06d736c90f...
2008. 11. 12 오후 9:41:29 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group DemoGroup : Members in view for
(before change analysis) are :
1: MemberId: 96438e75-740c-4613-af8d-6b2ab8ea4727, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033376CC0C6DAB74C2BA6FAF9C6648D77BC03

2008. 11. 12 오후 9:41:41 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event :
FAILURE_EVENT
2008. 11. 12 오후 9:41:42 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
addFailureSignals
INFO: The following member has failed: b77af0d3-581c-4392-89cf-6a06d736c90f
2008. 11. 12 오후 9:42:19
com.sun.enterprise.ee.cms.impl.common.RecoveryTargetSelector
setRecoverySelectionState
INFO: Appointed Recovery Server:96438e75-740c-4613-af8d-6b2ab8ea4727:for failed
member:b77af0d3-581c-4392-89cf-6a06d736c90f:for group:DemoGroup
2008. 11. 12 오후 9:42:19 com.sun.enterprise.ee.cms.impl.common.Router
notifyFailureRecoveryAction
INFO: Sending FailureRecoveryNotification to component service
--------------------

In case 1 (the abnormal case):
group leader failed -> IN_DOUBT_EVENT -> MASTER_CHANGE_EVENT (because a new master
was selected) -> FAILURE_EVENT

In case 2 (the normal case):
member failed -> IN_DOUBT_EVENT -> FAILURE_EVENT

To receive a FailureRecovery notification, the recovery target must be resolved.
The selection algorithm for the recovery target uses the previous view of the members.

Assume that "A" and "B" are member in the same group and "A" is group leader.

[case 1: "B"'s view history]
... --> (A, B) --> A failed -> B became the new master via a master change
event -> (B)[previous view] -> failure event -> (B)[current view]

[case2: "A"'s view history]
... --> (A, B)[previous view] --> B failed -> failure event -> (A)[current view]

In other words:
case 1's previous view does not contain "A" (the failed member), so the default algorithm
(SimpleSelectionAlgorithm) cannot find a proper recovery target.
case 2's previous view contains "B" (the failed member), so the default algorithm can
select "A" as the recovery target.
(I assume that you already know SimpleSelectionAlgorithm.)

So I think this issue comes down to the selection algorithm for the recovery
target.

I think that devising another simple algorithm could be one way to resolve this issue,
e.g. always selecting the first core member in the live cache (see the sketch below).
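A minimal sketch of that alternative rule, with assumed types (the live cache is
represented here as an ordered list of core member tokens; this is not the Shoal
implementation):

import java.util.List;

class FirstAliveSelectionSketch {
    // Pick the first live core member that is not the failed member itself.
    static String selectRecoveryTarget(List<String> liveCoreMembers, String failedMember) {
        for (String member : liveCoreMembers) {
            if (!member.equals(failedMember)) {
                return member;
            }
        }
        return null;   // no live member available to perform recovery
    }
}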



 Comments   
Comment by shreedhar_ganapathy [ 22/Nov/08 ]

..

Comment by Joe Fialli [ 21/Aug/09 ]

Shoal test scenario 14 verifies that the fix for this has been integrated.





[SHOAL-82] notifying cluster view event is not thread safe Created: 12/Nov/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: carryel Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Windows


Issuezilla Id: 82

 Description   

ClusterViewManager.notifyListeners() can be executed on multiple threads when many
members join the same group concurrently.

Though there are no member failures, you can see the following log.

------------------------------------
2008. 11. 12 오후 5:44:00 com.sun.enterprise.shoal.jointest.SimpleJoinTest
initializeGMS
INFO: Initializing Shoal for member: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f
group:TestGroup
2008. 11. 12 오후 5:44:00 com.sun.enterprise.shoal.jointest.SimpleJoinTest
runSimpleSample
INFO: Registering for group event notifications
2008. 11. 12 오후 5:44:00 com.sun.enterprise.shoal.jointest.SimpleJoinTest
runSimpleSample
INFO: Joining Group TestGroup
2008. 11. 12 오후 5:44:07 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group TestGroup (5d3280a2-a0c5-4ae2-8d41-
d59b57400b8f) : Members in view for (before change analysis) are :
1: MemberId: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033090183254F6D47E7B235BC8D656194FA03

2008. 11. 12 오후 5:44:07 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT
2008. 11. 12 오후 5:44:08 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group TestGroup (aeea918f-571b-463b-bfa6-
55c536df0d11) : Members in view for (before change analysis) are :
(a)
1: MemberId: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033090183254F6D47E7B235BC8D656194FA03
2: MemberId: addb1dbe-06cf-43b8-8903-78605f29091f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A787461503250336C047E2077544A5692C1EA21407A886303
3: MemberId: aeea918f-571b-463b-bfa6-55c536df0d11, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033DBAE9788614944F8A40ED352C8E7A03B03
4: MemberId: fae1414d-702a-42fd-8c7d-6ffabe8b2e69, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033EF69FCF215DE43038FD0C3AA0535A08203

2008. 11. 12 오후 5:44:08 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 11. 12 오후 5:44:17 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group TestGroup (addb1dbe-06cf-43b8-8903-
78605f29091f) : Members in view for (before change analysis) are :
(b)
1: MemberId: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033090183254F6D47E7B235BC8D656194FA03
2: MemberId: addb1dbe-06cf-43b8-8903-78605f29091f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A787461503250336C047E2077544A5692C1EA21407A886303

2008. 11. 12 오후 5:44:17 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 11. 12 오후 5:44:17 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group TestGroup (fae1414d-702a-42fd-8c7d-
6ffabe8b2e69) : Members in view for (before change analysis) are :
(c)
1: MemberId: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033090183254F6D47E7B235BC8D656194FA03
2: MemberId: addb1dbe-06cf-43b8-8903-78605f29091f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A787461503250336C047E2077544A5692C1EA21407A886303
3: MemberId: fae1414d-702a-42fd-8c7d-6ffabe8b2e69, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033EF69FCF215DE43038FD0C3AA0535A08203

2008. 11. 12 오후 5:44:17 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 11. 12 오후 5:44:20 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group TestGroup (42b22147-7683-481f-a9f4-
85ba5a2b847f) : Members in view for (before change analysis) are :
1: MemberId: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033090183254F6D47E7B235BC8D656194FA03
2: MemberId: 42b22147-7683-481f-a9f4-85ba5a2b847f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A787461503250334501FF701A644877A4B4C65068965F3403
3: MemberId: addb1dbe-06cf-43b8-8903-78605f29091f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A787461503250336C047E2077544A5692C1EA21407A886303
4: MemberId: aeea918f-571b-463b-bfa6-55c536df0d11, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033DBAE9788614944F8A40ED352C8E7A03B03
5: MemberId: fae1414d-702a-42fd-8c7d-6ffabe8b2e69, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033EF69FCF215DE43038FD0C3AA0535A08203

...
------------------------------------

This log shows that five members joined "TestGroup":

1: MemberId: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033090183254F6D47E7B235BC8D656194FA03
2: MemberId: 42b22147-7683-481f-a9f4-85ba5a2b847f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A787461503250334501FF701A644877A4B4C65068965F3403
3: MemberId: addb1dbe-06cf-43b8-8903-78605f29091f, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A787461503250336C047E2077544A5692C1EA21407A886303
4: MemberId: aeea918f-571b-463b-bfa6-55c536df0d11, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033DBAE9788614944F8A40ED352C8E7A03B03
5: MemberId: fae1414d-702a-42fd-8c7d-6ffabe8b2e69, MemberType: CORE, Address:
urn:jxta:uuid-59616261646162614A78746150325033EF69FCF215DE43038FD0C3AA0535A08203

This log is printed in ViewWindow, based on the viewQueue, when a new view is
observed.

But in the above log messages, you can see that the order of (a), (b) and (c) is strange.

Because there are no failures, I think the number of members should increase
monotonically (i.e. (a)'s num <= (b)'s num <= (c)'s num).

The following code is ClusterViewManager's notifyListeners() method.


void notifyListeners(final ClusterViewEvent event) {
    Log.log(...);
    for (ClusterViewEventListener elem : cvListeners) {
        elem.clusterViewEvent(event, getLocalView());
    }
}


getLocalView() is thread-safe via viewLock, but ClusterViewEventListener's
clusterViewEvent() is not thread-safe.

The following code is GroupCommunicationProviderImpl's clusterViewEvent()
method, which implements the ClusterViewEventListener interface.


public void clusterViewEvent(final ClusterViewEvent clusterViewEvent,
                             final ClusterView clusterView) {
    ...
    final EventPacket ePacket = new EventPacket(clusterViewEvent.getEvent(),
            clusterViewEvent.getAdvertisement(), clusterView);
    final ArrayBlockingQueue<EventPacket> viewQueue = getGMSContext().getViewQueue();
    try {
        viewQueue.put(ePacket);
    } catch (InterruptedException e) {
        ...
    }
}
-----

I think that taking the local view's snapshot (getLocalView()'s return value) and
calling viewQueue.put() should be atomic, like this:
-----
void notifyListeners(final ClusterViewEvent event) {
    Log.log(...);
    for (ClusterViewEventListener elem : cvListeners) {
        synchronized (elem) {
            elem.clusterViewEvent(event, getLocalView());
        }
    }
}

or

public synchronized void clusterViewEvent(final ClusterViewEvent clusterViewEvent,
                                          final ClusterView clusterView) {
    ...
    final EventPacket ePacket = new EventPacket(clusterViewEvent.getEvent(),
            clusterViewEvent.getAdvertisement(), clusterView);
    final ArrayBlockingQueue<EventPacket> viewQueue = getGMSContext().getViewQueue();
    try {
        viewQueue.put(ePacket);
    } catch (InterruptedException e) {
        ...
    }
}

(In my opinion, the former is better because clusterViewEvent()
can have various implementations.)


In other words:
-------------------------------------------------------------------
getLocalView() --> local view's snapshot --> (hole) --> insert into view queue
-------------------------------------------------------------------

As you can see above, there is a hole between taking the local view's snapshot and
inserting the EventPacket into the view queue. We can remove the hole with a
synchronized block or an individual lock object.
If the hole is removed, I think ViewWindow can receive the local view capture
from the queue correctly.



 Comments   
Comment by shreedhar_ganapathy [ 22/Nov/08 ]

..





[SHOAL-81] Propagate Senders HM.Entry seqid in sent HealthMessage Created: 02/Oct/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Critical
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Text File unexpectedfailure.log    
Issuezilla Id: 81

 Description   

Requires changes in HealthMessage.initialize() and getDocument().

HealthMessage.getDocument() should write the HealthMessage.Entry.sequenceId into
its XML document representation.
HealthMessage.initialize() should read the senders sequence id for health
message.entry from XML document representation.

Currently, the receiver side just creates a sequence id based on the
order in which messages are received. The Jxta messaging protocol does not guarantee
that messages are received in the precise order they were sent, so the
current sequencing mechanism could result in out-of-order processing
of health messages. This could result in an incorrect computed cache state for an
instance on the master node.



 Comments   
Comment by Joe Fialli [ 06/Oct/08 ]

Created an attachment (id=12)
server log summarizing out of order message processing

Comment by Joe Fialli [ 06/Oct/08 ]

https://shoal.dev.java.net/nonav/issues/showattachment.cgi/12/unexpectedfailure.log

The following attachment summarizes a failure that occurs due to this defect.
Messages are sent by the instance in the following order:
aliveandready
clusterstopping
stopping

The DAS (master node) receives the messages in the following order:
stopping (receiving side seqid 960)
clusterstopping (receiving side seqid 961)
aliveandready (receiving side seqid 963)

The DAS processes the message in following order:
clusterstopping (961)
stopping(960)
aliveandready (963)

The aliveandready message being processed last makes a stopped instance
appear to come back to life as far as the Master is concerned.
It is then marked as INDOUBT by the master and then verified as FAILED.
This ordering issue must be corrected to fix this.

Comment by Joe Fialli [ 11/Nov/08 ]

Fix delivered. The sender's sequence id is now propagated.

Also, the member's start time and the sequence id are now used to order messages between
one invocation and a restart invocation of a server instance.
(A node agent can restart a failed instance quickly, so this can happen.)
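A minimal sketch of that ordering rule, under the assumption that each health-message
entry carries the member's start time and the sender's sequence id (assumed names,
not the Shoal source):

final class HealthEntryOrderSketch {

    // An entry is newer if it comes from a later start of the member, or from the
    // same start of the member with a higher sender-side sequence id.
    static boolean isNewer(long startTimeA, long seqIdA, long startTimeB, long seqIdB) {
        if (startTimeA != startTimeB) {
            return startTimeA > startTimeB;   // a restarted instance outranks its previous run
        }
        return seqIdA > seqIdB;               // within one run, order by the sender's sequence id
    }
}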





[SHOAL-80] Accessing system property in a rt.jar specific way Created: 23/Sep/08  Updated: 09/Nov/11

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Minor
Reporter: okrische Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 80

 Description   

Watch out in line 99 of com.sun.enterprise.jxtamgmt.NiceLogFormatter:

@SuppressWarnings("unchecked")
private static final String LINE_SEPARATOR =
(String) java.security.AccessController.doPrivileged(
new sun.security.action.GetPropertyAction("line.separator"));

Why not just use:

  • System.getProperty("line.separator")

instead?

The code above is shown as an error in Eclipse, probably because it uses a class
from rt.jar directly rather than the public API.
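If the privileged lookup is still wanted, a minimal sketch that avoids the sun.*
internal class could look like this (whether the doPrivileged wrapper is needed at
all depends on the security-manager requirements):

private static final String LINE_SEPARATOR =
        java.security.AccessController.doPrivileged(
                new java.security.PrivilegedAction<String>() {
                    public String run() {
                        // same property lookup, but only through public JDK API
                        return System.getProperty("line.separator");
                    }
                });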



 Comments   
Comment by Joe Fialli [ 27/Oct/08 ]

Does not impact the running system, only compile time.

Comment by shreedhar_ganapathy [ 09/Nov/11 ]

Transferring to Joe for eval and closure.





[SHOAL-79] DistributedStateCacheImpl not thread safe? Created: 22/Sep/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: okrische Assignee: Joe Fialli
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Text File DistributedStateCacheImpl-Diff.txt    
Issuezilla Id: 79

 Description   

Hello,

Though I see several issues, I will concentrate on only one, the most obvious:

private static final Map<String, DistributedStateCacheImpl> ctxCache =
        new HashMap<String, DistributedStateCacheImpl>();

// return the only instance we want to return
static DistributedStateCache getInstance(final String groupName) {
    DistributedStateCacheImpl instance;
    if (ctxCache.get(groupName) == null) {
        instance = new DistributedStateCacheImpl(groupName);
        ctxCache.put(groupName, instance);
    } else {
        instance = ctxCache.get(groupName);
    }
    return instance;
}

I think Shoal should take care of concurrency issues on ctxCache as well.

Why not use a ConcurrentMap? Maybe like this (which works fine, as long
as instantiating an instance is not a heavy operation):

ConcurrentMap<String, DistributedStateCacheImpl> ctxCache = ...;

static DistributedStateCache getInstance(final String groupName) {
    DistributedStateCacheImpl instance = ctxCache.get(groupName);
    if (instance == null) {
        instance = new DistributedStateCacheImpl(groupName);

        // put our mapping only if no other mapping has been put already
        DistributedStateCacheImpl otherInstance =
                ctxCache.putIfAbsent(groupName, instance);

        // there was another mapping, use that one instead of ours
        if (otherInstance != null) {
            instance = otherInstance;
        }
    }
    return instance;
}

Other issues:

  • GMSContext ctx is not guarded
  • firstSyncDone is not guarded

Right now it seems that callers have to synchronize on their own. At least this should
be reflected in the javadoc:

"The implementation itself is not thread-safe"

What do you think?



 Comments   
Comment by shreedhar_ganapathy [ 22/Sep/08 ]

Excellent observation!
Could you also file cases for other issues you see with DistributedCache ?
Are you interested in contributing fixes? We always will welcome those.

Comment by okrische [ 23/Sep/08 ]

You want me to submit a patch for this issue? I can do that; will you then apply it
to the branch?

Comment by shreedhar_ganapathy [ 23/Sep/08 ]

We'd be happy to do that. If you send the contributor agreement, you will have
commit access to check in your patch after review and some testing.

Thanks
Shreedhar

Comment by okrische [ 25/Sep/08 ]

Created an attachment (id=11)
patch to fix concurrency issues for this class only

Comment by okrische [ 25/Sep/08 ]

Okay, here are some comments on the patch:

  • The Logger is static final instead of just final. This saves one reference per created
    instance of DistributedStateCacheImpl.
  • cache changed from Map to ConcurrentHashMap to fix the concurrency issue.
  • ctx changed to an AtomicReference ctxRef, since ctx is set at runtime by the
    first thread that enters the method to read ctx -> concurrency issue.
  • firstSyncDone changed to volatile, since it can be changed at runtime by the
    first thread that does the sync -> concurrency issue (see the sketch below).
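A minimal sketch of the two field-level changes described above, using assumed names
and placeholder types (this is not the patch itself; Object stands in for the real
GMSContext type and createContext() for however the context is actually obtained):

import java.util.concurrent.atomic.AtomicReference;

class DscConcurrencySketch {
    // ctx held in an AtomicReference: the first value stored wins, later readers reuse it.
    private final AtomicReference<Object> ctxRef = new AtomicReference<Object>();

    // volatile so the flag set by the first syncing thread is visible to every thread.
    private volatile boolean firstSyncDone = false;

    Object getContext(String groupName) {
        Object ctx = ctxRef.get();
        if (ctx == null) {
            ctxRef.compareAndSet(null, createContext(groupName));
            ctx = ctxRef.get();
        }
        return ctx;
    }

    void markFirstSyncDone() {
        firstSyncDone = true;
    }

    boolean isFirstSyncDone() {
        return firstSyncDone;
    }

    private Object createContext(String groupName) {
        return new Object();   // hypothetical placeholder for obtaining the real context
    }
}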
Comment by okrische [ 25/Sep/08 ]

Ups.

-> meant "ctxCache", not cache

Comment by Joe Fialli [ 25/Sep/08 ]

Thanks for the patch. I will verify the changes against our internal Shoal tests.
If all checks out, I will update this issue, requesting you to check the change
in.

I will update my status on confirming this patch by Monday.

Comment by Joe Fialli [ 27/Oct/08 ]

Patch checked in.
Will be included in next shoal-integration into application server.





[SHOAL-78] add isLoggable around logging that is lower than warning Created: 19/Sep/08  Updated: 12/Nov/10  Resolved: 12/Nov/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Minor
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 78

 Description   

Performance optimization:

Logging messages that concatenate strings as parameters waste processing
time. All logging calls that concatenate strings at a logging level
lower than WARNING need to follow this pattern:

if (logger.isLoggable(Level.XXX)) {
    logger.XXX("..." + "...." + ...);
}

where XXX is a logging level lower than WARNING.
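For concreteness, here is a self-contained illustration of that pattern (the logger
name, message text, and parameters are made up for this example):

import java.util.logging.Level;
import java.util.logging.Logger;

class LogGuardExample {
    private static final Logger LOG = Logger.getLogger("ShoalLogger");

    void report(String member, long seqId) {
        // The guard ensures the string concatenation only happens when FINE is enabled.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("received health message from " + member + " seqId=" + seqId);
        }
    }
}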



 Comments   
Comment by Joe Fialli [ 19/Sep/08 ]

take ownership of task

Comment by Joe Fialli [ 19/Sep/08 ]

Accept task. Given that it is not high priority, it will be placed in queue.
Should be completed before Sailfin 1.5 ships.

Comment by Joe Fialli [ 01/Oct/08 ]

Partially fixed for distributed state cache logging messages.

Comment by Joe Fialli [ 12/Nov/10 ]

fixes checked in





[SHOAL-77] CMS SIGSEGV error generated by the JVM Created: 16/Sep/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Closed
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Critical
Reporter: andbur Assignee: sheetalv
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Text File PL5_hserr_pid2847.log    
Issuezilla Id: 77

 Description   

Couldn't change two of the header fields above...
"Found in version" should be 1.0 and
subcomponent = server_lifecycle

Operating System: Linux, CXP9013152/1 R2B02
Sailfin version: v5 b37g

# An unexpected error has been detected by Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00002b5f56ecc650, pid=2847, tid=1098402112
#
# Java VM: Java HotSpot(TM) 64-Bit Server VM (10.0-b22 mixed mode linux-amd64)
# Problematic frame:
# C  [libc.so.6+0x72650]  strlen+0x20
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

--------------- T H R E A D ---------------

Current thread (0x00002aab77ecc400):
JavaThread "com.sun.enterprise.ee.cms.impl.common.Router Thread"
[_thread_in_native, id=2934, stack(0x0000000041764000,0x0000000041785000)]

siginfo:si_signo=SIGSEGV: si_errno=0, si_code=1 (SEGV_MAPERR),
si_addr=0x0000000000000009

Registers:
RAX=0x0000000000000000, RBX=0x000000004177d250, RCX=0x000000004177d250,
RDX=0x000000004177d330
RSP=0x000000004177caa8, RBP=0x000000004177d120, RSI=0x0000000000000009,
RDI=0xfffffffffffffff7
R8 =0xfffffffffffffff9, R9 =0x00002b5f56f5b8c0, R10=0x0000000000000000,
R11=0x0000000000000000
R12=0x0000000000000009, R13=0x0000000000000000, R14=0x000000004177cff4,
R15=0x0000000000000000
RIP=0x00002b5f56ecc650, EFL=0x0000000000010297, CSGSFS=0x0000000000000033,
ERR=0x0000000000000004
TRAPNO=0x000000000000000e

Top of Stack: (sp=0x000000004177caa8)
0x000000004177caa8: 00002b5f56e9e98a 00000000000005e8
0x000000004177cab8: 0000000000000000 000000004177d0e0



 Comments   
Comment by andbur [ 16/Sep/08 ]

Created an attachment (id=10)
hserr_pid2847 log file found on payload node no 5 (PL5)

Comment by sheetalv [ 27/Oct/08 ]

Need more information on how to reproduce this issue. Does this issue still occur with the latest Sailfin
1.5 promoted build?

Comment by andbur [ 28/Oct/08 ]

We have only seen this once, on SGCS 1.0 build 36g, and have not been able to
reproduce it since. I think it's okay to close this issue unless you can see
some obvious fault based on the SIGSEGV dump; if we see it again in 1.5 or
1.0, we'll reopen the issue to help you troubleshoot.

Comment by sheetalv [ 31/Jul/09 ]

not enough information.






[SHOAL-76] DSC logging performance improvements Created: 13/Sep/08  Updated: 09/Nov/11

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Task Priority: Trivial
Reporter: mbien Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Text File DSC.patch    
Issuezilla Id: 76

 Description   

Wrapped logging in potentially hot or concurrent code paths in

if (<log level is loggable>) {
    log(...);
}

to prevent unnecessary synchronization and logging overhead.



 Comments   
Comment by mbien [ 13/Sep/08 ]

Created an attachment (id=9)
diff patch

Comment by shreedhar_ganapathy [ 09/Nov/11 ]

Transferring to Joe for eval and closure.





[SHOAL-75] messages not being delivered over jxta OutputPipe.send Created: 20/Aug/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 75

 Description   

This issue was reported by shoal developer forum post at
https://shoal.dev.java.net/servlets/ReadMsg?list=dev&msgNo=111

To summarize the issue, a common place needs to be added in Shoal
that checks the result of calling net.jxta.pipe.OutputPipe.send().
When the method returns false, the caller should wait a small amount of time
and then try to send again. send() returning false means the send could not be
attempted because the system was out of the resources needed to perform it;
trying again will work.

So all places where Shoal calls OutputPipe.send() should be altered to call
this common method.

The forum email confirms that it is possible for OutputPipe.send() to return
false; when this happens, a message that could have been delivered simply does not
get sent.

The proposed fix is to add a method in com.sun.enterprise.jxtamgmt.JxtaUtil
that all Shoal code calling net.jxta.OutputPipe.send() would use, so
that the resend logic for OutputPipe.send() lives in one source code location.

Here is a first pass at that method; it will be tried soon.

public static boolean sendMessage(OutputPipe pipe, PeerID peerId, Message message)
        throws IOException {
    final int MAX_SEND_ATTEMPTS = 3;   // is this the right amount of retries?
    final int RETRY_DELAY = XXX;       // in milliseconds; find out what this should be
    boolean result = pipe.send(message);
    int sendAttempts = 1;
    while (!result && sendAttempts <= MAX_SEND_ATTEMPTS) {
        try {
            Thread.sleep(RETRY_DELAY);
        } catch (InterruptedException ie) {
        }
        result = pipe.send(message);
        sendAttempts++;
    }
    if (!result) {
        if (LOG.isLoggable(Level.FINE)) {
            final String to = peerId == null ? "<broadcast to cluster>" : peerId.toString();
            LOG.fine("unable to send message " + message.toString() + " to " + to
                    + " after " + sendAttempts);
        }
    }
    return result;
}



 Comments   
Comment by shreedhar_ganapathy [ 26/Aug/08 ]

I think the proposed fix can be addressed by the LWRMulticast class. For p2p
messages, the recipient list can be a set of 1 member. Not sure if it
specifically uses a propagate pipe or a blocking wire output pipe (bwop). It
should preferably use a bwop for reliability, retransmission and flow control.

The retry logic within LWRMulticast should be wary of failures such as network
failures or hardware failures of the recipient, so that it can come out of the
TCP close wait. Thus a send message operation should not block for the duration
of the TCP retransmission timeout, and once it comes out of such a case, it
should not retry. Such protections may be necessary to make it more robust.

Comment by Joe Fialli [ 30/Oct/08 ]

fix checked into shoal trunk and integrated into sailfin communication as 1.5
nightly





[SHOAL-74] potential to miss FAILURE_NOTIFICATION when multiple instances killed at same time Created: 19/Aug/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Minor
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 74

 Description   

This bug was uncovered during a code review. The bug is that a FAILURE notification could
be missed when 2 or more instances are killed at the same time. (Note that,
given the race condition between the node agent restarting a killed instance and the
failure notification, only a test that kills the node agent and then kills
instances can be assured of seeing a FAILURE_NOTIFICATION for each server
instance killed. A node agent can restart a server instance before Shoal
reports it as FAILED.)

HealthMonitor.InDoubtPeerDetector.processCacheUpdate() iterates over all
instances in the cluster, checking whether any are in doubt. If one instance is detected
to be in doubt, HealthMonitor.InDoubtPeerDetector.determineInDoubtPeers() notifies
the FailureVerifier thread to process the current cache, looking for in-doubt peers to
verify which instance should have a FAILURE_NOTIFICATION sent.

synchronized (verifierLock) {
    verifierLock.notify();
    LOG.log(Level.FINER, "Done Notifying FailureVerifier for " + entry.adv.getName());
}

The notification signal from the InDoubtPeerDetector thread to the FailureVerifier
thread is the weak link in this bug. When multiple failures happen at once, the
code as currently written acts on the first instance failure immediately. The
InDoubtPeerDetector should iterate over all instances AND, if one OR more
instances are in doubt, notify the FailureVerifier thread to run
over all instances in the cluster cache.

The bug could be that the InDoubtPeerDetector runs twice: the first run notifies the
FailureVerifier to run on the instance cache, and it detects the first killed instance.
The second time the InDoubtPeerDetector runs, it could notify the
FailureVerifier while it is still working on verifying the first failure (with a
snapshotted cache). The second notify to a running FailureVerifier thread has
no effect, and the FAILURE_NOTIFICATION for the second killed server
instance will be detected much later, when the next failure occurs or the client
is shut down. A sketch of one way to avoid losing the notification follows.
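A minimal sketch (assumed names, not the Shoal implementation) of a flag-based
handshake that avoids the lost-notify race: the detector records pending work under
the lock, and the verifier re-scans the whole cluster cache whenever work is pending.

class LostNotifySketch {
    private final Object verifierLock = new Object();
    private boolean verificationPending = false;

    // Called by the InDoubtPeerDetector whenever one or more peers look in doubt.
    void notifyVerifier() {
        synchronized (verifierLock) {
            verificationPending = true;   // remembered even if the verifier is busy
            verifierLock.notify();
        }
    }

    // Loop run by the FailureVerifier thread.
    void verifierLoop() throws InterruptedException {
        while (true) {
            synchronized (verifierLock) {
                while (!verificationPending) {
                    verifierLock.wait();
                }
                verificationPending = false;
            }
            verifyAllInDoubtPeers();      // hypothetical: re-scan the full cluster cache
        }
    }

    private void verifyAllInDoubtPeers() {
        // placeholder for iterating the cache and confirming failures
    }
}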






[SHOAL-73] Change ShoalLogger WARNING for IOException to FINE Created: 24/Jul/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 73

 Description   

This is a short term change just before a shoal release.
Longer term solution is documented in a separate shoal issue. (will link when it
is available.)

Currently, when there is an IOException from the DAS to a server instance that was
either killed or failed to start for some reason (like the ORB bind address
still in use), it results in a WARNING that does not contain sufficient
information for an administrator to know which server instance there was a
difficulty sending a message to. Given that the WARNING was occurring for
non-error cases and there was not enough info in the message for an administrator to
easily figure out whether the WARNING requires attention, this log event is being
reduced to FINE.

When such a message does occur at FINE, here is how one can correlate the
failure with a server instance name.

From a server.log, here is an event indicating a failure to send to another
server instance in the cluster. From the jxta://uuid-XXX, take the last few
characters of the UUID and search the log for an entry that has a server instance name in it.

[#|2008-07-23T19:44:31.665-0700|WARNING|sun-glassfish-comms-server1.0|javax.enterprise.system.stream.err|_ThreadID=29;_ThreadName=MessageWindowThread;_RequestID=80b25066-a911-401d-b38b-b99a0c3aecc2;|java.io.IOException:
Unable to create a messenger to
jxta://uuid-0CEC11B5D9E64303A621B9B272CD0439FC9C0AEFDE264179A314B0C0C01C0BF803/PipeService/urn:jxta:uuid-0CEC11B5D9E64303A621B9B272CD04396521C3C52E4443928082812BCDC1E25B04
at
net.jxta.impl.pipe.BlockingWireOutputPipe.checkMessenger(BlockingWireOutputPipe.java:238)
at net.jxta.impl.pipe.BlockingWireOutputPipe.send(BlockingWireOutputPipe.java:264)
at com.sun.enterprise.jxtamgmt.ClusterManager.send(ClusterManager.java:495)
at
com.sun.enterprise.ee.cms.impl.jxta.GroupCommunicationProviderImpl.sendMessage(GroupCommunicationProviderImpl.java:217)
at
com.sun.enterprise.ee.cms.impl.jxta.DistributedStateCacheImpl.sendMessage(DistributedStateCacheImpl.java:458)
at
com.sun.enterprise.ee.cms.impl.jxta.DistributedStateCacheImpl.addAllToRemoteCache(DistributedStateCacheImpl.java:388)
at
com.sun.enterprise.ee.cms.impl.jxta.MessageWindow.handleDSCMessage(MessageWindow.java:127)
at
com.sun.enterprise.ee.cms.impl.jxta.MessageWindow.newMessageReceived(MessageWindow.java:107)
at com.sun.enterprise.ee.cms.impl.jxta.MessageWindow.run(MessageWindow.java:91)
at java.lang.Thread.run(Thread.java:619)

#]

Search for BF803 and find the following log entry, which shows that the sendMessage
was going to this server instance:

9: MemberId: n2c1m4, MemberType: CORE, Address:
urn:jxta:uuid-0CEC11B5D9E64303A621B9B272CD0439FC9C0AEFDE264179A314B0C0C01C0BF803

For the test in question, the server instance n2c1m4 was killed to test the
FAILURE notification. Thus, the log event does not capture an event that should
be viewed as a failure. The log message needs to be improved to specifically
state which server instance the send message was going to when it failed.
A future fix will make sure that a sendMessage to a FAILING instance is only
reported once in the server log and reported with the actual server instance name.



 Comments   
Comment by Joe Fialli [ 28/Jul/08 ]

Long term solution for this issue is described in
https://shoal.dev.java.net/issues/show_bug.cgi?id=72

Fix checked into shoal for sailfin 0.5 branch.





[SHOAL-72] need a fix for the "unable to create messenger" IOException Created: 24/Jul/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: sheetalv Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 72
Status Whiteboard:

shoal-shark-na


 Description   

The "unable to create messenger" IOException occurs in different scenarios. One of the scenarions is when
an instance is killed. before instance B can know that the instance A has been killed, it tries to send a
message via ClusterManager.send() (could be to sync the DSC or for some other reason).

When such an IOException occurs, the Shoal code should check which instance is supposedly down. Then
the code wait for a little while before finding the state that that instance is in. If the state is
alive/aliveandready, the message should be sent again as a retry. If the instance is in in_retry_mode (i.e. it
has'nt been deemed in_doubt/failed yet), then the right way of dealing with this should be decided.



 Comments   
Comment by Joe Fialli [ 28/Jul/08 ]

Short term solution described in shoal issue 73.

Change platform to ALL since issue is not specific to MAC os.

Comment by sheetalv [ 28/Jul/08 ]

short term solution in issue 73 has been added to Sailfin 0.5.

Comment by sheetalv [ 31/Jul/08 ]

assigning to Joe.





[SHOAL-71] HealthMonitor.isConnected : for loop only looks at 1 network interface Created: 23/Jul/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: sheetalv Assignee: sheetalv
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Macintosh


Issuezilla Id: 71
Status Whiteboard:

shoal-shark-na


 Description   

The power-outage-related code in the InDoubtPeerDetector thread's isConnected() method iterates over
only 1 Future task and returns false if the future task is not done. The "return false" statement needs to be
inside the catch block for InterruptedException. Otherwise, the following 2 lines of code will never get
executed in the case where the future task is not complete:

fine("Peer Machine for " + entry.adv.getName() + " is down!");
return false;



 Comments   
Comment by sheetalv [ 28/Jul/08 ]

issue fixed in the trunk.

https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=636





[SHOAL-70] Exception occurs showing Member no longer in the group, when sending messages to an alive member Created: 20/Jul/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: shreedhar_ganapathy Assignee: shreedhar_ganapathy
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Windows


Issuezilla Id: 70

 Description   

When sending messages, the call to GroupCommunicationProviderImpl.sendMessage()
with a specified target member token checks whether this member still
exists in the ClusterViewManager's ClusterView. This check is done by getting the
memberToken string's corresponding peer id by calling
clusterManager.getID(targetMemberIdentityToken).

This results in a call to NetworkManager.getPeerID() each time.
Multiple calls to NetworkManager.getPeerID() passing in the exact same parameter
for instanceName return a different Jxta UUID over a period of time.

This is a problem in itself, as the consistent hashing algorithm is expected to
guarantee that the exact same UUID is generated for a given constant seed.

Using Leehui's MultiThreadMessageSender test, it is easy to see this problem, as
it frequently reports that a target instance is no longer in the group while
sending a message, because the call has produced a new UUID that does not exist in
the ClusterViewManager for that known target member token.

To work around this issue, calls to getPeerID should return a cached PeerID
that was originally generated during the first call to establish the peer in
the PeerGroup.



 Comments   
Comment by shreedhar_ganapathy [ 20/Jul/08 ]

Fix checked in per CVS check messages :
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=631
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=632

getPeerID() now consults an instanceToPeerID hashtable for a pre-existing
peer id for a given instanceName token string. If none exists, it creates one and adds it to the
hashtable. The hashtable is cleared during NetworkManager.stop().

Comment by shreedhar_ganapathy [ 20/Jul/08 ]

Sheetal has already integrated this fix into Shoal branch for Sailfin 0.5 (SGCS
1.0).





[SHOAL-69] GroupHandle.raiseFence() needs to throw exception if fence is already raised. Created: 20/Jul/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Major
Reporter: shreedhar_ganapathy Assignee: shreedhar_ganapathy
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Windows


Issuezilla Id: 69

 Description   

Based on Bongjae Chang's email to dev alias:
https://shoal.dev.java.net/servlets/ReadMsg?list=dev&msgNo=103

raiseFence() method in GroupHandle does not throw an exception if the fence is
already raised but quietly returns and only succeeds when a fence is not raised.



 Comments   
Comment by shreedhar_ganapathy [ 20/Jul/08 ]

Fixed per cvs message
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=630





[SHOAL-68] HealthMonitor's getMemberState should not make a network roundtrip peer's own health state Created: 16/Jul/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Critical
Reporter: shreedhar_ganapathy Assignee: sheetalv
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Windows


Issuezilla Id: 68
Status Whiteboard:

shoal-shark-na


 Description   

When a client component in the same VM calls
JoinNotificationSignal.getMemberState(), the eventual call results in the HealthMonitor making
a network call. This is okay for other peers, but when the member involved is the
VM itself, the call should check for that, consult the local health state
cache, and return that state.

This is critical for Sailfin's CLB
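A minimal sketch of the short-circuit described above, with assumed names (the cache
and the remote query are placeholders, not the Shoal API):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class MemberStateSketch {
    private final String localInstanceName;
    private final Map<String, String> localHealthStateCache = new ConcurrentHashMap<String, String>();

    MemberStateSketch(String localInstanceName) {
        this.localInstanceName = localInstanceName;
    }

    String getMemberState(String memberToken) {
        if (localInstanceName.equals(memberToken)) {
            // Own state: answer from the local health-state cache, no network round trip.
            return localHealthStateCache.get(memberToken);
        }
        // Other peers: fall back to the remote query.
        return queryMemberStateOverNetwork(memberToken);
    }

    private String queryMemberStateOverNetwork(String memberToken) {
        return "unknown";   // placeholder for the real network call
    }
}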



 Comments   
Comment by sheetalv [ 28/Jul/08 ]

will fix for Sailfin 1.5.

Comment by sheetalv [ 27/Aug/08 ]

needs to be a P2 since it needs to be fixed for Sailfin 1.5

Comment by sheetalv [ 23/Oct/08 ]

This issue has been fixed. A check to see if the member is asking for its own state has already been
checked in.





[SHOAL-67] NPE seen in ClusterManager Line 161 while executing Shoal Sample Created: 28/Jun/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: shreedhar_ganapathy Assignee: sheetalv
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 67

 Description   

Multiple users have pointed out this NPE. Cause is known and a fix is ready to
be checked in.
Thanks to David Taylor for the following stack trace :
init:
deps-jar:
compile:
run:
Jun 27, 2008 2:59:38 PM SimpleGMSSample runSimpleSample
INFO: Starting SimpleGMSSample....
Jun 27, 2008 2:59:38 PM SimpleGMSSample initializeGMS
INFO: Initializing Shoal for member: server1214603978093 group:Group1
Jun 27, 2008 2:59:38 PM SimpleGMSSample registerForGroupEvents
INFO: Registering for group event notifications
Jun 27, 2008 2:59:38 PM SimpleGMSSample joinGMSGroup
INFO: Joining Group Group1
Exception in thread "main" java.lang.NullPointerException
at
com.sun.enterprise.jxtamgmt.ClusterManager.<init>(ClusterManager.java:162)
at
com.sun.enterprise.ee.cms.impl.jxta.GroupCommunicationProviderImpl.initializ
eGroupCommunicationProvider(GroupCommunicationProviderImpl.java:138)
at
com.sun.enterprise.ee.cms.impl.jxta.GMSContext.join(GMSContext.java:122)
at
com.sun.enterprise.ee.cms.impl.common.GroupManagementServiceImpl.join(GroupM
anagementServiceImpl.java:331)
at SimpleGMSSample.joinGMSGroup(SimpleGMSSample.java:76)
at SimpleGMSSample.runSimpleSample(SimpleGMSSample.java:46)
at SimpleGMSSample.main(SimpleGMSSample.java:25)

Also, Mike W provided the following stack trace in his email :
Also has anyone run the tests lately? GroupLeaderTest fails,

looks as though ClusterManager line 161

-> this.bindInterfaceAddress =
(String)props.get(JxtaConfigConstants.BIND_INTERFACE_ADDRESS.toString());

requires the BIND_INTERFACE_ADDRESS property to be given or throws null pointer?

--ekiM



 Comments   
Comment by shreedhar_ganapathy [ 28/Jun/08 ]

Relevant Checkins
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=611

And

User: shreedhar_ganapathy
Date: 2008-06-28 15:06:07+0000
Modified:
shoal/gms/src/java/com/sun/enterprise/jxtamgmt/ClusterManager.java

Log:
Fix for issue 67 : NPE in setting Bind interface address in ClusterManager Line
161: Added check for empty Properties object.

File Changes:

Directory: /shoal/gms/src/java/com/sun/enterprise/jxtamgmt/
===========================================================

File [changed]: ClusterManager.java
Url:
https://shoal.dev.java.net/source/browse/shoal/gms/src/java/com/sun/enterprise/jxtamgmt/ClusterManager.java?r1=1.38&r2=1.39
Delta lines: +7 -24
--------------------
--- ClusterManager.java 2008-06-26 12:56:11+0000 1.38
+++ ClusterManager.java 2008-06-28 15:06:05+0000 1.39
@@ -36,41 +36,23 @@

package com.sun.enterprise.jxtamgmt;

-import static com.sun.enterprise.jxtamgmt.JxtaUtil.getObjectFromByteArray;
import com.sun.enterprise.ee.cms.core.MemberNotInViewException;
-import net.jxta.document.AdvertisementFactory;
-import net.jxta.document.MimeMediaType;
-import net.jxta.document.StructuredDocument;
-import net.jxta.document.StructuredDocumentFactory;
-import net.jxta.document.XMLDocument;
-import net.jxta.endpoint.ByteArrayMessageElement;
-import net.jxta.endpoint.EndpointAddress;
-import net.jxta.endpoint.Message;
-import net.jxta.endpoint.MessageElement;
-import net.jxta.endpoint.TextDocumentMessageElement;
+import static com.sun.enterprise.jxtamgmt.JxtaUtil.getObjectFromByteArray;
+import net.jxta.document.*;
+import net.jxta.endpoint.*;
import net.jxta.exception.PeerGroupException;
import net.jxta.id.ID;
import net.jxta.impl.endpoint.tcp.TcpTransport;
import net.jxta.impl.pipe.BlockingWireOutputPipe;
import net.jxta.peer.PeerID;
import net.jxta.peergroup.PeerGroup;
-import net.jxta.pipe.InputPipe;
-import net.jxta.pipe.OutputPipe;
-import net.jxta.pipe.PipeMsgEvent;
-import net.jxta.pipe.PipeMsgListener;
-import net.jxta.pipe.PipeService;
+import net.jxta.pipe.*;
import net.jxta.protocol.PipeAdvertisement;
import net.jxta.protocol.RouteAdvertisement;

import java.io.IOException;
import java.io.Serializable;
-import java.util.ArrayList;
-import java.util.Collections;
-import java.util.HashMap;
-import java.util.Hashtable;
-import java.util.Iterator;
-import java.util.List;
-import java.util.Map;
+import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Level;
import java.util.logging.Logger;
@@ -158,8 +140,9 @@
LOG.log(Level.WARNING, ioe.getLocalizedMessage());
}
NetworkManagerRegistry.add(groupName, netManager);

-        if ( props != null )
+        if (props != null && !props.isEmpty()) {
             this.bindInterfaceAddress = (String)props.get(JxtaConfigConstants.BIND_INTERFACE_ADDRESS.toString());
+        }

    systemAdv = createSystemAdv(netManager.getNetPeerGroup(), instanceName,
    identityMap, bindInterfaceAddress);
    LOG.log(Level.FINER, "Instance ID :" + getSystemAdvertisement().getID());
    this.clusterViewManager = new
    ClusterViewManager(getSystemAdvertisement(), this, viewListeners);





[SHOAL-66] Join Notification Signals of own join is not notified Created: 27/Jun/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: shreedhar_ganapathy Assignee: sheetalv
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 66

 Description   

Mike Wannabaker reported the following :
==========
I believe it's a bug then, on both counts. My SERVER-1 only gets a SERVER-1
message and SERVER-2 only gets a SERVER-1 message.

So when I say it only gets a SERVER-1 message I mean that the method

public void processNotification(Signal p_Signal)

is only being called with that message.

So if I start just SERVER-1, I see the GMS View Changed message, with just the
SERVER-1 in it, but my processNotification(…) is not called. Not until I start
SERVER-2 does it get called.

On SERVER-2, I see the original GMS View Changed with just SERVER-2, and then
GMS View Changed with SERVER-1,SERVER-2, but only get one processNotification(…)
call.

I will investigate further next week, but if you could have a look that would be
great. Is no one else seeing this?

This is my processNotification() method

public void processNotification(Signal p_Signal)
{
    try {
        p_Signal.acquire();
        SignalLogger log = new SignalLogger(p_Signal);
        log.logIt();
        if (p_Signal instanceof MessageSignal) {
            MessageSignal msgSig = (MessageSignal) p_Signal;
            String sMember = msgSig.getMemberToken();
            Object o = ObjectUtil.toObject(msgSig.getMessage());
            if (o instanceof SMessage) {
                SMessage smsg = (SMessage) o;
                InetAddress sender = m_hmMembers.get(sMember).address;
                smsg.setSender(sender);
                SMessageLogger.log.systemInfo(getClass(), "FireMessage: " + smsg);
                fireMessageReceived(smsg);
                //fireMessageReceived(smsg);
            } else {
                SMessageLogger.log.systemInfo(getClass(), "Message is NOT SMessage??");
            }
        } else if (p_Signal instanceof JoinNotificationSignal) {
            JoinNotificationSignal joinSig = (JoinNotificationSignal) p_Signal;
            processClusterNotification();
        } else if (p_Signal instanceof JoinedAndReadyNotificationSignal) {
            JoinedAndReadyNotificationSignal joinSig = (JoinedAndReadyNotificationSignal) p_Signal;
            processClusterNotification();
        } else if (p_Signal instanceof FailureSuspectedSignal) {
            FailureSuspectedSignal suspectSig = (FailureSuspectedSignal) p_Signal;
            processClusterNotification();
        } else if (p_Signal instanceof FailureRecoverySignal) {
            FailureRecoverySignal failureSig = (FailureRecoverySignal) p_Signal;
            processClusterNotification();
        } else if (p_Signal instanceof FailureNotificationSignal) {
            FailureNotificationSignal failureSig = (FailureNotificationSignal) p_Signal;
            processClusterNotification();
        } else if (p_Signal instanceof PlannedShutdownSignal) {
            PlannedShutdownSignal shutdownSig = (PlannedShutdownSignal) p_Signal;
            processClusterNotification();
        } else {
            SMessageLogger.log.debug(getClass(), "Received Notification of type : "
                    + p_Signal.getClass().getName() + " Server: " + p_Signal.getMemberToken());
        }
    }
    catch (SignalAcquireException e) {
        SMessageLogger.log.fatal(getClass(), "Exception occured while acquiring signal", e);
    }
    finally {
        try {
            p_Signal.release();
        }
        catch (SignalReleaseException e) {
            SMessageLogger.log.warn(getClass(), "Exception occured while releasing signal", e);
        }
    }
}
From: Shreedhar.Ganapathy@Sun.COM
Sent: June 27, 2008 11:21 AM
To: users@shoal.dev.java.net
Subject: Re: [Shoal-Users] Still not sure it's working

Hi Mike
The expected behavior is that as each server starts, its registered GMS client
components will be notified of the server's own joining the group and any
subsequent joins of other members.
So in essence, server-1 GMS clients should see a JoinNotificationSignal for
server-1, and another for server-2
and in server-2, GMS clients should see a JoinNotificationSignal for server-2
and another for server-1.
The order here does not matter, but correctness is important, and if it is not correct,
it's a bug to be fixed.

In the log below, Server-1 seems to be getting its own JoinNotificationSignal
which is correct. Does it ever get the JoinNotificationSignal for server-2?
On server-2, I am seeing correct behavior.

(Ignore the log statements that show the view contents, as that is an event
coming from the provider implementation - GMS notification signals are the ones
that GMS clients should look in for correctness).

Let me know.
Thanks
Shreedhar

Mike Wannamaker wrote:

Hi Guys,

I’m still not sure it’s working as it’s supposed to? But maybe it is?

Start SERVER-1

Start SERVER-2

On SERVER-1 I get a JoinMessage but it is from SERVER-1?

On SERVER-2 I get a Join Message from SERVER-1, which is what I
would expect?

Is this correct? This depends on when the two servers are started. If I wait
for a period between startups I get SERVER-2 startup message on SERVER-1 and
SERVER-1 startup message on SERVER-2. But if I start them both at the same time
I get the above behaviour?

Starting both at virtually the same time I get …

SEVER-1 Output:

27-Jun-2008 12:40:29 AM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens

INFO: GMS View Change Received for group RCS_CLUSTER : Members in view for
(before change analysis) are :

1: MemberId: SERVER-1, MemberType: CORE, Address:
urn:jxta:uuid-2F39FF376B6A43E3905DAFC81B7D02FDB946A28335F0413BBF73B77CCC8BFEC603

27-Jun-2008 12:40:29 AM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved

INFO: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT

27-Jun-2008 12:40:38 AM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens

INFO: GMS View Change Received for group RCS_CLUSTER : Members in view for
(before change analysis) are :

1: MemberId: SERVER-2, MemberType: CORE, Address:
urn:jxta:uuid-2F39FF376B6A43E3905DAFC81B7D02FD0D4B867250FF460C9B539A161779845B03

2: MemberId: SERVER-1, MemberType: CORE, Address:
urn:jxta:uuid-2F39FF376B6A43E3905DAFC81B7D02FDB946A28335F0413BBF73B77CCC8BFEC603

27-Jun-2008 12:40:38 AM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved

INFO: Analyzing new membership snapshot received as part of event : ADD_EVENT

27-Jun-2008 12:40:38 AM DEBUG [pool-1-thread-1]
com.opentext.ecm.services.smessage.impl.shoal.SignalLogger - - SERVER-1 >>
JoinNotificationSignalImpl @ 27/06/08 12:40 AM - [RCS_CLUSTER]:
(Hashtable:[(String:server.name)<-->(String:SERVER-1),
(String:local.host)<-->(Inet4Address:mwana0061/10.6.2.89)])

Server-2 Output

27-Jun-2008 12:40:30 AM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens

INFO: GMS View Change Received for group RCS_CLUSTER : Members in view for
(before change analysis) are :

1: MemberId: SERVER-2, MemberType: CORE, Address:
urn:jxta:uuid-2F39FF376B6A43E3905DAFC81B7D02FD0D4B867250FF460C9B539A161779845B03

27-Jun-2008 12:40:30 AM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved

INFO: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT

27-Jun-2008 12:40:38 AM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens

INFO: GMS View Change Received for group RCS_CLUSTER : Members in view for
(before change analysis) are :

1: MemberId: SERVER-2, MemberType: CORE, Address:
urn:jxta:uuid-2F39FF376B6A43E3905DAFC81B7D02FD0D4B867250FF460C9B539A161779845B03

2: MemberId: SERVER-1, MemberType: CORE, Address:
urn:jxta:uuid-2F39FF376B6A43E3905DAFC81B7D02FDB946A28335F0413BBF73B77CCC8BFEC603

27-Jun-2008 12:40:38 AM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved

INFO: Analyzing new membership snapshot received as part of event : ADD_EVENT

27-Jun-2008 12:40:44 AM DEBUG [pool-1-thread-1]
com.opentext.ecm.services.smessage.impl.shoal.SignalLogger - - SERVER-1 >>
JoinNotificationSignalImpl @ 27/06/08 12:40 AM - [RCS_CLUSTER]:
(Hashtable:[(String:server.name)<-->(String:SERVER-1),
(String:local.host)<-->(Inet4Address:mwana0061/10.6.2.89)])
============



 Comments   
Comment by shreedhar_ganapathy [ 27/Jun/08 ]

Awaiting Mike's confirmation to see if the fix is good.

Comment by sheetalv [ 09/Jul/08 ]

assigning to self

Comment by sheetalv [ 28/Jul/08 ]

Shreedhar fixed the issue in Shoal trunk. Now integrated into Sailfin 0.5.

Comment by sheetalv [ 28/Jul/08 ]

cvs message entry :
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=612





[SHOAL-65] add cluster name to HealthMonitor thread descriptions and log messages Created: 26/Jun/08  Updated: 07/Oct/10  Resolved: 07/Oct/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Task Priority: Trivial
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 65

 Description   

For each cluster/group, the following 3 threads get created in HealthMonitor.java.

"HealthMonitor", "InDoubtPeerDetector Thread" and "FailureVerifier Thread"

It would assist investigating stack traces and logs with multiple clusters
if the cluster/group name was integrated into these names.

For example if one has clusters cluster1, cluster2 and cluster3, these
names would be appended to above thread descriptive names AND also be
included in relevant log messages to provide more complete context.



 Comments   
Comment by Joe Fialli [ 07/Oct/10 ]

Fixed.

this.healthMonitorThread =
new Thread(this, "HealthMonitor for Group:" + manager.getGroupName());





[SHOAL-64] add AtomicBoolean for controlling the started variable Created: 26/Jun/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Task Priority: Major
Reporter: sheetalv Assignee: sheetalv
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Macintosh


Issuezilla Id: 64

 Description   

The same problem exists in both ClusterManager's and HealthMonitor's start().
Make sure that the AtomicBoolean is set at the beginning of the start() method; a sketch follows.
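A minimal sketch of that guard on a generic lifecycle class (not the Shoal code):
compareAndSet flips the flag at the very start of start(), so only the first caller
performs the startup work.

import java.util.concurrent.atomic.AtomicBoolean;

class StartGuardSketch {
    private final AtomicBoolean started = new AtomicBoolean(false);

    void start() {
        // Set the flag first; concurrent callers see it immediately and return early.
        if (!started.compareAndSet(false, true)) {
            return;   // already started by another thread
        }
        // ... actual startup work ...
    }
}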






[SHOAL-63] When I invoke GMSFactory.startGMSModule(...), some NPEs are occurred Created: 26/Jun/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: carryel Assignee: carryel
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 63

 Description   

When I invoke GMSFactory.startGMSModule(...), some NPEs occur.

  • NPE List
    1. When group name is null
    -------------------------
    Exception in thread "main" java.lang.NullPointerException
    at com.sun.enterprise.jxtamgmt.NetworkManager.getPeerGroupID
    (NetworkManager.java:272)
    at com.sun.enterprise.jxtamgmt.NetworkManager.getInfraPeerGroupID
    (NetworkManager.java:362)
    at com.sun.enterprise.jxtamgmt.NetworkManager.getPeerID
    (NetworkManager.java:261)
    at com.sun.enterprise.jxtamgmt.NetworkManager.initWPGF(NetworkManager.java:562)
    at com.sun.enterprise.jxtamgmt.NetworkManager.<init>(NetworkManager.java:194)
    at com.sun.enterprise.jxtamgmt.ClusterManager.<init>(ClusterManager.java:151)
    at
    com.sun.enterprise.ee.cms.impl.jxta.GroupCommunicationProviderImpl.initializeGro
    upCommunicationProvider(GroupCommunicationProviderImpl.java:138)
    at com.sun.enterprise.ee.cms.impl.jxta.GMSContext.join(GMSContext.java:122)
    at com.sun.enterprise.ee.cms.impl.common.GroupManagementServiceImpl.join
    (GroupManagementServiceImpl.java:339)
    at com.sun.enterprise.shoal.groupleadertest.GroupLeaderTest.main
    (GroupLeaderTest.java:67)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:90)
    -------------------------

2. When server token is null
-------------------------
Exception in thread "main" java.lang.NullPointerException
at com.sun.enterprise.jxtamgmt.NetworkManager.getPeerID
(NetworkManager.java:261)
at com.sun.enterprise.jxtamgmt.NetworkManager.initWPGF(NetworkManager.java:562)
at com.sun.enterprise.jxtamgmt.NetworkManager.<init>(NetworkManager.java:194)
at com.sun.enterprise.jxtamgmt.ClusterManager.<init>(ClusterManager.java:151)
at
com.sun.enterprise.ee.cms.impl.jxta.GroupCommunicationProviderImpl.initializeGro
upCommunicationProvider(GroupCommunicationProviderImpl.java:138)
at com.sun.enterprise.ee.cms.impl.jxta.GMSContext.join(GMSContext.java:122)
at com.sun.enterprise.ee.cms.impl.common.GroupManagementServiceImpl.join
(GroupManagementServiceImpl.java:339)
at com.sun.enterprise.shoal.groupleadertest.GroupLeaderTest.main
(GroupLeaderTest.java:67)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:90)
-------------------------

3. When properties is null
-------------------------
Exception in thread "main" java.lang.NullPointerException
at com.sun.enterprise.jxtamgmt.ClusterManager.<init>(ClusterManager.java:161)
at
com.sun.enterprise.ee.cms.impl.jxta.GroupCommunicationProviderImpl.initializeGro
upCommunicationProvider(GroupCommunicationProviderImpl.java:138)
at com.sun.enterprise.ee.cms.impl.jxta.GMSContext.join(GMSContext.java:122)
at com.sun.enterprise.ee.cms.impl.common.GroupManagementServiceImpl.join
(GroupManagementServiceImpl.java:331)
at com.sun.enterprise.shoal.jointest.SimpleJoinTest.runSimpleSample
(SimpleJoinTest.java:40)
at com.sun.enterprise.shoal.jointest.SimpleJoinTest.main
(SimpleJoinTest.java:20)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:90)
-------------------------
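
A minimal sketch of the kind of argument validation that prevents these NPEs;
the committed fix is in GMSFactory and ClusterManager (linked in the comment
below), and the class and messages here are purely illustrative:

-----------------------------------------------------
import java.util.Properties;

// Validate the arguments passed to GMSFactory.startGMSModule() before they
// reach NetworkManager/ClusterManager, so callers get a clear
// IllegalArgumentException instead of a deep NullPointerException.
final class GmsStartupArgs {
    final String serverToken;
    final String groupName;
    final Properties properties;

    GmsStartupArgs(String serverToken, String groupName, Properties properties) {
        if (serverToken == null || serverToken.length() == 0) {
            throw new IllegalArgumentException("server token must not be null or empty");
        }
        if (groupName == null || groupName.length() == 0) {
            throw new IllegalArgumentException("group name must not be null or empty");
        }
        this.serverToken = serverToken;
        this.groupName = groupName;
        // Tolerate a null Properties object by substituting an empty one.
        this.properties = (properties != null) ? properties : new Properties();
    }
}
-----------------------------------------------------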



 Comments   
Comment by carryel [ 26/Jun/08 ]

These are resolved.

https://shoal.dev.java.net/source/browse/shoal/gms/src/java/com/sun/enterprise/e
e/cms/core/GMSFactory.java?r1=1.5&r2=1.6

https://shoal.dev.java.net/source/browse/shoal/gms/src/java/com/sun/enterprise/j
xtamgmt/ClusterManager.java?r1=1.37&r2=1.38





[SHOAL-62] sometimes, sending messages to a member fails even though the member is still alive. Created: 12/Jun/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: leehui Assignee: sheetalv
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 62
Status Whiteboard:

shoal-shark-na


 Description   

Assume there are two nodes, "A" and "B". "A" starts multiple threads to send
messages to "B", while "B" just receives messages from "A". Sometimes "A"
throws an ArrayIndexOutOfBoundsException and reports that "B" is not in its
view anymore, even though "B" is still alive. Use
com.sun.enterprise.shoal.multithreadmessagesendertest.MultiThreadMessageSender
to start two instances. After a little while, "A" prints:

java.lang.ArrayIndexOutOfBoundsException
at java.lang.System.arraycopy(Native Method)
at sun.security.provider.DigestBase.engineUpdate
(DigestBase.java:102)
at sun.security.provider.SHA.implDigest(SHA.java:94)
at sun.security.provider.DigestBase.engineDigest
(DigestBase.java:161)
at sun.security.provider.DigestBase.engineDigest
(DigestBase.java:140)
at java.security.MessageDigest$Delegate.engineDigest
(MessageDigest.java:531)
at java.security.MessageDigest.digest(MessageDigest.java:309)
at java.security.MessageDigest.digest(MessageDigest.java:355)
at com.sun.enterprise.jxtamgmt.NetworkManager.hash
(NetworkManager.java:222)
at com.sun.enterprise.jxtamgmt.NetworkManager.getPeerGroupID
(NetworkManager.java:272)
at
com.sun.enterprise.jxtamgmt.NetworkManager.getInfraPeerGroupID
(NetworkManager.java:362)
at com.sun.enterprise.jxtamgmt.NetworkManager.getPeerID
(NetworkManager.java:261)
at com.sun.enterprise.jxtamgmt.ClusterManager.getID
(ClusterManager.java:662)
at
com.sun.enterprise.ee.cms.impl.jxta.GroupCommunicationProviderImpl.sendMessage
(GroupCommunicationProviderImpl.java:226)
at
com.sun.enterprise.ee.cms.impl.jxta.GroupHandleImpl.sendMessage
(GroupHandleImpl.java:131)
at MultiThreadMessageSender$1.run
(MultiThreadMessageSender.java:52)
at java.lang.Thread.run(Thread.java:595)
2008-6-12 11:20:21
com.sun.enterprise.ee.cms.impl.jxta.GroupHandleImpl sendMessage
警告: GroupHandleImpl.sendMessage : Could not send message :
Member B is not in the View anymore. Hence not performing sendMessage operation



 Comments   
Comment by leehui [ 19/Jun/08 ]

For the root cause, please see
https://shoal.dev.java.net/servlets/ReadMsg?list=dev&msgNo=79

For some fix suggestions, please see
https://shoal.dev.java.net/servlets/ReadMsg?list=dev&msgNo=81
https://shoal.dev.java.net/servlets/ReadMsg?list=dev&msgNo=84

Comment by sheetalv [ 09/Jul/08 ]

NA for Sailfin 1.0

Comment by Joe Fialli [ 27/Aug/08 ]

Added a synchronized block in NetworkManager.hash(), as recommended in the
July timeframe.
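
A minimal, self-contained sketch of that fix idea: MessageDigest is not
thread-safe, so concurrent callers of NetworkManager.hash() can corrupt the
shared digest state (hence the ArrayIndexOutOfBoundsException inside
DigestBase.engineUpdate). Serializing access avoids the corruption; the class
and method names below are illustrative, not the committed code:

-----------------------------------------------------
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class DigestHelper {
    private static final MessageDigest DIGEST;
    static {
        try {
            DIGEST = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static byte[] hash(byte[] input) {
        // Serialize access to the shared digest, mirroring the synchronized
        // block added in NetworkManager.hash().
        synchronized (DIGEST) {
            DIGEST.reset();
            return DIGEST.digest(input);
        }
    }
}
-----------------------------------------------------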





[SHOAL-61] when members join the group concurrently, join notifications of some members are often duplicated or missed Created: 10/Jun/08  Updated: 25/Nov/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: carryel Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Text File shoal_issue61_2009_06_11.txt     Java Source File SimpleJoinTest.java    
Issue Links:
Dependency
blocks SHOAL-50 Expose MASTER_CHANGE_EVENT Resolved
Issuezilla Id: 61
Status Whiteboard:

shoal-shark-na


 Description   

This issue is similar to issue #60.
(https://shoal.dev.java.net/issues/show_bug.cgi?id=60)

In issue #60, members joined the group in sequence; in this issue, members
join the group concurrently.

If all members join the group concurrently at the start, they do not yet know
who the group leader is and must negotiate one. In this case, join
notifications for some members are often duplicated or missed.

Here is the log for the duplicated case. Assume that "A" and "B" are members
of "TestGroup".
["A"'s log]
------------------------------------------------------------------------
2008. 6. 10 오후 11:04:58 com.sun.enterprise.shoal.jointest.SimpleJoinTest
runSimpleSample
ì •ë³´: Starting SimpleJoinTest....
2008. 6. 10 오후 11:04:58 com.sun.enterprise.shoal.jointest.SimpleJoinTest
initializeGMS
ì •ë³´: Initializing Shoal for member: a2ed5cb6-3cc7-4060-91d6-3fc8b6854049
group:TestGroup
2008. 6. 10 오후 11:04:58 com.sun.enterprise.shoal.jointest.SimpleJoinTest
runSimpleSample
ì •ë³´: Registering for group event notifications
2008. 6. 10 오후 11:04:58 com.sun.enterprise.shoal.jointest.SimpleJoinTest
runSimpleSample
ì •ë³´: Joining Group TestGroup
2008. 6. 10 오후 11:04:59 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
ì •ë³´: GMS View Change Received for group TestGroup : Members in view for
(before change analysis) are :
1: MemberId: a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CEC3B82201E1B545DB8E9ECF621244468F03

2008. 6. 10 오후 11:04:59 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
ì •ë³´: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT
2008. 6. 10 오후 11:05:10 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
ì •ë³´: GMS View Change Received for group TestGroup : Members in view for
(before change analysis) are :
1: MemberId: e9d80499-0f8b-4e2d-8856-3f31dcc25f96, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE2E6E148CC1DB479EA7D0C6A0AF50B5A103
2: MemberId: a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CEC3B82201E1B545DB8E9ECF621244468F03

2008. 6. 10 오후 11:05:10 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
ì •ë³´: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT
2008. 6. 10 오후 11:05:10
com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack
processNotification
ì •ë³´: ***JoinNotification received: GroupLeader = false, Signal.getMemberToken
() = e9d80499-0f8b-4e2d-8856-3f31dcc25f96, ServerName = a2ed5cb6-3cc7-4060-91d6-
3fc8b6854049, Leader = e9d80499-0f8b-4e2d-8856-3f31dcc25f96
2008. 6. 10 오후 11:05:10 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
ì •ë³´: GMS View Change Received for group TestGroup : Members in view for
(before change analysis) are :
1: MemberId: e9d80499-0f8b-4e2d-8856-3f31dcc25f96, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE2E6E148CC1DB479EA7D0C6A0AF50B5A103
2: MemberId: a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CEC3B82201E1B545DB8E9ECF621244468F03

2008. 6. 10 오후 11:05:10 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
ì •ë³´: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 6. 10 오후 11:05:10
com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack
processNotification
ì •ë³´: ***JoinNotification received: GroupLeader = false, Signal.getMemberToken
() = a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, ServerName = a2ed5cb6-3cc7-4060-91d6-
3fc8b6854049, Leader = e9d80499-0f8b-4e2d-8856-3f31dcc25f96
2008. 6. 10 오후 11:05:10 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
ì •ë³´: GMS View Change Received for group TestGroup : Members in view for
(before change analysis) are :
1: MemberId: e9d80499-0f8b-4e2d-8856-3f31dcc25f96, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE2E6E148CC1DB479EA7D0C6A0AF50B5A103
2: MemberId: a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CEC3B82201E1B545DB8E9ECF621244468F03

2008. 6. 10 오후 11:05:10 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
ì •ë³´: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 6. 10 오후 11:05:10
com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack
processNotification
ì •ë³´: ***JoinNotification received: GroupLeader = false, Signal.getMemberToken
() = a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, ServerName = a2ed5cb6-3cc7-4060-91d6-
3fc8b6854049, Leader = e9d80499-0f8b-4e2d-8856-3f31dcc25f96
------------------------------------------------------------------------
"A" received duplicated JoinNotifications(a2ed5cb6-3cc7-4060-91d6-3fc8b6854049).

["B"'s log]
------------------------------------------------------------------------
2008. 6. 10 오후 11:04:54 com.sun.enterprise.shoal.jointest.SimpleJoinTest
runSimpleSample
ì •ë³´: Starting SimpleJoinTest....
2008. 6. 10 오후 11:04:54 com.sun.enterprise.shoal.jointest.SimpleJoinTest
initializeGMS
ì •ë³´: Initializing Shoal for member: e9d80499-0f8b-4e2d-8856-3f31dcc25f96
group:TestGroup
2008. 6. 10 오후 11:04:55 com.sun.enterprise.shoal.jointest.SimpleJoinTest
runSimpleSample
ì •ë³´: Registering for group event notifications
2008. 6. 10 오후 11:04:55 com.sun.enterprise.shoal.jointest.SimpleJoinTest
runSimpleSample
ì •ë³´: Joining Group TestGroup
2008. 6. 10 오후 11:04:56 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
ì •ë³´: GMS View Change Received for group TestGroup : Members in view for
(before change analysis) are :
1: MemberId: e9d80499-0f8b-4e2d-8856-3f31dcc25f96, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE2E6E148CC1DB479EA7D0C6A0AF50B5A103

2008. 6. 10 오후 11:04:56 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
ì •ë³´: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT
2008. 6. 10 오후 11:05:01 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
ì •ë³´: GMS View Change Received for group TestGroup : Members in view for
(before change analysis) are :
1: MemberId: e9d80499-0f8b-4e2d-8856-3f31dcc25f96, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE2E6E148CC1DB479EA7D0C6A0AF50B5A103
2: MemberId: a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CEC3B82201E1B545DB8E9ECF621244468F03

2008. 6. 10 오후 11:05:01 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
ì •ë³´: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT
2008. 6. 10 오후 11:05:04
com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack
processNotification
ì •ë³´: ***JoinNotification received: GroupLeader = true, Signal.getMemberToken()
= a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, ServerName = e9d80499-0f8b-4e2d-8856-
3f31dcc25f96, Leader = e9d80499-0f8b-4e2d-8856-3f31dcc25f96
2008. 6. 10 오후 11:05:09 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
ì •ë³´: GMS View Change Received for group TestGroup : Members in view for
(before change analysis) are :
1: MemberId: e9d80499-0f8b-4e2d-8856-3f31dcc25f96, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE2E6E148CC1DB479EA7D0C6A0AF50B5A103
2: MemberId: a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CEC3B82201E1B545DB8E9ECF621244468F03

2008. 6. 10 오후 11:05:09 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
ì •ë³´: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 6. 10 오후 11:05:12
com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack
processNotification
ì •ë³´: ***JoinNotification received: GroupLeader = true, Signal.getMemberToken()
= a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, ServerName = e9d80499-0f8b-4e2d-8856-
3f31dcc25f96, Leader = e9d80499-0f8b-4e2d-8856-3f31dcc25f96
2008. 6. 10 오후 11:05:12 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
ì •ë³´: GMS View Change Received for group TestGroup : Members in view for
(before change analysis) are :
1: MemberId: e9d80499-0f8b-4e2d-8856-3f31dcc25f96, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE2E6E148CC1DB479EA7D0C6A0AF50B5A103
2: MemberId: a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CEC3B82201E1B545DB8E9ECF621244468F03

2008. 6. 10 오후 11:05:12 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
ì •ë³´: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 6. 10 오후 11:05:15
com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack
processNotification
ì •ë³´: ***JoinNotification received: GroupLeader = true, Signal.getMemberToken()
= a2ed5cb6-3cc7-4060-91d6-3fc8b6854049, ServerName = e9d80499-0f8b-4e2d-8856-
3f31dcc25f96, Leader = e9d80499-0f8b-4e2d-8856-3f31dcc25f96

------------------------------------------------------------------------
"B" also received duplicated JoinNotifications(a2ed5cb6-3cc7-4060-91d6-
3fc8b6854049).
And because "B" is group leader, "B" don't receive own join notification.

Here is the another log of missed case. Assume that "A" ,"B" and "C"
are "TestGroup"'s members.
["A"'s log]
------------------------------------------------------------------------
2008. 6. 10 오후 10:17:41 com.sun.enterprise.shoal.jointest.SimpleJoinTest
runSimpleSample
ì •ë³´: Starting SimpleJoinTest....
2008. 6. 10 오후 10:17:41 com.sun.enterprise.shoal.jointest.SimpleJoinTest
initializeGMS
ì •ë³´: Initializing Shoal for member: 197c66d7-f56c-4119-8b1e-18dc330e39d3
group:TestGroup
2008. 6. 10 오후 10:17:41 com.sun.enterprise.shoal.jointest.SimpleJoinTest
runSimpleSample
ì •ë³´: Registering for group event notifications
2008. 6. 10 오후 10:17:41 com.sun.enterprise.shoal.jointest.SimpleJoinTest
runSimpleSample
ì •ë³´: Joining Group TestGroup
2008. 6. 10 오후 10:17:42 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
ì •ë³´: GMS View Change Received for group TestGroup : Members in view for
(before change analysis) are :
1: MemberId: 197c66d7-f56c-4119-8b1e-18dc330e39d3, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE7D33395A50CC479CAA67ACEEEBD3BDDC03

2008. 6. 10 오후 10:17:42 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
ì •ë³´: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT
2008. 6. 10 오후 10:17:47 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
ì •ë³´: GMS View Change Received for group TestGroup : Members in view for
(before change analysis) are :
1: MemberId: 197c66d7-f56c-4119-8b1e-18dc330e39d3, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE7D33395A50CC479CAA67ACEEEBD3BDDC03
2: MemberId: 0c3c5b33-9a7d-4d85-ba1d-7a09a52d4e4b, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE9D818A456DA94CB5B53CB024FD26DA8B03
3: MemberId: 468996ee-2d54-4c58-af46-72d903154e31, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CEBCDB7FE3F44D47DE944388A09B8081BE03

2008. 6. 10 오후 10:17:47 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
ì •ë³´: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT
2008. 6. 10 오후 10:17:53
com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack
processNotification
ì •ë³´: ***JoinNotification received: GroupLeader = true, Signal.getMemberToken()
= 0c3c5b33-9a7d-4d85-ba1d-7a09a52d4e4b, ServerName = 197c66d7-f56c-4119-8b1e-
18dc330e39d3, Leader = 197c66d7-f56c-4119-8b1e-18dc330e39d3
2008. 6. 10 오후 10:17:53
com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack
processNotification
ì •ë³´: ***JoinNotification received: GroupLeader = true, Signal.getMemberToken()
= 468996ee-2d54-4c58-af46-72d903154e31, ServerName = 197c66d7-f56c-4119-8b1e-
18dc330e39d3, Leader = 197c66d7-f56c-4119-8b1e-18dc330e39d3
------------------------------------------------------------------------

["B"'s log]
------------------------------------------------------------------------
2008. 6. 10 오후 10:17:41 com.sun.enterprise.shoal.jointest.SimpleJoinTest
runSimpleSample
ì •ë³´: Starting SimpleJoinTest....
2008. 6. 10 오후 10:17:41 com.sun.enterprise.shoal.jointest.SimpleJoinTest
initializeGMS
ì •ë³´: Initializing Shoal for member: 0c3c5b33-9a7d-4d85-ba1d-7a09a52d4e4b
group:TestGroup
2008. 6. 10 오후 10:17:41 com.sun.enterprise.shoal.jointest.SimpleJoinTest
runSimpleSample
ì •ë³´: Registering for group event notifications
2008. 6. 10 오후 10:17:41 com.sun.enterprise.shoal.jointest.SimpleJoinTest
runSimpleSample
ì •ë³´: Joining Group TestGroup
2008. 6. 10 오후 10:17:42 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
ì •ë³´: GMS View Change Received for group TestGroup : Members in view for
(before change analysis) are :
1: MemberId: 0c3c5b33-9a7d-4d85-ba1d-7a09a52d4e4b, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE9D818A456DA94CB5B53CB024FD26DA8B03

2008. 6. 10 오후 10:17:42 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
ì •ë³´: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT
2008. 6. 10 오후 10:17:47 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
ì •ë³´: GMS View Change Received for group TestGroup : Members in view for
(before change analysis) are :
1: MemberId: 197c66d7-f56c-4119-8b1e-18dc330e39d3, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE7D33395A50CC479CAA67ACEEEBD3BDDC03
2: MemberId: 0c3c5b33-9a7d-4d85-ba1d-7a09a52d4e4b, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE9D818A456DA94CB5B53CB024FD26DA8B03

2008. 6. 10 오후 10:17:47 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
ì •ë³´: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT
2008. 6. 10 오후 10:17:47
com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack
processNotification
ì •ë³´: ***JoinNotification received: GroupLeader = false, Signal.getMemberToken
() = 197c66d7-f56c-4119-8b1e-18dc330e39d3, ServerName = 0c3c5b33-9a7d-4d85-ba1d-
7a09a52d4e4b, Leader = 197c66d7-f56c-4119-8b1e-18dc330e39d3
2008. 6. 10 오후 10:17:47 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
ì •ë³´: GMS View Change Received for group TestGroup : Members in view for
(before change analysis) are :
1: MemberId: 197c66d7-f56c-4119-8b1e-18dc330e39d3, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE7D33395A50CC479CAA67ACEEEBD3BDDC03
2: MemberId: 0c3c5b33-9a7d-4d85-ba1d-7a09a52d4e4b, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE9D818A456DA94CB5B53CB024FD26DA8B03
3: MemberId: 468996ee-2d54-4c58-af46-72d903154e31, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CEBCDB7FE3F44D47DE944388A09B8081BE03

2008. 6. 10 오후 10:17:47 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
ì •ë³´: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT
2008. 6. 10 오후 10:17:47
com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack
processNotification
ì •ë³´: ***JoinNotification received: GroupLeader = false, Signal.getMemberToken
() = 468996ee-2d54-4c58-af46-72d903154e31, ServerName = 0c3c5b33-9a7d-4d85-ba1d-
7a09a52d4e4b, Leader = 197c66d7-f56c-4119-8b1e-18dc330e39d3
------------------------------------------------------------------------

["C"'s log]
------------------------------------------------------------------------
2008. 6. 10 오후 10:17:42 com.sun.enterprise.shoal.jointest.SimpleJoinTest
runSimpleSample
ì •ë³´: Starting SimpleJoinTest....
2008. 6. 10 오후 10:17:42 com.sun.enterprise.shoal.jointest.SimpleJoinTest
initializeGMS
ì •ë³´: Initializing Shoal for member: 468996ee-2d54-4c58-af46-72d903154e31
group:TestGroup
2008. 6. 10 오후 10:17:42 com.sun.enterprise.shoal.jointest.SimpleJoinTest
runSimpleSample
ì •ë³´: Registering for group event notifications
2008. 6. 10 오후 10:17:42 com.sun.enterprise.shoal.jointest.SimpleJoinTest
runSimpleSample
ì •ë³´: Joining Group TestGroup
2008. 6. 10 오후 10:17:43 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
ì •ë³´: GMS View Change Received for group TestGroup : Members in view for
(before change analysis) are :
1: MemberId: 468996ee-2d54-4c58-af46-72d903154e31, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CEBCDB7FE3F44D47DE944388A09B8081BE03

2008. 6. 10 오후 10:17:43 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
ì •ë³´: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT
2008. 6. 10 오후 10:17:47 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
ì •ë³´: GMS View Change Received for group TestGroup : Members in view for
(before change analysis) are :
1: MemberId: 197c66d7-f56c-4119-8b1e-18dc330e39d3, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE7D33395A50CC479CAA67ACEEEBD3BDDC03
2: MemberId: 468996ee-2d54-4c58-af46-72d903154e31, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CEBCDB7FE3F44D47DE944388A09B8081BE03

2008. 6. 10 오후 10:17:47 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
ì •ë³´: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT
2008. 6. 10 오후 10:17:47
com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack
processNotification
ì •ë³´: ***JoinNotification received: GroupLeader = false, Signal.getMemberToken
() = 197c66d7-f56c-4119-8b1e-18dc330e39d3, ServerName = 468996ee-2d54-4c58-af46-
72d903154e31, Leader = 197c66d7-f56c-4119-8b1e-18dc330e39d3
2008. 6. 10 오후 10:17:47 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
ì •ë³´: GMS View Change Received for group TestGroup : Members in view for
(before change analysis) are :
1: MemberId: 197c66d7-f56c-4119-8b1e-18dc330e39d3, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE7D33395A50CC479CAA67ACEEEBD3BDDC03
2: MemberId: 0c3c5b33-9a7d-4d85-ba1d-7a09a52d4e4b, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE9D818A456DA94CB5B53CB024FD26DA8B03
3: MemberId: 468996ee-2d54-4c58-af46-72d903154e31, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CEBCDB7FE3F44D47DE944388A09B8081BE03

2008. 6. 10 오후 10:17:47 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
ì •ë³´: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT
2008. 6. 10 오후 10:17:47
com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack
processNotification
ì •ë³´: ***JoinNotification received: GroupLeader = false, Signal.getMemberToken
() = 0c3c5b33-9a7d-4d85-ba1d-7a09a52d4e4b, ServerName = 468996ee-2d54-4c58-af46-
72d903154e31, Leader = 197c66d7-f56c-4119-8b1e-18dc330e39d3
------------------------------------------------------------------------
All members missed some join notifications.

Each time you test concurrent joins, which notifications are duplicated or
missed can vary.

In any case, when members join concurrently, every member should receive a
join notification for every other member, and join notifications should not be
duplicated as long as all members are healthy.



 Comments   
Comment by carryel [ 10/Jun/08 ]

Created an attachment (id=8)
I attached a simple test code

Comment by carryel [ 29/Jun/08 ]

1. Testing scenarios
The testing scenario is simple. Shoal (with JXTA) does not support multiple
members of the same group running in the same JVM, so each member must join
the group from a separate process (JVM). You can test this manually by running
the "SimpleJoinTest" I attached earlier. Each time you execute
"SimpleJoinTest", a new member (node) joins "TestGroup".

I tested this by starting multiple "SimpleJoinTest"s. You may need 3 or 4
"SimpleJoinTest" processes.
a) In the beginning, there is no member and no group.
b) I executed multiple "SimpleJoinTest"s concurrently, each in a separate
process (JVM).
c) I inspected each log, looking at "Signal.getMemberToken()" in particular.
ex) "****JoinNotification received: GroupLeader = false,
Signal.getMemberToken() = e9d80499-0f8b-4e2d-8856-3f31dcc25f96,
ServerName = a2ed5cb6-3cc7-4060-91d6-3fc8b6854049,
Leader = e9d80499-0f8b-4e2d-8856-3f31dcc25f96"

Strictly speaking, we cannot start multiple processes at exactly the same
time, but because each member waits for the discovery timeout, this is an
acceptable margin of error. In other words, if you start a "SimpleJoinTest"
while other "SimpleJoinTest"s are still waiting for the discovery timeout, you
can reproduce the strange results.

2. How does the new code behave during the discovery phase?
Assume "A", "B" and "C" will become members of the group.
In my scenario, "A", "B" and "C" will wait for the discovery timeout because
there is no master in the group.
Before they enter the discovery phase, they first set the master advertisement
to their own advertisement, but masterAssigned is usually still false at this
point. masterAssigned is normally set to true by one of the following methods:

  • MasterNode.appointMasterNode()
  • MasterNode.processMasterNodeResponse()
  • MasterNode.processMasterNodeAnnouncement()

a) In MasterNode.appointMasterNode()
This is the case where no master has been assigned after the discovery
timeout, e.g. there is no master in the group. We then use the discovery view,
which contains other members if they sent any messages to us, to put up a
candidate as master. Because the discovery view always contains our own
advertisement, our own advertisement can become the candidate.

a-1) When our own advertisement becomes the master
First, if the candidate is our own advertisement and the discovery view
contains other members, clusterViewManager.setMaster() is called with the
discovery view's snapshot. The original code calls
clusterViewManager.setMaster() with only its own view snapshot, but because
the master has already been determined to be our own advertisement, I think
calling clusterViewManager.setMaster() with the discovery view's snapshot is
better than with only our own view's snapshot. Calling
clusterViewManager.setMaster() without the discovery view's snapshot is not
strictly a problem, because when the other members receive
processMasterNodeAnnouncement() from the master's announceMaster(), they can
call sendSelfNodeAdvertisement(). But if the discovery view already contains
them and setMaster() is called with the discovery view,
sendSelfNodeAdvertisement() is unnecessary in this case because the master
view already contains them, so they can set the master directly without
sendSelfNodeAdvertisement().

And about calling announceMaster():
-----------------------------------------------------
[original appointMasterNode() in MasterNode.java]
...
if (madv.getID().equals(localNodeID)) {
    ...
    if (clusterViewManager.getViewSize() > 1) {
        announceMaster(manager.getSystemAdvertisement());
    }
    ...
}
-----------------------------------------------------

It can be edited as in the following code:
-----------------------------------------------------
if (madv.getID().equals(localNodeID)) {
    ...
    // if (clusterViewManager.getViewSize() > 1) {
    announceMaster(manager.getSystemAdvertisement());
    // }
    ...
}
-----------------------------------------------------
In other words, if our own advertisement becomes the master, announceMaster()
is always called. While debugging this, I saw that even though one more member
had joined the group, clusterViewManager.getViewSize() could briefly still be
equal to 1. So I think that, for safety, this should be changed. Even if
announceMaster() is called when clusterViewManager.getViewSize() equals 1, it
is not a problem because we do not receive our own message.

a-2) When another member's advertisement becomes the master
The original code always sets the master without notification, so sometimes
the master's view cannot be updated. See the following code.
-----------------------------------------------------
[appointMasterNode() method in MasterNode.java]
...
clusterViewManager.setMaster(madv, false);
...
-----------------------------------------------------

-----------------------------------------------------
[setMaster(advertisement, notify) method in ClusterViewManager.java]

if (!advertisement.equals(masterAdvertisement)) {
    ... // notify
}
-----------------------------------------------------
As you can see, if the current member has already set the master, notify is
not called. If we have already called setMaster(advertisement, false) in
MasterNode.appointMasterNode(), then when the master later sends a new view to
us and we receive it through processMasterNodeAnnouncement() or
processMasterNodeResponse(), the new view is not notified, even though
setMaster() could be called with the new view, because the current
masterAdvertisement is already the same as the master's advertisement.
So I think this should also be changed: if the candidate is another member, do
not call setMaster(advertisement, false). Even though we do not set the master
now, we can receive the master change event later through
processMasterNodeAnnouncement() or processMasterNodeResponse().

b) In MasterNode.processMasterNodeResponse():
MASTER_CHANGE_EVENT is notified with the master view's snapshot, per issue #60
(https://shoal.dev.java.net/issues/show_bug.cgi?id=60).
An additional patch is that when sendSelfNodeAdvertisement() is called,
MASTER_CHANGE_EVENT is also notified with the master view's snapshot.

c) In MasterNode.processMasterNodeAnnouncement():
This is very similar to b) above, and like b) it should be changed.

So now I want to describe how the new code behaves during the discovery phase.
Actually, the new code behaves according to the old code's original intent;
there are no big changes.

1) "A", "B" and "C" join the group concurrently and all members are waiting
for the discovery timeout.

1-1) If no member receives another member's message and the discovery view
contains no other members, every member tries to become the master. So every
member calls announceMaster(). All members then receive the others' master
announcements and become aware of the master collision through checkMaster().
The collision is resolved by ID. When a member affirms the master node role or
resigns it, the member notifies MASTER_CHANGE_EVENT. The original code did not
notify MASTER_CHANGE_EVENT when a member affirmed the master node role, but I
think that should be changed. As in a-1) above, even though the member already
called setMaster() and notified MASTER_CHANGE_EVENT and the master did not
change, we should notify MASTER_CHANGE_EVENT because the master's view was
already changed by the collision. If we do not notify the event, we cannot
become aware of view changes quickly in the collision case. Of course, if
another event occurs later, this member (the master) will eventually become
aware of the view changes, but I think view changes should be applied as soon
as possible.

1-2) If every member receives the other members' messages and the discovery
view contains all members, the candidate is selected from the discovery view
by the TreeMap's ordering. If all members select the same candidate, that
member sends the master announcement; the other members process
processMasterNodeAnnouncement() and set the master with the current master's
view snapshot.

If some members receive the other members' messages and some do not, cases
1-1) and 1-2) are mixed.

2) If some nodes join the group late
If some members join the group and there is already a master, the new members
send a master node query to all members and the master node processes
processMasterNodeQuery(). The master node then sends a master response with
the master view's snapshot, and the new members process
processMasterNodeResponse() and set the master with the current master's view.

3. How the code behaves when a node is shut down or restarted:
I think my changes do not affect the shutdown algorithm; I know the shutdown
and failure cases are handled by HealthMonitor. But I think some of the
startup logic in HealthMonitor should be changed. When a node is starting and
HealthMonitor has been started, MasterNode.probeNode() can be called by
HealthMonitor. In the case from 1-1) above, where no member receives another
member's message, the discovery view contains no other members, all members
try to become the master, and a master collision occurs, a probeNode() call
from HealthMonitor can cause processMasterNodeResponse() to run. Because
processMasterNodeResponse() does not account for the collision case,
unexpected results can sometimes occur in the master selection algorithm. So I
think the health monitor should only start after master discovery has finished
(see the sketch at the end of this comment). With that ordering, this change
does not affect shutdown.

When a node that is not the master restarts before it is determined to have
failed, the master's view is unchanged, so members that already joined the
group do not receive any changes. The restarted node receives all members'
join notifications from the master's response.
When a node that is not the master restarts after it has been ejected from the
cluster, the master's view has changed, so members that already joined the
group receive only the failed node's join notification, because the master had
already removed that node from its view. The restarted node receives all
members' join notifications from the master's response.

When a node that is the master restarts before it is determined to have
failed, the node that was master sends discovery messages to all members and
waits for the discovery timeout. Because the other members are not the master,
the former master receives no messages, so it sends a master announcement to
the members containing only its own advertisement. Because the members then
see that the master's view contains only the master's advertisement, they call
sendSelfNodeAdvertisement(), and the master becomes aware of the existing
members through processNodeResponse(). The master node thus receives join
notifications for all members. The other members do not receive any changes
because they call sendSelfNodeAdvertisement() and return before setMaster().

When a node that was the master restarts after it has been ejected from the
cluster, the members have already elected a new master. When the old master
failed and the new master was elected, the members' views gained no additional
member, so they received no join events. But when the node that was the master
restarts, it sends a master discovery message and receives the new master's
response. So the restarted node receives join notifications for all existing
members from the new master, and the other members receive only the failed
member's join notification.
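
A minimal, self-contained sketch of the startup ordering suggested in point 3
above; all type and method names here are illustrative (the real classes are
MasterNode and HealthMonitor):

-----------------------------------------------------
// Start health monitoring only after master discovery has finished, so
// probeNode() responses cannot interfere with master selection during a
// collision.
final class StartupOrderingSketch {
    interface MasterDiscovery { void runUntilMasterAssignedOrTimeout(); }
    interface Monitor { void start(); }

    static void startInstance(MasterDiscovery discovery, Monitor healthMonitor) {
        discovery.runUntilMasterAssignedOrTimeout(); // discovery phase completes first
        healthMonitor.start();                       // only now may probeNode() traffic begin
    }
}
-----------------------------------------------------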

Comment by sheetalv [ 09/Jul/08 ]

NA for Sailfin 1.0

Comment by sheetalv [ 09/Jul/08 ]

assigning to self

Comment by Joe Fialli [ 06/Feb/09 ]

Reviewing carryel's submitted fix for this issue.
Have already checked in submitted test case and it can be run
via "ant simplejointest".

Comment by carryel [ 22/Jun/09 ]

Created an attachment (id=18)
I attached the proposed patch for history





[SHOAL-60] when a new member joins an existing group, this member can't receive join notifications of others that already joined Created: 10/Jun/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: carryel Assignee: carryel
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Java Source File SimpleJoinTest.java    
Issuezilla Id: 60

 Description   

Assuming that "A", "B" and "C" are members in "TestGroup". Sometimes when new
member join, this member can't receive join notifications of others that
already joined.

This scenario is following(Assume that members will join the same group
according to the order, not concurrently).

1. First, "A" joined and became a group leader.
2. after 1, "B" joined. Then "B" received "A"'s a join notification and own
("C") join notification in "B". No problem.
3. after 2, "C" joined. At this time, "C" must receive "A", "B" and "C" join
notifications in "C". But "C" didn't receive "B"'s a join notification.

Like above, assuming that "A", "B", "C" and "D" are members in "TestGroup", "D"
didn't receive "B" and "C"'s join notifications.

above 3, when new member joined, this member don't receive some members' join
notifications.(In other words, this member receives only own notification and
group leader's notification)

You can also see this result from following logs.

"A"(the group leader): member id="6a92713c-d83e-49a8-8aaa-ad12046a1acb"
"B": member id="77ff0a1c-b9a1-417a-b04c-0028ef6da921"
"C": member id="6a8e7161-92ef-4b9e-a5e1-d9a8c7665b4a"
When memebers receive a join notification, "***JoinNotification received:
ServerName = [MY_MEMBER_ID], Signal.getMemberToken() = [MEMBER_ID]" printed.

["A"'s log]
------------------------------------------------------------------------
2008. 6. 5 오후 1:36:17 com.sun.enterprise.shoal.jointest.SimpleJoinTest
runSimpleSample
ì •ë³´: Starting SimpleJoinTest....
2008. 6. 5 오후 1:36:18 com.sun.enterprise.shoal.jointest.SimpleJoinTest
initializeGMS
ì •ë³´: Initializing Shoal for member: 6a92713c-d83e-49a8-8aaa-ad12046a1acb
group:TestGroup
2008. 6. 5 오후 1:36:18 com.sun.enterprise.shoal.jointest.SimpleJoinTest
runSimpleSample
ì •ë³´: Registering for group event notifications
2008. 6. 5 오후 1:36:18 com.sun.enterprise.shoal.jointest.SimpleJoinTest
runSimpleSample
ì •ë³´: Joining Group TestGroup
2008. 6. 5 오후 1:36:18 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
ì •ë³´: GMS View Change Received for group TestGroup : Members in view for
(before change analysis) are :
1: MemberId: 6a92713c-d83e-49a8-8aaa-ad12046a1acb, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE15F3706F0E794BF595FCEE9EEA90FCE103

2008. 6. 5 오후 1:36:18 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
ì •ë³´: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT
2008. 6. 5 오후 1:36:41 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
ì •ë³´: GMS View Change Received for group TestGroup : Members in view for
(before change analysis) are :
1: MemberId: 6a92713c-d83e-49a8-8aaa-ad12046a1acb, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE15F3706F0E794BF595FCEE9EEA90FCE103
2: MemberId: 77ff0a1c-b9a1-417a-b04c-0028ef6da921, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CEC9482BF0C6A44D55B407E7E3A8D1339803

2008. 6. 5 오후 1:36:41 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
ì •ë³´: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 6. 5 오후 1:36:44
com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack
processNotification
ì •ë³´: ***JoinNotification received: ServerName = 6a92713c-d83e-49a8-8aaa-
ad12046a1acb, Signal.getMemberToken() = 77ff0a1c-b9a1-417a-b04c-0028ef6da921
2008. 6. 5 오후 1:37:00 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
ì •ë³´: GMS View Change Received for group TestGroup : Members in view for
(before change analysis) are :
1: MemberId: 6a8e7161-92ef-4b9e-a5e1-d9a8c7665b4a, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE0F6B7D5CD8CC447180F2D059E273AD5103
2: MemberId: 6a92713c-d83e-49a8-8aaa-ad12046a1acb, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE15F3706F0E794BF595FCEE9EEA90FCE103
3: MemberId: 77ff0a1c-b9a1-417a-b04c-0028ef6da921, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CEC9482BF0C6A44D55B407E7E3A8D1339803

2008. 6. 5 오후 1:37:00 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
ì •ë³´: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 6. 5 오후 1:37:03
com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack
processNotification
ì •ë³´: ***JoinNotification received: ServerName = 6a92713c-d83e-49a8-8aaa-
ad12046a1acb, Signal.getMemberToken() = 6a8e7161-92ef-4b9e-a5e1-d9a8c7665b4a
------------------------------------------------------------------------

["B"'s log]
------------------------------------------------------------------------
2008. 6. 5 오후 1:36:40 com.sun.enterprise.shoal.jointest.SimpleJoinTest
runSimpleSample
ì •ë³´: Starting SimpleJoinTest....
2008. 6. 5 오후 1:36:40 com.sun.enterprise.shoal.jointest.SimpleJoinTest
initializeGMS
ì •ë³´: Initializing Shoal for member: 77ff0a1c-b9a1-417a-b04c-0028ef6da921
group:TestGroup
2008. 6. 5 오후 1:36:40 com.sun.enterprise.shoal.jointest.SimpleJoinTest
runSimpleSample
ì •ë³´: Registering for group event notifications
2008. 6. 5 오후 1:36:40 com.sun.enterprise.shoal.jointest.SimpleJoinTest
runSimpleSample
ì •ë³´: Joining Group TestGroup
2008. 6. 5 오후 1:36:41 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
ì •ë³´: GMS View Change Received for group TestGroup : Members in view for
(before change analysis) are :
1: MemberId: 77ff0a1c-b9a1-417a-b04c-0028ef6da921, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CEC9482BF0C6A44D55B407E7E3A8D1339803

2008. 6. 5 오후 1:36:41 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
ì •ë³´: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT
2008. 6. 5 오후 1:36:41 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
ì •ë³´: GMS View Change Received for group TestGroup : Members in view for
(before change analysis) are :
1: MemberId: 6a92713c-d83e-49a8-8aaa-ad12046a1acb, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE15F3706F0E794BF595FCEE9EEA90FCE103
2: MemberId: 77ff0a1c-b9a1-417a-b04c-0028ef6da921, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CEC9482BF0C6A44D55B407E7E3A8D1339803

2008. 6. 5 오후 1:36:41 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
ì •ë³´: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT
2008. 6. 5 오후 1:36:41
com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack
processNotification
ì •ë³´: ***JoinNotification received: ServerName = 77ff0a1c-b9a1-417a-b04c-
0028ef6da921, Signal.getMemberToken() = 6a92713c-d83e-49a8-8aaa-ad12046a1acb
2008. 6. 5 오후 1:36:41 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
ì •ë³´: GMS View Change Received for group TestGroup : Members in view for
(before change analysis) are :
1: MemberId: 6a92713c-d83e-49a8-8aaa-ad12046a1acb, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE15F3706F0E794BF595FCEE9EEA90FCE103
2: MemberId: 77ff0a1c-b9a1-417a-b04c-0028ef6da921, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CEC9482BF0C6A44D55B407E7E3A8D1339803

2008. 6. 5 오후 1:36:41 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
ì •ë³´: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 6. 5 오후 1:36:41
com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack
processNotification
ì •ë³´: ***JoinNotification received: ServerName = 77ff0a1c-b9a1-417a-b04c-
0028ef6da921, Signal.getMemberToken() = 77ff0a1c-b9a1-417a-b04c-0028ef6da921
2008. 6. 5 오후 1:37:00 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
ì •ë³´: GMS View Change Received for group TestGroup : Members in view for
(before change analysis) are :
1: MemberId: 6a8e7161-92ef-4b9e-a5e1-d9a8c7665b4a, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE0F6B7D5CD8CC447180F2D059E273AD5103
2: MemberId: 6a92713c-d83e-49a8-8aaa-ad12046a1acb, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE15F3706F0E794BF595FCEE9EEA90FCE103
3: MemberId: 77ff0a1c-b9a1-417a-b04c-0028ef6da921, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CEC9482BF0C6A44D55B407E7E3A8D1339803

2008. 6. 5 오후 1:37:00 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
ì •ë³´: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 6. 5 오후 1:37:00
com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack
processNotification
ì •ë³´: ***JoinNotification received: ServerName = 77ff0a1c-b9a1-417a-b04c-
0028ef6da921, Signal.getMemberToken() = 6a8e7161-92ef-4b9e-a5e1-d9a8c7665b4a
------------------------------------------------------------------------

["C"'s log]
------------------------------------------------------------------------
2008. 6. 5 오후 1:36:59 com.sun.enterprise.shoal.jointest.SimpleJoinTest
runSimpleSample
ì •ë³´: Starting SimpleJoinTest....
2008. 6. 5 오후 1:36:59 com.sun.enterprise.shoal.jointest.SimpleJoinTest
initializeGMS
ì •ë³´: Initializing Shoal for member: 6a8e7161-92ef-4b9e-a5e1-d9a8c7665b4a
group:TestGroup
2008. 6. 5 오후 1:36:59 com.sun.enterprise.shoal.jointest.SimpleJoinTest
runSimpleSample
ì •ë³´: Registering for group event notifications
2008. 6. 5 오후 1:36:59 com.sun.enterprise.shoal.jointest.SimpleJoinTest
runSimpleSample
ì •ë³´: Joining Group TestGroup
2008. 6. 5 오후 1:37:00 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
ì •ë³´: GMS View Change Received for group TestGroup : Members in view for
(before change analysis) are :
1: MemberId: 6a8e7161-92ef-4b9e-a5e1-d9a8c7665b4a, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE0F6B7D5CD8CC447180F2D059E273AD5103

2008. 6. 5 오후 1:37:00 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
ì •ë³´: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT
2008. 6. 5 오후 1:37:00 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
ì •ë³´: GMS View Change Received for group TestGroup : Members in view for
(before change analysis) are :
1: MemberId: 6a8e7161-92ef-4b9e-a5e1-d9a8c7665b4a, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE0F6B7D5CD8CC447180F2D059E273AD5103
2: MemberId: 6a92713c-d83e-49a8-8aaa-ad12046a1acb, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE15F3706F0E794BF595FCEE9EEA90FCE103

2008. 6. 5 오후 1:37:00 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
ì •ë³´: Analyzing new membership snapshot received as part of event :
MASTER_CHANGE_EVENT
2008. 6. 5 오후 1:37:00
com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack
processNotification
ì •ë³´: ***JoinNotification received: ServerName = 6a8e7161-92ef-4b9e-a5e1-
d9a8c7665b4a, Signal.getMemberToken() = 6a92713c-d83e-49a8-8aaa-ad12046a1acb
2008. 6. 5 오후 1:37:00 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
ì •ë³´: GMS View Change Received for group TestGroup : Members in view for
(before change analysis) are :
1: MemberId: 6a8e7161-92ef-4b9e-a5e1-d9a8c7665b4a, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE0F6B7D5CD8CC447180F2D059E273AD5103
2: MemberId: 6a92713c-d83e-49a8-8aaa-ad12046a1acb, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CE15F3706F0E794BF595FCEE9EEA90FCE103
3: MemberId: 77ff0a1c-b9a1-417a-b04c-0028ef6da921, MemberType: CORE, Address:
urn:jxta:uuid-0836778E36C54F728D5B934A965395CEC9482BF0C6A44D55B407E7E3A8D1339803

2008. 6. 5 오후 1:37:00 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
ì •ë³´: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 6. 5 오후 1:37:00
com.sun.enterprise.shoal.jointest.SimpleJoinTest$JoinNotificationCallBack
processNotification
ì •ë³´: ***JoinNotification received: ServerName = 6a8e7161-92ef-4b9e-a5e1-
d9a8c7665b4a, Signal.getMemberToken() = 6a8e7161-92ef-4b9e-a5e1-d9a8c7665b4a
------------------------------------------------------------------------
"C"'s log don't have "B"'s join notification(77ff0a1c-b9a1-417a-b04c-
0028ef6da921).



 Comments   
Comment by carryel [ 10/Jun/08 ]

Created an attachment (id=7)
I attached a simple test code

Comment by carryel [ 10/Jun/08 ]

This is the normal case, e.g. member "B"'s behavior:
1. When the new member ("C") joins the group, the group leader (master) sends
MASTERNODERESPONSE to the group members with ADD_EVENT (about "C") and,
finally, its own view's snapshot.
2. Members receive MASTERNODERESPONSE and process processMasterNodeResponse().
3. In processMasterNodeResponse(), ADD_EVENT is notified with the master
view's snapshot by ClusterViewManager.
4. Then ViewWindow analyzes the event packet (ADD_EVENT).
5. Finally, members receive a join notification about the new member ("C").

But in the new member ("C") a problem occurs: there is no logic for notifying
the other members' ADD_EVENT (about "B").
1. When the new member ("C") joins the group, the group leader (master) sends
MASTERNODERESPONSE to the group members with ADD_EVENT and, finally, its own
view's snapshot. [same as above]
2. "C" receives MASTERNODERESPONSE and processes processMasterNodeResponse().
[same as above]
3. In processMasterNodeResponse(), MASTER_CHANGE_EVENT is notified without the
master view's snapshot, because the current master is self.
4. Then ViewWindow analyzes the event packet (MASTER_CHANGE_EVENT). When
ViewWindow receives MASTER_CHANGE_EVENT, it emits join notifications based on
the view history if the previous view does not contain any members. This is
presumably the logic meant to notify the new member ("C") of the other
members' joins. But unfortunately the current view carried by the event packet
(MASTER_CHANGE_EVENT) is not the master view; it is only "C"'s local view
(currently containing only the master member and "C" itself). So only the
master's join notification occurs.
5. In processMasterNodeResponse(), ADD_EVENT is notified with the master
view's snapshot by ClusterViewManager. [same as above]
6. Then ViewWindow analyzes the event packet (ADD_EVENT). [same as above]
7. The new member ("C") receives its own join notification. [same as above]

So I think this problem can be fixed if MASTER_CHANGE_EVENT is notified with
the master view's snapshot in step 3 above. Then, in step 4, ViewWindow can
see that the previous view contains neither the other members nor the master
member. And then, in step 5, ClusterViewManager can notify only ADD_EVENT,
without the master view's snapshot, because a MASTER_CHANGE_EVENT that
included the master view's snapshot has already been notified.
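
A minimal, self-contained illustration of why notifying MASTER_CHANGE_EVENT
with the master view's snapshot fixes the missed notifications: diffing the
previous view against the master's snapshot yields a join for every existing
member, not just the master. All names below are illustrative; the real change
is in MasterNode/ClusterViewManager.

-----------------------------------------------------
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ViewDiffExample {
    static List<String> joinsToNotify(List<String> previousView, List<String> masterViewSnapshot) {
        List<String> joins = new ArrayList<String>(masterViewSnapshot);
        joins.removeAll(previousView); // members present now but not before => join notifications
        return joins;
    }

    public static void main(String[] args) {
        List<String> previous = Arrays.asList("C");              // the new member only knew itself
        List<String> masterView = Arrays.asList("A", "B", "C");  // snapshot carried in MASTERNODERESPONSE
        System.out.println(joinsToNotify(previous, masterView)); // prints [A, B]; "B" is no longer missed
    }
}
-----------------------------------------------------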

Comment by carryel [ 10/Jun/08 ]

This is now resolved.
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=595





[SHOAL-59] when the node fails, the node details don't get removed from DSC. Created: 18/May/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Minor
Reporter: leehui Assignee: sheetalv
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Windows


Issuezilla Id: 59
Status Whiteboard:

shoal-shark-na


 Description   

The API gms.getAllMemberDetails() should return the same information on all
nodes. But when a node fails, the node's details do not get removed from the
DSC, so you can still use gms.getAllMemberDetails() to get the failed node's
details. The relevant test code is in Shoal's test directory, named
com.sun.enterprise.shoal.memberdetailstest.MemberDetailsTest.


 Comments   
Comment by sheetalv [ 09/Jul/08 ]

NA for Sailfin 1.0

Comment by Joe Fialli [ 31/Jul/09 ]

The fix for this would be the following:

Register a Shoal FAILURE event handler. When a FAILURE is received, the
distributed state cache should be flushed for the failing instance.

*******

Possible bug if this is not fixed:
If a FENCE is left raised (an instance performing recovery for another
instance raises a FENCE while doing the repair) and the stale FENCE data is
left in the distributed state cache, the false information will prevent
another instance from recovering the instance that had the FENCE raised when a
fatal failure later occurs.

Accessing stale DSC data in general does not cause bugs, but the above would
be a bug.
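
A hedged sketch of such a handler; the registration mirrors the callback
pattern used in the Shoal samples, while the DSC flush itself is left as a
hypothetical placeholder rather than a real Shoal API call:

-----------------------------------------------------
import com.sun.enterprise.ee.cms.core.CallBack;
import com.sun.enterprise.ee.cms.core.FailureNotificationSignal;
import com.sun.enterprise.ee.cms.core.GroupManagementService;
import com.sun.enterprise.ee.cms.core.Signal;
import com.sun.enterprise.ee.cms.impl.client.FailureNotificationActionFactoryImpl;

public class DscFlushOnFailure implements CallBack {

    public void register(GroupManagementService gms) {
        // Ask GMS to call us back whenever a FAILURE is reported for a member.
        gms.addActionFactory(new FailureNotificationActionFactoryImpl(this));
    }

    public void processNotification(Signal signal) {
        if (signal instanceof FailureNotificationSignal) {
            String failedMember = signal.getMemberToken();
            flushDistributedStateCache(failedMember);
        }
    }

    private void flushDistributedStateCache(String failedMember) {
        // Hypothetical helper: remove every DSC entry owned by failedMember
        // (including any raised FENCE) so stale data cannot block recovery.
    }
}
-----------------------------------------------------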





[SHOAL-58] One instance was killed, other instances were not properly notified about that event. Created: 13/May/08  Updated: 25/Nov/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: easarina Assignee: sheetalv
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Text File server.log.051208_13_08.asqe-sblade-5.n2c1m5     Text File server.log.051208_13_08.asqe-sblade-5.n2c1m5_short    
Issue Links:
Dependency
depends on SHOAL-55 more reliable failure notification Open
Issuezilla Id: 58
Status Whiteboard:

shoal-shark-na


 Description   

Sailfin 1.0 b32, on SuSE and Solaris 10 Sparc machines. On SuSE I had a
cluster with three instances; on Solaris I had a cluster with 10 instances. In
both cases I tried to kill one instance under these conditions:
1) Without SIP traffic.
2) With "low" SIP traffic.
3) With "big" SIP traffic.

Not always, but in many cases (most often under the "no traffic" and
especially the "big traffic" conditions), when an instance was killed I did
not see FAILURE_EVENT or "Assigned recovery server" messages in the log files
of the other instances, and in many cases no IN_DOUBT_EVENT either. But I
always saw ADD_EVENT and JOINED_AND_READY_EVENT.

When the events were absent I saw warnings such as this in the logs:

[#|2008-05-07T14:31:05.324-0700|WARNING|sun-comms-appserver1.0|ShoalLogger|_ThreadID=24;_ThreadName=pool-2-thread-19;_RequestID=0d9efdac-4f92-4e0c-bb87-6107fc42ed1e;|
Could not send the LWR Multicast message to get the member state of
urn:jxta:uuid-C9C9584023FA421AA3F0A79F128543642168480919FC4885BAA1EF5F3ED12B2D03
IOException : Unable to create a messenger to
jxta://uuid-C9C9584023FA421AA3F0A79F128543642168480919FC4885BAA1EF5F3ED12B2D03/PipeService/urn:jxta:uuid-C9C9584023FA421AA3F0A79F128543647F57B1A44FDB469D9F94BE4B0722C28904|#]

I turned the ShoalLogger up to FINEST and collected logs.

In the logs I saw that, for example, for the 10-instance cluster, after one
instance was killed the "true View Size" was still always 11. It looks like the
other instances were not properly notified that the instance was killed. I have
attached one log, in two formats: the original log and the same log with shortened
lines to make it more readable. In that case the n1c1m4 instance was killed.



 Comments   
Comment by easarina [ 13/May/08 ]

Created an attachment (id=5)
server.log

Comment by easarina [ 13/May/08 ]

Created an attachment (id=6)
short server.log

Comment by sheetalv [ 09/Jul/08 ]

NA for Sailfin 1.0

Comment by sheetalv [ 09/Jul/08 ]

NA for Sailfin 1.0

Comment by sheetalv [ 27/Aug/08 ]

This issue is similar to issue 55.

      *** This issue has been marked as a duplicate of 55 ***




[SHOAL-57] Provide a JMX MBean to list and configure configuration settings Created: 12/May/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: New Feature Priority: Major
Reporter: shreedhar_ganapathy Assignee: sheetalv
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 57

 Description   

A JMX MBean to list and configure Shoal's providers would be very useful from a
management standpoint.

Additionally, this MBean could also provide runtime statistics ranging from the
number of views and the current view to request/response metrics.
Adding a placeholder RFE for this purpose.
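
For illustration, a minimal sketch of what such a management interface could look like; this MBean does not exist in Shoal, and every attribute and operation name below is invented for the example:

    // Purely hypothetical management interface for this RFE.
    public interface ShoalConfigMXBean {

        // list and tune provider configuration settings
        java.util.Map<String, String> getProviderConfiguration();
        void setProperty(String key, String value);

        // runtime statistics mentioned in this RFE
        long getViewCount();                       // number of views seen so far
        java.util.List<String> getCurrentView();   // member tokens in the current view
        long getRequestCount();
        long getResponseCount();
    }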



 Comments   
Comment by sheetalv [ 09/Jul/08 ]

assigning to self





[SHOAL-56] Document Configuration Settings Available to users Created: 12/May/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Improvement Priority: Major
Reporter: shreedhar_ganapathy Assignee: sheetalv
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 56

 Description   

We need to document configuration settings that are available to users with
clear explanations on what they are and in some cases under what circumstances
these should be used.



 Comments   
Comment by sheetalv [ 09/Jul/08 ]

marking as Enhancement

Comment by sheetalv [ 09/Jul/08 ]

assigning to self





[SHOAL-55] more reliable failure notification Created: 09/May/08  Updated: 25/Nov/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: sheetalv Assignee: sheetalv
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: OpenSolaris


Issue Links:
Dependency
blocks SHOAL-58 One instance was killed, other instan... Resolved
Issuezilla Id: 55
Status Whiteboard:

shoal-shark-na


 Description   

Instance A is either going down or under load, so instance B starts to retry its
connection to instance A. Before instance B can deem instance A dead or alive,
there needs to be an intermediate state, e.g. "in_retry_mode", that GMS clients
can act on.
For example, the CLB can use this state to ping instance A again after a little
while.
The in-memory replication code can also use this intermediate state: if instance A
is in "in_retry_mode" and a pipe-close event has occurred, a new pipe can be
created once instance A turns out to be alive.



 Comments   
Comment by sheetalv [ 09/Jul/08 ]

NA for Sailfin 1.0

Comment by Joe Fialli [ 27/Aug/08 ]

Two cases to address:

1. False positives occurring when 3 heartbeats are missed from an instance that
is in the middle of a full GC (a full GC can take 12 to 15 seconds). The other
instances in the cluster incorrectly receive a FAILURE_NOTIFICATION even though
the instance is still running once the full GC completes.

2. The nodeagent detects a failed instance and restarts it before shoal can
detect that the instance has failed and notify the others in the cluster. This
happens on faster, newer machines.

Comment by sheetalv [ 27/Aug/08 ]
      • Issue 58 has been marked as a duplicate of this issue. ***
Comment by sheetalv [ 27/Oct/08 ]

too big of an architecture change for Sailfin 1.5. NA for Sailfin 1.5.

Comment by sheetalv [ 31/Jul/09 ]

WatchDog notification implementation has been added to Shoal. This takes care of
case 2 (DAS restart) of what Joe has mentioned above.





[SHOAL-54] gms.getGroupHandle().getGroupLeader() throws NullPointerException Created: 03/May/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: leehui Assignee: sheetalv
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: Windows XP
Platform: Windows


Issuezilla Id: 54
Status Whiteboard:

shoal-shark-na


 Description   

If gms.getGroupHandle().getGroupLeader() is invoked immediately after gms.join(),
the application will occasionally throw a NullPointerException, especially when
the application is run in a console from the command line.
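
For illustration, a minimal workaround sketch assuming the NPE simply means the group leader is not known yet right after join(); it polls briefly instead of reading the value immediately (member and group names are placeholders):

    import java.util.Properties;
    import com.sun.enterprise.ee.cms.core.GMSFactory;
    import com.sun.enterprise.ee.cms.core.GroupManagementService;

    public class LeaderAfterJoin {
        public static void main(String[] args) throws Exception {
            GroupManagementService gms = (GroupManagementService) GMSFactory.startGMSModule(
                    "instance1", "group1", GroupManagementService.MemberType.CORE, new Properties());
            gms.join();

            // Poll briefly instead of asking for the group leader immediately after join().
            String leader = null;
            for (int i = 0; i < 10 && leader == null; i++) {
                try {
                    leader = gms.getGroupHandle().getGroupLeader();
                } catch (NullPointerException npe) {
                    // the NPE reported in this issue: group state not initialized yet
                }
                if (leader == null) {
                    Thread.sleep(500);
                }
            }
            System.out.println("group leader: " + leader);
        }
    }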



 Comments   
Comment by sheetalv [ 09/Jul/08 ]

NA for Sailfin 1.0

Comment by sheetalv [ 09/Jul/08 ]

assigning to myself





[SHOAL-53] MSG LOSS: MISSING Failure Events in GMS/glassfish Created: 30/Apr/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: sviveka Assignee: sheetalv
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: OpenSolaris


Issuezilla Id: 53

 Description   

Build : b31
setup : 9-instance cluster

Kill one of the 9 instances randomly,
sleep for 30 seconds,
then search the individual instance logs for FAILURE_EVENT.
In roughly one out of four kills, one or more instances failed to receive the
FAILURE notification.



 Comments   
Comment by sviveka [ 30/Apr/08 ]

ccing Sheetal

Comment by shreedhar_ganapathy [ 30/Apr/08 ]

..

Comment by sviveka [ 09/May/08 ]

Not reproducible





[SHOAL-52] NPE in LWRMulticast.pipeMsgEvent Created: 16/Apr/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Minor
Reporter: Joe Fialli Assignee: hamada
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 52

 Description   

Issue detected with the glassfish nightly FCS of 4/10/2008 plus patched
shoal-gms.jar and jxta.jar. The issue was observed while running the SIFT
functional test and the glassfish replication devtest. Additionally, a
jxtalogging.jar was present that enabled jxta warnings.

The following NPE happened many, many times. The stack trace differs, but
LWRMulticast.java line 223 is always the line with the NPE.

[#|2008-04-14T17:58:39.461-0400|SEVERE|sun-comms-appserver1.0|net.jxta.impl.pipe.InputPipeImpl|_ThreadID=21;_ThreadName=Executor - 2;_RequestID=c77a2b94-7058-4cc7-b46d-049a8d506499;|Uncaught Throwable in listener for : urn:jxta:uuid-1DA181435A1B427E8C0DB06E6FEE831A7F57B1A44FDB469D9F94BE4B0722C28904 (com.sun.enterprise.jxtamgmt.LWRMulticast)
    java.lang.NullPointerException
    at com.sun.enterprise.jxtamgmt.LWRMulticast.pipeMsgEvent
    (LWRMulticast.java:223)
    at net.jxta.impl.pipe.InputPipeImpl.processIncomingMessage
    (InputPipeImpl.java:219)
    at net.jxta.impl.pipe.WirePipe.callLocalListeners(WirePipe.java:383)
    at net.jxta.impl.pipe.WirePipe.processIncomingMessage(WirePipe.java:359)
    at net.jxta.impl.pipe.WirePipeImpl.processIncomingMessage
    (WirePipeImpl.java:338)
    at net.jxta.impl.endpoint.EndpointServiceImpl.processIncomingMessage
    (EndpointServiceImpl.java:964)
    at net.jxta.impl.endpoint.EndpointServiceInterface.processIncomingMessage
    (EndpointServiceInterface.java:342)
    at net.jxta.impl.rendezvous.RendezVousServiceProvider.processReceivedMessage
    (RendezVousServiceProvider.java:502)
    at net.jxta.impl.rendezvous.StdRendezVousService.processReceivedMessage
    (StdRendezVousService.java:240)
    at net.jxta.impl.rendezvous.RendezVousServiceProvider.processIncomingMessage
    (RendezVousServiceProvider.java:159)
    at net.jxta.impl.endpoint.EndpointServiceImpl.processIncomingMessage
    (EndpointServiceImpl.java:964)
    at net.jxta.impl.endpoint.EndpointServiceInterface.processIncomingMessage
    (EndpointServiceInterface.java:342)
    at net.jxta.impl.endpoint.mcast.McastTransport.processMulticast
    (McastTransport.java:749)
    at net.jxta.impl.endpoint.mcast.McastTransport$DatagramProcessor.run
    (McastTransport.java:871)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask
    (ThreadPoolExecutor.java:650)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run
    (ThreadPoolExecutor.java:675)
    at java.lang.Thread.run(Thread.java:595)
    #]


 Comments   
Comment by shreedhar_ganapathy [ 16/Apr/08 ]

..

Comment by shreedhar_ganapathy [ 16/Apr/08 ]

..

Comment by hamada [ 16/Apr/08 ]

Patch committed to address initialization race condition of local id.

--- LWRMulticast.java  2008-02-23 00:31:20+0000  1.9
+++ LWRMulticast.java  2008-04-15 16:32:05+0000  1.10
@@ -139,18 +139,7 @@
 public LWRMulticast(ClusterManager manager,
                     PipeAdvertisement pipeAd,
                     PipeMsgListener msgListener) throws IOException {
-    super();
     joinGroup(manager, pipeAd, msgListener);
-    MessageTransport endpointRouter =
-        (manager.getNetPeerGroup().getEndpointService()).getMessageTransport("jxta");
-    if (endpointRouter != null) {
-        routeControl = (RouteControl)
-            endpointRouter.transportControl(EndpointRouter.GET_ROUTE_CONTROL, null);
-        RouteAdvertisement route = routeControl.getMyLocalRoute();
-        if (route != null) {
-            routeAdvElement = new TextDocumentMessageElement(ROUTEADV,
-                (XMLDocument) route.getDocument(MimeMediaType.XMLUTF8), null);
-        }
-    }
-
 }

 /**
@@ -161,9 +150,8 @@
  * @param msgListener The application message listener
  * @throws IOException if an io error occurs
  */
-public void joinGroup(ClusterManager manager,
-                      PipeAdvertisement pipeAd,
-                      PipeMsgListener msgListener) throws IOException {
+public void joinGroup(ClusterManager manager, PipeAdvertisement pipeAd,
+                      PipeMsgListener msgListener) throws IOException {
     if (pipeAd.getType() != null && !pipeAd.getType().equals(PipeService.PropagateType)) {
         throw new IOException("Only propagate pipe advertisements are supported");
     }
@@ -174,13 +162,23 @@
         throw new IllegalArgumentException("msgListener can not be null");
     }
     this.manager = manager;
+    this.localPeerID = manager.getNetPeerGroup().getPeerID();
+    srcElement = new StringMessageElement(SRCIDTAG, localPeerID.toString(), null);
+
+    MessageTransport endpointRouter =
+        (manager.getNetPeerGroup().getEndpointService()).getMessageTransport("jxta");
+    if (endpointRouter != null) {
+        routeControl = (RouteControl)
+            endpointRouter.transportControl(EndpointRouter.GET_ROUTE_CONTROL, null);
+        RouteAdvertisement route = routeControl.getMyLocalRoute();
+        if (route != null) {
+            routeAdvElement = new TextDocumentMessageElement(ROUTEADV,
+                (XMLDocument) route.getDocument(MimeMediaType.XMLUTF8), null);
+        }
+    }
     this.msgListener = msgListener;
     this.pipeAdv = pipeAd;
     this.pipeSvc = manager.getNetPeerGroup().getPipeService();
     this.in = pipeSvc.createInputPipe(pipeAd, this);
     outputPipe = pipeSvc.createOutputPipe(pipeAd, 1);
-    localPeerID = manager.getNetPeerGroup().getPeerID();
-    srcElement = new StringMessageElement(SRCIDTAG, localPeerID.toString(), null);
     LOG.log(Level.FINEST, "Statring LWRMulticast on pipe id :" + pipeAdv.getID());
     bound = true;
 }
@@ -220,7 +218,7 @@
     MessageElement element;
     PeerID id = getSource(message);
-    if (id.equals(localPeerID)) {
+    if (id != null && id.equals(localPeerID)) {    //loop back
         return;
     }
Comment by sheetalv [ 09/Jul/08 ]

This is not consistently reproducible.

Comment by sheetalv [ 28/Jul/08 ]

don't see this issue frequently. Hence lowering the priority.

Comment by Joe Fialli [ 27/Aug/08 ]

Fix checked in for issue 52 as noted on April 16 comments below.

Have not seen issue recently, marking as fixed.





[SHOAL-51] signal.getMemberDetails().get(WAITTIMEBEFORESTARTINGRECOVERY))) returns null Created: 16/Apr/08  Updated: 07/Oct/10  Resolved: 07/Oct/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Minor
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 51
Status Whiteboard:

shoal-shark-na


 Description   

The following stack trace happened 112 times in Steve's SIFT functional run.

java.lang.NumberFormatException: null
at java.lang.Integer.parseInt(Integer.java:415)
at java.lang.Integer.parseInt(Integer.java:497)
at
com.sun.enterprise.ee.server.autotxrecovery.core.TxnFailureRecoveryActionImpl.co
nsumeSignal(TxnFailureRecoveryActionImpl.java:99)
at com.sun.enterprise.ee.cms.impl.common.Router$CallableAction.call
(Router.java:509)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
at java.util.concurrent.FutureTask.run(FutureTask.java:123)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask
(ThreadPoolExecutor.java:650)
at java.util.concurrent.ThreadPoolExecutor$Worker.run
(ThreadPoolExecutor.java:675)
at java.lang.Thread.run(Thread.java:595)

#]

Here is the source code line that fails:
waitTime = Integer.parseInt((String)(signal.getMemberDetails().get(WAITTIMEBEFORESTARTINGRECOVERY)));

The code in com.sun.enterprise.ee.cms.core.TxnFailureRecoveryActionImpl.java is
not written to allow for WAITTIMEBEFORESTARTINGRECOVERY not being set. I am
assuming this constant should always be set and am filing the missing value as a
bug against shoal. If my assumption is incorrect, then this bug should be
reassigned to the glassfish issue tracker so the Failure Recovery handler can be
fixed to accommodate this scenario.
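
For illustration, a minimal defensive-parsing sketch for the consuming side, using only the signal.getMemberDetails() call shown above; DEFAULT_WAIT_MILLIS is a hypothetical fallback constant:

    // Inside the failure recovery handler's consumeSignal():
    Object raw = signal.getMemberDetails().get(WAITTIMEBEFORESTARTINGRECOVERY);
    int waitTime = DEFAULT_WAIT_MILLIS;
    if (raw instanceof String) {
        try {
            waitTime = Integer.parseInt((String) raw);
        } catch (NumberFormatException nfe) {
            // missing or stale distributed state cache entry: keep the default
        }
    }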



 Comments   
Comment by Joe Fialli [ 16/Apr/08 ]

Initial submit was premature. made summary more specific.

Comment by shreedhar_ganapathy [ 16/Apr/08 ]

..

Comment by sheetalv [ 09/Jul/08 ]

Assigning to Joe.

Comment by sheetalv [ 28/Jul/08 ]

not important for Sailfin 0.5

Comment by Joe Fialli [ 27/Aug/08 ]

has not occurred recently so downgrading it.

Comment by Joe Fialli [ 07/Oct/10 ]

Values are not guaranteed to be in the distributed state cache for a failed member.
The member may have failed during initialization, which happened frequently with
port-in-use failures on startup in the glassfish v2.1 time frame.

The WAITFORTIME value is no longer stored in the distributed state cache by the
glassfish v3.1 transaction service, but the TX_LOG_DIR property is, and the new
v3.1 code does account for the value not being set.

Closing this issue. Functional testing of the distributed state cache is desirable
to verify that all is working.





[SHOAL-50] Expose MASTER_CHANGE_EVENT Created: 14/Apr/08  Updated: 25/Nov/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Improvement Priority: Major
Reporter: krijnschaap Assignee: carryel
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Java Archive File group_leader_notification_patch.jar     Java Archive File group_leader_notification_patch.jar    
Issue Links:
Dependency
depends on SHOAL-61 when members join the group concurren... Open
Issuezilla Id: 50

 Description   

Create a signal whenever the master for the cluster has changed. Either add a
promoted to master/demoted as master to the involved nodes, or a global signal
that just notifies all nodes of the new master, the second probably being the
most flexible solution.



 Comments   
Comment by shreedhar_ganapathy [ 24/Jun/08 ]

Reassigning the enhancement request to community member Bongjae

Comment by carryel [ 14/May/09 ]

This issue could be dependent on issue #61 because issue #61 could change
MASTER_CHANGE_EVENT's notification timing.

But, anyway, I made a patch with which a user can be made aware of
MASTER_CHANGE_EVENT.

A user can receive a GroupLeaderShipNotificationSignal when a MASTER_CHANGE_EVENT
occurs by adding a GroupLeaderShipNotificationActionFactory to gms like this:


gms.addActionFactory( new GroupLeaderShipNotificationActionFactoryImpl(
callback ) );


I will attach the proposed patch, diff and test app.

Comment by carryel [ 14/May/09 ]

Created an attachment (id=16)
attched the patch and diff

Comment by carryel [ 17/May/09 ]

I received feedbacks from Joe and Shreedhar.

So I modified the proposed patch more.

Here are changes.

  • GroupLeaderShip ===> GroupLeadership
  • all member types as well as CORE can receive the group leadership event.
  • the log message is modified for debugging when a group leadership event is fired.
  • GroupLeadershipNotificationSignal has the following API:

    /**
     * provides a list of the previous view's snapshot at time signal arrives.
     *
     * @return List containing the list of <code>GMSMember</code>s which are
     *         corresponding to the view
     */
    List<GMSMember> getPreviousView();

    /**
     * provides a list of the current view's snapshot at time signal arrives.
     *
     * @return List containing the list of <code>GMSMember</code>s which are
     *         corresponding to the view
     */
    List<GMSMember> getCurrentView();

    /**
     * provides a list of all live and current CORE designated members.
     *
     * @return List containing the list of member token ids of core members
     */
    List<String> getCurrentCoreMembers();

    /**
     * provides a list of all live members i.e. CORE and SPECTATOR members.
     *
     * @return List containing the list of member token ids of all members
     */
    List<String> getAllCurrentMembers();
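
For illustration, a minimal sketch of how a client would register for this notification using the classes from the proposed patch; the factory and signal names follow the renamed API listed above, and the exact addActionFactory overload is assumed:

    gms.addActionFactory(new GroupLeadershipNotificationActionFactoryImpl(new CallBack() {
        public void processNotification(Signal signal) {
            if (signal instanceof GroupLeadershipNotificationSignal) {
                GroupLeadershipNotificationSignal glsn = (GroupLeadershipNotificationSignal) signal;
                // the member the signal was issued for is the new group leader
                System.out.println("new group leader: " + glsn.getMemberToken()
                        + ", core members: " + glsn.getCurrentCoreMembers());
            }
        }
    }));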
Comment by carryel [ 17/May/09 ]

Created an attachment (id=17)
I attached the proposed patch and diff again

Comment by carryel [ 19/May/09 ]

I received feedback from Joe.

Joe wrote:
>A WATCHDOG is not allowed to be a GroupLeader and at this time I don't
envision it needing to know the master or of a master change event.
>The WATCHDOG currently broadcasts failure notifications to all members of the
cluster, so it does not assume it knows which instance is the Master.
>So let's not send group leadership events to a WATCHDOG.

So the recent patch should be modified a bit more, like this.

In ViewWindow.java:

private void analyzeMasterChangeView(final EventPacket packet) {
    ...
    if( !getGMSContext().isWatchdog() )
        addGroupLeadershipNotificationSignal( token, member.getGroupName(), member.getStartTime() );
}

Thanks to Joe.

Comment by Joe Fialli [ 18/Jun/09 ]

Proposed changes are being regression tested by Shoal QA test.

Comment by Joe Fialli [ 05/Feb/10 ]

Fix has been integrated in past.





[SHOAL-49] Provide a converse to JoinedAndReady when consuming app did not get ready Created: 25/Mar/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Improvement Priority: Major
Reporter: shreedhar_ganapathy Assignee: sheetalv
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 49

 Description   

When the consuming application or product is ready to process its operations, it
can use Shoal's new JoinedAndReady reporting facility to let group members know
of this state.
The converse state of this situation may be a valuable piece of information for
administrative or monitoring applications.

If the application could not get into the joined and ready state for any reason
(for instance, an application server consuming Shoal could not complete its
startup and failed midway), then such an unready state can be conveyed through a
notification that specifically identifies this state.

Need an appropriate name for such a notification so it is meaningful.



 Comments   
Comment by shreedhar_ganapathy [ 25/Mar/08 ]

..





[SHOAL-48] Gracefully shutdown does not work, but also creates sideeffects Created: 18/Mar/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: babbisx Assignee: sheetalv
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: Linux
Platform: OpenSolaris


Issuezilla Id: 48

 Description   

Hi from Sweden,

I found out that GMS graceful shutdown,

//leaves the group gracefully
gms.shutdown(GMSConstants.shutdownType.INSTANCE_SHUTDOWN);

not only does not work, but also has some serious side effects:

  • first I get an exception

Exception in thread "MessageWindowThread" java.lang.NullPointerException
at
com.sun.enterprise.ee.cms.impl.jxta.MessageWindow.run(MessageWindow.java:86)
at java.lang.Thread.run(Thread.java:619)

  • afterwards, all other group members will see the member that called shutdown()
    permanently in the group member list. This means that the member will appear in
    the member list forever (until all members and their processes terminate).

This has been verified both through printouts and by calling

List<String> members = groupHandle.getAllCurrentMembers();

On the other hand, if the member's JVM terminates (not a graceful shutdown), the
member is correctly removed.

Please note that this was verified on java 6 patch 4 and the libraries from
sailfin v1 b22. Please also note that I am not running sailfin/glassfish, just a
plain JVM.

BR

Babbis



 Comments   
Comment by sheetalv [ 18/Mar/08 ]

Thanks for filing this issue. I will look into it.

Comment by sheetalv [ 18/Mar/08 ]
      *** Issue 47 has been marked as a duplicate of this issue. ***
Comment by shreedhar_ganapathy [ 20/Mar/08 ]

The problems with graceful shutdown were apparent in pure v1-b25, but were fixed
when the patch was applied.
(This was actually a big issue when we, e.g., wanted to replace cards in a cluster
and the subscriber on the removed server instance was never removed from the
group.)
So you can close issue 48 as well.





[SHOAL-47] Gracefully shutdown does not work, but also creates sideeffects Created: 18/Mar/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: babbisx Assignee: shreedhar_ganapathy
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: Linux
Platform: OpenSolaris


Issuezilla Id: 47

 Description   

Hi from Sweden,

I found out that GMS graceful shutdown,

//leaves the group gracefully
gms.shutdown(GMSConstants.shutdownType.INSTANCE_SHUTDOWN);

not only does not work, but also has some serious side effects:

  • first I get an exception

Exception in thread "MessageWindowThread" java.lang.NullPointerException
at
com.sun.enterprise.ee.cms.impl.jxta.MessageWindow.run(MessageWindow.java:86)
at java.lang.Thread.run(Thread.java:619)

  • afterwards, all other group members will see the member that called shutdown()
    permanently in the group member list. This means that the member will appear in
    the member list forever (until all members and their processes terminate).

This has been verified both through printouts and by calling

List<String> members = groupHandle.getAllCurrentMembers();

On the other hand, if the member's JVM terminates (not a graceful shutdown), the
member is correctly removed.

Please note that this was verified on java 6 patch 4 and the libraries from
sailfin v1 b22. Please also note that I am not running sailfin/glassfish, just a
plain JVM.

BR

Babbis



 Comments   
Comment by sheetalv [ 18/Mar/08 ]

This issue has already been filed under issue 48.

      *** This issue has been marked as a duplicate of 48 ***
Comment by sheetalv [ 18/Mar/08 ]

This issue has been reopened to track the second part of the description.

Comment by shreedhar_ganapathy [ 20/Mar/08 ]

Based on the following email snippet from Babbis, the issue is resolved with the
1.1 bits
==
The problems with graceful shutdown were apparent in pure v1-b25, but were fixed
when the patch was applied.
(This was actually a big issue when we, e.g., wanted to replace cards in a cluster
and the subscriber on the removed server instance was never removed from the
group.)
So you can close issue 48 as well.

==





[SHOAL-46] .shoal directory size increases dangerously Created: 18/Mar/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: babbisx Assignee: sheetalv
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: Linux
Platform: OpenSolaris


Issuezilla Id: 46

 Description   

Hi from Sweden,

After sending a number of GMS messages, I found out that there is a directory
named .shoal in the location where I run the JVM.
In several cases this directory exceeded 300 MB in size, which may end up filling
the file system.

Please note that I am testing shoal on java 6 patch 4, using the libraries from
sailfin v1 b22, not sailfin itself.

BR

Babbis



 Comments   
Comment by sheetalv [ 18/Mar/08 ]

Thanks for filing this issue. I will look into this issue asap.

Comment by shreedhar_ganapathy [ 18/Mar/08 ]

While we investigate this issue, could you try the latest shoal and jxta jars
that have now been integrated into Sailfin build 24a. For your reference, the
following contains those specific jars :
https://shoal.dev.java.net/files/documents/5135/89898/shoal-1.1_03132008.zip

Comment by shreedhar_ganapathy [ 18/Mar/08 ]

Hi Babbis,
While we investigate this issue, could you try the latest shoal and jxta jars
that have now been integrated into Sailfin build 24a. For your reference, the
following contains those specific jars :
https://shoal.dev.java.net/files/documents/5135/89898/shoal-1.1_03132008.zip

Thanks
Shreedhar

Comment by shreedhar_ganapathy [ 20/Mar/08 ]

Based on the following email from Babbis, the issue is resolved with the 1.1 bits
==
Hi,

This fix worked fine ! No file leaks were after heavy traffic.
I have tested the fix on v1-b25.

Thanks

Babbis
==





[SHOAL-45] MessageWindow NullPointerException handling GMSMessage Created: 17/Mar/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: garyfeltham Assignee: sheetalv
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 45
Status Whiteboard:

shoal-shark-na


 Description   

The GroupHandle#sendMessage(String targetComponentName, byte[] message) contract
states that "Specifying a null component name would result in the message being
delivered to all registered components". However, when sending a message such as

sendMessage(null, "Example message".getBytes());

a NullPointerException is thrown from MessageWindow#handleGMSMessage at

if(gMsg.getComponentName().equals(GMSConstants.shutdownType.GROUP_SHUTDOWN.toString()))

because gMsg.getComponentName() is null.

A fix is:

if (gMsg.getComponentName()!=null &&
gMsg.getComponentName().equals(GMSConstants.shutdownType.GROUP_SHUTDOWN.toString()))
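
For illustration, a minimal broadcast sketch exercising the contract quoted above; gms is an already-joined GroupManagementService instance, and it is assumed that sendMessage declares GMSException. Until the fix above is committed, the receiving side's handleGMSMessage also needs the null-safe component-name check shown in the proposed fix:

    GroupHandle handle = gms.getGroupHandle();
    try {
        // Per the GroupHandle contract, a null component name should deliver the
        // payload to all registered components on the receiving members.
        handle.sendMessage(null, "Example message".getBytes());
    } catch (GMSException ge) {
        ge.printStackTrace();
    }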



 Comments   
Comment by shreedhar_ganapathy [ 18/Mar/08 ]

Thanks for bringing this to our attention. We will add a test case for this
situation.
Reassigning to Sheetal for fix.

Comment by sheetalv [ 28/Jul/08 ]

not important for Sailfin 0.5

Comment by sheetalv [ 25/Sep/08 ]

Fixed in Shoal trunk.
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=646





[SHOAL-44] accessing JXTA's System ADV information or equivalent Created: 19/Feb/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Improvement Priority: Major
Reporter: mbien Assignee: sheetalv
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 44

 Description   

JXTA stores in the system advertisement a lot of useful and never changing
information about the node's runtime environment. It would be great if Shoal
would provide this kind of immutable "node info" additional to the mutable "node
details" (DistributedStateCache).

Proposed public API changes:
  • node info getter in the GMS
  • node info getter in Signal
  • mechanism for adding custom values on node join

A workaround with DistributedStateCache is possible, but:
  • redundant communication
  • values are not guaranteed to arrive at the same time



 Comments   
Comment by sheetalv [ 09/Jul/08 ]

assigning to self





[SHOAL-43] unable to create messenger exception from JXTA Created: 16/Feb/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Critical
Reporter: sheetalv Assignee: hamada
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Macintosh


Issuezilla Id: 43

 Description   

Hi Mo,
I still keep seeing exceptions as follows even when running the rungmsdemo.sh :

[#|2008-02-15T16:57:14.511-0800|WARNING|Shoal|ShoalLogger|_ThreadID=14;_ThreadName=pool-
1-thread-
3;ClassName=DistributedStateCacheImpl;MethodName=getFromCacheForPattern;|GMSException during
DistributedStateCache Sync....com.sun.enterprise.ee.cms.core.GMSException: java.io.IOException:
Unable to create a messenger to jxta://uuid-
1070C0A4278E4D80AA34A4BA3D45F734CBEC5C6CD8414CB09EF1AC46AC045C7D03/PipeService/urn
:jxta:uuid-1070C0A4278E4D80AA34A4BA3D45F7346521C3C52E4443928082812BCDC1E25B04|#]

There are no changes in my workspace except for some minor log messages. This should be
reproducible just by running the ApplicationServer test.

Can you please look into this? I have noticed it after turning the log level to INFO. It probably got lost
when the log level was set to FINEST.

Simply start the test in 2 terminals as follows :

<terminal 1> sh rungmsdemo.sh C1 G1 CORE 200000 INFO
<terminal 2> sh rungmsdemo.sh C2 G1 CORE 200000 INFO

Thanks
Sheetal



 Comments   
Comment by sheetalv [ 28/Feb/08 ]

This issue has been fixed. Please see

https://shoal.dev.java.net/issues/show_bug.cgi?id=28

Comment by shreedhar_ganapathy [ 25/Mar/08 ]

Vivek reports seeing this with GF v2.1 build 24c which has Jxta jar svn version
537.

[#|2008-03-25T17:41:49.101-0700|WARNING|sun-appserver9.1|ShoalLogger|_ThreadID=20;_ThreadName=pool-2-thread-4;_RequestID=378e023c-9136-4a10-b128-278092756024;|GMSException
during DistributedStateCache
Sync....com.sun.enterprise.ee.cms.core.GMSException: java.io.IOException: Unable
to create a messenger to
jxta://uuid-D061310EC6A64B22A06AB63D5D1A4DC47987FC1134E54090AB24B0C9E01AD7DF03/PipeService/urn:jxta:uuid-D061310EC6A64B22A06AB63D5D1A4DC46521C3C52E4443928082812BCDC1E25B04|#]

[#|2008-03-25T17:41:49.103-0700|SEVERE|sun-appserver9.1|javax.enterprise.resource.corba|_ThreadID=20;_ThreadName=pool-2-thread-4;_RequestID=378e023c-9136-4a10-b128-278092756024;|The
log message is null.
java.lang.NullPointerException
at
com.sun.enterprise.ee.ejb.iiop.IiopFolbGmsClient.addMember(IiopFolbGmsClient.java:359)
at
com.sun.enterprise.ee.ejb.iiop.IiopFolbGmsClient.handleSignal(IiopFolbGmsClient.java:286)
at
com.sun.enterprise.ee.ejb.iiop.IiopFolbGmsClient.consumeSignal(IiopFolbGmsClient.java:174)
at
com.sun.enterprise.ee.cms.impl.common.Router$CallableAction.call(Router.java:509)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
at java.util.concurrent.FutureTask.run(FutureTask.java:123)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
at java.lang.Thread.run(Thread.java:595)

Comment by sheetalv [ 25/Apr/08 ]

This issue has been fixed and is available in promoted build b31 of
SJSAS91_FCS_BRANCH.





[SHOAL-42] masterNode.getRouteControl().isConnected returns false intermittently or all the time Created: 16/Feb/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Critical
Reporter: sheetalv Assignee: hamada
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Macintosh


Issuezilla Id: 42

 Description   

Hi Mo,
I added a log message to print out the value of isConnected() in the HealthMonitor class. I ran the
rungmsdemo.sh test on 2 terminals and I see the value printed as false all the time.
Please comment. This is crucial since the "false" value would mean that our logic to determine
IN_DOUBT state would need to be altered.
Thanks
Sheetal

On Feb 14, 2008, at 4:27 PM, Sheetal Vartak wrote:

Hi Mo,
As you are aware, one of the instances that is started while running my MultiGroupTest suddenly
decides to go into IN_DOUBT state. I was looking at the values computed for
masterNode.getRouteControl().isConnected(entry.id). I found that the value is sometimes false and
sometimes true. What I don't understand is how the value can be false at one point (time = t) and then
become true at some later point (time t + delta).
Can you please shed some light on this?
Thanks
Sheetal



 Comments   
Comment by sheetalv [ 28/Feb/08 ]

Mo has fixed this issue in the single cluster scenario. It still does not work
correctly in a multi-cluster environment. The MultiGroupJoinTest produces false
failures due to isConnected() returning false intermittently.
Test can be run in 2 terminals as follows :
sh runmultigroupjointest.sh C1
sh runmultigroupjointest.sh C2

change log :
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=531

Comment by sheetalv [ 04/Mar/08 ]

This issue is now resolved. Added a fix in HealthMonitor to not check
isConnected for the instance in whose VM it is running. The HealthMonitor's
InDoubtPeerDetector thread now iterates through all the entries but skips its
own entry, since isConnected() obviously returns false for the local peer itself.

https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=536





[SHOAL-41] Add support in Shoal for passing in cert stores Created: 11/Feb/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Improvement Priority: Major
Reporter: shreedhar_ganapathy Assignee: sheetalv
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 41

 Description   

Jxta and other service providers provide a notion of encryption through various
means. The Properties object that passes in configurational data to service
provider backends should pass in a certstore to the Jxta service provider so
that end to end security can be optionally provided.






[SHOAL-40] (User Feedback) : provide ability to choose network interface on which to have group communication Created: 01/Feb/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: sheetalv Assignee: hamada
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: OpenSolaris


Issuezilla Id: 40

 Description   

As requested by John Kim in Shreedhar's blog entry :
http://blogs.sun.com/shreedhar/entry/sailfin_drives_a_new_feature

Need to expose a configuration in Shoal and support in JXTA.



 Comments   
Comment by hamada [ 01/Feb/08 ]

This is already supported in JXTA. A new property constant, TCPADDRESS, needs to
be defined in JxtaConfigConstants; it is in turn passed to NetworkManager and set
in NetworkManager.startDomain():

config.setTcpInterfaceAddress(TCPADDRESS);

Comment by sheetalv [ 11/Feb/08 ]

Fix for this issue has been checked in.
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=530





[SHOAL-39] HealthMonitor should report if the Network Interface of the local peer is down Created: 01/Feb/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Improvement Priority: Major
Reporter: sheetalv Assignee: sheetalv
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: OpenSolaris


Issuezilla Id: 39

 Description   

This will provide an additional layer of failure reporting which will help
diagnose problems for customers.

JDK 6 provides a facility for this such as the NetworkInterface API.
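
For illustration, a minimal sketch of the kind of check the JDK 6 NetworkInterface API allows; the interface name is a placeholder, and how HealthMonitor would report the result is left open by this RFE:

    import java.net.NetworkInterface;
    import java.net.SocketException;

    // Returns true if the named local network interface is present and up.
    static boolean isInterfaceUp(String interfaceName) throws SocketException {
        NetworkInterface nic = NetworkInterface.getByName(interfaceName);
        return nic != null && nic.isUp();
    }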



 Comments   
Comment by sheetalv [ 09/Jul/08 ]

Changing to Enhancement





[SHOAL-38] HealthMonitoring support for hardware/network failures avoiding TCP timeouts Created: 01/Feb/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: sheetalv Assignee: hamada
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: OpenSolaris


Issuezilla Id: 38

 Description   

With a hardware or network failure, the current JxtaMgmt provider's HealthMonitor
will wait on the TCP timeout, which on certain systems can be as long as 10
minutes. We need a timeout-based mechanism that allows applications to configure a
timeout after which a TCP-socket-based liveness check terminates and marks the
member as failed. This is needed to provide robustness in the face of hardware
failures.

The fix for this needs to come from JXTA for SailFin, as it is a critical
requirement for Ericsson.



 Comments   
Comment by shreedhar_ganapathy [ 24/Jun/08 ]

Sheetal has integrated a fix for this feature into the trunk. The feature allows
health monitoring to report a failure when a failure-detection-related TCP
connection is blocked for a configured timeout (30 seconds by default).

The timeout is configured using the FAILURE_DETECTION_TCP_RETRANSMIT_TIMEOUT and
FAILURE_DETECTION_TCP_RETRANSMIT_PORT properties specified in
ServiceProviderConfigurationKeys.java.

The javadoc corresponding to these properties is as follows:

FAILURE_DETECTION_TCP_RETRANSMIT_PORT
The value of this key is a port common to all cluster members on which
an attempt is made to create a socket when a particular instance's configured
periodic heartbeats have been missed the maximum number of retry times.

FAILURE_DETECTION_TCP_RETRANSMIT_TIMEOUT
Maximum time that the health monitoring protocol will wait for a
reachability query to block for a response.
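
For illustration, a minimal sketch of passing these two properties to GMS at startup; the numeric values are examples only, and the assumptions that the keys live in com.sun.enterprise.ee.cms.core and are passed as strings via toString() are mine rather than anything stated in this issue:

    import java.util.Properties;
    import com.sun.enterprise.ee.cms.core.GMSFactory;
    import com.sun.enterprise.ee.cms.core.GroupManagementService;
    import com.sun.enterprise.ee.cms.core.ServiceProviderConfigurationKeys;

    public class TcpFailureDetectionConfig {
        public static GroupManagementService start() throws Exception {
            Properties props = new Properties();
            // Max time (millis) the TCP reachability check may block before the member is declared failed.
            props.put(ServiceProviderConfigurationKeys.FAILURE_DETECTION_TCP_RETRANSMIT_TIMEOUT.toString(), "30000");
            // Port, common to all cluster members, on which the reachability socket is attempted.
            props.put(ServiceProviderConfigurationKeys.FAILURE_DETECTION_TCP_RETRANSMIT_PORT.toString(), "9091");

            GroupManagementService gms = (GroupManagementService) GMSFactory.startGMSModule(
                    "instance1", "cluster1", GroupManagementService.MemberType.CORE, props);
            gms.join();
            return gms;
        }
    }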





[SHOAL-37] expose API to determine if group is shutting down Created: 30/Jan/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Improvement Priority: Major
Reporter: sheetalv Assignee: sheetalv
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: Macintosh


Issuezilla Id: 37

 Description   

Need to add an API:

isGroupShuttingDown() in GroupManagementServiceImpl, which will call
ShutdownHelper.isGroupBeingShutdown().

This API can be used by GMS clients such as in-memory replication so that
replication can be done before the instance shuts down.

Add a test to query this API (it should return the right info when the group is not
shutting down).
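
For illustration, a minimal sketch of how an in-memory replication client might consult the API proposed here; isGroupShuttingDown() is the method named in this description (not an existing API at the time of writing), and replicateSessionsToSurvivors() is a hypothetical client-side helper:

    // gms is the GroupManagementService instance this member joined the group with.
    if (!gms.isGroupShuttingDown()) {
        // Only this instance is going down, not the whole group, so replicate
        // in-memory state to the surviving members before leaving.
        replicateSessionsToSurvivors();
    }
    gms.shutdown(GMSConstants.shutdownType.INSTANCE_SHUTDOWN);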



 Comments   
Comment by sheetalv [ 01/Feb/08 ]

This has been fixed.
See following check ins :

https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=519
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=520
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=521
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=522
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=523
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=524

A test has been added as well.





[SHOAL-36] messages received not in same order as when sent Created: 30/Jan/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Minor
Reporter: sheetalv Assignee: hamada
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 36
Status Whiteboard:

shoal-shark-na


 Description   

I did some extensive testing to see whether messages that are sent get received by the other instances in
the same order.

I wrote a simple test for point-to-point message send/receive:
tests/com/sun/enterprise/ee/cms/tests/p2pmessagesend/P2PMessageSendAndReceive.java

This test uses the MessageAction Signal to receive messages. I started 2 instances (one the sender, the
other the receiver). 10 messages are sent by instance A in sequence. The receiver, i.e. instance B, does
not receive the messages in the same order that they were sent in.
The logs show that the same thread takes care of calling the processNotification() method for each
message received, so it is not a threading issue.

The main line of code is ClusterManager.send(id, message), which then calls outputPipe.send(message).

Here are the logs :

Sender :
INFO: Sending messages...
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive sendMessages
INFO: Message 0 sent from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive sendMessages
INFO: Message 1 sent from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive sendMessages
INFO: Message 2 sent from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive sendMessages
INFO: Message 3 sent from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive sendMessages
INFO: Message 4 sent from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive sendMessages
INFO: Message 5 sent from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive sendMessages
INFO: Message 6 sent from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive sendMessages
INFO: Message 7 sent from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive sendMessages
INFO: Message 8 sent from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive sendMessages
INFO: Message 9 sent from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive sendMessages
INFO: Message 10 sent from C1 to Group

Receiver :
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive processNotification
INFO: Message: P2PMsgSendReceive : message 0 from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive processNotification
INFO: Message: P2PMsgSendReceive : message 1 from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive processNotification
INFO: Message: P2PMsgSendReceive : message 3 from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive processNotification
INFO: Message: P2PMsgSendReceive : message 5 from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive processNotification
INFO: Message: P2PMsgSendReceive : message 4 from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive processNotification
INFO: Message: P2PMsgSendReceive : message 7 from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive processNotification
INFO: Message: P2PMsgSendReceive : message 2 from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive processNotification
INFO: Message: P2PMsgSendReceive : message 8 from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive processNotification
INFO: Message: P2PMsgSendReceive : message 6 from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive processNotification
INFO: Message: P2PMsgSendReceive : message 10 from C1 to Group
Jan 29, 2008 2:28:26 PM
com.sun.enterprise.ee.cms.tests.p2pmessagesend.P2PMessageSendAndReceive processNotification
INFO: Message: P2PMsgSendReceive : message 9 from C1 to Group

I also tried another test. I added some code to ClusterManager.main() to send and receive messages
using the ClusterMessageListener model. The ClusterMessageListener's handleClusterMessage() method
is implemented in GroupCommunicationProviderImpl, which puts the incoming message into
MessageQueue, a FIFO queue. The MessageWindow then takes the message and goes through the
MessageAction API, which is a single thread.
In this case, the logs show a different thread calling ClusterManager.pipeMsgEvent() every time, so it
looks like the PipeMsgListener is spawning a new thread for every call to pipeMsgEvent().

I also checked the order of the messages received via the listener model in the first case mentioned
above. The sequence of messages received by ClusterMessageListener.handleClusterMessage() and by
the MessageAction signal API is the same, so the underlying Jxta layer seems to be delivering the
messages out of order.

main method changes to ClusterManager :

public static void main(final String[] argv) {
    JxtaUtil.setupLogHandler();
    LOG.setLevel(Level.INFO);
    final String name = System.getProperty("INAME", "instanceName");
    final String groupName = System.getProperty("GNAME", "groupName");
    LOG.log(Level.INFO, "Instance Name :" + name);
    final Map props = getPropsForTest();
    final Map<String, String> idMap = getIdMap();
    final List<ClusterViewEventListener> vListeners =
            new ArrayList<ClusterViewEventListener>();
    final List<ClusterMessageListener> mListeners =
            new ArrayList<ClusterMessageListener>();
    vListeners.add(
            new ClusterViewEventListener() {
                public void clusterViewEvent(final ClusterViewEvent event,
                                             final ClusterView view) {
                    //LOG.log(Level.INFO, "event.message", new Object[]{event.getEvent().toString()});
                    //LOG.log(Level.INFO, "peer.involved", new Object[]{event.getAdvertisement().toString()});
                    //LOG.log(Level.INFO, "view.message", new Object[]{view.getPeerNamesInView().toString()});
                }
            });
    mListeners.add(
            new ClusterMessageListener() {
                public void handleClusterMessage(final SystemAdvertisement id,
                                                 final Object message) {
                    LOG.log(Level.INFO, id.getName());
                    LOG.log(Level.INFO, "SHEETAL : message received = "
                            + new String(((GMSMessage) message).getMessage()));
                }
            }
    );
    final ClusterManager manager = new ClusterManager(groupName,
            name,
            idMap,
            props,
            vListeners,
            mListeners);
    manager.start();
    //manager.waitForClose();
    if (System.getProperty("TYPE").equals("sender")) {
        final Object waitLock = new Object();
        LOG.log(Level.INFO, "wait 10 secs to shutdown");
        synchronized (waitLock) {
            try {
                waitLock.wait(10000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        LOG.log(Level.INFO, "Sending messages...");
        final ID id = manager.getID("client2");
        for (int i = 0; i <= 10; i++) {
            final GMSMessage gMsg = new GMSMessage(name,
                    MessageFormat.format("P2PMsgSendReceive : message {0} from {1} to {2}",
                            i, name, groupName).getBytes(),
                    groupName, Long.getLong("10"));
            try {
                manager.send(id, gMsg);
                LOG.info("Message " + i + " sent from " + name + " to " + groupName);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        manager.waitForClose();
    } else if (System.getProperty("TYPE").equals("receiver")) {
        final Object waitLock = new Object();
        LOG.log(Level.INFO, "wait 30 secs to shutdown");
        synchronized (waitLock) {
            try {
                waitLock.wait(30000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        manager.waitForClose();
    }
    System.exit(0);
}

run_client.sh :

java -Dcom.sun.management.jmxremote -DINAME=client$1 -DTYPE=$2 -cp ./lib/jxta.jar:dist/shoal-gms.jar com.sun.enterprise.jxtamgmt.ClusterManager



 Comments   
Comment by sheetalv [ 09/Jul/08 ]

NA for Sailfin 1.0

Comment by sheetalv [ 27/Aug/08 ]

Requires more input from the JXTA team.





[SHOAL-35] member state shld not returned as DEAD for a member not in view Created: 23/Jan/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Task Priority: Major
Reporter: sheetalv Assignee: sheetalv
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: OpenSolaris


Issuezilla Id: 35

 Description   

If a member is not in the view, the state should be returned as UNKNOWN and not
as DEAD.
Also, one last check of the state needs to be done just before returning (the
state could have changed in the meantime) so that the right state is returned.



 Comments   
Comment by sheetalv [ 01/Feb/08 ]

The HealthMonitor.getState() has been modified to fix this.





[SHOAL-34] Join notif states different for master and other members Created: 23/Jan/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: sheetalv Assignee: shreedhar_ganapathy
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: OpenSolaris


Issuezilla Id: 34

 Description   

(Reported in Sailfin) When an instance comes up, the master instance gets a JOIN
notification with the member's state as ALIVE. However, other instances get a
JOIN notification with the member's state as STARTING.
This needs to be looked into.



 Comments   
Comment by sheetalv [ 01/Feb/08 ]

There is an issue in Sailfin for the same :
https://sailfin.dev.java.net/issues/show_bug.cgi?id=420

Comment by sheetalv [ 28/Feb/08 ]

It is OK for instance A to see instance B's state as ALIVE while instance C sees
instance B's state as STARTING or READY. All these 3 states are considered
healthy for an instance for which a Join notif is sent out.
A fix has gone into Shoal workspace to make sure that the most up-to-date state
is returned back via LWRMulticast. A Shoal integration will make the fix
available in Sailfin.

change log at :
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=535





[SHOAL-33] test for DSCMessage Created: 23/Jan/08  Updated: 23/Jun/10

Status: Open
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Task Priority: Major
Reporter: sheetalv Assignee: sheetalv
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: OpenSolaris


Issuezilla Id: 33

 Description   

DSCMessages should also be sent P2P. This needs to be checked.






[SHOAL-32] test for getCurrentAliveOrReadyMembers Created: 23/Jan/08  Updated: 07/Oct/10  Resolved: 07/Oct/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Task Priority: Major
Reporter: sheetalv Assignee: sheetalv
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: OpenSolaris


Issuezilla Id: 32

 Description   

Need to test the above API from performance and correctness perspective.



 Comments   
Comment by Joe Fialli [ 07/Oct/10 ]

method is deprecated.

replaced by GroupHandle.getCurrentAliveAndReadyCoreView().

There is a dev test already written for it and it is run nightly.





[SHOAL-31] Health Monitor's reportJoinedAndReadyState doesnt send local cluster view event in master node Created: 19/Jan/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: shreedhar_ganapathy Assignee: shreedhar_ganapathy
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 31

 Description   

In HealthMonitor, the reportJoinedAndReadyState method does not send out a
local clusterViewEvent notification when the sender is the assigned Master. This
notification is useful even on the master node, whose consuming application may
also require it.

The fix is understood.



 Comments   
Comment by shreedhar_ganapathy [ 19/Jan/08 ]

Fix checked in
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=514





[SHOAL-30] Health Monitor's getState returns state of members as DEAD if health messages have not yet started Created: 19/Jan/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: shreedhar_ganapathy Assignee: shreedhar_ganapathy
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 30

 Description   

getState in HealthMonitor needs to make appropriate assumptions about members
that are in the clusterViewManager but have not yet sent out a health message.
Although this window of time is small, a member's state is reported as DEAD
during it. This happens even for the local peer's getState call.
The fix is understood.
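
For illustration, here is a small self-contained sketch of the rule the fix needs to apply. It deliberately does not reuse the real HealthMonitor types; the maps and the enum below are placeholders that merely mirror the kind of state Shoal reports. The point is that a member known to the view but without a heartbeat yet should read as STARTING (or UNKNOWN), never DEAD.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MemberStateSketch {
    enum MemberState { STARTING, READY, ALIVE, DEAD, UNKNOWN }

    // Last state reported via a health message, per member token; empty until the first heartbeat.
    private final Map<String, MemberState> lastReported = new ConcurrentHashMap<String, MemberState>();
    // Members currently known to the cluster view.
    private final Map<String, Boolean> clusterView = new ConcurrentHashMap<String, Boolean>();

    MemberState getState(String memberToken) {
        MemberState reported = lastReported.get(memberToken);
        if (reported != null) {
            return reported;              // normal path: use the reported state
        }
        if (clusterView.containsKey(memberToken)) {
            return MemberState.STARTING;  // in the view, no health message yet: not DEAD
        }
        return MemberState.UNKNOWN;       // not known to the view at all
    }
}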



 Comments   
Comment by shreedhar_ganapathy [ 19/Jan/08 ]

fix checked in
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=514





[SHOAL-29] API to Signal to group that each member's application is ready to start operations Created: 15/Jan/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: New Feature Priority: Major
Reporter: shreedhar_ganapathy Assignee: sheetalv
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 29

 Description   

Applications employing Shoal, typically through the GMSFactory.startGMSModule(...)
API, may find it limiting that Shoal's JoinNotificationSignal is taken to mean
that the joining application is ready to start its operations. This is
particularly limiting for products that start a sequence of services: such
services need to be part of a group early on, but also need a way to know when
the application is actually ready to be operated upon.
For instance, where a load balancer employs Shoal as its health monitoring
system by participating as a SPECTATOR member, and the cluster it is load
balancing also uses Shoal, the LB needs to know when the instances are ready to
accept requests.

Further, appserver instances in the cluster may want to know when another
instance is in the ready state so that operations such as data replication can
occur.
This RFE is for a JoinedAndReadyNotificationSignal, which would signify to all
members of the group that a member has not only joined the group but is also
ready to start operations.

Additionally, the JoinNotificationSignal requires an additional API,
getMemberState(), that returns the joined member's health state. This covers
cases where an instance has already sent out a JoinedAndReadyNotificationSignal
and another instance needs to know that member's health state.
The state machine of an instance's startup should be starting, then ready, then
alive. Both ready and alive signify that an instance is ready and available.
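
As a rough sketch of how an application would consume this feature once implemented: register a callback for the JoinedAndReadyNotificationSignal and, on the joining side, report joined-and-ready after startup completes. The class location com.sun.enterprise.ee.cms.impl.client for the action factory, the cast of startGMSModule's return value, and the reportJoinedAndReadyState(groupName) signature follow common Shoal usage but are assumptions here and may differ between versions.

import com.sun.enterprise.ee.cms.core.CallBack;
import com.sun.enterprise.ee.cms.core.GMSException;
import com.sun.enterprise.ee.cms.core.GMSFactory;
import com.sun.enterprise.ee.cms.core.GroupManagementService;
import com.sun.enterprise.ee.cms.core.JoinedAndReadyNotificationSignal;
import com.sun.enterprise.ee.cms.core.Signal;
import com.sun.enterprise.ee.cms.impl.client.JoinedAndReadyNotificationActionFactoryImpl;

public class ReadyAwareMember implements CallBack {

    // Called by Shoal when a notification arrives for a registered action factory.
    public void processNotification(Signal signal) {
        if (signal instanceof JoinedAndReadyNotificationSignal) {
            // Safe point to start operations that need the remote member to be ready,
            // e.g. routing requests to it or replicating data to it.
            System.out.println(signal.getMemberToken() + " has joined and is ready");
        }
    }

    public static void main(String[] args) throws GMSException {
        final String group = "cluster1";
        GroupManagementService gms = (GroupManagementService) GMSFactory.startGMSModule(
                "instance1", group, GroupManagementService.MemberType.CORE, null);
        gms.addActionFactory(new JoinedAndReadyNotificationActionFactoryImpl(new ReadyAwareMember()));
        gms.join();
        // ... start this member's own services ...
        gms.reportJoinedAndReadyState(group);  // assumed signature; announces READY to the group
    }
}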



 Comments   
Comment by shreedhar_ganapathy [ 15/Jan/08 ]

Sheetal and I have checked in this feature through various cvs checkins.
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=503
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=497
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=496
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=495
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=494
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=492
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=485
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=469

We may need to expose an API that allows applications to query the state of
members through GroupHandle.





[SHOAL-28] Add multicluster support Created: 10/Nov/07  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: New Feature Priority: Critical
Reporter: hamada Assignee: hamada
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Text File issue28.patch     Java Source File NetworkManager.java    
Issuezilla Id: 28

 Description   

The following patch adds multi-cluster support by moving multicast out of the
world group and into the net group, thus allowing total isolation down to the
network layer.
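
For context, the sketch below shows the kind of multi-group usage this patch is meant to enable, in the spirit of the MultiGroupJoinTest discussed in the comments: one process acting as member "A" in two groups. The member and group names are arbitrary, and the cast of startGMSModule's return value plus the empty Properties follow common Shoal usage rather than anything specific to this patch.

import java.util.Properties;

import com.sun.enterprise.ee.cms.core.GMSException;
import com.sun.enterprise.ee.cms.core.GMSFactory;
import com.sun.enterprise.ee.cms.core.GroupManagementService;

public class TwoGroupMember {
    public static void main(String[] args) throws GMSException {
        GroupManagementService gmsG1 = (GroupManagementService) GMSFactory.startGMSModule(
                "A", "G1", GroupManagementService.MemberType.CORE, new Properties());
        GroupManagementService gmsG2 = (GroupManagementService) GMSFactory.startGMSModule(
                "A", "G2", GroupManagementService.MemberType.CORE, new Properties());
        gmsG1.join();  // member A participates in group G1
        gmsG2.join();  // ... and, with multi-cluster support, also in group G2
        // Messages can then be sent and received independently through
        // gmsG1.getGroupHandle() and gmsG2.getGroupHandle().
    }
}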



 Comments   
Comment by hamada [ 10/Nov/07 ]

Created an attachment (id=3)
Allows for multi cluster creation

Comment by shreedhar_ganapathy [ 28/Jan/08 ]

The patch supporting multi-cluster did not work after the netPeerGroup, started,
and stopped fields in NetworkManager, which were earlier static, were made
non-static. The real issue with Jxta then surfaces: the WorldPeerGroup does not
allow multiple infrastructure peer groups, each with their own multicast address
space.

Also see the related issue filed in Sailfin that is affecting the CLB
requirement. An early fix for this issue would help them proceed with their
module's testing.

Comment by shreedhar_ganapathy [ 28/Jan/08 ]

Related Sailfin Issue
https://sailfin.dev.java.net/issues/show_bug.cgi?id=446

Comment by sheetalv [ 30/Jan/08 ]

Created an attachment (id=4)
updated NetworkManager.java

Comment by sheetalv [ 30/Jan/08 ]

I tried running rungmsdemo.sh with the updated NetworkManager class, making "netPeerGroup",
"started" and "stopped" non-static variables.
Still no luck. I see the following exception while running the test:
[#|2008-01-30T10:01:24.871-0800|WARNING|Shoal|ShoalLogger|
_ThreadID=25;_ThreadName=pool-1-
thread-4;ClassName=DistributedStateCacheImpl;MethodName=getFromCacheForPattern;|
GMSException during DistributedStateCache Sync....com.sun.enterprise.ee.cms.core.GMSException:
java.io.IOException: Unable to create a messenger to jxta://
uuid-1070C0A4278E4D80AA34A4BA3D45F734CBEC5C6CD8414CB09EF1AC46AC045C7D03/
PipeService/
urn:jxta:uuid-1070C0A4278E4D80AA34A4BA3D45F7346521C3C52E4443928082812BCDC1E25B04|#]

I also see the above exception stemming from other APIs while running the multi-group test.
In the multi-group test, there are 2 instances running in separate VMs, each trying to join 2 groups.
Instances A and B both join groups G1 and G2 and send messages to both groups. B receives the
messages for group1 but not for group2; similarly, A receives messages for group1 but not for group2.
The following exception is seen when B tries to send a message to group2.

Jan 30, 2008 2:35:55 PM com.sun.enterprise.jxtamgmt.HealthMonitor send
WARNING: Failed to send message
java.io.IOException: Unable to create a messenger to jxta://
uuid-45716DDDE2C34663A79D9C808283F839117F856F1C3A4CC083A5D3A0BC2CCD7F03/
PipeService/
urn:jxta:uuid-45716DDDE2C34663A79D9C808283F8397F57B1A44FDB469D9F94BE4B0722C28904
at net.jxta.impl.pipe.BlockingWireOutputPipe.checkMessenger(BlockingWireOutputPipe.java:221)
at net.jxta.impl.pipe.BlockingWireOutputPipe.send(BlockingWireOutputPipe.java:245)
at com.sun.enterprise.jxtamgmt.HealthMonitor.send(HealthMonitor.java:426)
at com.sun.enterprise.jxtamgmt.HealthMonitor.reportMyState(HealthMonitor.java:359)
at com.sun.enterprise.jxtamgmt.HealthMonitor.process(HealthMonitor.java:289)
at com.sun.enterprise.jxtamgmt.HealthMonitor.pipeMsgEvent(HealthMonitor.java:217)
at net.jxta.impl.pipe.InputPipeImpl.processIncomingMessage(InputPipeImpl.java:219)
at net.jxta.impl.pipe.WirePipe.callLocalListeners(WirePipe.java:374)
at net.jxta.impl.pipe.WirePipe.processIncomingMessage(WirePipe.java:350)
at net.jxta.impl.pipe.WirePipeImpl.processIncomingMessage(WirePipeImpl.java:338)
at net.jxta.impl.endpoint.EndpointServiceImpl.processIncomingMessage(EndpointServiceImpl.java:
989)
at net.jxta.impl.endpoint.EndpointServiceInterface.processIncomingMessage
(EndpointServiceInterface.java:352)
at net.jxta.impl.rendezvous.RendezVousServiceProvider.processReceivedMessage
(RendezVousServiceProvider.java:502)
at net.jxta.impl.rendezvous.StdRendezVousService.processReceivedMessage
(StdRendezVousService.java:240)
at net.jxta.impl.rendezvous.RendezVousServiceProvider.processIncomingMessage
(RendezVousServiceProvider.java:159)
at net.jxta.impl.endpoint.EndpointServiceImpl.processIncomingMessage(EndpointServiceImpl.java:
989)
at net.jxta.impl.endpoint.EndpointServiceInterface.processIncomingMessage
(EndpointServiceInterface.java:352)
at net.jxta.impl.endpoint.mcast.McastTransport.processMulticast(McastTransport.java:752)
at net.jxta.impl.endpoint.mcast.McastTransport$DatagramProcessor.run(McastTransport.java:874)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
at java.lang.Thread.run(Thread.java:613)
Jan 30, 2008 2:35:55 PM com.sun.enterprise.ee.cms.tests.multigroupjoin.MultiGroupJoinTest main
SEVERE: Exception occured while joining group:com.sun.enterprise.ee.cms.core.GMSException:
java.io.IOException: Unable to create a messenger to jxta://
uuid-45716DDDE2C34663A79D9C808283F839117F856F1C3A4CC083A5D3A0BC2CCD7F03/
PipeService/
urn:jxta:uuid-45716DDDE2C34663A79D9C808283F8396521C3C52E4443928082812BCDC1E25B04

Comment by sheetalv [ 28/Feb/08 ]

Mo has fixed this issue. The MultiGroupJoinTest shows that an instance can be
part of 2 groups and can send and receive messages in both.

change log :
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=525
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=526
https://shoal.dev.java.net/servlets/ReadMsg?list=cvs&msgNo=527





[SHOAL-27] java.lang.IllegalStateException is thrown with default config Created: 06/Nov/07  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: mbien Assignee: hamada
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All
URL: https://shoal.dev.java.net/servlets/ReadMsg?list=users&msgNo=55


Issuezilla Id: 27
Status Whiteboard:

as91ur1-na


 Description   

GMSFactory.startGMSModule(nodeName, groupName,
GroupManagementService.MemberType.CORE, null)

throws:

Exception in thread "main" java.lang.IllegalStateException: Must specify
rendezvous if 'useOnlySeeds' is enabled and configured as client
at net.jxta.impl.protocol.RdvConfigAdv.getDocument(RdvConfigAdv.java:523)
at
net.jxta.platform.NetworkConfigurator.getPlatformConfig(NetworkConfigurator.java:1778)
at com.sun.enterprise.jxtamgmt.NetworkManager.start(NetworkManager.java:397)
at com.sun.enterprise.jxtamgmt.ClusterManager.<init>(ClusterManager.java:145)
at
com.sun.enterprise.ee.cms.impl.jxta.GroupCommunicationProviderImpl.initializeGroupCommunicationProvider(GroupCommunicationProviderImpl.java:129)
at com.sun.enterprise.ee.cms.impl.jxta.GMSContext.join(GMSContext.java:122)
at
com.sun.enterprise.ee.cms.impl.common.GroupManagementServiceImpl.join(GroupManagementServiceImpl.java:309)
at net.java.fishfarm.GridNodeController.startNode(GridNodeController.java:89)
...

This is a regression from Shoal 1.0.
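
A minimal stand-alone repro sketch of the reported call is shown below. The node and group names are arbitrary, and the cast follows common Shoal usage; per the stack trace above, the IllegalStateException surfaced while join() was initializing the Jxta network manager.

import com.sun.enterprise.ee.cms.core.GMSException;
import com.sun.enterprise.ee.cms.core.GMSFactory;
import com.sun.enterprise.ee.cms.core.GroupManagementService;

public class DefaultConfigRepro {
    public static void main(String[] args) throws GMSException {
        // Null properties means "use the default configuration", as in the report above.
        GroupManagementService gms = (GroupManagementService) GMSFactory.startGMSModule(
                "node1", "testGroup", GroupManagementService.MemberType.CORE, null);
        gms.join();  // the reported exception was thrown from this path on the affected build
    }
}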



 Comments   
Comment by shreedhar_ganapathy [ 06/Nov/07 ]

assigning it to hamada for better handling.

Comment by shreedhar_ganapathy [ 06/Nov/07 ]

Will not make it into GlassFish v2 Update release.

Comment by shreedhar_ganapathy [ 06/Nov/07 ]

..

Comment by sheetalv [ 01/Feb/08 ]

Tried passing null for properties in the GMSFactory.startGMSModule() API, as
described above, in the ApplicationServer test. Could not reproduce the
exception reported above.





[SHOAL-26] Threads don't shut down Created: 06/Aug/07  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: bryon Assignee: Joe Fialli
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: Windows XP
Platform: Windows


Issuezilla Id: 26
Status Whiteboard:

as91ur1-na, shoal-shark-na


 Description   

It appears that there are 3 threads that don't shut down when trying to shut
down Shoal: ViewWindowThread, MessageWindowThread, and
com.sun.enterprise.ee.cms.impl.common.Router. These threads do not terminate,
so the process does not die when performing a shutdown. After examining the
code, it doesn't appear that the shutdown flag is ever changed to signal the
threads that a shutdown is in progress. Also, 2 of the threads don't appear to
ever be interrupted.



 Comments   
Comment by shreedhar_ganapathy [ 11/Aug/07 ]

reassigning to Sheetal to address this issue.

Comment by shreedhar_ganapathy [ 21/Aug/07 ]

Lowering to P4 as it's not a release stopper for GlassFish, which is tracking
P1s, P2s, and P3s for v2's FCS.

Comment by shreedhar_ganapathy [ 06/Nov/07 ]

Will not make it into GlassFish v2 update release.

Sheetal could you take a look at this issue and address it in time for SailFin's
feature freeze?

Comment by sheetalv [ 09/Jul/08 ]

NA for Sailfin 1.0

Comment by sheetalv [ 27/Aug/08 ]

Need to interrupt the ViewWindow and MessageWindow threads in GMSContext.leave(), since GMSContext
starts those threads.
Router starts the SignalHandler thread, so Router needs to interrupt it during shutdown.
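
For illustration, a generic sketch of the shutdown pattern being described, in plain Java rather than the actual Shoal classes: the owner sets a shutdown flag and also interrupts the worker, so a thread blocked waiting on a queue wakes up and exits instead of keeping the process alive.

import java.util.concurrent.atomic.AtomicBoolean;

public class InterruptibleWorker implements Runnable {
    private final AtomicBoolean shuttingDown = new AtomicBoolean(false);
    private final Thread thread = new Thread(this, "ViewWindowThread-sketch");

    public void start() {
        thread.start();
    }

    public void shutdown() {
        shuttingDown.set(true);  // signal the loop to stop
        thread.interrupt();      // wake the thread if it is blocked
    }

    public void run() {
        while (!shuttingDown.get()) {
            try {
                Thread.sleep(1000); // stand-in for blocking on a view/message queue
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore status; loop condition ends the run
            }
        }
    }
}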

Comment by sheetalv [ 27/Oct/08 ]

re-assigning to Joe.

Comment by Joe Fialli [ 09/Sep/09 ]

tested and integrated patch submitted by Bongjae.





[SHOAL-25] NPE from net.jxta.impl.pipe.BlockingWireOutputPipe.checkMessenger Created: 12/Jun/07  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Minor
Reporter: zorro Assignee: hamada
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: OpenSolaris


Issuezilla Id: 25

 Description   

The following NPE was encountered during a run of 100+ executions that had otherwise shown no NPEs.
java.lang.NullPointerException
at
net.jxta.impl.pipe.BlockingWireOutputPipe.checkMessenger(BlockingWireOutputPipe.java:181)
at net.jxta.impl.pipe.BlockingWireOutputPipe.send(BlockingWireOutputPipe.java:209)
at com.sun.enterprise.jxtamgmt.MasterNode.send(MasterNode.java:927)
at com.sun.enterprise.jxtamgmt.MasterNode.probeNode(MasterNode.java:738)
at com.sun.enterprise.jxtamgmt.HealthMonitor.process(HealthMonitor.java:243)
at com.sun.enterprise.jxtamgmt.HealthMonitor.pipeMsgEvent(HealthMonitor.java:216)
at net.jxta.impl.pipe.InputPipeImpl.processIncomingMessage(InputPipeImpl.java:231)
at net.jxta.impl.pipe.WirePipe.callLocalListeners(WirePipe.java:388)
at net.jxta.impl.pipe.WirePipe.processIncomingMessage(WirePipe.java:363)
at net.jxta.impl.pipe.WirePipeImpl.processIncomingMessage(WirePipeImpl.java:357)
at
net.jxta.impl.endpoint.QuotaIncomingMessageListener.doOne(QuotaIncomingMessageListener.java:489)
at
net.jxta.impl.endpoint.QuotaIncomingMessageListener$ListenerExecutorTask.run(QuotaIncomingMessageListener.java:297)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
at java.lang.Thread.run(Thread.java:619)

Scenario:
http://vivazapata.red.iplanet.com:8080/hudson/job/temp1/129/artifact/06_Machine.html



 Comments   
Comment by shreedhar_ganapathy [ 11/Aug/07 ]

reassigning to Mo for addressing in Jxta platform.

Comment by shreedhar_ganapathy [ 11/Aug/07 ]

Closing this issue as the dependency on QuotaIncomingMessageListener has been
removed from the current jxta jar. The earlier run was using b57 of GlassFish,
which had older shoal and jxta jars.





[SHOAL-24] Incorrect handling of Master Change Events in ViewWindow Created: 06/Jun/07  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: shreedhar_ganapathy Assignee: shreedhar_ganapathy
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 24

 Description   

On the occurrence of a Master Change Event, ViewWindow compares the new view
with the previous one and, if the new view contains fewer members, assumes the
missing members have failed. This can cause false failures to be reported to
GMS clients when in fact the Master may simply not have discovered the missing
members yet.

Failure determination should be left to the underlying provider's health
monitoring service.



 Comments   
Comment by shreedhar_ganapathy [ 06/Jun/07 ]

User: shreedhar_ganapathy
Date: 2007-06-07 00:29:09+0000
Log:
Fix for Issue 24: Incorrect handling of master change event for failures
View Window now does not assume that when a Master Change Event's view contains
less number of members than prior view, the missing members have failed. It
leaves the failure determination to the underlying providers to notify failure.
This should address the false failure notifications we see in large clusters.

File Changes:

Directory: /shoal/gms/src/java/com/sun/enterprise/ee/cms/impl/jxta/
===================================================================

File [changed]: ViewWindow.java
Url:
https://shoal.dev.java.net/source/browse/shoal/gms/src/java/com/sun/enterprise/ee/cms/impl/jxta/ViewWindow.java?r1=1.15&r2=1.16
Delta lines: +2 -26
--------------------
--- ViewWindow.java 2007-05-01 19:22:52+0000 1.15
+++ ViewWindow.java 2007-06-07 00:29:07+0000 1.16
@@ -30,7 +30,6 @@
 import com.sun.enterprise.jxtamgmt.SystemAdvertisement;
 
 import java.io.Serializable;
-import java.text.MessageFormat;
 import java.util.ArrayList;
 import java.util.List;
 import java.util.Map;
@@ -44,7 +43,7 @@
 /**
  * @author Shreedhar Ganapathy
  * Date: Jun 26, 2006
- * @version $Revision: 1.15 $
+ * @version $Revision: 1.16 $
  */
 class ViewWindow implements com.sun.enterprise.ee.cms.impl.common.ViewWindow, Runnable {
     private GMSContext ctx;
@@ -207,7 +206,6 @@
                 packet.getClusterView().getSize() != views.get(views.size() - 2).size()) {
             determineAndAddNewMemberJoins();
-            determineAndAddFailureSignals(packet);
         }
     }
@@ -250,29 +248,6 @@
         return tokens;
     }
 
-    private void determineAndAddFailureSignals(final EventPacket packet) {
-        if (views.size() < 2) {
-            return;
-        }
-        final List<GMSMember> oldMembership = views.get(views.size() - 2);
-        String token;
-        for (GMSMember member : oldMembership) {
-            token = member.getMemberToken();
-            analyzeAndFireFailureSignals(member, token, packet);
-        }
-    }
-
-    private void analyzeAndFireFailureSignals(final GMSMember member,
-                                              final String token,
-                                              final EventPacket packet) {
-
-        if (member.getMemberType().equalsIgnoreCase(CORETYPE) &&
-                !getCurrentCoreMembers().contains(token)) {
-            logger.log(Level.INFO, "gms.failureEventReceived", token);
-            addFailureSignals(packet);
-            getGMSContext().removeFromSuspectList(token);
-        }
-    }
-
     private void addPlannedShutdownSignals(final EventPacket packet) {
         final SystemAdvertisement advert = packet.getSystemAdvertisement();
         final String token = advert.getName();
@@ -524,6 +499,7 @@
             logger.log(Level.WARNING, e.getLocalizedMessage());
         } catch (Exception e) {
             logger.log(Level.WARNING, "Exception during DSC sync:" + e);
+            e.printStackTrace();
         }
     }
 }





[SHOAL-23] Occasional NPE seen in DistributedStateCache sync() Created: 02/Jun/07  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: shreedhar_ganapathy Assignee: shreedhar_ganapathy
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 23

 Description   

While running com.sun.enterprise.shoal.ShoalMessagingTest, the NPE below is
seen. It does not happen when the rungmsdemo.sh (or .bat) test is run.
Possibly an issue with some object's initialization not happening in time.

Jun 2, 2007 9:00:32 AM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow syncDSC
WARNING: Exception during DSC sync:java.lang.NullPointerException
java.lang.NullPointerException
at com.sun.enterprise.ee.cms.impl.jxta.DistributedStateCacheImpl.syncCac
he(DistributedStateCacheImpl.java:423)
at com.sun.enterprise.ee.cms.impl.jxta.ViewWindow.syncDSC(ViewWindow.jav
a:519)
at com.sun.enterprise.ee.cms.impl.jxta.ViewWindow.determineAndAddNewMemb
erJoins(ViewWindow.java:236)
at com.sun.enterprise.ee.cms.impl.jxta.ViewWindow.analyzeMasterChangeVie
w(ViewWindow.java:209)
at com.sun.enterprise.ee.cms.impl.jxta.ViewWindow.analyzeViewChange(View
Window.java:193)
at com.sun.enterprise.ee.cms.impl.jxta.ViewWindow.newViewObserved(ViewWi
ndow.java:101)
at com.sun.enterprise.ee.cms.impl.jxta.ViewWindow.run(ViewWindow.java:85
)
at java.lang.Thread.run(Thread.java:595)



 Comments   
Comment by shreedhar_ganapathy [ 02/Jun/07 ]

User: shreedhar_ganapathy
Date: 2007-06-02 17:33:31+0000
Log:
Fix for Issue 23: NPE in DSC occassionally in syncCache()
GMSContext is not inited. Instead of calling the getGMSContext() method which
does the right thing, the code in question relies on the ctx variable directly.

File Changes:

Directory: /shoal/gms/src/java/com/sun/enterprise/ee/cms/impl/jxta/
===================================================================

File [changed]: DistributedStateCacheImpl.java
Url:
https://shoal.dev.java.net/source/browse/shoal/gms/src/java/com/sun/enterprise/ee/cms/impl/jxta/DistributedStateCacheImpl.java?r1=1.13&r2=1.14
Delta lines: +4 -4
-------------------
--- DistributedStateCacheImpl.java 2007-05-30 22:46:21+0000 1.13
+++ DistributedStateCacheImpl.java 2007-06-02 17:33:28+0000 1.14
@@ -73,7 +73,7 @@
  *
  * @author Shreedhar Ganapathy
  * Date: June 20, 2006
- * @version $Revision: 1.13 $
+ * @version $Revision: 1.14 $
  */
 public class DistributedStateCacheImpl implements DistributedStateCache {
     private final ConcurrentHashMap<GMSCacheable, Object> cache =
@@ -277,8 +277,8 @@
             return retval;
         } else {
-            if (!memberToken.equals(ctx.getServerIdentityToken())) {
-                MemberStates state = ctx.getGroupCommunicationProvider().getMemberState(memberToken);
+            if (!memberToken.equals(getGMSContext().getServerIdentityToken())) {
+                MemberStates state = getGMSContext