[SHOAL-81] Propagate Senders HM.Entry seqid in sent HealthMessage Created: 02/Oct/08  Updated: 23/Jun/10  Resolved: 23/Jun/10

Status: Resolved
Project: shoal
Component/s: GMS
Affects Version/s: current
Fix Version/s: 1.1

Type: Bug Priority: Critical
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Text File unexpectedfailure.log    
Issuezilla Id: 81

 Description   

This requires changes in HealthMessage.initialize() and HealthMessage.getDocument().

HealthMessage.getDocument() should write HealthMessage.Entry.sequenceId into
its XML document representation.
HealthMessage.initialize() should read the sender's sequence id for the
HealthMessage.Entry from the XML document representation.
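
A minimal sketch of the requested round trip, assuming a simplified entry type. EntrySketch, its field names, and the "Entry"/"sequenceId" XML names are hypothetical, and the JDK DOM API is used here in place of whatever document classes HealthMessage actually uses:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical stand-in for HealthMessage.Entry; not the Shoal class.
final class EntrySketch {
    final String state;
    final long sequenceId; // assigned by the sender, never by the receiver

    EntrySketch(String state, long sequenceId) {
        this.state = state;
        this.sequenceId = sequenceId;
    }

    // Analogue of getDocument(): serialize the sender-assigned sequence id.
    Document toDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element entry = doc.createElement("Entry");
            entry.setAttribute("state", state);
            entry.setAttribute("sequenceId", Long.toString(sequenceId));
            doc.appendChild(entry);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Analogue of initialize(): read the sender's id from the document
    // instead of generating one from the order of arrival.
    static EntrySketch fromDocument(Document doc) {
        Element entry = doc.getDocumentElement();
        return new EntrySketch(
                entry.getAttribute("state"),
                Long.parseLong(entry.getAttribute("sequenceId")));
    }
}
```

The point of the sketch is only that the id travels inside the document, so the receiver sees the same value the sender assigned.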

Currently, the receiver assigns a sequence id based on the order in which
messages arrive. The Jxta messaging protocol does not guarantee that messages
are received in the exact order in which they were sent, so the current
sequencing mechanism can cause health messages to be processed out of order.
This can leave the master node with an incorrectly computed cache state for an
instance.



 Comments   
Comment by Joe Fialli [ 06/Oct/08 ]

Created an attachment (id=12)
server log summarizing out of order message processing

Comment by Joe Fialli [ 06/Oct/08 ]

https://shoal.dev.java.net/nonav/issues/showattachment.cgi/12/unexpectedfailure.log

The attachment above summarizes a failure caused by this defect.
The instance sends the messages in the following order:
aliveandready
clusterstopping
stopping

The DAS (master node) receives the messages in the following order:
stopping (receiving side seqid 960)
clusterstopping (receiving side seqid 961)
aliveandready (receiving side seqid 963)

The DAS processes the messages in the following order:
clusterstopping (961)
stopping (960)
aliveandready (963)

Because the aliveandready message is processed last, a stopped instance
appears to come back to life as far as the master is concerned.
The master then marks it INDOUBT and subsequently verifies it as FAILED.
This ordering issue must be corrected to fix the failure.
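
To illustrate the intended behavior, here is a small sketch that processes the three messages by a sender-assigned sequence id instead of by arrival order. The class and field names are hypothetical, and the sender ids 1, 2, 3 are made up for the example (the log only records receiver-side ids):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration; not Shoal code.
final class SenderOrderSketch {
    static final class Msg {
        final String state;
        final long senderSeqId; // assumed to be carried in the message

        Msg(String state, long senderSeqId) {
            this.state = state;
            this.senderSeqId = senderSeqId;
        }
    }

    // Returns the states in the order they should be processed:
    // ascending sender sequence id, regardless of arrival order.
    static List<String> processingOrder(List<Msg> arrivalOrder) {
        List<Msg> sorted = new ArrayList<>(arrivalOrder);
        sorted.sort(Comparator.comparingLong((Msg m) -> m.senderSeqId));
        List<String> states = new ArrayList<>();
        for (Msg m : sorted) {
            states.add(m.state);
        }
        return states;
    }
}
```

With the arrival order from the log (stopping, clusterstopping, aliveandready) and sender ids 3, 2, 1 respectively, the sketch yields aliveandready, clusterstopping, stopping, so the stopped instance is not resurrected.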

Comment by Joe Fialli [ 11/Nov/08 ]

Fix delivered. The sender's sequence id is now propagated.

Also, the member's start time is now used together with the sequence id to
order messages between one invocation of a server instance and a restarted
invocation. (The Nodeagent can restart a failed instance quickly, so this can happen.)
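
One way to picture this ordering is a comparator on the pair (member start time, sequence id). This is a sketch under assumptions, not the delivered fix: the names are hypothetical, and it assumes a restarted instance reports a later start time and may restart its sequence ids from a low value:

```java
import java.util.Comparator;

// Hypothetical illustration; not Shoal code.
final class RestartOrderSketch {
    static final class Stamp {
        final long memberStartTime; // start time of the sending invocation
        final long sequenceId;      // sender-assigned id within that invocation

        Stamp(long memberStartTime, long sequenceId) {
            this.memberStartTime = memberStartTime;
            this.sequenceId = sequenceId;
        }
    }

    // Order by invocation first, then by sequence id within an invocation.
    static final Comparator<Stamp> ORDER =
            Comparator.comparingLong((Stamp s) -> s.memberStartTime)
                      .thenComparingLong(s -> s.sequenceId);

    // True if the candidate message is newer than the last one processed.
    static boolean isNewer(Stamp candidate, Stamp lastProcessed) {
        return ORDER.compare(candidate, lastProcessed) > 0;
    }
}
```

Comparing the start time first means a low sequence id from a freshly restarted instance still ranks after a high sequence id from the previous invocation.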

Generated at Thu Dec 08 08:35:06 UTC 2016 using JIRA 6.2.3#6260-sha1:63ef1d6dac3f4f4d7db4c1effd405ba38ccdc558.