Issue Details

Key: GLASSFISH-16908
Type: Bug
Status: Closed
Resolution: Invalid
Priority: Minor
Assignee: Bobby Bissett
Reporter: vanya_void
Votes: 0
Watchers: 1
Project: glassfish

More than 6 instances do not join the GMS group

Created: 24/Jun/11 10:19 AM   Updated: 05/Jul/11 01:15 PM   Resolved: 05/Jul/11 01:15 PM
Component/s: configuration, failover, grizzly-kernel, group_management_service
Affects Version/s: 3.1.1_b08
Fix Version/s: None

Time Tracking:
Not Specified

Environment:

Linux 2.6.18-164.el5PAE RHEL
2xXeon L7555
Red Hat Enterprise Linux Server release 5.4 (Tikanga)


Tags: 3_1_1-scrubbed cluster clustered
Participants: Bobby Bissett and vanya_void


Description

I have a 5-node cluster with 2 instances running on each node. When I run the start-cluster command, only 6 of them join the GMS group at the same time, and the output of the get-health command looks like this:

portal-instance1 failed since Fri Jun 24 20:00:57 MSD 2011
portal-instance12 not started
portal-instance2 started since Fri Jun 24 20:01:16 MSD 2011
portal-instance22 started since Fri Jun 24 20:01:16 MSD 2011
portal-instance3 started since Fri Jun 24 20:01:16 MSD 2011
portal-instance32 started since Fri Jun 24 20:01:16 MSD 2011
portal-instance4 not started
portal-instance42 not started
portal-instance5 failed since Thu Jun 23 19:54:10 MSD 2011
portal-instance52 failed since Thu Jun 23 21:04:20 MSD 2011

Do you have any information about why this might happen?
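
For reference, the commands referred to above were presumably invoked roughly like this (the cluster name is taken from the GMS log entry below; the exact invocation on the original system may have differed):

asadmin start-cluster portal-cluster
asadmin get-health portal-cluster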

[#|2011-06-24T20:08:39.443+0400|INFO|glassfish3.1|ShoalLogger|_ThreadID=12;_ThreadName=Thread-1;|GMS1092: GMS View Change Received for group: portal-cluster : Members in view for ADD_EVENT(before change analysis) are :
1: MemberId: portal-instance1, MemberType: CORE, Address: 192.168.101.31:9188:228.9.96.158:20796:portal-cluster:portal-instance1
2: MemberId: portal-instance12, MemberType: CORE, Address: 192.168.101.31:9091:228.9.96.158:20796:portal-cluster:portal-instance12
3: MemberId: portal-instance4, MemberType: CORE, Address: 192.168.101.34:9096:228.9.96.158:20796:portal-cluster:portal-instance4
4: MemberId: portal-instance42, MemberType: CORE, Address: 192.168.101.34:9146:228.9.96.158:20796:portal-cluster:portal-instance42
5: MemberId: portal-instance5, MemberType: CORE, Address: 192.168.101.35:9102:228.9.96.158:20796:portal-cluster:portal-instance5
6: MemberId: portal-instance52, MemberType: CORE, Address: 192.168.101.35:9129:228.9.96.158:20796:portal-cluster:portal-instance52



Bobby Bissett added a comment - 24/Jun/11 10:41 AM

So far I don't see any bug here. Clusters are known to work.

Please give the output of asadmin list-instances as well as asadmin get-health. If both commands agree that some instances are stopped, look at the logs to figure out why. If the commands don't agree, follow all the steps in this blog post to make sure your network supports what you're doing:

http://blogs.oracle.com/bobby/entry/validating_multicast_transport_where_d

It would be better to discuss this on the users list and then file an issue after we find out there's a bug.
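
For anyone hitting the same symptoms, the diagnostic steps suggested above would look roughly like this; the multicast address and port are taken from the GMS log in the description, and validate-multicast is the check covered in the linked blog post (it has to be started on two or more of the cluster machines at the same time so they can see each other's packets):

asadmin list-instances
asadmin get-health portal-cluster
asadmin validate-multicast --multicastaddress 228.9.96.158 --multicastport 20796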


Bobby Bissett added a comment - 05/Jul/11 01:15 PM

Glad you got the network issues figured out.