[GLASSFISH-14664] ability to configure GMS member to use SSL for p2p communication Created: 12/Nov/10  Updated: 19/Sep/14

Status: In Progress
Project: glassfish
Component/s: group_management_service
Affects Version/s: 3.1
Fix Version/s: 4.1

Type: Improvement Priority: Critical
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 14,664
Tags: 3_1-exclude, 3_1_1-scrubbed, 3_1_2-exclude

 Description   

See shoal issue 112 for details
http://java.net/jira/browse/SHOAL-112

Since the failover system uses GMS messaging over the Grizzly TCP transport
to replicate session data, it is desirable to have a means of configuring
GMS so that this data is transferred to other members of the
cluster over a secured transport.
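
A minimal sketch of what such a toggle might look like, assuming a hypothetical GMS_SSL_ENABLED property and the JVM's default SSLContext (the actual configuration key and keystore wiring were still undecided when this issue was filed):

```java
import java.util.Properties;

import javax.net.SocketFactory;
import javax.net.ssl.SSLContext;

public class GmsTransportConfig {

    // Hypothetical property name; the real configuration key for this
    // improvement had not been decided when the issue was filed.
    static final String SSL_PROP = "GMS_SSL_ENABLED";

    // Chooses the factory used to open point-to-point connections to
    // other cluster members: SSL when enabled, plain TCP otherwise.
    public static SocketFactory socketFactory(Properties props) throws Exception {
        boolean ssl = Boolean.parseBoolean(props.getProperty(SSL_PROP, "false"));
        if (ssl) {
            // Key material would come from the member's keystore in a
            // real configuration; the JVM default context is used here.
            SSLContext ctx = SSLContext.getDefault();
            return ctx.getSocketFactory();
        }
        return SocketFactory.getDefault();
    }
}
```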



 Comments   
Comment by Joe Fialli [ 12/Nov/10 ]

additional information:

Replication of session data only takes place between clustered instances on the same
subnet behind a firewall.





[GLASSFISH-14663] capability to configure authentication for GMS members Created: 12/Nov/10  Updated: 19/Sep/14

Status: In Progress
Project: glassfish
Component/s: group_management_service
Affects Version/s: 3.1
Fix Version/s: 4.1

Type: Improvement Priority: Critical
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 14,663
Tags: 3_1-exclude, 3_1_2-exclude, 3_2-exclude

 Description   

Each clustered application server instance is a GMS member. The Shoal Group Management Service (GMS) allows group members to dynamically locate each other via a common multicast address and port, OR via a virtual member list of IP addresses (when not relying on multicast). The goal of addressing this issue is to authenticate that application servers trying to join as GMS members are allowed to join the cluster.

See details in Shoal issue 111:
http://java.net/jira/browse/SHOAL-111
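
SHOAL-111 did not settle on a mechanism; as an illustration only, a shared-secret HMAC handshake is one common approach, sketched below (the JoinAuthenticator class, its method names, and the secret-distribution model are all assumptions, not the Shoal design):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class JoinAuthenticator {

    // Signs a member's join request with the cluster-wide secret.
    public static byte[] sign(String memberId, byte[] clusterSecret) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(clusterSecret, "HmacSHA256"));
        return mac.doFinal(memberId.getBytes(StandardCharsets.UTF_8));
    }

    // The master verifies the token before admitting the member;
    // MessageDigest.isEqual gives a constant-time comparison.
    public static boolean verify(String memberId, byte[] token, byte[] clusterSecret)
            throws Exception {
        return MessageDigest.isEqual(token, sign(memberId, clusterSecret));
    }
}
```

A member that does not know the cluster secret cannot produce a valid token, so its join request can be rejected before it participates in the group.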






[GLASSFISH-4367] Make clustering infrastructure firewall friendly Created: 02/Mar/08  Updated: 17/Oct/12

Status: Open
Project: glassfish
Component/s: group_management_service
Affects Version/s: v2.1
Fix Version/s: not determined

Type: Improvement Priority: Critical
Reporter: km Assignee: Joe Fialli
Resolution: Unresolved Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 4,367
Status Whiteboard:

v3-prd-item

Tags: 3_1-exclude

 Description   

See:
http://wiki.glassfish.java.net/Wiki.jsp?page=V3CoreInfrastructureImprovements

CoreInfra-009.

It is OK if a known and fixed number of holes are poked in the firewall.



 Comments   
Comment by km [ 02/Mar/08 ]

...

Comment by ijuma [ 17/Oct/08 ]

Adding myself to cc list.

Comment by kumara [ 01/Sep/09 ]

Changing version from 9.1.1 to v2.1 to reflect new name/version.

Comment by Tom Mueller [ 17/May/10 ]

This is planned for the 3.1 implementation of clustering.

From the req spec:
In other words, remove use of RMI from infrastructure. Consider using Grizzly
in Node Agent. This enables a configuration where admin server is deployed
inside firewall and application server nodes are available outside.

The actual implementation will not use a node agent at all. Synchronization
between the instances and the DAS will use the admin port (HTTP/TCP).

Comment by Tom Mueller [ 02/Jul/10 ]

Synchronization and command replication are now being accomplished via fixed ports
(the admin-listener port), so it is possible to set up instances outside a
firewall and the DAS inside the firewall.

Comment by Tom Mueller [ 02/Jul/10 ]

From Shreedhar:
We may have constraints around this in GMS since DAS participates in the cluster
and potentially with standalone instances and requires multicast.

Comment by Tom Mueller [ 02/Jul/10 ]

Please analyze this issue from the perspective of GMS. Admin is ok for this.

Comment by Joe Fialli [ 07/Jul/10 ]

GMS depends on the DAS participating in the cluster as Master via UDP multicast broadcast.
The DAS as GMS master both broadcasts to members in the cluster and receives
heartbeat broadcasts from the other members of the cluster. Since UDP multicast
is typically not enabled through a firewall, it is problematic to have the DAS
inside the firewall and the app servers outside the firewall.

Would it not be sufficient to have the load balancer outside the firewall and have
the app server instances and DAS all behind the firewall?

In order for GMS to work in this environment, it would have to implement a virtual
multicast feature that removes GMS's reliance on UDP multicast broadcast. This
functionality is not scheduled for v3.1.

Comment by Joe Fialli [ 04/Oct/10 ]

Changed target milestone to 3.2 based on the
following past comment.

In order for GMS to work in this environment, it would have to implement virtual
multicast feature that removes GMS reliance on udp multicast broadcast. This
functionality is not scheduled for v3.1.





[GLASSFISH-16415] Administrators shall be able to configure clustered instances to potentially be separated by a firewall. Created: 21/Apr/11  Updated: 17/Oct/12

Status: Open
Project: glassfish
Component/s: group_management_service
Affects Version/s: 4.0
Fix Version/s: future release

Type: New Feature Priority: Critical
Reporter: Bobby Bissett Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Tags: 3_2prd

 Description   

Support for hybrid cloud (part private and part public cloud).
Default heartbeat failure detection configuration will require adjustment to account for potentially slower network throughput across the firewall.



 Comments   
Comment by shreedhar_ganapathy [ 27/Oct/11 ]

Changed AffectsVersion to 4.0





[GLASSFISH-16413] Administrators shall be able to configure a GMS group discovery mechanism for a site. Created: 21/Apr/11  Updated: 17/Oct/12

Status: Open
Project: glassfish
Component/s: group_management_service
Affects Version/s: 4.0
Fix Version/s: not determined

Type: New Feature Priority: Critical
Reporter: Bobby Bissett Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Tags: 3_2prd

 Description   

Mechanism is used to enable a GMS cluster when UDP multicast is unavailable between clustered instances.
Provide CLI to install a group discovery service as an OS service at a Well Known Address.
Provide CLI to configure VM template to reference a site-wide group discovery mechanism.
Provide CLI to configure S3-based group discovery.
(See GLASSFISH-3636 for issue that GMS requires UDP multicast.)



 Comments   
Comment by shreedhar_ganapathy [ 27/Oct/11 ]

Changed AffectsVersion to 4.0

Comment by Bobby Bissett [ 07/Dec/11 ]

Moving to Joe since I'm no longer on project.





[GLASSFISH-12194] Monitoring Stats Provider Created: 09/Jun/10  Updated: 17/Oct/12

Status: Open
Project: glassfish
Component/s: group_management_service
Affects Version/s: 3.1
Fix Version/s: future release

Type: New Feature Priority: Major
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 12,194

 Description   

Message throughput, thread utilization, number of detected SUSPECTED events, number of
FAILURES.



 Comments   
Comment by Joe Fialli [ 18/Aug/10 ]

deferred to ms5

Comment by Joe Fialli [ 15/Sep/10 ]

Will implement for use in development testing.

Expose to end users in 3.2.





[GLASSFISH-13056] Add validate-nodes-multicast command Created: 20/Aug/10  Updated: 07/Dec/11

Status: Open
Project: glassfish
Component/s: group_management_service
Affects Version/s: 3.1
Fix Version/s: future release

Type: New Feature Priority: Major
Reporter: Tom Mueller Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issuezilla Id: 13,056
Tags: 3_1_2-exclude

 Description   

This is a request for adding a validate-nodes-multicast command. This would be
a remote command (running in the DAS) that would use the SSH information in SSH
nodes to start validate-multicast on each node, and then would collect the
output and give a picture of what the multicast situation is for the entire
collection of nodes that are defined for the domain.

For example, if all nodes can communicate with each other via multicast (the
ideal situation for GMS), the output might be:

Multicast Groups:
1: localhost (DAS), n1, n2, n3, n4

However, if we have only n1<->n2 and n3<->n4 communicating, and the DAS can't
multicast to any of them, then the output could be:

Multicast Groups:
1: n1, n2
2: n3, n4

Isolated Nodes:
localhost (DAS)

If we had the situation where multicast doesn't work at all, the output could
be:

Isolated Nodes:
localhost (DAS), n1, n2, n3, n4

This information can currently be derived by running "asadmin validate-multicast"
on all of the nodes and then analyzing the output. The idea of this
command is to automate running the command on all the nodes and to
analyze the output for the user.
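
The grouping step described above amounts to computing connected components over the pairwise reachability reports. A minimal sketch using union-find (class and method names are hypothetical; the real command would first gather the reachability data over SSH):

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class MulticastGroups {

    // Follows parent pointers to the representative of a node's group.
    private static String find(Map<String, String> parent, String node) {
        while (!parent.get(node).equals(node)) {
            node = parent.get(node);
        }
        return node;
    }

    // Merges nodes into multicast "groups" from pairwise reachability
    // reports; singleton groups correspond to isolated nodes.
    public static Map<String, Set<String>> group(List<String> nodes,
                                                 List<String[]> reachablePairs) {
        Map<String, String> parent = new HashMap<>();
        for (String n : nodes) {
            parent.put(n, n);
        }
        for (String[] pair : reachablePairs) {
            // Union: attach the root of one endpoint to the other's root.
            parent.put(find(parent, pair[0]), find(parent, pair[1]));
        }
        Map<String, Set<String>> groups = new LinkedHashMap<>();
        for (String n : nodes) {
            groups.computeIfAbsent(find(parent, n), k -> new TreeSet<>()).add(n);
        }
        return groups;
    }
}
```

For the n1<->n2, n3<->n4 example above, this yields three groups: {n1, n2}, {n3, n4}, and the isolated DAS.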



 Comments   
Comment by Joe Fialli [ 25/Mar/11 ]

Recommend broadening this command to encompass validate-cluster.

In GlassFish 3.2, there will exist a mode to enable GMS without UDP multicast.
It would be helpful if this command could verify GMS discovery based on current cluster configuration,
independent of whether multicast is enabled or not.

Comment by Bobby Bissett [ 25/Apr/11 ]

The command should definitely use the cluster configuration information in domain.xml, such as the multicast address/port, or whatever is being used for non-multicast setups. Based on user feedback, the command should also give some warning about settings that are NOT specified in the config. For instance, if no network adapter is specified for a node, the tool should let the user know that none was specified when it runs.

Comment by Bobby Bissett [ 07/Dec/11 ]

Moving to Joe since I'm no longer on project.





[GLASSFISH-16420] New GMS configuration info on cluster and group-management-service element in domain.xml Created: 21/Apr/11  Updated: 17/Oct/12

Status: Open
Project: glassfish
Component/s: group_management_service
Affects Version/s: None
Fix Version/s: future release

Type: New Feature Priority: Major
Reporter: Bobby Bissett Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Tags: 3_1_2-exclude

 Description   

A new heartbeat failure detection implementation may need alternative configuration parameters (given a different algorithm; unknown at this point).
SSL configuration for GMS TCP.
GMS Member authentication.
(Impacts GMSAdapterImpl config processing, asadmin create-cluster subcommand parameters)



 Comments   
Comment by Bobby Bissett [ 07/Dec/11 ]

Moving to Joe since I'm no longer on project.





[GLASSFISH-16419] Virtual multicast optimization to send messages concurrently. Created: 21/Apr/11  Updated: 17/Oct/12

Status: Open
Project: glassfish
Component/s: group_management_service
Affects Version/s: None
Fix Version/s: future release

Type: New Feature Priority: Major
Reporter: Bobby Bissett Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Tags: 3_1_2-exclude

 Description   

The only reason to wait for completion of a send is to be notified of failed delivery.
With Grizzly 2.0 using async send, it should be easy to have a nowait mode for delivery.
Point-to-point messages could be sent synchronously, and unicast sends that are
part of a broadcast could be sent without waiting for the send to complete.
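
A sketch of the two send modes under discussion, using a plain ExecutorService and CompletableFuture in place of Grizzly 2.0's async write API (all names here are illustrative, not the Shoal transport classes):

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class MessageSender {

    final ExecutorService pool = Executors.newCachedThreadPool();
    final Set<String> delivered = ConcurrentHashMap.newKeySet();

    // Stand-in for the actual transport write.
    void deliver(String member, byte[] msg) {
        delivered.add(member);
    }

    CompletableFuture<Void> sendAsync(String member, byte[] msg) {
        return CompletableFuture.runAsync(() -> deliver(member, msg), pool);
    }

    // Point-to-point: block so the caller is notified of failed delivery.
    void sendSync(String member, byte[] msg) throws Exception {
        sendAsync(member, msg).get(5, TimeUnit.SECONDS);
    }

    // Broadcast: fire each unicast send without waiting for completion.
    void broadcast(List<String> members, byte[] msg) {
        for (String m : members) {
            sendAsync(m, msg);
        }
    }
}
```

The broadcast path returns immediately; callers that need delivery failures reported keep the synchronous point-to-point path.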






[GLASSFISH-16418] New Heartbeat Failure Detection implementation optimized for non-multicast and no DAS Created: 21/Apr/11  Updated: 17/Oct/12

Status: Open
Project: glassfish
Component/s: group_management_service
Affects Version/s: None
Fix Version/s: future release

Type: New Feature Priority: Major
Reporter: Bobby Bissett Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

Self-configuring cluster case. (Note: this item's priority should track the priority of the self-configuring cluster feature.)






[GLASSFISH-16416] heartbeats over UDP unicast Created: 21/Apr/11  Updated: 21/Oct/11

Status: Open
Project: glassfish
Component/s: group_management_service
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Major
Reporter: Bobby Bissett Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Tags: 3_1_2-exclude

 Description   

Administrator shall be able to configure heartbeats to be sent over UDP unicast transport when multicast is disabled.
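
A minimal sketch of such a unicast heartbeat send using a plain DatagramSocket (this illustrates the transport change only, not the Shoal heartbeat implementation):

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetSocketAddress;

public class UnicastHeartbeat {

    // Sends one heartbeat datagram directly to a specific member's
    // address instead of to the cluster's multicast group.
    public static void send(InetSocketAddress member, byte[] payload) throws Exception {
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(payload, payload.length,
                    member.getAddress(), member.getPort()));
        }
    }
}
```

In a non-multicast cluster, each member would loop over the known member list (e.g. from GMS_DISCOVERY_URI_LIST) and send one such datagram per member per heartbeat interval.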






[GLASSFISH-17458] in non-multicast mode, one failed to connect per cluster instance at startup Created: 22/Oct/11  Updated: 07/Mar/12

Status: Open
Project: glassfish
Component/s: group_management_service
Affects Version/s: 3.1.2_b05
Fix Version/s: not determined

Type: Bug Priority: Minor
Reporter: zorro Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux 2.6.18-164.0.0.0.1.el5



 Description   

Glassfish version 3.1.2 build 5

Many occurrences of the following exception are being seen in the server logs.

Expected: All exceptions to be handled

http://aras2.us.oracle.com:8080/logs/gf31/gms//set_10_20_11_t_11_56_02/scenario_0012_Thu_Oct_20_12_00_37_PDT_2011.html

11-10-20T18:56:57.888+0000|INFO|glassfish3.1.2|ShoalLogger.nomcast|_ThreadID=84;_ThreadName=Thread-2;|failed to send message to a virtual multicast endpoint[10.133.184.137:9090:230.30.1.1:9090:clusterz1:Unknown_10.133.184.137_9090] message=[MessageImpl[v1:MASTER_NODE_MESSAGE: NAD, Target: 10.133.184.137:9090:230.30.1.1:9090:clusterz1:Unknown_10.133.184.137_9090 , Source: 10.133.184.207:9090:230.30.1.1:9090:clusterz1:server, MQ, ]
java.io.IOException: failed to connect to 10.133.184.137:9090:230.30.1.1:9090:clusterz1:Unknown_10.133.184.137_9090
at com.sun.enterprise.mgmt.transport.grizzly.grizzly1_9.GrizzlyTCPConnectorWrapper.send(GrizzlyTCPConnectorWrapper.java:132)
at com.sun.enterprise.mgmt.transport.grizzly.grizzly1_9.GrizzlyTCPConnectorWrapper.doSend(GrizzlyTCPConnectorWrapper.java:96)
at com.sun.enterprise.mgmt.transport.AbstractMessageSender.send(AbstractMessageSender.java:74)
at com.sun.enterprise.mgmt.transport.VirtualMulticastSender.doBroadcast(VirtualMulticastSender.java:134)
at com.sun.enterprise.mgmt.transport.AbstractMulticastMessageSender.broadcast(AbstractMulticastMessageSender.java:70)
at com.sun.enterprise.mgmt.transport.grizzly.GrizzlyNetworkManager.broadcast(GrizzlyNetworkManager.java:295)
at com.sun.enterprise.mgmt.MasterNode.send(MasterNode.java:1338)
at com.sun.enterprise.mgmt.MasterNode.discoverMaster(MasterNode.java:382)
at com.sun.enterprise.mgmt.MasterNode.startMasterNodeDiscovery(MasterNode.java:1235)
at com.sun.enterprise.mgmt.MasterNode.run(MasterNode.java:1204)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
at com.sun.grizzly.TCPConnectorHandler.finishConnect(TCPConnectorHandler.java:297)
at com.sun.grizzly.connectioncache.client.CacheableConnectorHandler.finishConnect(CacheableConnectorHandler.java:230)
at com.sun.enterprise.mgmt.transport.grizzly.grizzly1_9.GrizzlyTCPConnectorWrapper$CloseControlCallbackHandler.onConnect(GrizzlyTCPConnectorWrapper.java:185)
at com.sun.grizzly.CallbackHandlerContextTask.doCall(CallbackHandlerContextTask.java:70)
at com.sun.grizzly.SelectionKeyContextTask.call(SelectionKeyContextTask.java:59)
at com.sun.grizzly.ContextTask.run(ContextTask.java:71)
at com.sun.grizzly.util.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:532)
at com.sun.grizzly.util.AbstractThreadPool$Worker.run(AbstractThreadPool.java:513)
... 1 more

#]


 Comments   
Comment by Joe Fialli [ 24/Oct/11 ]

This is only occurring in the GlassFish Shoal QE test when running with GMS_DISCOVERY_URI_LIST set to
a list of instances that have not been created (or started) yet and the DAS initially joins the cluster.

There is only one exception per cluster member listed in GMS_DISCOVERY_URI_LIST.
For the test case this is reported against, there are 9 instances, so there are nine connection
failures when the DAS joins the cluster initially and those instances have yet to be created and started.
When the DAS first joins the cluster and no instance has even been created yet,
the DISCOVERY_URI_LIST contains connection info for yet-to-be-created instances.

We will demote the failed connections during discovery from WARNING to FINE. This
will still enable us to debug network configuration issues (such as firewalls) without
the nuisance of always seeing one failure per cluster member referenced in GMS_DISCOVERY_URI_LIST.

Note: this issue does not apply to GMS_DISCOVERY_URI_LIST set to "generate" or to group discovery
via UDP multicast.

Comment by Tom Mueller [ 07/Mar/12 ]

Bulk update to set Fix Version to "not determined" for issues that had it set to a version that has already been released.





[GLASSFISH-16568] GMS can select incorrect network interface when a Virtual Machine created bridge n/w interface (virbr0) exists Created: 06/May/11  Updated: 14/Oct/11

Status: In Progress
Project: glassfish
Component/s: group_management_service
Affects Version/s: 3.1.1_b04
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: varunrupela Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux [FQDN removed] 2.6.18-164.el5 #1 SMP Thu Sep 3 04:15:13 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

A Virtual Machine has created a network interface virbr0 on each of the 3 machines.
java.net.NetworkInterface.getNetworkInterfaces() is returning virbr0 as the first interface.
This interface was not working for TCP point-to-point messaging in GMS.

Here is the network interface config from ifconfig -a.
The virbr0 configuration is the same on all 3 machines, so the IP address not being unique is a big problem:

eth0 Link encap:Ethernet HWaddr 00:16:36:FF:D5:C8
inet addr:10.12.153.53 Bcast:10.12.153.255 Mask:255.255.255.0
inet6 addr: fe80::216:36ff:feff:d5c8/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:163499623 errors:0 dropped:0 overruns:0 frame:0
TX packets:164695644 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:40318720962 (37.5 GiB) TX bytes:68091586600 (63.4 GiB)
Interrupt:66 Memory:fdff0000-fe000000

<deleted eth1 - eth3, none were UP>
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:276021187 errors:0 dropped:0 overruns:0 frame:0
TX packets:276021187 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:47722695747 (44.4 GiB) TX bytes:47722695747 (44.4 GiB)

sit0 Link encap:IPv6-in-IPv4
NOARP MTU:1480 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)

virbr0 Link encap:Ethernet HWaddr 00:00:00:00:00:00
inet addr:192.168.122.1 Bcast:192.168.122.255 Mask:255.255.255.0
inet6 addr: fe80::200:ff:fe00:0/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:137 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:40467 (39.5 KiB)


Attachments: Zip Archive fine-shoal-logs.zip     Zip Archive logs.zip    
Issue Links:
Dependency
blocks GLASSFISH-15425 [STRESS][umbrella] 24x7 RichAccess ru... Open
Related
is related to GLASSFISH-16570 [regression w.r.t 3.1] Classloading i... Resolved
is related to GLASSFISH-16631 resource bundle resolution failing in... Resolved
Tags: 3_1-next, 3_1_1-scrubbed

 Description   

Please see the parent bug http://java.net/jira/browse/GLASSFISH-15425 for scenario details.

On running the RichAccess Big App test, the instance logs are observed to be filled with Grizzly and Shoal logger messages of the following type:

******
[#|2011-05-06T11:26:53.439+0530|SEVERE|glassfish3.1|com.sun.grizzly.config.GrizzlyServiceListener|_ThreadID=29;_ThreadName=Thread-1;|Connection refused|#]

[#|2011-05-06T11:26:53.445+0530|SEVERE|glassfish3.1|com.sun.grizzly.config.GrizzlyServiceListener|_ThreadID=30;_ThreadName=Thread-1;|Connection refused|#]

[#|2011-05-06T11:26:53.447+0530|WARNING|glassfish3.1|ShoalLogger|_ThreadID=31;_ThreadName=Thread-1;|Error during groupHandle.sendMessage(instance103, /richAccess; size=30672|#]

*******

  • No HTTP failures were observed on the client.
  • 2 sets of logs are attached: one with the Shoal logger set to FINE and one without. Unzip and look under "logs/st-cluster" for the instance logs.
  • This issue appears with both the Sun JDK and the JRockit JDK.


 Comments   
Comment by varunrupela [ 12/May/11 ]

Marked the issue as blocking. It's hard to analyze the logs and extract information useful for debugging the run.

Comment by Joe Fialli [ 13/May/11 ]

Perhaps there is a firewall configuration preventing connections. GMS is using UDP multicast to find all instances and communicate
GMS notifications. All of that is working just fine.
However, I have not seen any TCP connections succeed.
The GMS send messages that are failing are over TCP, and they are HA replication sends.

Below is a pure Grizzly connection failure that does not have anything to do with GMS. GMS names all of its threads with "gms" in them (even the thread pool given to Grizzly to run GMS handlers uses "gms" in the thread name).

There are 1574 of the following failures.

server.log:[#|2011-05-06T11:28:11.435+0530|SEVERE|glassfish3.1|com.sun.grizzly.config.GrizzlyServiceListener|
_ThreadID=40;_ThreadName=Thread-1;|Connection refused|#]

Such a failure looks similar to what would happen if a firewall were blocking ports. It is suspicious that the instances
are all running on the same machine and the connections are failing. Thus, a firewall is probably blocking inter-machine TCP communication.

*************

It was not stated, but I have observed that the 3 instances and the DAS are all running on one machine.
I have not observed it yet, but if there is not sufficient memory on the machine, the instances
could start running out of memory. In my past experience with RichAccess, not all instances were run on one machine.

[#|2011-05-06T11:20:37.552+0530|INFO|glassfish3.1|ShoalLogger|_ThreadID=16;_ThreadName=Thread-1;|GMS1092: GMS View Change Received for group: st-cluster : Members in view for JOINED_AND_READY_EVENT(before change analysis) are :
1: MemberId: instance101, MemberType: CORE, Address: 192.168.122.1:9186:228.9.30.160:9176:st-cluster:instance101
2: MemberId: instance102, MemberType: CORE, Address: 192.168.122.1:9163:228.9.30.160:9176:st-cluster:instance102
3: MemberId: instance103, MemberType: CORE, Address: 192.168.122.1:9091:228.9.30.160:9176:st-cluster:instance103
4: MemberId: server, MemberType: SPECTATOR, Address: 192.168.122.1:9114:228.9.30.160:9176:st-cluster:server

#]

More analysis to come. Just wanted to pass this along.

Comment by Joe Fialli [ 13/May/11 ]

The TCP ports that GMS needs to not be blocked by a firewall are between 9090 and 9200.
For the above run, the ports used were randomly selected from that range and were
9186, 9163, 9091 and 9114. The next run will have different ports, so unblocking
the TCP port range from 9090 to 9200 is necessary.

Comment by Joe Fialli [ 13/May/11 ]

There would be no log message printed out for this GMS send-message failure, since the logging is set to FINE.
However, the lookup of the Logger is failing. (Something different in the JRockit environment is causing this.)

The following stack trace occurs while trying to get a resource bundle for a java.util.logging.Logger.
The call is java.util.logging.Logger.getLogger("ShoalLogger.send", "com.sun.enterprise.ee.cms.logging.LogStrings");

The resource bundle in question is com.sun.enterprise.ee.cms.logging.LogStrings.properties.

The error message incorrectly states that it is looking for a class called com.sun.enterprise.ee.cms.logging.LogStrings.
No such class exists. We need assistance from someone with class loading/resource bundle knowledge to find out why this is going
wrong in the JRockit environment.

[#|2011-05-06T11:26:53.369+0530|WARNING|glassfish3.1|javax.enterprise.system.core.classloading.com.sun.enterprise.loader|_ThreadID=28;_ThreadName=Thread-1;|LDR5207: ASURLClassLoader EarLibClassLoader :
doneCalled = true
doneSnapshot = ASURLClassLoader.done() called ON EarLibClassLoader :
urlSet = []
doneCalled = false
Parent -> org.glassfish.internal.api.DelegatingClassLoader@392aa3fb

AT Fri May 06 11:26:28 IST 2011
BY :java.lang.Throwable: printStackTraceToString
at com.sun.enterprise.util.Print.printStackTraceToString(Print.java:639)
at com.sun.enterprise.loader.ASURLClassLoader.done(ASURLClassLoader.java:211)
at com.sun.enterprise.loader.ASURLClassLoader.preDestroy(ASURLClassLoader.java:179)
at org.glassfish.javaee.full.deployment.EarClassLoader.preDestroy(EarClassLoader.java:114)
at org.glassfish.internal.data.ApplicationInfo.unload(ApplicationInfo.java:358)
at com.sun.enterprise.v3.server.ApplicationLifecycle.unload(ApplicationLifecycle.java:999)
at com.sun.enterprise.v3.server.ApplicationLifecycle.disable(ApplicationLifecycle.java:1970)
at com.sun.enterprise.v3.server.ApplicationConfigListener.disableApplication(ApplicationConfigListener.java:278)
at com.sun.enterprise.v3.server.ApplicationConfigListener.handleOtherAppConfigChanges(ApplicationConfigListener.java:198)
at com.sun.enterprise.v3.server.ApplicationConfigListener.transactionCommited(ApplicationConfigListener.java:146)
at org.jvnet.hk2.config.Transactions$TransactionListenerJob.process(Transactions.java:344)
at org.jvnet.hk2.config.Transactions$TransactionListenerJob.process(Transactions.java:335)
at org.jvnet.hk2.config.Transactions$ListenerNotifier$1.call(Transactions.java:211)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at org.jvnet.hk2.config.Transactions$Notifier$1$1.run(Transactions.java:165)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Parent -> org.glassfish.internal.api.DelegatingClassLoader@392aa3fb
was requested to find class com.sun.enterprise.ee.cms.logging.LogStrings after done was invoked from the following stack trace
java.lang.Throwable
at com.sun.enterprise.loader.ASURLClassLoader.findClassData(ASURLClassLoader.java:780)
at com.sun.enterprise.loader.ASURLClassLoader.findClass(ASURLClassLoader.java:696)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at java.lang.ClassLoader.loadClass(ClassLoader.java:296)
at java.lang.ClassLoader.loadClass(ClassLoader.java:296)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
at java.util.ResourceBundle$Control.newBundle(ResourceBundle.java:2289)
at java.util.ResourceBundle.loadBundle(ResourceBundle.java:1364)
at java.util.ResourceBundle.findBundle(ResourceBundle.java:1328)
at java.util.ResourceBundle.findBundle(ResourceBundle.java:1282)
at java.util.ResourceBundle.findBundle(ResourceBundle.java:1282)
at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1224)
at java.util.ResourceBundle.getBundle(ResourceBundle.java:952)
at java.util.logging.Logger.findResourceBundle(Logger.java:1280)
at java.util.logging.Logger.setupResourceInfo(Logger.java:1335)
at java.util.logging.Logger.getLogger(Logger.java:335)
at com.sun.enterprise.ee.cms.logging.GMSLogDomain.getSendLogger(GMSLogDomain.java:87)
at com.sun.enterprise.ee.cms.impl.base.GroupCommunicationProviderImpl.logSendMessageException(GroupCommunicationProviderImpl.java:395)
at com.sun.enterprise.ee.cms.impl.base.GroupCommunicationProviderImpl.sendMessage(GroupCommunicationProviderImpl.java:366)
at com.sun.enterprise.ee.cms.impl.base.GroupHandleImpl.sendMessage(GroupHandleImpl.java:142)
at org.shoal.ha.group.gms.GroupServiceProvider.sendMessage(GroupServiceProvider.java:257)
at org.shoal.ha.cache.impl.interceptor.TransmitInterceptor.onTransmit(TransmitInterceptor.java:83)
at org.shoal.ha.cache.api.AbstractCommandInterceptor.onTransmit(AbstractCommandInterceptor.java:98)
:doneCalled = true
doneSnapshot = ASURLClassLoader.done() called ON EarLibClassLoader :
urlSet = []
doneCalled = false
Parent -> org.glassfish.internal.api.DelegatingClassLoader@392aa3fb

Comment by Joe Fialli [ 13/May/11 ]

The submitted FINE server logging did not have the ShoalLogger at FINE level;
it only had FINE logging for org.shoal.ha.

To enable GMS ShoalLogger output, one needs to set the ShoalLogger to FINE level.
(Shoal GMS does not use org.shoal.gms* for its logger name; it still uses ShoalLogger.)

Since the connection refused is in Grizzly, it might be of more use to
set Grizzly logging to FINE, if it turns out that there is no firewall
blocking GMS TCP communications on ports between 9090 and 9200 (the default
GMS port range; users can override these defaults if necessary).

[#|2011-05-06T12:48:40.087+0530|SEVERE|glassfish3.1|com.sun.grizzly.config.GrizzlyServiceListener|_ThreadID=51;_ThreadName=Thread-1;|Connection refused|#]

So enabling com.sun.grizzly logging at FINE may help find out why the connection was refused.
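
For illustration, enabling a logger at FINE programmatically might look like the following (in a running GlassFish domain this would normally be done through the logging configuration, e.g. the asadmin set-log-levels subcommand, rather than in code):

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

public class ShoalLogConfig {

    // Turns on FINE output for the named logger and attaches a console
    // handler so FINE records are actually emitted (handlers default to
    // INFO and would otherwise drop them).
    public static Logger enableFine(String name) {
        Logger logger = Logger.getLogger(name);
        logger.setLevel(Level.FINE);
        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(Level.FINE);
        logger.addHandler(handler);
        return logger;
    }
}
```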

Comment by varunrupela [ 16/May/11 ]

Clarification regarding the setup:

  • Multiple network interfaces are enabled on all 3 machines in this setup, and GMS on each seems to bind to the virtual network interface 192.168.122.1.
Comment by Joe Fialli [ 17/May/11 ]

Please follow the documentation to configure GMS to bind to a specific network interface.

http://download.oracle.com/docs/cd/E18930_01/html/821-2426/gjfnl.html#gjdlw

Also, recommend running "asadmin validate-multicast -bindaddress X.X.X.X" on all three machines
to double-check that UDP multicast traffic is working properly on whatever subnet you select.

Comment by Joe Fialli [ 17/May/11 ]

Removed "blocking" and "regression" from the subject line and changed the subject line to match what the issue was discovered to be.
This was not a regression; the same issue would exist in 3.1 as in 3.1.1, and no changes were made in 3.1.1 that caused
this. A change in the configured environment caused this issue to surface.

A simple workaround is to disable or bring down the virbr0 network interface that was not being used.

The following error messages were being repeated many times in server.log file.

[#|2011-05-06T11:26:53.445+0530|SEVERE|glassfish3.1|com.sun.grizzly.config.GrizzlyServiceListener|_ThreadID=30;_ThreadName=Thread-1;|Connection refused|#]

[#|2011-05-06T11:26:53.447+0530|WARNING|glassfish3.1|ShoalLogger|_ThreadID=31;_ThreadName=Thread-1;|Error during groupHandle.sendMessage(instance103, /richAccess; size=30672|#]

We will add the destination IP address to the GMS log event message above to assist in diagnosing this problem in the future.

java.net.NetworkInterface.getNetworkInterfaces() was returning the virbr0 network interface as the first interface,
and that resulted in this issue. To resolve it, GMS will now default initially to the
network interface associated with InetAddress.getLocalHost(), as long as that network interface is
multicast-enabled, UP, and not a loopback address. This default would have avoided the reported issue.

When there are multiple network interfaces on a machine and the default is not the one desired for GMS,
follow the documentation below to configure GMS to use a specific network interface on each machine.

http://download.oracle.com/docs/cd/E18930_01/html/821-2426/gjfnl.html#gjdlw
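The selection rule described above can be sketched with plain JDK APIs; this is an illustration of the stated checks (UP, multicast-enabled, not loopback, preferring the getLocalHost() interface), not the actual Shoal implementation, and the class and method names here are invented:

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.UnknownHostException;
import java.util.Collections;

public class GmsInterfaceSelection {
    // The three checks described above: the interface must be UP,
    // multicast-enabled, and not a loopback interface.
    static boolean isUsable(NetworkInterface nif) throws Exception {
        return nif.isUp() && nif.supportsMulticast() && !nif.isLoopback();
    }

    static InetAddress pickDefault() throws Exception {
        // First preference: the interface that InetAddress.getLocalHost()
        // resolves to, provided it passes the checks above.
        try {
            InetAddress local = InetAddress.getLocalHost();
            NetworkInterface nif = NetworkInterface.getByInetAddress(local);
            if (nif != null && isUsable(nif) && !local.isLoopbackAddress()) {
                return local;
            }
        } catch (UnknownHostException e) {
            // fall through to plain enumeration
        }
        // Fallback: the older behavior, i.e. the first enumerated interface
        // that passes the same checks (this is what picked virbr0).
        for (NetworkInterface nif :
                 Collections.list(NetworkInterface.getNetworkInterfaces())) {
            if (isUsable(nif)) {
                for (InetAddress addr : Collections.list(nif.getInetAddresses())) {
                    if (!addr.isLoopbackAddress()) {
                        return addr;
                    }
                }
            }
        }
        return null; // no suitable interface found
    }

    public static void main(String[] args) throws Exception {
        System.out.println("GMS would default to: " + pickDefault());
    }
}
```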

Comment by Joe Fialli [ 17/May/11 ]

Also, add a configuration message showing the localPeerID and the GMS system advertisement being sent to other machines to dynamically form the GMS group (GlassFish cluster). This configuration message will show the IP address at which GMS is telling other members of the cluster to contact it.

Comment by Joe Fialli [ 13/Jun/11 ]

Was unable to identify the non-functional virtual network interface using any of the java.net.NetworkInterface
methods. Recommend postponing a fix for this issue in the 3.1.1 time frame, since changing the algorithm
for selecting the first network address could potentially introduce a regression for a previously
working network configuration. There is no way to correct this issue without changing
how the first network address is selected.

A workaround did exist for this issue: simply disable the virtual network interface that is not
being used.

Comment by Joe Fialli [ 14/Oct/11 ]

Lowered the priority to minor since there is a workaround. Additionally, this problem
only occurs when a virtual network interface created by VirtualBox exists but
is not being used; simply disabling the unused virbr0 network interface fixed
the problem. At this time, my recommendation is to document the issue and its workaround in the release notes.





[GLASSFISH-17016] Inconsistency between validate-multicast and GMS picking interface for binding Created: 12/Jul/11  Updated: 07/Dec/11

Status: Open
Project: glassfish
Component/s: group_management_service
Affects Version/s: 3.1
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: arungupta Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

In Ubuntu 11.04, with eth0 disabled and no wireless connectivity ifconfig reports:

arun@ArunUbuntu:~/tools/glassfish-web$ ifconfig
eth0 Link encap:Ethernet HWaddr 00:26:b9:f1:15:19
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:5698 errors:0 dropped:0 overruns:0 frame:0
TX packets:4575 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:4717594 (4.7 MB) TX bytes:1129576 (1.1 MB)
Interrupt:20 Memory:f6900000-f6920000

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:192770 errors:0 dropped:0 overruns:0 frame:0
TX packets:192770 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:197208585 (197.2 MB) TX bytes:197208585 (197.2 MB)

Explicitly enabled MULTICAST on lo as:

sudo ifconfig lo multicast

and then got:

arun@ArunUbuntu:~/tools/glassfish-web$ ifconfig
eth0 Link encap:Ethernet HWaddr 00:26:b9:f1:15:19
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:5698 errors:0 dropped:0 overruns:0 frame:0
TX packets:4575 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:4717594 (4.7 MB) TX bytes:1129576 (1.1 MB)
Interrupt:20 Memory:f6900000-f6920000

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MULTICAST MTU:16436 Metric:1
RX packets:192914 errors:0 dropped:0 overruns:0 frame:0
TX packets:192914 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:197220833 (197.2 MB) TX bytes:197220833 (197.2 MB)

Explicitly added route as:

sudo route add -net 224.0.0.0 netmask 240.0.0.0 dev lo

and then saw:

arun@ArunUbuntu:~/tools/glassfish-web$ route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
10.151.0.0 0.0.0.0 255.255.224.0 U 2 0 0 wlan0
169.254.0.0 0.0.0.0 255.255.0.0 U 1000 0 0 wlan0
224.0.0.0 0.0.0.0 240.0.0.0 U 0 0 0 lo
0.0.0.0 10.151.0.1 0.0.0.0 UG 0 0 0 wlan0

Running validate-multicast command in two separate shells show:

arun@ArunUbuntu:~/tools/glassfish-web$ ./glassfish3/bin/asadmin validate-multicast
Will use port 2048
Will use address 228.9.3.1
Will use bind interface null
Will use wait period 2,000 (in milliseconds)

Listening for data...
Sending message with content "ArunUbuntu" every 2,000 milliseconds
Received data from ArunUbuntu (loopback)
Received data from ArunUbuntu
Exiting after 20 seconds. To change this timeout, use the --timeout command line option.
Command validate-multicast executed successfully.

Creating a cluster with 2 instances and starting it shows the following log message:

Caused by: com.sun.enterprise.ee.cms.core.GMSException: initialization failure
at com.sun.enterprise.mgmt.ClusterManager.<init>(ClusterManager.java:142)
at com.sun.enterprise.ee.cms.impl.base.GroupCommunicationProviderImpl.initializeGroupCommunicationProvider(GroupCom
municationProviderImpl.java:164)
at com.sun.enterprise.ee.cms.impl.base.GMSContextImpl.join(GMSContextImpl.java:176)
... 22 more
Caused by: java.io.IOException: can not find a first InetAddress
at com.sun.enterprise.mgmt.transport.grizzly.GrizzlyNetworkManager.start(GrizzlyNetworkManager.java:376)
at com.sun.enterprise.mgmt.ClusterManager.<init>(ClusterManager.java:140)
... 24 more

Even though validate-multicast is working, the instances are not able to join the cluster.

Here is what Joe mentioned in an email thread:

– cut here –
Validate-multicast is not using NetworkUtility.getFirstInetAddress(false).
validate-multicast is not specifying any IP address by default when creating the multicast socket.
Just to remind you, validate-multicast is only creating a MulticastSocket and only communicating
over UDP. While the getFirstInetAddress(false) is being used to compute the IP address that
another instance can communicate via TCP to an instance. That is totally different.
We are trying to use same IP address for both TCP and UDP in GMS. We need to revisit
this logic. We will need to remove the check for multicast enabled in selecting network interface
now since we are working on supporting non-multicast mode.
– cut here –
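The distinction in the email can be illustrated with a small JDK-only sketch (class and method names invented; InetAddress.getLoopbackAddress() stands in for the result of NetworkUtility.getFirstInetAddress(false)):

```java
import java.net.InetAddress;
import java.net.MulticastSocket;
import java.net.ServerSocket;

public class BindContrast {
    // validate-multicast style: a UDP MulticastSocket created with no
    // explicit bind address; it binds to the wildcard address, so it
    // succeeds no matter which interface GMS's TCP side would have chosen.
    static String bindUdpWildcard() throws Exception {
        try (MulticastSocket udp = new MulticastSocket(0)) {
            return udp.getLocalSocketAddress().toString();
        }
    }

    // GMS style: a TCP listener explicitly bound to one computed address.
    // If that computation picks a wrong or loopback address, other members
    // cannot connect even though validate-multicast passed.
    static String bindTcpToAddress(InetAddress addr) throws Exception {
        try (ServerSocket tcp = new ServerSocket(0, 50, addr)) {
            return tcp.getLocalSocketAddress().toString();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("UDP (wildcard): " + bindUdpWildcard());
        System.out.println("TCP (pinned):   "
                + bindTcpToAddress(InetAddress.getLoopbackAddress()));
    }
}
```

This is why the two checks can disagree: the UDP probe exercises multicast delivery, while cluster formation also depends on the TCP bind address being reachable by peers.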

Explicitly setting the GMS_BIND_INTERFACE_ADDRESS-c1 property to "127.0.0.1" on each instance and the DAS, and then restarting the DAS and the cluster, ensures the instances can join the cluster.
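A sketch of that workaround with asadmin (the HA admin guide documents the property with hyphens, GMS-BIND-INTERFACE-ADDRESS-<cluster-name>; the instance name instance1 below is hypothetical):

```shell
# Set the GMS bind address for cluster c1 on the DAS...
asadmin create-system-properties --target server GMS-BIND-INTERFACE-ADDRESS-c1=127.0.0.1

# ...and on each clustered instance (repeat per instance)
asadmin create-system-properties --target instance1 GMS-BIND-INTERFACE-ADDRESS-c1=127.0.0.1

# Restart the DAS and the cluster so the new binding takes effect
asadmin restart-domain
asadmin stop-cluster c1
asadmin start-cluster c1
```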



 Comments   
Comment by Bobby Bissett [ 20/Oct/11 ]

Assigning to me.

Comment by Bobby Bissett [ 07/Dec/11 ]

Moving to Joe (hi) since I'm not on the GF project any more. The work for this is mostly done, and Joe knows what change to make in the mcast sender thread so it mirrors what GMS proper is doing.





[GLASSFISH-17798] get-health always say instance as not started Created: 22/Nov/11  Updated: 17/Oct/12

Status: In Progress
Project: glassfish
Component/s: group_management_service
Affects Version/s: 4.0
Fix Version/s: not determined

Type: Bug Priority: Minor
Reporter: Anissa Lam Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File clustersetup.sh     Text File server.log    
Tags: 3_1_2-exclude, 3_1_x-exclude

 Description   

This is on the latest workspace, rev #51051, on the 3.1.2 branch.
Tried several times, and always reproducible.
I created a cluster (clusterABC) with 4 instances, all using the localhost-domain1 node.
I can start the instances, but get-health always says they are not started.

Here is the copy&paste of my commands. I will attach server.log as well.

~/Awork/V3/3.1.2/3.1.2 1)  cd $AS3/bin
~/Awork/V3/3.1.2/3.1.2/dist-gf/glassfish/bin 2)  asadmin list-clusters
clusterABC not running
Command list-clusters executed successfully.
~/Awork/V3/3.1.2/3.1.2/dist-gf/glassfish/bin 3)  asadmin list-instances --long
NAME   HOST       PORT   PID  CLUSTER     STATE         
ABC-4  localhost  24848  --   clusterABC   not running  
ABC-3  localhost  24849  --   clusterABC   not running  
ABC-2  localhost  24850  --   clusterABC   not running  
ABC-1  localhost  24851  --   clusterABC   not running  
Command list-instances executed successfully.
~/Awork/V3/3.1.2/3.1.2/dist-gf/glassfish/bin 4)  asadmin start-instance ABC-1
Waiting for ABC-1 to start ..........
Successfully started the instance: ABC-1
instance Location: /Users/anilam/Awork/V3/3.1.2/3.1.2/dist-gf/glassfish/nodes/localhost-domain1/ABC-1
Log File: /Users/anilam/Awork/V3/3.1.2/3.1.2/dist-gf/glassfish/nodes/localhost-domain1/ABC-1/logs/server.log
Admin Port: 24851
Command start-local-instance executed successfully.
The instance, ABC-1, was started on host localhost
Command start-instance executed successfully.
~/Awork/V3/3.1.2/3.1.2/dist-gf/glassfish/bin 5)  asadmin start-instance ABC-2
Waiting for ABC-2 to start ..........
Successfully started the instance: ABC-2
instance Location: /Users/anilam/Awork/V3/3.1.2/3.1.2/dist-gf/glassfish/nodes/localhost-domain1/ABC-2
Log File: /Users/anilam/Awork/V3/3.1.2/3.1.2/dist-gf/glassfish/nodes/localhost-domain1/ABC-2/logs/server.log
Admin Port: 24850
Command start-local-instance executed successfully.
The instance, ABC-2, was started on host localhost
Command start-instance executed successfully.
~/Awork/V3/3.1.2/3.1.2/dist-gf/glassfish/bin 6)  asadmin list-instances --long
NAME   HOST       PORT   PID    CLUSTER     STATE         
ABC-4  localhost  24848  --     clusterABC   not running  
ABC-3  localhost  24849  --     clusterABC   not running  
ABC-2  localhost  24850  12517  clusterABC   running      
ABC-1  localhost  24851  12507  clusterABC   running      
Command list-instances executed successfully.
~/Awork/V3/3.1.2/3.1.2/dist-gf/glassfish/bin 7)  asadmin get-health clusterABC
ABC-1 not started
ABC-2 not started
ABC-3 not started
ABC-4 not started
Command get-health executed successfully.
~/Awork/V3/3.1.2/3.1.2/dist-gf/glassfish/bin 8)  asadmin start-cluster clusterABC
Command start-cluster executed successfully.
~/Awork/V3/3.1.2/3.1.2/dist-gf/glassfish/bin 9)  asadmin list-instances --long
NAME   HOST       PORT   PID    CLUSTER     STATE     
ABC-4  localhost  24848  12540  clusterABC   running  
ABC-3  localhost  24849  12541  clusterABC   running  
ABC-2  localhost  24850  12517  clusterABC   running  
ABC-1  localhost  24851  12507  clusterABC   running  
Command list-instances executed successfully.
~/Awork/V3/3.1.2/3.1.2/dist-gf/glassfish/bin 10)  asadmin get-health clusterABC
ABC-1 not started
ABC-2 not started
ABC-3 not started
ABC-4 not started
Command get-health executed successfully.
~/Awork/V3/3.1.2/3.1.2/dist-gf/glassfish/bin 11)  



 Comments   
Comment by Joe Fialli [ 22/Nov/11 ]

Unable to recreate the reported issue with build 51075.
Attached a shell script called clustersetup.sh to standardize how the cluster and instances are created
(GF_HOME must be configured to point to a valid 3.1.2 installation).
My results from running the script are counter to the reported issue.

$GF_HOME/bin/asadmin list-instances
instance01 running
instance02 running
instance03 running
Command list-instances executed successfully.
$GF_HOME/bin/asadmin get-health myCluster
instance01 started since Tue Nov 22 11:38:25 EST 2011
instance02 started since Tue Nov 22 11:38:25 EST 2011
instance03 started since Tue Nov 22 11:38:25 EST 2011
Command get-health executed successfully.

***********
Analysis:

There is no evidence in the submitted DAS server.log that multicast is working.
Is it possible that this was attempted while connected to VPN?
VPN will interfere with multicast.

Please submit the output of "ifconfig -a" and also follow the HA admin guide instructions for validating
that multicast is working properly on your system.
http://download.oracle.com/docs/cd/E18930_01/html/821-2426/gjfnl.html#gklhd
The instructions assume two different machines, but you can check whether multicast is working between processes
on the same machine by opening two terminal windows on that machine.
Note that multicast does not work when one is connected via VPN
(it disables multicast as a protection mechanism).

Specifying a bindinterfaceaddress of 127.0.0.1 allows one to work with clusters on one machine while
connected via VPN.

Comment by Joe Fialli [ 22/Nov/11 ]

The attached shell script creates a domain, a cluster, and 3 instances for the cluster, and starts
up the cluster. It validates that the cluster started using "asadmin get-health" and "asadmin list-instances".
The user must edit the script variable GF_HOME to point to a valid GF v3.1.2 installation.

Comment by Anissa Lam [ 22/Nov/11 ]

Yes, I saw the issue when I was working from home and using VPN.
So, is it a known issue that get-health will NOT provide the correct state of an instance when on VPN?

I think that since there is no way to fix the code when one is on VPN, then even though it cannot give the exact state like 'FAILED' or 'STOPPED' with a timestamp, it should at least report the correct status. It shouldn't just say 'not started'; it should at least report whether the instance is running. Can the code detect that multicast is not working and, like list-instances, find out the status of the instance and return that?

The console displays whatever get-health returns, and telling the user that the instance is 'not running' when it actually is running doesn't sound acceptable, especially when the status from list-instances, which says 'RUNNING', is displayed on the same screen and the next line says 'not running', giving conflicting information.

Comment by Joe Fialli [ 22/Nov/11 ]

get-health reports the status of GMS.
GMS in multicast mode (the default) only works when multicast is working.

Please see Bobby's blog; you are misinterpreting the results.
asadmin get-health only works correctly when GMS is working correctly
(asadmin get-health is a GMS client, and it can only work as well as the GMS subsystem is working).

http://blogs.oracle.com/bobby/entry/validating_multicast_transport_where_d

Comment by Anissa Lam [ 22/Nov/11 ]

As a user, when i am seeing the following:

~/Awork/V3/3.1.2/3.1.2/dist-gf/glassfish/bin 8) asadmin start-cluster clusterABC
Command start-cluster executed successfully.

~/Awork/V3/3.1.2/3.1.2/dist-gf/glassfish/bin 9) asadmin list-instances --long
NAME HOST PORT PID CLUSTER STATE
ABC-4 localhost 24848 12540 clusterABC running
ABC-3 localhost 24849 12541 clusterABC running
ABC-2 localhost 24850 12517 clusterABC running
ABC-1 localhost 24851 12507 clusterABC running
Command list-instances executed successfully.

~/Awork/V3/3.1.2/3.1.2/dist-gf/glassfish/bin 10) asadmin get-health clusterABC
ABC-1 not started
ABC-2 not started
ABC-3 not started
ABC-4 not started
Command get-health executed successfully.
~/Awork/V3/3.1.2/3.1.2/dist-gf/glassfish/bin 11)

I can only say that 'get-health' is giving me the wrong information. The server instance is obviously running, why 'get-health' says it is not started ?
If there is any issue that prevents "get-health" to give the correct information, then it should return an error informing the user what the problem is. Giving the wrong info and says executed successfully is not acceptable.

Comment by Joe Fialli [ 22/Nov/11 ]

Reduced the priority from critical to minor.

My recommendation is to change "not started" to "unknown".
The asadmin get-health command tells the state of the cluster
from the GMS point of view. If multicast is not working properly
and the cluster is not forming properly, that is what the command should relay.

Comment by Bobby Bissett [ 23/Nov/11 ]

"I can only say that 'get-health' is giving me the wrong information. The server instance is obviously running, why 'get-health' says it is not started ?
If there is any issue that prevents "get-health" to give the correct information, then it should return an error informing the user what the problem is. Giving the wrong info and says executed successfully is not acceptable."

That's the way it is. The whole POINT of get-health is to tell you the state of the cluster. If the instances are up, but can't communicate, then there's a serious problem and the only way the user will know it is by running get-health and seeing the wrong result. This is all documented.

In the admin console, you can say whatever you want. The enum name is "NOT_RUNNING" but you can say whatever you want.

Comment by Bobby Bissett [ 23/Nov/11 ]

When the admin console gets the output from the get health command, it's getting the enum name from this enumeration:

// NOT_RUNNING means there is no time information associated
public static enum STATE {
    NOT_RUNNING (strings.getString("state.not_running")),
    RUNNING     (strings.getString("state.running")),
    REJOINED    (strings.getString("state.rejoined")),
    FAILURE     (strings.getString("state.failure")),
    SHUTDOWN    (strings.getString("state.shutdown"));

    private final String stringVal;

    STATE(String stringVal) {
        this.stringVal = stringVal;
    }

    @Override
    public String toString() {
        return stringVal;
    }
}

There is no point in changing the name of the state in the enum; it's separate from the i18n'ed value that is presented to the user. So when the admin console sees that state, it can output anything you want. Are you using the LocalStrings.properties file in the gms-bootstrap module to get the actual text to use? If so, we can change that to say "not joined" instead. Otherwise, this issue doesn't really affect GMS since you can use whatever text you want.

Just wanted to check to see if you're using our props file or your own for the text the user sees.

Comment by Anissa Lam [ 23/Nov/11 ]

I get it now.
I feel it would be very nice if the user could perform validate-multicast from the console.
Would it be possible to make validate-multicast a remote command so that the console can call it? Or is that too much to ask for 3.1.2?
Thanks Joe and Bobby for helping me understand this.

Comment by Bobby Bissett [ 23/Nov/11 ]

Nope, validate-multicast has to be a local-only command because it needs to be run on each machine that will host an instance. In fact, it's better if the server is not up when the command is run. If you're bored, you can watch a screencast with the details:

http://www.youtube.com/watch?v=sJTDao9OpWA

There is an RFE for a tool that's more centralized, which I think fits what you're looking for. It won't happen for 3.1.2, but it's possible it could happen later: GLASSFISH-13056

Comment by Joe Fialli [ 23/Nov/11 ]

It is too big a change for the 3.1.2 release to alter the output of asadmin get-health,
which is documented in the asadmin get-health --help documentation.

Recommend considering fixing this in a major release.

We could add a release note in 3.1.2 stating that the "asadmin get-health" "not started" status applies
both when the instance is not running and when the instance is running but the current configuration
is not allowing GMS communications (multicast may not be enabled properly, or
non-multicast GMS mode may be misconfigured).

Comment by Joe Fialli [ 23/Nov/11 ]

Exclude changing asadmin get-health output in a minor release.





[GLASSFISH-18047] specifying a network interface name for gms-bind-interface-address does not work correctly on Linux or Windows Created: 19/Dec/11  Updated: 04/Jan/12

Status: Open
Project: glassfish
Component/s: group_management_service
Affects Version/s: 3.1.2_b14
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Joe Fialli Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Initially discovered on Linux 2.6.18-164.0.0.0.1.el5 #1 SMP Thu Sep 3 00:21:28 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
Confirmed also to occur on Windows XP
Did not occur on dual stack Mac OS X 10.6.8 or IPv4 only Solaris 5.10.


Issue Links:
Dependency
blocks GLASSFISH-18024 virtual network interfaces introduced... Resolved
Tags: 3_1_2-exclude

 Description   

Specifying the network interface "eth0" on Linux does not work correctly (the same failure was confirmed on Windows).

Marked this issue as minor since the documentation does not state that it is valid to specify a network interface
name for gms-bind-interface-address. This capability was added to assist in network configuration setups
where some machines are multihomed and we were not consistently selecting the appropriate network interface on all machines
in the cluster. Specifying the network interface for the cluster bypasses the automated selection of the first network address.

InetAddress.getByName() is returning "eth0/127.0.0.1" as the binding address.
The loopback interface is not appropriate for GMS inter-machine communications (it is only
appropriate when all instances are on one machine, which is only used for development).

com.sun.enterprise.mgmt.transport.NetworkUtility confirms that this issue exists.

%java -classpath shoal-gms-impl.jar com.sun.enterprise.mgmt.transport.NetworkUtility

Display name: eth0
Name: eth0
PreferIPv6Addresses: false
InetAddress: /fe80:0:0:0:223:8bff:fe64:7a56%7
InetAddress: /10.133.184.160
Up? true
Loopback? false
PointToPoint? false
Supports multicast? true
Virtual? false
Hardware address: [0, 35, -117, 100, 122, 86]
MTU: 1500
Network Inet Address (preferIPV6=false) /10.133.184.160
Network Inet Address (preferIPV6=true) /fe80:0:0:0:223:8bff:fe64:7a56%7
resolveBindInterfaceName(eth0)=127.0.0.1 /* this value should be 10.133.184.160 */

This issue did not occur on Mac or Solaris.



 Comments   
Comment by Joe Fialli [ 19/Dec/11 ]

A fix is completed for this issue.

Here are network utility results with fix.

**************************************************
Display name: eth0
Name: eth0
PreferIPv6Addresses: false
InetAddress: /fe80:0:0:0:223:8bff:fe64:7ac4%2
InetAddress: /10.133.184.158
Up? true
Loopback? false
PointToPoint? false
Supports multicast? true
Virtual? false
Hardware address: [0, 35, -117, 100, 122, -60]
MTU: 1500
Network Inet Address (preferIPV6=false) /10.133.184.158
Network Inet Address (preferIPV6=true) /fe80:0:0:0:223:8bff:fe64:7ac4%2
Dec 19, 2011 8:22:23 AM com.sun.enterprise.mgmt.transport.NetworkUtility resolveBindInterfaceName
INFO: Inet4Address.getByName(eth0) returned a local address eth0/127.0.0.1 so ignoring it
Dec 19, 2011 8:22:23 AM com.sun.enterprise.mgmt.transport.NetworkUtility resolveBindInterfaceName
INFO: Inet6Address.getByName(eth0) returned a local address eth0/127.0.0.1 so ignoring it
resolveBindInterfaceName(eth0)=10.133.184.158

The INFO message confirming the fix will be deleted before put back.

Comment by Joe Fialli [ 19/Dec/11 ]

The blocked issue required specifying gms-bind-interface-address
as a network interface because some machines in the cluster had XEN virtualization software
creating virtual network interfaces that interfered with the automated selection
of an IP address to represent a machine.

Comment by Joe Fialli [ 04/Jan/12 ]

Did not feel comfortable including this fix at this late stage of 3.1.2.
This functionality is not explicitly documented, and this method was suggested as
an easier configuration alternative to what is documented.

Here is link to documented way to specify which network interface on
a multi-home machine to use for GMS.
Link: http://docs.oracle.com/cd/E18930_01/html/821-2426/gjfnl.html#gjdlw





[GLASSFISH-17195] GMS fails to initialize due to GMSException: can not find a first InetAddress Created: 16/Aug/11  Updated: 18/Aug/11

Status: Open
Project: glassfish
Component/s: group_management_service
Affects Version/s: 3.1.1
Fix Version/s: None

Type: Bug Priority: Trivial
Reporter: arungupta Assignee: Joe Fialli
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows 7, JDK 7, Wireless Network Interface, GlassFish 3.1.1


Attachments: File ipconfig.out     File ListNetsEx.out     Zip Archive log_2011-08-18_07-41-35.zip    

 Description   

Created a 2-instance cluster using GlassFish 3.1.1 on Windows 7/JDK7 and starting the cluster/instances gives the following error:

[#|2011-08-16T10:59:17.116-0700|CONFIG|glassfish3.1.1|ShoalLogger|_ThreadID=1;_ThreadName=Thread-2;|
GrizzlyNetworkManager Configuration
BIND_INTERFACE_ADDRESS:null NetworkInterfaceName:null
TCPSTARTPORT..TCPENDPORT:9090..9200
MULTICAST_ADDRESS:MULTICAST_PORT:228.9.143.78:9635 MULTICAST_PACKET_SIZE:65536 MULTICAST_TIME_TO_LIV
E: default
FAILURE_DETECT_TCP_RETRANSMIT_TIMEOUT(ms):10000
ThreadPool CORE_POOLSIZE:20 MAX_POOLSIZE:50 POOL_QUEUE_SIZE:4096 KEEP_ALIVE_TIME(ms):60000
HIGH_WATER_MARK:1024 NUMBER_TO_RECLAIM:10 MAX_PARALLEL:15
START_TIMEOUT(ms):15000 WRITE_TIMEOUT(ms):10000
MAX_WRITE_SELECTOR_POOL_SIZE:30
VIRTUAL_MULTICAST_URI_LIST:null

#]

[#|2011-08-16T10:59:17.157-0700|INFO|glassfish3.1.1|grizzly|_ThreadID=20;_ThreadName=Thread-2;|GRIZZ
LY0001: Starting Grizzly Framework 1.9.36 - 8/16/11 10:59 AM|#]

[#|2011-08-16T10:59:17.184-0700|CONFIG|glassfish3.1.1|ShoalLogger|_ThreadID=1;_ThreadName=Thread-2;|
Grizzly controller listening on /0:0:0:0:0:0:0:0:9179. Controller started in 37 ms|#]

[#|2011-08-16T10:59:17.422-0700|SEVERE|glassfish3.1.1|javax.org.glassfish.gms.org.glassfish.gms|_Thr
eadID=1;_ThreadName=Thread-2;|GMSAD1017: GMS failed to start. See stack trace for additional informa
tion.
com.sun.enterprise.ee.cms.core.GMSException: failed to join group c1
at com.sun.enterprise.ee.cms.impl.base.GMSContextImpl.join(GMSContextImpl.java:181)
at com.sun.enterprise.ee.cms.impl.common.GroupManagementServiceImpl.join(GroupManagementServ
iceImpl.java:382)
at org.glassfish.gms.GMSAdapterImpl.initializeGMS(GMSAdapterImpl.java:576)
at org.glassfish.gms.GMSAdapterImpl.initialize(GMSAdapterImpl.java:199)
at org.glassfish.gms.bootstrap.GMSAdapterService.loadModule(GMSAdapterService.java:218)
at org.glassfish.gms.bootstrap.GMSAdapterService.checkCluster(GMSAdapterService.java:192)
at org.glassfish.gms.bootstrap.GMSAdapterService.postConstruct(GMSAdapterService.java:136)
at com.sun.hk2.component.AbstractCreatorImpl.inject(AbstractCreatorImpl.java:131)
at com.sun.hk2.component.ConstructorCreator.initialize(ConstructorCreator.java:91)
at com.sun.hk2.component.AbstractCreatorImpl.get(AbstractCreatorImpl.java:82)
at com.sun.hk2.component.SingletonInhabitant.get(SingletonInhabitant.java:67)
at com.sun.hk2.component.EventPublishingInhabitant.get(EventPublishingInhabitant.java:139)
at com.sun.hk2.component.AbstractInhabitantImpl.get(AbstractInhabitantImpl.java:76)
at com.sun.enterprise.v3.server.AppServerStartup.run(AppServerStartup.java:253)
at com.sun.enterprise.v3.server.AppServerStartup.doStart(AppServerStartup.java:145)
at com.sun.enterprise.v3.server.AppServerStartup.start(AppServerStartup.java:136)
at com.sun.enterprise.glassfish.bootstrap.GlassFishImpl.start(GlassFishImpl.java:79)
at com.sun.enterprise.glassfish.bootstrap.GlassFishDecorator.start(GlassFishDecorator.java:6
3)
at com.sun.enterprise.glassfish.bootstrap.osgi.OSGiGlassFishImpl.start(OSGiGlassFishImpl.jav
a:69)
at com.sun.enterprise.glassfish.bootstrap.GlassFishMain$Launcher.launch(GlassFishMain.java:1
17)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at com.sun.enterprise.glassfish.bootstrap.GlassFishMain.main(GlassFishMain.java:97)
at com.sun.enterprise.glassfish.bootstrap.ASMain.main(ASMain.java:55)
Caused by: com.sun.enterprise.ee.cms.core.GMSException: initialization failure
at com.sun.enterprise.mgmt.ClusterManager.<init>(ClusterManager.java:142)
at com.sun.enterprise.ee.cms.impl.base.GroupCommunicationProviderImpl.initializeGroupCommuni
cationProvider(GroupCommunicationProviderImpl.java:164)
at com.sun.enterprise.ee.cms.impl.base.GMSContextImpl.join(GMSContextImpl.java:175)
... 25 more
Caused by: java.io.IOException: can not find a first InetAddress
at com.sun.enterprise.mgmt.transport.grizzly.GrizzlyNetworkManager.start(GrizzlyNetworkManag
er.java:376)
at com.sun.enterprise.mgmt.ClusterManager.<init>(ClusterManager.java:140)
... 27 more

#]

Here are the wireless network settings:

D:\tools\glassfish\3.1.1\ose-glassfish3-full>ipconfig /all

Windows IP Configuration

Host Name . . . . . . . . . . . . : ARUNGUP-LAP
Primary Dns Suffix . . . . . . . : st-users.us.oracle.com
Node Type . . . . . . . . . . . . : Hybrid
IP Routing Enabled. . . . . . . . : No
WINS Proxy Enabled. . . . . . . . : No
DNS Suffix Search List. . . . . . : st-users.us.oracle.com
us.oracle.com

Ethernet adapter Bluetooth Network Connection:

Media State . . . . . . . . . . . : Media disconnected
Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Bluetooth Device (Personal Area Network)
Physical Address. . . . . . . . . : 70-F1-A1-9B-D6-3C
DHCP Enabled. . . . . . . . . . . : Yes
Autoconfiguration Enabled . . . . : Yes

Wireless LAN adapter Wireless Network Connection:

Connection-specific DNS Suffix . : us.oracle.com
Description . . . . . . . . . . . : Intel(R) Centrino(R) Advanced-N 6200 AGN
Physical Address. . . . . . . . . : 00-27-10-17-FB-9C
DHCP Enabled. . . . . . . . . . . : Yes
Autoconfiguration Enabled . . . . : Yes
IPv4 Address. . . . . . . . . . . : 10.151.1.82(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.224.0
Lease Obtained. . . . . . . . . . : Tuesday, August 16, 2011 9:18:51 AM
Lease Expires . . . . . . . . . . : Tuesday, August 16, 2011 2:35:41 PM
Default Gateway . . . . . . . . . : 10.151.0.1
DHCP Server . . . . . . . . . . . : 10.196.255.250
DNS Servers . . . . . . . . . . . : 148.87.1.22
148.87.112.101
NetBIOS over Tcpip. . . . . . . . : Enabled

Ethernet adapter Local Area Connection:

Media State . . . . . . . . . . . : Media disconnected
Connection-specific DNS Suffix . : us.oracle.com
Description . . . . . . . . . . . : Intel(R) 82577LM Gigabit Network Connection
Physical Address. . . . . . . . . : 00-26-B9-F1-15-19
DHCP Enabled. . . . . . . . . . . : Yes
Autoconfiguration Enabled . . . . : Yes

validate-multicast with/without --bindaddress passes.

The cluster could be successfully started with the wired network.

Tried the same steps on home wired/wireless network and got the same results.



 Comments   
Comment by Joe Fialli [ 17/Aug/11 ]

There is insufficient information to evaluate this issue. The submitted ipconfig output from Windows does not show whether multicast is enabled for the network interface.

GMS will not automatically select a network interface when NetworkInterface.supportsMulticast() does not
return true.

Please submit the output of running following command to confirm whether NetworkInterface.supportsMulticast()
is returning true for the wireless network interface.

cd to glassfish installation directory and run the following command:

$ java -classpath glassfish3/glassfish/modules/shoal-gms-impl.jar com.sun.enterprise.mgmt.transport.NetworkUtility

Here is the output from my mac from running this.

AllLocalAddresses() = [/fe80:0:0:0:223:32ff:fe97:5cf7%5, /10.152.23.224, /fe80:0:0:0:0:0:0:1%1]
getFirstNetworkInterface() = name:en0 (en0)
getFirstInetAddress( true ) = /fe80:0:0:0:223:32ff:fe97:5cf7%5
getFirstInetAddress( false ) = /10.152.23.224
getFirstNetworkInteface() = name:en0 (en0)
getFirstInetAddress(firstNetworkInteface, true) = /fe80:0:0:0:223:32ff:fe97:5cf7%5
getFirstInetAddress(firstNetworkInteface, false) = /10.152.23.224

The issue is that automatic selection of the network interface is failing; the command above is the unit test for this case.
Did you try the workaround of explicitly setting BIND_INTERFACE_ADDRESS?

Comment by Joe Fialli [ 17/Aug/11 ]

Lowered priority since explicitly setting BIND_INTERFACE_ADDRESS will work around the issue.
(See http://download.oracle.com/docs/cd/E18930_01/html/821-2426/gjfnl.html#gjdlw
for details on how to configure this property.)
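As a rough illustration of the workaround (the cluster name "c1" and the address 10.0.1.11 are placeholders; see the linked documentation for the authoritative procedure), the bind address can be supplied as a system property on the cluster via asadmin:

```shell
# Hypothetical example: bind GMS for cluster "c1" to interface address 10.0.1.11.
# Adjust the cluster name and IP address to match your environment.
asadmin create-system-properties --target c1 GMS-BIND-INTERFACE-ADDRESS-c1=10.0.1.11
```

This is configuration only; the property name embeds the cluster name, so each cluster gets its own GMS-BIND-INTERFACE-ADDRESS-&lt;cluster-name&gt; entry.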

Additionally, while inconvenient that automatic selection of a network interface is not working properly,
more info is needed to verify network interface configuration since the submitted info does
not contain MULTICAST enabled info. The request for additional info in comments section will resolve
the shortage of information and allow us to determine whether this is a truely a blocking issue
for wireless networks on windows 7 using jdk 7.

Comment by Joe Fialli [ 17/Aug/11 ]

Awaiting confirmation from the reporter whether this issue is due to the Windows firewall and requires
network configuration by the user to enable network communication between processes.

Specifically, create an inbound rule in Windows Firewall that allows all connections from all
other members of the cluster.
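For example (a sketch only; the rule name and subnet are placeholders to adapt to the cluster's actual addresses), such an inbound rule can be created from an elevated Windows command prompt with netsh:

```shell
rem Hypothetical example: allow inbound traffic from other cluster members
rem on the 10.0.1.0/24 subnet. Adjust the rule name and remoteip for your network.
netsh advfirewall firewall add rule name="GlassFish cluster members" dir=in action=allow remoteip=10.0.1.0/24
```

This is a configuration fragment for Windows Firewall, not something GlassFish runs itself.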

Comment by arungupta [ 17/Aug/11 ]

The output from the command is:

D:\tools\glassfish\3.1.1\ose-glassfish3-full>java -classpath glassfish3\glassfish\modules\shoal-gms-impl.jar com.sun.enterprise.mgmt.transport.NetworkUtility
AllLocalAddresses() = [/127.0.0.1, /0:0:0:0:0:0:0:1]
getFirstNetworkInterface() = name:lo (Software Loopback Interface 1)
getFirstInetAddress( true ) = null
getFirstInetAddress( false ) = null
getFirstNetworkInteface() = name:lo (Software Loopback Interface 1)
getFirstInetAddress(firstNetworkInteface, true) = null
getFirstInetAddress(firstNetworkInteface, false) = null

Explicitly setting GMS-BIND-INTERFACE-ADDRESS-c1 as a system property is the workaround.

Will the firewall rules be required even if the DAS and instances are all on the local machine?

Comment by Joe Fialli [ 18/Aug/11 ]

Attempted to recreate the reported issue to determine whether this is a general issue that all configurations
of GlassFish 3.1.1, Windows 7, and JDK 7 would hit when running over a wireless network.

Downloaded JDK 7 and GlassFish 3.1.1 to an HP Windows 7 Professional laptop with only a wireless network connection.
(This laptop was running Norton Security rather than the McAfee Security used in this report.)

Was able to create a GlassFish cluster, and the instances were able to see each other.
Key log info:

Aug 18, 2011 7:36:01 AM com.sun.enterprise.admin.launcher.GFLauncherLogger info
INFO: JVM invocation command line:
C:\Program Files\Java\jdk1.7.0\bin\java.exe

#|2011-08-18T07:37:21.524-0400|INFO|glassfish3.1.1|ShoalLogger|_ThreadID=18;_ThreadName=Thread-2;|GMS1092: GMS View Change Received for group: mycluster : Members in view for JOINED_AND_READY_EVENT(before change analysis) are :
1: MemberId: instance01, MemberType: CORE, Address: 10.0.1.11:9145:228.9.29.50:30647:mycluster:instance01
2: MemberId: instance02, MemberType: CORE, Address: 10.0.1.11:9122:228.9.29.50:30647:mycluster:instance02
3: MemberId: server, MemberType: SPECTATOR, Address: 10.0.1.11:9116:228.9.29.50:30647:mycluster:server

#]

Additionally, ran both ipconfig and the ListNetsEx program, which uses the same java.net.NetworkInterface methods
GMS uses to locate the first inet address. Attaching the complete output; here is the key output from those two commands.

Wireless LAN adapter Wireless Network Connection:

Connection-specific DNS Suffix . : hsd1.ma.comcast.net.
Description . . . . . . . . . . . : Atheros AR9285 802.11b/g/n WiFi Adapter
Physical Address. . . . . . . . . : C4-17-FE-2C-6C-51
DHCP Enabled. . . . . . . . . . . : Yes
Autoconfiguration Enabled . . . . : Yes
Link-local IPv6 Address . . . . . : fe80::4839:4e5d:143c:ea11%11(Preferred)
IPv4 Address. . . . . . . . . . . : 10.0.1.11(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Lease Obtained. . . . . . . . . . : Monday, August 15, 2011 9:41:36 AM
Lease Expires . . . . . . . . . . : Monday, August 22, 2011 6:30:28 AM
Default Gateway . . . . . . . . . : 10.0.1.1
DHCP Server . . . . . . . . . . . : 10.0.1.1
DHCPv6 IAID . . . . . . . . . . . : 314841086
DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-12-EB-25-F6-00-26-9E-EC-3C-66
DNS Servers . . . . . . . . . . . : 10.0.1.1
NetBIOS over Tcpip. . . . . . . . : Enabled

ListNetsEx output:
Display name: Atheros AR9285 802.11b/g/n WiFi Adapter
Name: net3
InetAddress: /10.0.1.11
InetAddress: /fe80:0:0:0:4839:4e5d:143c:ea11%11
Up? true
Loopback? false
PointToPoint? false
Supports multicast? true
Virtual? false
Hardware address: [-60, 23, -2, 44, 108, 81]
MTU: 1500

Note that, unlike the submitted case, the network interface "up" attribute is returning true, which allows
the inet address to be detected automatically.
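The attributes GMS consults during automatic interface selection (isUp(), isLoopback(), supportsMulticast()) can be listed with a short standalone program along these lines. This is a sketch using only standard java.net APIs; it is not the actual ListNetsEx source, and the class name is illustrative:

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.util.Collections;

public class ListNets {
    public static void main(String[] args) throws Exception {
        // Enumerate every interface and print the attributes that GMS
        // auto-selection depends on; on the failing machine, the wireless
        // adapter would show "Up? false" here.
        for (NetworkInterface nif : Collections.list(NetworkInterface.getNetworkInterfaces())) {
            System.out.println("Display name: " + nif.getDisplayName());
            System.out.println("Name: " + nif.getName());
            for (InetAddress addr : Collections.list(nif.getInetAddresses())) {
                System.out.println("  InetAddress: " + addr);
            }
            System.out.println("  Up? " + nif.isUp());
            System.out.println("  Loopback? " + nif.isLoopback());
            System.out.println("  Supports multicast? " + nif.supportsMulticast());
        }
    }
}
```

An interface is only a candidate for automatic selection when both isUp() and supportsMulticast() report true, which matches the NetworkUtility behavior described above.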

Something about the submitter's configuration is causing java.net.NetworkInterface.isUp()
to incorrectly return false for the wireless adapter. We cannot be sure what it is, since we were unable to recreate the
failure. Downgrading this issue to trivial since there is a workaround and we were not able to recreate
the failure with the provided configuration information.





Generated at Mon Apr 27 06:25:17 UTC 2015 using JIRA 6.2.3#6260-sha1:63ef1d6dac3f4f4d7db4c1effd405ba38ccdc558.