glassfish / GLASSFISH-13530

GMS_LISTENER port conflict when start-cluster of multiple instances on 1 machine

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.1
    • Fix Version/s: V3
    • Labels:
      None
    • Environment:

      Operating System: All
      Platform: Sun

      Description

      I have a 3-instance cluster; all instances (instance101, instance102,
      instance103) are on the same box.

      One of the instances (instance102) is not detected by GMS.

      All logs are attached.

      1. server.log.das
        70 kB
        gopaljorapur
      2. server.log.instance101
        39 kB
        gopaljorapur
      3. server.log.instance102
        30 kB
        gopaljorapur
      4. server.log.instance103
        38 kB
        gopaljorapur

        Activity

        gopaljorapur added a comment -

        Created an attachment (id=4915)
        Server log of DAS

        gopaljorapur added a comment -

        Created an attachment (id=4916)
        Instance log

        gopaljorapur added a comment -

        Created an attachment (id=4917)
        instance log

        gopaljorapur added a comment -

        Created an attachment (id=4918)
        i

        Bobby Bissett added a comment -

        Looking at the instance logs, it looks like you're using the same TCP port range
        used by Grizzly for all instances. Sometimes this will work, and other times
        (as in this case) it won't. Instances 1 and 2 both tried to use port 9091 for
        communication, so instance 2 is blocked from communicating with GMS. This is
        only a problem when you run more than one instance on the same machine.
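        To make the collision concrete, here is a toy shell model of each instance
        taking the first free port in the default Shoal GMS range of 9090 to 9120. It
        is an illustration under assumptions, not Grizzly's or Shoal's actual code; the
        function name first_free_port is hypothetical. Started serially, each instance
        sees the previous instance's port as taken. Started concurrently, they all see
        the same snapshot and pick the same port.

```shell
#!/bin/sh
# Toy model only (not Grizzly/Shoal source): each instance takes the first
# port in the default Shoal GMS range that it believes is free.
GMS_RANGE_START=9090
GMS_RANGE_END=9120

# first_free_port BOUND: print the first port in the range that is not in
# BOUND, a space-separated list of ports already in use.
first_free_port() {
    p=$GMS_RANGE_START
    while [ "$p" -le "$GMS_RANGE_END" ]; do
        case " $1 " in
            *" $p "*) p=$((p + 1)) ;;  # taken, try the next port
            *) echo "$p"; return 0 ;;
        esac
    done
    return 1  # range exhausted, cf. "No free port within range"
}

# Serial start: each instance binds its port before the next one looks.
bound="9090"  # the DAS commonly holds 9090
for inst in instance1 instance2 instance3; do
    port=$(first_free_port "$bound")
    bound="$bound $port"
    echo "serial:     $inst -> $port"
done

# Concurrent start: all instances look at the same snapshot before any binds.
snapshot="9090"
for inst in instance1 instance2 instance3; do
    echo "concurrent: $inst -> $(first_free_port "$snapshot")"
done
```

        The serial loop assigns 9091, 9092 and 9093. The concurrent loop assigns 9091
        three times, which matches the contention for port 9091 seen in these logs.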

        You could recreate the instances and specify the port ranges to make sure there
        are no conflicts. But Joe has made changes in GMS so that you don't have to, so
        you might as well try with the new GMS jars rather than change the way you're
        setting up your system. I'm trying to promote a new version of Shoal now – if I
        can, I'll send a link to the new Shoal bits in Maven. Currently I can't get
        Hudson to respond to me. If this keeps up, I'll just attach a temporary
        shoal-gms-impl jar to the bug report (but I'd rather point to a more official one).

        One very simple workaround in the meantime is to start each instance
        individually rather than using asadmin start-cluster. That should avoid the
        concurrent port-grabbing issue.
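        Here is a minimal sketch of that workaround as a script. It assumes each
        "asadmin start-instance" call returns only once the instance is running. The
        function name start_instances_serially and the ASADMIN override (useful for a
        dry run with ASADMIN=echo) are hypothetical helpers, not part of asadmin.

```shell
#!/bin/sh
# Hypothetical helper: start clustered instances one at a time instead of
# "asadmin start-cluster", so no two instances scan for a GMS port at once.
# ASADMIN defaults to $GF_HOME/bin/asadmin; set ASADMIN=echo for a dry run.
start_instances_serially() {
    asadmin_cmd="${ASADMIN:-$GF_HOME/bin/asadmin}"
    for inst in "$@"; do
        # Assumes start-instance returns only after the instance is up;
        # stop at the first instance that fails to start.
        "$asadmin_cmd" start-instance "$inst" || return 1
    done
}
```

        For this report that would be: start_instances_serially instance101 instance102 instance103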

        Joe Fialli added a comment -

        Looking at the submitted logs, I confirmed that GMS_LISTENER_PORT-clustername is
        not being set on each clustered server instance. Thus, the DAS and all clustered
        instances are using the default Shoal gms port range of 9090 to 9120.

        This issue only started appearing in GlassFish v3.1 because "asadmin
        start-cluster" now starts all clustered instances concurrently instead of
        serially. The issue is already fixed in the shoal-gms workspace, so once the
        next shoal-gms integration occurs, the workaround described below will no
        longer be needed. Until that integration happens, the following workarounds
        will keep you from being blocked.

        The simplest workaround is to avoid "asadmin start-cluster" and instead use
        "asadmin start-instance" to start the clustered instances serially rather than
        concurrently.

        A workaround that still lets you use "asadmin start-cluster": with the current
        implementation, when running multiple clustered instances on one machine, you
        must set GMS_LISTENER_PORT-<clustername> on each instance. Here is a script
        showing the most convenient way to do this.

        $GF_HOME/bin/asadmin create-domain --nopassword=true mydomain
        $GF_HOME/bin/asadmin start-domain mydomain
        $GF_HOME/bin/asadmin create-cluster myCluster

        # Need to set a unique GMS_LISTENER_PORT when running multiple instances on
        # the same machine. When instances are all started at once, there was a bug
        # in shoal gms where many instances try to use the same first port in the
        # default range. Commonly the DAS uses the default port 9090, and the failure
        # to start is over contention for port 9091.
        # No need to set GMS_LISTENER_PORT when running one instance on each machine
        # (this includes the DAS running on its own machine).
        $GF_HOME/bin/asadmin create-instance --node localhost --cluster myCluster \
          --systemproperties "GMS_LISTENER_PORT-myCluster=9491" instance1
        $GF_HOME/bin/asadmin create-instance --node localhost --cluster myCluster \
          --systemproperties "GMS_LISTENER_PORT-myCluster=9492" instance2
        $GF_HOME/bin/asadmin create-instance --node localhost --cluster myCluster \
          --systemproperties "GMS_LISTENER_PORT-myCluster=9493" instance3
        $GF_HOME/bin/asadmin start-cluster myCluster
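        If the number of instances varies, the per-instance ports can be assigned in a
        loop instead of being hard-coded. This sketch reuses the create-instance options
        from the script above and gives each instance the next port after the previous
        one. The function name create_instances_with_gms_ports and the ASADMIN override
        are hypothetical helpers.

```shell
#!/bin/sh
# Hypothetical helper: create instances on this machine, each with its own
# GMS_LISTENER_PORT-<cluster> value, starting at FIRST_PORT and counting up.
# Usage: create_instances_with_gms_ports CLUSTER FIRST_PORT INSTANCE...
# ASADMIN defaults to $GF_HOME/bin/asadmin; set ASADMIN=echo for a dry run.
create_instances_with_gms_ports() {
    cluster="$1"
    port="$2"
    shift 2
    asadmin_cmd="${ASADMIN:-$GF_HOME/bin/asadmin}"
    for inst in "$@"; do
        "$asadmin_cmd" create-instance --node localhost --cluster "$cluster" \
            --systemproperties "GMS_LISTENER_PORT-$cluster=$port" "$inst" || return 1
        port=$((port + 1))  # next instance gets the next port
    done
}
```

        Calling create_instances_with_gms_ports myCluster 9491 instance1 instance2 instance3
        issues the same three create-instance commands as the script, with ports 9491,
        9492 and 9493.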
        Joe Fialli added a comment -

        Removed the blocking status since multiple workarounds are available
        that let one proceed.

        As soon as the next shoal-gms integration occurs, this will be marked fixed.

        We have already performed extensive testing with the latest shoal-gms jar
        and confirmed that one will no longer need to set GMS_LISTENER_PORT-clustername
        when running multiple instances on one machine.

        Joe Fialli added a comment -

        Altered the subject to describe what is occurring.

        GMS did not detect an instance because the instance failed to start, logging
        this SEVERE message:

        [#|2010-09-17T14:03:52.579-0700|SEVERE|glassfish3.1|ShoalLogger|_ThreadID=15;_ThreadName=Thread-1;|Exception during starting the controller
        java.net.BindException: No free port within range: 9091=com.sun.grizzly.ReusableTCPSelectorHandler@40a47f
        at com.sun.grizzly.TCPSelectorHandler.initSelector(TCPSelectorHandler.java:430)
        at com.sun.grizzly.TCPSelectorHandler.preSelect(TCPSelectorHandler.java:376)
        at com.sun.grizzly.SelectorHandlerRunner.doSelect(SelectorHandlerRunner.java:186)
        at com.sun.grizzly.SelectorHandlerRunner.run(SelectorHandlerRunner.java:130)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
        #]

        [#|2010-09-17T14:03:52.581-0700|WARNING|glassfish3.1|javax.org.glassfish.gms.org.glassfish.gms|_ThreadID=15;_ThreadName=Thread-1;|GMSAD1008: GMSException occurred : failed to join group st-cluster|#]
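        To find which instance hit this failure when several share a machine, one can
        search each instance's server.log for the two messages above. This is a small
        sketch. The patterns are copied from this excerpt, and the function name
        gms_join_failures is hypothetical. The log file paths are passed in because
        their exact locations are not given in this report.

```shell
#!/bin/sh
# Hypothetical helper: print each given log file that contains the Grizzly
# bind failure or the GMS join failure quoted in this report.
gms_join_failures() {
    for f in "$@"; do
        if grep -q -e 'No free port within range' -e 'failed to join group' "$f"; then
            echo "$f"
        fi
    done
}
```

        For the attachments on this issue, that would be something like
        gms_join_failures server.log.instance101 server.log.instance102 server.log.instance103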

        Joe Fialli added a comment -

        Confirmed that shoal-gms with the fix for this issue is integrated in
        b21.


          People

          • Assignee:
            Joe Fialli
            Reporter:
            gopaljorapur
          • Votes:
            0
            Watchers:
            1

            Dates

            • Created:
              Updated:
              Resolved: