GLASSFISH-17116

list-instances lets asadmin timeout when an instance is hung

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.1
    • Fix Version/s: 3.1.2_b15, 4.0_b15
    • Component/s: admin
    • Labels:
      None

      Description

      When an instance is hung, the list-instances command hangs too, until asadmin finally times out after 600 seconds.

      The reason for this is that list-instances (via InstanceState) uses InstanceCommandExecutor to run the __locations command on the instance. This class runs the command without any timeout.

      To fix this bug, the connection to the instance should time out after some reasonable interval (shorter than the time asadmin waits).

      This issue is being raised due to hang problems that have been experienced with AIX testing. With these hangs, it is possible to initiate a TCP connection to the process, but the connection attempt just hangs; it isn't processed and it isn't refused. To simulate this, set a breakpoint in the __locations command of the instance and see what list-instances does.

      The desirable output from list-instances in this situation is that the state of the instance would be reported as "non-responsive" or "hung".
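      The behaviour asked for above can be sketched as a bounded wait on a Future. This is a minimal, self-contained illustration and not the GlassFish code: the class name, the probe method, the state strings, and the Callable standing in for the remote __locations call are all assumptions made for the example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; none of these names come from the GlassFish sources.
final class InstanceProbeSketch {

    // Assumed state labels; the issue only suggests "non-responsive" or "hung".
    static final String RUNNING = "running";
    static final String NOT_RESPONDING = "non-responsive";
    static final String NOT_RUNNING = "not running";

    /** Runs the call on a worker thread and waits at most timeoutInMsec for it. */
    static String probe(Callable<String> remoteCall, long timeoutInMsec)
            throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<String> future = pool.submit(remoteCall);
        try {
            future.get(timeoutInMsec, TimeUnit.MILLISECONDS);
            return RUNNING;
        } catch (TimeoutException e) {
            // The instance accepted the work but never answered in time.
            future.cancel(true);
            return NOT_RESPONDING;
        } catch (ExecutionException e) {
            // The call itself failed, for example because the connection was refused.
            return NOT_RUNNING;
        } finally {
            // Interrupt the worker so a stuck call cannot keep the JVM alive.
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        Callable<String> healthy = () -> "ok";
        // Stand-in for a hung instance: it takes far longer than the timeout.
        Callable<String> hung = () -> { Thread.sleep(10_000); return "ok"; };

        System.out.println(probe(healthy, 500)); // running
        System.out.println(probe(hung, 500));    // non-responsive
    }
}
```

      The design point is that the caller decides how long to wait, so one hung instance costs at most the chosen timeout rather than the full asadmin timeout.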

          Activity

          Byron Nevins added a comment -

          Since it is so trivial, and instructive, here's the fix:

          BEFORE:
          InstanceCommandResult r = future.get(timeoutInMsec, TimeUnit.SECONDS);

          AFTER:
          InstanceCommandResult r = future.get(timeoutInMsec, TimeUnit.MILLISECONDS);
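
          To see why the unit matters, the sketch below converts the same number under both units. It assumes a value of 2000 for timeoutInMsec (a 2 second timeout is mentioned later in this thread); the class and method names are made up for illustration.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; the value 2000 is an assumption, not taken from the commit.
final class TimeoutUnitSketch {

    /** Returns how long a wait of (value, unit) really lasts, in milliseconds. */
    static long effectiveWaitMillis(long value, TimeUnit unit) {
        return unit.toMillis(value);
    }

    public static void main(String[] args) {
        long timeoutInMsec = 2000; // intended meaning: 2 seconds

        // BEFORE: the millisecond value is read as seconds.
        System.out.println(effectiveWaitMillis(timeoutInMsec, TimeUnit.SECONDS));      // 2000000

        // AFTER: the value is read in the unit its name promises.
        System.out.println(effectiveWaitMillis(timeoutInMsec, TimeUnit.MILLISECONDS)); // 2000
    }
}
```

          Under that assumption the old call would wait roughly 33 minutes, longer than the 600 seconds asadmin itself waits, so in practice it behaved like having no timeout at all.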

          Byron Nevins added a comment -

          Here is the checkin to 4.0

          Waiting for approval for 3.1.2

          d:\gf\branches\3.1.2\cluster>svn commit D:/gf/trunk/main/nucleus/cluster/common/src/main/java/com/sun/enterprise/util/cluste
          Sending D:\gf\trunk\main\nucleus\cluster\common\src\main\java\com\sun\enterprise\util\cluster\InstanceInfo.java
          Transmitting file data .
          Committed revision 51569.

          Byron Nevins added a comment -

          Checked into 3.1.2 branch:

          d:\gf\branches\3.1.2\cluster>svn commit common\src\main\java\com\sun\enterprise\util\cluster\InstanceInfo.java
          Sending common\src\main\java\com\sun\enterprise\util\cluster\InstanceInfo.java
          Transmitting file data .
          Committed revision 51596.

          Byron Nevins added a comment -

          Changing the timeout from 2 seconds to 60 seconds as requested by Tom Mueller.

          Code Review: Tom

          Byron Nevins added a comment -

          Now it is a 60 second timeout.
          Note that if there is a Zombie server – you'll have to wait the full 60 seconds for the command to complete.

          Sending D:\gf\branches\3.1.2\cluster\admin\src\main\java\com\sun\enterprise\v3\admin\cluster\ListInstancesCommand.java
          Sending D:\gf\trunk\main\nucleus\cluster\admin\src\main\java\com\sun\enterprise\v3\admin\cluster\ListInstancesCommand.java
          Transmitting file data ..
          Committed revision 51791.


            People

            • Assignee: Byron Nevins
            • Reporter: Tom Mueller
            • Votes: 0
            • Watchers: 1
