glassfish / GLASSFISH-16331

stop/start/restart domain local commands are yet not perfect

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1.1_dev, 4.0
    • Component/s: admin
    • Labels:
      None

      Description

      On (my) Windows QuickLook fails 100% of the time with one failure [1].

      Here is what is happening:

      (0) There is an empty (zero-byte) local-password file in the domain's config directory and the domain is definitely running.
      (1) restart-domain is a subclass of StopDomain. StopDomain.executeCommand() sees that there is no valid local-password file.
      (2) By DEFINITION, (1) means that the domain is NOT running.
      (3) So now asadmin thinks that there is no DAS running, so it tries the improvement that was added: start the domain.
      (4) (3) is done inside RestartDomain's dasNotRunning() method. It calls start-domain.
      (5) start-domain detects that its admin port is in use and fails because of that.

      Result: it is impossible to restart the running DAS without a forcible OS-level kill. It cannot be restarted locally, and it cannot be stopped locally the normal way.

      =============================

      Question: What happened to the local-password file? Unknown. I didn't look into that. QL tests are hostile to debugging – it takes forever.

      ==============================
      But this is a good bug that QL uncovered. This behavior is bad. It is fragile (brittle?). stop-domain shouldn't merely look for the magic file. It ought to also check whether the admin port is in use. If so, it can call the _localdirectories(sp?) command on that port and see if it REALLY is the domain, then emit a severe/warning message about the weird empty local-password file.
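
The check suggested above could be sketched roughly as follows. This is only an illustration, not the asadmin code: the class `DasProbe`, its methods, and the `State` enum are hypothetical names. It only tests whether something accepts TCP connections on the admin port and whether the local-password file has any content. Confirming that the listener really is this domain's DAS would need a further remote call, which is not shown.

```java
import java.io.File;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of GlassFish.
public class DasProbe {

    public enum State {
        NOT_RUNNING,                        // nothing listens on the admin port
        RUNNING,                            // port busy and password file has content
        RUNNING_BUT_PASSWORD_FILE_UNUSABLE  // port busy, file missing or zero-byte
    }

    // Returns true if a TCP connection to host:port succeeds within the timeout.
    public static boolean isPortInUse(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host unreachable.
            return false;
        }
    }

    // Combines the "magic file" check with the port check instead of
    // trusting the file alone.
    public static State classify(File localPasswordFile, String host, int port) {
        boolean passwordUsable =
                localPasswordFile.isFile() && localPasswordFile.length() > 0;
        if (!isPortInUse(host, port, 2000)) {
            return State.NOT_RUNNING;
        }
        return passwordUsable
                ? State.RUNNING
                : State.RUNNING_BUT_PASSWORD_FILE_UNUSABLE;
    }
}
```

With a helper like this, a call such as `DasProbe.classify(new File("local-password"), "localhost", 4848)` would let the command warn about an empty file rather than conclude that the server is down.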

      ======================================================

      Another EASY way to reproduce this bug –
      1) start a domain
      2) edit the local-password file. Make it empty and save it
      3) restart-domain

      ======================================================

      Also –
      restart-domain --kill is NOT working:

      D:\glassfish3\glassfish\domains\domain1\config>asadmin restart-domain --kill
      Server is not running, will attempt to start it...
      There is a process already using the admin port 4848 – it probably is another instance of a GlassFish server.

      ========================================================

      [1]
      test.admincli.RestartDomainTests:restartDomainTest

      Restart domain failed. expected:<true> but was:<false>
      org.testng.Assert.fail(Assert.java:84)
      at org.testng.Assert.failNotEquals(Assert.java:438)
      at org.testng.Assert.assertEquals(Assert.java:108)
      at org.testng.Assert.assertEquals(Assert.java:239)
      at test.admincli.RestartDomainTests.parseTestResults(RestartDomainTests.java:90)
      at test.admincli.RestartDomainTests.restartDomainTest(RestartDomainTests.java:62)
      23 lines not shown

          Activity

          Byron Nevins added a comment -

          Wait a minute. Not so fast. I just ran

          stop-domain --kill
          start-domain

          And there is a zero-byte local-password file! Maybe this was caused by Tim's recent changes?

          Tim Quinn added a comment -

          I am continuing this discussion in e-mail with Byron. Someone will post the net result here.

          Byron Nevins added a comment -

          LocalPasswordImpl runs before the logging service starts.
          LocalPasswordImpl makes logging calls. These messages disappear into thin air.

          This is what happens on my computer (Windows 7)

          if (!(
                  localPasswordFile.setWritable(false, false) && // take from all
                  localPasswordFile.setWritable(true, true) &&   // owner only
                  localPasswordFile.setReadable(false, false) && // take from all
                  localPasswordFile.setReadable(true, true)      // owner only
          )) {
              logger.log(Level.WARNING, "localpassword.cantchmod", localPasswordFile.toString());
              // if we can't protect it, don't write it
              return;
          }

          The if fires - a log message is written to thin air and it returns without writing the file.
          The file will never fix itself – this will continue to fail forever - until the user deletes it, I suppose.

          Catastrophic! I recommend deleting it in ALL cases – if possible. Then log to EarlyLogger with a *severe* message.

          Even more curious is that the code above the chunk I pasted calls delete() on the file and true is returned but the file was NOT deleted.
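
One way to see which of the four calls fails on a given platform is to run them one at a time instead of inside a short-circuiting `&&` chain. The sketch below assumes a hypothetical class `PermissionProbe`; it is not part of LocalPasswordImpl. The `java.io.File` documentation states that `setReadable(false, ...)` fails when the underlying file system does not implement a read permission, which would be consistent with the Windows behaviour described in this comment.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical diagnostic helper, not part of GlassFish.
public class PermissionProbe {

    // Runs the same four calls in the same order as the snippet above and
    // records each return value, so a single failure does not hide the rest.
    public static Map<String, Boolean> probe(File file) {
        Map<String, Boolean> results = new LinkedHashMap<>();
        results.put("setWritable(false, false)", file.setWritable(false, false));
        results.put("setWritable(true, true)", file.setWritable(true, true));
        results.put("setReadable(false, false)", file.setReadable(false, false));
        results.put("setReadable(true, true)", file.setReadable(true, true));
        return results;
    }
}
```

Printing the returned map for a scratch file in the domain's config directory shows which call returns false, without relying on a logger that may not have started yet.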

          Byron Nevins added a comment -

          46057 is the culprit.

          The security checks always fail on Windows.

          Raising the priority for now. It is catastrophic for asadmin on Windows.

          Bill Shannon added a comment -

          Went back to ignoring the return values from these filesystem operations.
          Fixed in trunk and 3.1.1 branch.
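
Ignoring the return values amounts to treating the permission calls as best effort. A rough sketch of that idea follows. The class `LocalPasswordWriter` and its method are hypothetical and are not the change that was actually committed.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical illustration, not the actual LocalPasswordImpl code.
public class LocalPasswordWriter {

    public static void write(File file, String password) throws IOException {
        // Start from a fresh file so stale or zero-byte content cannot survive.
        Files.deleteIfExists(file.toPath());
        if (!file.createNewFile()) {
            throw new IOException("could not create " + file);
        }

        // Best effort: restrict access to the owner before any content is
        // written. Return values are deliberately ignored, because some
        // platforms report failure for calls they cannot honour.
        file.setWritable(false, false);
        file.setWritable(true, true);
        file.setReadable(false, false);
        file.setReadable(true, true);

        // Always write the password, whatever the calls above returned.
        Files.write(file.toPath(), password.getBytes(StandardCharsets.UTF_8));
    }
}
```

The trade-off in this approach is that on a platform where the permissions cannot be tightened, the file is written with whatever access the file system grants by default.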


            People

            • Assignee:
              Bill Shannon
              Reporter:
              Byron Nevins