glassfish / GLASSFISH-16311

Improve operating system (OS) service integration

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 3.1
    • Fix Version/s: future release
    • Component/s: admin
    • Labels:
      None
    • Environment:

      Windows, Linux, Solaris

      Description

      This RFE is for improving the operating system (OS) service integration for GlassFish. Here are the requirements:

      1. Expose the asadmin delete-service interface as a public interface (rather than being hidden as it is currently).

      2. Modify the service implementation so that it acts as a monitor on all operating systems. Here, "monitor" means that once the service has been started, the GlassFish process is restarted automatically whenever it exits. The Windows service currently doesn't work this way; I'm not sure about Linux and Solaris.

      3. Modify the service implementation and/or the various start/stop commands so that they interact correctly. This includes:

      3a. If the OS service is started, and the user runs stop-local-instance or stop-domain, then the OS service should be stopped too.

      3b. If the server is started using start-domain or start-local-instance, and the OS service is not started, the OS service should not be started. However, if the user then later starts the OS service, the OS service must recognize that the server is already running and monitor the already running server (See req. #2).

      4. Any sequence of OS service commands and asadmin start/stop-domain or start/stop-local-instance commands must not cause a failure of the command. For example, currently, if you start the Windows OS service for a domain, and then run stop-domain, and then try to stop the Windows OS service, you get an error message from Windows. Also on Windows, if you run start-domain, and then start the OS service, you get an error message saying that "The process terminated unexpectedly." and the service isn't started.

      5. The service should be able to be deleted using either the operating system command for deleting services or the asadmin delete-service command. So the following sequences should work:

      asadmin create-service
      OS delete service command
      asadmin create-service
      asadmin delete-service

      Currently, the 2nd create-service in this sequence may require a --force option. Ideally, it shouldn't.

      6. These requirements apply to the operating systems officially supported by GlassFish: Windows, Linux, and Solaris.

          Activity

          jclingan added a comment -

          Good RFE write-up. One additional bullet:

          6) Currently we have a feature gap on Linux: our Linux service template offers no "watchdog" or "monitor" role. Upstart is available on RHEL 6 and OL 6 (timeline for SuSE Enterprise Linux is TBD). We should investigate whether an Upstart job definition can be created to fulfill the watchdog role on supported OSs. As a heads-up, it looks like Fedora may move to systemd, so the watchdog approach may change in the future.

          Byron Nevins added a comment -

          Problems with #2
          =================== problem #1 ================
          Mainly the problem of getting into an infinite loop trying to start an unstable server. Windows lets you configure what to do: for each of the 3 failure occurrences you can choose any of the 4 allowed responses.

          Occurrences:

          1. First Failure
          2. Second Failure
          3. Subsequent Failures

          Responses:

          1. restart the service
          2. reboot
          3. run some other program
          4. ignore it

          As you can see just figuring out how to allow users to configure these options and then implementing in 3 main platforms that are totally different is quite a task!

          Of course – why is the server crashing? I have V2 running at home now for several years. It never crashes. Do we really want to do all of this complicated and expensive work for something that should be exceedingly rare?

          ======= problem #2 =========== (This caused many support problems with V2)
          User kills the server forcefully (e.g. using "kill" or "taskkill.exe")
          A moment later it is running again.
          He scratches his head and kills it again
          A moment later it is running again.
          "Hello. Customer support? ...."

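          The three occurrences and four responses above can be modeled as a small lookup. This is only a sketch: the class, enum, and method names are hypothetical and are not part of GlassFish or of the Windows service API. Choosing a non-restart response for "subsequent failures" is one way to bound the restart loop described in problem #1.

```java
/** Hypothetical model of per-failure recovery actions; not a GlassFish or Windows API. */
public class FailurePolicy {
    /** The four responses listed in the comment above. */
    public enum Action { RESTART_SERVICE, REBOOT, RUN_PROGRAM, IGNORE }

    private final Action first;
    private final Action second;
    private final Action subsequent;

    public FailurePolicy(Action first, Action second, Action subsequent) {
        this.first = first;
        this.second = second;
        this.subsequent = subsequent;
    }

    /** Returns the configured action for the n-th failure (1-based). */
    public Action actionFor(int failureNumber) {
        if (failureNumber < 1) {
            throw new IllegalArgumentException("failureNumber must be >= 1");
        }
        if (failureNumber == 1) {
            return first;
        }
        if (failureNumber == 2) {
            return second;
        }
        return subsequent;
    }

    public static void main(String[] args) {
        // Restart twice, then give up: an unstable server cannot loop forever.
        FailurePolicy policy = new FailurePolicy(
                Action.RESTART_SERVICE, Action.RESTART_SERVICE, Action.IGNORE);
        for (int n = 1; n <= 4; n++) {
            System.out.println("failure " + n + " -> " + policy.actionFor(n));
        }
    }
}
```
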
          Byron Nevins added a comment -

          Comment from Bill Shannon about #3B

          I agree about 3b. If you start the server without using the OS service
          mechanism, you're probably not going to be able to get the service mechanism
          to monitor that service later. Mostly this should be as expected. The key
          is to get stopping and restarting to integrate properly with the service
          mechanism. Possibly also start-instance. I'm not sure we want to make
          start-local-instance or start-domain just be front-ends to the service
          mechanism (if the server is configured to be handled by the service mechanism).

          Bill Shannon added a comment -

          Ok, apparently we're going to be using Jira as a discussion forum...

          For problem #1, we should just pick some particular combination of
          options and allow users to customize it using the OS-specific mechanisms.

          For problem #2, this is the normal behavior for any service right?
          If the user chooses to have the service mechanism manage his server,
          this is what he should expect, and what he would get with any other
          server managed by the service mechanism, right? For that matter,
          this is what he would get with the old node agent - kill the server
          instance and the node agent would restart it because it "crashed",
          right?

          Tom Mueller added a comment -

          The changes to OS service integration should also implement those suggested in issue 11692.

          Tom Mueller added a comment -

          This issue should resolve issue 16140 also.

          Byron Nevins added a comment -

          One-Pager added here: http://wikis.sun.com/display/GlassFish/3.2PlatformServices

          Byron Nevins added a comment -

          Pasted the description here. My comments are the bulleted lines and the lines that start with ****.

          1. Expose the asadmin delete-service interface as a public interface (rather than being hidden as it is currently).

                • Yes - Also add start|stop|list

          2. Modify the service implementation so that it acts as a monitor on all operating systems. Here, "monitor" means that once the service has been started, the GlassFish process is restarted automatically whenever it exits. The Windows service currently doesn't work this way; I'm not sure about Linux and Solaris.

                • Yes. I will choose reasonable defaults for each platform – as long as the platform supports it.

          3. Modify the service implementation and/or the various start/stop commands so that they interact correctly. This includes:

          3a. If the OS service is started, and the user runs stop-local-instance or stop-domain, then the OS service should be stopped too.

                • Misunderstanding of what a Service is. The GF instance and the "service" are the same thing, sort of. Once the service has started, stopping the server stops the "service". Compare this to an SMTP server: when you stop the SMTP server process, you have also stopped the SMTP "service".

          3b. If the server is started using start-domain or start-local-instance, and the OS service is not started, the OS service should not be started.
          ****It can't be started. GF won't allow the same server to be started twice.

          However, if the user then later starts the OS service, the OS service must recognize that the server is already running and monitor the already running server (See req. #2).

                • Impossible - will not do.

          4. Any sequence of OS service commands and asadmin start/stop-domain or start/stop-local-instance commands must not cause a failure of the command. For example, currently, if you start the Windows OS service for a domain, and then run stop-domain, and then try to stop the Windows OS service, you get an error message from Windows. Also on Windows, if you run start-domain, and then start the OS service, you get an error message saying that "The process terminated unexpectedly." and the service isn't started.

                • This is all expected and what we want!

          5. The service should be able to be deleted using either the operating system command for deleting services or the asadmin delete-service command. So the following sequences should work:

          asadmin create-service
          OS delete service command
          asadmin create-service
          asadmin delete-service

          Currently, the 2nd create-service in this sequence may require a --force option. Ideally, it shouldn't.

              • will do

          6. These requirements apply to the operating systems officially supported by GlassFish: Windows, Linux, and Solaris.

              • indeed.
          Byron Nevins added a comment -

          Just thinking out loud here.

          1) services actually call

          asadmin start-XXX --verbose

          2) if the server crashes – asadmin knows about it and is capable of restarting it itself without the platform doing anything at all!

          3) if the server is stopped in an orderly way, asadmin knows this also and it can tell the difference from a crash.

          ==============
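          A minimal sketch of the idea in points 2) and 3): a supervising loop that restarts the server after a crash but not after an orderly stop. How asadmin actually tells the two apart is not stated in this thread, so the exit-code convention below (0 means orderly stop) is purely an assumption, and `Watchdog`, `supervise`, and `maxRestarts` are hypothetical names. The restart cap is added so that an unstable server cannot loop forever.

```java
import java.io.IOException;
import java.util.function.IntSupplier;

/** Hypothetical watchdog loop; the exit-code convention is an assumption, not GlassFish behavior. */
public class Watchdog {
    /** Assumed convention: exit code 0 is an orderly stop, anything else is a crash. */
    static final int ORDERLY_STOP = 0;

    /**
     * Runs the server until it stops in an orderly way or the restart budget is used up.
     *
     * @param runServer   blocks until the server exits and returns its exit code
     * @param maxRestarts maximum number of restarts after crashes
     * @return the number of restarts that were performed
     */
    public static int supervise(IntSupplier runServer, int maxRestarts) {
        int restarts = 0;
        while (true) {
            int exitCode = runServer.getAsInt();
            if (exitCode == ORDERLY_STOP) {
                return restarts; // stopped on purpose: do not restart
            }
            if (restarts >= maxRestarts) {
                return restarts; // crashed too often: give up
            }
            restarts++;
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("usage: java Watchdog <command> [args...]");
            return;
        }
        // Launch the given command as a child process and wait for it to exit.
        IntSupplier launcher = () -> {
            try {
                return new ProcessBuilder(args).inheritIO().start().waitFor();
            } catch (IOException e) {
                return -1; // treat a failed launch like a crash
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return ORDERLY_STOP; // stop supervising when interrupted
            }
        };
        System.out.println("restarts performed: " + supervise(launcher, 3));
    }
}
```
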

          Byron Nevins added a comment -

          Please see the One Pager:

          http://wikis.sun.com/display/GlassFish/3.2PlatformServices

          Note that I have added a feature – we will support services on all GlassFish-supported Platforms.

          Byron Nevins added a comment -

          THIS IS THE UMBRELLA ISSUE FOR IMPROVED PLATFORM SERVICES for 3.2

          mkarg added a comment -

          I want to recommend not having two different JVMs involved or using scripts at all on Windows to implement services.

          See, on Windows, a real service implements an API defined by Microsoft, which lets Windows monitor the service on its own. There is no need for an additional watchdog, because Windows itself is a service watchdog (it even comes with configurable rules for what to do when the service fails, and can restart it, etc.)! This is the cleanest solution, and it could be done very easily with just a few lines of JNA code within the GlassFish kernel.

          See http://msdn.microsoft.com/en-us/library/ms685141(v=vs.85).aspx

          mkarg added a comment -

          I want to suggest that asadmin create-service provide a slightly changed configuration:

          • Since Windows 2008 there are special account types for services, for security reasons. I suggest that the service not be created to run as the local SYSTEM account (= with the highest possible access rights); instead, the installer should create a local service-type account and grant it only the most essential access rights. In a production system it is not appropriate to run as the local SYSTEM account, and the administrator doesn't know what access rights GF will need, so he cannot change it.
          • Windows has a built-in watchdog facility. The configuration should be set up so that it automatically restarts the service after the first failure, runs some kind of domain repair after the second failure (if there is anything at all that GF can repair), and restarts the host after the third failure. The failure counter should be reset after one week. This stuff is already there, so please use it.

          Windows is Windows, not UNIX plus a GUI.
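          The "reset the failure counter after one week" rule can be sketched as follows. This only illustrates the counting logic; it assumes the counter resets once a full reset period has passed since the previous failure, and `FailureCounter` and `recordFailure` are hypothetical names, not part of any Windows or GlassFish API.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical failure counter with a reset period; illustrates the counting rule only. */
public class FailureCounter {
    private final Duration resetPeriod;
    private int failures = 0;
    private Instant lastFailure = null;

    public FailureCounter(Duration resetPeriod) {
        this.resetPeriod = resetPeriod;
    }

    /**
     * Records a failure at the given time and returns its number (1-based).
     * Assumption: the count starts over when at least one reset period has
     * passed since the previous failure.
     */
    public int recordFailure(Instant now) {
        if (lastFailure != null
                && Duration.between(lastFailure, now).compareTo(resetPeriod) >= 0) {
            failures = 0;
        }
        failures++;
        lastFailure = now;
        return failures;
    }

    public static void main(String[] args) {
        FailureCounter counter = new FailureCounter(Duration.ofDays(7));
        Instant start = Instant.now();
        System.out.println(counter.recordFailure(start));
        System.out.println(counter.recordFailure(start.plus(Duration.ofDays(1))));
        System.out.println(counter.recordFailure(start.plus(Duration.ofDays(9))));
    }
}
```

          The returned failure number is what would select the first, second, or third escalation step suggested above.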

          Byron Nevins added a comment -

          "it could be done very easily by just a few lines of JNA code"

          Please to provide these very easy few lines of code.

          Bill Shannon added a comment -

          Markus, this seems to be important to you, and you seem to know more about it
          than most of us. Perhaps you'd be interested in implementing it and contributing
          it to the GlassFish community? While we'd all like to see these sorts of
          improvements, I doubt that we would do as good a job as you would.

          mkarg added a comment -

          Bill,

          I would be happy to provide an implementation, but I have no strong background in GlassFish internals, so I will need help with that. What I can provide is the complete JNA or C++ code for a "good and complete" "real" Windows service, but what I need is someone who will tell me (a) how to build GlassFish from scratch and (b) the few Java lines needed to issue a GF startup / shutdown / status-query. If we can organize this, then I will be glad to provide the complete Windows part.

          Regards
          Markus

          Bill Shannon added a comment -

          Ideally you would just issue the "asadmin start-domain" command to
          start the app server. Is there some reason you need to start it
          "in process"? If so, you'll need to duplicate the environment setup
          from asadmin.bat and then start a JVM using the same arguments that
          asadmin.bat does.

          In any event, Byron is the expert on starting GlassFish.

          Also, hopefully, you wouldn't need to build GlassFish in order to do
          this, but if you did, there are build instructions on the wiki.

          Byron Nevins added a comment -

          How to build GlassFish from scratch
          =======================================
          svn co https://svn.java.net/svn/glassfish~svn/trunk/main
          cd appserver
          mvn install
          =======================================

          I'm not sure what you mean by "the few Java lines needed to issue a GF startup / shutdown / status-query". Perhaps you mean this?

          How to start, stop, check status of domain

          java -jar "%GF_HOME%\modules\admin-cli.jar" start-domain
          java -jar "%GF_HOME%\modules\admin-cli.jar" stop-domain
          java -jar "%GF_HOME%\modules\admin-cli.jar" list-domains

          How to start, stop, check status of instance

          java -jar "%GF_HOME%\modules\admin-cli.jar" start-local-instance instance1
          java -jar "%GF_HOME%\modules\admin-cli.jar" stop-local-instance instance1
          java -jar "%GF_HOME%\modules\admin-cli.jar" list-instances --long

          ------------------
          If you mean the source lines that run from the above calls – they are definitely not "few". There are thousands of lines required. They are located in admin/launcher, core/kernel, core/bootstrap, cluster/cli, cluster/admin and common/common-util (off the top of my head)

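          For a caller that wants to issue the admin-cli.jar commands above from Java rather than from a script, a sketch like the following could assemble and run them out of process. The jar location and subcommands are taken from the commands above; the `AdminCli` class, its method names, and the assumption that `java` is on the PATH are illustrative only. Nothing here assumes what the exit code means.

```java
import java.io.IOException;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that assembles "java -jar .../modules/admin-cli.jar <subcommand>" calls. */
public class AdminCli {
    /** Builds the command line for one admin-cli subcommand, e.g. start-domain. */
    public static List<String> command(String gfHome, String subcommand, String... args) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java"); // assumes a java executable is on the PATH
        cmd.add("-jar");
        cmd.add(Paths.get(gfHome, "modules", "admin-cli.jar").toString());
        cmd.add(subcommand);
        cmd.addAll(Arrays.asList(args));
        return cmd;
    }

    /** Runs the command as a child process and returns its exit code. */
    public static int run(List<String> cmd) throws IOException, InterruptedException {
        return new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        String gfHome = System.getenv("GF_HOME");
        if (gfHome == null) {
            System.err.println("GF_HOME is not set");
            return;
        }
        // Print the command instead of running it, so the sketch has no side effects.
        System.out.println(command(gfHome, "start-local-instance", "instance1"));
    }
}
```
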
          mkarg added a comment -

          My idea is based on running in-process because that is the natural way to implement services on Windows, and it reduces the complexity considerably, as the watchdog asadmin is no longer needed. GlassFish will just feel and behave like a native service, so no more scripts are involved. Windows admins don't like scripts, as Windows has a completely different architecture compared to UNIX. UNIX does everything in scripts; Windows does virtually nothing in scripts. So the target is to get rid of scripts.

          Thank you Byron for the hint. Actually I looked for the single entry points in Java source code that make the following happen (in pseudo code):

          • GlassFish.start!
          • GlassFish.stop!
          • (opt.) GlassFish.pause!
          • (opt.) GlassFish.resume!
          • GlassFish.state?

          If that does not exist, I will inspect what the scripts do and repeat that in pure Java (hence my question about building from scratch; I have managed that in the meantime using svn and mvn, BTW). My target is to provide Java source that uses / implements the native Windows API and directly executes these commands in-process.
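
          The pseudo-code entry points listed above could be written down as a Java interface roughly like this. It is a hypothetical sketch of the shape being asked for, not an existing GlassFish API; `ServerControl`, `State`, and the in-memory `FakeServer` stand-in are invented for illustration.

```java
/** Hypothetical control interface mirroring the pseudo-code; not an existing GlassFish API. */
public class ServerControlSketch {
    public enum State { STOPPED, RUNNING, PAUSED }

    /** The entry points a native service wrapper would call in-process. */
    public interface ServerControl {
        void start();
        void stop();
        void pause();   // optional in the pseudo-code
        void resume();  // optional in the pseudo-code
        State state();
    }

    /** In-memory stand-in that only tracks state, so the interface can be exercised. */
    public static class FakeServer implements ServerControl {
        private State state = State.STOPPED;

        @Override public void start() {
            if (state == State.STOPPED) {
                state = State.RUNNING;
            }
        }

        @Override public void stop() {
            state = State.STOPPED;
        }

        @Override public void pause() {
            if (state == State.RUNNING) {
                state = State.PAUSED;
            }
        }

        @Override public void resume() {
            if (state == State.PAUSED) {
                state = State.RUNNING;
            }
        }

        @Override public State state() {
            return state;
        }
    }

    public static void main(String[] args) {
        ServerControl server = new FakeServer();
        server.start();
        System.out.println(server.state());
        server.stop();
        System.out.println(server.state());
    }
}
```
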

          Tom Mueller added a comment -

          Marking the fix version field as "future-release". This is based on an evaluation by John, Michael, and Tom WRT the PRD for the Java EE 7 RI/SDK. This issue was deemed not to be a P1 for that release. If this is in error, or there are other reasons why this RFE should be targeted for the Java EE 7 RI/SDK release, then change the fix version field back to an appropriate build.


            People

            • Assignee: Byron Nevins
            • Reporter: Tom Mueller
            • Votes: 2
            • Watchers: 5