[GLASSFISH-16311] Improve operating system (OS) service integration Created: 04/Apr/11  Updated: 17/Oct/12

Status: Open
Project: glassfish
Component/s: admin
Affects Version/s: 3.1
Fix Version/s: future release

Type: Improvement Priority: Critical
Reporter: Tom Mueller Assignee: Byron Nevins
Resolution: Unresolved Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows, Linux, Solaris


Issue Links:
Dependency
depends on GLASSFISH-16525 Add Auto Restarting Capability to Pla... Open
depends on GLASSFISH-10266 provide command to delete services Open
depends on GLASSFISH-11692 create-service improvements Open
depends on GLASSFISH-16522 Add Command for list-services Open
depends on GLASSFISH-16523 Platform Services - Add Support for A... Open
depends on GLASSFISH-16526 Add restart-service command Open
depends on GLASSFISH-16140 server running as a service on Window... Reopened
depends on GLASSFISH-5263 Platform Services and the -Xrs option Resolved
depends on GLASSFISH-12363 The error message of "asadmin create-... Closed
Duplicate
is duplicated by GLASSFISH-17126 -Xrs is missing, so on 100% of all Wi... Closed
Tags: ee7ri_cleanup_deferred

 Description   

This RFE is for improving the operating system (OS) service integration for GlassFish. Here are the requirements:

1. Expose the asadmin delete-service interface as a public interface (rather than being hidden as it is currently).

2. Modify the service implementation so that it acts as a monitor on all operating systems. By monitor, this means that if the service is started, then if the GlassFish process exits, it should be restarted automatically. The Windows service currently doesn't work this way - I'm not sure for Linux and Solaris.

3. Modify the service implementation and/or the various start/stop commands so that they interact correctly. This includes:

3a. If the OS service is started, and the user runs stop-local-instance or stop-domain, then the OS service should be stopped too.

3b. If the server is started using start-domain or start-local-instance, and the OS service is not started, the OS service should not be started. However, if the user then later starts the OS service, the OS service must recognize that the server is already running and monitor the already running server (See req. #2).

4. Any sequence of OS service commands and asadmin start/stop-domain or start/stop-local-instance commands must not cause a failure of the command. For example, currently, if you start the Windows OS service for a domain, and then run stop-domain, and then try to stop the Windows OS service, you get an error message from Windows. Also on Windows, if you run start-domain, and then start the OS service, you get an error message saying that "The process terminated unexpectedly." and the service isn't started.

5. The service should be able to be deleted using either the operating system command for deleting services or the asadmin delete-service command. So the following sequences should work:

asadmin create-service
OS delete service command
asadmin create-service
asadmin delete-service

Currently, the 2nd create-service in this sequence may require a --force option. Ideally, it shouldn't.

6. These requirements apply to the officially supported operating systems for GlassFish for Windows, Linux, and Solaris.



 Comments   
Comment by jclingan [ 04/Apr/11 ]

Good RFE write-up. One additional bullet:

6) Currently we have a feature gap on Linux, where no "watchdog" or "monitor" role is offered by our Linux service template. Upstart is available on RHEL 6 and OL 6 (the timeline for SuSE Enterprise Linux is TBD). We should investigate whether an upstart job definition can be created to fulfill the watchdog role on supported OSs. As a heads-up, it looks like Fedora may move to systemd, so the watchdog approach may change in the future.

Comment by Byron Nevins [ 04/Apr/11 ]

Problems with #2
=================== problem #1 ================
The main problem is getting into an infinite loop trying to start an unstable server. Windows lets you configure what to do:
for each of the 3 failure occurrences you can choose any of the 4 allowed responses.

Occurrences:
1. First Failure
2. Second Failure
3. Subsequent Failures

Responses:
1. restart the service
2. reboot
3. run some other program
4. ignore it

As you can see, just figuring out how to allow users to configure these options and then implementing them on 3 main platforms that are totally different is quite a task!

Of course – why is the server crashing? I have V2 running at home now for several years. It never crashes. Do we really want to do all of this complicated and expensive work for something that should be exceedingly rare?
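
To make the shape of problem #1 concrete, here is a minimal sketch of a per-failure action table modeled on the three occurrences and four responses listed above. The class and method names are hypothetical and not part of GlassFish or of any Windows API; the point is only that a policy ending in something other than "restart" cannot loop forever.

```python
from enum import Enum


class Action(Enum):
    """The four responses Windows allows for a failed service."""
    RESTART = "restart the service"
    REBOOT = "reboot"
    RUN_PROGRAM = "run some other program"
    IGNORE = "ignore it"


class FailurePolicy:
    """Hypothetical policy: one action each for the first, second,
    and all subsequent failures."""

    def __init__(self, first, second, subsequent):
        self.actions = (first, second, subsequent)
        self.failures = 0

    def on_failure(self):
        """Record one failure and return the action configured for it."""
        self.failures += 1
        # Failures beyond the second all map to the "subsequent" slot.
        index = min(self.failures, 3) - 1
        return self.actions[index]

    def reset(self):
        """Forget past failures, e.g. after a long error-free period."""
        self.failures = 0


# Restart twice, then stop trying: an unstable server is not restarted forever.
policy = FailurePolicy(Action.RESTART, Action.RESTART, Action.IGNORE)
decisions = [policy.on_failure() for _ in range(4)]
```

Choosing RESTART for all three slots would reproduce the infinite loop described above, which is why a default combination has to be picked with some care.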

======= problem #2 =========== (This caused many support problems with V2)
User kills the server forcefully (e.g. using "kill" or "taskkill.exe")
A moment later it is running again.
He scratches his head and kills it again
A moment later it is running again.
"Hello. Customer support? ...."

Comment by Byron Nevins [ 04/Apr/11 ]

Comment from Bill Shannon about #3B

I agree about 3b. If you start the server without using the OS service
mechanism, you're probably not going to be able to get the service mechanism
to monitor that service later. Mostly this should be as expected. The key
is to get stopping and restarting to integrate properly with the service
mechanism. Possibly also start-instance. I'm not sure we want to make
start-local-instance or start-domain just be front-ends to the service
mechanism (if the server is configured to be handled by the service mechanism).

Comment by Bill Shannon [ 04/Apr/11 ]

Ok, apparently we're going to be using Jira as a discussion forum...

For problem #1, we should just pick some particular combination of
options and allow users to customize it using the OS-specific mechanisms.

For problem #2, this is the normal behavior for any service right?
If the user chooses to have the service mechanism manage his server,
this is what he should expect, and what he would get with any other
server managed by the service mechanism, right? For that matter,
this is what he would get with the old node agent - kill the server
instance and the node agent would restart it because it "crashed",
right?

Comment by Tom Mueller [ 05/Apr/11 ]

The changes to OS service integration should also implement those suggested in issue 11692.

Comment by Tom Mueller [ 05/Apr/11 ]

This issue should resolve issue 16140 also.

Comment by Byron Nevins [ 15/Apr/11 ]

One-Pager Added here:
http://wikis.sun.com/display/GlassFish/3.2PlatformServices

Comment by Byron Nevins [ 15/Apr/11 ]

Pasted the description here. The bulleted lines (entered as ****) are my comments.

1. Expose the asadmin delete-service interface as a public interface (rather than being hidden as it is currently).

        • Yes - Also add start|stop|list

2. Modify the service implementation so that it acts as a monitor on all operating systems. By monitor, this means that if the service is started, then if the GlassFish process exits, it should be restarted automatically. The Windows service currently doesn't work this way - I'm not sure for Linux and Solaris.

        • Yes. I will choose reasonable defaults for each platform – as long as the platform supports it.

3. Modify the service implementation and/or the various start/stop commands so that they interact correctly. This includes:

3a. If the OS service is started, and the user runs stop-local-instance or stop-domain, then the OS service should be stopped too.

        • Misunderstanding of what a Service is. GF-instance and the "service" are the same thing, sort of. Once the service has started - when you stop the server you have stopped the "service". Compare to an SMTP server. When you stop the SMTP server process you have also stopped the SMTP "service".

3b. If the server is started using start-domain or start-local-instance, and the OS service is not started, the OS service should not be started.

        • It can't be started. GF won't allow the same server to be started twice.

However, if the user then later starts the OS service, the OS service must recognize that the server is already running and monitor the already running server (See req. #2).

        • Impossible - will not do.

4. Any sequence of OS service commands and asadmin start/stop-domain or start/stop-local-instance commands must not cause a failure of the command. For example, currently, if you start the Windows OS service for a domain, and then run stop-domain, and then try to stop the Windows OS service, you get an error message from Windows. Also on Windows, if you run start-domain, and then start the OS service, you get an error message saying that "The process terminated unexpectedly." and the service isn't started.

        • This is all expected and what we want!

5. The service should be able to be deleted using either the operating system command for deleting services or the asadmin delete-service command. So the following sequences should work:

asadmin create-service
OS delete service command
asadmin create-service
asadmin delete-service

Currently, the 2nd create-service in this sequence may require a --force option. Ideally, it shouldn't.

      • will do

6. These requirements apply to the officially supported operating systems for GlassFish for Windows, Linux, and Solaris.

      • indeed.

Comment by Byron Nevins [ 15/Apr/11 ]

Just thinking out loud here.

1) services actually call

asadmin start-XXX --verbose

2) if the server crashes – asadmin knows about it and is capable of restarting it itself without the platform doing anything at all!

3) if the server is stopped in an orderly way, asadmin knows this also and it can tell the difference from a crash.
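
A rough sketch of the idea in points 2) and 3): a foreground supervisor that restarts the server after a crash but not after an orderly stop. How asadmin actually tells the two apart is not stated here, so the exit-code convention below (0 means orderly stop) is purely an assumption, and `supervise` and `run_command` are hypothetical names.

```python
import subprocess


def run_command(command):
    """Run the server in the foreground and return its exit code."""
    return subprocess.call(command)


def supervise(run_once, max_restarts=3):
    """Call run_once() until it stops in an orderly way or restarts run out.

    ASSUMPTION: an exit code of 0 means an orderly stop; anything else
    is treated as a crash. Returns (number_of_starts, outcome).
    """
    starts = 0
    while True:
        starts += 1
        if run_once() == 0:
            return starts, "stopped"
        if starts > max_restarts:
            # Bounded, so an unstable server cannot loop forever.
            return starts, "gave up"


# Possible use, assuming asadmin is on the PATH:
# supervise(lambda: run_command(["asadmin", "start-domain", "--verbose"]))
```

The restart cap is there because of problem #1 discussed earlier in this issue; without it the loop would keep restarting a server that crashes on every start.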

==============

Comment by Byron Nevins [ 27/Apr/11 ]

Please see the One Pager:

http://wikis.sun.com/display/GlassFish/3.2PlatformServices

Note that I have added a feature – we will support services on all GlassFish-supported Platforms.

Comment by Byron Nevins [ 02/May/11 ]

THIS IS THE UMBRELLA ISSUE FOR IMPROVED PLATFORM SERVICES for 3.2

Comment by mkarg [ 28/Jul/11 ]

I want to recommend not having two different JVMs involved or using scripts at all on Windows to implement services.

See, on Windows, a real service implements an API defined by Microsoft, which lets Windows monitor the service on its own - there is no need for an additional watchdog, as Windows is a service watchdog (it even comes with configurable rules for what to do when the service fails, and can restart it, etc.)! This is the cleanest solution and it could be done very easily by just a few lines of JNA code within the GlassFish kernel.

See http://msdn.microsoft.com/en-us/library/ms685141(v=vs.85).aspx

Comment by mkarg [ 28/Jul/11 ]

I want to suggest that asadmin create-service provides a slightly changed configuration:

  • Since Windows 2008 there are special account types for services, for security reasons. I suggest that the service not be created to run as the local SYSTEM account (= with the highest possible access rights); instead the installer should create a local service type account and grant it only the most essential access rights. In a production system it is not appropriate to run as the local SYSTEM account, and the administrator doesn't know what access rights GF will need, so he cannot change it.
  • Windows has a built-in watchdog facility. The configuration should be set up so that it automatically restarts after the first failure, runs some kind of domain repair at the second failure (if there is anything at all that GF can repair), and restarts the host at the third failure. The failure counter should be reset after one week. This stuff is already there, so please use it.

Windows is Windows, not UNIX plus a GUI.
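
For illustration, the failure handling suggested above maps onto the Windows `sc failure` command (restart, then run a program, then reboot, with the counter reset after one week). The sketch below only builds the argument list. The service name, the repair command, and the 60-second delay are placeholders I made up, and whether GF has anything it could run as a "repair" step is an open question.

```python
def sc_failure_args(service_name, repair_command, delay_ms=60000):
    """Build an `sc failure` command line as an argument list.

    ASSUMPTIONS: service_name and repair_command are placeholders,
    and the delay before each action is an arbitrary 60 seconds.
    """
    one_week_seconds = 7 * 24 * 60 * 60  # sc expects the reset period in seconds
    # First failure: restart; second: run a program; third: reboot the host.
    actions = "/".join(
        f"{action}/{delay_ms}" for action in ("restart", "run", "reboot")
    )
    return [
        "sc", "failure", service_name,
        "reset=", str(one_week_seconds),
        "command=", repair_command,
        "actions=", actions,
    ]


# Hypothetical service name and repair script:
args = sc_failure_args("domain1", "repair-domain.bat")
```

Passing the list to `subprocess.call(args)` on Windows would apply the settings; `sc` wants each option name such as `reset=` and its value as separate arguments, which is why they are separate list items.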

Comment by Byron Nevins [ 28/Jul/11 ]

"it could be done very easily by just a few lines of JNA code"

Please provide these very easy few lines of code.

Comment by Bill Shannon [ 28/Jul/11 ]

Markus, this seems to be important to you, and you seem to know more about it
than most of us. Perhaps you'd be interested in implementing it and contributing
it to the GlassFish community? While we'd all like to see these sorts of
improvements, I doubt that we would do as good a job as you would.

Comment by mkarg [ 29/Jul/11 ]

Bill,

I would be happy to provide an implementation, but I have no strong GlassFish internals background, so I will need help with that. What I can provide is the complete JNA or C++ code for a "good and complete" "real" Windows service, but what I need is someone who will tell me (a) how to build GlassFish from scratch and (b) the few Java lines needed to issue a GF startup / shutdown / status-query. If we can organize this, then I will be glad to provide the complete Windows part.

Regards
Markus

Comment by Bill Shannon [ 29/Jul/11 ]

Ideally you would just issue the "asadmin start-domain" command to
start the app server. Is there some reason you need to start it
"in process"? If so, you'll need to duplicate the environment setup
from asadmin.bat and then start a JVM using the same arguments that
asadmin.bat does.

In any event, Byron is the expert on starting GlassFish.

Also, hopefully, you wouldn't need to build GlassFish in order to do
this, but if you did there are build instructions on the wiki.

Comment by Byron Nevins [ 29/Jul/11 ]

I'm not sure what you mean by "the few Java lines needed to issue a GF startup / shutdown / status-query". Perhaps you mean this?

How to start, stop, check status of domain

java -jar "%GF_HOME%\modules\admin-cli.jar" start-domain
java -jar "%GF_HOME%\modules\admin-cli.jar" stop-domain
java -jar "%GF_HOME%\modules\admin-cli.jar" list-domains

How to start, stop, check status of instance

java -jar "%GF_HOME%\modules\admin-cli.jar" start-local-instance instance1
java -jar "%GF_HOME%\modules\admin-cli.jar" stop-local-instance instance1
java -jar "%GF_HOME%\modules\admin-cli.jar" list-instances --long

------------------
If you mean the source lines that run from the above calls – they are definitely not "few". There are thousands of lines required. They are located in admin/launcher, core/kernel, core/bootstrap, cluster/cli, cluster/admin and common/common-util (off the top of my head)
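
For anyone wrapping the command lines above from a program rather than a script, a minimal sketch follows. The function name is hypothetical; the jar path and the subcommands are the ones shown above, and `GF_HOME` is assumed to point at the GlassFish install.

```python
import os


def admin_cli_command(gf_home, subcommand, *args):
    """Build the `java -jar .../modules/admin-cli.jar <subcommand>` call."""
    jar = os.path.join(gf_home, "modules", "admin-cli.jar")
    return ["java", "-jar", jar, subcommand, *args]


# Placeholder install path; in practice use os.environ["GF_HOME"].
start_cmd = admin_cli_command("/opt/glassfish", "start-local-instance", "instance1")
status_cmd = admin_cli_command("/opt/glassfish", "list-instances", "--long")
```

Each list can be handed to `subprocess.call(...)`, whose return value is the exit code of the admin CLI.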

Comment by mkarg [ 30/Jul/11 ]

My idea is based on running in-process because that is the natural way to implement services on Windows, and it reduces the complexity by far, as no watchdog asadmin is needed anymore. GlassFish will just feel and behave as a native service, so no more scripts are involved. Windows admins don't like scripts, as Windows has a completely different architecture compared to UNIX. UNIX does everything in scripts, Windows does virtually nothing in scripts. So the target is to get rid of scripts.

Thank you Byron for the hint. Actually I looked for the single entry points in Java source code that make the following happen (in pseudo code):

  • GlassFish.start!
  • GlassFish.stop!
  • (opt.) GlassFish.pause!
  • (opt.) GlassFish.resume!
  • GlassFish.state?

If that does not exist, I will inspect what the scripts do and repeat that in pure Java (hence my question about building from scratch; I got that working meanwhile using svn and mvn, BTW). My target is to provide Java source that uses / implements the native Windows API and directly executes these commands in-process.
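
To pin down what the pseudo code above is asking for, here is the same set of entry points written as an interface, with a trivial in-memory stand-in. This is only an illustration of the requested shape: the names are hypothetical, it is not GlassFish code, and it is written in Python rather than Java for brevity.

```python
from abc import ABC, abstractmethod


class ServerControl(ABC):
    """Hypothetical entry points a native service wrapper would call."""

    @abstractmethod
    def start(self):
        """GlassFish.start!"""

    @abstractmethod
    def stop(self):
        """GlassFish.stop!"""

    @abstractmethod
    def state(self):
        """GlassFish.state? Returns "running" or "stopped"."""

    def pause(self):
        """Optional: GlassFish.pause!"""
        raise NotImplementedError("pause is optional")

    def resume(self):
        """Optional: GlassFish.resume!"""
        raise NotImplementedError("resume is optional")


class FakeServer(ServerControl):
    """In-memory stand-in used only to exercise the interface."""

    def __init__(self):
        self._running = False

    def start(self):
        self._running = True

    def stop(self):
        self._running = False

    def state(self):
        return "running" if self._running else "stopped"


server = FakeServer()
server.start()
state_after_start = server.state()
server.stop()
state_after_stop = server.state()
```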

Comment by Tom Mueller [ 17/Oct/12 ]

Marking the fix version field as "future-release". This is based on an evaluation by John, Michael, and Tom WRT the PRD for the Java EE 7 RI/SDK. This issue was deemed to not be a P1 for that release. If this is in error or there are other reasons why this RFE should be targeted for the Java EE 7 RI/SDK release, then change the fix version field back to an appropriate build.

Generated at Mon Jul 25 19:02:21 UTC 2016 using JIRA 6.2.3#6260-sha1:63ef1d6dac3f4f4d7db4c1effd405ba38ccdc558.