[GLASSFISH-18451] install-node-dcom does not function Created: 05/Mar/12  Updated: 26/Jun/12

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b23
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: jp2011 Assignee: Byron Nevins
Resolution: Unresolved Votes: 5
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows server 2008 R2 Sp1
Glassfish 3.1.2 Release


Tags: dcom

 Description   

I initially tried using a windows domain account to install the node and that didn't work as per GLASSFISH-18327. That issue was also incorrectly marked as resolved. Network captures show that the release version of Glassfish 3.1.2 still does not use the domain account, but attempts to use the local account.

After giving up on this, I created a new local account named glassfish on the remote machine, granted it full access to the two required registry keys, and added it to the Administrators group. Attempting to install the remote node using this account still fails with the following message:

Successfully verified that the host, hostname, is not the local machine as required.
Successfully resolved host name to: hostname/10.65.30.xxx
Successfully connected to DCOM Port at port 135 on host hostname.
Successfully connected to NetBIOS Session Service at port 139 on host hostname.
Successfully connected to Windows Shares at port 445 on host hostname.
The remote file, C: doesn't exist on hostname: Access is denied.

I performed a network capture and can tell you the following:

1. The user account is successfully authenticated with STATUS_SUCCESS (0x00000000)
2. SMB is attempting to access \\hostname\C$ no matter what I set the remote test directory to.
3. NT Status: STATUS_FS_DRIVER_REQUIRED (0xc000019c) is returned from the remote host but I suspect this is normal and used for dynamic library loading for the file system.

4. NT Status: STATUS_ACCESS_DENIED (0xc0000022) is returned on attempting to connect to \\hostname\C$

The documentation does not state any other prerequisites or permissions that need to be set up for this to function. What is missing?



 Comments   
Comment by Byron Nevins [ 06/Mar/12 ]

what is the exact command you're running?

Comment by jp2011 [ 06/Mar/12 ]

To make things even simpler, it is reproducible by the validate-dcom command alone.

Password file contains the following line: AS_ADMIN_WINDOWSPASSWORD=${ALIAS=glassfish-alias}

I have already set up the alias in asadmin as per the documentation.
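For reference, the alias setup described above can be sketched like this (the alias name comes from this report; the password file path is a placeholder):

```shell
# Store the Windows password under an alias in the DAS master keystore
# (prompts for the password):
asadmin create-password-alias glassfish-alias

# Reference the alias from the password file passed to validate-dcom:
echo 'AS_ADMIN_WINDOWSPASSWORD=${ALIAS=glassfish-alias}' > passwordfile.txt

asadmin --passwordfile passwordfile.txt validate-dcom -w glassfish remotehost
```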

c:\glassfish3\bin>asadmin --passwordfile passwordfile.txt validate-dcom -w glassfish remotehost
remote failure:
Successfully verified that the host, remotehost, is not the local machine as required.
Successfully resolved host name to: remotehost/10.65.30.187
Successfully connected to DCOM Port at port 135 on host remotehost.
Successfully connected to NetBIOS Session Service at port 139 on host remotehost.
Successfully connected to Windows Shares at port 445 on host remotehost.
The remote file, C: doesn't exist on remotehost: Access is denied.

Command validate-dcom failed.

I can speak to the network capture I took as well, but that would be easier offline to this web portal.

Comment by Byron Nevins [ 06/Mar/12 ]

Can you access the c$ share from another computer – say

net use X: \\other\c$

?

Comment by Byron Nevins [ 06/Mar/12 ]

Please make sure these items are set up correctly, especially the third one:

1. Server service is in the started state and is set to start automatically.
2. Remote Registry service is also in the started state and is set to start automatically.
3. Set the local security policy for network access: "Control Panel" > "Administrative Tools" > "Local Security Policy" > "Local Policies" > "Security Options" > "Network access: Sharing and security model for local accounts". Make sure it is set to Classic.
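The three checks above can also be done from an elevated command prompt on the remote Windows machine; this is a sketch assuming the standard service names (LanmanServer, RemoteRegistry), and assuming that the Classic policy is backed by the ForceGuest registry value, which is worth verifying against your Windows version:

```bat
rem 1 & 2: Server and Remote Registry services started and set to automatic
sc config LanmanServer start= auto
sc start LanmanServer
sc config RemoteRegistry start= auto
sc start RemoteRegistry

rem 3: "Sharing and security model for local accounts" = Classic
rem (assumption: ForceGuest=0 is the registry value behind that policy)
reg add "HKLM\SYSTEM\CurrentControlSet\Control\Lsa" /v forceguest /t REG_DWORD /d 0 /f
```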

Comment by ljnelson [ 28/Mar/12 ]

I have exactly the same problem.

I installed and ran setup-local-dcom on the remote machine as an administrator. It claimed it ran successfully.

Then I made sure that your steps 1-3 above were taken. I had to manually start the remote registry service.

My remote machine is running Windows 7 Professional on a 64-bit machine with all updates installed.

Here is my command and output:

ljnelson$ asadmin --passwordfile ~/.glassfish.passwords --port=9048 validate-dcom --windowsuser lnelson --windowsdomain jenzabar --remotetestdir 'C:\crap' --verbose true 10.63.4.42
remote failure: 
Successfully verified that the host, 10.63.4.42, is not the local machine as required.
Successfully resolved host name to: /10.63.4.42
Successfully connected to DCOM Port at port 135 on host 10.63.4.42.
Successfully connected to NetBIOS Session Service at port 139 on host 10.63.4.42.
Successfully connected to Windows Shares at port 445 on host 10.63.4.42.
The remote file, C:\crap doesn't exist on 10.63.4.42 : The parameter is incorrect.

Command validate-dcom failed.

C:\crap is a directory present on the remote machine. I haven't set it up to be shared in any way, but I haven't done anything else to it, either. Any path supplied to --remotetestdir is considered to not exist. I've tried moving slashes around and doubling up backslashes in case it's a path issue; it's not.

Hope this data point helps.

Comment by lb54 [ 11/Apr/12 ]

Hi.
I have also this issue:
Win 2003 SP2 (Domain Admin Server, GF 3.1.1, updated to 3.1.2)
Win 2008 Server R2 Enterprise SP1 (node, formerly connected through SSH via cygwin)
User is authorized for both machines. DCOM is planned to replace the SSH-communication.

The message from the Web Console is:
Successfully verified that the host, myserver.host.xx, is not the local machine as required.
Successfully resolved host name to: myserver.host.xx/<IP-Address>
Successfully connected to DCOM Port at port 135 on host myserver.host.xx.
Successfully connected to NetBIOS Session Service at port 139 on host gibson-10.tecis.hh.
Successfully connected to Windows Shares at port 445 on host myserver.host.xx.
The remote file, C: doesn't exist on myserver.host.xx : Logon failure: unknown user name or bad password.

The CLI also fails with:
remote failure: Command install-node-dcom failed.

com.sun.enterprise.util.cluster.windows.process.WindowsException: Logon failure: unknown user name or bad password.
Command create-node-dcom failed.

Is there a way to "workaround" this or do I have to wait for an update?

Comment by jp2011 [ 11/Apr/12 ]

There has been no fix for this because the cause is still unknown to Oracle. The workaround is to not use DCOM. We have personally abandoned Windows as a platform for production/QA in favour of the RHEL 5 Linux distro. SSH is built in, and the cluster runs a lot faster with less overhead. The downside is that you have to learn Linux commands. But really, is that so bad?

Comment by lb54 [ 13/Apr/12 ]

I agree with you.
BUT: Telling my company to use Linux Servers instead of Windows will not work, they don't want to hear that.
Using SSH nodes on a Windows system with Cygwin seems to be an alternative. But I already used GlassFish 3.1.1 with SSH (Cygwin), and the communication did not seem very stable (long-running startup processes and a slow-loading "Clusters" page in the Web Console).

@Byron: Is there a plan for this bugfix so far?

Comment by mr_daemon [ 16/Apr/12 ]

I did some incredibly tedious debugging and was able to get it to work:

For the validate-dcom test to pass (it seems to ignore the parameter for the test directory entirely and always uses C:\ regardless), you must disable the new (Vista+) policy that prevents users from elevating their privileges over the network. Navigate to

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System

and create a new DWORD named LocalAccountTokenFilterPolicy with a value of 1. This will allow the delete_me.bat file to be created there.
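The registry change described above can be applied from an elevated command prompt instead of regedit; this sketch uses the same key, value name, and data as the comment:

```bat
reg add "HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System" /v LocalAccountTokenFilterPolicy /t REG_DWORD /d 1 /f
```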

However this then breaks again:

PS D:\private> d:\glassfish3\bin\asadmin.bat --passwordfile dcom-pw.txt validate-dcom -w glassfish -v=true qlsvrnode2
remote failure:
Successfully verified that the host, qlsvrnode2, is not the local machine as required.
Successfully resolved host name to: qlsvrnode2/192.168.9.11
Successfully connected to DCOM Port at port 135 on host qlsvrnode2.
Successfully connected to NetBIOS Session Service at port 139 on host qlsvrnode2.
Successfully connected to Windows Shares at port 445 on host qlsvrnode2.
Successfully accessed C: on qlsvrnode2 using DCOM.
Successfully wrote delete_me.bat to C: on qlsvrnode2 using DCOM.
Could not connect to WMI (Windows Management Interface) on qlsvrnode2. : Error setting up remote connection to WMI

This is not mentioned at all in the documentation, but it turns out you also need to change ownership and set permissions on the following registry key, in addition to the ones already listed:

HKEY_CLASSES_ROOT\CLSID\{76A64158-CB41-11D1-8B02-00600806D9B6}

Once this is accomplished, everything works as advertised.

I am not fond of the security implications, but it works and is more reliable than Cygwin+sshd.
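The ownership and permission change on that CLSID key can also be scripted; this is a sketch using Microsoft's SubInACL tool, under the assumptions that SubInACL is installed and that glassfish is the local account being granted access:

```bat
rem Take ownership of the WMI CLSID key, then grant the account full control
rem (flags per SubInACL's documented /keyreg syntax; verify on your system)
subinacl /keyreg "HKEY_CLASSES_ROOT\CLSID\{76A64158-CB41-11D1-8B02-00600806D9B6}" /setowner=glassfish
subinacl /keyreg "HKEY_CLASSES_ROOT\CLSID\{76A64158-CB41-11D1-8B02-00600806D9B6}" /grant=glassfish=f
```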

Comment by Byron Nevins [ 19/Apr/12 ]

Thanks for the excellent comments and work everyone. I'll try and address this problem soon.

Comment by lb54 [ 26/Apr/12 ]

Hi Byron.
Are there any plans to release this fix so far? Or is the "hack" described above the official solution?

Thanks for info.

Best wishes.

Basti

Comment by lb54 [ 16/May/12 ]

Hi there.
It seems that no one is working on this ticket right now.
Is there a chance to get a fix for this in the near future?
Unfortunately the "quick fix" described above does not work for me, so I need another workaround or this bug fixed.

Can anyone help me?

Thanks.

Basti

Comment by mtobler [ 25/Jun/12 ]

I have not been able to get this to work on a set of 2008 R2 servers which I am trying to cluster. Unfortunately I am unable to get the SSH functionality to work either, which leaves me with no clustering capability and wondering why we used GlassFish.
Is anyone going to work on this anytime soon?

I added the following to 18327 but am adding it here as requested:
asadmin> validate-dcom --passwordfile do-not-delete gf01
remote failure:
Successfully verified that the host, gf01, is not the local machine as required.
Successfully resolved host name to: gf01/172.18.11.169
Successfully connected to DCOM Port at port 135 on host gf01.
Successfully connected to NetBIOS Session Service at port 139 on host gf01.
Successfully connected to Windows Shares at port 445 on host gf01.
The remote file, C: doesn't exist on gf01 : Logon failure: unknown user name or bad password.

I am using a domain and the user is a domain user.

I have gone through every document I can find on this issue and have verified that all settings/registry keys/etc. are correct. I have tried this via asadmin and via the console and get the same result.

Comment by Byron Nevins [ 26/Jun/12 ]

Sorry I overlooked the activity on this issue. I'll try to look into it soon. mtobler – please document what you did/what happened etc. Are you using a Windows Domain?





[GLASSFISH-17983] Automate DCOM setup. Created: 13/Dec/11  Updated: 14/Dec/11

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b13
Fix Version/s: None

Type: Improvement Priority: Critical
Reporter: easarina Assignee: Byron Nevins
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Tags: 3_1_2_review

 Description   

The current DCOM instructions include a step that requires manual Windows registry editing. I don't think it would be good to recommend that customers edit the registry manually, so it looks to me like this step has to be automated.



 Comments   
Comment by sb110099 [ 14/Dec/11 ]

Upgrading the bug to P2, as it needs some evaluation and attention for 3.1.2 .
This manual step for DCOM support will need to be automated for customers from usability perspective.

-Sudipa





[GLASSFISH-18084] das.properties needs work Created: 23/Dec/11  Updated: 18/Feb/13

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b15, 4.0_b16
Fix Version/s: future release

Type: Bug Priority: Critical
Reporter: Byron Nevins Assignee: Byron Nevins
Resolution: Unresolved Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Dependency
blocks GLASSFISH-18094 Need Status if _create-instance-files... Resolved
Tags: 3_1_2-exclude

 Description   

Say I have this setup:

DAS is running on my laptop which is connected via VPN to a huge corporation. Let's say Oracle. My official hostname is "laptop"
The remote computer that will host instances is sitting at the Corporation behind the firewall.

Now I create a node and an instance on the remote computer using SSH or DCOM. That creates a file called

das.properties

Inside das.properties is the information to call back to the DAS. In my case here the hostname is laptop. There are several problems:

(1) the hostname "laptop" is useless. There is NO WAY the remote machine can find its way to my laptop with that name. It would have to have an IP address.

(2) There was NO HANDSHAKE when the instance was created! The command should have failed. The user has no idea that there is no way for the remote to call DAS back ever.

(3) this also happens across domains. E.g. if the 2 machines have these names that are in DNS – it still won't work:
somehost.in.oracle.com
another.us.oracle.com

Why? The domain gets chopped off. If 'another' is the remote it will look for DAS at
somehost.us.oracle.com

(4) I ran it with a secure DAS, yet isSecure is set to false in das.properties.
(5) The protocol is set to http. Shouldn't it be https?

WORKAROUND:
After das.properties is created, hand-edit the hostname to something that the remote machine can access.
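After the hand-edit, das.properties (typically found under the node's agent/config directory on the remote machine) would look something like this; the keys match those discussed in the description, and the IP address is a placeholder:

```properties
# was: agent.das.host=laptop  (unresolvable from the remote machine)
agent.das.host=10.0.0.5
agent.das.port=4848
agent.das.protocol=http
agent.das.isSecure=false
```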



 Comments   
Comment by kshitiz_saxena [ 04/Jan/12 ]

Instead of manual edit, set address for admin-listener in DAS :
asadmin set configs.config.server-config.network-config.network-listeners.network-listener.admin-listener.address=<YOUR IP ADDRESS>

This works.

Comment by Joe Di Pol [ 30/Jan/12 ]

Not a 3.1.2 stopper





[GLASSFISH-17327] Add devtests for update-node-ssh Created: 21/Sep/11  Updated: 09/Oct/11

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major
Reporter: Byron Nevins Assignee: Joe Di Pol
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

There are no devtests for update-node-ssh. I don't know about the other commands but we ought to have tests for all of them.

I found this out the usual way: I ran update-node-ssh and discovered I had broken something. I assumed the SSH devtests were exercising the command. Not so.

create-node-ssh
delete-node-ssh
list-nodes-ssh
ping-node-ssh
setup-ssh
update-node-ssh






[GLASSFISH-17911] update-node-com error message refers to SSH Created: 06/Dec/11  Updated: 06/Mar/12

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: None
Fix Version/s: not determined

Type: Bug Priority: Major
Reporter: Paul Davies Assignee: Byron Nevins
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Tags: 3_1_2-exclude

 Description   

An unsuccessful attempt to update a DCOM node displays an error message that refers to SSH:

asadmin> update-node-dcom -w hudson --nodehost  host.example.com xkyd
remote failure: Warning: some parameters appear to be invalid.
SSH node not updated. To force an update of the node with these parameters rerun
the command using the --force option.
com.sun.enterprise.universal.process.WindowsException: org.jinterop.dcom.common.
JIException: Access is denied, please check whether the [domain-username-password]
are correct. Also, if not already done please check the GETTING STARTED and
FAQ sections in readme.htm. They provide information on how to correctly configure
the Windows machine for DCOM access, so as to avoid such exceptions.  [0x00000005]
Command update-node-dcom failed.


 Comments   
Comment by Joe Di Pol [ 30/Jan/12 ]

Not a 3.1.2 stopper.

Comment by Tom Mueller [ 06/Mar/12 ]

Bulk update to change fix version to "not determined" for all issues still open but with a fix version for a released version.





[GLASSFISH-18707] delete-local-instance should NOT have a --domain parameter Created: 09/May/12  Updated: 19/Sep/14

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: 4.0_b38_ms2
Fix Version/s: 4.1

Type: Bug Priority: Major
Reporter: Byron Nevins Assignee: Byron Nevins
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

delete-local-instance is NOT a true local command. It demands that the instance's DAS be running. If it is not running then the command fails and does nothing.

Therefore it is just busy-work for the user to specify the name of the domain since we KNOW which domain it is by the usual host and port args.

Currently this issue is only on das-branch



 Comments   
Comment by Tom Mueller [ 07/Feb/13 ]

Setting the target fix version to 4.0.1 since this is not needed for the Java EE 7 RI/SDK release.





[GLASSFISH-19057] Potential file descriptor leaks in SSHLauncher Created: 05/Sep/12  Updated: 05/Sep/12

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Joe Di Pol Assignee: Joe Di Pol
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

From inspection it appears that SSHLauncher has potential file descriptor leaks, especially with the use of SCPClient. This mainly concerns the "connection" field, which is sometimes closed via SSHUtil.unregister() but at other times does not appear to be.

For example, if a consumer of SSHLauncher calls getSCPClient(), there does not appear to be any way to ever close the connection opened by that method.

In general the management of connections in SSHLauncher seems rather haphazard and should be looked at more closely.



 Comments   
Comment by Joe Di Pol [ 05/Sep/12 ]

One quick safety net may be to put a finalizer on SSHUtil to close all connections in activeConnections.

We may also want to put a close() method on SSHLauncher that closes all active connections in SSHUtil.
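The suggestion above can be sketched as a small registry that tracks every connection it hands out and can close them all at once; the class and method names here are illustrative, not the real SSHUtil/SSHLauncher API:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the proposed safety net: a registry of active connections
// with a single closeAll(), analogous to the suggested SSHLauncher.close().
public class ConnectionRegistry {
    // Stand-in for the real SSH connection type.
    static class Connection implements AutoCloseable {
        boolean open = true;
        @Override public void close() { open = false; }
    }

    private final List<Connection> active = new ArrayList<>();

    // Every connection handed out is registered, so it can never be orphaned.
    Connection open() {
        Connection c = new Connection();
        active.add(c);
        return c;
    }

    // Release everything still registered.
    void closeAll() {
        for (Connection c : active) c.close();
        active.clear();
    }

    public static void main(String[] args) {
        ConnectionRegistry r = new ConnectionRegistry();
        Connection a = r.open();
        Connection b = r.open();
        r.closeAll();
        // Prints whether any connection is still open after closeAll()
        System.out.println(a.open || b.open);
    }
}
```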





[GLASSFISH-18371] SSH: Do not allow running DAS on 4.0 and Remote Instances on 3.x Created: 15/Feb/12  Updated: 15/Apr/14

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: 4.0_b24
Fix Version/s: future release

Type: Bug Priority: Major
Reporter: Byron Nevins Assignee: Joe Di Pol
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to GLASSFISH-18366 create-dcom should detect GlassFish v... Resolved

 Description   

I have not checked but this is almost certainly the case. See 18366.

It is visible by nadmin not existing in a 3.x installation.
But the real problem is that we should forbid hetero-version clustering.



 Comments   
Comment by Byron Nevins [ 15/Feb/12 ]

the DCOM version of this issue





[GLASSFISH-18469] On the local and remote hosts was used the same user, but install-node-dcom failed without -w <user_name> Created: 07/Mar/12  Updated: 15/Feb/13

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b21
Fix Version/s: future release

Type: Bug Priority: Major
Reporter: easarina Assignee: Byron Nevins
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

3.1.2 build 21. Windows 2008 machines.

I believe that this is a regression issue. install-node-dcom failed because the -w <user_name> option was not used. See, for example:
asadmin install-node-dcom -W password1.txt bigapp-oblade-3
com.sun.enterprise.util.cluster.windows.process.WindowsException: Logon failure:
unknown user name or bad password.
Command install-node-dcom failed.

But the command: "asadmin install-node-dcom -w aroot -W password1.txt bigapp-oblade-3" was executed successfully.
The same user, aroot, was used on the DAS machine and on the remote host, and according to the help: "The default is the user that is running this subcommand." I.e., in my case, aroot.






[GLASSFISH-18437] install-node --force is too slow Created: 01/Mar/12  Updated: 18/Feb/13

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: future release
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Tom Mueller Assignee: Byron Nevins
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

The install-node --force command takes too long to remove all of the files from the remote host. It prints a "Force removing file..." message for each file that it is removing, and it appears that maybe it is doing a separate "scp" command to remove each file. It appears that it is removing the entire installation (including the domains and nodes directories, which it should not do). If so, a "rm -r" would be better than removing each file individually.

I was tempted to write this as an RFE, but the current implementation is too slow to be useful.

I observed this with an SSH node.






[GLASSFISH-18209] Endless SSH Network Timeout Created: 19/Jan/12  Updated: 14/Feb/13

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: 4.0_b19
Fix Version/s: future release

Type: Bug Priority: Major
Reporter: Byron Nevins Assignee: Joe Di Pol
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to GLASSFISH-18185 'update-node-ssh' command hangs when ... Open

 Description   

Endlessly long wait to connect to SSH – when there is no SSH daemon running.

SSHLauncher.java:

private void openConnection() throws IOException {
    boolean isAuthenticated = false;
    String message = "";
    connection = new Connection(host, port);
    connection.connect(new HostVerifier(knownHostsDatabase));
}

the connection.connect() call is endless.

While looking at 18185 I saw this as follows:

The remote system is Windows. It is a DCOM node and update-node-ssh is called on it. The above method tries to connect to an sshd at host:135.

It seems to take "forever". I quit waiting and forcibly killed GF to get out of the state.

Recommendation: for an "ssh-ping", 5 or 10 seconds should be plenty.
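Such an "ssh-ping" can be sketched with a plain bounded TCP connect, independent of the SSH library; the class name, timeout value, and target address below are illustrative:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Sketch of a bounded pre-check: fail fast if nothing answers on the port,
// instead of letting the SSH connect block indefinitely.
public class SshPing {
    static boolean reachable(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, unreachable, or timed out within the bound
        }
    }

    public static void main(String[] args) {
        // 192.0.2.1 is a documentation-only address; the call returns false
        // within the 2-second bound rather than hanging.
        System.out.println(reachable("192.0.2.1", 22, 2000));
    }
}
```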






[GLASSFISH-18206] Endless Network Timeout Created: 18/Jan/12  Updated: 18/Feb/13

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b17, 4.0_b19
Fix Version/s: future release

Type: Bug Priority: Major
Reporter: Byron Nevins Assignee: Byron Nevins
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Tags: 3_1_2-exclude

 Description   

GF admin clustering tasks are far too subject to ridiculously long network timeouts. E.g., in this bug we wait a full 10 minutes to get output from "asadmin version".

How to reproduce:

0. Remote windows box has glassfish installed.
1. validate-dcom works fine from (different) DAS machine
2. Sabotage asadmin.bat on the remote machine so that it hangs. Easy! See [1] below
3. Note how the command hangs for a very very long time.

[1] Add this to asadmin.bat's java call
-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=1234

=========================
How did I find this out? Adding the line in [1] is an essential trick for debugging remote GF calls. I forgot about it and left it in.






[GLASSFISH-18185] 'update-node-ssh' command hangs when ssh port is not provided. Created: 13/Jan/12  Updated: 25/Jan/12

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b17
Fix Version/s: None

Type: Bug Priority: Major
Reporter: lidiam Assignee: Byron Nevins
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

ogs-3.1.2-b17.zip


Attachments: JPEG File dcom-to-ssh-error.JPG     Text File server.log.txt    
Issue Links:
Related
is related to GLASSFISH-18209 Endless SSH Network Timeout Open
Tags: 312_gui_new, 312_qa, 3_1_2-exclude, 3_1_2-release-note-added, 3_1_2-release-notes

 Description   

Currently it's not possible to convert a DCOM node to an SSH node in Admin Console. For an existing DCOM node, when user changes to SSH (and selects password authentication in my case), and hits Save, the long running process popup is there for a long time, about 15 minutes. Then the following error is displayed:

An error has occurred
Check server log for more information.

The server.log file contains:

[#|2012-01-12T17:46:55.953-0800|WARNING|glassfish3.1.2|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin.cluster|_ThreadID=97;_ThreadName=Thread-2;|Could not connect to host jed-asqe-43 using SSH.: There was a problem while connecting to jed-asqe-43:135: Operation interrupted: host=jed-asqe-43 port=135 user=j2eetest password=<concealed> keyFile=/export/home/j2eetest/.ssh/id_rsa keyPassPhrase=<concealed> authType=null knownHostFile=/export/home/j2eetest/.ssh/known_hosts|#]

[#|2012-01-12T17:46:55.953-0800|SEVERE|glassfish3.1.2|org.glassfish.admingui|_ThreadID=98;_ThreadName=Thread-2;|java.io.InterruptedIOException: Operation interrupted;
java.io.InterruptedIOException: Operation interrupted;
restRequest: endpoint=https://localhost:4848/management/domain/nodes/node/jedy/update-node-ssh
attrs={sshpassword=*******, installdir=C:\as\dcomtest\glassfish3, nodehost=jed-asqe-43, sshuser=${user.name}}
method=POST|#]

The conversion works in CLI with the following command:

asadmin update-node-ssh --nodehost <host name> --sshport 22 --sshuser <user name> <node name>

However, if I execute the following in CLI, to convert DCOM to SSH node, it also hangs and then fails:

asadmin update-node-ssh --nodehost <host name> <node name>

Thus my guess is that Admin Console does not pass along the other two options (that should not be required). Assigning to Admin Console to verify and pass on to CLI, if this is the case. I understand this will most likely not get fixed for this release.



 Comments   
Comment by Anissa Lam [ 13/Jan/12 ]

As shown in the log Lidia pasted, the REST request sent is:

restRequest: endpoint=https://localhost:4848/management/domain/nodes/node/jedy/update-node-ssh
attrs={sshpassword=*******, installdir=C:\as\dcomtest\glassfish3, nodehost=jed-asqe-43, sshuser=${user.name}}
method=POST|#]
so the console is sending in all the info that the user enters. At that time, Lidia probably didn't set the ssh port. I verified that if sshport is specified, it will be sent in as well. I believe the console is doing the correct thing.

Transfer to backend for evaluation.

Comment by lidiam [ 13/Jan/12 ]

That's correct, I did not set the ssh port since the default is always set to 22 and there was no need to change that.

An interesting thing to note: if I create a new SSH node, the ssh port field is prepopulated with 22. If I choose to convert a CONFIG node in the Admin Console, the ssh port is populated with 22; however, when I choose to convert a DCOM node to SSH, the ssh port field is not populated, so there is an inconsistency here. In fact, if I enter the ssh port in the Admin Console when converting a DCOM node to SSH, it works fine. We can document that as a workaround.

There are two issues here then:

1. ssh port is not populated when switching from DCOM to SSH node.
2. update-node-ssh command hangs when ssh port is not provided.

Comment by Anissa Lam [ 13/Jan/12 ]

I will file a separate P4 bug about sshport not populated when converting from DCOM to SSH.

I changed the summary of this bug to correctly reflect the issue.

This affects both CLI and GUI. I don't know how often users will convert a DCOM node to an SSH node, but if we decide to release-note this, the workaround is:

  • ensure the 'sshport' param is specified if using the CLI
  • fill in the port number (default is 22) when doing it in the GUI.
Comment by Byron Nevins [ 18/Jan/12 ]

The problem is that a DCOM node has the "sshport" set to 135. When you run update-node-ssh it can't tell these two scenarios apart:

1) You're running the command on an existing SSH node that happens to use port 135 instead of 22.
2) You're converting from a DCOM node, which always has 135 set as the port number.

============

This is a gray area. Technically the software did precisely what you asked it to do – it updated the node config with the data you gave it and only the data you gave it. That's how it was designed.

It should be fixed in 4.0.

Recommended Fix

DCOM shouldn't bother with the port setting of 135. That's at a much lower abstraction level; we never have to deal with the port number. DCOM should simply never use the sshport field.





[GLASSFISH-18083] Help can be wrong Created: 23/Dec/11  Updated: 14/Feb/13

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.1, 4.0_b01
Fix Version/s: future release

Type: Bug Priority: Major
Reporter: Byron Nevins Assignee: Joe Di Pol
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to GLASSFISH-17421 DAS sometimes uses wrong DAS hostname... Open
Tags: 3_1_2-exclude

 Description   

I ran create-instance from a remote computer using a "config" node (not DCOM or SSH!)

The instance goo1 was registered with the DAS. You must run the following command on the instance host to complete the instance creation:
bin/asadmin --host WNEVINS-LAP --port 4848 create-local-instance --node mynode goo1
Command create-instance executed successfully.

============

Unfortunately "wnevins-lap" is Greek to the remote computer. It could find the DAS via IP address, but there is ZERO chance of finding it with that name.

Scenario: DAS is on a laptop connected to the OWAN via VPN; the remote machine is hard-wired on the OWAN.

Recommend changing or supplanting the host address.



 Comments   
Comment by Joe Di Pol [ 29/Dec/11 ]

The value for the --host option is returned by:

Server.getAdminHost()

It appears that in the submitter's case this method is not returning a valid hostname for their Windows laptop OS configuration. For Windows machines configured to operate as a server (static IP and hostname assigned), this usually works fine.

This problem is related to issue GLASSFISH-17421. The work-around is to configure the IP address explicitly for the admin adaptor on the DAS (instead of using 0.0.0.0).

Will not address in 3.1.2
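The workaround can be sketched as follows; the IP address is a placeholder for the DAS machine's actual address:

```shell
# Bind the DAS admin listener to an explicit IP instead of 0.0.0.0 so that
# the hostname handed to instances is replaced by a reachable address:
asadmin set configs.config.server-config.network-config.network-listeners.network-listener.admin-listener.address=10.0.0.5
```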

Comment by gfuser9999 [ 03/Apr/12 ]

Actually the admin-listener is the one that determines the contact name of the DAS, so all the information is there, since the admin-listener permits one to set the "server-name". Unfortunately, even if this is set, the code does not make use of it.

In the same vein, when accessing, say, http://das.foo.com:4848, the server issues a redirect without the FQDN and goes to "https://das:4848" if the HTTP/1.0 protocol is forced (same issue here: server-name is not made use of).





[GLASSFISH-18121] Intermittent: cannot start instance on a remote node: server requires a password Created: 05/Jan/12  Updated: 10/Jan/12

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b16
Fix Version/s: None

Type: Bug Priority: Major
Reporter: lidiam Assignee: lidiam
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

ogs-3.1.2-b16.zip, DAS on solaris, node on WinXP


Attachments: Text File server.log     Text File server.log.clusteredinstance.txt    
Tags: 312_gui_new, 312_qa, 3_1_2-exclude

 Description   

This is an intermittent issue, but it has happened twice already. I created a standalone instance on a DCOM node and cannot start it, with the following in the instance's server.log:

[#|2012-01-04T15:44:32.265-0800|SEVERE|glassfish3.1.2|javax.enterprise.system.to
ols.admin.com.sun.enterprise.container.common|_ThreadID=10;_ThreadName=Thread-2;

The server requires a valid admin password to be set before it can start. Pleas
e set a password using the change-admin-password command.
#]

I have a cluster with two instances on the same node running fine. I also created another standalone instance, and it started fine. I'm accessing the Admin Console on Solaris from the Windows box, so secure admin and the domain password are both set. I'll attach server.log.



 Comments   
Comment by lidiam [ 05/Jan/12 ]

I just got this error for the third time, but this time with an SSH node. I had a cluster with two instances, one on localhost and one on an SSH node (Solaris). The cluster was running fine. I stopped it and added another instance on the SSH node. When I tried to start the cluster, the newly added server instance failed to start with the following error in the Admin Console:

Command succeeded with Warning
clt2: Could not start instance clt2 on node tuppy (tuppy). Command failed on node tuppy (tuppy): Warning: Synchronization with DAS failed, continuing startup... Waiting for clt2 to start .....................................Command start-local-instance failed. Error starting instance clt2. The server exited prematurely with exit code 0. Before it died, it produced the following output: Launching GlassFish on Felix platform Jan 4, 2012 10:31:54 PM com.sun.common.util.logging.LoggingConfigImpl copyLoggingPropertiesFile WARNING: Logging.properties file not found, creating new file using DAS logging.properties [#|2012-01-04T22:31:55.037-0800|INFO|glassfish3.1.2|com.sun.enterprise.server.logging.GFFileHandler|_ThreadID=1;_ThreadName=main;|Running GlassFish Version: GlassFish Server Open Source Edition 3.1.2-b16 (build 16)|#] [#|2012-01-04T22:31:57.647-0800|INFO|glassfish3.1.2|javax.enterprise.system.core.transaction.com.sun.jts.CosTransactions|_ThreadID=10;_ThreadName=main;|JTS5014: Reco .... msg.seeServerLog

Instance's server.log contained the same error and exception:

[#|2012-01-04T22:31:59.029-0800|SEVERE|glassfish3.1.2|javax.enterprise.system.to
ols.admin.com.sun.enterprise.container.common|_ThreadID=10;_ThreadName=Thread-2;

The server requires a valid admin password to be set before it can start. Pleas
e set a password using the change-admin-password command.
#]

[#|2012-01-04T22:31:59.031-0800|SEVERE|glassfish3.1.2|javax.enterprise.system.co
re.com.sun.enterprise.v3.services.impl|_ThreadID=10;_ThreadName=Thread-2;|Unable
to start v3. Closing all ports
org.jvnet.hk2.component.ComponentException: injection failed on com.sun.enterpri
se.v3.admin.AdminAdapter.authenticator with interface org.glassfish.internal.api
.AdminAccessController
at org.jvnet.hk2.component.InjectionManager.error_injectionException(Inj
ectionManager.java:284)
at org.jvnet.hk2.component.InjectionManager.inject(InjectionManager.java
:165)
at org.jvnet.hk2.component.InjectionManager.inject(InjectionManager.java
:93)
at com.sun.hk2.component.AbstractCreatorImpl.inject(AbstractCreatorImpl.
java:126)
at com.sun.hk2.component.ConstructorCreator.initialize(ConstructorCreato
r.java:91)

Comment by lidiam [ 05/Jan/12 ]

Attaching log file for clustered instance on an ssh node that fails to start.

Comment by Byron Nevins [ 09/Jan/12 ]

This will take more time to hunt down than is available before HCF for 3.1.2.

Comment by Byron Nevins [ 10/Jan/12 ]

I can't reproduce this. Please do the following; we need to make sure it's a DCOM issue.

When it happens again, simply run start-local-instance directly on the remote machine, with the --verbose option.

What does it say?





[GLASSFISH-17311] SSH - Junk Processes Can Pile Up Created: 16/Sep/11  Updated: 16/Nov/11

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Byron Nevins Assignee: Joe Di Pol
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

CreateRemoteNodeCommand (which until recently was CreateSshNodeCommand) creates an asadmin process and sets a (huge!) timeout.

ProcessManager does not kill the process; the caller needs to do that. But CreateRemoteNodeCommand does NOT kill the hung process. If you start something that hangs, the process will live on forever – or at least until the next reboot.

Solution:

Easy! Don't just catch ProcessManagerException; also catch ProcessManagerTimeoutException, which tells you that the process timed out. Then destroy the spawned process.

– I'd just make the change, but I'm not 100% sure whether you had some reason for letting the process run on forever.
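The pattern described above, catching the timeout and destroying the child, can be sketched with plain JDK process APIs. GlassFish's ProcessManager and its exception types are not reproduced here; `runWithTimeout` is a hypothetical helper standing in for the catch-and-destroy logic:

```java
import java.util.concurrent.TimeUnit;

public class TimeoutKill {
    // Launch a command; if it exceeds timeoutSeconds, destroy it instead of
    // leaving it running (the bug: the hung child lived on until reboot).
    public static boolean runWithTimeout(long timeoutSeconds, String... cmd)
            throws Exception {
        Process p = new ProcessBuilder(cmd).start();
        boolean finished = p.waitFor(timeoutSeconds, TimeUnit.SECONDS);
        if (!finished) {
            p.destroy();   // analogous to handling the timeout exception
            p.waitFor();   // reap the killed child
        }
        return finished;
    }

    public static void main(String[] args) throws Exception {
        // "sleep 10" overruns our 1-second budget; verify it gets killed.
        boolean finished = runWithTimeout(1, "sleep", "10");
        System.out.println(finished ? "finished" : "timed out and destroyed");
    }
}
```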



 Comments   
Comment by Byron Nevins [ 16/Sep/11 ]

Note that this is easy to reproduce – simply call

asadmin create-node-ssh

on a Windows machine that has no SSH daemon

Comment by Byron Nevins [ 03/Nov/11 ]

Note that the issue only occurs when the --install option is given.

Comment by Byron Nevins [ 03/Nov/11 ]

ProcessManager definitely calls destroy() on the Process object before throwing a timeout exception.

I think I originally saw this problem when I was killing asadmin itself (the timeout is a full 5 minutes, after all!). There is no code that kills launched processes when the caller is abruptly killed on Windows.

To fix this well would require a shutdown hook.
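A minimal sketch of such a shutdown hook, using only JDK APIs (the `launch` helper and class name are illustrative, not GlassFish code):

```java
import java.io.IOException;

public class HookDemo {
    // Start a child process and register a JVM shutdown hook that destroys it.
    // Hooks run on normal exit and on SIGTERM/Ctrl+C, but NOT on a hard kill
    // (SIGKILL), so this mitigates rather than fully fixes the problem.
    public static Process launch(String... cmd) throws IOException {
        final Process child = new ProcessBuilder(cmd).start();
        Runtime.getRuntime().addShutdownHook(new Thread(child::destroy));
        return child;
    }
}
```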

Comment by Byron Nevins [ 03/Nov/11 ]

This may be obscure enough and difficult enough to drop to P4. I.e. the payoff isn't worth the effort, IMO.





[GLASSFISH-17739] create-instance fails when DAS on Linux, instance on Windows and using --nodedir Created: 15/Nov/11  Updated: 30/Nov/11

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.1_b12
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: bthalmayr Assignee: Joe Di Pol
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows 2008 R2 64bit, Java(TM) SE Runtime Environment (build 1.7.0_01-b08), cygwin



 Description   

First of all, 'ssh' and 'scp' work fine using public key auth from the 'DAS' server to the 'node' server (the 'DAS' resides on RHEL, the 'node' on Windows 2008 R2 with cygwin sshd).

'asadmin create-node-ssh' works fine...
asadmin --user <user> --passwordfile=<pwd-file> --port <port> create-node-ssh --nodehost <nodehost> --installdir c:/sun/glassfish3 --nodedir c:/sun/glassfish3/nodes --sshuser <runtime-user> --sshkeyfile <keyfile-for-runtime-user> --install=false <node-name>

'asadmin create-instance' fails ...

asadmin --user <user> --passwordfile=<pwd-file> --port <port> create-instance --node <node-name> --config <config> <instance-name>
Successfully created instance <instance-name> in the DAS configuration, but failed to create the instance files on node <node-name> (<node-fqdn>).

Command failed on node <node-name> (<node-fqdn>): cygwin warning:
MS-DOS style path detected: c:/sun/glassfish3/glassfish/bin/asadmin
Preferred POSIX equivalent is: /cygdrive/c/sun/glassfish3/glassfish/bin/asadmin
CYGWIN environment variable option "nodosfilewarning" turns off this warning.
Consult the user's guide for more details about POSIX paths:
http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
Node directory c:\sun\glassfish3\glassfish\c:\sun\glassfish3\nodes does not exist or is not a directory
Command _create-instance-filesystem failed.

admin-logger on 'DAS' shows...
..

[#|2011-11-15T18:18:07.167+0100|FINE|glassfish3.1.1|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin.cluster|_ThreadID=3895;_ThreadName=Thread-2;ClassName=org.glassfish.cluster.ssh.connect.NodeRunner;MethodName=trace;|NodeRunner: Running command on <node-fqdn>: c:/sun/glassfish3/glassfish/bin/asadmin --_auxinput - --interactive=false --host <das-fqdn> --port <das-port> _validate-das-options --nodedir c:/sun/glassfish3/glassfish/c:/sun/glassfish3/nodes --node <node-name> <instance-name>|#]

...

[#|2011-11-15T18:18:07.170+0100|FINER|glassfish3.1.1|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin.cluster|_ThreadID=3895;_ThreadName=Thread-2;ClassName=org.glassfish.cluster.ssh.launcher.SSHLauncher;MethodName=runCommand;|Running command c:/sun/glassfish3/glassfish/bin/asadmin --_auxinput - --interactive=false --host <das-fqdn> --port <das-port>_validate-das-options --nodedir c:/sun/glassfish3/glassfish/c:/sun/glassfish3/nodes --node <node-name> <instance-name> on host: <node-fqdn>|#]

...

[#|2011-11-15T18:18:08.534+0100|INFO|glassfish3.1.1|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin.cluster|_ThreadID=3895;_ThreadName=Thread-2;|cygwin warning:
MS-DOS style path detected: c:/sun/glassfish3/glassfish/bin/asadmin
Preferred POSIX equivalent is: /cygdrive/c/sun/glassfish3/glassfish/bin/asadmin
CYGWIN environment variable option "nodosfilewarning" turns off this warning.
Consult the user's guide for more details about POSIX paths:
http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
Command _validate-das-options executed successfully.|#]

...

[#|2011-11-15T18:18:08.538+0100|FINE|glassfish3.1.1|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin.cluster|_ThreadID=3895;_ThreadName=Thread-2;ClassName=org.glassfish.cluster.ssh.connect.NodeRunner;MethodName=trace;|NodeRunner: Running command on <node-fqdn>: c:/sun/glassfish3/glassfish/bin/asadmin --_auxinput - --interactive=false --host <das-fqdn> --port <das-port> _create-instance-filesystem --nodedir c:/sun/glassfish3/glassfish/c:/sun/glassfish3/nodes --node <node-name> <instance-name>|#]

...

[#|2011-11-15T18:18:08.540+0100|FINER|glassfish3.1.1|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin.cluster|_ThreadID=3895;_ThreadName=Thread-2;ClassName=org.glassfish.cluster.ssh.launcher.SSHLauncher;MethodName=runCommand;|Running command c:/sun/glassfish3/glassfish/bin/asadmin --_auxinput - --interactive=false --host <das-fqdn> --port 4849 _create-instance-filesystem --nodedir c:/sun/glassfish3/glassfish/c:/sun/glassfish3/nodes --node <node-name> <instance-name> on host: <node-fqdn>|#]

...

[#|2011-11-15T18:18:09.915+0100|WARNING|glassfish3.1.1|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin.cluster|_ThreadID=3895;_ThreadName=Thread-2;|Successfully created instance <instance-name> in the DAS configuration, but failed to create the instance files on node <node-name> (<node-fqdn>).: Command ' c:/sun/glassfish3/glassfish/bin/asadmin --_auxinput - --interactive=false --host <das-fqdn> --port <das-port> _create-instance-filesystem --nodedir c:/sun/glassfish3/glassfish/c:/sun/glassfish3/nodes --node <node-name> <instance-name>' failed on node <node-name> (<node-fqdn>): cygwin warning:
MS-DOS style path detected: c:/sun/glassfish3/glassfish/bin/asadmin
Preferred POSIX equivalent is: /cygdrive/c/sun/glassfish3/glassfish/bin/asadmin
CYGWIN environment variable option "nodosfilewarning" turns off this warning.
Consult the user's guide for more details about POSIX paths:
http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
Node directory c:\sun\glassfish3\glassfish\c:\sun\glassfish3\nodes does not exist or is not a directory^M

Of course, the directory 'c:\sun\glassfish3\glassfish\c:\sun\glassfish3\nodes' does not exist.

It seems that the values for '--installdir' and '--nodedir' are concatenated.



 Comments   
Comment by bthalmayr [ 15/Nov/11 ]

Using 'cygwin'-style pathnames, 'create-instance' fails as well, but with a different error.

'DAS' log shows ...

[#|2011-11-15T18:58:14.418+0100|INFO|glassfish3.1.1|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin.cluster|_ThreadID=4735;_ThreadName=Thread-2;|Using DAS host <das-fqdn> and port <das-port> from existing das.properties for node
<node-name>. To use a different DAS, create a new node using create-node-ssh or
create-node-config. Create the instance with the new node and correct
host and port:
asadmin --host das_host --port das_port create-local-instance --node node_name instance_name.
Command _create-instance-filesystem executed successfully.|#]

[#|2011-11-15T18:58:15.323+0100|SEVERE|glassfish3.1.1|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin.cluster|_ThreadID=4735;_ThreadName=Thread-2;|Successfully created instance <instance-name> in the DAS configuration, but failed to install configuration files for the instance on node <node-fqdn> during bootstrap.

SSH configuration information

Additional failure info: java.io.IOException: /cygdrive/c/sun/glassfish3/nodes/gf-wnode1/.
com.sun.enterprise.v3.admin.cluster.SecureAdminBootstrapHelper$BootstrapException: java.io.IOException: /cygdrive/c/sun/glassfish3/nodes/<node-name>/
at com.sun.enterprise.v3.admin.cluster.SecureAdminBootstrapHelper.bootstrapInstance(SecureAdminBootstrapHelper.java:172)
at com.sun.enterprise.v3.admin.cluster.CreateInstanceCommand.bootstrapSecureAdminRemotely(CreateInstanceCommand.java:337)
at com.sun.enterprise.v3.admin.cluster.CreateInstanceCommand.createInstanceFilesystem(CreateInstanceCommand.java:432)
at com.sun.enterprise.v3.admin.cluster.CreateInstanceCommand.execute(CreateInstanceCommand.java:239)
at com.sun.enterprise.v3.admin.CommandRunnerImpl$1.execute(CommandRunnerImpl.java:355)
at com.sun.enterprise.v3.admin.CommandRunnerImpl.doCommand(CommandRunnerImpl.java:370)
at com.sun.enterprise.v3.admin.CommandRunnerImpl.doCommand(CommandRunnerImpl.java:1045)
at com.sun.enterprise.v3.admin.CommandRunnerImpl.access$1200(CommandRunnerImpl.java:96)
at com.sun.enterprise.v3.admin.CommandRunnerImpl$ExecutionContext.execute(CommandRunnerImpl.java:1244)
at com.sun.enterprise.v3.admin.CommandRunnerImpl$ExecutionContext.execute(CommandRunnerImpl.java:1232)
at com.sun.enterprise.v3.admin.AdminAdapter.doCommand(AdminAdapter.java:459)
at com.sun.enterprise.v3.admin.AdminAdapter.service(AdminAdapter.java:209)
at com.sun.grizzly.tcp.http11.GrizzlyAdapter.service(GrizzlyAdapter.java:168)
at com.sun.enterprise.v3.server.HK2Dispatcher.dispath(HK2Dispatcher.java:117)
at com.sun.enterprise.v3.services.impl.ContainerMapper.service(ContainerMapper.java:238)
at com.sun.grizzly.http.ProcessorTask.invokeAdapter(ProcessorTask.java:828)
at com.sun.grizzly.http.ProcessorTask.doProcess(ProcessorTask.java:725)
at com.sun.grizzly.http.ProcessorTask.process(ProcessorTask.java:1019)
at com.sun.grizzly.http.DefaultProtocolFilter.execute(DefaultProtocolFilter.java:225)
at com.sun.grizzly.DefaultProtocolChain.executeProtocolFilter(DefaultProtocolChain.java:137)
at com.sun.grizzly.DefaultProtocolChain.execute(DefaultProtocolChain.java:104)
at com.sun.grizzly.DefaultProtocolChain.execute(DefaultProtocolChain.java:90)
at com.sun.grizzly.http.HttpProtocolChain.execute(HttpProtocolChain.java:79)
at com.sun.grizzly.ProtocolChainContextTask.doCall(ProtocolChainContextTask.java:54)
at com.sun.grizzly.SelectionKeyContextTask.call(SelectionKeyContextTask.java:59)
at com.sun.grizzly.ContextTask.run(ContextTask.java:71)
at com.sun.grizzly.util.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:532)
at com.sun.grizzly.util.AbstractThreadPool$Worker.run(AbstractThreadPool.java:513)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.IOException: /cygdrive/c/sun/glassfish3/nodes/<node-name>/
at com.sun.enterprise.v3.admin.cluster.SecureAdminBootstrapHelper$RemoteHelper.mkdirs(SecureAdminBootstrapHelper.java:268)
at com.sun.enterprise.v3.admin.cluster.SecureAdminBootstrapHelper.mkdirs(SecureAdminBootstrapHelper.java:178)
at com.sun.enterprise.v3.admin.cluster.SecureAdminBootstrapHelper.bootstrapInstance(SecureAdminBootstrapHelper.java:168)
... 28 more
Caused by: com.trilead.ssh2.SFTPException: No such file (SSH_FX_NO_SUCH_FILE: A reference was made to a file which does not exist.)
at com.trilead.ssh2.SFTPv3Client.statBoth(SFTPv3Client.java:441)
at com.trilead.ssh2.SFTPv3Client.lstat(SFTPv3Client.java:471)
at com.sun.enterprise.v3.admin.cluster.SecureAdminBootstrapHelper$RemoteHelper.mkdirs(SecureAdminBootstrapHelper.java:266)
... 30 more

#]

'/cygdrive/c/sun/glassfish3/nodes/<node-name>/' does not exist on the 'Windows' server.

Running 'ls -ld /cygdrive/c/sun/glassfish3/nodes' shows it is owned by the 'runtime-user', which is (or should be) used by GlassFish.

Comment by Byron Nevins [ 16/Nov/11 ]

SSH issue

Comment by Joe Di Pol [ 29/Nov/11 ]

Running instances and the DAS on systems with different OS types is not supported, and that may be contributing to the problem. That said, we should investigate this to see what is going on. One workaround to try is to not specify the nodedir at all; it will default to the nodes directory under the installdir.

I'm also surprised that using the cygwin POSIX path did not work.

I'm lowering the priority because this is technically not a supported configuration.

Comment by bthalmayr [ 29/Nov/11 ]

I don't think it matters whether the DAS resides on a different OS or not; Cygwin (or MKS) has to be used anyway in a Windows environment.

Every sample I've seen so far (on the numerous wikis) does not specify the --nodedir option. Has it really been tested?

I can confirm that using a Cygwin-style path and not specifying the --nodedir option works.

BTW, could you please point me to the location in the docs where it's mentioned that running the servers on different OSes is not a supported configuration?

Comment by Joe Di Pol [ 29/Nov/11 ]

The Deployment Planning Guide at http://docs.oracle.com/cd/E18930_01/html/821-2419/abfay.html#abfbc
has this note:

"Note - All hosts in a cluster on which the DAS and GlassFish Server instances are running must have the same operating system."

And yes we have tests that run with nodedir so it does work, at least in the scenarios we are testing.

I'll investigate this further.

Comment by Joe Di Pol [ 30/Nov/11 ]

What is happening is that the DAS (running on unix) interprets the DOS style nodedir path, "c:\sun\glassfish3\nodes", as a relative path. So it prepends the installdir to it and ends up with a bogus path.

Using the cygwin path works around this problem, but then fails because (I think) the DAS uses the scp or sftp client to copy over some data to the instance, and scp/sftp may not result in a cygwin shell at the DOS end and therefore the cygwin path is not understood.

Running the DAS on the same OS as the instance avoids these problems, and that's why that is the supported configuration.

We could use a heuristic to detect DOS paths when running on Unix, but that would be a bit fragile.
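Such a heuristic might look like the following sketch; the class and method names are hypothetical, not GlassFish code. The point is that `java.io.File.isAbsolute()` on Unix would report a DOS-style path as relative, which is exactly how the installdir got prepended:

```java
public class DosPath {
    // Heuristic: treat "c:\..." or "c:/..." as an absolute DOS path even when
    // the DAS runs on Unix, where File.isAbsolute() would return false and the
    // path would (wrongly) be resolved relative to the installdir.
    public static boolean looksLikeDosAbsolutePath(String path) {
        return path != null && path.matches("(?i)^[a-z]:[/\\\\].*");
    }
}
```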

In any case, this is a lower priority bug since it is essentially an unsupported configuration.





[GLASSFISH-18441] install-node doesn't handle bin/pkg and bin/updatetool symlinks Created: 01/Mar/12  Updated: 01/Mar/12

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b23
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Joe Di Pol Assignee: Yamini K B
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to UPDATECENTER2-2212 Remove use of symlinks Open

 Description   

On Unix platforms, the following files are symlinks (after update center bootstrap):

glassfish3/bin/pkg
glassfish3/bin/updatetool

If you then do an install-node, these symlinks are replaced with the actual files in ../pkg/bin/pkg/ on the remote system. That means any relative paths referred to in the scripts are not found.

To reproduce:

  • Unzip glassfish.zip
  • Run bin/pkg to install the pkg packages
  • Run install-node to install GlassFish on another system
  • On the other system try to run glassfish3/bin/pkg and you'll get an error:
    $ glassfish3/bin/pkg
    cd glassfish3
    bin/pkg[228]: /var/tmp/dipol/glassfish3/bin/../python2.4-minimal/bin/python: 
        not found [No such file or directory]
    

The work-around is to use the actual scripts in glassfish3/pkg/bin:

cd glassfish3
pkg/bin/pkg list
...

You can also repair the broken file by running "fix" on the pkg packages:

cd glassfish3
pkg/bin/pkg fix pkg

This will complain about a number of permission issues, but will in the end repair the file.



 Comments   
Comment by Joe Di Pol [ 01/Mar/12 ]

One option to fix this is to change UC to not use symlinks.





[GLASSFISH-18182] Error message too long, hard to read in Admin Console Created: 13/Jan/12  Updated: 13/Jan/12

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b17
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: lidiam Assignee: Byron Nevins
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

build ogs-3.1.2-b17.zip


Attachments: JPEG File node-create-error.JPG    
Tags: 312_qa

 Description   

Currently, when a user tries to install a node on a remote host in a directory where GlassFish is already installed, the following is printed in the Admin Console:

An error has occurred
Successfully connected to j2eetest@tuppy using keyfile /export/home/j2eetest/.ssh/id_rsa Command install-node-ssh failed. Ignoring unrecognized element schedules at Line number = 57 Column number = 18 System Id = file:/export/home/j2eetest/3.1.2/glassfish3/glassfish/domains/domain1/config/domain.xml Public Id = null Location Uri= file:/export/home/j2eetest/3.1.2/glassfish3/glassfish/domains/domain1/config/domain.xml CharacterOffset = 3425 Ignoring unrecognized element backup-configs at Line number = 62 Column number = 23 System Id = file:/export/home/j2eetest/3.1.2/glassfish3/glassfish/domains/domain1/config/domain.xml Public Id = null Location Uri= file:/export/home/j2eetest/3.1.2/glassfish3/glassfish/domains/domain1/config/domain.xml CharacterOffset = 3634 The remote installation directory, /export/home/j2eetest/3.1.2/glassfish3, already exists. Use the --force option to overwrite it.

It is hard to see the actual cause of the problem. We should, at a minimum, print the last sentence on a line by itself (option 1), and ideally not include the intermediate information at all (option 2). The result would be:

1.
An error has occurred
Successfully connected to j2eetest@tuppy using keyfile /export/home/j2eetest/.ssh/id_rsa Command install-node-ssh failed. Ignoring unrecognized element schedules at Line number = 57 Column number = 18 System Id = file:/export/home/j2eetest/3.1.2/glassfish3/glassfish/domains/domain1/config/domain.xml Public Id = null Location Uri= file:/export/home/j2eetest/3.1.2/glassfish3/glassfish/domains/domain1/config/domain.xml CharacterOffset = 3425 Ignoring unrecognized element backup-configs at Line number = 62 Column number = 23 System Id = file:/export/home/j2eetest/3.1.2/glassfish3/glassfish/domains/domain1/config/domain.xml Public Id = null Location Uri= file:/export/home/j2eetest/3.1.2/glassfish3/glassfish/domains/domain1/config/domain.xml CharacterOffset = 3634

The remote installation directory, /export/home/j2eetest/3.1.2/glassfish3, already exists. Use the --force option to overwrite it.

2.
An error has occurred

Successfully connected to j2eetest@tuppy using keyfile /export/home/j2eetest/.ssh/id_rsa Command install-node-ssh failed.

The remote installation directory, /export/home/j2eetest/3.1.2/glassfish3, already exists. Use the --force option to overwrite it.

I'm attaching a screenshot of the error message. I understand it is too late to fix this issue for this release, hence logging it as minor. This issue surfaced after fixing http://java.net/jira/browse/GLASSFISH-18037.






[GLASSFISH-17947] Add Copious Output Text with How-To-Config-Windows Info Created: 09/Dec/11  Updated: 18/Jan/12

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b13, 4.0_b14
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Byron Nevins Assignee: Paul Davies
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

Windows configuration is difficult, and the documentation should be easy to find.

Paul - what do you think? Details upon a failed run of the command? Or details with --help?
Both?

Let's discuss. Perhaps I should spit out some doc when the command fails. Can you spruce up what I've documented?






[GLASSFISH-18078] glassfish doesn't detect version differences when running remote commands via SSH Created: 22/Dec/11  Updated: 22/Dec/11

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2, 4.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Tom Mueller Assignee: Joe Di Pol
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

If a 4.0 DAS tries to create an instance on a node where 3.1.2 is installed, it will try to use "nadmin" rather than "asadmin" to run the _create-instance-filesystem command. This, of course, fails because 3.1.2 doesn't have nadmin. The error message is:

Command failed on node node2 (asqe-sblade-2): bash: /home/hudson/workspace/Cluster/glassfish3/glassfish/lib/nadmin: No such file or directory
Command create-instance completed with warnings.

It would be better if this error message indicated that the software version on the node is not supported.

It might also be better if this was detected when the node is created rather than when an instance is created.

There might also be other compatibility constraints even within the 3.1.x line. Is it supported to have a 3.1.2 DAS with 3.1.1 or 3.1 instances? If it isn't, this should be detected.
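A pre-flight compatibility check of the kind suggested could be sketched as follows. `sameMajorMinor` is a hypothetical helper, and the major.minor policy is only illustrative (the ticket itself asks whether mixed 3.1.x versions are supported):

```java
public class VersionGuard {
    // Hypothetical check run at create-node time: compare the DAS version with
    // the version installed on the node, and refuse to proceed when the
    // major.minor components differ.
    public static boolean sameMajorMinor(String dasVersion, String nodeVersion) {
        String[] das = dasVersion.split("\\.");
        String[] node = nodeVersion.split("\\.");
        return das.length >= 2 && node.length >= 2
                && das[0].equals(node[0]) && das[1].equals(node[1]);
    }

    public static void main(String[] args) {
        if (!sameMajorMinor("4.0", "3.1.2")) {
            // A clearer message than "nadmin: No such file or directory"
            System.out.println(
                "The software version on the node (3.1.2) is not supported by this DAS (4.0).");
        }
    }
}
```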






Generated at Mon Jul 06 05:28:50 UTC 2015 using JIRA 6.2.3#6260-sha1:63ef1d6dac3f4f4d7db4c1effd405ba38ccdc558.