[GLASSFISH-19718] Cluster management related modules are active or resolved after DAS startup with no clusters/instances Created: 22/Feb/13  Updated: 06/Mar/13  Resolved: 06/Mar/13

Status: Closed
Project: glassfish
Component/s: distributed management
Affects Version/s: 4.0_b77
Fix Version/s: 4.0

Type: Bug Priority: Critical
Reporter: Tom Mueller Assignee: jwells
Resolution: Fixed Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Tags: devx_web

 Description   

The following modules are in the given OSGi state after a simple asadmin start-domain. These should all be in the Installed state:

Cluster SSH Provisioning : Resolved
GMS Module : Active
cluster-admin : Active
cluster-common : Resolved
j-interop repackaged as a module : Resolved
shoal-gms-impl : Resolved
trilead-ssh2 repackaged as a module : Resolved

This is contributing to additional startup time and memory footprint for the developer scenario performance benchmark.

This issue must be fixed for the 4.0 release.

What typically causes this type of issue is that there is some other active module that has a reference to one of these modules, such as an @Inject of a service. To fix the problem, the @Inject needs to be changed so that the reference is deferred until the cluster management feature is actually used.
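
A minimal sketch of what "deferring the reference" can look like, in plain Java with no HK2 on the classpath. The names ClusterService, Lazy and AdminFeature are hypothetical and exist only for illustration; with JSR-330 style injection the usual equivalent is to inject a Provider of the service rather than the service itself, which is an assumption about how a real fix might look rather than a description of the actual change.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical stand-in for a service that lives in a cluster module. */
class ClusterService {
    static final AtomicInteger LOADS = new AtomicInteger();

    ClusterService() {
        LOADS.incrementAndGet(); // counts how often the service is really created
    }

    String describe() {
        return "cluster service ready";
    }
}

/** Memoizing supplier: the delegate runs at most once, on the first get(). */
final class Lazy<T> implements Supplier<T> {
    private final Supplier<T> delegate;
    private T value;
    private boolean resolved;

    Lazy(Supplier<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public synchronized T get() {
        if (!resolved) {
            value = delegate.get();
            resolved = true;
        }
        return value;
    }
}

/** Consumer that holds only a deferred reference, not the service itself. */
class AdminFeature {
    private final Supplier<ClusterService> cluster;

    AdminFeature(Supplier<ClusterService> cluster) {
        this.cluster = cluster;
    }

    /** Startup path: never touches the cluster service. */
    String startup() {
        return "started";
    }

    /** First real use: this is where the service is created. */
    String useCluster() {
        return cluster.get().describe();
    }
}
```

With this shape, constructing AdminFeature and calling startup() creates no ClusterService; the cost is paid only when useCluster() is first called.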

The challenge in fixing this issue is to figure out what other modules have the references to these modules.



 Comments   
Comment by Byron Nevins [ 26/Feb/13 ]

How much additional time?
How much additional footprint?

Comment by Tom Mueller [ 27/Feb/13 ]

Non-zero time and non-zero footprint.

Seriously, modules that are active or resolved are read into memory, while those that are installed are not. Given that the class files in these modules add up to over 6 MB, I expect the footprint difference to be at least that. Currently the footprint regression is about 40 MB, so 6 MB is a big part of that.
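
One way to reproduce a figure like the "over 6 MB" estimate is to add up the uncompressed size of the .class entries in each module jar. The sketch below uses only java.util.jar; the class name ClassBytes is hypothetical, and the sum is only a rough proxy for loaded footprint, since not every class in a resolved bundle is necessarily loaded.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

final class ClassBytes {
    /** Sums the uncompressed size of every .class entry in one jar file. */
    static long classBytes(Path jar) throws IOException {
        long total = 0;
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (!entry.isDirectory() && entry.getName().endsWith(".class")) {
                    long size = entry.getSize(); // -1 when the size is unknown
                    if (size > 0) {
                        total += size;
                    }
                }
            }
        }
        return total;
    }
}
```

Calling classBytes once per jar listed in the description and summing the results gives the total to compare against the footprint regression.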

I have not measured the time to activate these modules.

Comment by Byron Nevins [ 01/Mar/13 ]

What packages are importing ssh ?
26|Resolved | 1|file:/Users/wnevins/glassfish4/glassfish/modules/cluster-ssh.jar
242|Resolved | 1|file:/Users/wnevins/glassfish4/glassfish/modules/trilead-ssh2-repackaged.jar
~/tmp/19718> asadmin osgi inspect p c 26
org.glassfish.main.cluster.ssh [26] exports packages:
-----------------------------------------------------
org.glassfish.cluster.ssh.connect; version=0.0.0 imported by:
org.glassfish.main.cluster.admin [24]
org.glassfish.cluster.ssh.launcher; version=0.0.0 imported by:
org.glassfish.main.cluster.admin [24]
org.glassfish.cluster.ssh.sftp; version=0.0.0 imported by:
org.glassfish.main.cluster.admin [24]
org.glassfish.cluster.ssh.util; version=0.0.0 imported by:
org.glassfish.main.cluster.admin [24]

Comment by Byron Nevins [ 01/Mar/13 ]

org.glassfish.main.external.trilead-ssh2-repackaged [242] exports packages:
---------------------------------------------------------------------------
com.trilead.ssh2.crypto.digest; version=0.0.0 UNUSED
com.trilead.ssh2; version=0.0.0 imported by:
org.glassfish.main.cluster.ssh [26]
org.glassfish.main.cluster.admin [24]
com.trilead.ssh2.crypto.dh; version=0.0.0 UNUSED
com.trilead.ssh2.auth; version=0.0.0 UNUSED
com.trilead.ssh2.channel; version=0.0.0 UNUSED
com.trilead.ssh2.log; version=0.0.0 UNUSED
com.trilead.ssh2.packets; version=0.0.0 UNUSED
com.trilead.ssh2.crypto; version=0.0.0 UNUSED
com.trilead.ssh2.signature; version=0.0.0 UNUSED
com.trilead.ssh2.util; version=0.0.0 UNUSED
com.trilead.ssh2.transport; version=0.0.0 UNUSED
com.trilead.ssh2.crypto.cipher; version=0.0.0 UNUSED
com.trilead.ssh2.sftp; version=0.0.0 UNUSED

Comment by Byron Nevins [ 01/Mar/13 ]

GMS –

It's imported by 4 non-cluster modules
~/tmp/19718> asadmin osgi inspect p c 105
org.glassfish.main.cluster.gms-bootstrap [105] exports packages:
----------------------------------------------------------------
org.glassfish.gms.bootstrap; version=0.0.0 imported by:
org.glassfish.main.cluster.admin [24]
org.glassfish.main.cluster.gms-adapter [104]
org.glassfish.main.ejb.ejb-container [72]
org.glassfish.main.transaction.jts [185]
org.glassfish.main.web.ha [257]
org.glassfish.main.orb.iiop [200]

Comment by Byron Nevins [ 01/Mar/13 ]

Handy tools:

asadmin osgi lb -l # gets a list of the bundle-id ::::: module name mappings

asadmin osgi inspect p c <bundle id> # this tells packages exported by a bundle and what is using those packages.

asadmin osgi inspect p r <bundle id> #this tells packages imported by a bundle and from where.
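
When checking many bundles, the listing can be filtered programmatically. The sketch below parses lines in the shape shown in this issue ("id|State | level|location") and keeps the ones that are not in the Installed state. The class and method names are hypothetical, and the line format is assumed from the output pasted in these comments, not from any documented contract.

```java
import java.util.ArrayList;
import java.util.List;

final class BundleStates {
    /** One parsed row of the bundle listing. */
    static final class Bundle {
        final int id;
        final String state;
        final String location;

        Bundle(int id, String state, String location) {
            this.id = id;
            this.state = state;
            this.location = location;
        }
    }

    /** Parses "id|State | level|location"; returns null for any other line. */
    static Bundle parse(String line) {
        String[] parts = line.split("\\|", 4);
        if (parts.length < 4) {
            return null;
        }
        try {
            int id = Integer.parseInt(parts[0].trim());
            return new Bundle(id, parts[1].trim(), parts[3].trim());
        } catch (NumberFormatException e) {
            return null; // header or separator line
        }
    }

    /** Bundles whose location contains the keyword and whose state is not Installed. */
    static List<Bundle> loaded(List<String> lines, String keyword) {
        List<Bundle> result = new ArrayList<>();
        for (String line : lines) {
            Bundle b = parse(line);
            if (b != null && !"Installed".equals(b.state) && b.location.contains(keyword)) {
                result.add(b);
            }
        }
        return result;
    }
}
```

Feeding it the captured listing with the keyword "cluster" gives the set of cluster bundles that still need explaining.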

Comment by Tom Mueller [ 01/Mar/13 ]

I suspect that orb.iiop is being brought in by the config modularity feature (see issue GLASSFISH-19719).

Of the rest, since cluster-admin is Active, it is probably the one that is causing the rest of them to be Resolved. So we need to figure out why cluster-admin is Active.

Comment by Byron Nevins [ 05/Mar/13 ]

I give up. Sahoo or Richard Hall perhaps?

Here is what I did:

I added an osgi.bundle file to cluster-admin and told it to export NOTHING.

===========
~/dev/cl/nucleus/core/kernel> asadmin osgi inspect p c 24
org.glassfish.main.cluster.admin [24] exports packages:
-------------------------------------------------------
Nothing
===========

~/dev/cl/nucleus/core/kernel> asadmin osgi lb -l | grep 24
24|Active | 1|file:/Users/wnevins/glassfish4/glassfish/modules/cluster-admin.jar

==========
Then I checked through nucleus for every pom.xml that lists it as a dependency:
~/dev/cl/nucleus> tg -f -e pom.xml cluster-admin
/Users/wnevins/dev/cl/nucleus/cluster/admin/pom.xml
/Users/wnevins/dev/cl/nucleus/cluster/admin-l10n/pom.xml
/Users/wnevins/dev/cl/nucleus/packager/nucleus-cluster/pom.xml
/Users/wnevins/dev/cl/nucleus/packager/nucleus-cluster-l10n/pom.xml

====================

Now I deleted cluster-admin.jar from modules directory. Started the server. NO PROBLEM!
=======================

In summary -

The module exports nothing.
No other modules use anything in the module
You can delete the module with no ill effects!!!

Otherwise it ALWAYS shows up as active.

My final attempt was to literally sit in the debugger and go through every registered OSGi module. cluster-admin NEVER showed up!

I added this simple method to MonitoringBootstrap.java, then debugged with suspend=y. The breakpoint was never hit! The method was called by the ModuleLifecycleListener methods (all of them), and also by discoverProbeProviders(), which goes through all already-registered modules.

private void check(Module m) {
    if (m == null)
        return;

    String name = m.getName();

    if (name.toUpperCase().indexOf("cluster") >= 0) {
        // set a breakpoint here...
        System.out.println("XXXXX");
    }
}
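
One detail worth noting about the helper above: it upper-cases the module name and then searches it for the lower-case literal "cluster", so the condition cannot be true for any name, whether or not a cluster module passes through. A minimal sketch of a case-insensitive check follows; NameMatch and containsIgnoreCase are hypothetical names.

```java
import java.util.Locale;

final class NameMatch {
    /** True when name contains fragment, ignoring case; false for null input. */
    static boolean containsIgnoreCase(String name, String fragment) {
        if (name == null || fragment == null) {
            return false;
        }
        // Normalize both sides the same way before comparing.
        return name.toLowerCase(Locale.ROOT).contains(fragment.toLowerCase(Locale.ROOT));
    }
}
```

Using Locale.ROOT avoids locale-specific case mappings changing the result.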

Sahoo Says:

Good question. There is an additional service angle to it. [24] must be
having an HK2 service which is active. When an HK2 service gets
activated, we first activate the OSGi bundle corresponding to that
service. Before [24] can be activated, it first has to be resolved.
Since it depends on [26] and [242], they also get resolved (note - the
dependencies only get resolved).

So, the real trigger comes from [24]. Perhaps it contains a command
that's getting executed or a startup service or something of that sort.

Thanks,
Sahoo

Comment by Byron Nevins [ 05/Mar/13 ]

I spent a lot of time on this with no progress. We need an OSGi expert to look at it.

Comment by TangYong [ 05/Mar/13 ]

I investigated this issue thoroughly and found the root cause, as follows:

While the domain is starting, the kernel executes a command called "UptimeCommand" [1].

[1]: com.sun.enterprise.v3.admin.UptimeCommand

While executing the command, com.sun.enterprise.v3.admin.CommandRunnerImpl runs the following logic (line 1112):

CommandRunnerImpl.java

            //Get list of suplemental commands
            Collection<SupplementalCommand> suplementalCommands = 
                    supplementalExecutor.listSuplementalCommands(model.getCommandName());

This in turn triggers the com.sun.enterprise.v3.admin.SupplementalCommandExecutorImpl class to run the following logic (line 165):

List<ServiceHandle<Supplemental>> supplementals = habitat.getAllServiceHandles(Supplemental.class);

Now let us pause and go back to the issue, and look at how the following classes are declared in the cluster-admin module:

1) com.sun.enterprise.v3.admin.cluster.PostRegisterInstanceCommand

PostRegisterInstanceCommand.java
@Service(name="_post-register-instance")
@Supplemental(value="_register-instance", ifFailure=FailurePolicy.Warn)
@PerLookup
@ExecuteOn(value={RuntimeType.DAS})
@RestEndpoints({
    @RestEndpoint(configBean=Domain.class,
        opType=RestEndpoint.OpType.POST, 
        path="_post-register-instance", 
        description="_post-register-instance")
})
public class PostRegisterInstanceCommand extends RegisterInstanceCommandParameters implements AdminCommand {
...

2) com.sun.enterprise.v3.admin.cluster.PostUnregisterInstanceCommand

PostUnregisterInstanceCommand.java
@Service(name="_post-unregister-instance")
@Supplemental(value="_unregister-instance", ifFailure=FailurePolicy.Warn)
@PerLookup
@ExecuteOn(value={RuntimeType.DAS})
@RestEndpoints({
    @RestEndpoint(configBean=Domain.class,
        opType=RestEndpoint.OpType.POST, 
        path="_post-unregister-instance", 
        description="_post-unregister-instance")
})
public class PostUnregisterInstanceCommand implements AdminCommand {
...

Both of the classes above carry the @Supplemental and @Service annotations. The important one is @Supplemental, because it directly makes the two classes candidates for "habitat.getAllServiceHandles(Supplemental.class);".

BTW: Having @Service alone does not cause the class or its module to be started. By default HK2 loads and starts a module lazily, unless the module's run level is at or below the kernel startup level. In addition, as Sahoo said, there is another case: another started module depends on this module explicitly, either through @Inject or by using the HK2 service locator to get services eagerly.

This issue is the latter case: the HK2 service locator is used to get services eagerly, which triggers HK2 to start cluster-admin on demand.
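
The difference between an eager and a lazy lookup can be sketched without HK2. In the model below each descriptor carries its metadata (which command it supplements) separately from the code that loads the implementation, so a lookup can filter on metadata first and load only the matches. Descriptor and SupplementalLookup are hypothetical names, and this is an illustration of the idea only; it is not a claim about how HK2 implements service handles or how the problem was eventually resolved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical descriptor: metadata is readable without loading the implementation. */
final class Descriptor {
    final String name;
    final String supplementalOf; // command this service supplements, or null
    private final Supplier<Object> loader;
    boolean loaded;

    Descriptor(String name, String supplementalOf, Supplier<Object> loader) {
        this.name = name;
        this.supplementalOf = supplementalOf;
        this.loader = loader;
    }

    /** Loading is the expensive step; in OSGi terms it is what would start the bundle. */
    Object load() {
        loaded = true;
        return loader.get();
    }
}

final class SupplementalLookup {
    /** Filters on metadata first and loads only the descriptors that match the command. */
    static List<Object> forCommand(List<Descriptor> all, String command) {
        List<Object> result = new ArrayList<>();
        for (Descriptor d : all) {
            if (command.equals(d.supplementalOf)) {
                result.add(d.load());
            }
        }
        return result;
    }
}
```

In this model, looking up supplementals for a command such as uptime leaves the two cluster-admin descriptors unloaded, because neither supplements it.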

How to fix it needs further discussion.

Please let me know if I have anything wrong.

Thanks
--Tang

Comment by Tom Mueller [ 05/Mar/13 ]

Yes, this is what I'm seeing too. Thanks for the analysis.

Comment by Tom Mueller [ 05/Mar/13 ]

It looks like the SupplementalCommandExecutorImpl.getSupplementalCommandsList method's call to ServiceLocator.getAllServiceHandles(Supplemental.class) is causing all modules that have a class with the @Supplemental annotation to be loaded. According to John W., this shouldn't be happening. He is investigating.

Comment by jwells [ 05/Mar/13 ]

I have fixed this on the hk2 side. After we promote hk2 and integrate it back into GlassFish I'll close this bug:

$ ./glassfish4/bin/asadmin osgi lb | grep cluster
7|Installed | 1|cluster-admin (4.0.0.SNAPSHOT)
176|Installed | 1|cluster-common (4.0.0.SNAPSHOT)

Comment by jwells [ 06/Mar/13 ]

Fixed with uptake of HK2 2.1.67





[GLASSFISH-19699] ClassNotFoundException for JdbcResourceInjector when install-node is run Created: 20/Feb/13  Updated: 26/Mar/13  Resolved: 26/Mar/13

Status: Resolved
Project: glassfish
Component/s: distributed management
Affects Version/s: 4.0_b76_EE7MS5
Fix Version/s: 4.0_b81

Type: Bug Priority: Major
Reporter: Tom Mueller Assignee: Yamini K B
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

When the install-node command is run, the following exception is printed:

$ asadmin install-node x
MultiException stack 1 of 1
java.lang.ClassNotFoundException: org.glassfish.jdbc.config.JdbcResourceInjector
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at org.jvnet.hk2.internal.Utilities.loadClass(Utilities.java:464)
at org.jvnet.hk2.internal.ServiceLocatorImpl.loadClass(ServiceLocatorImpl.java:1618)
at org.jvnet.hk2.internal.ServiceLocatorImpl.reifyDescriptor(ServiceLocatorImpl.java:360)
at org.jvnet.hk2.internal.ServiceLocatorImpl.narrow(ServiceLocatorImpl.java:1678)
at org.jvnet.hk2.internal.ServiceLocatorImpl.internalGetDescriptor(ServiceLocatorImpl.java:895)
at org.jvnet.hk2.internal.ServiceLocatorImpl.getServiceHandle(ServiceLocatorImpl.java:1141)
at org.jvnet.hk2.internal.ServiceLocatorImpl.getServiceHandle(ServiceLocatorImpl.java:1130)
at org.jvnet.hk2.config.DomDocument.getModelByElementName(DomDocument.java:156)
at org.jvnet.hk2.config.ConfigParser.handleElement(ConfigParser.java:164)
at org.jvnet.hk2.config.ConfigParser.handleElement(ConfigParser.java:231)
at org.jvnet.hk2.config.ConfigParser.handleElement(ConfigParser.java:238)
at org.jvnet.hk2.config.ConfigParser.handleElement(ConfigParser.java:190)
at org.jvnet.hk2.config.ConfigParser.parse(ConfigParser.java:100)
at org.jvnet.hk2.config.ConfigParser.parse(ConfigParser.java:130)
at org.jvnet.hk2.config.ConfigParser.parse(ConfigParser.java:116)
at org.jvnet.hk2.config.ConfigParser.parse(ConfigParser.java:112)
at com.sun.enterprise.admin.cli.cluster.NativeRemoteCommandsBase.checkIfNodeExistsForHost(NativeRemoteCommandsBase.java:340)
at com.sun.enterprise.admin.cli.cluster.InstallNodeBaseCommand.validate(InstallNodeBaseCommand.java:100)
at com.sun.enterprise.admin.cli.cluster.InstallNodeSshCommand.validate(InstallNodeSshCommand.java:95)
at com.sun.enterprise.admin.cli.CLICommand.execute(CLICommand.java:296)
at com.sun.enterprise.admin.cli.AdminMain.executeCommand(AdminMain.java:352)
at com.sun.enterprise.admin.cli.AdminMain.doMain(AdminMain.java:289)
at org.glassfish.admin.cli.AsadminMain.main(AsadminMain.java:54)

It doesn't matter what hostname is passed to the command (I used "x" in this case). If you pass in more arguments, you get even more exceptions. This doesn't seem to affect the operation of the command.



 Comments   
Comment by Tom Mueller [ 20/Feb/13 ]

This output appears to be coming from line 359 of NativeRemoteCommandsBase.java, where it catches an exception while parsing the domain.xml. It seems odd that this class is parsing all of the domain.xml files from all of the domains in the default domains directory. If multiple domains are going to be checked, why check only those in the default domains directory? What if there are other domains in other directories that are unknown to this command? It seems that this check is a halfhearted attempt to make sure that this node isn't already installed in some other domain. Since it doesn't check all domains, it doesn't seem worth doing.

I suspect that the reason for the exception is that the entire modules directory is not in the classpath for the local command, so it isn't able to access all classes that are needed to be able to parse a domain.xml file. The command would need to create a class loader based on the entire modules directory in order to parse the domain.xml file.

Comment by Byron Nevins [ 20/Feb/13 ]

Revision 43131 -

The problem is in InstallNodeBaseCommand.java
method == validate()

What it does

A check of the domains that just happen to be in the default domains dir is performed.

Line 340 -->
DomDocument doc = parser.parse(domainURL);

That line throws an Exception. A stacktrace is dumped (To scare the user?), the Exception is swallowed and totally ignored. It returns false, which then has no further effect.
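
If the check is kept as a best-effort step, one option is to stop printing the stack trace to the console and log the failure at a low level instead. The sketch below uses java.util.logging; QuietCheck and bestEffort are hypothetical names, and the choice of Level.FINE is an assumption rather than the project's logging policy.

```java
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

final class QuietCheck {
    private static final Logger LOG = Logger.getLogger(QuietCheck.class.getName());

    /** Runs a best-effort check; a failure is logged at FINE and reported as false. */
    static boolean bestEffort(String what, Callable<Boolean> check) {
        try {
            return Boolean.TRUE.equals(check.call());
        } catch (Exception e) {
            // Keep the details for debugging without alarming the user on the console.
            LOG.log(Level.FINE, "Skipping " + what + ": " + e, e);
            return false;
        }
    }
}
```

The caller still gets false on failure, as it does today, but nothing is printed unless fine-grained logging is enabled.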

Comment by Yamini K B [ 21/Feb/13 ]

There are 2 issues here:
1. Looking through the history of NativeRemoteCommandsBase, there have been some HK2-related changes that are causing the exception. The following change fixes that:

Index: src/main/java/com/sun/enterprise/admin/cli/cluster/NativeRemoteCommandsBase.java
===================================================================
--- src/main/java/com/sun/enterprise/admin/cli/cluster/NativeRemoteCommandsBase.java (revision 59713)
+++ src/main/java/com/sun/enterprise/admin/cli/cluster/NativeRemoteCommandsBase.java (working copy)
@@ -65,6 +65,8 @@
 import com.sun.enterprise.config.serverbeans.Domain;
 import com.sun.enterprise.config.serverbeans.Nodes;
 import com.sun.enterprise.config.serverbeans.Node;
+import com.sun.enterprise.module.ModulesRegistry;
+import com.sun.enterprise.module.single.StaticModulesRegistry;
 
 import com.sun.enterprise.universal.glassfish.TokenResolver;
 import com.sun.enterprise.util.io.DomainDirs;
@@ -327,14 +329,9 @@
 }
 );
 
-ServiceLocator serviceLocator = ServiceLocatorFactory.getInstance().create("default");
-try {
-    HK2Populator.populate(serviceLocator, new ClasspathDescriptorFileFinder(cl), null);
-}
-catch (IOException e) {
-    logger.log(Level.SEVERE, "Error initializing HK2", e);
-}
-
+ModulesRegistry registry = new StaticModulesRegistry(cl);
+ServiceLocator serviceLocator = registry.createServiceLocator("default");
+
 ConfigParser parser = new ConfigParser(serviceLocator);
 URL domainURL = domainXMLFile.toURI().toURL();
 DomDocument doc = parser.parse(domainURL);

2. The command should scan domains in user configured domains dir as well.

Comment by Tom Mueller [ 21/Feb/13 ]

WRT #2 in the previous comment, there is no way of knowing where the "user configured domains dirs" might be. They are not configured. Domains directories are only ever specified using a --domaindir option.

Other than list-domains, we don't have any command that looks at or does anything to more than one domain. And with list-domains, it only looks at the domains in the directory passed via the --domaindir option (or the default).

I strongly recommend that this "feature" of looking at multiple domains be removed from the install-node command.

Comment by Yamini K B [ 26/Mar/13 ]

Fix checked in r60828





[GLASSFISH-19057] Potential file descriptor leaks in SSHLauncher Created: 05/Sep/12  Updated: 05/Sep/12

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Joe Di Pol Assignee: Joe Di Pol
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

From inspection it appears as though SSHLauncher has potential file descriptor leaks, especially with the use of SCPClient. This is mainly concerning the "connection" field which sometimes is closed via SSHUtil.unregister(), but other times does not appear to be.

For example, if a consumer of SSHLauncher calls getSCPClient(), there does not appear to be a way to ever close the connection opened by that method.

In general the management of connections in SSHLauncher seem rather haphazard, and should be looked at more closely.



 Comments   
Comment by Joe Di Pol [ 05/Sep/12 ]

One quick safety net may be to put a finalizer on SSHUtil to close all connections in activeConnections.

We may also want to put a close() method on SSHLauncher that closes all active connections in SSHUtil.
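
A minimal sketch of the close() idea, assuming connections can be wrapped as java.io.Closeable. ConnectionTracker is a hypothetical name and does not reflect the actual SSHLauncher or SSHUtil code; the point is only that every opened connection is registered in one place and released together, for example through try-with-resources.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class ConnectionTracker implements AutoCloseable {
    private final List<Closeable> open = new ArrayList<>();

    /** Remembers a resource so that close() can release it later. */
    synchronized <T extends Closeable> T register(T resource) {
        open.add(resource);
        return resource;
    }

    synchronized int openCount() {
        return open.size();
    }

    /** Closes every registered resource, continuing past individual failures. */
    @Override
    public synchronized void close() {
        for (Closeable c : open) {
            try {
                c.close();
            } catch (IOException ignored) {
                // keep closing the remaining resources
            }
        }
        open.clear();
    }
}
```

A consumer that obtains a connection through the tracker no longer needs its own way to close it; leaving the try block releases everything. This gives deterministic cleanup, which a finalizer-based safety net cannot guarantee.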





[GLASSFISH-18707] delete-local-instance should NOT have a --domain parameter Created: 09/May/12  Updated: 19/Sep/14

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: 4.0_b38_ms2
Fix Version/s: 4.1

Type: Bug Priority: Major
Reporter: Byron Nevins Assignee: Byron Nevins
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

delete-local-instance is NOT a true local command. It demands that the instance's DAS be running. If it is not running then the command fails and does nothing.

Therefore it is just busy-work for the user to specify the name of the domain since we KNOW which domain it is by the usual host and port args.

Currently this issue is only on das-branch



 Comments   
Comment by Tom Mueller [ 07/Feb/13 ]

Setting the target fix version to 4.0.1 since this is not needed for the Java EE 7 RI/SDK release.





[GLASSFISH-18645] plain file in INSTALL_ROOT causes install-node to fail Created: 18/Apr/12  Updated: 27/Apr/12  Resolved: 23/Apr/12

Status: Closed
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Tom Mueller Assignee: Yamini K B
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Tags: 3_1_2-release-note-added, 3_1_2-release-notes

 Description   

If a plain file exists in the install root directory, e.g., glassfish3 based on the glassfish.zip distribution, then the install-node command fails with a NullPointerException.

The root cause is in the InstallNodeBaseCommand.isFileWithinBinDirectory method in the line:

String s = f.getParentFile().getName();

Here, f.getParentFile() returns null, so there is a NullPointerException.
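
java.io.File.getParentFile() returns null whenever the pathname does not name a parent, for example a bare relative file name. A minimal null-safe sketch follows; ParentName is a hypothetical helper and not the fix that was applied to InstallNodeBaseCommand.

```java
import java.io.File;

final class ParentName {
    /** Name of the parent directory, or "" when no parent can be determined. */
    static String parentName(File f) {
        // Resolve against the working directory first so a bare name gets a parent.
        File parent = f.getAbsoluteFile().getParentFile();
        return parent == null ? "" : parent.getName();
    }
}
```

The null check is still needed after getAbsoluteFile(), because a filesystem root has no parent either.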

A work-around is to remove any plain files from that directory before using install-node.



 Comments   
Comment by Byron Nevins [ 18/Apr/12 ]

Reassign to author.

Comment by Joe Di Pol [ 19/Apr/12 ]

This is made worse by the fact that the GlassFish installer leaves two files in the install root directory (uninstall.exe, uninstall.sh). So with this bug the install-node command can only be used with installations created using the zip distribution.

The work-around is to install the software manually on the remote node and not use the install-node command.

Comment by Rebecca Parks [ 19/Apr/12 ]

Added to the 3.1.2 Release Notes:

Description

If a plain file exists in the installation parent directory, for example glassfish3 based on the glassfish.zip distribution, then the install-node command fails with a NullPointerException.

Because the GlassFish installer leaves two files in the installation parent directory (uninstall.exe, uninstall.sh), the install-node command can only be used with installations created using the glassfish.zip distribution.

Workaround

Do one of the following:

  • Remove any plain files from the installation parent directory before using the install-node command.
  • Install the software manually on the remote node and do not use the install-node command.
Comment by Byron Nevins [ 19/Apr/12 ]

Not directly related but sort-of related:

Added a utility method

Project: glassfish
Repository: svn
Revision: 53581
Author: bnevins
Date: 2012-04-19 17:48:46 UTC
Link:

Log Message:
------------
I was inspired by
http://java.net/jira/browse/GLASSFISH-18645
to add a utility method to get the parent directory of a file – that actually does what you
expect it to do in ALL cases.

E.g.

File f = new File("/etc/.");

f.getParentFile() returns a file with this path: "/etc"
FileUtils.getParentFile(f) returns what you want – "/"

Revisions:
----------
53581

Modified Paths:
---------------
trunk/main/nucleus/common/common-util/src/test/java/com/sun/enterprise/util/io/FileUtilsTest.java
trunk/main/nucleus/common/common-util/src/main/java/com/sun/enterprise/util/io/FileUtils.java

Diffs:
------
Index: trunk/main/nucleus/common/common-util/src/test/java/com/sun/enterprise/util/io/FileUtilsTest.java
===================================================================
--- trunk/main/nucleus/common/common-util/src/test/java/com/sun/enterprise/util/io/FileUtilsTest.java (revision 53580)
+++ trunk/main/nucleus/common/common-util/src/test/java/com/sun/enterprise/util/io/FileUtilsTest.java (revision 53581)
@@ -37,9 +37,9 @@
  * only if the new code is made subject to such option by the copyright
  * holder.
  */
-
 package com.sun.enterprise.util.io;
 
+import com.sun.enterprise.universal.io.SmartFile;
 import java.io.File;
 import org.junit.AfterClass;
 import org.junit.BeforeClass;
@@ -70,12 +70,28 @@
         assertTrue(FileUtils.mkdirsMaybe(d2));
         assertFalse(d2.mkdirs());
 
-        if(!d1.delete())
+        if (!d1.delete())
             d1.deleteOnExit();
-        if(!d2.delete())
+        if (!d2.delete())
             d2.deleteOnExit();
 
     }
+    @Test
+    public void testParent() {
+        File f = null;
+        assertNull(FileUtils.getParentFile(f));
+        f = new File("/foo/././././.");
+        File wrongGrandParent = f.getParentFile().getParentFile();
+        File correctParent = FileUtils.getParentFile(f);
+        File sanitizedChild = SmartFile.sanitize(f);
+        File sanitizedWrongGrandParent = SmartFile.sanitize(wrongGrandParent);
+        File shouldBeSameAsChild = new File(correctParent, "foo");
+        // check this out -- surprise!!!!
+        assertEquals(sanitizedWrongGrandParent, sanitizedChild);
+        assertEquals(shouldBeSameAsChild, sanitizedChild);
+    }
+
+
 }
Index: trunk/main/nucleus/common/common-util/src/main/java/com/sun/enterprise/util/io/FileUtils.java
===================================================================
--- trunk/main/nucleus/common/common-util/src/main/java/com/sun/enterprise/util/io/FileUtils.java (revision 53580)
+++ trunk/main/nucleus/common/common-util/src/main/java/com/sun/enterprise/util/io/FileUtils.java (revision 53581)
@@ -45,6 +45,7 @@
 
 package com.sun.enterprise.util.io;
 
+import com.sun.enterprise.universal.io.SmartFile;
 import java.io.*;
 import java.util.*;
 
@@ -63,6 +64,23 @@
     final static Logger _utillogger = com.sun.logging.LogDomains.getLogger(FileUtils.class,com.sun.logging.LogDomains.UTIL_LOGGER);
 
     /**
+     * The method, java.io.File.getParentFile() does not necessarily do what
+     * you would think it does. What it really does is to simply chop off the
+     * final element in the path and return what is left-over. E.g.
+     * if the file is /foo/. then the "parent" that is returned is /foo
+     * which is probably not what you expected.
+     * This method really returns the parent directory - or null if there is none.
+     * @param f
+     * @return
+     */
+    public static File getParentFile(File f) {
+        if (f == null)
+            return null;
+
+        return SmartFile.sanitize(f).getParentFile();
+    }
+
+    /**
      * Wrapper for File.mkdirs
      * This version will return true if the directory exists when the method returns.
      * Unlike File.mkdirs which returns false if the directory already exists.
    @@ -624,8 +642,8 @@

return new File(relative);
}

+
/**

  • Executes the supplied work object until the work is done or the max.
  • retry count is reached.
    @@ -692,7 +710,7 @@
    f = File.createTempFile(TMPFILENAME, "jar", directory);
    }
    catch (IOException ioe)
    Unknown macro: {-//Bug 4677074 ioe.printStackTrace(); +//Bug 4677074 ioe.printStackTrace(); //Bug 4677074 begin _logger.log(Level.SEVERE, "iplanet_util.io_exception", ioe); //Bug 4677074 end@@ -986,7 +1004,7 @@ _utillogger.log(Level.FINE, Strings.get("enterprise_util.rename_initial_success", new Object [] { fromFilePath, toFilePath } )); }
  • } else { + }

    else

    Unknown macro: { _utillogger.log(FILE_OPERATION_LOG_LEVEL, Strings.get("enterprise_util.retry_rename_success", new Object [] { fromFilePath, toFilePath, Integer.valueOf(retries) } ));
    }
    @@ -999,7 +1017,7 @@
    { fromFilePath, toFilePath, Integer.valueOf(retries) } )); }

    return result;

  • }
    + }

/** Appends the given line at the end of given text file. If the given

  • file does not exist, an attempt is made to create it.
    @@ -1038,7 +1056,7 @@
    appendText(fileName, buffer.toString());
    }
    ///////////////////////////////////////////////////////////////////////////
  • +
    /** A utility routine to read a <b> text file </b> efficiently and return

  • the contents as a String. Sometimes while reading log files of spawned
  • processes this kind of facility is handy. Instead of opening files, coding
    @@ -1049,12 +1067,12 @@
  • @throws java.io.IOException if there is an i/o error.
  • @throws java.io.FileNotFoundException if the file could not be found
    */
  • public static String readSmallFile(final String fileName)
    + public static String readSmallFile(final String fileName)
    throws IOException, FileNotFoundException { return (readSmallFile(new File(fileName)) ); }
  • public static String readSmallFile(final File file)
    +
    + public static String readSmallFile(final File file)
    throws IOException { final BufferedReader bf = new BufferedReader(new FileReader(file)); final StringBuilder sb = new StringBuilder(); //preferred over StringBuffer, no need to synchronize @@ -1173,7 +1191,7 @@ }

    return new File[0];
    }

  • +
    /**

  • Represents a unit of work that should be retried, if needed, until it
  • succeeds or the configured retry limit is reached.
Comment by Yamini K B [ 23/Apr/12 ]

Same as JIRA-18447

Comment by bernaps [ 27/Apr/12 ]

So while waiting for the next release of Glassfish, here is a simple workaround.

On the DAS that is trying to run install-node or install-node-ssh, just move uninstall.exe and uninstall.sh out of the root directory of glassfish (any plain files there will cause the NullPointerException).

so for me
$ mv uninstall.exe bin/
$ mv uninstall.sh bin/

Now your install-node will work. Enjoy Glassfish





[GLASSFISH-18469] On the local and remote hosts was used the same user, but install-node-dcom failed without -w <user_name> Created: 07/Mar/12  Updated: 15/Feb/13

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b21
Fix Version/s: future release

Type: Bug Priority: Major
Reporter: easarina Assignee: Byron Nevins
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

3.1.2 build 21. Windows 2008 machines.

I believe that this is a regression. install-node-dcom failed because the -w <user_name> option was not used. See, for example:
asadmin install-node-dcom -W password1.txt bigapp-oblade-3
com.sun.enterprise.util.cluster.windows.process.WindowsException: Logon failure:
unknown user name or bad password.
Command install-node-dcom failed.

But the command "asadmin install-node-dcom -w aroot -W password1.txt bigapp-oblade-3" executed successfully.
The same user, aroot, was used on the DAS machine and on the remote host, and according to the help: "The default is the user that is running this subcommand.", i.e., aroot in my case.






[GLASSFISH-18451] install-node-dcom does not function Created: 05/Mar/12  Updated: 26/Jun/12

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b23
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: jp2011 Assignee: Byron Nevins
Resolution: Unresolved Votes: 5
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows server 2008 R2 Sp1
Glassfish 3.1.2 Release


Tags: dcom

 Description   

I initially tried using a windows domain account to install the node and that didn't work as per GLASSFISH-18327. That issue was also incorrectly marked as resolved. Network captures show that the release version of Glassfish 3.1.2 still does not use the domain account, but attempts to use the local account.

After giving up with this, I created a new local account on the remote machine called glassfish. Granted full access to the 2 required registry keys and added the account to the administrators group. Attempting to install the remote node using this account still fails with the following message:

Successfully verified that the host, hostname, is not the local machine as required. Successfully resolved host name to: hostname/10.65.30.xxx Successfully connected to DCOM Port at port 135 on host hostname. Successfully connected to NetBIOS Session Service at port 139 on host hostname. Successfully connected to Windows Shares at port 445 on host hostname. The remote file, C: doesn't exist on hostname: Access is denied.

I performed a network capture and can tell you the following:

1. The user account is successfully authenticated with STATUS_SUCCESS (0x00000000)
2. SMB is attempting to access \\hostname\C$ no matter what I set the remote test directory to.
3. NT Status: STATUS_FS_DRIVER_REQUIRED (0xc000019c) is returned from the remote host but I suspect this is normal and used for dynamic library loading for the file system.

4. NT Status: STATUS_ACCESS_DENIED (0xc0000022) is returned on attempting to connect to \\hostname\C$

The documentation does not state any other prerequisite or permissions that need to be setup for this to function. What is missing?
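
The first three checks in the validate-dcom output above (ports 135, 139 and 445) appear to be plain TCP connection tests. A minimal Java sketch of such a probe follows, so the network side can be ruled out independently of GlassFish. The class and method names are hypothetical and are not part of GlassFish; the sketch only tests reachability and says nothing about share or registry permissions.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not GlassFish code: checks TCP reachability only.
final class DcomPortProbe {
    // Ports named in the validate-dcom output: DCOM, NetBIOS Session Service, Windows Shares.
    static final int[] DCOM_PORTS = {135, 139, 445};

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolvable host: treat all as "not reachable".
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        for (int port : DCOM_PORTS) {
            System.out.println(host + ":" + port + " reachable = " + canConnect(host, port, 2000));
        }
    }
}
```

If all three ports are reachable and the command still reports "Access is denied", the problem is more likely in account rights or Windows policy than in the network.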



 Comments   
Comment by Byron Nevins [ 06/Mar/12 ]

what is the exact command you're running?

Comment by jp2011 [ 06/Mar/12 ]

To make things even simpler, it is reproducible by the validate-dcom command alone.

Password file contains the following line: AS_ADMIN_WINDOWSPASSWORD=${ALIAS=glassfish-alias}

I have setup the alias already in asadmin as per the documentation.

c:\glassfish3\bin>asadmin --passwordfile passwordfile.txt validate-dcom -w glassfish remotehost
remote failure:
Successfully verified that the host, remotehost, is not the local machine as required.
Successfully resolved host name to: remotehost/10.65.30.187
Successfully connected to DCOM Port at port 135 on host remotehost.
Successfully connected to NetBIOS Session Service at port 139 on host remotehost
nc.
Successfully connected to Windows Shares at port 445 on host remotehost.
The remote file, C: doesn't exist on remotehost: Access is denied.

Command validate-dcom failed.

I can speak to the network capture I took as well, but that would be easier offline to this web portal.

Comment by Byron Nevins [ 06/Mar/12 ]

Can you access the c$ share from another computer – say

net use X: \\other\c$

?

Comment by Byron Nevins [ 06/Mar/12 ]

Please make sure these items are set up correctly, especially the third one:

1. Server service is in the started state and is set to start automatically.
2. Remote Registry service is also in the started state and is set to start automatically.
3. Set the Local Policy for Network Access: "Control Panel" -> "Administrative Tools" -> "Local Security Policy" -> "Local Policies" -> "Security Options" -> "Network Access: Sharing security model for local accounts". Make sure it is set to Classic.

Comment by ljnelson [ 28/Mar/12 ]

I have exactly the same problem.

I installed and ran setup-local-dcom on the remote machine as an administrator. It claimed it ran successfully.

Then I made sure that your steps 1-3 above were taken. I had to manually start the remote registry service.

My remote machine is running Windows 7 Professional on a 64-bit machine with all updates installed.

Here is my command and output:

ljnelson$ asadmin --passwordfile ~/.glassfish.passwords --port=9048 validate-dcom --windowsuser lnelson --windowsdomain jenzabar --remotetestdir 'C:\crap' --verbose true 10.63.4.42
remote failure: 
Successfully verified that the host, 10.63.4.42, is not the local machine as required.
Successfully resolved host name to: /10.63.4.42
Successfully connected to DCOM Port at port 135 on host 10.63.4.42.
Successfully connected to NetBIOS Session Service at port 139 on host 10.63.4.42.
Successfully connected to Windows Shares at port 445 on host 10.63.4.42.
The remote file, C:\crap doesn't exist on 10.63.4.42 : The parameter is incorrect.

Command validate-dcom failed.

C:\crap is a directory present on the remote machine. I haven't set it up to be shared in any way, but I haven't done anything else to it, either. Any path supplied to --remotetestdir is considered to not exist. I've tried moving slashes around and doubling up backslashes in case it's a path issue; it's not.

Hope this data point helps.

Comment by lb54 [ 11/Apr/12 ]

Hi.
I also have this issue:
Win 2003 SP2 (Domain Admin Server, GF 3.1.1, updated to 3.1.2)
Win 2008 Server R2 Enterprise SP1 (node, formerly connected through SSH via cygwin)
User is authorized for both machines. DCOM is planned to replace the SSH-communication.

The message from the Web Console is:
Successfully verified that the host, myserver.host.xx, is not the local machine as required. Successfully resolved host name to: myserver.host.xx/<IP-Address> Successfully connected to DCOM Port at port 135 on host myserver.host.xx. Successfully connected to NetBIOS Session Service at port 139 on host gibson-10.tecis.hh. Successfully connected to Windows Shares at port 445 on host myserver.host.xx. The remote file, C: doesn't exist on myserver.host.xx : Logon failure: unknown user name or bad password.

The CLI also fails with:
remote failure: Command install-node-dcom failed.

com.sun.enterprise.util.cluster.windows.process.WindowsException: Logon failure: unknown user name or bad password.
Command create-node-dcom failed.

Is there a way to "workaround" this or do I have to wait for an update?

Comment by jp2011 [ 11/Apr/12 ]

There has been no fix for this because the cause is still unknown to Oracle. The workaround is to not use DCOM. We have personally abandoned Windows as a platform for production/QA in favour of the RHEL 5 Linux distro. SSH is built in, and the cluster runs a lot faster with less overhead. The downside is that you have to learn Linux commands. But really, is this that bad?

Comment by lb54 [ 13/Apr/12 ]

I agree with you.
BUT: Telling my company to use Linux servers instead of Windows will not work; they don't want to hear that.
Using SSH nodes on Windows systems with Cygwin seems to be an alternative. But I already used GlassFish 3.1.1 with SSH (Cygwin), and the communication does not seem very stable (long-running startup processes and a slow-loading "Clusters" page in the Web Console).

@Byron: Is there a plan for this bugfix so far?

Comment by mr_daemon [ 16/Apr/12 ]

I did some incredibly tedious debugging and was able to get it to work:

For the validate-dcom test to pass, since it seems to ignore the parameter for the test directory entirely and always use C:\ regardless, you must disable the new (Vista+) policy that prevents users from elevating their privileges over the network by navigating to

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System

and creating a new DWORD named LocalAccountTokenFilterPolicy with a value of 1. This will allow the delete_me.bat file to be created there.

However this then breaks again:

PS D:\private> d:\glassfish3\bin\asadmin.bat --passwordfile dcom-pw.txt validate-dcom -w glassfish -v=true qlsvrnode2
remote failure:
Successfully verified that the host, qlsvrnode2, is not the local machine as required.
Successfully resolved host name to: qlsvrnode2/192.168.9.11
Successfully connected to DCOM Port at port 135 on host qlsvrnode2.
Successfully connected to NetBIOS Session Service at port 139 on host qlsvrnode2.
Successfully connected to Windows Shares at port 445 on host qlsvrnode2.
Successfully accessed C: on qlsvrnode2 using DCOM.
Successfully wrote delete_me.bat to C: on qlsvrnode2 using DCOM.
Could not connect to WMI (Windows Management Interface) on qlsvrnode2. : Error setting up remote connection to WMI

This is not mentioned at all in the documentation, but it turns out you also need to change ownership and set permissions on the following registry key, in addition to the ones already listed:

HKEY_CLASSES_ROOT\CLSID\{76A64158-CB41-11D1-8B02-00600806D9B6}

Once this is accomplished, everything works as advertised.

I am not fond of the security implications but at least it works and is at least more reliable than Cygwin+sshd.

Comment by Byron Nevins [ 19/Apr/12 ]

Thanks for the excellent comments and work everyone. I'll try and address this problem soon.

Comment by lb54 [ 26/Apr/12 ]

Hi Byron.
Are there any plans to release this fix so far? Or is the "hack" described above the official solution?

Thanks for info.

Best wishes.

Basti

Comment by lb54 [ 16/May/12 ]

Hi there.
It seems that no one is working on this ticket right now.
Is there a chance to get a fix for this in the near future?
Unfortunately the "quick fix" described above does not work for me, so I need another workaround or this bug fixed.

Can anyone help me?

Thanks.

Basti

Comment by mtobler [ 25/Jun/12 ]

I have not been able to get this to work on a set of 2008 R2 Servers
which I am trying to cluster. Unfortunately I am unable to get the ssh
functionality to work as well which leaves me with no clustering
capability and wondering why we used Glassfish.
Is anyone going to work on this anytime soon?

I added the following to 18327 but am adding it here as requested:
asadmin> validate-dcom --passwordfile do-not-delete gf01
remote failure:
Successfully verified that the host, gf01, is not the local machine as required.
Successfully resolved host name to: gf01/172.18.11.169
Successfully connected to DCOM Port at port 135 on host gf01.
Successfully connected to NetBIOS Session Service at port 139 on host gf01.
Successfully connected to Windows Shares at port 445 on host gf01.
The remote file, C: doesn't exist on gf01 : Logon failure: unknown user name or bad password.

I am using a domain and the user is a domain user.

I have gone through every document I can find on this issue and have verified all settings/registry keys/etc are correct. I have tried this via asadmin and via the console and get the same result.

Comment by Byron Nevins [ 26/Jun/12 ]

Sorry I overlooked the activity on this issue. I'll try to look into it soon. mtobler – please document what you did/what happened etc. Are you using a Windows Domain?





[GLASSFISH-18447] install-node-ssh NullPointerException because of empty zip file created Created: 03/Mar/12  Updated: 24/Apr/12  Resolved: 24/Apr/12

Status: Resolved
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b23
Fix Version/s: 4.0_b34

Type: Bug Priority: Major
Reporter: Hrotkó Gábor Assignee: Yamini K B
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: 4 minutes
Time Spent: Not Specified
Original Estimate: 4 minutes
Environment:

$ uname -a
Linux host1 2.6.32-38-generic #83-Ubuntu SMP Wed Jan 4 11:12:07 UTC 2012 x86_64 GNU/Linux

$ java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)


Tags: 3_1_2-next, install-node-ssh

 Description   

Trying to create a new node with install-node-ssh fails. Both from cli and web admin console. Ssh setup was successful. I get NullPointerException without a stacktrace. I assume that this is because the zip file cannot be created. Permissions are ok, it was a fresh install with user1.

$ /home/user1/glassfish3/glassfish/bin/asadmin install-node-ssh --installdir /home/user2/glassfish3 --sshkeyfile /home/user1/.ssh/id_dsa --sshuser user2 --sshport 22 node1
Successfully connected to user2@node1 using keyfile /home/user1/.ssh/id_dsa
java.lang.NullPointerException
Command install-node-ssh failed.
$ ls -la glas*.zip
-rw-r--r-- 1 user1 user1 0 2012-03-03 16:36 glassfish2006240698242408681.zip

The big problem is, that there is no stacktrace.



 Comments   
Comment by hiro2k [ 09/Mar/12 ]

I'm also running into this issue and cannot create any ssh nodes. This is preventing me from upgrading to 3.1.2 from 3.1.1.

Comment by bybates [ 12/Mar/12 ]

I am also experiencing the same issue.

I have a scripted cluster deployment working with GF 3.1.1, so I'm positive my SSH keys and environment are configured correctly.

When using the same scripts with Glassfish 3.1.2, the install-node subcommand is failing. I get a similar zero byte zip archive mentioned above. I also experience the same problem when executing the create-node-ssh subcommand.

Comment by Joe Di Pol [ 19/Mar/12 ]

An alternative is to perform the GlassFish installation yourself on the remote systems, and then use create-node-ssh to point to that installation (without using the --install option).

Comment by Yamini K B [ 20/Apr/12 ]

Can you send across output of 'ls -F' of install root? I need to confirm if this is same as JIRA-18645

Comment by Hrotkó Gábor [ 20/Apr/12 ]

$ ls -F
bin/ glassfish/ install/ javadb/ mq/ pkg/ uninstall.exe uninstall.sh updatetool/ var/

Comment by Yamini K B [ 24/Apr/12 ]

Fix checked in r53630





[GLASSFISH-18441] install-node doesn't handle bin/pkg and bin/updatetool symlinks Created: 01/Mar/12  Updated: 01/Mar/12

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b23
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Joe Di Pol Assignee: Yamini K B
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to UPDATECENTER2-2212 Remove use of symlinks Open

 Description   

On unix platforms the following files are symlinks (after update center bootstrap):

glassfish3/bin/pkg
glassfish3/bin/updatetool

If you then do an install-node these symlinks are replaced with the actual files in ../pkg/bin/pkg/ on the remote system. That means any relative paths referred to in the scripts are not found. For example:

To reproduce:

  • Unzip glassfish.zip
  • Run bin/pkg to install the pkg packages
  • Run install-node to install GlassFish on another system
  • On the other system try to run glassfish3/bin/pkg and you'll get an error:
    $ glassfish3/bin/pkg
    cd glassfish3
    bin/pkg[228]: /var/tmp/dipol/glassfish3/bin/../python2.4-minimal/bin/python: 
        not found [No such file or directory]
    

The work-around is to use the actual scripts in glassfish3/pkg/bin:

cd glassfish3
pkg/bin/pkg list
...

You can also repair the broken file by running "fix" on the pkg packages:

cd glassfish3
pkg/bin/pkg fix pkg

This will complain about a bunch of permission stuff, but will in the end repair the file.



 Comments   
Comment by Joe Di Pol [ 01/Mar/12 ]

One option to fix this is to change UC to not use symlinks.





[GLASSFISH-18437] install-node --force is too slow Created: 01/Mar/12  Updated: 18/Feb/13

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: future release
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Tom Mueller Assignee: Byron Nevins
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

The install-node --force command takes too long to remove all of the files from the remote host. It prints a "Force removing file..." message for each file that it is removing, and it appears that maybe it is doing a separate "scp" command to remove each file. It appears that it is removing the entire installation (including the domains and nodes directories, which it should not do). If so, a "rm -r" would be better than removing each file individually.

I was tempted to write this as an RFE, but the current implementation is too slow to be useful.

I observed this with an SSH node.
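
As an illustration of the behaviour asked for above (remove the installation in one pass while leaving the domains and nodes directories alone), here is a minimal Java sketch. It assumes the removal runs on the target host with direct filesystem access, for example from a helper started by a single remote command; the class and method names are hypothetical and this is not how install-node is implemented.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Set;
import java.util.stream.Stream;

// Hypothetical helper, not GlassFish code.
final class ForceRemoveSketch {
    /** Deletes everything under root except directories whose names are in preserve. */
    static void removeExcept(final Path root, final Set<String> preserve) throws IOException {
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                // Preserved directories (e.g. "domains", "nodes") are skipped with their contents.
                if (!dir.equals(root) && preserve.contains(dir.getFileName().toString())) {
                    return FileVisitResult.SKIP_SUBTREE;
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc) throws IOException {
                if (exc != null) {
                    throw exc;
                }
                // A directory that still contains a preserved subtree is not empty; keep it.
                boolean empty;
                try (Stream<Path> entries = Files.list(dir)) {
                    empty = !entries.findAny().isPresent();
                }
                if (empty && !dir.equals(root)) {
                    Files.delete(dir);
                }
                return FileVisitResult.CONTINUE;
            }
        });
    }
}
```

The point of the sketch is the single traversal with an exclusion set; the per-file round trips described above are what make the current behaviour slow.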






[GLASSFISH-18430] Cluster commands fail if dcom dependencies removed Created: 28/Feb/12  Updated: 03/Apr/12  Resolved: 11/Mar/12

Status: Resolved
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b23
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Joe Di Pol Assignee: Byron Nevins
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Java Archive File 18430.jar    
Tags: 3_1_2-next

 Description   

If I remove modules/j-interop-repackaged.jar from a GlassFish installation, then all cluster operations fail because the cluster admin module can't be loaded:

$ ./asadmin list-instances
Failed to start Bundle Id [178] State [INSTALLED] [org.glassfish.main.cluster.admin(Cluster Admin):3.1.2]
Closest matching local and remote command(s):
list-instances

Command list-instances failed.

j-interop-repackaged.jar needs to be a soft dependency where only the DCOM support stops working when it is removed.



 Comments   
Comment by Joe Di Pol [ 07/Mar/12 ]

I wonder if it is enough to update the metadata for cluster-common.jar so the Import-Package field specifies resolution:="optional" for the interop packages like:

Import-Package: ... org.jinterop.dcom.common;password=GlassFish;resolution:="optional"

Comment by Byron Nevins [ 08/Mar/12 ]

Step #1

I did this for 4.0 already. It should be repeated for 3.1.2 if back-porting is desired

Move the dependency for the j-interop jar out of common/common-util

Comment by Byron Nevins [ 08/Mar/12 ]

AFAIK – there is one and only one way to do this with the j-interop jar:

Reflection.

The way I did it with DTrace was IMO an elegant solution.

Create an interface-Contract (DTraceContract.java in glassfish-api)
Implement the interface in value-add
value-add uses reflection to access the actual DTrace code
If there is no value-add, then the Habitat returns null with a call to getByContract()

How to use this approach for DCOM:

1) Create a glassfish public API, JinteropContract
2) Implement the contract inside the j-interop-repackaged jar
3) cluster/common now works exclusively with the Habitat-returned value from getByContract()
4) if (3) is null – it simply errors out.

Note: this is quite a bit of tricky work!
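
The four steps above can be sketched roughly as follows. This is only an illustration of the pattern: the Registry class is a hypothetical stand-in for the Habitat, and the single method on JinteropContract is invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration of the contract-lookup pattern; none of these types are GlassFish APIs.
final class ContractLookupSketch {
    /** Hypothetical public contract that the optional DCOM module would implement. */
    interface JinteropContract {
        boolean remoteFileExists(String host, String path);
    }

    /** Hypothetical stand-in for the Habitat: returns null when no implementation is registered. */
    static final class Registry {
        private final Map<Class<?>, Object> services = new HashMap<>();

        <T> void register(Class<T> contract, T implementation) {
            services.put(contract, implementation);
        }

        <T> T getByContract(Class<T> contract) {
            return contract.cast(services.get(contract)); // cast(null) yields null
        }
    }

    /** Caller side (step 3 and 4): work only with the looked-up value, error out when it is null. */
    static String checkRemoteDir(Registry registry, String host, String path) {
        JinteropContract dcom = registry.getByContract(JinteropContract.class);
        if (dcom == null) {
            return "DCOM support is not installed";
        }
        return dcom.remoteFileExists(host, path) ? "found" : "missing";
    }
}
```

With this shape the calling module never references a j-interop class directly, so it can load whether or not the optional jar is present.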

Comment by Byron Nevins [ 09/Mar/12 ]

Progress:

Now OSGi and Maven are perfectly happy if the j-interop-repackaged.jar isn't available.

Next Step:
If Windows/DCOM code is called anyway, catch the NoClassDefFoundError and fail cleanly with a good message for the user.

Comment by Byron Nevins [ 09/Mar/12 ]

I chose the non-reflection answer. I just tell OSGi that the jar is optional. You won't see an error until you try to load the Windows/Dcom class.
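
A minimal sketch of what "fail cleanly" could look like once the import is optional: probe for a class from the optional bundle before touching any DCOM code, and turn a missing class into a readable message. The probe class name is an assumption (any class exported by the j-interop bundle would do), and the helper names are hypothetical, not taken from the committed fix.

```java
// Hypothetical helper, not the committed GlassFish change.
final class OptionalDcomSupport {
    // Assumption: this class lives in the optional j-interop bundle and can serve as a probe.
    static final String PROBE_CLASS = "org.jinterop.dcom.common.JIException";

    /** Returns true if the named class can be located without initializing it. */
    static boolean isAvailable(String className) {
        try {
            Class.forName(className, false, OptionalDcomSupport.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // NoClassDefFoundError is a LinkageError, so both failure modes land here.
            return false;
        }
    }

    /** Call before any Windows/DCOM code path; throws with a user-readable message. */
    static void requireDcom() {
        if (!isAvailable(PROBE_CLASS)) {
            throw new IllegalStateException(
                "DCOM support is not installed (j-interop-repackaged.jar is missing).");
        }
    }
}
```

Non-DCOM cluster commands never call requireDcom(), so they are unaffected when the jar is absent.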

Comment by Byron Nevins [ 11/Mar/12 ]

This jar file contains all the modified/new files for the svn revision # 52846 on the main trunk.

It also has a diffs file.

Comment by Byron Nevins [ 11/Mar/12 ]

Done!

D:\gf\trunk\main\nucleus\cluster>svn commit
Sending cluster\admin\src\main\java\com\sun\enterprise\v3\admin\cluster\ValidateDcom.java
Sending cluster\common\osgi.bundle
Sending cluster\common\pom.xml
Adding cluster\common\src\main\java\com\sun\enterprise\util\cluster\windows\LocalStrings.properties
Adding cluster\common\src\main\java\com\sun\enterprise\util\cluster\windows\SharedStrings.java
Sending cluster\common\src\main\java\com\sun\enterprise\util\cluster\windows\io\WindowsRemoteFileSystem.java
Sending cluster\common\src\main\java\com\sun\enterprise\util\cluster\windows\process\WindowsRemoteScripter.java
Sending cluster\common\src\main\java\com\sun\enterprise\util\cluster\windows\process\WindowsWmi.java
Transmitting file data ........
Committed revision 52846.

Comment by Byron Nevins [ 02/Apr/12 ]

BugDB: 13917303

Comment by Byron Nevins [ 03/Apr/12 ]

Ported to 3.1.2.1





[GLASSFISH-18371] SSH: Do not allow running DAS on 4.0 and Remote Instances on 3.x Created: 15/Feb/12  Updated: 15/Apr/14

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: 4.0_b24
Fix Version/s: future release

Type: Bug Priority: Major
Reporter: Byron Nevins Assignee: Joe Di Pol
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to GLASSFISH-18366 create-dcom should detect GlassFish v... Resolved

 Description   

I have not checked but this is almost certainly the case. See 18366.

It is visible by nadmin not existing in a 3.x installation.
But the real problem is that we should forbid hetero-version clustering.



 Comments   
Comment by Byron Nevins [ 15/Feb/12 ]

the DCOM version of this issue





[GLASSFISH-18366] create-dcom should detect GlassFish version and provide better error message. Created: 15/Feb/12  Updated: 18/Feb/13  Resolved: 18/Feb/13

Status: Resolved
Project: glassfish
Component/s: distributed management
Affects Version/s: 4.0_b23
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Anissa Lam Assignee: Byron Nevins
Resolution: Works as designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to GLASSFISH-18371 SSH: Do not allow running DAS on 4.0... Open

 Description   

Environment: 4.0 installed on local machine. 3.1.2 installed on remote window machine.
Without realizing I was mixing 2 versions, I tried using the GUI to create a DCOM node for that machine. validate-dcom runs fine and returns success,
but the DCOM node is not created. I keep getting the error:

"Warning: some parameters appear to be invalid. DCOM node not created. To force creation of the node with these parameters rerun the command using the --force option. Couldn't connect via DCOM to remote host: bigapp-oblade-3.us.oracle.com"

I believe I have entered all the values correctly and cannot figure out why the node is not created.

I think create-node-dcom should provide a better error message that points out the real reason.



 Comments   
Comment by Byron Nevins [ 15/Feb/12 ]

WindowsremotePinger ASSUMES that if "asadmin version" returns nothing that it can't connect. In this case the problem is the wrong version – it connected just fine.
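
To make the distinction concrete, here is a small Java sketch that separates "could not connect" from "connected, but the version is missing or wrong". The enum, the method and its parameters are hypothetical; this is not the pinger's actual code.

```java
// Hypothetical classification, not the actual pinger implementation.
final class VersionPingSketch {
    enum Result { UNREACHABLE, NO_VERSION_OUTPUT, VERSION_MISMATCH, OK }

    /**
     * connected       - whether the remote command could be launched at all
     * versionOutput   - whatever the remote "asadmin version" printed (may be null or empty)
     * expectedVersion - the version string of the local installation
     */
    static Result classify(boolean connected, String versionOutput, String expectedVersion) {
        if (!connected) {
            return Result.UNREACHABLE;
        }
        if (versionOutput == null || versionOutput.trim().isEmpty()) {
            // Connection worked; empty output points at the remote installation, not the network.
            return Result.NO_VERSION_OUTPUT;
        }
        return versionOutput.contains(expectedVersion) ? Result.OK : Result.VERSION_MISMATCH;
    }
}
```

An error message keyed on these cases could tell the user about a version problem instead of reporting a connection failure.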


Comment by Byron Nevins [ 18/Feb/13 ]

All players need to be the same version.





[GLASSFISH-18327] install-node-dcom does not abide by --windowsdomain parameter Created: 06/Feb/12  Updated: 04/Oct/13  Resolved: 06/Feb/12

Status: Resolved
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b20
Fix Version/s: 3.1.2_b21, 3.1.2.2

Type: Bug Priority: Blocker
Reporter: jp2011 Assignee: Byron Nevins
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows 2008 R2
Glassfish 3.1.2_b20


Attachments: Zip Archive 18327.zip    
Tags: 3_1_2-approved

 Description   

Running the command install-node-dcom ignores the --windowsdomain or -d parameters.

asadmin> install-node-dcom --windowsdomain domain.net
Enter the value for the hosts operand> somehost
Enter remote password for serviceaccount@somehost>
com.sun.enterprise.util.cluster.windows.process.WindowsException: Logon failure:
unknown user name or bad password.
Command install-node-dcom failed.

As can be seen, the windows domain is ignored and the asadmin is attempting to connect to somehost with a local account.



 Comments   
Comment by Byron Nevins [ 06/Feb/12 ]

The command obviously fell through the cracks and received no testing.

Confirmed problem.

Comment by Byron Nevins [ 06/Feb/12 ]
  • What is the impact on the customer of the bug?
    Bad. He will not be able to use a Windows Domain Controller for authenticated login to remote machines in the domain.
    Whether a real user would really ever do this is debatable. I would not. I would create local accounts on each machine.

How likely is it that a customer will see the bug and how serious is the bug?
100% chance if they run the command with the windows domain option

Is it a regression?
No.
Does it meet other bug fix criteria (security, performance, etc.)?
Yes

  • What is the cost of fixing the bug?
    1 man-day.

How risky is the fix?
Little or no risk. I will thoroughly test that it continues to work both before & after the fix with NON-DOMAIN authentication.
Unfortunately I don't think I will be able to test domain controller authentication because:
(a) My company refuses to give me multiple Windows computers. Therefore I can't create my own domain.
(b) The computer in a domain that I can access is provisioned and it is locked down too tightly to access with SAMBA/jcifs.

How much work is the fix?
See "cost" above.

Is the fix complicated?
No. Very straight-forward.

  • Is there an impact on documentation or message strings?
    No.
  • Which tests should QA (re)run to verify the fix did not destabilize GlassFish?
    Can't possibly destabilize since it is a local-only command.
    QA should run normal tests and should try it against a domain-controller authenticated machine.
  • Which is the targeted build of 3.1.2 for this fix?
    The last one.
Comment by Byron Nevins [ 06/Feb/12 ]

Bugfix - source and diffs

Comment by Byron Nevins [ 06/Feb/12 ]

I tested for regressions thoroughly – I installed on a remote in my LAN. I also installed over VPN to a lab machine.

This guarantees that the existing behavior still works fine for non Domain Controller authentication.

It SHOULD work for Domain Controller authentication but I can't test it.
There's no risk since before the fix – Domain Controller authentication definitely did not work. It can't possibly be any worse and it probably is fixed now.

Worst case scenario is that a user would have to run the command against a local account on the machine or he would need to install GF manually on the remote machine(s).

Comment by Byron Nevins [ 06/Feb/12 ]

Sending cli\src\main\java\com\sun\enterprise\admin\cli\cluster\InstallNodeDcomCommand.java
Transmitting file data .
Committed revision 52465.

Comment by jp2011 [ 09/Feb/12 ]

Sorry to tell you but this isn't fixed in bundle 21.

Performing a network capture while running the command still shows the host name being used instead of the domain for passing the credentials. I'm not sure how you are validating the domain in the OK method, but this may not be correct and hence why the host is still being used instead of the domain.

I'm also fairly certain the other dcom commands are not working with the windowsdomain switch either.

Comment by easarina [ 06/Mar/12 ]

I believe that this is a regression and that the error message was seen because the -w <user_name> option was not used. See, for example:
asadmin install-node-dcom -W password1.txt bigapp-oblade-3
com.sun.enterprise.util.cluster.windows.process.WindowsException: Logon failure:
unknown user name or bad password.
Command install-node-dcom failed.

But the command "asadmin install-node-dcom -w aroot -W password1.txt bigapp-oblade-3" executed successfully. The same user, aroot, was used on the DAS machine and on the remote host, and according to the help: "The default is the user that is running this subcommand." That is, aroot in my case.

Comment by jp2011 [ 06/Mar/12 ]

The command that I used was correct I just didn't spell it out in the notes. Trying the command as you listed also doesn't work and specifies a local windows account on the remote host. I have also tried a local windows account and the command doesn't work. The user is successfully authenticated but errors still occur with the SMB side (see GLASSFISH-18451).

Furthermore, the GUI behaves the same as my attempt to run the commands manually from the command line.

Comment by easarina [ 06/Mar/12 ]

Does validate-dcom command work?

Comment by jp2011 [ 06/Mar/12 ]

Password file contains the following line: AS_ADMIN_WINDOWSPASSWORD=${ALIAS=glassfish-alias}

I have setup the alias already in asadmin as per the documentation.

c:\glassfish3\bin>asadmin --passwordfile passwordfile.txt validate-dcom -w glassfish remotehost
remote failure:
Successfully verified that the host, remotehost, is not the local machine as required.
Successfully resolved host name to: remotehost/10.65.30.187
Successfully connected to DCOM Port at port 135 on host remotehost.
Successfully connected to NetBIOS Session Service at port 139 on host remotehost
nc.
Successfully connected to Windows Shares at port 445 on host remotehost.
The remote file, C: doesn't exist on remotehost: Access is denied.

Command validate-dcom failed.

Comment by Byron Nevins [ 06/Mar/12 ]

Bug located – The NtlmPasswordAuthentication object which does the actual Windows authentication – was ignoring the domain!

==========================================================================

Index: src/main/java/com/sun/enterprise/util/cluster/windows/io/WindowsRemoteFileSystem.java
===================================================================
--- src/main/java/com/sun/enterprise/util/cluster/windows/io/WindowsRemoteFileSystem.java (revision 52702)
+++ src/main/java/com/sun/enterprise/util/cluster/windows/io/WindowsRemoteFileSystem.java (working copy)
@@ -37,7 +37,6 @@
  * only if the new code is made subject to such option by the copyright
  * holder.
  */
-
 package com.sun.enterprise.util.cluster.windows.io;
 
 import com.sun.enterprise.util.cluster.windows.process.WindowsCredentials;
@@ -55,9 +54,25 @@
     private final NtlmPasswordAuthentication authorization;
 
     public WindowsRemoteFileSystem(WindowsCredentials cr) {
-        host = getIP(cr.getHost());
-        authorization = new NtlmPasswordAuthentication(host, cr.getUser(), cr.getPassword());
+        // if host and domain are the same we can use the IP address of the host
+        // otherwise use the domain name.
+        boolean useDomain;
+        String hostName = cr.getHost();
+        String domain = cr.getDomain();
+
+        if(!domain.equals(hostName))
+            useDomain=true;
+        else
+            useDomain = false;
+
+        host = getIP(hostName);
+
+        if(useDomain)
+            authorization = new NtlmPasswordAuthentication(domain, cr.getUser(), cr.getPassword());
+        else
+            authorization = new NtlmPasswordAuthentication(host, cr.getUser(), cr.getPassword());
     }
+
     public WindowsRemoteFileSystem(String hostname, NtlmPasswordAuthentication auth) {
         host = getIP(hostname);
         authorization = auth;
@@ -85,7 +100,8 @@
     private String getIP(String hostname) {
         try {
             return InetAddress.getByName(hostname).getHostAddress();
-        } catch (Exception e) {
+        }
+        catch (Exception e) {
             return hostname;
         }
     }

Comment by easarina [ 06/Mar/12 ]

For me install-node-dcom failed, if -w <user_name> was not used.

Comment by easarina [ 06/Mar/12 ]

If validate-dcom doesn't work (Access is denied), then DCOM was not enabled correctly (or was not enabled at all). Please see how to enable dcom in the 3.1.2 High Availability Administration Guide. In my environment, both validate-dcom and install-node-dcom work.

asadmin validate-dcom -W password1.txt bigapp-oblade-3
Command validate-dcom executed successfully.

I will open a minor regression bug regarding: install-node-dcom doesn't work without -w <user_name>.

Comment by jp2011 [ 07/Mar/12 ]

I appreciate your thought that I haven't enabled DCOM correctly and I would love it if that was the case but so far there is nothing that has been suggested or mentioned that I haven't already done. I see other people on the internet having the same issue without an answer and would like to help get to the bottom of the issue.

If this is a configuration issue in windows, it isn't currently documented in the 3.1.2 documents. If this is a code issue that I have run into that you haven't, it needs to be identified and fixed for the good of the community. Could you please email me offline and we can better investigate the cause and possible resolution?

Many thanks.

Comment by easarina [ 07/Mar/12 ]

There are no problems in the code: if DCOM works, install-node-dcom and the other commands work. But DCOM has to be enabled. The 3.1.2 High Availability Administration Guide has detailed instructions on how to do it. You need to edit the Windows registry manually. Did you see and execute the DCOM enabling procedure?

Comment by jp2011 [ 07/Mar/12 ]

In fact there are issues in the code with the DCOM validation, and Windows configuration steps are missing from the documentation. Byron and I discussed this yesterday and he has found another bug when using domain accounts. As far as registry edits go, I have already done them as per the documentation and allowed full control to my administrators group (which my glassfish account belongs to).

When using a local account with Windows Server 2008 R2 (64-bit in my case) there are missing steps in the documentation. Windows no longer allows remote access to c$ by default, which is required for validate-dcom to work.

These steps are validated by using the Windows command "net use" in the command prompt. The first example shows default Windows behaviour when attempting to access c$ on a remote host. The second example illustrates what happens after a configuration change in Windows that isn't documented by Oracle:

<snip>
C:\Users\GFAdmin>net use \\remotehost\c$ /USER:remotehost\glassfish
The password is invalid for \\remotehost\c$.

Enter the password for 'remotehost\glassfish' to connect to 'remotehost':
System error 5 has occurred.

Access is denied.

C:\Users\GFAdmin>net use \\remotehost\c$ /USER:remotehost\glassfish
Enter the password for 'remotehost\glassfish' to connect to 'remotehost':
The command completed successfully.

C:\Users\GFAdmin>
</snip>

Once you perform the following steps to allow remote access to c$, DCOM works with a local account (almost):

Enable Local Account Token Filter Policy
• Click Start
• Type regedit in the Start Search box, and then click regedit in the Programs list.
• Expand the following subkey:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System
• If the LocalAccountTokenFilterPolicy registry entry does not exist, follow these steps:
a. On the Edit menu, click New, and then click DWORD Value
b. Type LocalAccountTokenFilterPolicy, and then press ENTER
• Right-click LocalAccountTokenFilterPolicy, and then click Modify
• In the Value data box, type 1, and then click OK
• Done, no need to reboot.
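
The same registry change can also be scripted instead of clicked through. The following is only a sketch: it builds the argument list for the standard Windows `reg add` command with the key and value name from the steps above. The class and method names are made up for illustration, and nothing here is part of GlassFish.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: builds the "reg add" call for the steps listed above. */
final class TokenFilterPolicy {
    // Subkey from the manual steps (HKLM is reg.exe's abbreviation for HKEY_LOCAL_MACHINE).
    static final String KEY =
            "HKLM\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Policies\\System";

    /** Argument list that sets LocalAccountTokenFilterPolicy to DWORD 1. */
    static List<String> regAddCommand() {
        return Arrays.asList(
                "reg", "add", KEY,
                "/v", "LocalAccountTokenFilterPolicy", // value name
                "/t", "REG_DWORD",                     // value type
                "/d", "1",                             // value data
                "/f");                                 // overwrite without prompting
    }
}
```

On the remote Windows host, from an elevated prompt, the list could be handed to `new ProcessBuilder(TokenFilterPolicy.regAddCommand()).inheritIO().start()`. The sketch deliberately does not run the command itself.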

However now there is a new problem (PROGRESS!)

Successfully connected to Windows Shares at port 445 on host remotehost.
Successfully accessed C: on remotehost using DCOM.
Successfully wrote delete_me.bat to C: on remotehost using DCOM.
Could not connect to WMI (Windows Management Interface) on remotehost. : Error setting up remote connection to WMI

A quick search shows that Byron has run into this before:
http://java.net/projects/glassfish/lists/commits/archive/2011-11/message/143

At this point I have done a lot of troubleshooting to figure out the remote WMI failures, which I cannot document here due to size. Needless to say, something is still either not set up properly or broken in the validation code that Byron is looking at.

Comment by Byron Nevins [ 07/Mar/12 ]

Checked in a fix to the (first) problem where the windows domain was never used for auth/auth.
I checked it in to 4.0 just now and put the fix for 3.1.2 into the pipeline for the first patch release.

Sending common\src\main\java\com\sun\enterprise\util\cluster\windows\io\WindowsRemoteFileSystem.java
Transmitting file data .
Committed revision 52813.

Comment by easarina [ 07/Mar/12 ]

Once the DCOM enabling problem is solved, you can give install-node-dcom a full machine name, i.e. <machine_name.domain>

Comment by easarina [ 09/Mar/12 ]

Byron, could you give me the fix as a patch, so I can verify the fix, using a negative testcase.

Comment by Byron Nevins [ 02/Apr/12 ]

BugDB 13917234

Comment by mtobler [ 22/Jun/12 ]

Is there a release Date for this fix? I am trying to set up Clustering using DCOM on multiple 2008 R2 Servers and I consistently get the following:
asadmin> validate-dcom --passwordfile do-not-delete gf01
remote failure:
Successfully verified that the host, gf01, is not the local machine as required.
Successfully resolved host name to: gf01/172.18.11.169
Successfully connected to DCOM Port at port 135 on host gf01.
Successfully connected to NetBIOS Session Service at port 139 on host gf01.
Successfully connected to Windows Shares at port 445 on host gf01.
The remote file, C: doesn't exist on gf01 : Logon failure: unknown user name or bad password.

I have gone through every document I can find on this issue and have verified that all settings, registry keys, etc. are correct, and this won't go away. I get the same via the console.

Comment by Byron Nevins [ 26/Jun/12 ]

It *is* fixed in the open-source 4.0 codebase. You could always go grab the one changed file and build it. I commented on the filename and the subversion revision number in this issue. There is no date yet for the next 3.x version of GlassFish.

Comment by rdblaha1 [ 04/Oct/13 ]

I do not see a reply to mtobler's comment of 22/Jun/12. I receive a similar response for validate-dcom and can find no solution in any forum, blog, or tutorial. In this issue and GLASSFISH-18451 the trail ends in June 2012. Where can I find an answer? (GlassFish 3.1.2.2, Java 1.6.0_45).

The only difference for what I am getting returned is the last lines.

Using command line:

C:\glassfish\glassfish3122\glassfish\bin>.\asadmin --port 4892 --passwordfile C:\glassfish\glassfish3122\glassfish\domains\testing_cluster\config\dcom-passwords validate-dcom -w Rick.Blaha -v=true TestWeb2
remote failure:
Successfully verified that the host, TestWeb2, is not the local machine as required.
Successfully resolved host name to: TestWeb2/10.3.30.129
Successfully connected to DCOM Port at port 135 on host TestWeb2.
Successfully connected to NetBIOS Session Service at port 139 on host TestWeb2.
Successfully connected to Windows Shares at port 445 on host TestWeb2.
dcom.no.remote.file.access : Logon failure: unknown user name or bad password.

Command validate-dcom failed.

I only need direction to the document solution whether found in another issue or wherever. If I am completely out of the ballpark tell me that too.





[GLASSFISH-18209] Endless SSH Network Timeout Created: 19/Jan/12  Updated: 14/Feb/13

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: 4.0_b19
Fix Version/s: future release

Type: Bug Priority: Major
Reporter: Byron Nevins Assignee: Joe Di Pol
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to GLASSFISH-18185 'update-node-ssh' command hangs when ... Open

 Description   

Endlessly long wait to connect to SSH – when there is no SSH daemon running.

SSHLauncher.java:

{{
private void openConnection() throws IOException {
    boolean isAuthenticated = false;
    String message = "";
    connection = new Connection(host, port);
    connection.connect(new HostVerifier(knownHostsDatabase));
}}

The connection.connect() call never returns.

While looking at 18185 I saw this as follows:

The remote system is Windows. It is a DCOM node and update-node-ssh is called on it. The above method tries to connect to an sshd at host:135.

It seems to take "forever". I quit waiting and forcibly killed GF to get out of the state.

Recommendation – for a "ssh-ping" 5 or 10 seconds should be plenty.
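
A bounded "ssh-ping" along these lines could look like the sketch below. It is not the SSHLauncher code: it uses only java.net, the class and method names are hypothetical, and it assumes that checking for the "SSH-" identification line is a good enough test. A port such as 135 accepts the TCP connection but never sends that line, so both the connect and the read are given a timeout.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical pre-flight check: is something that speaks SSH listening at host:port? */
final class SshPing {
    /**
     * Returns true only if the peer sends a line starting with "SSH-" within the timeout.
     * Simplification: the SSH protocol allows a server to send other lines before its
     * identification string; this sketch looks at the first line only.
     */
    static boolean looksLikeSsh(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // Bound the TCP connect itself.
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            // Bound every subsequent read, so a silent peer cannot block forever.
            socket.setSoTimeout(timeoutMillis);
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
            String banner = in.readLine();
            return banner != null && banner.startsWith("SSH-");
        } catch (IOException e) {
            // Covers refused connections, connect timeouts and read timeouts.
            return false;
        }
    }
}
```

With the timeouts suggested above, a caller would use something like `SshPing.looksLikeSsh(host, port, 10_000)` before attempting the real SSH connection.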






[GLASSFISH-18206] Endless Network Timeout Created: 18/Jan/12  Updated: 18/Feb/13

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b17, 4.0_b19
Fix Version/s: future release

Type: Bug Priority: Major
Reporter: Byron Nevins Assignee: Byron Nevins
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Tags: 3_1_2-exclude

 Description   

GF Admin clustering tasks are far too subject to ridiculously long network timeouts. E.g. in this bug we wait a full 10 minutes to get output from "asadmin version".

How to reproduce:

0. Remote windows box has glassfish installed.
1. validate-dcom works fine from (different) DAS machine
2. Sabotage asadmin.bat on the remote machine so that it hangs. Easy! See [1] below
3. Note how the command hangs for a very very long time.

[1] Add this to asadmin.bat's java call
-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=1234

=========================
How did I find this out? Adding the line in [1] is an essential trick for debugging remote GF calls. I forgot about it and left it in.
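
For illustration only, a bounded wait on an external command can be sketched with the standard Process API. This is not how GlassFish runs remote commands over DCOM; the class name is hypothetical and the command is assumed to be a local process. It only shows the shape of "give up after N seconds instead of waiting indefinitely".

```java
import java.io.IOException;
import java.util.List;
import java.util.OptionalInt;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: run a command, but never wait longer than the given timeout. */
final class BoundedCommand {
    /**
     * Returns the exit code, or an empty OptionalInt if the process had to be
     * killed because it did not finish within the timeout.
     */
    static OptionalInt run(List<String> command, long timeout, TimeUnit unit)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true)
                // Discard output so a full pipe cannot stall the child (Java 9+).
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .start();
        if (process.waitFor(timeout, unit)) {
            return OptionalInt.of(process.exitValue());
        }
        // Timed out: kill the process and reap it so nothing is left behind.
        process.destroyForcibly();
        process.waitFor();
        return OptionalInt.empty();
    }
}
```

A caller would treat an empty result as "remote command hung" and report that to the user rather than blocking the admin thread.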






[GLASSFISH-18185] 'update-node-ssh' command hangs when ssh port is not provided. Created: 13/Jan/12  Updated: 25/Jan/12

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b17
Fix Version/s: None

Type: Bug Priority: Major
Reporter: lidiam Assignee: Byron Nevins
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

ogs-3.1.2-b17.zip


Attachments: JPEG File dcom-to-ssh-error.JPG     Text File server.log.txt    
Issue Links:
Related
is related to GLASSFISH-18209 Endless SSH Network Timeout Open
Tags: 312_gui_new, 312_qa, 3_1_2-exclude, 3_1_2-release-note-added, 3_1_2-release-notes

 Description   

Currently it's not possible to convert a DCOM node to an SSH node in Admin Console. For an existing DCOM node, when user changes to SSH (and selects password authentication in my case), and hits Save, the long running process popup is there for a long time, about 15 minutes. Then the following error is displayed:

An error has occurred
Check server log for more information.

The server.log file contains:

[#|2012-01-12T17:46:55.953-0800|WARNING|glassfish3.1.2|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin.cluster|_ThreadID=97;_ThreadName=Thread-2;|Could not connect to host jed-asqe-43 using SSH.: There was a problem while connecting to jed-asqe-43:135: Operation interrupted: host=jed-asqe-43 port=135 user=j2eetest password=<concealed> keyFile=/export/home/j2eetest/.ssh/id_rsa keyPassPhrase=<concealed> authType=null knownHostFile=/export/home/j2eetest/.ssh/known_hosts|#]

[#|2012-01-12T17:46:55.953-0800|SEVERE|glassfish3.1.2|org.glassfish.admingui|_ThreadID=98;_ThreadName=Thread-2;|java.io.InterruptedIOException: Operation interrupted;
java.io.InterruptedIOException: Operation interrupted;
restRequest: endpoint=https://localhost:4848/management/domain/nodes/node/jedy/update-node-ssh
attrs={sshpassword=*******, installdir=C:\as\dcomtest\glassfish3, nodehost=jed-asqe-43, sshuser=${user.name}}
method=POST|#]

The conversion works in CLI with the following command:

asadmin update-node-ssh --nodehost <host name> --sshport 22 --sshuser <user name> <node name>

However, if I execute the following in CLI, to convert DCOM to SSH node, it also hangs and then fails:

asadmin update-node-ssh --nodehost <host name> <node name>

Thus my guess is that Admin Console does not pass along the other two options (that should not be required). Assigning to Admin Console to verify and pass on to CLI, if this is the case. I understand this will most likely not get fixed for this release.



 Comments   
Comment by Anissa Lam [ 13/Jan/12 ]

As shown in the log Lidia pasted, the REST request sent is:

restRequest: endpoint=https://localhost:4848/management/domain/nodes/node/jedy/update-node-ssh
attrs={sshpassword=*******, installdir=C:\as\dcomtest\glassfish3, nodehost=jed-asqe-43, sshuser=${user.name}}
method=POST|#]
So the console is sending in all the info that the user enters. At that time, Lidia probably didn't set the ssh port. I verified that if sshport is specified, it will be sent in as well. I believe the console is doing the correct thing.

Transfer to backend for evaluation.

Comment by lidiam [ 13/Jan/12 ]

That's correct, I did not set the ssh port since the default is always set to 22 and there was no need to change that.

Interesting thing to note is that if I create a new SSH node ssh port field is prepopulated to 22. If I choose to convert CONFIG node in Admin Console, ssh port is populated with 22, however, when I choose to convert DCOM node to SSH, ssh port field is not populated, so there is an inconsistency here. In fact if I enter ssh port in Admin Console when converting DCOM node to SSH, it works fine - we can document it as a workaround.

There are two issues here then:

1. ssh port is not populated when switching from DCOM to SSH node.
2. update-node-ssh command hangs when ssh port is not provided.

Comment by Anissa Lam [ 13/Jan/12 ]

I will file a separate P4 bug about sshport not populated when converting from DCOM to SSH.

I changed the summary of this bug to correctly reflect the issue.

This affects both CLI and GUI. I don't know how often users will convert a DCOM node to an SSH node, but if we decide to release-note this, the workaround is:

  • ensure 'sshport' param is specified if using CLI
  • fill in port number (default is 22) when doing that in GUI.
Comment by Byron Nevins [ 18/Jan/12 ]

The problem is that a DCOM node has the "sshport" set to 135. When you run update-node-ssh it can't tell apart these 2 scenarios:

1) You're running the command on an existing ssh node that happens to use port 135 instead of 22
2) You're converting from a DCOM node that always has 135 set as the port number.

============

This is a gray area. Technically the software did precisely what you asked it to do – it updated the node config with the data you gave it and only the data you gave it. That's how it was designed.

It should be fixed in 4.0.

Recommended Fix

DCOM shouldn't bother with the port setting of 135. That's in a much lower abstraction - we never have to deal with the port number. DCOM should simply never use the sshport field.
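
One way to picture the intended behaviour is the decision sketch below. It is purely illustrative and assumes the node's current type is known when update-node-ssh runs; the enum, class, and method names are hypothetical and are not taken from the GlassFish source. A stored port is trusted only when the node is already an SSH node, so a leftover 135 from a DCOM node is never reused as an SSH port.

```java
/** Hypothetical sketch of choosing the SSH port during an update-node-ssh style update. */
final class SshPortResolver {
    static final int DEFAULT_SSH_PORT = 22;

    enum NodeType { SSH, DCOM, CONFIG }

    /**
     * @param currentType   type of the node before the update
     * @param storedPort    port currently stored in the node config, or null if none
     * @param requestedPort port given explicitly by the user, or null if omitted
     */
    static int resolve(NodeType currentType, Integer storedPort, Integer requestedPort) {
        if (requestedPort != null) {
            return requestedPort;           // an explicit value always wins
        }
        if (currentType == NodeType.SSH && storedPort != null) {
            return storedPort;              // keep the port of an existing SSH node
        }
        return DEFAULT_SSH_PORT;            // converting from DCOM or CONFIG: ignore stored value
    }
}
```

Under these assumptions an SSH node that really does listen on 135 keeps its port, while a converted DCOM node falls back to 22 unless the user says otherwise.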





[GLASSFISH-18182] Error message too long, hard to read in Admin Console Created: 13/Jan/12  Updated: 13/Jan/12

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b17
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: lidiam Assignee: Byron Nevins
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

build ogs-3.1.2-b17.zip


Attachments: JPEG File node-create-error.JPG    
Tags: 312_qa

 Description   

Currently when user tries to install a node on a remote host to a directory where glassfish is already installed the following is printed in Admin Console:

An error has occurred
Successfully connected to j2eetest@tuppy using keyfile /export/home/j2eetest/.ssh/id_rsa Command install-node-ssh failed. Ignoring unrecognized element schedules at Line number = 57 Column number = 18 System Id = file:/export/home/j2eetest/3.1.2/glassfish3/glassfish/domains/domain1/config/domain.xml Public Id = null Location Uri= file:/export/home/j2eetest/3.1.2/glassfish3/glassfish/domains/domain1/config/domain.xml CharacterOffset = 3425 Ignoring unrecognized element backup-configs at Line number = 62 Column number = 23 System Id = file:/export/home/j2eetest/3.1.2/glassfish3/glassfish/domains/domain1/config/domain.xml Public Id = null Location Uri= file:/export/home/j2eetest/3.1.2/glassfish3/glassfish/domains/domain1/config/domain.xml CharacterOffset = 3634 The remote installation directory, /export/home/j2eetest/3.1.2/glassfish3, already exists. Use the --force option to overwrite it.

It is hard to see the actual cause of the problem. We should, 1. at the least print the last sentence on a line by itself but 2. ideally not include the information in between. Hence it would be:

1.
An error has occurred
Successfully connected to j2eetest@tuppy using keyfile /export/home/j2eetest/.ssh/id_rsa Command install-node-ssh failed. Ignoring unrecognized element schedules at Line number = 57 Column number = 18 System Id = file:/export/home/j2eetest/3.1.2/glassfish3/glassfish/domains/domain1/config/domain.xml Public Id = null Location Uri= file:/export/home/j2eetest/3.1.2/glassfish3/glassfish/domains/domain1/config/domain.xml CharacterOffset = 3425 Ignoring unrecognized element backup-configs at Line number = 62 Column number = 23 System Id = file:/export/home/j2eetest/3.1.2/glassfish3/glassfish/domains/domain1/config/domain.xml Public Id = null Location Uri= file:/export/home/j2eetest/3.1.2/glassfish3/glassfish/domains/domain1/config/domain.xml CharacterOffset = 3634

The remote installation directory, /export/home/j2eetest/3.1.2/glassfish3, already exists. Use the --force option to overwrite it.

2.
An error has occurred

Successfully connected to j2eetest@tuppy using keyfile /export/home/j2eetest/.ssh/id_rsa Command install-node-ssh failed.

The remote installation directory, /export/home/j2eetest/3.1.2/glassfish3, already exists. Use the --force option to overwrite it.

I'm attaching screenshot of the error message. I understand it is too late to fix this issue for this release, hence logging as minor. This issue surfaced after fixing http://java.net/jira/browse/GLASSFISH-18037.
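
As a rough illustration of option 2, the parser noise could be stripped with a regular expression before the message is displayed. This is a sketch, not Admin Console code: the class name is hypothetical, and it assumes every noise segment starts with "Ignoring unrecognized element" and ends with "CharacterOffset = <number>", as in the message quoted above.

```java
import java.util.regex.Pattern;

/** Hypothetical filter that removes domain.xml parser warnings from an error message. */
final class ErrorMessageTrimmer {
    // One or more consecutive "Ignoring unrecognized element ... CharacterOffset = N" segments,
    // including the whitespace around them.
    private static final Pattern NOISE = Pattern.compile(
            "\\s*(?:Ignoring unrecognized element .*?CharacterOffset = \\d+\\s*)+",
            Pattern.DOTALL);

    /** Replaces each run of noise with a blank line so the real cause stands on its own. */
    static String trim(String message) {
        return NOISE.matcher(message).replaceAll("\n\n").trim();
    }
}
```

Messages without such segments pass through unchanged, so the filter could be applied to every command failure without special-casing install-node-ssh.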






[GLASSFISH-18124] Cannot start cluster - synchronization fails Created: 05/Jan/12  Updated: 09/Jan/12  Resolved: 09/Jan/12

Status: Resolved
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b16
Fix Version/s: None

Type: Bug Priority: Major
Reporter: lidiam Assignee: Tom Mueller
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

ogs-3.1.2-b16.zip on solaris


Attachments: Text File server.log.txt     JPEG File synchronization-error.JPG    
Tags: 312_gui_new, 312_qa

 Description   

I have a cluster with two instances: one on localhost, another one on a remote solaris machine on an SSH node. This cluster was running in the past. I can no longer start the cluster with the following error:

cll2: Could not start instance cll2 on node localhost-domain1 (localhost). Command failed on node localhost-domain1 (localhost): Previous synchronization failed at Jan 4, 2012 5:28:26 PM Will perform full synchronization. Removing all cached state for instance cll2. Command start-local-instance failed. CLI802 Synchronization failed for directory config, caused by: remote failure: Command timed out. Unable to acquire a lock to access the domain. Another command acquired exclusive access to the domain on Wed, 04 Jan 2012 17:28:56 PST. Retry the command at a later time. To complete this operation run the following command locally on host localhost from the GlassFish install location /export/home/j2eetest/3.1.2/glassfish3: bin/asadmin start-local-instance --node localhost-domain1 --sync normal cll2 clt1: Could not start instance clt1 on node tuppy (tuppy). Command failed on node tuppy (tuppy): Previous synchronization failed at Jan 4, 2012 6:04:59 PM Will perform full synchroni .... msg.seeServerLog

I will attach a screenshot and server.log. I am not certain what is causing this issue, but I was testing Download Logs in Admin Console before hitting it. I'm going to leave the machine intact, in case anyone wants to poke around. This issue may not be easily reproducible.



 Comments   
Comment by lidiam [ 05/Jan/12 ]

I had another cluster with two instances on a DCOM node. After removing this cluster, I could start the solaris/ssh cluster without problems. This seems to have been caused by issue http://java.net/jira/browse/GLASSFISH-18123 - trying to download log files for a server instance on a DCOM node.

Comment by Anissa Lam [ 05/Jan/12 ]

Console is showing whatever backend gives back.
Transfer to backend for evaluation.

Comment by lidiam [ 05/Jan/12 ]

I tried several times, but cannot reproduce it. I see messages like:

[#|2012-01-04T21:34:23.865-0800|INFO|glassfish3.1.2|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin.cluster|_ThreadID=286;_ThreadName=Thread-2;|Warning: Synchronization with DAS failed, continuing startup...

but cluster starts fine after that.

Comment by Byron Nevins [ 09/Jan/12 ]

This bug seems to be related to domain-locking and synchronization.

Comment by Tom Mueller [ 09/Jan/12 ]

The submitter of this issue reported that this issue cannot be reproduced.
Marking this as "Cannot Reproduce".

If this issue shows up again, please reopen the issue.





[GLASSFISH-18121] Intermittent: cannot start instance on a remote node: server requires a password Created: 05/Jan/12  Updated: 10/Jan/12

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b16
Fix Version/s: None

Type: Bug Priority: Major
Reporter: lidiam Assignee: lidiam
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

ogs-3.1.2-b16.zip, DAS on solaris, node on WinXP


Attachments: Text File server.log     Text File server.log.clusteredinstance.txt    
Tags: 312_gui_new, 312_qa, 3_1_2-exclude

 Description   

This is an intermittent issue but it happened twice already. I create a standalone instance on a DCOM node and cannot start it with the following in the server.log of the instance:

[#|2012-01-04T15:44:32.265-0800|SEVERE|glassfish3.1.2|javax.enterprise.system.tools.admin.com.sun.enterprise.container.common|_ThreadID=10;_ThreadName=Thread-2;

The server requires a valid admin password to be set before it can start. Please set a password using the change-admin-password command.
#]

I have a cluster with two instances on the same node running fine. I also created another standalone instance and it started fine. I'm accessing Admin Console on solaris from the windows box, so secure admin and domain password are both set. I'll attach server.log.



 Comments   
Comment by lidiam [ 05/Jan/12 ]

I just got this error for the 3rd time but this time with an SSH node. I had a cluster with two instances, one on localhost one on an ssh node (solaris). The cluster was running fine. I stopped it and added another instance on the ssh node. When I tried to start cluster the newly added server instance failed to start with the following error in Admin Console:

Command succeeded with Warning
clt2: Could not start instance clt2 on node tuppy (tuppy). Command failed on node tuppy (tuppy): Warning: Synchronization with DAS failed, continuing startup... Waiting for clt2 to start .....................................Command start-local-instance failed. Error starting instance clt2. The server exited prematurely with exit code 0. Before it died, it produced the following output: Launching GlassFish on Felix platform Jan 4, 2012 10:31:54 PM com.sun.common.util.logging.LoggingConfigImpl copyLoggingPropertiesFile WARNING: Logging.properties file not found, creating new file using DAS logging.properties [#|2012-01-04T22:31:55.037-0800|INFO|glassfish3.1.2|com.sun.enterprise.server.logging.GFFileHandler|_ThreadID=1;_ThreadName=main;|Running GlassFish Version: GlassFish Server Open Source Edition 3.1.2-b16 (build 16)|#] [#|2012-01-04T22:31:57.647-0800|INFO|glassfish3.1.2|javax.enterprise.system.core.transaction.com.sun.jts.CosTransactions|_ThreadID=10;_ThreadName=main;|JTS5014: Reco .... msg.seeServerLog

Instance's server.log contained the same error and exception:

[#|2012-01-04T22:31:59.029-0800|SEVERE|glassfish3.1.2|javax.enterprise.system.tools.admin.com.sun.enterprise.container.common|_ThreadID=10;_ThreadName=Thread-2;

The server requires a valid admin password to be set before it can start. Please set a password using the change-admin-password command.
#]

[#|2012-01-04T22:31:59.031-0800|SEVERE|glassfish3.1.2|javax.enterprise.system.core.com.sun.enterprise.v3.services.impl|_ThreadID=10;_ThreadName=Thread-2;|Unable to start v3. Closing all ports
org.jvnet.hk2.component.ComponentException: injection failed on com.sun.enterprise.v3.admin.AdminAdapter.authenticator with interface org.glassfish.internal.api.AdminAccessController
at org.jvnet.hk2.component.InjectionManager.error_injectionException(InjectionManager.java:284)
at org.jvnet.hk2.component.InjectionManager.inject(InjectionManager.java:165)
at org.jvnet.hk2.component.InjectionManager.inject(InjectionManager.java:93)
at com.sun.hk2.component.AbstractCreatorImpl.inject(AbstractCreatorImpl.java:126)
at com.sun.hk2.component.ConstructorCreator.initialize(ConstructorCreator.java:91)

Comment by lidiam [ 05/Jan/12 ]

Attaching log file for clustered instance on an ssh node that fails to start.

Comment by Byron Nevins [ 09/Jan/12 ]

This will take more time to hunt-down than is available before HCF for 3.1.2.

Comment by Byron Nevins [ 10/Jan/12 ]

I can't reproduce this. Please do the following. We need to make sure it's a DCOM issue.

When it happens again simply run start-local-instance directly on the remote machine. Use the --verbose option

What does it say?





[GLASSFISH-18098] Display message that instance does not exist when collecting logs Created: 30/Dec/11  Updated: 09/Feb/12  Resolved: 09/Feb/12

Status: Closed
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b16
Fix Version/s: 3.1.2_b18

Type: Bug Priority: Major
Reporter: lidiam Assignee: naman_mehta
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

ogs-3.1.2-b16.zip


Attachments: JPEG File instance-startup-failure.JPG     File server.log.das     File server.log.instance.startupfailure    
Tags: 312_gui_new, 312_qa, 312_verified

 Description   

Create a standalone instance and do not start it. Go to Domain -> Domain Logs tab and select the newly created instance. Click Download button. The browser will be spinning for a very long time waiting for DAS (e.g. more than 10 minutes) and eventually come back with nothing. Instead, we should check if the instance directory exists and display a message to the user if it does not.
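
The suggested check could be as small as the sketch below. It is an illustration only: the class name, the method, and the assumed layout (instance directory directly under the node directory, reachable as a local path) are hypothetical. For a remote node the existence test would have to go through whatever remote file access the node type provides.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

/** Hypothetical pre-check to run before attempting to collect an instance's logs. */
final class LogCollectionPrecheck {
    /**
     * Returns a user-facing message if the instance directory is missing,
     * or an empty Optional if log collection can proceed.
     */
    static Optional<String> check(Path nodeDir, String instanceName) {
        Path instanceDir = nodeDir.resolve(instanceName);
        if (Files.isDirectory(instanceDir)) {
            return Optional.empty();
        }
        return Optional.of("Instance " + instanceName + " has no directory at " + instanceDir
                + ". It may never have been started, so there are no logs to download.");
    }
}
```

The console would show the returned message immediately instead of waiting on DAS for a download that cannot succeed.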



 Comments   
Comment by lidiam [ 30/Dec/11 ]

The following exception is printed to the DAS server.log:

[#|2011-12-29T16:27:05.931-0800|WARNING|glassfish3.1.2|com.sun.grizzly.config.GrizzlyServiceListener|_ThreadID=13;_ThreadName=Thread-2;|GRIZZLY0023: Interrupting idle Thread: admin-thread-pool-4848(12).|#]

[#|2011-12-29T16:27:05.935-0800|WARNING|glassfish3.1.2|javax.enterprise.system.container.web.com.sun.enterprise.web|_ThreadID=29;_ThreadName=Thread-2;|StandardWrapperValve[DownloadServlet]: PWC1406: Servlet.service() for servlet DownloadServlet threw exception
java.lang.RuntimeException: com.sun.jersey.api.client.ClientHandlerException: java.io.InterruptedIOException: Operation interrupted
at org.glassfish.admingui.common.servlet.LogFilesContentSource.getInputStream(LogFilesContentSource.java:110)
at org.glassfish.admingui.common.servlet.DownloadServlet.writeContent(DownloadServlet.java:277)
Will attach server.log for details.

Comment by lidiam [ 30/Dec/11 ]

I'm getting an exception trying to attach server.log, so here is the full exception:

[#|2011-12-29T16:27:05.935-0800|WARNING|glassfish3.1.2|javax.enterprise.system.container.web.com.sun.enterprise.web|_ThreadID=29;_ThreadName=Thread-2;|StandardWrapperValve[DownloadServlet]: PWC1406: Servlet.service() for servlet DownloadServlet threw exception
java.lang.RuntimeException: com.sun.jersey.api.client.ClientHandlerException: java.io.InterruptedIOException: Operation interrupted
at org.glassfish.admingui.common.servlet.LogFilesContentSource.getInputStream(LogFilesContentSource.java:110)
at org.glassfish.admingui.common.servlet.DownloadServlet.writeContent(DownloadServlet.java:277)
at org.glassfish.admingui.common.servlet.DownloadServlet.doPost(DownloadServlet.java:174)
at org.glassfish.admingui.common.servlet.DownloadServlet.doGet(DownloadServlet.java:155)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:668)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:770)
at org.apache.catalina.core.StandardWrapper.service(StandardWrapper.java:1542)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:281)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:655)
at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:595)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:161)
at org.apache.catalina.connector.CoyoteAdapter.doService(CoyoteAdapter.java:331)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:231)
at com.sun.enterprise.v3.services.impl.ContainerMapper.service(ContainerMapper.java:232)
at com.sun.grizzly.http.ProcessorTask.invokeAdapter(ProcessorTask.java:849)
at com.sun.grizzly.http.ProcessorTask.doProcess(ProcessorTask.java:746)
at com.sun.grizzly.http.ProcessorTask.process(ProcessorTask.java:1045)
at com.sun.grizzly.http.DefaultProtocolFilter.execute(DefaultProtocolFilter.java:228)
at com.sun.grizzly.DefaultProtocolChain.executeProtocolFilter(DefaultProtocolChain.java:137)
at com.sun.grizzly.DefaultProtocolChain.execute(DefaultProtocolChain.java:104)
at com.sun.grizzly.DefaultProtocolChain.execute(DefaultProtocolChain.java:90)
at com.sun.grizzly.http.HttpProtocolChain.execute(HttpProtocolChain.java:79)
at com.sun.grizzly.ProtocolChainContextTask.doCall(ProtocolChainContextTask.java:54)
at com.sun.grizzly.SelectionKeyContextTask.call(SelectionKeyContextTask.java:59)
at com.sun.grizzly.ContextTask.run(ContextTask.java:71)
at com.sun.grizzly.util.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:532)
at com.sun.grizzly.util.AbstractThreadPool$Worker.run(AbstractThreadPool.java:513)
at java.lang.Thread.run(Thread.java:662)
Caused by: com.sun.jersey.api.client.ClientHandlerException: java.io.InterruptedIOException: Operation interrupted
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
at com.sun.jersey.api.client.filter.CsrfProtectionFilter.handle(CsrfProtectionFilter.java:97)
at com.sun.jersey.api.client.filter.CsrfProtectionFilter.handle(CsrfProtectionFilter.java:97)
at com.sun.jersey.api.client.filter.CsrfProtectionFilter.handle(CsrfProtectionFilter.java:97)
at com.sun.jersey.api.client.Client.handle(Client.java:648)
at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
at org.glassfish.admingui.common.util.RestUtil.postRestRequestFromServlet(RestUtil.java:712)
at org.glassfish.admingui.common.servlet.LogFilesContentSource.getInputStream(LogFilesContentSource.java:106)
... 28 more
Caused by: java.io.InterruptedIOException: Operation interrupted
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at com.sun.net.ssl.internal.ssl.InputRecord.readFully(InputRecord.java:293)
at com.sun.net.ssl.internal.ssl.InputRecord.read(InputRecord.java:331)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:798)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:755)
at com.sun.net.ssl.internal.ssl.AppInputStream.read(AppInputStream.java:75)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:695)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:640)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:660)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1195)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:379)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:318)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:240)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:147)
... 37 more

#]
Comment by lidiam [ 31/Dec/11 ]

Consequently if I try to start the created server instance, it fails with the following in server.log:

[#|2011-12-30T17:13:53.734-0800|SEVERE|glassfish3.1.2|javax.enterprise.system.tools.admin.com.sun.enterprise.container.common|_ThreadID=10;_ThreadName=Thread-2;|The server requires a valid admin password to be set before it can start. Please set a password using the change-admin-password command.|#]

I created another instance on the same DCOM node and started it fine from Admin Console. However, if I create an instance, try to collect log files and then try to start it, startup fails with the above error (server.log attached).

Comment by lidiam [ 31/Dec/11 ]

Attaching DAS server.log.

Comment by andriy.zhdanov [ 02/Jan/12 ]

Sorry, I can't reproduce either of the two problems on glassfish-3.1.2-b16, and I don't understand how this could happen or how the second problem could relate to collecting log files.

Would it be possible to get access to an instance where the problem reproduces?

Comment by Anissa Lam [ 02/Jan/12 ]

If the instance is created using "localhost-domain1" and I try collect-logs right after it is created, before starting it or restarting DAS, then I see a popup saying an error occurred during download. I think that's correct and acceptable.
"java.lang.RuntimeException: Error while downloading log files from instance1"

If the instance is created using an SSH node, the experience is the same as with localhost-domain1: I see the popup error.

If the instance is created using a DCOM node, then I see 'waiting for localhost' for a long time and nothing happens, as Lidia described.
If I use the CLI to do it,
%asadmin collect-log-files --target dcom-instance1

the command hangs and nothing comes back. That's probably what is causing the behavior in the GUI: the console is waiting for the command to finish.
Since this is specific to DCOM instances, I am transferring this to Byron to take a look.

Andriy, I will send you the machine name and password to create a DCOM node so you can reproduce that too.

Comment by naman_mehta [ 09/Jan/12 ]

It depends on 18055; now that 18055 is closed, it is no longer reproducible.

Comment by lidiam [ 09/Feb/12 ]

Reopening to mark as fixed/verified. Issue was certainly reproducible on earlier builds.

Comment by lidiam [ 09/Feb/12 ]

Fixed as part of issue 18055.

Comment by lidiam [ 09/Feb/12 ]

Verified in build ogs-3.1.2-b21.zip





[GLASSFISH-18094] Need Status if _create-instance-filesystem fails Created: 29/Dec/11  Updated: 29/Dec/11  Resolved: 29/Dec/11

Status: Resolved
Project: glassfish
Component/s: distributed management
Affects Version/s: None
Fix Version/s: 3.1.2_b17, 4.0_b17

Type: Bug Priority: Major
Reporter: Byron Nevins Assignee: Byron Nevins
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Dependency
depends on GLASSFISH-18084 das.properties needs work Open
Tags: 3_1_2-review

 Description   

1) make sure that a remote DCOM node is unable to call back to DAS. Pretty easy. Have DAS on a different domain, or not findable in DNS.
Very simple way – setup DAS on a laptop with a dynamic IP address. The instance machine won't find it.

2) run create-instance and it will fail to create the crucial das.properties file yet it returns success

Problem: We don't know how to get return status back from remote DCOM commands

Fix: make sure all the directories and files were created
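
The fix described above can be sketched as follows. This is a minimal illustration, assuming the check amounts to listing the paths that instance creation is expected to produce and reporting the ones that are missing. The class name InstanceLayoutCheck, the method missingEntries and the relative paths passed in are hypothetical and are not taken from the GlassFish source; over DCOM the existence test would presumably go through the remote file API rather than java.nio.file.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not the GlassFish implementation.
final class InstanceLayoutCheck {
    private InstanceLayoutCheck() {}

    /** Returns the expected relative paths that do not exist under instanceDir. */
    static List<String> missingEntries(Path instanceDir, List<String> expected) {
        List<String> missing = new ArrayList<>();
        for (String relative : expected) {
            if (!Files.exists(instanceDir.resolve(relative))) {
                missing.add(relative);
            }
        }
        return missing;
    }
}
```

A caller would treat a non-empty result as a failed create-instance and report the missing entries instead of returning success.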



 Comments   
Comment by Byron Nevins [ 29/Dec/11 ]

What is the impact on the customer of the bug?
Currently this happens:
create-instance will report that the instance was created successfully even though creating the instance files on the remote node failed

How likely is it that a customer will see the bug and how serious is the bug?
He will always see it.

Is it a regression? Does it meet other bug fix criteria (security, performance, etc.)?
New feature.

What is the cost/risk of fixing the bug?
1 day

How risky is the fix? How much work is the fix? Is the fix complicated?
1 day. Fairly simple. Essentially no risk.

Is there an impact on documentation or message strings?
No

Which tests should QA (re)run to verify the fix did not destabilize GlassFish?
QA should re-run DCOM tests to verify

Which is the targeted build of 3.1.2 for this fix?
B16

Comment by Byron Nevins [ 29/Dec/11 ]

I left out of the description – the problem is that after it fails on the remote node it is reported as a success to the user. And there is NO WAY the instance can run.

Comment by Byron Nevins [ 29/Dec/11 ]

Now the failure is noticed and this message is presented:

D:\gf\branches\3.1.2\cluster\ssh>\glassfish3\glassfish\bin\asadmin -W d:\pw create-instance --node laptop_node fubar300
Successfully created instance fubar300 in the DAS configuration, but failed to create the instance files on node laptop_node (bigapp-oblade-3).

Command failed on node laptop_node (bigapp-oblade-3): Command _create-instance-filesystem failed.

To complete this operation run the following command locally on host bigapp-oblade-3 from the GlassFish install location c:/glassfish3:

bin/asadmin --host WNEVINS-LAP --port 4848 create-local-instance --node laptop_node fubar300

===========

Notice how "WNEVINS-LAP" will never work from the instance machine which is sitting in a Lab inside Oracle.

Comment by Byron Nevins [ 29/Dec/11 ]

Another enhancement – the created instance dir should be whacked when there is a failure contacting DAS
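
A minimal sketch of that cleanup, assuming the instance directory is reachable through the local file system. The class InstanceCleanup and the method deleteTree are hypothetical names used for illustration; a DCOM node would need the equivalent operation through the remote file API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration only; not the GlassFish implementation.
final class InstanceCleanup {
    private InstanceCleanup() {}

    /** Deletes root and everything below it; does nothing if root is absent. */
    static void deleteTree(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order puts children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```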

Comment by Byron Nevins [ 29/Dec/11 ]

Added utility methods:

D:\gf\branches\3.1.2\cluster\common>svn commit . D:\gf\trunk\main\nucleus\cluster\common
Adding branches\3.1.2\cluster\common\src\main\java\com\sun\enterprise\util\cluster\Paths.java
Adding branches\3.1.2\cluster\common\src\test
Adding branches\3.1.2\cluster\common\src\test\java
Adding branches\3.1.2\cluster\common\src\test\java\com
Adding branches\3.1.2\cluster\common\src\test\java\com\sun
Adding branches\3.1.2\cluster\common\src\test\java\com\sun\enterprise
Adding branches\3.1.2\cluster\common\src\test\java\com\sun\enterprise\util
Adding branches\3.1.2\cluster\common\src\test\java\com\sun\enterprise\util\cluster
Adding branches\3.1.2\cluster\common\src\test\java\com\sun\enterprise\util\cluster\PathsTest.java
Adding trunk\main\nucleus\cluster\common\src\main\java\com\sun\enterprise\util\cluster\Paths.java
Adding trunk\main\nucleus\cluster\common\src\test
Adding trunk\main\nucleus\cluster\common\src\test\java
Adding trunk\main\nucleus\cluster\common\src\test\java\com
Adding trunk\main\nucleus\cluster\common\src\test\java\com\sun
Adding trunk\main\nucleus\cluster\common\src\test\java\com\sun\enterprise
Adding trunk\main\nucleus\cluster\common\src\test\java\com\sun\enterprise\util
Adding trunk\main\nucleus\cluster\common\src\test\java\com\sun\enterprise\util\cluster
Adding trunk\main\nucleus\cluster\common\src\test\java\com\sun\enterprise\util\cluster\PathsTest.java
Transmitting file data ....

Comment by Byron Nevins [ 29/Dec/11 ]

related

Comment by Byron Nevins [ 29/Dec/11 ]

D:\gf\branches\3.1.2\cluster>svn commit -F commit.txt ssh\src\main\java\org\glassfish\cluster\ssh\connect\NodeRunnerDcom.java D:\gf\trunk\main\nucleus\cluster\ssh\src\main\java\org\glassfish\cluster\ssh\connect\NodeRunnerDcom.java
Sending branches\3.1.2\cluster\ssh\src\main\java\org\glassfish\cluster\ssh\connect\NodeRunnerDcom.java
Sending trunk\main\nucleus\cluster\ssh\src\main\java\org\glassfish\cluster\ssh\connect\NodeRunnerDcom.java
Transmitting file data ..
Committed revision 51823.





[GLASSFISH-18084] das.properties needs work Created: 23/Dec/11  Updated: 18/Feb/13

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b15, 4.0_b16
Fix Version/s: future release

Type: Bug Priority: Critical
Reporter: Byron Nevins Assignee: Byron Nevins
Resolution: Unresolved Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Dependency
blocks GLASSFISH-18094 Need Status if _create-instance-files... Resolved
Tags: 3_1_2-exclude

 Description   

Say I have this setup:

DAS is running on my laptop which is connected via VPN to a huge corporation. Let's say Oracle. My official hostname is "laptop"
The remote computer that will host instances is sitting at the Corporation behind the firewall.

Now I create a node and an instance on the remote computer using SSH or DCOM. That creates a file called

das.properties

Inside das.properties is the information to call back to DAS. In my case here the hostname is laptop. There are several problems:

(1) the hostname "laptop" is useless. There is NO WAY the remote machine can find its way to my laptop with that name. It would have to have an IP address.

(2) There was NO HANDSHAKE when the instance was created! The command should have failed. The user has no idea that there is no way for the remote to call DAS back ever.

(3) this also happens across domains. E.g. if the 2 machines have these names that are in DNS – it still won't work:
somehost.in.oracle.com
another.us.oracle.com

Why? The domain gets chopped off. If 'another' is the remote it will look for DAS at
somehost.us.oracle.com

(4) I ran it with a secure DAS, yet isSecure is set to false in das.properties.
(5) The protocol is set to http. Shouldn't it be https?
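
The missing handshake from problem (2) could look roughly like the sketch below: before reporting success, something running on the instance host tries to open a TCP connection to the DAS host and admin port recorded in das.properties. The class DasReachability and the method canConnect are hypothetical; this only shows a plain socket probe with a timeout and is not how GlassFish performs the check.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical probe for illustration only; not the GlassFish implementation.
final class DasReachability {
    private DasReachability() {}

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // An unresolvable host name surfaces here as an IOException.
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException | IllegalArgumentException e) {
            // Unreachable, unresolvable, or an invalid port number.
            return false;
        }
    }
}
```

With a probe like this, a host name such as "laptop" that the remote machine cannot resolve would make the command fail instead of silently succeeding.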

WORKAROUND:
After das.properties is created, hand-edit the hostname to something that the remote machine can access.
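
The hand edit can also be scripted. The sketch below assumes das.properties is a standard Java properties file and that the host is stored under the key agent.das.host; that key name is an assumption, so check the generated file for the actual key. The class DasPropertiesEditor is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical helper for illustration only; not part of GlassFish.
final class DasPropertiesEditor {
    // Assumed key name; verify against the generated das.properties.
    static final String HOST_KEY = "agent.das.host";

    private DasPropertiesEditor() {}

    /** Rewrites the host entry, keeping all other properties unchanged. */
    static void setHost(Path dasProperties, String reachableHost) throws IOException {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(dasProperties)) {
            props.load(in);
        }
        props.setProperty(HOST_KEY, reachableHost);
        try (OutputStream out = Files.newOutputStream(dasProperties)) {
            props.store(out, "host changed so the remote machine can reach DAS");
        }
    }
}
```

Note that Properties.store does not preserve the original comments or key order, so the rewritten file may look different even though the key/value pairs are the same.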



 Comments   
Comment by kshitiz_saxena [ 04/Jan/12 ]

Instead of editing the file manually, set the address for the admin-listener in DAS:
asadmin set configs.config.server-config.network-config.network-listeners.network-listener.admin-listener.address=<YOUR IP ADDRESS>

This works.

Comment by Joe Di Pol [ 30/Jan/12 ]

Not a 3.1.2 stopper





[GLASSFISH-18083] Help can be wrong Created: 23/Dec/11  Updated: 14/Feb/13

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.1, 4.0_b01
Fix Version/s: future release

Type: Bug Priority: Major
Reporter: Byron Nevins Assignee: Joe Di Pol
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to GLASSFISH-17421 DAS sometimes uses wrong DAS hostname... Open
Tags: 3_1_2-exclude

 Description   

I ran create-instance from a remote computer using a "config" node (not DCOM or SSH!)

The instance goo1 was registered with the DAS. You must run the following command on the instance host to complete the instance creation:
bin/asadmin --host WNEVINS-LAP --port 4848 create-local-instance --node mynode goo1
Command create-instance executed successfully.

============

Unfortunately "wnevins-lap" is greek to the remote computer. It could find DAS via IP address but there is ZERO chance of finding it with that name.

Scenario: DAS is on a laptop connected to OWAN via VPN; the remote is hard-wired on OWAN.

I recommend changing or supplanting the host address.



 Comments   
Comment by Joe Di Pol [ 29/Dec/11 ]

The value for the --host option is returned by:

Server.getAdminHost()

It appears as though in the submitter's case this method is not returning a valid hostname for their Windows laptop OS configuration. For Windows machines configured to operate as a server (static IP and hostname assigned), this usually works fine.

This problem is related to issue GLASSFISH-17421. The work-around is to configure the IP address explicitly for the admin adaptor on the DAS (instead of using 0.0.0.0).

Will not address in 3.1.2

Comment by gfuser9999 [ 03/Apr/12 ]

Actually, the admin-listener is what determines the contact name of the DAS. All the information is there, since the admin-listener permits one to set the "server-name". Unfortunately, even if this is set, the code does not make use of it.

By the same token, when accessing, say, http://das.foo.com:4848, it will issue a redirect without the FQDN and go to "https://das:4848" if the HTTP/1.0 protocol is forced (same issue here: server-name is not used).
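
The behavior the comments in this issue ask for can be sketched as a simple preference order: use the configured server-name if there is one, otherwise an explicitly configured listener address (anything other than 0.0.0.0), and only then the local host name. This illustrates the suggestion only and is not what the GlassFish code does; the class AdvertisedHost and its parameters are hypothetical.

```java
// Hypothetical selection logic for illustration only; not the GlassFish implementation.
final class AdvertisedHost {
    private AdvertisedHost() {}

    /** Picks the host name that remote machines should use to contact DAS. */
    static String pick(String serverName, String listenerAddress, String localHostName) {
        if (serverName != null && !serverName.trim().isEmpty()) {
            return serverName.trim();
        }
        if (listenerAddress != null) {
            String address = listenerAddress.trim();
            // 0.0.0.0 means "all interfaces" and is useless as a contact address.
            if (!address.isEmpty() && !"0.0.0.0".equals(address)) {
                return address;
            }
        }
        return localHostName;
    }
}
```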





[GLASSFISH-18081] validate-dcom has too much info in its output Created: 23/Dec/11  Updated: 28/Dec/11  Resolved: 27/Dec/11

Status: Resolved
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b15, 4.0_b16
Fix Version/s: 3.1.2_b16

Type: Bug Priority: Major
Reporter: Byron Nevins Assignee: Byron Nevins
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Tags: 3_1_2-approved

 Description   

There are two output levels for the command: normal and --verbose.

1) Add a third one, debug, which is triggered, like many other commands, by the env. var AS_DEBUG.
2) Make the current verbose output the debug output
3) Make the current normal output the verbose output
4) Make a very simple terse output the normal output

This only applies to success. If there is a failure give back as much info as is practicable.
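
The three levels and the failure rule can be sketched as below. It assumes AS_DEBUG counts as set when its value is "true"; the exact values that asadmin accepts are not stated here, so treat that as an assumption. The class ValidateOutput and its methods are hypothetical names, not the validate-dcom implementation.

```java
// Hypothetical sketch of the proposed output levels; not the validate-dcom code.
final class ValidateOutput {
    enum Level { NORMAL, VERBOSE, DEBUG }

    private ValidateOutput() {}

    /** Debug wins if AS_DEBUG is "true" (assumed), then --verbose, else normal. */
    static Level choose(boolean verboseFlag, String asDebugEnv) {
        if (Boolean.parseBoolean(asDebugEnv)) {
            return Level.DEBUG;
        }
        return verboseFlag ? Level.VERBOSE : Level.NORMAL;
    }

    /** On failure everything is printed; on success only messages up to the selected level. */
    static boolean shouldPrint(Level messageLevel, Level selected, boolean failed) {
        return failed || messageLevel.ordinal() <= selected.ordinal();
    }
}
```

A caller might obtain the level with ValidateOutput.choose(verbose, System.getenv("AS_DEBUG")) and tag each message with the lowest level at which it should appear.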

Thanks to Nazrul Islam for this idea. This is why it is good for "fresh" eyes to look in...



 Comments   
Comment by Byron Nevins [ 23/Dec/11 ]

The 3 outputs of interest before making any changes. One per comment. This is the 'normal' success output:

d:\gf>vd bigapp-oblade-1
asadmin -W d:\pw validate-dcom -w hudson bigapp-oblade-1

Successfully verified that the host, bigapp-oblade-1, is not the local machine as required.
Successfully resolved host name to: bigapp-oblade-1/10.133.184.150
Successfully connected to DCOM Port at port 135 on host bigapp-oblade-1.
Successfully connected to NetBIOS Session Service at port 139 on host bigapp-oblade-1.
Successfully connected to Windows Shares at port 445 on host bigapp-oblade-1.
Successfully accessed C: on bigapp-oblade-1 using DCOM.
Successfully wrote delete_me.bat to C: on bigapp-oblade-1 using DCOM.
Successfully accessed WMI (Windows Management Interface) on bigapp-oblade-1. There are 83 processes running on bigapp-oblade-1.
Successfully ran the test script on bigapp-oblade-1 using DCOM.
The script simply ran the DIR command. Here are the first few lines from the output of the dir command on the remote machine:

C:\Windows\system32>dir C:\
Volume in drive C has no label.
Volume Serial Number is 6028-F4DB

Directory of C:\

12/05/2011 10:21 AM 63 .asadminpass
12/10/2011 12:50 AM 755 .asadmintruststore
12/19/2011 02:04 PM <DIR> aroot
12/22/2011 01:37 AM 49,083 cli.log
12/13/2011 06:29 PM 49,585 cli.log.run1

Verified that a JDK is installed and available in the Path on bigapp-oblade-1. javac -version returned this: javac 1.7.0_02

Comment by Byron Nevins [ 23/Dec/11 ]

This is current verbose output:

Successfully verified that the host, bigapp-oblade-1, is not the local machine as required.
Successfully resolved host name to: bigapp-oblade-1/10.133.184.150
Successfully connected to DCOM Port at port 135 on host bigapp-oblade-1.
Successfully connected to NetBIOS Session Service at port 139 on host bigapp-oblade-1.
Successfully connected to Windows Shares at port 445 on host bigapp-oblade-1.
Successfully accessed C: on bigapp-oblade-1 using DCOM.
Successfully wrote delete_me.bat to C: on bigapp-oblade-1 using DCOM.
Below are the command lines for all the remote processes that have one:
CommandLine = "\\SystemRoot\\System32
smss.exe";
CommandLine = "%SystemRoot%\\system32
csrss.exe ObjectDirectory=
Windows SharedSection=1024,20480,768 Windows=On SubSystemType=Windows ServerDll
=basesrv,1 ServerDll=winsrv:UserServerDllInitialization,3 ServerDll=winsrv:ConServerDllInitialization,2 ServerDll=sxssrv,4 ProfileControl=Off MaxReque
stThreads=16";
CommandLine = "%SystemRoot%\\system32
csrss.exe ObjectDirectory=
Windows SharedSection=1024,20480,768 Windows=On SubSystemType=Windows ServerDll
=basesrv,1 ServerDll=winsrv:UserServerDllInitialization,3 ServerDll=winsrv:ConServerDllInitialization,2 ServerDll=sxssrv,4 ProfileControl=Off MaxReque
stThreads=16";
CommandLine = "wininit.exe";
CommandLine = "winlogon.exe";
CommandLine = "C:\\Windows\\system32
services.exe";
CommandLine = "C:\\Windows\\system32
lsass.exe";
CommandLine = "C:\\Windows\\system32
lsm.exe";
CommandLine = "C:\\Windows\\system32
svchost.exe -k DcomLaunch";
CommandLine = "C:\\Windows\\system32
svchost.exe -k RPCSS";
CommandLine = "\"LogonUI.exe\" /flags:0x0";
CommandLine = "C:\\Windows\\System32
svchost.exe -k LocalServiceNetworkRestricted";
CommandLine = "C:\\Windows\\system32
svchost.exe -k netsvcs";
CommandLine = "C:\\Windows\\system32
svchost.exe -k LocalService";
CommandLine = "C:\\Windows\\System32
svchost.exe -k LocalSystemNetworkRestricted";
CommandLine = "C:\\Windows\\system32
svchost.exe -k NetworkService";
CommandLine = "\"C:
Program Files (x86)
Common Files
Symantec Shared\\ccSvcHst.exe\" /h ccCommon";
CommandLine = "C:\\Windows\\system32
svchost.exe -k LocalServiceNoNetwork";
CommandLine = "C:\\Windows\\System32
spoolsv.exe";
CommandLine = "\"C:
Program Files (x86)
Symantec AntiVirus\\DefWatch.exe\"";
CommandLine = "C:\\OracleATS\\agentmanager\\bin
AgentManagerService.exe -s C:\\OracleATS\\agentmanager\\bin\\\\AgentManagerService.conf";
CommandLine = "C:\\OracleATS\\openScript\\HelperService\\bin
wrapper.exe -s C:\\OracleATS\\openScript\\HelperService\\conf
wrapper.conf";
CommandLine = "\"..\\..\\jdk\\jre\\bin\\java\" -Doracle.config=./Wrapper.dll -Doracle.agent.dir=../../OFT -Doracle.agent.dir.1=../../agent -Doracl
e.agent.dir.2=../../DataCollector/bin -Doracle.ignoreChecksum=true -Xms16m -Xmx64m -Djava.library.path=\".;..;../../lib;../../OFT;../../oats/lib\" -cl
asspath \"./AgentManagerService.jar;../AgentManager.jar;../../lib/log4j-1.2.16.jar;../../lib/bcprov-jdk16-145.jar;../../lib/wlfullclient.jar;../../lib
/Utilities.jar;../../lib/Framework.jar;../../lib/common_ejbClient.jar;../../lib/common_ejb.jar;../processDescriptors/ProcessDescriptors.jar\" -Dwrappe
r.key=\"Wc9CKVeOSSU2o70y\" -Dwrapper.port=1787 -Dwrapper.use_system_time=\"TRUE\" -Dwrapper.version=\"3.1.2\" -Dwrapper.native_library=\"wrapper\" -Dw
rapper.service=\"TRUE\" -Dwrapper.cpu.timeout=\"10\" -Dwrapper.jvmid=1 org.tanukisoftware.wrapper.WrapperSimpleApp oracle.oats.empstart.EmpStartMain 9
001";
CommandLine = "\\??\\C:\\Windows\\system32
conhost.exe";
CommandLine = "C:\\ORACLE~1\\wls\\wlserver\\server\\bin
beasvc.exe";
CommandLine = "\\??\\C:\\Windows\\system32
conhost.exe";
CommandLine = "\\??\\C:\\Windows\\system32
conhost.exe";
CommandLine = "C:\\STEMAgent\\agent10g\\bin
nmesrvc.exe ";
CommandLine = "\\??\\C:\\Windows\\system32
conhost.exe";
CommandLine = "c:\\stemag~1\\agent10g\\ccr\\bin
nmz.exe c:\\stemag~1\\agent10g\\ccr\\hosts
bigapp-oblade-1.us.oracle.com";
CommandLine = "\"..\\..\\jre\\bin\\java\" -Dfile.encoding=\"UTF-8\" -Djava.library.path=\"../lib\" -classpath \"../lib/wrapper.jar;../helperServic
e.jar;../log4j.jar\" -Dwrapper.key=\"ZPXm1Dkdr8GmapfT\" -Dwrapper.port=32000 -Dwrapper.jvm.port.min=31000 -Dwrapper.jvm.port.max=31999 -Dwrapper.pid=1
352 -Dwrapper.version=\"3.3.2\" -Dwrapper.native_library=\"wrapper\" -Dwrapper.service=\"TRUE\" -Dwrapper.cpu.timeout=\"10\" -Dwrapper.jvmid=1 org.tan
ukisoftware.wrapper.WrapperSimpleApp oracle.oats.scripting.modules.functionalTest.helperService.MainClass";
CommandLine = "\\??\\C:\\Windows\\system32
conhost.exe";
CommandLine = "c:\\oracleats\\oxe\\app\\oracle\\product\\10.2.0\\server\\bin
ORACLE.EXE XE";
CommandLine = "C:\\Windows\\system32
svchost.exe -k regsvc";
CommandLine = "C:\\cygwin\\bin
cygrunsrv.exe";
CommandLine = "\\??\\C:\\Windows\\system32
conhost.exe";
CommandLine = "C:\\cygwin\\usr\\sbin
sshd.exe -D";
CommandLine = "\"C:
Program Files (x86)
Symantec AntiVirus\\Rtvscan.exe\"";
CommandLine = "cmd /c \"\"C:\\STEMAgent\\agent10g\\bin\\emctl.bat\" istart agent\"";
CommandLine = "C:\\STEMAgent\\agent10g\\\\perl\\5.8.3\\bin\\MSWin32-x64-multi-thread
perl.exe C:\\STEMAgent\\agent10g\\bin
emwd.pl agent";
CommandLine = "C:\\STEMAgent
agent10g/bin/emagent";
CommandLine = "C:\\Windows\\System32
svchost.exe -k termsvcs";
CommandLine = "C:\\Windows\\system32
svchost.exe -k NetworkServiceNetworkRestricted";
CommandLine = "C:\\Windows\\System32
msdtc.exe";
CommandLine = "%SystemRoot%\\system32
csrss.exe ObjectDirectory=
Windows SharedSection=1024,20480,768 Windows=On SubSystemType=Windows ServerDll
=basesrv,1 ServerDll=winsrv:UserServerDllInitialization,3 ServerDll=winsrv:ConServerDllInitialization,2 ServerDll=sxssrv,4 ProfileControl=Off MaxReque
stThreads=16";
CommandLine = "winlogon.exe";
CommandLine = "taskhost.exe USER";
CommandLine = "rdpclip";
CommandLine = "\"C:\\Windows\\system32\\Dwm.exe\"";
CommandLine = "C:\\Windows
Explorer.EXE";
CommandLine = "\"C:
Program Files (x86)
Common Files
Symantec Shared\\ccApp.exe\" ";
CommandLine = "\"C:\\Windows\\system32\\cmd.exe\" ";
CommandLine = "\\??\\C:\\Windows\\system32
conhost.exe";
CommandLine = "\"C:\\Windows\\system32\\cmd.exe\" ";
CommandLine = "\\??\\C:\\Windows\\system32
conhost.exe";
CommandLine = "\"C:\\Windows\\system32\\taskmgr.exe\" ";
CommandLine = "\"C:
Program Files\\Java\\jre6\\bin\\java.exe\" -cp C:/export/glassfish3/glassfish/modules/glassfish.jar -XX:+UnlockDiagnosticVMOp
tions -XX:MaxPermSize=192m -XX:NewRatio=2 -Xmx512m -client -javaagent:C:/export/glassfish3/glassfish/lib/monitor/flashlight-agent.jar -Dosgi.shell.tel
net.maxconn=1 -Dfelix.fileinstall.disableConfigSave=false -Djdbc.drivers=org.apache.derby.jdbc.ClientDriver -Dfelix.fileinstall.dir=C:\\export
glassf
ish3
glassfish/modules/autostart/ -Djavax.net.ssl.keyStore=C:\\export\\glassfish3\\glassfish\\domains
domain1/config/keystore.jks -Dosgi.shell.telne
t.port=6666 -Djava.security.policy=C:\\export\\glassfish3\\glassfish\\domains
domain1/config/server.policy -Djava.awt.headless=true -Dfelix.fileinsta
ll.log.level=2 -Dfelix.fileinstall.poll=5000 -Dcom.sun.aas.instanceRoot=C:\\export\\glassfish3\\glassfish\\domains
domain1 -Dosgi.shell.telnet.ip=127
.0.0.1 -Dcom.sun.enterprise.config.config_environment_factory_class=com.sun.enterprise.config.serverbeans.AppserverConfigEnvironmentFactory -Djava.end
orsed.dirs=C:\\export\\glassfish3\\glassfish/modules/endorsed;C:\\export\\glassfish3
glassfish/lib/endorsed -Dcom.sun.aas.installRoot=C:\\export
gla
ssfish3
glassfish -Dfelix.fileinstall.bundles.startTransient=true \"-Djava.ext.dirs=C:
Program Files\\Java\\jre6/lib/ext;C:
Program Files\\Java
jr
e6/jre/lib/ext;C:\\export\\glassfish3\\glassfish\\domains\\domain1/lib/ext\" -Dfelix.fileinstall.bundles.new.start=true -Djavax.net.ssl.trustStore=C:\
\export\\glassfish3\\glassfish\\domains
domain1/config/cacerts.jks -Dcom.sun.enterprise.security.httpsOutboundKeyAlias=s1as -Djava.security.auth.logi
n.config=C:\\export\\glassfish3\\glassfish\\domains
domain1/config/login.conf DANTLR_USE_DIRECT_CLASS_LOADING=true -Dgosh.args=-nointeractive \"-Dj
ava.library.path=C:/export/glassfish3/glassfish/lib;C:/Windows/System32;C:/export/appserver-sqe/ee/cluster_inf;C:/Windows/Sun/Java/bin;C:/Windows;C:/b
in:/bin: C:/export/glassfish3/glassfish/bin:C:/aroot/jdk1.7.0_02/bin;C:/Perl64/site/bin;C:/Perl64/bin;C:/cygwin/bin;C:/OracleATS/oxe/app/oracle/produc
t/10.2.0/server/BIN;C:/Windows/System32/wbem;C:/Windows/System32/WindowsPowerShell/v1.0;C:/aroot/jdk1.7.0_02/bin;C:/export/glassfish3/glassfish/bin\"
com.sun.enterprise.glassfish.bootstrap.ASMain domainname domain1 -asadmin-args --host,,,localhost,,,port,,,4848,,,user,,,admin,,,-passwordfile,,
,C:/export/appserver-sqe/ee/cluster_inf/password.txt,,,-secure=false,,,terse=false,,,echo=false,,,interactive=false,,,start-domain,,,-verbose=
false,,,-debug=false,,,-domaindir,,,C:\\export\\glassfish3\\glassfish
domains,,,domain1 -instancename server -verbose false -debug false -asadmin-c
lasspath C:/export/glassfish3/glassfish/modules/admin-cli.jar -asadmin-classname com.sun.enterprise.admin.cli.AsadminMain -upgrade false -type DAS -do
maindir C:/export/glassfish3/glassfish/domains/domain1 -read-stdin true";
CommandLine = "\\??\\C:\\Windows\\system32
conhost.exe";
CommandLine = "\"C:
Program Files\\Java\\jre6\\bin\\java.exe\" -cp C:/export/glassfish3/glassfish/modules/glassfish.jar -XX:+UnlockDiagnosticVMOp
tions -XX:MaxPermSize=192m -XX:NewRatio=2 -Xmx512m -server -javaagent:C:/export/glassfish3/glassfish/lib/monitor/flashlight-agent.jar -Dosgi.shell.tel
net.maxconn=1 -Dfelix.fileinstall.disableConfigSave=false -Djdbc.drivers=org.apache.derby.jdbc.ClientDriver -Dfelix.fileinstall.dir=C:\\export
glassf
ish3
glassfish/modules/autostart/ -Djavax.net.ssl.keyStore=C:\\export\\glassfish3\\glassfish\\nodes\\localhost-domain1
in3/config/keystore.jks -Dosg
i.shell.telnet.port=26667 -Djava.security.policy=C:\\export\\glassfish3\\glassfish\\nodes\\localhost-domain1
in3/config/server.policy -Djava.awt.head
less=true -Dfelix.fileinstall.log.level=3 -Dfelix.fileinstall.poll=5000 -Dcom.sun.aas.instanceRoot=C:\\export\\glassfish3\\glassfish\\nodes
localhost
-domain1
in3 -Dosgi.shell.telnet.ip=127.0.0.1 -Dcom.sun.enterprise.config.config_environment_factory_class=com.sun.enterprise.config.serverbeans.Apps
erverConfigEnvironmentFactory -Djava.endorsed.dirs=C:\\export\\glassfish3\\glassfish/modules/endorsed;C:\\export\\glassfish3
glassfish/lib/endorsed -
Dcom.sun.aas.installRoot=C:\\export\\glassfish3
glassfish -Dfelix.fileinstall.bundles.startTransient=true \"-Djava.ext.dirs=C:
Program Files\\Java
jre6/lib/ext;C:
Program Files\\Java\\jre6/jre/lib/ext;C:\\export\\glassfish3\\glassfish\\nodes\\localhost-domain1\\in3/lib/ext\" -Dfelix.fileinstall.
bundles.new.start=true -Djavax.net.ssl.trustStore=C:\\export\\glassfish3\\glassfish\\nodes\\localhost-domain1
in3/config/cacerts.jks -Dcom.sun.enterp
rise.security.httpsOutboundKeyAlias=s1as -Djava.security.auth.login.config=C:\\export\\glassfish3\\glassfish\\nodes\\localhost-domain1
in3/config/log
in.conf DANTLR_USE_DIRECT_CLASS_LOADING=true \"-Dgosh.args=-noshutdown -c noop=true\" \"-Djava.library.path=C:/export/glassfish3/glassfish/lib;C:/Wi
ndows/System32;C:/export/appserver-sqe/ee/cluster_inf;C:/Windows/Sun/Java/bin;C:/Windows;C:/bin:/bin: C:/export/glassfish3/glassfish/bin:C:/aroot/jdk1
.7.0_02/bin;C:/Perl64/site/bin;C:/Perl64/bin;C:/cygwin/bin;C:/OracleATS/oxe/app/oracle/product/10.2.0/server/BIN;C:/Windows/System32/wbem;C:/Windows/S
ystem32/WindowsPowerShell/v1.0;C:/aroot/jdk1.7.0_02/bin;C:/export/glassfish3/glassfish/bin\" com.sun.enterprise.glassfish.bootstrap.ASMain -asadmin-ar
gs -host,,,localhost,,,port,,,4848,,,user,,,admin,,,passwordfile,,,C:/export/appserver-sqe/ee/cluster_inf/password.txt,,,secure=false,,,-ter
se=false,,,-echo=false,,,interactive=false,,,start-local-instance,,,verbose=false,,,-debug=false,,,in3 -instancename in3 -verbose false -debug f
alse -asadmin-classpath C:/export/glassfish3/glassfish/modules/admin-cli.jar -asadmin-classname com.sun.enterprise.admin.cli.AsadminMain -upgrade fals
e -type INSTANCE -instancedir C:/export/glassfish3/glassfish/nodes/localhost-domain1/in3 -read-stdin true";
CommandLine = "\\??\\C:\\Windows\\system32
conhost.exe";
CommandLine = "\"C:
Program Files\\Java\\jre6\\bin\\java.exe\" -cp C:/export/glassfish3/glassfish/modules/glassfish.jar -XX:+UnlockDiagnosticVMOp
tions -XX:MaxPermSize=192m -XX:NewRatio=2 -Xmx512m -server -javaagent:C:/export/glassfish3/glassfish/lib/monitor/flashlight-agent.jar -Dosgi.shell.tel
net.maxconn=1 -Dfelix.fileinstall.disableConfigSave=false -Djdbc.drivers=org.apache.derby.jdbc.ClientDriver -Dfelix.fileinstall.dir=C:\\export
glassf
ish3
glassfish/modules/autostart/ -Djavax.net.ssl.keyStore=C:\\export\\glassfish3\\glassfish\\nodes\\localhost-domain1
qwerty/config/keystore.jks -D
osgi.shell.telnet.port=26668 -Djava.security.policy=C:\\export\\glassfish3\\glassfish\\nodes\\localhost-domain1
qwerty/config/server.policy -Djava.aw
t.headless=true -Dfelix.fileinstall.log.level=3 -Dfelix.fileinstall.poll=5000 -Dcom.sun.aas.instanceRoot=C:\\export\\glassfish3\\glassfish\\nodes
loc
alhost-domain1
qwerty -Dosgi.shell.telnet.ip=127.0.0.1 -Dcom.sun.enterprise.config.config_environment_factory_class=com.sun.enterprise.config.serverb
eans.AppserverConfigEnvironmentFactory -Djava.endorsed.dirs=C:\\export\\glassfish3\\glassfish/modules/endorsed;C:\\export\\glassfish3
glassfish/lib/e
ndorsed -Dcom.sun.aas.installRoot=C:\\export\\glassfish3
glassfish -Dfelix.fileinstall.bundles.startTransient=true \"-Djava.ext.dirs=C:
Program File
s\\Java\\jre6/lib/ext;C:
Program Files\\Java\\jre6/jre/lib/ext;C:\\export\\glassfish3\\glassfish\\nodes\\localhost-domain1\\qwerty/lib/ext\" -Dfelix.
fileinstall.bundles.new.start=true -Djavax.net.ssl.trustStore=C:\\export\\glassfish3\\glassfish\\nodes\\localhost-domain1
qwerty/config/cacerts.jks -
Dcom.sun.enterprise.security.httpsOutboundKeyAlias=s1as -Djava.security.auth.login.config=C:\\export\\glassfish3\\glassfish\\nodes\\localhost-domain1\
\qwerty/config/login.conf DANTLR_USE_DIRECT_CLASS_LOADING=true \"-Dgosh.args=-noshutdown -c noop=true\" \"-Djava.library.path=C:/export/glassfish3/g
lassfish/lib;C:/Windows/System32;C:/export/glassfish3/glassfish/domains/domain1/config;C:/Windows/Sun/Java/bin;C:/Windows;C:/bin:/bin: C:/export/glass
fish3/glassfish/bin:C:/aroot/jdk1.7.0_02/bin;C:/Perl64/site/bin;C:/Perl64/bin;C:/cygwin/bin;C:/OracleATS/oxe/app/oracle/product/10.2.0/server/BIN;C:/W
indows/System32/wbem;C:/Windows/System32/WindowsPowerShell/v1.0;C:/aroot/jdk1.7.0_02/bin;C:/export/glassfish3/glassfish/bin\" com.sun.enterprise.glass
fish.bootstrap.ASMain asadmin-args --host,,,localhost,,,port,,,4848,,,secure=false,,,terse=false,,,echo=false,,,-interactive=false,,,start-l
ocal-instance,,,-verbose=false,,,debug=false,,,-node,,,localhost-domain1,,,qwerty -instancename qwerty -verbose false -debug false -asadmin-classp
ath C:/export/glassfish3/glassfish/modules/admin-cli.jar -asadmin-classname com.sun.enterprise.admin.cli.AsadminMain -upgrade false -type INSTANCE -in
stancedir C:/export/glassfish3/glassfish/nodes/localhost-domain1/qwerty -read-stdin true";
CommandLine = "\\??\\C:\\Windows\\system32
conhost.exe";
CommandLine = "\"C:
Program Files\\Java\\jre6\\bin\\java.exe\" -cp C:/export/glassfish3/glassfish/modules/glassfish.jar -XX:+UnlockDiagnosticVMOp
tions -XX:MaxPermSize=192m -XX:NewRatio=2 -Xmx512m -server -javaagent:C:/export/glassfish3/glassfish/lib/monitor/flashlight-agent.jar -Dosgi.shell.tel
net.maxconn=1 -Dfelix.fileinstall.disableConfigSave=false -Djdbc.drivers=org.apache.derby.jdbc.ClientDriver -Dfelix.fileinstall.dir=C:\\export
glassf
ish3
glassfish/modules/autostart/ -Djavax.net.ssl.keyStore=C:\\export\\glassfish3\\glassfish\\nodes\\localhost-domain1
i101/config/keystore.jks -Dos
gi.shell.telnet.port=26669 -Djava.security.policy=C:\\export\\glassfish3\\glassfish\\nodes\\localhost-domain1
i101/config/server.policy -Djava.awt.he
adless=true -Dfelix.fileinstall.log.level=3 -Dfelix.fileinstall.poll=5000 -Dcom.sun.aas.instanceRoot=C:\\export\\glassfish3\\glassfish\\nodes
localho
st-domain1
i101 -Dosgi.shell.telnet.ip=127.0.0.1 -Dcom.sun.enterprise.config.config_environment_factory_class=com.sun.enterprise.config.serverbeans.A
ppserverConfigEnvironmentFactory -Djava.endorsed.dirs=C:\\export\\glassfish3\\glassfish/modules/endorsed;C:\\export\\glassfish3
glassfish/lib/endorse
d -Dcom.sun.aas.installRoot=C:\\export\\glassfish3
glassfish -Dfelix.fileinstall.bundles.startTransient=true \"-Djava.ext.dirs=C:
Program Files
Jav
a\\jre6/lib/ext;C:
Program Files\\Java\\jre6/jre/lib/ext;C:\\export\\glassfish3\\glassfish\\nodes\\localhost-domain1\\i101/lib/ext\" -Dfelix.fileinst
all.bundles.new.start=true -Djavax.net.ssl.trustStore=C:\\export\\glassfish3\\glassfish\\nodes\\localhost-domain1
i101/config/cacerts.jks -Dcom.sun.e
nterprise.security.httpsOutboundKeyAlias=s1as -Djava.security.auth.login.config=C:\\export\\glassfish3\\glassfish\\nodes\\localhost-domain1
i101/conf
ig/login.conf DANTLR_USE_DIRECT_CLASS_LOADING=true \"-Dgosh.args=-noshutdown -c noop=true\" \"-Djava.library.path=C:/export/glassfish3/glassfish/lib
;C:/Program Files/Java/jre6/bin;C:/export/glassfish3/glassfish/nodes/localhost-domain1/i101/config;C:/Windows/Sun/Java/bin;C:/Windows/System32;C:/Wind
ows;C:/bin:/bin: C:/export/glassfish3/glassfish/bin:C:/aroot/jdk1.7.0_02/bin;C:/Perl64/site/bin;C:/Perl64/bin;C:/cygwin/bin;C:/OracleATS/oxe/app/oracl
e/product/10.2.0/server/BIN;C:/Windows/System32/wbem;C:/Windows/System32/WindowsPowerShell/v1.0;C:/aroot/jdk1.7.0_02/bin;C:/export/glassfish3/glassfis
h/bin\" com.sun.enterprise.glassfish.bootstrap.ASMain asadmin-args --host,,,localhost,,,port,,,4848,,,secure=false,,,terse=false,,,-echo=false
,,,-interactive=false,,,start-local-instance,,,verbose=false,,,debug=false,,,-node,,,localhost-domain1,,,i101 -instancename i101 -verbose false
-debug false -asadmin-classpath C:/export/glassfish3/glassfish/modules/admin-cli.jar -asadmin-classname com.sun.enterprise.admin.cli.AsadminMain -upgr
ade false -type INSTANCE -instancedir C:/export/glassfish3/glassfish/nodes/localhost-domain1/i101 -read-stdin true";
CommandLine = "\\??\\C:\\Windows\\system32\\conhost.exe";
CommandLine = "\"C:
Program Files\\Java\\jre6\\bin\\java.exe\" -cp C:/export/glassfish3/glassfish/modules/glassfish.jar -XX:+UnlockDiagnosticVMOp
tions -XX:MaxPermSize=192m -XX:NewRatio=2 -Xmx512m -server -javaagent:C:/export/glassfish3/glassfish/lib/monitor/flashlight-agent.jar -Dosgi.shell.tel
net.maxconn=1 -Dfelix.fileinstall.disableConfigSave=false -Djdbc.drivers=org.apache.derby.jdbc.ClientDriver -Dfelix.fileinstall.dir=C:\\export
glassf
ish3
glassfish/modules/autostart/ -Djavax.net.ssl.keyStore=C:\\export\\glassfish3\\glassfish\\nodes\\localhost-domain1
in1/config/keystore.jks -Dosg
i.shell.telnet.port=26666 -Djava.security.policy=C:\\export\\glassfish3\\glassfish\\nodes\\localhost-domain1
in1/config/server.policy -Djava.awt.head
less=true -Dfelix.fileinstall.log.level=3 -Dfelix.fileinstall.poll=5000 -Dcom.sun.aas.instanceRoot=C:\\export\\glassfish3\\glassfish\\nodes
localhost
-domain1
in1 -Dosgi.shell.telnet.ip=127.0.0.1 -Dcom.sun.enterprise.config.config_environment_factory_class=com.sun.enterprise.config.serverbeans.Apps
erverConfigEnvironmentFactory -Djava.endorsed.dirs=C:\\export\\glassfish3\\glassfish/modules/endorsed;C:\\export\\glassfish3
glassfish/lib/endorsed -
Dcom.sun.aas.installRoot=C:\\export\\glassfish3
glassfish -Dfelix.fileinstall.bundles.startTransient=true \"-Djava.ext.dirs=C:
Program Files\\Java
jre6/lib/ext;C:
Program Files\\Java\\jre6/jre/lib/ext;C:\\export\\glassfish3\\glassfish\\nodes\\localhost-domain1\\in1/lib/ext\" -Dfelix.fileinstall.
bundles.new.start=true -Djavax.net.ssl.trustStore=C:\\export\\glassfish3\\glassfish\\nodes\\localhost-domain1
in1/config/cacerts.jks -Dcom.sun.enterp
rise.security.httpsOutboundKeyAlias=s1as -Djava.security.auth.login.config=C:\\export\\glassfish3\\glassfish\\nodes\\localhost-domain1
in1/config/log
in.conf DANTLR_USE_DIRECT_CLASS_LOADING=true \"-Dgosh.args=-noshutdown -c noop=true\" \"-Djava.library.path=C:/export/glassfish3/glassfish/lib;C:/Wi
ndows/System32;C:/export/glassfish3/glassfish/domains/domain1/config;C:/Windows/Sun/Java/bin;C:/Windows;C:/bin:/bin: C:/export/glassfish3/glassfish/bi
n:C:/aroot/jdk1.7.0_02/bin;C:/Perl64/site/bin;C:/Perl64/bin;C:/cygwin/bin;C:/OracleATS/oxe/app/oracle/product/10.2.0/server/BIN;C:/Windows/System32/wb
em;C:/Windows/System32/WindowsPowerShell/v1.0;C:/aroot/jdk1.7.0_02/bin;C:/export/glassfish3/glassfish/bin\" com.sun.enterprise.glassfish.bootstrap.ASM
ain asadmin-args --host,,,localhost,,,port,,,4848,,,secure=false,,,terse=false,,,echo=false,,,interactive=false,,,start-local-instance,,,-
verbose=false,,,-debug=false,,,-node,,,localhost-domain1,,,in1 -instancename in1 -verbose false -debug false -asadmin-classpath C:/export/glassfish3
/glassfish/modules/admin-cli.jar -asadmin-classname com.sun.enterprise.admin.cli.AsadminMain -upgrade false -type INSTANCE -instancedir C:/export/glas
sfish3/glassfish/nodes/localhost-domain1/in1 -read-stdin true";
CommandLine = "\\??\\C:\\Windows\\system32\\conhost.exe";
CommandLine = "%SystemRoot%\\system32\\csrss.exe ObjectDirectory=\\Windows SharedSection=1024,20480,768 Windows=On SubSystemType=Windows ServerDll=basesrv,1 ServerDll=winsrv:UserServerDllInitialization,3 ServerDll=winsrv:ConServerDllInitialization,2 ServerDll=sxssrv,4 ProfileControl=Off MaxRequestThreads=16";
CommandLine = "winlogon.exe";
CommandLine = "\"taskhost.exe\"";
CommandLine = "rdpclip";
CommandLine = "\"C:\\Windows\\system32\\Dwm.exe\"";
CommandLine = "C:\\Windows\\Explorer.EXE";
CommandLine = "\"C:\\Program Files (x86)\\Common Files\\Symantec Shared\\ccApp.exe\" ";
CommandLine = "\"C:\\Windows\\system32\\mmc.exe\" \"C:\\Windows\\system32\\ServerManager.msc\" ";
CommandLine = "C:\\Windows\\servicing\\TrustedInstaller.exe";
CommandLine = "C:\\Windows\\system32\\sppsvc.exe";
CommandLine = "\"C:\\Windows\\system32\\cmd.exe\" ";
CommandLine = "\\??\\C:\\Windows\\system32\\conhost.exe";
CommandLine = "\"C:\\Program Files (x86)\\Mozilla Firefox\\firefox.exe\" ";
CommandLine = "C:\\Windows\\system32\\DllHost.exe /Processid:{76A64158-CB41-11D1-8B02-00600806D9B6}";
CommandLine = "C:\\Windows\\system32\\wbem\\wmiprvse.exe";
Successfully accessed WMI (Windows Management Interface) on bigapp-oblade-1. There are 83 processes running on bigapp-oblade-1.
Successfully ran the test script on bigapp-oblade-1 using DCOM.
The script simply ran the DIR command. Here are the first few lines from the output of the dir command on the remote machine:

C:\Windows\system32>dir C:\
Volume in drive C has no label.
Volume Serial Number is 6028-F4DB

Directory of C:\

12/05/2011 10:21 AM 63 .asadminpass
12/10/2011 12:50 AM 755 .asadmintruststore
12/19/2011 02:04 PM <DIR> aroot
12/22/2011 01:37 AM 49,083 cli.log
12/13/2011 06:29 PM 49,585 cli.log.run1

Verified that a JDK is installed and available in the Path on bigapp-oblade-1. javac -version returned this: javac 1.7.0_02

Comment by Byron Nevins [ 23/Dec/11 ]

I created a failure by giving it a bad password. Here's the output:

asadmin -W d:\pw validate-dcom -w hudson bigapp-oblade-1
remote failure:
Successfully verified that the host, bigapp-oblade-1, is not the local machine as required.
Successfully resolved host name to: bigapp-oblade-1/10.133.184.150
Successfully connected to DCOM Port at port 135 on host bigapp-oblade-1.
Successfully connected to NetBIOS Session Service at port 139 on host bigapp-oblade-1.
Successfully connected to Windows Shares at port 445 on host bigapp-oblade-1.
The remote file, C: doesn't exist on bigapp-oblade-1 : Logon failure: unknown user name or bad password.

Comment by Byron Nevins [ 23/Dec/11 ]

What is the impact on the customer of the bug?
He will get less voluminous output by default.

How likely is it that a customer will see the bug and how serious is the bug?
It's not a bug really. But he will always see it.

Is it a regression? Does it meet other bug fix criteria (security, performance, etc.)?
No. New feature. Tiny tiny bit better performance since there is less output returned.

What is the cost/risk of fixing the bug?
Maybe 2 hours with 90% of that red tape.

How risky is the fix? How much work is the fix? Is the fix complicated?
Very little work. Very very simple. Essentially no risk.

Is there an impact on documentation or message strings?
Perhaps an impact on Doc though I doubt it.

Which tests should QA (re)run to verify the fix did not destabilize GlassFish?
QA should re-run DCOM tests to verify

Which is the targeted build of 3.1.2 for this fix?
B16

Comment by Byron Nevins [ 27/Dec/11 ]

To avoid unintentional noise you get the huge output only when AS_DEBUG=true in the environment AND the verbose flag for the command is true.
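
The gating rule described above can be sketched roughly as follows. This is a minimal illustration, not the code committed to ValidateDcom.java: the class and method names are hypothetical, and the environment is passed in as a map (an assumption made so the check can be exercised without touching the real process environment).

```java
import java.util.Map;

// Hypothetical helper illustrating the rule: the huge output is shown
// only when AS_DEBUG=true in the environment AND --verbose is set.
final class VerboseOutputGate {
    static boolean showFullOutput(Map<String, String> env, boolean verboseFlag) {
        // Boolean.parseBoolean(null) is false, so a missing AS_DEBUG disables the output.
        boolean debugEnv = Boolean.parseBoolean(env.get("AS_DEBUG"));
        return debugEnv && verboseFlag;
    }
}
```

In a real command the map would presumably come from System.getenv(); with either condition missing, only the normal (short) output is returned.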

Fixed in 3.1.2 and 4.0

Comment by Byron Nevins [ 27/Dec/11 ]

Sending D:\gf\branches\3.1.2\cluster\admin\src\main\java\com\sun\enterprise\v3\admin\cluster\ValidateDcom.java
Sending D:\gf\trunk\main\nucleus\cluster\admin\src\main\java\com\sun\enterprise\v3\admin\cluster\ValidateDcom.java
Transmitting file data ..
Committed revision 51793.

Comment by Byron Nevins [ 28/Dec/11 ]

outputs:

(1) normal success

asadmin -W d:\pw_vaio validate-dcom -w bnevins vaio
Command validate-dcom executed successfully.

(2) verbose success

asadmin -W d:\pw_vaio validate-dcom -w bnevins --verbose vaio

Successfully verified that the host, vaio, is not the local machine as required.
Successfully resolved host name to: vaio/10.28.51.105
Successfully connected to DCOM Port at port 135 on host vaio.
Successfully connected to NetBIOS Session Service at port 139 on host vaio.
Successfully connected to Windows Shares at port 445 on host vaio.
Successfully accessed C: on vaio using DCOM.
Successfully wrote delete_me.bat to C: on vaio using DCOM.
Successfully accessed WMI (Windows Management Interface) on vaio. There are 42 processes running on vaio.
Successfully ran the test script on vaio using DCOM.
The script simply ran the DIR command. Here are the first few lines from the output of the dir command on the remote machine:

C:\Windows\system32>dir C:\
Volume in drive C has no label.
Volume Serial Number is C024-AB90

Directory of C:\

11/07/2011 10:32 AM 10 aaaaaa
06/10/2009 01:42 PM 24 autoexec.bat
12/24/2011 01:55 PM <DIR> b
12/24/2011 12:35 PM <DIR> bin
06/10/2009 01:42 PM 10 config.sys

Verified that a JDK is installed and available in the Path on vaio. javac -version returned this: javac 1.6.0_27-ea

Command validate-dcom executed successfully.

(3) normal failure (same as verbose)
asadmin -W d:\pw_vaio validate-dcom -w bnevinss vaio
remote failure:
Successfully verified that the host, vaio, is not the local machine as required.
Successfully resolved host name to: vaio/10.28.51.105
Successfully connected to DCOM Port at port 135 on host vaio.
Successfully connected to NetBIOS Session Service at port 139 on host vaio.
Successfully connected to Windows Shares at port 445 on host vaio.
The remote file, C: doesn't exist on vaio : Logon failure: unknown user name or bad password.





[GLASSFISH-18080] An error has occurred when click on "Ping" for newly created DCOM node Created: 23/Dec/11  Updated: 29/Dec/11  Resolved: 27/Dec/11

Status: Closed
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b14
Fix Version/s: None

Type: Bug Priority: Major
Reporter: sunny-gui Assignee: Byron Nevins
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Server OS: Windows 7 JA native 64-Bit
Bundle: ogs-3.1.2-b14-ml.zip


Attachments: JPEG File dcomNode_ping_error.jpg     JPEG File validate-dcom_success.jpg    

 Description   

An error occurs when clicking "Ping" for a newly created DCOM node.

To reproduce:
1. Log into Admin Console.
2. Go to Nodes from left pane.
3. Click on New button on right pane.
4. Fill out required fields as below.

Type: DCOM
Node Host: somehost
Node Directory: GF_Install_Home/nodes
Install GlassFish Server: Enabled (check this option)
Remote Test Directory: C:\windows\system32\wbem
Windows User Name: username
Windows User Password: password
5. Click Save to create a DCOM node named dcom4
6. Go back to Nodes and click "Ping" for newly created dcom4

System said:
An error has occurred
Failed to validate DCOM connection to node dcom4_from_agc126 (xxx.us.oracle.com) com.sun.enterprise.util.cluster.windows.process.WindowsException: The network name cannot be found.

7. The host somehost is accessible through DCOM.

Screenshots are attached for your reference.



 Comments   
Comment by Byron Nevins [ 27/Dec/11 ]

I followed the steps. Everything worked perfectly - including the ping.

Comment by sunny-gui [ 29/Dec/11 ]

This issue is not reproducible, so close it.





[GLASSFISH-18078] glassfish doesn't detect version differences when running remote commands via SSH Created: 22/Dec/11  Updated: 22/Dec/11

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2, 4.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Tom Mueller Assignee: Joe Di Pol
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

If a 4.0 DAS tries to create an instance on a node where 3.1.2 is installed, it will try to use "nadmin" rather than "asadmin" to run the _create-instance-filesystem command. This of course will fail because 3.1.2 doesn't have nadmin. The error message is:

Command failed on node node2 (asqe-sblade-2): bash: /home/hudson/workspace/Cluster/glassfish3/glassfish/lib/nadmin: No such file or directory
Command create-instance completed with warnings.

It would be better if this error message indicated that the software version on the node is not supported.

It might also be better if this was detected when the node is created rather than when an instance is created.

There might also be other compatibility constraints even within the 3.1.x line. Is it supported to have a 3.1.2 DAS with 3.1.1 or 3.1 instances? If it isn't, this should be detected.






[GLASSFISH-18067] Failure to delete files not reported to user in Admin Console Created: 21/Dec/11  Updated: 12/Jan/12  Resolved: 27/Dec/11

Status: Closed
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b15
Fix Version/s: 3.1.2_b15, 4.0_b16

Type: Bug Priority: Major
Reporter: lidiam Assignee: Byron Nevins
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

ogs-3.1.2-b15.zip on solaris


Attachments: JPEG File delete-cluster1.JPG     JPEG File delete-cluster2.JPG     Text File server.log    
Tags: 312_gui_new, 312_qa, 312_verified, 3_1_2-approved

 Description   

Steps to reproduce:

In Admin Console:

1. Create a remote DCOM node.
2. Create a cluster with one instance on the new node.
3. Start and stop cluster. (optional)
4. Log in to the remote Windows machine where the cluster resides and go to the following directory in a DOS window:

<install directory>/glassfish/nodes/<DCOM node name>/<clustered instance name>

This ensures that removing the directory fails (since it is in use).

5. On Clusters page select the cluster and click on Delete. No warning will be seen, however server.log contains the following:

[#|2011-12-20T18:21:51.250-0800|INFO|glassfish3.1.2|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin.cluster|_ThreadID=18;_ThreadName=Thread-2;|Executed this remote command:
C:\as\dcomtest\glassfish3\glassfish\bin\asadmin.bat --_auxinput C:\as\dcomtest\glassfish3\glassfish\bin\DELETE_ME_31217372646810002375 --interactive=false _delete-instance-filesystem --node jedy43 cljed1
Received this output:
UTIL6046: Attempt to rename C:\as\dcomtest\glassfish3\glassfish\nodes\jedy43\cljed1 to C:\as\dcomtest\glassfish3\glassfish\nodes\jedy43\oldinst3712054486246994744.tmp failed after 6 retries
Command _delete-instance-filesystem failed.

#]

[#|2011-12-20T18:21:51.254-0800|INFO|glassfish3.1.2|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin.cluster|_ThreadID=18;_ThreadName=Thread-2;|UTIL6046: Attempt to rename C:\as\dcomtest\glassfish3\glassfish\nodes\jedy43\cljed1 to C:\as\dcomtest\glassfish3\glassfish\nodes\jedy43\oldinst3712054486246994744.tmp failed after 6 retries
Command _delete-instance-filesystem failed.|#]

These messages should not be printed at INFO level, which is probably why Admin Console does not detect that something is wrong.



 Comments   
Comment by Anissa Lam [ 21/Dec/11 ]

Have you tried using the CLI? Do you see the error reported there?
Assign to Byron for initial evaluation.

Comment by Byron Nevins [ 22/Dec/11 ]

The message told you precisely what it was unable to do. On Windows, if you yourself can't delete/rename a file then neither can we. So in a case like this, never file a bug without trying to do what it said it couldn't do. In this case you would simply do this:

rd /s C:\as\dcomtest\glassfish3\glassfish\nodes\jedy43\cljed1

Almost certainly you have a file or directory locked.

Go get this application from the internet: handle.exe
Run it like so:

handle cljed1

That will tell you what process is using the file.

Comment by lidiam [ 22/Dec/11 ]

The point of this bug is not that files are not deleted. That is not the expectation. In fact, I purposely had the instance directory in use on the Windows machine to see what happens when I issue uninstall from Admin Console. The problem is that the cause is not reported to the user in Admin Console. This is inconsistent with how other commands behave when they fail, and it is a usability issue. When a failure happens, the cause of the failure needs to be displayed in Admin Console, if possible. In this case, not only is the cause of the failure not displayed, there is no warning at all, so the user may have no idea that directories were left behind on the Windows system. In short, here are the issues:

1. No warning in Admin Console
2. Cause of failure to delete directories not printed in Admin Console
3. Failure to delete directories printed in server.log at INFO level.

All of the above should be fixed.

Comment by Anissa Lam [ 22/Dec/11 ]

Lidia, why spell out "Admin Console"? Do you see the warning if you use the CLI to perform the "delete-cluster" command?

1. No warning in Admin Console
2. Cause of failure to delete directories not printed in Admin Console

Comment by lidiam [ 22/Dec/11 ]

My concern is with Admin Console behaviour, and that's why I listed issues as seen in Admin Console. However, I tried this scenario in CLI and the problem with deleting directories is reported to the user there:

lancerl(j2eetest):/export/home/j2eetest/3.1.2/glassfish3/glassfish/domains/domain1/logs# asadmin delete-instance clj1
Enter admin password for user "admin">
UTIL6046: Attempt to rename C:\as\dcomtest\glassfish3\glassfish\nodes\jedy43\clj1 to C:\as\dcomtest\glassfish3\glassfish\nodes\jedy43\oldinst4407860992800066086.tmp failed after 6 retries
Command _delete-instance-filesystem failed.
The instance, clj1, was deleted from host jed-asqe-43
Command delete-instance executed successfully.

I think the above may be confusing to new users as well:

1. "instance, clj1, was deleted from host jed-asqe-43" - well, it was not; it was only deleted from configuration
2. command executed successfully even though subcommand failed

However, Admin Console does not report the above. There is no warning image/message displayed at all (I'll add screenshots). That happens for both deleting instance from a cluster as well as deleting the entire cluster with instances (which is allowed from Admin Console but not allowed in CLI).

Comment by lidiam [ 22/Dec/11 ]

Screenshots for deleting cluster in Admin Console, when directory on host windows machine is in use and therefore not deleted.

Comment by Anissa Lam [ 23/Dec/11 ]

Looked at this with Byron.
I confirmed that

  • if the instance is on localhost-domain1, then even if the nodes directory is set read-only so that it cannot be deleted, the delete command returns WARNING, and the console can show that nicely to the user.
  • if the instance is on a DCOM node, then the delete command returns SUCCESS, even though the nodes directory cannot be deleted.
    So, this is indeed a DCOM node specific error.
Comment by Byron Nevins [ 23/Dec/11 ]

The problem is that we don't know how to get the return status of a remote script run by DCOM.
I added a hack that looks for the magic string "failed" in the returned output.
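
A minimal sketch of what such a "magic string" check might look like. This is not the code committed to NodeRunner.java; the class and method names are hypothetical, and the plain case-sensitive match is an assumption.

```java
// Hypothetical illustration of the "magic string" hack: since the exit
// status of the remote script is not available, scan its output instead.
final class RemoteOutputCheck {
    static boolean looksLikeFailure(String remoteOutput) {
        if (remoteOutput == null) {
            return false; // no output at all gives us nothing to go on
        }
        // Brittle by design: any output containing "failed" counts as a failure.
        return remoteOutput.contains("failed");
    }
}
```

Applied to the output quoted in the description ("Command _delete-instance-filesystem failed."), this returns true. It would also fire on any unrelated output that happens to contain the word, which is the brittleness of the approach.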

Comment by Byron Nevins [ 23/Dec/11 ]

fixed in 4.0

d:\gf\trunk\main\nucleus\cluster>svn commit
Sending cluster\ssh\src\main\java\org\glassfish\cluster\ssh\connect\NodeRunner.java
Transmitting file data .
Committed revision 51744.

Comment by Byron Nevins [ 23/Dec/11 ]

What is the impact on the customer of the bug?
False positive. If the files were NOT deleted on Windows they won't see a Warning in the GUI.

How likely is it that a customer will see the bug and how serious is the bug?
A bit likely and serious simply because it is broken.

Is it a regression? Does it meet other bug fix criteria (security, performance, etc.)?
Not a regression – new feature

What is the cost/risk of fixing the bug?
Fixing it perfectly – open-ended. I don't know how to get the integer return value via COM.
A reasonable but brittle fix is to look for a "magic string" to know that a warning occurred.

How risky is the fix? How much work is the fix? Is the fix complicated?
Not risky.
1 day.
String Parsing.

Is there an impact on documentation or message strings?
No - it just does what the analogous commands do already.

Which tests should QA (re)run to verify the fix did not destabilize GlassFish?
QA should re-run manual/auto tests to verify

Which is the targeted build of 3.1.2 for this fix?
B16

Comment by Byron Nevins [ 23/Dec/11 ]

The fix is ready to go whenever I get approval

Comment by Anissa Lam [ 23/Dec/11 ]

I don't quite understand the following :

What is the impact on the customer of the bug?
_delete-service will always fail on Linux

The problem is with a DCOM node instance, and my DAS is on my Mac. Where and why does Linux come into the picture?

Comment by Byron Nevins [ 27/Dec/11 ]

My fix that went into 4.0 needs to get rolled back.

Problem – we don't know how to get status back from a remote Windows command.

Solution in this case – simply check if the directory still exists after running the command.

Comment by Byron Nevins [ 27/Dec/11 ]

Simple? Not so simple after all!

Solution is to check and see if the directory that was supposed to be deleted still exists using jcifs (SAMBA).
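
The idea can be sketched as below. This is only an illustration, not the contents of NodeRunnerDcom.java: the names are hypothetical, and the existence lookup is passed in as a predicate that stands in for the real remote check done through jcifs.

```java
import java.util.function.Predicate;

// Hypothetical sketch: decide the command status from whether the directory
// that was supposed to be deleted is still present on the remote host.
final class RemoteDeleteCheck {
    enum Status { SUCCESS, WARNING }

    // existsOnRemote is a placeholder for the remote lookup (jcifs in the real fix).
    static Status statusAfterDelete(String remoteDir, Predicate<String> existsOnRemote) {
        return existsOnRemote.test(remoteDir) ? Status.WARNING : Status.SUCCESS;
    }
}
```

Unlike the string-matching approach, this does not depend on the wording of the remote output, only on the observable result of the delete.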

Fixed in 3.1.2 and in 4.0

Sending D:\gf\branches\3.1.2\cluster\ssh\src\main\java\org\glassfish\cluster\ssh\connect\NodeRunnerDcom.java
Sending D:\gf\trunk\main\nucleus\cluster\ssh\src\main\java\org\glassfish\cluster\ssh\connect\NodeRunnerDcom.java
Transmitting file data ..
Committed revision 51792.

Comment by lidiam [ 12/Jan/12 ]

verified in build ogs-3.1.2-b17.zip





[GLASSFISH-18057] Change error message for dcom node failure Created: 20/Dec/11  Updated: 04/Jan/12  Resolved: 04/Jan/12

Status: Resolved
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b14
Fix Version/s: None

Type: Bug Priority: Major
Reporter: lidiam Assignee: Byron Nevins
Resolution: Works as designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

ogs-3.1.2-b14.zip


Tags: 312_gui_new, 312_qa

 Description   

When creating a DCOM node, user may see the following error message:

An error has occurred
Successfully verified that the host, jed-asqe-43, is not the local machine as required. Successfully resolved host name to: jed-asqe-43/10.133.185.71 Successfully connected to DCOM Port at port 135 on host jed-asqe-43. Successfully connected to NetBIOS Session Service at port 139 on host jed-asqe-43. Successfully connected to Windows Shares at port 445 on host jed-asqe-43. The remote file, C:\tmp doesn't exist on jed-asqe-43 : Logon failure: unknown user name or bad password.

In my case, the problem was with the setting of Network Access under Security Policy ("Control Panel" -> "Administrative Tools" -> "Local Security Policy"-> "Local Policies"-> "Security Options" -> "Network Access: Sharing security model for local accounts"). Once I set it to Classic, dcom node was created fine. Hence the above error message, "unknown user name or bad password", is misleading. I'm not sure we should be providing a list of possible causes here, since there may be many. Perhaps we should be pointing instead to a document that describes all the possible causes and remedies? We should get doc team involved in deciding this.



 Comments   
Comment by Byron Nevins [ 30/Dec/11 ]

Paul - what to do? Please advise and then reassign to me.

Comment by Paul Davies [ 30/Dec/11 ]

Ideally, the error message should give some information about why the operation failed.

The doc team plans to add a section to the HA Admin Guide that explains how to set up DCOM for use with GlassFish. This section will contain all the steps required to avoid errors when DCOM nodes are created. The procedure for creating a DCOM node that will be added to the HA Admin Guide will list as a prerequisite that DCOM must be set up as described in that section.

Comment by Byron Nevins [ 04/Jan/12 ]

The official documentation explains how to troubleshoot connectivity problems.





[GLASSFISH-18051] Uninstall DCOM node fails Created: 19/Dec/11  Updated: 12/Jan/12  Resolved: 20/Dec/11

Status: Closed
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b14
Fix Version/s: 3.1.2_b15, 4.0_b15

Type: Bug Priority: Major
Reporter: lidiam Assignee: Byron Nevins
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File server.log    
Tags: 312_verified, 3_1_2-approved

 Description   

Clicking Delete and Uninstall DCOM node fails with the following error:

An error has occurred
Node jedy43 deleted successfully but failed to un-install GlassFish on jed-asqe-43. Please run uninstall-node manually.

DAS server.log contains the following:

[#|2011-12-19T14:46:21.455-0800|INFO|glassfish3.1.2|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin.cluster.dcom|_ThreadID=104;_ThreadName=Thread-2;|Command uninstall-node-dcom failed.

Invalid option: --sshport
Usage: asadmin [asadmin-utility-options] uninstall-node-dcom
[-w|--windowsuser <windowsuser(default:${user.name})>]
[-d|--windowsdomain <windowsdomain>]
[--installdir <installdir(default:${com.sun.aas.productRoot})>]
[--force[=<force(default:false)>]] [-?|--help[=<help(default:false)>]]
hosts ...|#]

[#|2011-12-19T14:46:21.472-0800|SEVERE|glassfish3.1.2|org.glassfish.admingui|_ThreadID=103;_ThreadName=Thread-2;|RestResponse.getResponse() gives FAILURE. endpoint = 'https://localhost:4848/management/domain/nodes/node/jedy43/delete-node-dcom'; attrs = '{uninstall=true}'|#]



 Comments   
Comment by Anissa Lam [ 20/Dec/11 ]

Have you tried to run using CLI,
%asadmin delete-node-dcom --uninstall=true jedy43

You probably will see the same error.

As logged in server.log, the console just passes in uninstall=true, which is correct.
Reassign to "distribution-management"

Comment by Byron Nevins [ 20/Dec/11 ]

The description clearly shows the problem. You are giving an illegal option: --sshport

No such option for this command.
Should be an easy fix

Comment by Byron Nevins [ 20/Dec/11 ]

Hold on. I just saw that you are calling delete-node-dcom with the uninstall option.
I'll check it out more closely. It might be my bug after all. Stand by!

Comment by Byron Nevins [ 20/Dec/11 ]

Refactored code needs work. Pretty easy fix.

Comment by Byron Nevins [ 20/Dec/11 ]

What is the impact on the customer of the bug?
delete-node-dcom will ALWAYS fail in the deleting of the remote files using --uninstall option

How likely is it that a customer will see the bug and how serious is the bug?
Fairly likely and serious simply because it is broken.

Is it a regression? Does it meet other bug fix criteria (security, performance, etc.)?
Brand new command, so by definition it's not a regression.

What is the cost/risk of fixing the bug?
Cost is a couple of man-hours. Risk is nil since it is totally broken without the fix.

How risky is the fix? How much work is the fix? Is the fix complicated?
Very little work. Very very simple. Essentially no risk.

Is there an impact on documentation or message strings?
No.

Which tests should QA (re)run to verify the fix did not destabilize GlassFish?
QA should re-run manual tests to verify

Which is the targeted build of 3.1.2 for this fix?
B15

Comment by Byron Nevins [ 20/Dec/11 ]

Thanks for finding this Lidia. This fell through the cracks.

Comment by Byron Nevins [ 20/Dec/11 ]

The refactored code still has the original hard-coded assumption that all remotes are SSH.
It was easy to miss this because it is a "nested" asadmin command.
Not a big deal at all to fix it.

1) Add an abstract method to the base class.
2) Cut & paste the ssh-specific code from the base class into the newly implemented ssh subclass' method
3) Write the dcom-specific method
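
The three steps above can be sketched roughly like this. It is an illustration only: the class names are hypothetical stand-ins (the real classes touched were DeleteNodeRemoteCommand, DeleteNodeSshCommand and DeleteNodeDcom), and the way the nested command line is assembled here is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the shared base class.
abstract class DeleteNodeRemoteSketch {
    // Shared logic: build the nested uninstall command without assuming SSH.
    final List<String> buildUninstallCommand(String host) {
        List<String> command = new ArrayList<>(uninstallCommandPrefix());
        command.add(host);
        return command;
    }

    // Step 1: abstract method that each transport must implement.
    abstract List<String> uninstallCommandPrefix();
}

// Step 2: the SSH-specific code moves out of the base class.
final class DeleteNodeSshSketch extends DeleteNodeRemoteSketch {
    private final int sshPort;

    DeleteNodeSshSketch(int sshPort) { this.sshPort = sshPort; }

    @Override
    List<String> uninstallCommandPrefix() {
        return List.of("uninstall-node-ssh", "--sshport", Integer.toString(sshPort));
    }
}

// Step 3: the DCOM subclass passes only options that uninstall-node-dcom accepts.
final class DeleteNodeDcomSketch extends DeleteNodeRemoteSketch {
    private final String windowsUser;

    DeleteNodeDcomSketch(String windowsUser) { this.windowsUser = windowsUser; }

    @Override
    List<String> uninstallCommandPrefix() {
        return List.of("uninstall-node-dcom", "--windowsuser", windowsUser);
    }
}
```

With this split, the DCOM path can no longer end up with --sshport on its nested command line, which is the failure reported in the description.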

Comment by Byron Nevins [ 20/Dec/11 ]

Fixed in 4.0

d:\gf\trunk\main\nucleus\cluster\admin>svn commit
Sending admin\src\main\java\com\sun\enterprise\v3\admin\cluster\DeleteNodeRemoteCommand.java
Sending admin\src\main\java\com\sun\enterprise\v3\admin\cluster\DeleteNodeSshCommand.java
Sending admin\src\main\java\com\sun\enterprise\v3\admin\cluster\NodeUtils.java
Sending admin\src\main\java\com\sun\enterprise\v3\admin\cluster\ValidateDcom.java
Sending admin\src\main\java\com\sun\enterprise\v3\admin\cluster\dcom\CreateNodeDcom.java
Sending admin\src\main\java\com\sun\enterprise\v3\admin\cluster\dcom\DeleteNodeDcom.java
Sending admin\src\main\java\com\sun\enterprise\v3\admin\cluster\dcom\PingNodeDcomCommand.java
Sending admin\src\main\java\com\sun\enterprise\v3\admin\cluster\dcom\UpdateNodeDcomCommand.java
Transmitting file data ........
Committed revision 51660.

Comment by Byron Nevins [ 20/Dec/11 ]

d:\gf\branches\3.1.2\cluster>svn commit
Sending cluster\admin\src\main\java\com\sun\enterprise\v3\admin\cluster\DeleteNodeRemoteCommand.java
Sending cluster\admin\src\main\java\com\sun\enterprise\v3\admin\cluster\DeleteNodeSshCommand.java
Sending cluster\admin\src\main\java\com\sun\enterprise\v3\admin\cluster\LocalStrings.properties
Sending cluster\admin\src\main\java\com\sun\enterprise\v3\admin\cluster\NodeUtils.java
Sending cluster\admin\src\main\java\com\sun\enterprise\v3\admin\cluster\dcom\DeleteNodeDcom.java
Sending cluster\compare.bat
Sending cluster\copyy.bat
Sending cluster\readme.txt
Sending cluster\setfiles.bat
Transmitting file data .........
Committed revision 51681.

Comment by lidiam [ 12/Jan/12 ]

verified in build ogs-3.1.2-b17.zip





[GLASSFISH-18037] Provide information about cause of failure on the Console screen Created: 17/Dec/11  Updated: 12/Jan/12  Resolved: 22/Dec/11

Status: Closed
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b14
Fix Version/s: 3.1.2_b16, 4.0

Type: Bug Priority: Major
Reporter: lidiam Assignee: Yamini K B
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

ogs-3.1.2-b14.zip


Attachments: JPEG File create-node-failed.JPG     Text File server.log    
Tags: 312_gui_new, 312_qa, 312_verified, 3_1_2-approved

 Description   

When attempting to install GlassFish on a remote machine during node creation, into a directory that already has GlassFish installed, Admin Console displays the following message:

An error has occurred
Failed to install GlassFish on tuppy. Please check the DAS server.log.

server.log contains the cause of failure:

"The remote installation directory, /export/home/j2eetest/3.1.2/glassfish3, already exists. Use the --force option to overwrite it."

This message should be displayed in Admin Console.



 Comments   
Comment by Anissa Lam [ 17/Dec/11 ]

Note that console is giving the exact same error as when you run this in CLI.

%asadmin create-node-ssh --nodehost bigtruck.us.oracle.com --installdir /tmp/testing --install=true bigtruckNode
remote failure: Failed to install GlassFish on bigtruck.us.oracle.com. Please check the DAS server.log.
Command create-node-ssh failed.

The response map that is passed back from the command contains the following information:

"data => {
extraProperties={methods=[{name=GET}, {name=POST, messageParameters={install={acceptableValues=, optional=true, defaultValue=false, type=boolean}, id={acceptableValues=, optional=false, defaultValue=, type=string}, sshkeyfile={acceptableValues=, optional=true, defaultValue=, type=string}, installdir={acceptableValues=, optional=true, defaultValue=${com.sun.aas.productRoot}, type=string}, nodedir={acceptableValues=, optional=true, defaultValue=, type=string}, sshport={acceptableValues=, optional=true, defaultValue=22, type=string}, nodehost={acceptableValues=, optional=false, defaultValue=, type=string}, force={acceptableValues=, optional=true, defaultValue=false, type=boolean}, archive={acceptableValues=, optional=true, defaultValue=, type=string}, AS_ADMIN_SSHKEYPASSPHRASE={acceptableValues=, optional=true, defaultValue=, type=string}, AS_ADMIN_SSHPASSWORD={acceptableValues=, optional=true, defaultValue=, type=string}, sshuser={acceptableValues=, optional=true, defaultValue=${user.name}, type=string}}}]},

message=Failed to install GlassFish on bigtruck.us.oracle.com. Please check the DAS server.log.,
exit_code=FAILURE,

command=create-node-ssh AdminCommand}"

Console is displaying exactly whatever is returned in the message part, which is also displayed by CLI, saying:
"Failed to install GlassFish on bigtruck.us.oracle.com. Please check the DAS server.log".

Transferring to Yamini to see whether the command can return the same error that is logged in server.log.

Comment by Yamini K B [ 20/Dec/11 ]
  • What is the impact on the customer of the bug?

It's a usability issue. The user has to dig around for the root cause of the failure. The command might as well display the actual cause directly instead of asking the user to go look for it.

  • What is the cost/risk of fixing the bug?

Low risk.

  • Is there an impact on documentation or message strings?

Yes, message strings will be re-arranged/removed.

  • Which tests should QA (re)run to verify the fix did not destabilize GlassFish?

create-node-ssh, install-node-ssh dev tests

  • Which is the targeted build of 3.1.2 for this fix?
    B16
Comment by Yamini K B [ 22/Dec/11 ]

Fixed in 3.1.2 branch, r51727

Comment by Yamini K B [ 22/Dec/11 ]

Fixed in trunk, r51728

Comment by lidiam [ 12/Jan/12 ]

Verified in build ogs-3.1.2-b17.zip





[GLASSFISH-18004] The user was not asked to enter AS_ADMIN_WINDOWSPASSWORD Created: 14/Dec/11  Updated: 15/Dec/11  Resolved: 15/Dec/11

Status: Resolved
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b14
Fix Version/s: None

Type: Bug Priority: Major
Reporter: easarina Assignee: Byron Nevins
Resolution: Invalid Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

GF 3.1.2 build 14, Win 2008 machine. Executed the following command:
=================================================================
asadmin create-node-dcom --nodehost localhost qq
remote failure: Missing Windows password. If you are using asadmin, specify the
remote Windows password in a file as follows:
AS_ADMIN_WINDOWSPASSWORD=windows-password
Specify the path of the password file to asadmin with the --passwordfile (or -W)
option.
Command create-node-dcom failed.
==========================================================

Previously, in this case, the user was prompted for AS_ADMIN_WINDOWSPASSWORD twice. Now there is no prompt at all. I believe that, as with every other existing GF password, the user should be prompted for AS_ADMIN_WINDOWSPASSWORD once if it was not provided in the password file.



 Comments   
Comment by Byron Nevins [ 15/Dec/11 ]

Note that this command runs in the server, not in the client.

Before – the client saw that there is no passwordfile and used the built-in support in Asadmin to get a password. But that support is hard-coded to ask for a password twice. It is currently impossible to force it to ask only once.

A good solution, which we can't do for 3.1.2, is to create a new attribute for Parameters – something that would tell it to ask for the password ONCE not TWICE.

The choices were to either ask twice for the password or to require a password file.
Nazrul made the call to do the latter - which is what it does now.

If you try the analogous ssh commands they behave exactly the same way (if you are using password authentication)

Elena – You should consider filing an issue against CLI for the prompt-once-for-password capability.





[GLASSFISH-17995] Password Alias Not Resolved in create-node-dcom Created: 13/Dec/11  Updated: 14/Dec/11  Resolved: 14/Dec/11

Status: Resolved
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b14, 4.0_b14
Fix Version/s: 3.1.2_b14, 4.0_b14

Type: Bug Priority: Critical
Reporter: Byron Nevins Assignee: Byron Nevins
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Tags: 3_1_2-approved

 Description   

Easy to reproduce.
1) Use the GUI to create a DCOM node.
1A) Use a password alias

Result – it will fail. create-node-dcom will try to use that exact string ${ALIAS=xxx} as the password.
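
The missing step is resolving the alias token before the value is used as a password. A minimal sketch of that idea follows, assuming a hypothetical in-memory alias store; the class and method names are illustrative and are not GlassFish APIs (the actual fix relies on TokenResolver and DcomUtils.resolvePassword, whose internals are not shown in this issue).

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of GlassFish.
final class AliasSketch {
    // Matches a value that consists solely of ${ALIAS=name}.
    private static final Pattern ALIAS = Pattern.compile("^\\$\\{ALIAS=([^}]+)\\}$");

    private AliasSketch() {}

    /**
     * Returns the password stored under the alias if raw has the form
     * ${ALIAS=name}; otherwise returns raw unchanged.
     */
    static String resolve(String raw, Map<String, String> aliasStore) {
        if (raw == null) {
            return null;
        }
        Matcher m = ALIAS.matcher(raw.trim());
        if (!m.matches()) {
            return raw; // plain password, nothing to resolve
        }
        String password = aliasStore.get(m.group(1));
        if (password == null) {
            throw new IllegalArgumentException("Unknown password alias: " + m.group(1));
        }
        return password;
    }
}
```

With this shape, a plain password passes through untouched, and an unknown alias fails loudly instead of being sent to the remote host as a literal string.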



 Comments   
Comment by Byron Nevins [ 13/Dec/11 ]

Here is the fix:

Index: admin/src/main/java/com/sun/enterprise/v3/admin/cluster/dcom/CreateNodeDcom.java
===================================================================
--- admin/src/main/java/com/sun/enterprise/v3/admin/cluster/dcom/CreateNodeDcom.java (revision 51411)
+++ admin/src/main/java/com/sun/enterprise/v3/admin/cluster/dcom/CreateNodeDcom.java (working copy)
@@ -39,6 +39,7 @@
  */
 package com.sun.enterprise.v3.admin.cluster.dcom;

+import com.sun.enterprise.universal.glassfish.TokenResolver;
 import com.sun.enterprise.util.cluster.RemoteType;
 import org.glassfish.cluster.ssh.util.DcomUtils;
 import java.util.List;
@@ -49,6 +50,7 @@
 import org.jvnet.hk2.annotations.*;
 import org.jvnet.hk2.component.PerLookup;
 import static com.sun.enterprise.util.StringUtils.ok;
+
 /**
  * Remote AdminCommand to create a DCOM node
  *
@@ -65,11 +67,12 @@
     private String windowspassword;
     @Param(name = "windowsdomain", shortName = "d", optional = true)
     private String windowsdomain;
+    private TokenResolver resolver = new TokenResolver();

@@ -82,6 +85,7 @@
     protected void validate() throws CommandValidationException {
         if (!ok(windowspassword))
             throw new CommandValidationException(Strings.get("update.node.dcom.no.password"));
+        windowspassword = DcomUtils.resolvePassword(resolver.resolve(windowspassword));
     }

     @Override

Comment by Byron Nevins [ 13/Dec/11 ]

What is the impact on the customer of the bug?
impossible to create DCOM nodes using a password alias. Very bad.

How likely is it that a customer will see the bug and how serious is the bug?
Is it a regression? Does it meet other bug fix criteria (security, performance, etc.)?

Very likely. None of the rest apply

  • What is the cost/risk of fixing the bug?

How risky is the fix? How much work is the fix? Is the fix complicated?
Simple. Easy. Low Risk. High benefit.

  • Is there an impact on documentation or message strings?
    None
  • Which tests should QA (re)run to verify the fix did not destabilize GlassFish?
    Not necessary. QA never tried the scenario.
  • Which is the targeted build of 3.1.2 for this fix?
    The build schedule has run out at 13[1]. This will go into B15.

[1] http://wikis.sun.com/display/GlassFish/3.1.2BuildSchedule

Comment by Byron Nevins [ 14/Dec/11 ]

d:\gf\branches\3.1.2\cluster>svn commit d:\gf\trunk\main\nucleus\cluster d:\gf\branches\3.1.2\cluster
Sending D:\gf\branches\3.1.2\cluster\admin\src\main\java\com\sun\enterprise\v3\admin\cluster\NodeUtils.java
Sending D:\gf\branches\3.1.2\cluster\compare.bat
Sending D:\gf\branches\3.1.2\cluster\setfiles.bat
Sending D:\gf\trunk\main\nucleus\cluster\admin\src\main\java\com\sun\enterprise\v3\admin\cluster\NodeUtils.java
Transmitting file data ....
Committed revision 51550.
d:\gf\branches\3.1.2\cluster>





[GLASSFISH-17983] Automate DCOM setup. Created: 13/Dec/11  Updated: 14/Dec/11

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b13
Fix Version/s: None

Type: Improvement Priority: Critical
Reporter: easarina Assignee: Byron Nevins
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Tags: 3_1_2_review

 Description   

The current DCOM instructions include a step that requires manually editing the Windows registry. I don't think it is a good idea to recommend that customers edit the registry by hand, so it seems to me that this step has to be automated.



 Comments   
Comment by sb110099 [ 14/Dec/11 ]

Upgrading the bug to P2, as it needs some evaluation and attention for 3.1.2.
From a usability perspective, this manual step for DCOM support will need to be automated for customers.

-Sudipa





[GLASSFISH-17972] Disallow Running install-node-dcom on DAS machine Created: 11/Dec/11  Updated: 11/Dec/11  Resolved: 11/Dec/11

Status: Resolved
Project: glassfish
Component/s: distributed management
Affects Version/s: None
Fix Version/s: 3.1.2_b14, 4.0_b14

Type: Bug Priority: Major
Reporter: Byron Nevins Assignee: Byron Nevins
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

It makes no sense to install GlassFish on a computer that is already running a GlassFish command in a GlassFish installation!

Possibly useful for testing but DCOM normally will NOT work against localhost anyway



 Comments   
Comment by Byron Nevins [ 11/Dec/11 ]

Easy – check the host in the (new) overridden validate method

Sending D:\gf\branches\3.1.2\cluster\cli\src\main\java\com\sun\enterprise\admin\cli\cluster\InstallNodeDcomCommand.java
Sending D:\gf\branches\3.1.2\cluster\cli\src\main\java\com\sun\enterprise\admin\cli\cluster\LocalStrings.properties
Sending D:\gf\branches\3.1.2\cluster\compare.bat
Sending D:\gf\branches\3.1.2\cluster\setfiles.bat
Sending D:\gf\trunk\main\nucleus\cluster\cli\src\main\java\com\sun\enterprise\admin\cli\cluster\InstallNodeDcomCommand.java
Sending D:\gf\trunk\main\nucleus\cluster\cli\src\main\java\com\sun\enterprise\admin\cli\cluster\LocalStrings.properties
Transmitting file data ......
Committed revision 51459.





[GLASSFISH-17962] Expose all DCOM nodes command as REST endpoint Created: 09/Dec/11  Updated: 20/Dec/11  Resolved: 20/Dec/11

Status: Resolved
Project: glassfish
Component/s: distributed management
Affects Version/s: 4.0_b13
Fix Version/s: 4.0

Type: Bug Priority: Major
Reporter: Anissa Lam Assignee: Byron Nevins
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Dependency
blocks GLASSFISH-17961 Need to port the Nodes support to 4.0 Open
Tags: 3_1_x-exclude

 Description   

All the DCOM node commands need to be exposed through REST so that the Admin Console can call them.
As of now, the GUI supports DCOM only in the 3.1.2 branch; the changes will be ported to 4.0 once this is available.

You can do http://localhost:4848/management/domain/nodes.json and see the list of endpoints exposed. Note that the DCOM nodes command is not there.



 Comments   
Comment by Anissa Lam [ 13/Dec/11 ]

The DCOM node commands that the console needs include:

create-node-dcom
delete-node-dcom
update-node-dcom
ping-node-dcom
validate-dcom

Comment by Byron Nevins [ 20/Dec/11 ]

Done.
Reviewed by Jason Lee

d:\gf\trunk\main\nucleus\cluster\admin>svn commit
Sending admin\src\main\java\com\sun\enterprise\v3\admin\cluster\DeleteNodeRemoteCommand.java
Sending admin\src\main\java\com\sun\enterprise\v3\admin\cluster\DeleteNodeSshCommand.java
Sending admin\src\main\java\com\sun\enterprise\v3\admin\cluster\NodeUtils.java
Sending admin\src\main\java\com\sun\enterprise\v3\admin\cluster\ValidateDcom.java
Sending admin\src\main\java\com\sun\enterprise\v3\admin\cluster\dcom\CreateNodeDcom.java
Sending admin\src\main\java\com\sun\enterprise\v3\admin\cluster\dcom\DeleteNodeDcom.java
Sending admin\src\main\java\com\sun\enterprise\v3\admin\cluster\dcom\PingNodeDcomCommand.java
Sending admin\src\main\java\com\sun\enterprise\v3\admin\cluster\dcom\UpdateNodeDcomCommand.java
Transmitting file data ........
Committed revision 51660.





[GLASSFISH-17947] Add Copious Output Text with How-To-Config-Windows Info Created: 09/Dec/11  Updated: 18/Jan/12

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b13, 4.0_b14
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Byron Nevins Assignee: Paul Davies
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

Windows Config is difficult and DOC should be easy to find.

Paul - what do you think? Details upon failed running of the command? Or details with the --help.
Both?

Let's discuss. Perhaps I should spit out some doc when the command fails. Can you spruce up what I've documented?






[GLASSFISH-17946] validate-dcom -- Add Check for JDK Created: 09/Dec/11  Updated: 09/Dec/11  Resolved: 09/Dec/11

Status: Resolved
Project: glassfish
Component/s: distributed management
Affects Version/s: None
Fix Version/s: 3.1.2_b14, 4.0_b14

Type: Bug Priority: Major
Reporter: Byron Nevins Assignee: Byron Nevins
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Tags: 3_1_2-review

 Description   

Add a check for a JDK. Perhaps run jar.exe and check that output is returned?



 Comments   
Comment by Byron Nevins [ 09/Dec/11 ]

Done!

I added a test which runs "javac -version" to verify that a JDK is in the path
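
As a rough illustration of the idea only: the sketch below runs a command on the local machine and reports whether it exited successfully. The real check runs remotely over DCOM (the commit touches WindowsRemoteScripter), so the class and method names here are hypothetical and the local ProcessBuilder call is a stand-in.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical local stand-in for the remote "javac -version" probe.
final class JdkProbe {
    private JdkProbe() {}

    /** Returns true if the command starts, finishes within 10 seconds, and exits with 0. */
    static boolean commandSucceeds(List<String> command) {
        try {
            Process p = new ProcessBuilder(command)
                    .redirectErrorStream(true)                       // merge stderr into stdout
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD) // output is not needed here
                    .start();
            if (!p.waitFor(10, TimeUnit.SECONDS)) {
                p.destroyForcibly();
                return false;
            }
            return p.exitValue() == 0;
        } catch (IOException e) {
            return false; // executable could not be started, e.g. not on the PATH
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** A JDK (not just a JRE) is assumed to be present if javac can be run. */
    static boolean jdkOnPath() {
        return commandSucceeds(List.of("javac", "-version"));
    }
}
```

Probing javac rather than java is what distinguishes a JDK from a JRE-only installation.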

d:\gf\branches\3.1.2\cluster>svn commit -F D:\gf\svn-commit.3.tmp d:\gf\trunk\main\nucleus\cluster d:\gf\branches\3.1.2\cluster
Sending D:\gf\branches\3.1.2\cluster\admin\src\main\java\com\sun\enterprise\v3\admin\cluster\LocalStrings.properties
Sending D:\gf\branches\3.1.2\cluster\admin\src\main\java\com\sun\enterprise\v3\admin\cluster\ValidateDcom.java
Sending D:\gf\branches\3.1.2\cluster\common\src\main\java\com\sun\enterprise\util\cluster\windows\process\WindowsRemoteScripter.java
Sending D:\gf\branches\3.1.2\cluster\compare.bat
Sending D:\gf\branches\3.1.2\cluster\copyy.bat
Sending D:\gf\branches\3.1.2\cluster\setfiles.bat
Sending D:\gf\trunk\main\nucleus\cluster\admin\src\main\java\com\sun\enterprise\v3\admin\cluster\LocalStrings.properties
Sending D:\gf\trunk\main\nucleus\cluster\admin\src\main\java\com\sun\enterprise\v3\admin\cluster\ValidateDcom.java
Sending D:\gf\trunk\main\nucleus\cluster\common\src\main\java\com\sun\enterprise\util\cluster\windows\process\WindowsRemoteScripter.java
Transmitting file data .........
Committed revision 51408.





[GLASSFISH-17943] update-node-dcom doesn't work on the remote host, but update-node-ssh - works. Created: 08/Dec/11  Updated: 09/Dec/11  Resolved: 09/Dec/11

Status: Resolved
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.1_b12
Fix Version/s: 3.1.2_b14, 4.0_b14

Type: Bug Priority: Major
Reporter: easarina Assignee: Byron Nevins
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

Win 2008 GF 3.1.2
Executed the following commands on the remote machine:

============================================================
C:\export>asadmin --user admin --passwordfile password.txt --host bigapp-oblade-2 create-node-config --nodehost bigapp-oblade-1 node6
Command create-node-config executed successfully.

C:\export>asadmin --user admin --passwordfile password.txt --host bigapp-oblade-2 update-node-dcom --nodehost bigapp-oblade-1 node6
remote failure: Warning: some parameters appear to be invalid.
SSH node not updated. To force an update of the node with these parameters rerun
the command using the --force option.
java.lang.IllegalArgumentException: Bad argument.
Command update-node-dcom failed.

=================================================

So update-node-dcom failed with an uninformative error message. The same commands in an SSH environment (using the same build) executed successfully.

[root@asqe-x2250-st7 bin]# ./asadmin --host asqe-x2250-st5 --user admin update-node-config --nodehost asqe-x2250-st7 node12
Enter admin password for user "admin">
Command update-node-config executed successfully.
[root@asqe-x2250-st7 bin]# ./asadmin --host asqe-x2250-st5 --user admin update-node-ssh --nodehost asqe-x2250-st7 node12
Enter admin password for user "admin">
Command update-node-ssh executed successfully.



 Comments   
Comment by Byron Nevins [ 09/Dec/11 ]

The problem is that you didn't give a windows password. No way can you run any DCOM commands without a password.

Added a special error message and I now test explicitly for this case.

c:\Temp>asadmin update-node-dcom --nodehost bigapp-oblade-2 node6
remote failure: Warning: some parameters appear to be invalid.
Node not updated. To force an update of the node with these parameters rerun the command using the --force option.
Missing Windows password. If you are using asadmin, specify the remote Windows password in a file as follows:
AS_ADMIN_WINDOWSPASSWORD=windows-password
Specify the path of the password file to asadmin with the --passwordfile (or -W) option.
Command update-node-dcom failed.

====================================

d:\gf\branches\3.1.2\cluster>svn commit d:\gf\trunk\main\nucleus\cluster d:\gf\branches\3.1.2\cluster
Sending D:\gf\branches\3.1.2\cluster\admin\src\main\java\com\sun\enterprise\v3\admin\cluster\LocalStrings.properties
Sending D:\gf\branches\3.1.2\cluster\admin\src\main\java\com\sun\enterprise\v3\admin\cluster\NodeUtils.java
Sending D:\gf\branches\3.1.2\cluster\setfiles.bat
Sending D:\gf\trunk\main\nucleus\cluster\admin\src\main\java\com\sun\enterprise\v3\admin\cluster\LocalStrings.properties
Sending D:\gf\trunk\main\nucleus\cluster\admin\src\main\java\com\sun\enterprise\v3\admin\cluster\NodeUtils.java
Transmitting file data .....
Committed revision 51397.





[GLASSFISH-17942] validate-dcom against localhost gave a wrong status. Created: 08/Dec/11  Updated: 09/Dec/11  Resolved: 09/Dec/11

Status: Resolved
Project: glassfish
Component/s: distributed management
Affects Version/s: None
Fix Version/s: 3.1.2_b14

Type: Bug Priority: Major
Reporter: easarina Assignee: Byron Nevins
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

Win 2008 GF 3.1.2 b12.
Executed validate-dcom against localhost:

=================================================================
asadmin validate-dcom localhost
Enter AS_ADMIN_WINDOWSPASSWORD password>
Enter AS_ADMIN_WINDOWSPASSWORD password again>
remote failure:
Successfully resolved host name to: localhost/127.0.0.1
Successfully connected to DCOM Port at port 135 on host localhost.
Can't connect to NetBIOS Session Service at port 139 on host localhost.
This is usually caused by a firewall blocking the port or the Server Service being stopped. : Connection refused: connect
Successfully connected to Windows Shares at port 445 on host localhost.

Command validate-dcom failed.
======================================================

I believe validate-dcom reported the wrong status in this case. It works if the machine name is used instead of "localhost", and create-node-dcom works with nodehost=localhost.



 Comments   
Comment by Byron Nevins [ 09/Dec/11 ]

This is the way it works. localhost and the machine name have different intricate connections to DCOM. There is no need to change this behavior since we don't support using Distributed COM for communications on the same machine.

Comment by Byron Nevins [ 09/Dec/11 ]

Note that this behavior is out of our hands – it is DCOM itself

Comment by Byron Nevins [ 09/Dec/11 ]

Add a check – if the host == this machine
then fail with an error message immediately.
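
One way such a check could look is sketched below. This is an assumption about the approach, not the code committed to ValidateDcom.java: it resolves the host name and treats it as local if any resulting address is a loopback address or is bound to one of this machine's network interfaces.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;

// Illustrative only; the actual check lives in ValidateDcom.java and may differ.
final class LocalHostCheck {
    private LocalHostCheck() {}

    /** Returns true if any address that host resolves to belongs to this machine. */
    static boolean isThisMachine(String host) {
        try {
            for (InetAddress addr : InetAddress.getAllByName(host)) {
                if (addr.isLoopbackAddress() || addr.isAnyLocalAddress()) {
                    return true;
                }
                // Non-null means a local network interface is bound to this address.
                if (NetworkInterface.getByInetAddress(addr) != null) {
                    return true;
                }
            }
        } catch (UnknownHostException | SocketException e) {
            // Cannot resolve or inspect: treat as "not local" and let later checks report it.
            return false;
        }
        return false;
    }
}
```

Checking the interface bindings as well as loopback is what lets the machine's own name and its LAN addresses be rejected, not just localhost and 127.0.0.1.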

Comment by Byron Nevins [ 09/Dec/11 ]

AFTER the change:

d:\gf\branches\3.1.2\cluster\admin>asadmin validate-dcom -W \pw_orcl -w wnevins localhost
remote failure: The host, localhost, is the local machine. DCOM is only for use on distributed systems.

d:\gf\branches\3.1.2\cluster\admin>asadmin validate-dcom -W \pw_orcl -w wnevins 127.0.0.1
remote failure: The host, 127.0.0.1, is the local machine. DCOM is only for use on distributed systems.

d:\gf\branches\3.1.2\cluster\admin>asadmin validate-dcom -W \pw_orcl -w wnevins wnevins-lap
remote failure: The host, wnevins-lap, is the local machine. DCOM is only for use on distributed systems.

d:\gf\branches\3.1.2\cluster\admin>asadmin validate-dcom -W \pw_orcl -w wnevins 10.28.51.113
remote failure: The host, 10.28.51.113, is the local machine. DCOM is only for use on distributed systems.

d:\gf\branches\3.1.2\cluster\admin>asadmin validate-dcom -W \pw_orcl -w wnevins 10.159.220.147
remote failure: The host, 10.159.220.147, is the local machine. DCOM is only for use on distributed systems.

Comment by Byron Nevins [ 09/Dec/11 ]

d:\gf\branches\3.1.2\cluster>svn commit d:\gf\trunk\main\nucleus\cluster d:\gf\branches\3.1.2\cluster
Sending D:\gf\branches\3.1.2\cluster\admin\src\main\java\com\sun\enterprise\v3\admin\cluster\LocalStrings.properties
Sending D:\gf\branches\3.1.2\cluster\admin\src\main\java\com\sun\enterprise\v3\admin\cluster\ValidateDcom.java
Sending D:\gf\branches\3.1.2\cluster\compare.bat
Sending D:\gf\branches\3.1.2\cluster\copyy.bat
Sending D:\gf\branches\3.1.2\cluster\setfiles.bat
Sending D:\gf\trunk\main\nucleus\cluster\admin\src\main\java\com\sun\enterprise\v3\admin\cluster\LocalStrings.properties
Sending D:\gf\trunk\main\nucleus\cluster\admin\src\main\java\com\sun\enterprise\v3\admin\cluster\ValidateDcom.java
Transmitting file data .......
Committed revision 51431.





[GLASSFISH-17941] ping-node-dcom failed for the updated node. Created: 08/Dec/11  Updated: 09/Dec/11  Resolved: 09/Dec/11

Status: Resolved
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b12
Fix Version/s: 3.1.2_b14, 4.0_b14

Type: Bug Priority: Major
Reporter: easarina Assignee: Byron Nevins
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

Win 2008, GF build 12. Created one node (node2) using create-node-dcom (without --installdir) and another node (node3) using create-node-config (also without --installdir), then ran update-node-dcom against that config node (node3).

After that executed ping-node-dcom against these nodes. ping-node-dcom was executed successfully against node2, but failed against node3.
========================================================================
C:\export>asadmin --user admin --passwordfile password.txt ping-node-dcom node2
Successfully made DCOM connection to node node2 (bigapp-oblade-1)
Command ping-node-dcom executed successfully.

C:\export>asadmin --user admin --passwordfile password.txt ping-node-dcom node3
remote failure: Failed to validate DCOM connection to node node3 (bigapp-oblade-1)
Could not connect to host bigapp-oblade-1 using DCOM.
Command ping-node-dcom failed.

===================================================================

I believe the problem happened because node3 did not have the install-dir:
----------------------------------------------------------------------------

<node node-host="bigapp-oblade-1" name="node2" windows-domain="bigapp-oblade-1" type="DCOM" install-dir="${com.sun.aas.productRoot}">
<ssh-connector ssh-port="135">
<ssh-auth password="Ch@ng3m3"></ssh-auth>
</ssh-connector>
</node>
<node node-host="bigapp-oblade-1" name="node3" type="DCOM">
<ssh-connector ssh-port="135">
<ssh-auth password="Ch@ng3m3"></ssh-auth>
</ssh-connector>
</node>
-----------------------------------------------------------------



 Comments   
Comment by Byron Nevins [ 09/Dec/11 ]

This is a feature, not a bug. We prevented you from creating a garbage DCOM node. If the install-dir is not setup correctly then there is zero chance of DCOM working. So we prevent you from doing that. If you insist – then you can use the force option.

In my case I see this:

d:\gf\branches\3.1.2\cluster\admin>asadmin update-node-dcom --nodehost bigapp-oblade-2 -W \pw -w hudson xxx2
remote failure: Warning: some parameters appear to be invalid.
Node not updated. To force an update of the node with these parameters rerun the command using the --force option.
com.sun.enterprise.util.cluster.windows.process.WindowsException: The network name cannot be found.

The installdir is ALWAYS populated – the default is the DAS' install-dir. In my case it is d:\glassfish3 and the remote is on c:\glassfish3.

SAMBA correctly told me that it couldn't find the network name. Drive D is the network name it was looking for.

This is not a valid bug.

Comment by easarina [ 09/Dec/11 ]

I did not see any warning or error messages. I just created a config node and updated it to a DCOM node, and then ping did not work.

Comment by Byron Nevins [ 09/Dec/11 ]

What commands, exactly, are you running?
Please provide the exact commands.

Comment by Byron Nevins [ 09/Dec/11 ]

You said you saw no error or warning yet you said:

C:\export>asadmin --user admin --passwordfile password.txt ping-node-dcom node3
remote failure: Failed to validate DCOM connection to node node3 (bigapp-oblade-1)
Could not connect to host bigapp-oblade-1 using DCOM.
Command ping-node-dcom failed.

======
This isn't an error message?!?

Comment by easarina [ 09/Dec/11 ]

C:\export>asadmin start-domain
Waiting for domain1 to start ...........
Successfully started the domain : domain1
domain Location: C:\export\glassfish3\glassfish\domains\domain1
Log File: C:\export\glassfish3\glassfish\domains\domain1\logs\server.log
Admin Port: 4848
Command start-domain executed successfully.

C:\export>asadmin create-node-config node1
Command create-node-config executed successfully.

C:\export>asadmin --passwordfile password1.txt update-node-dcom --nodehost locahost node1
Command update-node-dcom executed successfully.

C:\export>asadmin --passwordfile password1.txt ping-node-dcom node1
remote failure: Failed to validate DCOM connection to node node1 (localhost)
Could not connect to host localhost using DCOM.
Command ping-node-dcom failed.

Comment by Byron Nevins [ 09/Dec/11 ]

I think I see what you mean now. I assumed you didn't give the password for the remote machine like the other bug. We can all save time if you simply give me the exact commands you ran for every bug you file.

Comment by Byron Nevins [ 09/Dec/11 ]

Now I detect the problem:

d:\gf\branches\3.1.2\cluster\admin>asadmin ping-node-dcom node99
remote failure: Failed to validate DCOM connection to node node99 (bigapp-oblade-3)
The configuration for the node is invalid. There is no value for the installdir. Try running update-node-dcom and specify the install directory for
GlassFish.

FIXED!

Comment by Byron Nevins [ 09/Dec/11 ]

d:\gf\branches\3.1.2\cluster>svn commit d:\gf\trunk\main\nucleus\cluster d:\gf\branches\3.1.2\cluster
Sending D:\gf\branches\3.1.2\cluster\admin\src\main\java\com\sun\enterprise\v3\admin\cluster\LocalStrings.properties
Sending D:\gf\branches\3.1.2\cluster\admin\src\main\java\com\sun\enterprise\v3\admin\cluster\NodeUtils.java
Sending D:\gf\trunk\main\nucleus\cluster\admin\src\main\java\com\sun\enterprise\v3\admin\cluster\LocalStrings.properties
Sending D:\gf\trunk\main\nucleus\cluster\admin\src\main\java\com\sun\enterprise\v3\admin\cluster\NodeUtils.java
Transmitting file data ....
Committed revision 51400.





[GLASSFISH-17940] disallow running uninstall-node-dcom on DAS machine. Created: 08/Dec/11  Updated: 12/Dec/11  Resolved: 12/Dec/11

Status: Resolved
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b12
Fix Version/s: 3.1.2_b14, 4.0_b14

Type: Bug Priority: Major
Reporter: easarina Assignee: Byron Nevins
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

Win 2008, GF 3.1.2 build 12.

Executed uninstall-node-dcom on the DAS machine against the DAS machine itself. The uninstall started and then failed, but not before it removed the bin directory, and it left all running processes in place. I believe uninstall should refuse to remove the installation on the DAS machine, since that is self-destructive.



 Comments   
Comment by Byron Nevins [ 08/Dec/11 ]

Not a common use case to say the least!!

Please give me the exact commands you ran.

Comment by Byron Nevins [ 08/Dec/11 ]

->P4

Comment by easarina [ 09/Dec/11 ]

C:\export>asadmin --user admin --passwordfile password.txt uninstall-node-dcom --installdir C:\export\glassfish3 bigapp-oblade-2
java.lang.NullPointerException
at com.sun.enterprise.admin.cli.cluster.NativeRemoteCommandsBase.removeTrailingSlash(NativeRemoteCommandsBase.java:360)
at com.sun.enterprise.admin.cli.cluster.NativeRemoteCommandsBase.checkIfNodeExistsForHost(NativeRemoteCommandsBase.java:327)
at com.sun.enterprise.admin.cli.cluster.UninstallNodeBaseCommand.validate(UninstallNodeBaseCommand.java:75)
at com.sun.enterprise.admin.cli.CLICommand.execute(CLICommand.java:254)
at com.sun.enterprise.admin.cli.AsadminMain.executeCommand(AsadminMain.java:306)
at com.sun.enterprise.admin.cli.AsadminMain.main(AsadminMain.java:238)
com.sun.enterprise.universal.process.WindowsException: The process cannot access the file because it is being used by another process.
Command uninstall-node-dcom failed.
The system cannot find the path specified.
======================================
C:\export>cd glassfish3\glassfish

C:\export\glassfish3\glassfish>ls
domains legal lib modules osgi

C:\export\glassfish3\glassfish>
=====================================

Most of the error messages above are caused by bug 17944.

But the last messages belong to this command:
====================================================
com.sun.enterprise.universal.process.WindowsException: The process cannot access the file because it is being used by another process.
Command uninstall-node-dcom failed.
====================================================================
The command failed, but the GF bin directory was removed, so the installation was corrupted. I don't think this is a minor issue, because a user can run such a command by mistake and corrupt the installation.

Comment by easarina [ 12/Dec/11 ]

In a comment on bug 17944, Yamini wrote:

"I'll open a new issue to disallow running install-node-dcom on localhost."

I believe running uninstall-node-dcom on localhost should also be disallowed.

Comment by Byron Nevins [ 12/Dec/11 ]

Thanks for finding this Elena!

All done.

d:\gf\branches\3.1.2\cluster>svn commit d:\gf\trunk\main\nucleus\cluster d:\gf\branches\3.1.2\cluster
Sending D:\gf\branches\3.1.2\cluster\cli\src\main\java\com\sun\enterprise\admin\cli\cluster\UninstallNodeDcomCommand.java
Sending D:\gf\trunk\main\nucleus\cluster\cli\src\main\java\com\sun\enterprise\admin\cli\cluster\UninstallNodeDcomCommand.java
Transmitting file data ..
Committed revision 51506.





[GLASSFISH-17925] Remote Script on Windows is given Garbage Name if run from UNIX Created: 07/Dec/11  Updated: 07/Dec/11  Resolved: 07/Dec/11

Status: Resolved
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b13, 4.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Byron Nevins Assignee: Byron Nevins
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Tags: 3_1_2-review

 Description   

1. I ran install-node-dcom from a UNIX machine to a Windows machine
2. It said that it failed.

I attached a debugger and saw that it actually worked perfectly! It's just that no output was returned from running the remote script.
This is the first time I tried it from UNIX instead of Windows so maybe it is a problem on UNIX?
At any rate the code below is too stringent. In this case the unpacking went fine.

Recommend:
Change the test to look for a particular file that got unpacked instead of looking at the output.

String out = scripter.run(unpackScript);

if (out == null || out.length() < 50)
throw new CommandException(Strings.get("dcom.error.unpacking", unpackScript, out));

logger.fine("Output from Windows Unpacker:\n" + out);
}
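
The recommended change, checking for a known unpacked file rather than the length of the script output, might be sketched as follows. This is a hypothetical local-filesystem version: the class name, method name, and marker path are assumptions, and the real command would have to look at the remote Windows host rather than a local Path.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch; the marker file and local Path access are assumptions.
final class UnpackCheck {
    private UnpackCheck() {}

    /** Returns true if the marker file exists as a regular file under installDir. */
    static boolean unpacked(Path installDir, String markerRelativePath) {
        return Files.isRegularFile(installDir.resolve(markerRelativePath));
    }
}
```

A file-based check does not depend on how chatty the unpack tool is, which is the weakness of the `out.length() < 50` test above.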



 Comments   
Comment by Byron Nevins [ 07/Dec/11 ]

The code is in:

InstallNodeDcomCommand.unpackOnHosts()

Comment by Byron Nevins [ 07/Dec/11 ]

Actual Error is this:

SmartFile.sanitize was used to create a remote path. But SmartFile is designed for creating paths for use on the CURRENT PLATFORM! It got confused with the "C:"

the result is that the remote script path was set to garbage like this:

/export/home/bnlocal/c:/gf/unpack.bat

So the unpack script never ran.
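
To illustrate the failure mode, here is a small sketch that builds the remote path with plain string handling, so the local platform's path rules never get involved. RemoteWindowsPath is a hypothetical helper written for this note, not the change that was committed.

```java
import java.io.File;

final class RemoteWindowsPath {
    /** Joins a remote Windows directory and a file name using only string operations. */
    static String join(String remoteDir, String fileName) {
        String dir = remoteDir.replace('/', '\\');
        if (!dir.endsWith("\\")) {
            dir += "\\";
        }
        return dir + fileName;
    }

    public static void main(String[] args) {
        // On UNIX, java.io.File treats "c:/gf/unpack.bat" as a relative path,
        // so getAbsolutePath() prefixes the current working directory.
        System.out.println(new File("c:/gf/unpack.bat").getAbsolutePath());
        // The string-based join never consults the local file system rules.
        System.out.println(join("c:/gf", "unpack.bat"));
    }
}
```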

Comment by Byron Nevins [ 07/Dec/11 ]

d:\gf\branches\3.1.2\cluster>svn commit d:\gf\trunk\main\nucleus\cluster\cli d:\gf\branches\3.1.2\cluster\cli
Sending D:\gf\branches\3.1.2\cluster\cli\src\main\java\com\sun\enterprise\admin\cli\cluster\InstallNodeDcomCommand.java
Sending D:\gf\trunk\main\nucleus\cluster\cli\src\main\java\com\sun\enterprise\admin\cli\cluster\InstallNodeDcomCommand.java
Transmitting file data ..
Committed revision 51358.

I also fixed the "ask for the password twice" issue

Comment by Byron Nevins [ 07/Dec/11 ]

The earlier comment about UNIX not getting the output from the remote command – is JUST PLAIN WRONG.

It actually gets the output just fine. The problem is that the path to the script was garbage. It never actually ran the script – thus no output.





[GLASSFISH-17924] SSH not working Created: 07/Dec/11  Updated: 13/Dec/11  Resolved: 13/Dec/11

Status: Closed
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b14
Fix Version/s: 3.1.2_b14

Type: Bug Priority: Blocker
Reporter: Byron Nevins Assignee: Yamini K B
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

~/gf/trunk/v2/appserv-tests/devtests/admin/cli>asadmin install-node-ssh --x
Exception in thread "main" org.jvnet.hk2.component.ComponentException: injection failed on com.sun.enterprise.admin.cli.cluster.InstallNodeSshCommand.sshLauncher with class org.glassfish.cluster.ssh.launcher.SSHLauncher
at org.jvnet.hk2.component.InjectionManager.error_injectionException(InjectionManager.java:277)
at org.jvnet.hk2.component.InjectionManager.inject(InjectionManager.java:159)
at org.jvnet.hk2.component.InjectionManager.inject(InjectionManager.java:91)
at com.sun.hk2.component.AbstractCreatorImpl.inject(AbstractCreatorImpl.java:126)
at com.sun.hk2.component.ConstructorCreator.initialize(ConstructorCreator.java:91)
at com.sun.hk2.component.AbstractCreatorImpl.get(AbstractCreatorImpl.java:82)
at com.sun.hk2.component.EventPublishingInhabitant.get(EventPublishingInhabitant.java:139)
at com.sun.hk2.component.AbstractInhabitantImpl.get(AbstractInhabitantImpl.java:76)
at org.jvnet.hk2.component.Habitat.getComponent(Habitat.java:796)
at com.sun.enterprise.admin.cli.CLICommand.getCommand(CLICommand.java:182)
at com.sun.enterprise.admin.cli.AsadminMain.executeCommand(AsadminMain.java:305)
at com.sun.enterprise.admin.cli.AsadminMain.main(AsadminMain.java:238)
Caused by: org.jvnet.hk2.component.ComponentException: Failed to create class org.glassfish.cluster.ssh.launcher.SSHLauncher
at com.sun.hk2.component.ConstructorCreator.create(ConstructorCreator.java:71)
at com.sun.hk2.component.AbstractCreatorImpl.get(AbstractCreatorImpl.java:80)
at com.sun.hk2.component.EventPublishingInhabitant.get(EventPublishingInhabitant.java:139)
at com.sun.hk2.component.AbstractInhabitantImpl.get(AbstractInhabitantImpl.java:76)
at org.jvnet.hk2.component.Habitat.getBy(Habitat.java:1048)
at org.jvnet.hk2.component.Habitat.getByType(Habitat.java:1029)
at com.sun.hk2.component.InjectInjectionResolver.getComponentInjectValue(InjectInjectionResolver.java:159)
at com.sun.hk2.component.InjectInjectionResolver.getValue(InjectInjectionResolver.java:90)
at org.jvnet.hk2.component.InjectionManager.inject(InjectionManager.java:141)
... 10 more
Caused by: java.lang.NoClassDefFoundError: com/trilead/ssh2/ServerHostKeyVerifier
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2389)
at java.lang.Class.getConstructor0(Class.java:2699)
at java.lang.Class.newInstance0(Class.java:326)
at java.lang.Class.newInstance(Class.java:308)
at com.sun.hk2.component.ConstructorCreator.create(ConstructorCreator.java:65)
... 18 more
Caused by: java.lang.ClassNotFoundException: com.trilead.ssh2.ServerHostKeyVerifier
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
... 24 more
~/gf/trunk/v2/appserv-tests/devtests/admin/cli>



 Comments   
Comment by Byron Nevins [ 07/Dec/11 ]

Sorry. False alarm. I must have had a screwy installation. I reinstalled 3.1.2 and everything is working fine.





[GLASSFISH-17913] install-node-dcom doesn't Created: 06/Dec/11  Updated: 13/Dec/11  Resolved: 08/Dec/11

Status: Resolved
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b11
Fix Version/s: 3.1.2_b14, 4.0_b14

Type: Bug Priority: Major
Reporter: Paul Davies Assignee: Byron Nevins
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows XP



 Description   

Attempts to use install-node-dcom to copy GlassFish Server software to a DCOM-accessible host fail.

The error that occurs depends on the options specified:

  • If an existing installation directory and --force are specified, a null pointer exception is thrown:
    asadmin> install-node-dcom --force=true --installdir C:\glassfish3 somehost
    java.lang.NullPointerException
    Command install-node-dcom failed.
    
  • If a nonexistent installation directory is specified, the network name cannot be found:
    asadmin> install-node-dcom -w hudson --installdir C:\gfuser somehost
    com.sun.enterprise.universal.process.WindowsException: The network name cannot be found.
    

    The host somehost is accessible through DCOM:

    asadmin> validate-dcom somehost
    
    Successfully resolved host name to: somehost/XXX.XXX.XXX.XXX
    Successfully connected to DCOM Port at port 135 on host somehost.
    Successfully connected to NetBIOS Session Service at port 139 on host somehost
    Successfully connected to Windows Shares at port 445 on host somehost.
    Successfully accessed C: on somehost using DCOM.
    Successfully wrote delete_me.bat to C: on somehost using DCOM.
    Successfully accessed WMI (Windows Management Interface) on somehost.  
    There are 50 processes running on somehost.
    Successfully ran the test script on somehost using DCOM.
    The script simply ran the DIR command.  Here are the first few lines from the 
    output of the dir command on the remote machine:
    
    C:\Windows\system32>dir C:\
     Volume in drive C has no label.
     Volume Serial Number is XXXX-XXX
    
     Directory of C:\
    
    12/06/2011  10:45 AM                 8 delete_me.bat
    03/03/2011  10:36 AM    <DIR>          export
    
    
    Command validate-dcom executed successfully.
    


 Comments   
Comment by Byron Nevins [ 08/Dec/11 ]

The first problem is fixed.

The second problem makes no sense as a bug because it is simply the correct normal usage of the command:

install-node-dcom -w hudson --installdir C:\gfuser somehost

Did you leave something out?

I tried it and it worked fine. Please re-open if I missed something.

D:\gf\branches\3.1.2\cluster>svn commit d:\gf\trunk\main\nucleus\cluster d:\gf\branches\3.1.2\cluster
Sending branches\3.1.2\cluster\cli\src\main\java\com\sun\enterprise\admin\cli\cluster\InstallNodeDcomCommand.java
Sending branches\3.1.2\cluster\compare.bat
Sending branches\3.1.2\cluster\copyy.bat
Sending branches\3.1.2\cluster\setfiles.bat
Sending trunk\main\nucleus\cluster\cli\src\main\java\com\sun\enterprise\admin\cli\cluster\InstallNodeDcomCommand.java
Transmitting file data .....
Committed revision 51393.





[GLASSFISH-17911] update-node-com error message refers to SSH Created: 06/Dec/11  Updated: 06/Mar/12

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: None
Fix Version/s: not determined

Type: Bug Priority: Major
Reporter: Paul Davies Assignee: Byron Nevins
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Tags: 3_1_2-exclude

 Description   

An unsuccessful attempt to update a DCOM node displays an error message that refers to SSH:

asadmin> update-node-dcom -w hudson --nodehost  host.example.com xkyd
remote failure: Warning: some parameters appear to be invalid.
SSH node not updated. To force an update of the node with these parameters rerun
the command using the --force option.
com.sun.enterprise.universal.process.WindowsException: org.jinterop.dcom.common.
JIException: Access is denied, please check whether the [domain-username-password]
are correct. Also, if not already done please check the GETTING STARTED and
FAQ sections in readme.htm. They provide information on how to correctly configure
the Windows machine for DCOM access, so as to avoid such exceptions.  [0x00000005]
Command update-node-dcom failed.
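
One way to avoid the hard-coded "SSH" would be to build the message from the node type. The sketch below is only an illustration: the NodeType enum and the notUpdated helper are hypothetical, and the message text is modeled on the output above rather than taken from LocalStrings.properties.

```java
final class NodeMessages {
    /** Hypothetical node types; the enum name is used directly in the message. */
    enum NodeType { SSH, DCOM }

    /** Builds the "node not updated" warning for the given node type. */
    static String notUpdated(NodeType type) {
        return type + " node not updated. To force an update of the node with these"
                + " parameters rerun the command using the --force option.";
    }

    public static void main(String[] args) {
        System.out.println(notUpdated(NodeType.DCOM));
    }
}
```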


 Comments   
Comment by Joe Di Pol [ 30/Jan/12 ]

Not a 3.1.2 stopper.

Comment by Tom Mueller [ 06/Mar/12 ]

Bulk update to change fix version to "not determined" for all issues still open but with a fix version for a released version.





[GLASSFISH-17739] create-instance fails when DAS on Linux, instance on Windows and using --nodedir Created: 15/Nov/11  Updated: 30/Nov/11

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.1_b12
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: bthalmayr Assignee: Joe Di Pol
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows 2008 R2 64bit, Java(TM) SE Runtime Environment (build 1.7.0_01-b08), cygwin



 Description   

First of all, 'ssh' and 'scp' work fine using public key auth from the 'DAS' server to the 'node' server ('DAS' resides on RHEL, 'node' on Windows 2008 R2 with cygwin sshd).

'asadmin create-node-ssh' works fine...
asadmin --user <user> --passwordfile=<pwd-file> --port <port> create-node-ssh --nodehost <nodehost> --installdir c:/sun/glassfish3 --nodedir c:/sun/glassfish3/nodes --sshuser <runtime-user> --sshkeyfile <keyfile-for-runtime-user> --install=false <node-name>

'asadmin create-instance' fails ...

asadmin --user <user> --passwordfile=<pwd-file> --port <port> create-instance --node <node-name> --config <config> <instance-name>
Successfully created instance <instance-name> in the DAS configuration, but failed to create the instance files on node <node-name> (<node-fqdn>).

Command failed on node <node-name> (<node-fqdn>): cygwin warning:
MS-DOS style path detected: c:/sun/glassfish3/glassfish/bin/asadmin
Preferred POSIX equivalent is: /cygdrive/c/sun/glassfish3/glassfish/bin/asadmin
CYGWIN environment variable option "nodosfilewarning" turns off this warning.
Consult the user's guide for more details about POSIX paths:
http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
Node directory c:\sun\glassfish3\glassfish\c:\sun\glassfish3\nodes does not exist or is not a directory
Command _create-instance-filesystem failed.

admin-logger on 'DAS' shows...
..

[#|2011-11-15T18:18:07.167+0100|FINE|glassfish3.1.1|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin.cluster|_ThreadID=3895;_ThreadName=Thread-2;ClassName=org.glassfish.cluster.ssh.connect.NodeRunner;MethodName=trace;|NodeRunner: Running command on <node-fqdn>: c:/sun/glassfish3/glassfish/bin/asadmin --_auxinput - --interactive=false --host <das-fqdn> --port <das-port> _validate-das-options --nodedir c:/sun/glassfish3/glassfish/c:/sun/glassfish3/nodes --node <node-name> <instance-name>|#]

...

[#|2011-11-15T18:18:07.170+0100|FINER|glassfish3.1.1|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin.cluster|_ThreadID=3895;_ThreadName=Thread-2;ClassName=org.glassfish.cluster.ssh.launcher.SSHLauncher;MethodName=runCommand;|Running command c:/sun/glassfish3/glassfish/bin/asadmin --_auxinput - --interactive=false --host <das-fqdn> --port <das-port>_validate-das-options --nodedir c:/sun/glassfish3/glassfish/c:/sun/glassfish3/nodes --node <node-name> <instance-name> on host: <node-fqdn>|#]

...

[#|2011-11-15T18:18:08.534+0100|INFO|glassfish3.1.1|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin.cluster|_ThreadID=3895;_ThreadName=Thread-2;|cygwin warning:
MS-DOS style path detected: c:/sun/glassfish3/glassfish/bin/asadmin
Preferred POSIX equivalent is: /cygdrive/c/sun/glassfish3/glassfish/bin/asadmin
CYGWIN environment variable option "nodosfilewarning" turns off this warning.
Consult the user's guide for more details about POSIX paths:
http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
Command _validate-das-options executed successfully.|#]

...

[#|2011-11-15T18:18:08.538+0100|FINE|glassfish3.1.1|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin.cluster|_ThreadID=3895;_ThreadName=Thread-2;ClassName=org.glassfish.cluster.ssh.connect.NodeRunner;MethodName=trace;|NodeRunner: Running command on <node-fqdn>: c:/sun/glassfish3/glassfish/bin/asadmin --_auxinput - --interactive=false --host <das-fqdn> --port <das-port> _create-instance-filesystem --nodedir c:/sun/glassfish3/glassfish/c:/sun/glassfish3/nodes --node <node-name> <instance-name>|#]

...

[#|2011-11-15T18:18:08.540+0100|FINER|glassfish3.1.1|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin.cluster|_ThreadID=3895;_ThreadName=Thread-2;ClassName=org.glassfish.cluster.ssh.launcher.SSHLauncher;MethodName=runCommand;|Running command c:/sun/glassfish3/glassfish/bin/asadmin --_auxinput - --interactive=false --host <das-fqdn> --port 4849 _create-instance-filesystem --nodedir c:/sun/glassfish3/glassfish/c:/sun/glassfish3/nodes --node <node-name> <instance-name> on host: <node-fqdn>|#]

...

[#|2011-11-15T18:18:09.915+0100|WARNING|glassfish3.1.1|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin.cluster|_ThreadID=3895;_ThreadName=Thread-2;|Successfully created instance <instance-name> in the DAS configuration, but failed to create the instance files on node <node-name> (<node-fqdn>).: Command ' c:/sun/glassfish3/glassfish/bin/asadmin --_auxinput - --interactive=false --host <das-fqdn> --port <das-port> _create-instance-filesystem --nodedir c:/sun/glassfish3/glassfish/c:/sun/glassfish3/nodes --node <node-name> <instance-name>' failed on node <node-name> (<node-fqdn>): cygwin warning:
MS-DOS style path detected: c:/sun/glassfish3/glassfish/bin/asadmin
Preferred POSIX equivalent is: /cygdrive/c/sun/glassfish3/glassfish/bin/asadmin
CYGWIN environment variable option "nodosfilewarning" turns off this warning.
Consult the user's guide for more details about POSIX paths:
http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
Node directory c:\sun\glassfish3\glassfish\c:\sun\glassfish3\nodes does not exist or is not a directory

Of course the directory 'c:\sun\glassfish3\glassfish\c:\sun\glassfish3\nodes' does not exist.

It seems that the values for '--installdir' and '--nodedir' are concatenated.



 Comments   
Comment by bthalmayr [ 15/Nov/11 ]

Using 'cygwin'-style pathnames, 'create-instance' fails as well, but with a different error.

'DAS' log shows ...

[#|2011-11-15T18:58:14.418+0100|INFO|glassfish3.1.1|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin.cluster|_ThreadID=4735;_ThreadName=Thread-2;|Using DAS host <das-fqdn> and port <das-port> from existing das.properties for node
<node-name>. To use a different DAS, create a new node using create-node-ssh or
create-node-config. Create the instance with the new node and correct
host and port:
asadmin --host das_host --port das_port create-local-instance --node node_name instance_name.
Command _create-instance-filesystem executed successfully.|#]

[#|2011-11-15T18:58:15.323+0100|SEVERE|glassfish3.1.1|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin.cluster|_ThreadID=4735;_ThreadName=Thread-2;|Successfully created instance <instance-name> in the DAS configuration, but failed to install configuration files for the instance on node <node-fqdn> during bootstrap.

SSH configuration information

Additional failure info: java.io.IOException: /cygdrive/c/sun/glassfish3/nodes/gf-wnode1/.
com.sun.enterprise.v3.admin.cluster.SecureAdminBootstrapHelper$BootstrapException: java.io.IOException: /cygdrive/c/sun/glassfish3/nodes/<node-name>/
at com.sun.enterprise.v3.admin.cluster.SecureAdminBootstrapHelper.bootstrapInstance(SecureAdminBootstrapHelper.java:172)
at com.sun.enterprise.v3.admin.cluster.CreateInstanceCommand.bootstrapSecureAdminRemotely(CreateInstanceCommand.java:337)
at com.sun.enterprise.v3.admin.cluster.CreateInstanceCommand.createInstanceFilesystem(CreateInstanceCommand.java:432)
at com.sun.enterprise.v3.admin.cluster.CreateInstanceCommand.execute(CreateInstanceCommand.java:239)
at com.sun.enterprise.v3.admin.CommandRunnerImpl$1.execute(CommandRunnerImpl.java:355)
at com.sun.enterprise.v3.admin.CommandRunnerImpl.doCommand(CommandRunnerImpl.java:370)
at com.sun.enterprise.v3.admin.CommandRunnerImpl.doCommand(CommandRunnerImpl.java:1045)
at com.sun.enterprise.v3.admin.CommandRunnerImpl.access$1200(CommandRunnerImpl.java:96)
at com.sun.enterprise.v3.admin.CommandRunnerImpl$ExecutionContext.execute(CommandRunnerImpl.java:1244)
at com.sun.enterprise.v3.admin.CommandRunnerImpl$ExecutionContext.execute(CommandRunnerImpl.java:1232)
at com.sun.enterprise.v3.admin.AdminAdapter.doCommand(AdminAdapter.java:459)
at com.sun.enterprise.v3.admin.AdminAdapter.service(AdminAdapter.java:209)
at com.sun.grizzly.tcp.http11.GrizzlyAdapter.service(GrizzlyAdapter.java:168)
at com.sun.enterprise.v3.server.HK2Dispatcher.dispath(HK2Dispatcher.java:117)
at com.sun.enterprise.v3.services.impl.ContainerMapper.service(ContainerMapper.java:238)
at com.sun.grizzly.http.ProcessorTask.invokeAdapter(ProcessorTask.java:828)
at com.sun.grizzly.http.ProcessorTask.doProcess(ProcessorTask.java:725)
at com.sun.grizzly.http.ProcessorTask.process(ProcessorTask.java:1019)
at com.sun.grizzly.http.DefaultProtocolFilter.execute(DefaultProtocolFilter.java:225)
at com.sun.grizzly.DefaultProtocolChain.executeProtocolFilter(DefaultProtocolChain.java:137)
at com.sun.grizzly.DefaultProtocolChain.execute(DefaultProtocolChain.java:104)
at com.sun.grizzly.DefaultProtocolChain.execute(DefaultProtocolChain.java:90)
at com.sun.grizzly.http.HttpProtocolChain.execute(HttpProtocolChain.java:79)
at com.sun.grizzly.ProtocolChainContextTask.doCall(ProtocolChainContextTask.java:54)
at com.sun.grizzly.SelectionKeyContextTask.call(SelectionKeyContextTask.java:59)
at com.sun.grizzly.ContextTask.run(ContextTask.java:71)
at com.sun.grizzly.util.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:532)
at com.sun.grizzly.util.AbstractThreadPool$Worker.run(AbstractThreadPool.java:513)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.IOException: /cygdrive/c/sun/glassfish3/nodes/<node-name>/
at com.sun.enterprise.v3.admin.cluster.SecureAdminBootstrapHelper$RemoteHelper.mkdirs(SecureAdminBootstrapHelper.java:268)
at com.sun.enterprise.v3.admin.cluster.SecureAdminBootstrapHelper.mkdirs(SecureAdminBootstrapHelper.java:178)
at com.sun.enterprise.v3.admin.cluster.SecureAdminBootstrapHelper.bootstrapInstance(SecureAdminBootstrapHelper.java:168)
... 28 more
Caused by: com.trilead.ssh2.SFTPException: No such file (SSH_FX_NO_SUCH_FILE: A reference was made to a file which does not exist.)
at com.trilead.ssh2.SFTPv3Client.statBoth(SFTPv3Client.java:441)
at com.trilead.ssh2.SFTPv3Client.lstat(SFTPv3Client.java:471)
at com.sun.enterprise.v3.admin.cluster.SecureAdminBootstrapHelper$RemoteHelper.mkdirs(SecureAdminBootstrapHelper.java:266)
... 30 more

#]

'/cygdrive/c/sun/glassfish3/nodes/<node-name>/' does not exist on the 'Windows' server

Using 'ls -ld /cygdrive/c/sun/glassfish3/nodes' shows it is owned by the 'runtime-user' which is/should be used by GlassFish.

Comment by Byron Nevins [ 16/Nov/11 ]

SSH issue

Comment by Joe Di Pol [ 29/Nov/11 ]

Running instances and the DAS on systems with different OS types is not supported. That may be contributing to the problem. That said we should investigate this to see what is going on. One workaround to try is to not specify the nodedir at all – it will default to the nodes directory under the installdir.

I'm also surprised using the cygwin posix path did not work.

I'm lowering the priority because this is technically not a supported configuration.

Comment by bthalmayr [ 29/Nov/11 ]

I don't think it matters whether the DAS resides on a different OS or not. Cygwin (or MKS) has to be used anyway in a Windows environment.

Every sample I've seen so far (on the numerous wikis) does not specify --nodedir option. Has it really been tested?

I can confirm that using a Cygwin-style path and not specifying the --nodedir option works.

BTW could you please point me to the location in the docs where it's mentioned that running the servers on different OSes is not a supported configuration?

Comment by Joe Di Pol [ 29/Nov/11 ]

The Deployment Planning Guide at http://docs.oracle.com/cd/E18930_01/html/821-2419/abfay.html#abfbc
has this note:

"Note - All hosts in a cluster on which the DAS and GlassFish Server instances are running must have the same operating system."

And yes we have tests that run with nodedir so it does work, at least in the scenarios we are testing.

I'll investigate this further.

Comment by Joe Di Pol [ 30/Nov/11 ]

What is happening is that the DAS (running on unix) interprets the DOS style nodedir path, "c:\sun\glassfish3\nodes", as a relative path. So it prepends the installdir to it and ends up with a bogus path.

Using the cygwin path works around this problem, but then fails because (I think) the DAS uses the scp or sftp client to copy over some data to the instance, and scp/sftp may not result in a cygwin shell at the DOS end and therefore the cygwin path is not understood.

Running the DAS on the same OS as the instance avoids these problems, and that's why that is the supported configuration.

We could use a heuristic to detect DOS paths when running on Unix, but that would be a bit fragile.

In any case, this is a lower priority bug since it is essentially an unsupported configuration.
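
For reference, the heuristic mentioned above might look roughly like the sketch below. NodeDirResolver is a hypothetical class, not GlassFish code. It treats drive-letter and UNC forms as absolute even on Unix, which is exactly the fragile part: a Unix relative directory that really is named "c:" would be misclassified.

```java
import java.util.regex.Pattern;

final class NodeDirResolver {
    // Drive-letter form ("c:\..." or "c:/...") or UNC form ("\\host\share").
    private static final Pattern DOS_ABSOLUTE =
            Pattern.compile("^(?:[A-Za-z]:[\\\\/]|\\\\\\\\).*");

    /** True for Unix absolute paths and for paths that look like absolute DOS paths. */
    static boolean looksAbsolute(String path) {
        return path.startsWith("/") || DOS_ABSOLUTE.matcher(path).matches();
    }

    /** Prepends installDir only when nodeDir does not already look absolute. */
    static String resolve(String installDir, String nodeDir) {
        return looksAbsolute(nodeDir) ? nodeDir : installDir + "/" + nodeDir;
    }

    public static void main(String[] args) {
        System.out.println(resolve("c:/sun/glassfish3/glassfish", "c:/sun/glassfish3/nodes"));
        System.out.println(resolve("c:/sun/glassfish3/glassfish", "nodes"));
    }
}
```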





[GLASSFISH-17727] Incorrectly named Windows service wrapper files created by create-service --name=<service-name> Created: 15/Nov/11  Updated: 13/Dec/11  Resolved: 13/Dec/11

Status: Resolved
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b07
Fix Version/s: 3.1.2_b14, 4.0_b14

Type: Bug Priority: Major
Reporter: Paul Davies Assignee: Byron Nevins
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Tags: 3_1_2-approved

 Description   

According to the create-service(1) man page, if the --name=service-name option is specified, the subcommand creates Windows service wrapper files that are named as follows:

  • Configuration file: service-nameService.xml
  • Executable file: service-nameService.exe

However, the create-service subcommand appears to ignore this option when naming the files:

asadmin>  create-service --name pmd_domain domain1
...
Command create-service executed successfully.

Directory of C:\glassfish3\glassfish\domains\domain1\bin

11/14/2011  04:13 PM    <DIR>          .
11/14/2011  04:13 PM    <DIR>          ..
11/14/2011  04:13 PM            30,208 domain1Service.exe
11/14/2011  04:13 PM             3,124 domain1Service.xml
               2 File(s)         33,332 bytes
               2 Dir(s)  86,851,964,928 bytes free

According to the create-service(1) man page, these files should be named pmd_domainService.exe and pmd_domainService.xml.
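
The naming rule from the man page can be sketched as follows. ServiceWrapperNames is a hypothetical helper, and the fallback to the server directory name when --name is absent is an assumption based on the domain1Service.* files shown above.

```java
final class ServiceWrapperNames {
    /** Base name: the --name value if given, otherwise (assumed) the server directory name. */
    static String baseName(String nameOption, String serverDirName) {
        return (nameOption == null || nameOption.isEmpty()) ? serverDirName : nameOption;
    }

    /** Name of the wrapper configuration file. */
    static String configFile(String nameOption, String serverDirName) {
        return baseName(nameOption, serverDirName) + "Service.xml";
    }

    /** Name of the wrapper executable file. */
    static String executableFile(String nameOption, String serverDirName) {
        return baseName(nameOption, serverDirName) + "Service.exe";
    }

    public static void main(String[] args) {
        System.out.println(configFile("pmd_domain", "domain1"));
        System.out.println(executableFile("pmd_domain", "domain1"));
    }
}
```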



 Comments   
Comment by Byron Nevins [ 13/Dec/11 ]

Done.

d:\gf\branches\3.1.2\admin>svn commit d:\gf\trunk\main\nucleus\admin d:\gf\branches\3.1.2\admin
Sending D:\gf\branches\3.1.2\admin\server-mgmt\src\main\java\com\sun\enterprise\admin\servermgmt\services\LinuxService.java
Sending D:\gf\branches\3.1.2\admin\server-mgmt\src\main\java\com\sun\enterprise\admin\servermgmt\services\WindowsService.java
Sending D:\gf\trunk\main\nucleus\admin\server-mgmt\src\main\java\com\sun\enterprise\admin\servermgmt\services\WindowsService.java
Transmitting file data ...
Committed revision 51522.





[GLASSFISH-17680] DCOM provisioning subcommands prompt twice for the Windows User password Created: 09/Nov/11  Updated: 08/Dec/11  Resolved: 08/Dec/11

Status: Resolved
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b07
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Paul Davies Assignee: Byron Nevins
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows


Tags: 3_1_2-approved

 Description   

Subcommands for provisioning DCOM nodes that prompt for a password, such as create-node-dcom and validate-dcom, prompt for the Windows user's password twice:

asadmin> validate-dcom host1
Enter AS_ADMIN_WINDOWSPASSWORD password>
Enter AS_ADMIN_WINDOWSPASSWORD password again>

Typically, users are prompted twice when they are setting a password. However, these commands are prompting for the password to authenticate the user against an existing password. Therefore, to avoid misleading users, these subcommands should prompt for the password only once.



 Comments   
Comment by Byron Nevins [ 08/Dec/11 ]

Currently it is impossible to prompt once.

The choices are:

1) never prompt. Demand a password file only. This is what SSH commands do.
2) prompt twice if no password presented in a file

Analysis:

The password code is all elegantly and invisibly handled by CLI. But currently it ALWAYS prompts twice. The reason is that CLICommand.getPassword() has a third boolean arg, 'create'. I need to set that boolean to false but it's impossible because it is called from CLICommand.initializeCommandPassword() with a hard-coded 'true' for the create option.

The way the ssh commands handle this is to never prompt for the password. The password is marked as optional. This forces CLICommand to not prompt for it. If the password isn't in a password file then it is a hard error.

DCOM commands, on the other hand, set optional to false so CLICommand will prompt twice for it.

A possible solution is to add a new option to @Param --> "createpassword".
This would probably be hairy (risky) to do right now for 3.1.2.
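
To make the role of the 'create' flag concrete, here is a simplified sketch of the behavior described above. It is not the real CLICommand API: PasswordPrompt, its read method and the prompter function are hypothetical stand-ins, and the prompt text is copied from the issue description.

```java
import java.util.Objects;
import java.util.function.Function;

final class PasswordPrompt {
    /**
     * Asks for a password through the supplied prompter.
     * With create == true the value is requested twice and both entries must match
     * (setting a new password); with create == false it is requested once
     * (authenticating against an existing password).
     */
    static String read(Function<String, String> prompter, String name, boolean create) {
        String first = prompter.apply("Enter " + name + " password> ");
        if (!create) {
            return first;
        }
        String second = prompter.apply("Enter " + name + " password again> ");
        if (!Objects.equals(first, second)) {
            throw new IllegalStateException("Passwords do not match");
        }
        return first;
    }

    public static void main(String[] args) {
        // A canned prompter that echoes each prompt and always "types" the same value.
        Function<String, String> canned = prompt -> {
            System.out.println(prompt);
            return "secret";
        };
        read(canned, "AS_ADMIN_WINDOWSPASSWORD", false);
    }
}
```

A per-parameter "createpassword" setting would amount to letting each command choose the value of that boolean instead of having it hard-coded.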

Comment by Byron Nevins [ 08/Dec/11 ]

Now the password for validate-dcom and create-node-dcom MUST be supplied in a password file

d:\gf\branches\3.1.2\cluster>svn commit -F D:\gf\svn-commit.2.tmp d:\gf\trunk\main\nucleus\cluster d:\gf\branches\3.1.2\cluster
Sending D:\gf\branches\3.1.2\cluster\admin\src\main\java\com\sun\enterprise\v3\admin\cluster\LocalStrings.properties
Sending D:\gf\branches\3.1.2\cluster\admin\src\main\java\com\sun\enterprise\v3\admin\cluster\ValidateDcom.java
Sending D:\gf\branches\3.1.2\cluster\admin\src\main\java\com\sun\enterprise\v3\admin\cluster\dcom\CreateNodeDcom.java
Sending D:\gf\branches\3.1.2\cluster\compare.bat
Sending D:\gf\branches\3.1.2\cluster\copyy.bat
Sending D:\gf\branches\3.1.2\cluster\setfiles.bat
Sending D:\gf\trunk\main\nucleus\cluster\admin\src\main\java\com\sun\enterprise\v3\admin\cluster\LocalStrings.properties
Sending D:\gf\trunk\main\nucleus\cluster\admin\src\main\java\com\sun\enterprise\v3\admin\cluster\ValidateDcom.java
Sending D:\gf\trunk\main\nucleus\cluster\admin\src\main\java\com\sun\enterprise\v3\admin\cluster\dcom\CreateNodeDcom.java
Transmitting file data .........
Committed revision 51387.





[GLASSFISH-17580] Create-node-dcom failed with AS_ADMIN_DCOMPASSWORD and required AS_ADMIN_WINDOWSPASSWORD Created: 03/Nov/11  Updated: 09/Dec/11  Due: 03/Nov/11  Resolved: 09/Dec/11

Status: Closed
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b07
Fix Version/s: 3.1.2_b02, 4.0

Type: Bug Priority: Major
Reporter: li.wu Assignee: Byron Nevins
Resolution: Invalid Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Bundle: ogs-3.1.2-web-b07-windows-ml.exe
OS: Windows2008 64bit
Server locale: zh_TW


Attachments: JPEG File create-node-dcom_fail.jpg     JPEG File create-node-dcom_succ.jpg    

 Description   

1. Install the bundle and start domain1;
2. Edit c:\glassfish3\glassfish\bin\gfpass file as:
"AS_ADMIN_PASSWORD=
AS_ADMIN_DCOMPASSWORD=welcome".
3. Run cmd: asadmin -W gfpass create-node-dcom --nodehost localhost node4. The cmd becomes interactive and requires AS_ADMIN_WINDOWSPASSWORD. I enter AS_ADMIN_WINDOWSPASSWORD as "test", then node4 is created.
4. Check domain.xml; there is " <ssh-auth password="test" /> " for node4. The password is AS_ADMIN_WINDOWSPASSWORD, not AS_ADMIN_DCOMPASSWORD.
5. Edit c:\glassfish3\glassfish\bin\gfpass file as:
"AS_ADMIN_PASSWORD=
AS_ADMIN_DCOMPASSWORD=welcome
AS_ADMIN_WINDOWSPASSWORD=" .
6. Run cmd: asadmin -W gfpass create-node-dcom --nodehost localhost node5. The cmd failed because of a missing DCOM password. Please check the attached pictures.



 Comments   
Comment by Byron Nevins [ 03/Nov/11 ]

– just a local string
=========================

d:\gf\branches\3.1.2\cluster>svn commit -F \temp\commit.txt d:\gf\trunk\main\nucleus\cluster\admin d:\gf\branches\3.1.2\cluster\admin
Sending D:\gf\branches\3.1.2\cluster\admin\src\main\java\com\sun\enterprise\v3\admin\cluster\dcom\LocalStrings.properties
Sending D:\gf\trunk\main\nucleus\cluster\admin\src\main\java\com\sun\enterprise\v3\admin\cluster\dcom\LocalStrings.properties
Transmitting file data ..
Committed revision 50648.

Comment by li.wu [ 02/Dec/11 ]

The issue reproduced for java_ee_sdk-6u4-web-b12-jdk7-windows-x64-ml.exe on Windows7 x64. Is it fixed on b12? Or later build?

Comment by Byron Nevins [ 09/Dec/11 ]

You must have a very old build. November 3 was when the change went in. The String "DCOMPASSWORD" does not exist in the source code anywhere anymore.





[GLASSFISH-17447] install-node-ssh should report error not success in this case Created: 20/Oct/11  Updated: 02/Nov/11  Resolved: 02/Nov/11

Status: Resolved
Project: glassfish
Component/s: distributed management
Affects Version/s: None
Fix Version/s: 3.1.2_b07, 4.0

Type: Bug Priority: Major
Reporter: Byron Nevins Assignee: Yamini K B
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

Say you have an old version of GlassFish installed on a remote host. You want to upgrade the bits to the latest version.

You run "asadmin install-node-ssh ...."

Result:
Nothing is done but it reports success.

Why? There is a test:

if (checkIfAlreadyInstalled(host, sshInstallDir))
continue;

– it checks if "asadmin version" runs successfully. If so it silently returns without doing anything. No warning either. It is officially a success.

Recommended Fix:

Report an ERROR – the node is already installed. Tell user to run uninstall-node first.
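
Roughly, the recommended behavior replaces the silent 'continue' with a failure. The sketch below is self-contained and hypothetical: InstallCheck, AlreadyInstalledException and the predicate stand in for the real command classes and for checkIfAlreadyInstalled.

```java
import java.util.List;
import java.util.function.Predicate;

final class InstallCheck {
    /** Hypothetical error type; the real command would use its own exception class. */
    static final class AlreadyInstalledException extends RuntimeException {
        AlreadyInstalledException(String message) {
            super(message);
        }
    }

    /** Fails on the first host that already has an installation instead of skipping it. */
    static void ensureNotInstalled(List<String> hosts, Predicate<String> alreadyInstalled) {
        for (String host : hosts) {
            if (alreadyInstalled.test(host)) {
                throw new AlreadyInstalledException(
                        "GlassFish is already installed on " + host + "; run uninstall-node first.");
            }
        }
    }

    public static void main(String[] args) {
        try {
            ensureNotInstalled(List.of("host1", "host2"), "host2"::equals);
        } catch (AlreadyInstalledException e) {
            System.out.println(e.getMessage());
        }
    }
}
```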



 Comments   
Comment by Yamini K B [ 02/Nov/11 ]

3.1.2 - r50558
4.0 - r50559





[GLASSFISH-17406] Add Windows Domain Support to update-node-dcom Created: 11/Oct/11  Updated: 11/Oct/11  Resolved: 11/Oct/11

Status: Resolved
Project: glassfish
Component/s: distributed management
Affects Version/s: None
Fix Version/s: 3.1.2_b02, 4.0

Type: Bug Priority: Major
Reporter: Byron Nevins Assignee: Byron Nevins
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

.



 Comments   
Comment by Byron Nevins [ 11/Oct/11 ]

Complicated choices: On the one hand, refactor code into a common base class. On the other hand, param names have "ssh" wired into them.
Devised a solution that reuses as much code as possible yet is fairly understandable...

d:\gf\main\nucleus\cluster\admin>svn commit -F \temp\commit.txt
Sending admin\src\main\java\com\sun\enterprise\v3\admin\cluster\NodeUtils.java
Sending admin\src\main\java\com\sun\enterprise\v3\admin\cluster\UpdateNodeCommand.java
Sending admin\src\main\java\com\sun\enterprise\v3\admin\cluster\UpdateNodeRemoteCommand.java
Sending admin\src\main\java\com\sun\enterprise\v3\admin\cluster\UpdateNodeSshCommand.java
Sending admin\src\main\java\com\sun\enterprise\v3\admin\cluster\dcom\UpdateNodeDcomCommand.java
Transmitting file data .....
Committed revision 50171.

d:\3.1.2\cluster\admin>svn commit -F \temp\commit.txt
Sending admin\src\main\java\com\sun\enterprise\v3\admin\cluster\NodeUtils.java
Sending admin\src\main\java\com\sun\enterprise\v3\admin\cluster\UpdateNodeCommand.java
Sending admin\src\main\java\com\sun\enterprise\v3\admin\cluster\UpdateNodeRemoteCommand.java
Sending admin\src\main\java\com\sun\enterprise\v3\admin\cluster\UpdateNodeSshCommand.java
Sending admin\src\main\java\com\sun\enterprise\v3\admin\cluster\dcom\UpdateNodeDcomCommand.java
Transmitting file data .....
Committed revision 50172.





[GLASSFISH-17402] Need more validation in 2 commands using DCOM Created: 10/Oct/11  Updated: 07/Dec/11  Resolved: 07/Dec/11

Status: Resolved
Project: glassfish
Component/s: distributed management
Affects Version/s: None
Fix Version/s: 3.1.2_b12

Type: Bug Priority: Major
Reporter: Byron Nevins Assignee: Byron Nevins
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Tags: 3_1_2-approved

 Description   

2 commands:

install-node
uninstall-node

if in "DCOM mode" then fail the command if sshport and/or sshkeyfile are specified as options since they don't apply to DCOM
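
A minimal sketch of such a check, in Java. The class and method names are hypothetical, and it assumes that an option which was not specified arrives as `null`; the real commands may represent unset options differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: option names mirror the ones in the description.
final class DcomOptionCheck {

    /** Returns the SSH-only options that were given in DCOM mode; empty means valid. */
    static List<String> sshOnlyOptionsGiven(boolean dcomMode, String sshport, String sshkeyfile) {
        List<String> offending = new ArrayList<>();
        if (!dcomMode) {
            return offending; // SSH mode: both options are legitimate
        }
        if (sshport != null) {
            offending.add("--sshport");
        }
        if (sshkeyfile != null) {
            offending.add("--sshkeyfile");
        }
        return offending;
    }
}
```

A caller would fail the command whenever the returned list is non-empty and name the offending options in the error message.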






[GLASSFISH-17401] setup-dcom change an option name Created: 10/Oct/11  Updated: 24/Oct/11  Resolved: 11/Oct/11

Status: Resolved
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b01, 4.0
Fix Version/s: 3.1.2_b02, 4.0

Type: Bug Priority: Major
Reporter: Byron Nevins Assignee: Byron Nevins
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

change "--domain" to "--windowsdomain" to match other commands



 Comments   
Comment by Byron Nevins [ 11/Oct/11 ]

After the fix:

d:\3.1.2\cluster\admin>asadmin setup-dcom --x
Invalid option: --x
Usage: asadmin [asadmin-utility-options] setup-dcom
[--dcomuser <dcomuser(default:${user.name})>]
[--windowsdomain <windowsdomain>]
[--remotetestdir <remotetestdir(default:C:\)>]
[-?|--help[=<help(default:false)>]] host

============================
d:\3.1.2\cluster\admin>svn commit -F \temp\commit.txt
Sending admin\src\main\java\com\sun\enterprise\v3\admin\cluster\SetupDcom.java
Transmitting file data .
Committed revision 50173.

d:\gf\main\nucleus\cluster\admin>svn commit -F \temp\commit.txt
Sending admin\src\main\java\com\sun\enterprise\v3\admin\cluster\SetupDcom.java
Transmitting file data .
Committed revision 50174.





[GLASSFISH-17400] update-node-dcom - get rid of ssh from option names Created: 10/Oct/11  Updated: 07/Dec/11  Resolved: 07/Dec/11

Status: Resolved
Project: glassfish
Component/s: distributed management
Affects Version/s: None
Fix Version/s: 3.1.2_b12

Type: Bug Priority: Major
Reporter: Byron Nevins Assignee: Byron Nevins
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Tags: 3_1_2-approved

 Description   

Adding --dcomuser as an alias for --sshuser is not visible in the auto-generated help.

It would be nice to change --sshuser to --remoteuser and then give an alias of "--sshuser".

Opinions? If I don't hear any feedback I'll re-work the code to get rid of all options with "ssh" in them and add duplicate options that have "dcom" embedded in their names.



 Comments   
Comment by Byron Nevins [ 10/Oct/11 ]

I have no idea where the strikeouts in the above comment came from?!?

d:\gf\main\nucleus\cluster\cli>asadmin update-node-dcom --x
Invalid option: --x
Usage: asadmin [asadmin-utility-options] update-node-dcom
[--nodehost <nodehost>] [--installdir <installdir>]
[--nodedir <nodedir>] [--sshport <sshport>] [--sshuser <sshuser>]
[--sshkeyfile <sshkeyfile>] [--force[=<force(default:false)>]]
[-?|--help[=<help(default:false)>]] name

Comment by Byron Nevins [ 11/Oct/11 ]

Final Decision:

change the names of the options --dcomuser





[GLASSFISH-17395] DCOM - delete-instance is not deleting files Created: 07/Oct/11  Updated: 09/Dec/11  Resolved: 08/Dec/11

Status: Resolved
Project: glassfish
Component/s: distributed management
Affects Version/s: None
Fix Version/s: 3.1.2_b14

Type: Bug Priority: Major
Reporter: Byron Nevins Assignee: Byron Nevins
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Tags: 3_1_2-approved

 Description   

It SAYS it deleted files but they are still there.



 Comments   
Comment by Byron Nevins [ 08/Dec/11 ]

Fixed.
Note that on Windows, if you have Explorer or a command prompt or ANYTHING holding on to the files, they won't get deleted.





[GLASSFISH-17394] SSH: Very Long Pointless Wait Created: 07/Oct/11  Updated: 02/Nov/11  Resolved: 02/Nov/11

Status: Resolved
Project: glassfish
Component/s: distributed management
Affects Version/s: None
Fix Version/s: 3.1.2_b07, 4.0

Type: Bug Priority: Major
Reporter: Byron Nevins Assignee: Yamini K B
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

Ref: svn#50151

Say the user already has GF installed on a remote node. Now he runs create-node-ssh and doesn't set force to true. Here is what happens:

1) create a huge (86MB+ ) zip file which takes a long time to create ~~ a minute?
2) Now check and see that the install dir already exists
3) throw an exception and tell the user about the problem
4) delete the zip file from step (1)

=========

Fix is to move step (1) AFTER step (2).
I did this for DCOM but not for SSH.
Look for the precopy() method
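
The reordering can be sketched in Java as follows. This is only an illustration: `PrecopyOrderSketch`, `install`, and the `createZip` callback are hypothetical names, not the real `precopy()` code, and the existence check is reduced to a boolean.

```java
// Hypothetical sketch of "check first, build the archive second".
final class PrecopyOrderSketch {

    /**
     * Runs the cheap precondition check before the expensive archive step.
     * createZip stands in for building the 86MB+ zip file.
     */
    static void install(boolean installDirExists, boolean force, Runnable createZip) {
        // Former step (2), now first: fail fast if the install dir is already there.
        if (installDirExists && !force) {
            throw new IllegalStateException(
                    "Install directory already exists. Rerun with --force to overwrite.");
        }
        // Former step (1): only reached when the install will actually proceed.
        createZip.run();
    }
}
```

With this order there is no zip file to delete on the failure path, so step (4) is only needed after a successful copy.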



 Comments   
Comment by Byron Nevins [ 07/Oct/11 ]

I didn't make it clear that this bug covers both these cases:

1) install-node
2) create-node-ssh --install=true





[GLASSFISH-17393] create --install --> chicken and egg problem Created: 07/Oct/11  Updated: 09/Dec/11  Resolved: 09/Dec/11

Status: Resolved
Project: glassfish
Component/s: distributed management
Affects Version/s: None
Fix Version/s: 3.1.2_b14, 4.0_b14

Type: Bug Priority: Major
Reporter: Byron Nevins Assignee: Byron Nevins
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Tags: 3_1_2-approved

 Description   

d:\gf>asadmin create-node-dcom -W \pw --nodehost wnevins-lnr --install=true lnr
remote failure: Warning: some parameters appear to be invalid.
DCOM node not created. To force creation of the node with these parameters rerun the command using the --force option.
Could not find a remote Glassfish installation on host: wnevins-lnr at D:\glassfish3\glassfish

Well – duh! I asked you to create a remote GlassFish installation!

work-around == install-node



 Comments   
Comment by Byron Nevins [ 09/Dec/11 ]

the --install option is problematic. We should never have supported this!

Anyways – now it doesn't require that GF be installed remotely when it is going to install remotely!

d:\gf\branches\3.1.2\cluster>svn commit d:\gf\trunk\main\nucleus\cluster d:\gf\branches\3.1.2\cluster
Sending D:\gf\branches\3.1.2\cluster\admin\src\main\java\com\sun\enterprise\v3\admin\cluster\NodeUtils.java
Sending D:\gf\branches\3.1.2\cluster\compare.bat
Sending D:\gf\branches\3.1.2\cluster\copyy.bat
Sending D:\gf\branches\3.1.2\cluster\setfiles.bat
Sending D:\gf\trunk\main\nucleus\cluster\admin\src\main\java\com\sun\enterprise\v3\admin\cluster\NodeUtils.java
Transmitting file data .....
Committed revision 51438.





[GLASSFISH-17384] DCOM - Finish support for Windows Domains Created: 06/Oct/11  Updated: 11/Oct/11  Resolved: 09/Oct/11

Status: Resolved
Project: glassfish
Component/s: distributed management
Affects Version/s: None
Fix Version/s: 3.1.2, 4.0

Type: Bug Priority: Major
Reporter: Byron Nevins Assignee: Byron Nevins
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

Finish up implementation of Windows domain support.
The domain is needed for running remote processes but unnecessary for remote File I/O
It is in the Node config object

It should be supported in create-node-dcom for instance

High priority to keep it obvious. This ought to be fixed by 10/14/11



 Comments   
Comment by Byron Nevins [ 09/Oct/11 ]

*** trunk ***
d:\gf\main\nucleus\cluster\admin>svn commit
Sending admin\src\main\java\com\sun\enterprise\v3\admin\cluster\CreateNodeSshCommand.java
Sending admin\src\main\java\com\sun\enterprise\v3\admin\cluster\CreateRemoteNodeCommand.java
Sending admin\src\main\java\com\sun\enterprise\v3\admin\cluster\NodeUtils.java
Sending admin\src\main\java\com\sun\enterprise\v3\admin\cluster\dcom\CreateNodeDcom.java
Transmitting file data ....
Committed revision 50154.

*** 3.1.2 Branch ***
d:\3.1.2\cluster\admin>svn commit
Sending admin\src\main\java\com\sun\enterprise\v3\admin\cluster\CreateNodeSshCommand.java
Sending admin\src\main\java\com\sun\enterprise\v3\admin\cluster\CreateRemoteNodeCommand.java
Sending admin\src\main\java\com\sun\enterprise\v3\admin\cluster\NodeUtils.java
Sending admin\src\main\java\com\sun\enterprise\v3\admin\cluster\dcom\CreateNodeDcom.java
Transmitting file data ....
Committed revision 50155.

Comment by Byron Nevins [ 09/Oct/11 ]

Done.





[GLASSFISH-17375] stop-instance: Don't look for local pid file when the instance is remote Created: 30/Sep/11  Updated: 09/Dec/11  Resolved: 02/Dec/11

Status: Resolved
Project: glassfish
Component/s: distributed management
Affects Version/s: None
Fix Version/s: 3.1.2_b14

Type: Bug Priority: Major
Reporter: Byron Nevins Assignee: carlavmott
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

Very easy to reproduce:

1. brand-new domain. No instances
2. Create >= 1 *remote* instance
3. start the instance
4. Call asadmin stop-instance
The code below skips all of the ssh/dcom code below it for no good reason.

InstanceDirs throws an IOException because there is no nodes directory since there are no local instances/nodes.

Anyways – it should not be looking locally for files for a node that is remote!

Look for the XXXXXXXXXX

// we think the instance is down but it might not be completely down so do further checking
// get the node name and then the node
// if localhost check if files exists
// else if SSH check if file exists on remote system
// else can't check anything else.
String nodeName = instance.getNodeRef();
Node node = nodes.getNode(nodeName);
String nodeHost = node.getNodeHost();
InstanceDirUtils insDU = new InstanceDirUtils(node, serverContext);
try {
    pidFile = new File (insDU.getLocalInstanceDir(instance.getName()) , "config/pid");
} catch (java.io.IOException eio) {
    // could not get the file name so can't see if it still exists. Need to exit
    return; // XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
}

// this should be replaced with method from Node config bean.



 Comments   
Comment by Joe Di Pol [ 30/Sep/11 ]

This:

        InstanceDirUtils insDU = new InstanceDirUtils(node, serverContext);
        try {
            pidFile = new File (insDU.getLocalInstanceDir(instance.getName()) , "config/pid");
        } catch (java.io.IOException eio){
            // could not get the file name so can't see if it still exists.  Need to exit
            return;
        }

should be moved to the "if (node.isLocal()){" case.

Comment by Byron Nevins [ 30/Sep/11 ]

Not that simple. SSH is also using "pidFile". You have to dig it out of the node and handle null nodedir, etc.

Comment by carlavmott [ 04/Nov/11 ]

I ran the following test

asadmin create-node-ssh --nodehost adc2190111 --installdir /scratch/gf/cmott/glassfish3 nodeA
asadmin create-instance --node nodeA ins1
asadmin start-instance ins1
asadmin stop-instance ins1

I set a break point in stop-instance and looked at what getLocalInstanceDir returned and see that the path is the instance dir on the remote machine. It seems that this is working. Did I miss something?

Comment by Byron Nevins [ 07/Nov/11 ]

Yes you missed something!

You probably were not following the instructions exactly. Did you do the first step:

1. brand-new domain. No instances
?

======

Do this:

1. get rid of any local instances. Make sure you have no "nodes" directory on DAS.

2. Now the code is guaranteed to throw an IOException and all of that SSH code in that block will not run. There will be no SSH calls to the remote machine asking if the pid file exists.

========

InstanceDirs will not put up with any errors. It is immutable and is either totally valid and reliable or – empty garbage. Its constructor is where the IOException is coming from.

So if you have no "nodes" folder, then InstanceDirs will definitely throw an IOException and your code will return and never look remotely by using SSH.

In any case InstanceDirs is exclusively used for local paths – not remote paths.
==========

Recommended fix – look at how I did it for DCOM (just below the SSH block)

You have to write some code that will create the full path of the pid file relative to the remote computer.

Comment by carlavmott [ 08/Nov/11 ]

Yes I have a brand new domain.

Just to be sure I deleted my installation of glassfish and reinstalled. I still don't see the problem. I then reinstalled again and ran with the debugger so I could see the actual file that it was looking for and it was correct. I'm running a build from the trunk. Is the code for stop-instance different in the trunk and 3.1.2? What are the exact steps to recreate this?

cd glassfish3/glassfish/bin/
glassfishs-macbookpro53:bin cmott$ ls ..
bin config docs domains legal lib modules osgi
glassfishs-macbookpro53:bin cmott$ ls ../nodes
ls: ../nodes: No such file or directory
glassfishs-macbookpro53:bin cmott$ asadmin start-domain
Waiting for domain1 to start ................
Successfully started the domain : domain1
domain Location: /Users/cmott/gf-v3/glassfish3/glassfish/domains/domain1
Log File: /Users/cmott/gf-v3/glassfish3/glassfish/domains/domain1/logs/server.log
Admin Port: 4848
Command start-domain executed successfully.
glassfishs-macbookpro53:bin cmott$ asadmin create-node-ssh --nodehost adc2190111 --installdir /scratch/gf/cmott/glassfish3 nodeA
Command create-node-ssh executed successfully.
glassfishs-macbookpro53:bin cmott$ asadmin create-instance --node nodeA ins1
Command _create-instance-filesystem executed successfully.
Port Assignments for server instance ins1:
JMX_SYSTEM_CONNECTOR_PORT=28686
JMS_PROVIDER_PORT=27676
HTTP_LISTENER_PORT=28080
ASADMIN_LISTENER_PORT=24848
JAVA_DEBUGGER_PORT=29009
IIOP_SSL_LISTENER_PORT=23820
IIOP_LISTENER_PORT=23700
OSGI_SHELL_TELNET_PORT=26666
HTTP_SSL_LISTENER_PORT=28181
IIOP_SSL_MUTUALAUTH_PORT=23920
The instance, ins1, was created on host adc2190111
Command create-instance executed successfully.
glassfishs-macbookpro53:bin cmott$ asadmin start-instance ins1
Waiting for ins1 to start .............
Successfully started the instance: ins1
instance Location: /scratch/gf/cmott/glassfish3/glassfish/nodes/nodeA/ins1
Log File: /scratch/gf/cmott/glassfish3/glassfish/nodes/nodeA/ins1/logs/server.log
Admin Port: 24848
Command start-local-instance executed successfully.
The instance, ins1, was started on host adc2190111
Command start-instance executed successfully.
glassfishs-macbookpro53:bin cmott$ asadmin stop-instance ins1
The instance, ins1, is stopped.
Command stop-instance executed successfully.

From the server log:

[#|2011-11-07T14:51:01.576-0800|INFO|44.0|org.hibernate.validator.util.Version|_ThreadID=12;_ThreadName=Thread-2;|Hibernate Validator 4.1.0.Final|#]

[#|2011-11-07T14:51:01.587-0800|INFO|44.0|org.hibernate.validator.engine.resolver.DefaultTraversableResolver|_ThreadID=12;_ThreadName=Thread-2;|Instantiated an instance of org.hibernate.validator.engine.resolver.JPATraversableResolver.|#]

[#|2011-11-07T14:51:17.324-0800|INFO|44.0|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin.cluster|_ThreadID=13;_ThreadName=Thread-2;|Command _validate-das-options executed successfully.|#]

[#|2011-11-07T14:51:19.089-0800|INFO|44.0|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin.cluster|_ThreadID=13;_ThreadName=Thread-2;|Command _create-instance-filesystem executed successfully.|#]

[#|2011-11-07T14:51:46.630-0800|INFO|44.0|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin.cluster|_ThreadID=12;_ThreadName=Thread-2;|Waiting for ins1 to start .............
Successfully started the instance: ins1
instance Location: /scratch/gf/cmott/glassfish3/glassfish/nodes/nodeA/ins1
Log File: /scratch/gf/cmott/glassfish3/glassfish/nodes/nodeA/ins1/logs/server.log
Admin Port: 24848
Command start-local-instance executed successfully.|#]

[#|2011-11-07T14:51:56.913-0800|INFO|44.0|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin.cluster|_ThreadID=12;_ThreadName=Thread-2;|Instance ins1 shutdown initiated|#]

I then reinstalled glassfish on both machines and started the server with debug and set a break point in
InstanceDirUtils in method getLocalInstanceDir

InstanceDirs instanceDirs = new InstanceDirs(nodeDirFile.toString(), node.getName(), instance);

nodeDirFile shows the following:
/scratch/gf/cmott/glassfish3/glassfish/nodes

Which is what I expected.

The server output shows (note I started and stopped the instance twice):

[#|2011-11-07T14:58:57.595-0800|INFO|44.0|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin.cluster|_ThreadID=13;_ThreadName=admin-listener(1);|Command _create-instance-filesystem executed successfully.|#]

[#|2011-11-07T14:59:51.660-0800|INFO|44.0|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin.cluster|_ThreadID=12;_ThreadName=admin-listener(2);|Waiting for ins1 to start ...........
Successfully started the instance: ins1
instance Location: /scratch/gf/cmott/glassfish3/glassfish/nodes/nodeA/ins1
Log File: /scratch/gf/cmott/glassfish3/glassfish/nodes/nodeA/ins1/logs/server.log
Admin Port: 24848
Command start-local-instance executed successfully.|#]

[#|2011-11-08T14:02:58.913-0800|INFO|44.0|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin.cluster|_ThreadID=13;_ThreadName=admin-listener(1);|Instance ins1 shutdown initiated|#]

[#|2011-11-08T14:03:28.653-0800|INFO|44.0|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin.cluster|_ThreadID=12;_ThreadName=admin-listener(2);|CLI801 Instance is already synchronized
Waiting for ins1 to start .........
Successfully started the instance: ins1
instance Location: /scratch/gf/cmott/glassfish3/glassfish/nodes/nodeA/ins1
Log File: /scratch/gf/cmott/glassfish3/glassfish/nodes/nodeA/ins1/logs/server.log
Admin Port: 24848
Command start-local-instance executed successfully.|#]

[#|2011-11-08T14:03:33.597-0800|INFO|44.0|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin.cluster|_ThreadID=13;_ThreadName=admin-listener(1);|Instance ins1 shutdown initiated|#]

Comment by Joe Di Pol [ 18/Nov/11 ]

I've confirmed the problem is still there. To see the problem you really need to step through with the debugger. To reproduce:

1) New install, no instances.
2) Start the domain in --debug mode. Attach a debugger. Set a breakpoint in StopInstanceCommand.execute()
3) Create an SSH node for a remote system
4) Create an instance on that SSH node
5) start the instance
6) stop the instance

In the debugger step through the code. You will see that this line throws an IO exception because down in insDU.getLocalInstanceDir() it assumes you are operating on local paths, and you are not. You are operating on a path for the remote system which may not exist on the local system.

        } else if (node.getType().equals("SSH")) {
            try {
                pidFile = new File (insDU.getLocalInstanceDir(instance.getName()) , "config/pid");
            } catch (java.io.IOException eio){

To fix this we need to generate the path without using InstanceDirUtils
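
One way to build that path with plain string handling, so that nothing is validated against the local file system, is sketched below in Java. The class and method names are hypothetical. The default location `<installdir>/glassfish/nodes` is inferred from the console output earlier in this issue, and the forward-slash separator is an assumption that fits an SSH node on a Unix-like host.

```java
// Hypothetical sketch: builds the remote pid file path without touching local disk.
final class RemotePidPathSketch {

    /**
     * installDir and nodeDir come from the node configuration; nodeDir may be
     * null or empty, in which case the default nodes directory is used.
     */
    static String remotePidPath(String installDir, String nodeDir,
                                String nodeName, String instanceName) {
        String nodesRoot = (nodeDir == null || nodeDir.isEmpty())
                ? installDir + "/glassfish/nodes"   // assumed default layout
                : nodeDir;
        return nodesRoot + "/" + nodeName + "/" + instanceName + "/config/pid";
    }
}
```

For the node and instance used in the test run above, `remotePidPath("/scratch/gf/cmott/glassfish3", null, "nodeA", "ins1")` yields `/scratch/gf/cmott/glassfish3/glassfish/nodes/nodeA/ins1/config/pid`. The existence check itself would then be made on the remote host over SSH.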

Comment by carlavmott [ 30/Nov/11 ]

The issue is that all the code that was added to get the instance directory name was written such that it assumes a local node and not that the node could be on another machine. The code checks the validity of the path as it is built so there are many places where this will fail in the case where the node is remote.

How important is it to check that the path being built is correct as it is built? Can we build the path and check that it is correct once it is built? I tried to remove the checks but got to a point where some test was failing. I'm still investigating if all the checks are necessary for building the path.

Comment by carlavmott [ 02/Dec/11 ]

I have updated the code to create the path based on the information in the node. I check that the config directory exists before looking for the pid file since I can not tell if an error is because the pid file has been deleted as expected or if the path to the pid file is incorrect. If the config dir exists then I look to see if the pid file exists.

Comment by carlavmott [ 02/Dec/11 ]

marking this bug as fixed as code has been integrated.





[GLASSFISH-17373] DCOM start-cluster Created: 29/Sep/11  Updated: 24/Oct/11  Resolved: 30/Sep/11

Status: Resolved
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1.2_b01
Fix Version/s: 3.1.2_b02, 4.0

Type: Bug Priority: Critical
Reporter: Byron Nevins Assignee: Byron Nevins
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

start-instance works perfectly over DCOM
start-cluster seems to get stuck with just one started instance.

I created a cluster with 3 instances.
ran start-cluster.
1 instance started and then the command hung (I didn't wait 10 minutes for it to timeout)

then I started the other 2 instances with start-instances

THEN start-cluster finished and reported success



 Comments   
Comment by Byron Nevins [ 30/Sep/11 ]

What fun. Distributed Concurrency Bug!

I was always using the same file to store the auth-token. But the cluster commands run start or stop commands in parallel.

Fix - simple. Make sure the filename is unique.
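
A minimal Java sketch of the idea, not the actual fix: `UniqueTokenFileSketch`, `newTokenFile`, and the `auth-token-` prefix are hypothetical, and the file is created in a local directory here for simplicity. It relies on `Files.createTempFile`, which creates a new file whose name was not previously in use, so parallel start or stop commands each get their own file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: one token file per command invocation.
final class UniqueTokenFileSketch {

    /** Creates a new, empty, uniquely named token file inside dir. */
    static Path newTokenFile(Path dir) throws IOException {
        // createTempFile generates the middle part of the name itself.
        return Files.createTempFile(dir, "auth-token-", ".tmp");
    }
}
```

Each caller should delete its own file when the command completes, since the names are no longer reused.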





[GLASSFISH-17327] Add devtests for update-node-ssh Created: 21/Sep/11  Updated: 09/Oct/11

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major
Reporter: Byron Nevins Assignee: Joe Di Pol
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

There are no devtests for update-node-ssh. I don't know about the other commands but we ought to have tests for all of them.

I found this out the usual way. I ran update-node-ssh and discovered I had broken something. I assumed the SSH devtests were exercising the command. Not so.

create-node-ssh
delete-node-ssh
list-nodes-ssh
ping-node-ssh
setup-ssh
update-node-ssh






[GLASSFISH-17311] SSH - Junk Processes Can Pile Up Created: 16/Sep/11  Updated: 16/Nov/11

Status: Open
Project: glassfish
Component/s: distributed management
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Byron Nevins Assignee: Joe Di Pol
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

CreateRemoteNodeCommand (which until recently was CreateSshNodeCommand) creates an asadmin process, and sets a (huge!) timeout.

ProcessManager does not kill the process. The caller needs to do that. But CreateRemoteNodeCommand does NOT kill the hung process. If you start something that hangs then the process will live on forever – or at least until the next reboot.

Solution:

Easy! Don't just catch ProcessManagerException. Also catch ProcessManagerTimeoutException – this tells you that it timed out. Now destroy the spawned process.

– I'd just add the change but I'm not 100% positive if you had some reason for letting the process run on forever???



 Comments   
Comment by Byron Nevins [ 16/Sep/11 ]

Note that this is easy to reproduce – simply call

asadmin create-node-ssh

on a Windows machine that has no SSH daemon

Comment by Byron Nevins [ 03/Nov/11 ]

Note that the issue only occurs when the --install option is given.

Comment by Byron Nevins [ 03/Nov/11 ]

ProcessManager definitely calls destroy() on the Process object before throwing a timeout exception.

I think I originally saw this problem when I was killing asadmin – the timeout is a full 5 minutes after all! There is no code that
kills launched processes when the caller is abruptly killed. On Windows.

To fix this well would require a shutdown hook.
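
A rough Java sketch of what such a hook could look like. `ChildProcessGuard` and `registerKillOnExit` are hypothetical names, and this is not existing ProcessManager code. Note that JVM shutdown hooks run on normal exit and on interrupts such as Ctrl-C, but not when the JVM itself is terminated forcibly, so this would only cover part of the scenario described above.

```java
// Hypothetical sketch: destroy a spawned process when the launching JVM exits.
final class ChildProcessGuard {

    /**
     * Registers a shutdown hook that destroys the child process.
     * Returns the hook so the caller can remove it once the child has finished.
     */
    static Thread registerKillOnExit(Process child) {
        Thread hook = new Thread(child::destroy, "kill-child-on-exit");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

After the child exits normally, the caller would pass the returned thread to `Runtime.getRuntime().removeShutdownHook(...)` so that hooks do not accumulate.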

Comment by Byron Nevins [ 03/Nov/11 ]

This may be obscure enough and difficult enough to drop to P4. I.e. the payoff isn't worth the effort, IMO.





[GLASSFISH-16248] Implement Windows alternative to SSH Created: 22/Mar/11  Updated: 06/Dec/11  Resolved: 06/Dec/11

Status: Resolved
Project: glassfish
Component/s: distributed management
Affects Version/s: 3.1
Fix Version/s: 3.1.2_b13

Type: Improvement Priority: Critical
Reporter: Joe Di Pol Assignee: Byron Nevins
Resolution: Fixed Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Dependency
blocks GLASSFISH-17624 add support for validate-dcom Resolved
Tags: 3_1_2-review, dcom

 Description   

We currently require an SSH service to be installed on Windows to support SSH nodes. So that means a customer must install and configure Cygwin sshd or MKS ssh service.

It would be good if we had a more Windows friendly solution. A couple possibilities:

1) Provide our own SSHD service written in Java that is easy to configure and run. Apache Mina SSHD could be an option in this area.

2) Use DCOM. Some possible DCOM Java libraries to consider:
J-interop: http://www.j-interop.org/
Jcifs: http://jcifs.samba.org/



 Comments   
Comment by Tom Mueller [ 06/Apr/11 ]

Good to have for 3.2

Comment by Byron Nevins [ 03/May/11 ]

Umbrella RFE

Comment by scatari [ 06/Dec/11 ]

With DHQA of DCOM support complete as of 3.1.2 B13, I am marking this RFE as fixed.





Generated at Tue Jun 30 17:39:32 UTC 2015 using JIRA 6.2.3#6260-sha1:63ef1d6dac3f4f4d7db4c1effd405ba38ccdc558.