Issue Details (XML | Word | Printable)

Key: GLASSFISH-18236
Type: Bug Bug
Status: Closed Closed
Resolution: Cannot Reproduce
Priority: Major Major
Assignee: Tim Quinn
Reporter: andydr
Votes: 0
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
glassfish

Remote startup of an application client via webstart hangs for several minutes

Created: 23/Jan/12 12:11 AM   Updated: 20/Apr/12 08:37 PM   Resolved: 20/Apr/12 08:37 PM
Component/s: standalone_client
Affects Version/s: 3.1.1_b12, 3.1.1
Fix Version/s: None

Time Tracking:
Not Specified

File Attachments: 1. File applicationTest.jnlp (31 kB) 25/Jan/12 08:12 AM - andydr
2. Text File client_javaconsole_output.txt (27 kB) 23/Jan/12 11:49 PM - andydr
3. Text File client_javaconsole_output2.txt (27 kB) 25/Jan/12 08:12 AM - andydr
4. Text File server_process.txt (6 kB) 23/Jan/12 11:49 PM - andydr
5. File VisualVmSnapshot.nps (41 kB) 25/Jan/12 06:54 PM - andydr
6. File VisualVmSnapshot2.nps (1.90 MB) 25/Jan/12 08:11 PM - andydr

Environment:

server-side: CentOS 6, jdk 1.6.0_30-b12
client-side: Windows Server 2003, jdk 1.7.0_2-b13


Tags: 3_1_2-exclude
Participants: andydr and Tim Quinn


 Description  « Hide

An application client in an EAR shows up on the server-side within some seconds. But if the client is called remotely, it takes about 6 minutes until the webstart process on the client-side starts.

The firewall settings are correct, since it works. Unfortunately no exception is written to the server.log file. No ORB activity on standard listener 3700 during the first minutes. I appreciate any hint, thanks!



Tim Quinn added a comment - 23/Jan/12 02:03 PM

Assigning to me, at least for the moment.

Do you see the same long delay the second and later times you launch the client from the same remote system?

During the first launch there are many large JAR files that need to be downloaded. This is mostly because the packaging of classes into JAR files has not been optimized to minimize the client-side footprint.

But, Java Web Start on the client side caches those files so once they have been downloaded once subsequent launches should be much faster.


andydr added a comment - 23/Jan/12 07:24 PM

Yes, I see the same delay always (and the webstart cache is activated, so the glassfish JARs won't be downloaded each time). The operating system process runs without any activity for about 6 minutes - even with a simple TestApplication that shows just a message dialog.


Tim Quinn added a comment - 23/Jan/12 08:47 PM

OK, we need to pin down exactly where the delay is occurring.

Have you, by chance, tried launching this same app client using the appclient command from the remote system? If so, do you experience the same delay? If not, can you try that?

To try this, the easiest approach is to temporarily install GlassFish on the remote system. Then on the remote system you can use

asadmin get-client-stubs --appname TestApp localdir

(I am assuming here that TestApp is the name of the app you deployed which contains the app client. You can use

asadmin list-applications

to double-check the name.)

This will create a small directory tree at localdir containing some downloaded files. Then you can run

appclient -client localdir/TestAppClient.jar

which will launch the client.

If you see the same delay, then it will be easier to diagnose using appclient. After you are sure the client is stalled (wait 20-30 seconds just to be sure) you can get a thread dump by pressing Ctrl-Break in the same window (since you are on a Windows client) and attach the output to this issue.

If you don't see the same delay during an appclient launch that you see during Java Web Start, then you'll need to turn on Java Web Start tracing. Take a look at this blog: http://java.sun.com/developer/technicalArticles/J2EE/jws-glassfish/part4.html About half-way down you will see how to use the Java Control Panel to turn on Java Web Start tracing and to turn on the Java console display for each Java Web Start launch. Then, when you launch the app client using Java Web Start you should see a separate console window open which will display logging messages as the launch proceeds. Assuming there are no errors reported there wait for the launch to stall as before, then in the console window press "v" which should trigger a thread dump.

When you turn on tracing all the console output is also sent to a trace file (see the blog to find out where the file will be - the blog is fairly old at this point but I think the info is still accurate). If the app client exits for some reason and the console window disappears you can look in the trace file.

Just as important is getting a server-side thread dump when the client launch stalls. You can trigger that by using

kill -3 pid-of-GlassFish-server

which will write the server's thread dump to the server.log file.

Once you have them please add the thread dump part of the server.log file and the client-side thread dump as attachments to this issue and we'll investigate further.


Tim Quinn added a comment - 23/Jan/12 09:43 PM

Andy,

I forgot an important preliminary step...

Have you tried this with GlassFish 3.1.2? There have been some changes from 3.1.1 to 3.1.2 in various areas that might affect this, and 3.1.2 is planned to be out soon. You can download promoted builds (even though 3.1.2 is not yet released) from here:

http://dlc.sun.com.edgesuite.net/glassfish/3.1.2/promoted/


andydr added a comment - 23/Jan/12 11:59 PM

Tim, I downloaded the latest 3.1.2 snapshot (b18) and did all tests with this version.

During the delay I tried to kill the server process as you recommended (-3 option), but the stacktrace was not written to the server.log and the process was not killed (see file server_process.txt).

The client side stacktrace is filed in client_javaconsole_output.txt. I made a snapshot during the delay and after the startup. In line 251 the client starts finally, so whats about the derby stuff - the application doesn't use derby.

As a next step I'll try your appclient-hint on the remote system.


andydr added a comment - 24/Jan/12 12:10 AM

The exported client stubs and the appclient-command work as expected and without any delay. So the problem seem to be related to the webstart technology?


Tim Quinn added a comment - 24/Jan/12 04:33 PM

Andy, thanks for doing the tests and reporting the results.

The kill command is not intended to kill the process; the -3 should trigger the thread dump but leave the process running. I do not know why the thread dump did not work.

The Derby message is informational and happens whether the client uses Derby or not.

The first thread shown in the first stack trace shows that GlassFish is busy initializing a number (quite a large number) of internal services and the top-listed frames are in file I/O. In the Java Web Start environment the Java security manager is turned on. I assume you did not turn on the security manager when you launched using the appclient command.

I will try to get access to a Windows system running Java 1.7-02 to see if I see similar delays. Could you try two things?

1. Turn on the Java security manager during your launch. Use appclient -Djava.security.manager rest-of-command. Let's see if that causes a similar delay.

2. If you have a Java 1.6 release on your client system, could you try using that for the Java Web Start launch and we'll see if this is something related to 1.7? You might need to use the Java control panel to temporarily disable 1.7 to force the system to use the earlier release.


Tim Quinn added a comment - 24/Jan/12 11:37 PM

I had GlassFish 3.1.2 b18 running on a Linux system and from a Windows XP Pro system with Java 1.7-02.
I used a very simple app for which the client invokes a single method on an EJB.

I saw no delay in starting the client (once the files were downloaded).

About the thread dump from the client that you posted, I want to make sure I understand something. From the timestamps in the thread dumps it looks as if they were taken about 45 seconds apart. Could you try that experiment again, capturing a thread dump very early in the launch and then again after the application window has appeared?

There have been some other issues opened related to ORB delays in looking up JNDI names (our naming look-ups do use the ORB). I would not expect a difference between the appclient launch and the Java Web Start launch, though, so that does not seem to be the problem you are facing. I'm not sure what the cause is.

Could you also please attach (or e-mail me if you prefer) the main JNLP document for launching your client? One way to get it is to run

javaws -viewer

You should see a list of applications. Find yours, then right click on it and choose Show JNLP File which should open a new window containing the generated JNLP content. Select it all and then paste it into a file to attach to this issue or to e-mail to me.


Tim Quinn added a comment - 24/Jan/12 11:58 PM

Another question, just so I'm clear. You mentioned that there is no activity on port 3700 on the server side. Does the client look up an EJB and then invoke a method on it?

It should not make any difference, but does the client use injection (@EJB) or does it do the look-up itself using a new InitialContext()?


andydr added a comment - 25/Jan/12 08:07 AM

Hi Tim, here some news I tried
1) appclient with security manager enabled - the same, fast startup
2) webstart with JRE 1.7 disabled, instead 1.6.0_b29 - the same, long delay before startup

I attached a new webstart trace with an early snapshot.

Yes, the real client contains @EJB injections, but even a simple dummy client with just a main class that brings up a JOptionPane has such a long delay.


Tim Quinn added a comment - 25/Jan/12 05:43 PM

OK, let's try this to find out where the time is being spent.

Before you launch the client, in another session start jvisualvm.

Then launch the client using Java Web Start. You should see the Java Web Start JVM appear in jvisualvm's left-hand-side list of JVMs.

Note that sometimes Java Web Start will choose to "re-launch." If this happens in your environment, then it's the second Java Web Start JVM you'll be interested in. (On my Mac system Java Web Start displays a dialog box warning me of the self-signed certificate that GlassFish uses to sign the JARs. By the time this dialog box appears the "re-launch" has already occurred.)

If Java Web Start displays the warning dialog box, then this is an ideal time to start profiling. Double-click on the Java Web Start JVM in jvisualvm's JVM list. The right-hand pane should then show some information about that JVM and you should see a Profiler tab. Click on that tab and then on the CPU icon next to the Profile: label. You might see a dialog box from jvisualvm explaining that it is setting up the profiling. Then you should see a few lines appear showing hot spots, times, and invocations which confirms that profiling has started.

Now click OK in the original Java Web Start warning dialog box to let the client actually start. The jvisualvm display should show activity.

After the client has finally gained control and displayed your window, click on the Snapshot button which records the accumulated data. Then click on the icon for Export to... and save the snapshot. Then attach the exported snapshot file to this issue.

Thanks.


andydr added a comment - 25/Jan/12 06:54 PM - edited

I hope the snapshot format is right one. The snapshot show a long duration in org.glassfish.appclient.client.acc.ACCModulesManager.prepareHabitat().

FYI: I started the same test application via Webstart on a Mac box and there was no delay problem (jdk 1.6.0_29).


Tim Quinn added a comment - 25/Jan/12 07:35 PM - edited

Thanks for the snapshot. I was able to read it on this end.

Yes, much time is spent in prepareHabitat. To really understand better, though, we need to break down that time even more. The prepareHabitat method itself is three lines long. The real work is done in method invocations from prepareHabitat.

In the Profiler tab is a Settings check-box. If you check that you see on the lower right-hand side a text area with radio buttons "Profile only classes" and "Do not profile classes." Probably the "Do not profile classes" button is selected and the box probably contains something like "java.*, javax.*, sun.*, sunw.*, com.sun.*"

Based on the thread dumps you provided earlier I suspect that much of the time is going to Windows I/O. If that's correct, then we want to find out just where the time is being spent.

One way is to remove the java.* and com.sun.* entries from the profiler filter. Another way is to assume that the time is indeed going to I/O and to select the "Profile only classes" button and change the filter to org.glassfish.* and java.io.* and com.sun.enterprise.*

The first approach might take longer because far more classes will be instrumented for profiling. The second approach might miss some classes that are important but that we don't know about yet. If you can, please try the first approach and if that does not seem to work for some reason try the second approach.

I wish I could reproduce this problem myself so you would not have to go through all this.


andydr added a comment - 25/Jan/12 08:11 PM

Here is the second snapshot that points to file i/o.


Tim Quinn added a comment - 25/Jan/12 08:59 PM

Indeed. When 46 invocations of File.exists take 6+ minutes something is not right.

I have asked some of our performance experts if they have seen things like this on Windows. My own quick searches have not turned anything up but we are still looking. The files being checked as part of the prepareHabitat are the downloaded GlassFish JARs. These files will be in the Java Web Start cache.

If your cache has a lot of files in it then, perhaps, that contributes to this problem. If that cache contains many files you do not need then perhaps deleting the unused ones might help.

Is the Java Web Start cache directory on your local hard drive, as opposed to a network drive?

If there is anything else "unusual" about the directory or drive where the Java Web Start cache exists then that might contribute to the problem...but I cannot really say what would qualify it as "unusual."


andydr added a comment - 25/Jan/12 11:12 PM - edited

The cache directory is no network drive and contains only the required files of the test application. I also cleared the cache frequently. Even changing the cache directory in the webstart viewer configuration hasn't changed the behavior.

I wrote a simple Java application which called the File.exists() method for each File in the webstart folder. It took about a second on the command prompt. As we've seen earlier the appclient command works properly. That does not call the file i/o methods, right?

In the Sun bug database I found this issue http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6712190 . Can this be linked to my problem, since I run Webstart in a remote desktop client?

The Windows Server 2003 R2 SP2 has a couple of available updates and I'll ask the administrator to get these patches. May be another straw..


Tim Quinn added a comment - 26/Jan/12 12:48 AM

To answer some of your questions...

I think the problem is not with the Java Web Start folder and the cached JARs but rather with accessing those files from a Java Web Start launch, and specifically from the createHabitat (and lower-level) logic.

I do not see a connection between the bug you found and this problem, but I could be wrong.

The Windows updates might make a difference but I'm not very familiar with Windows Server 2003 and what the updates might provide, so I'll be interested if you see any change once the updates have been applied.


andydr added a comment - 26/Jan/12 12:11 PM

Our admin told me that the local drive, where the cache is located, is a virtual drive on a raid 5.


andydr added a comment - 31/Jan/12 03:46 PM

Tim, the system updates didn't solve the file I/O problem unfortunately. A bit weird that it only happens under Webstart, with appclient or file I/O in a standalone application work propertly. Any idea what I can try else?


Tim Quinn added a comment - 31/Jan/12 07:05 PM

Andy,

I do not know what to suggest next but I am still thinking about this. The slowdown occurs as GlassFish is examining the JAR files to find internal services. I suspect that the problem occurs only in Java Web Start launches because Java Web Start accesses the downloaded and cached JAR files through URLs (rather than using File objects), but this is a guess only.

We should look for bugs or discussions about this problem with respect to Java Web Start in general, not just in the GlassFish context, and see if any of those might apply to this situation.

I will continue thinking about this and if I think of other things to try I will note them here.


andydr added a comment - 19/Apr/12 09:54 PM

Hi Tim, the problem vanished during the last weeks and I have no clue, what went wrong. We restarted the server, maybe this was it?! But thanks a lot for your help!


Tim Quinn added a comment - 20/Apr/12 08:37 PM

Thanks for the update. I am going to close this issue, considering that we were never able to reproduce it here and it seems to no longer be a problem for you.

I wish we knew what had changed. If this problem returns please re-open the issue.