Issue Details (XML | Word | Printable)

Key: GLASSFISH-15412
Type: Bug Bug
Status: Closed Closed
Resolution: Cannot Reproduce
Priority: Critical Critical
Assignee: Hong Zhang
Reporter: Anissa Lam
Votes: 0
Watchers: 2
Operations

If you were logged in you would be able to see more operations.
glassfish

[PERF] processApplication() needs improvement

Created: 03/Jan/11 11:01 AM   Updated: 18/Oct/11 04:00 PM   Resolved: 18/Oct/11 04:00 PM
Component/s: performance
Affects Version/s: None
Fix Version/s: None

Time Tracking:
Not Specified

Issue Links:
Dependency
 

Tags: 3_1-exclude 3_1_1-scrubbed
Participants: Anissa Lam, Hong Zhang, ludo, Nazrul, scatari and Tom Mueller


 Description  « Hide

We are looking at the long loading time that Admin Console user is experiencing.
Jason did some measurement, and see that this one line of code

> habitat.getComponent(ApplicationLoaderService.class).processApplication(config, ref, log);

takes 18 seconds. Refer to GLASSFISH-15393.
We need someone to look into this.
I'll start with 'admin', please reassign if needed.



Tom Mueller added a comment - 03/Jan/11 11:04 AM

Hong, can you please take a look at this performance issue with getting the admin console started?


Hong Zhang added a comment - 03/Jan/11 11:13 AM

Jason: can you get numbers for habitat.getComponent(ApplicationLoaderService.class).processApplication(config, ref, log) with v3 on the same system? Wondering if this is a regression in 3.1. or we have always been slow in this part. Thanks.


Hong Zhang added a comment - 03/Jan/11 11:28 AM

Jason: could you also clarify if the problem only happens when the secure admin is enabled? The steps you documented has the secure admin enabled, but not sure if similar problem was also observed when the secure admin is not enabled..


Anissa Lam added a comment - 04/Jan/11 09:07 AM

Providing the info for v3 means we have to checkout and rebuild and rerun v3 with the logTimer added. Not trivial, but we can go through this.
But then the condition that Harshad mentioned in GLASSFISH-15393 isn't supported in v3, so i don't know how useful this can be.
He said:
1. DAS only (secure admin enabled): 30+ sec
2. Single node, two instance cluster (secure admin not enabled): 100+
3. Two node, two instance cluster (secure admin enabled): 200+ sec

However, i will try to get you some number if turning on secure-admin makes any difference, which is your second question.
I am requesting Jason to attach the AdminConsoleAdapter that has the logTiimer call, so we can all compile with that and get the number we want to compare under different situation that we are interested.


Hong Zhang added a comment - 04/Jan/11 09:12 AM

This particular change does not help improve the performance, but it helps to diagnose how the time is spent in the application loading. It enables the deployment tracing in the application loading path through ApplicationLoaderService (which the admin console application loading uses). We previously only had this tracing enabled for regular deployment code path.

How bad is its impact? (Severity)
Help diagnose the performance of admin console application loading

How often does it happen? (Frequency)
The admin console application will be loaded the first time a user acesses the admin console url.

How much effort is required to fix it? (Cost)
Change is available and under review.

What is the risk of fixing it? (Risk)
Very small


Hong Zhang added a comment - 04/Jan/11 09:28 AM

Anissa: I originally thought this issue was about the regression in application loading time of admin console application compared to v3, that's why I asked for comparison with v3. Nazrul later clarified for me that this issue is mainly for the admin console performance degrading in the cluster scenario, and we already had an issue tracking the slow loading of the admin console application on DAS (which is mainly due to the slower loading in web container).
I will check in some changes that enables the deployment tracing for the admin console application loading code path which will give us some idea of how the admin console application loading compares in DAS-only scenario vs cluster scenario. In my initial measurements, I did not see a significant difference between the DAS only and single node/two cluster instance scenarios w.r.t admin console application loading. But once I check in that change, you and Jason can try it out in your environment too to see how the numbers turn out there.


Hong Zhang added a comment - 04/Jan/11 02:00 PM

I have checked in the changes to enable the deployment tracing, to turn on the tracing, add this jvm option to domain.xml after you install the server bits (then start domain access console as usual):
<jvm-options>-Dorg.glassfish.deployment.trace=true</jvm-options>

I have measured the admin console loading time in two scenarios: the DAS-only scenario and the single-node two cluster instance scenario.

1. The DAS only scenario:
Install a fresh server. Add the jvm option to the domain.xml. Start domain and access console. Read the tracing numbers in the server.log.

The admin console application loading took 7.974 s for me. And the main time was spent in the container initialization and loading components in web container: the container initialization took 3.594 s (the time difference for mark 1 and mark 2) and the loading components in web container took 3.334 s (the time difference for mark 3 and mark 4).

M1.[#|2011-01-04T16:04:14.207-0500|INFO|glassfish3.1|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=87;_ThreadName=Thread-1;|Container : com.sun.enterprise.web.WebContainer Mark BEFORE_CONTAINER_SETUP at 878|#]

M2.[#|2011-01-04T16:04:14.209-0500|INFO|glassfish3.1|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=87;_ThreadName=Thread-1;|Mark CONTAINERS_SETUP_DONE at 4472|#]

M3.[#|2011-01-04T16:04:14.214-0500|INFO|glassfish3.1|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=87;_ThreadName=Thread-1;|Container : web Mark START at 4623|#]

M4.[#|2011-01-04T16:04:14.214-0500|INFO|glassfish3.1|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=87;_ThreadName=Thread-1;|Container : web Mark STARTED at 7957|#]

2. The single node, two cluster instance scenario:
Install a fresh server. Add the jvm option to the domain.xml. Start domain, create a cluster with two instances, start cluster, and access console. Read the tracing numbers in the server.log.

The admin console application loading took 8.736s for me. And the main time was also spent in the container initialization and loading components in web container: the container initialization took 3.622 s (the time difference for mark 1 and mark 2) and the loading components in web container took 3.840 s (the time difference for mark 3 and mark 4).

M1.[#|2011-01-04T16:27:01.009-0500|INFO|glassfish3.1|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=135;_ThreadName=Thread-1;|Container : com.sun.enterprise.web.WebContainer Mark BEFORE_CONTAINER_SETUP at 1173|#]

M2.[#|2011-01-04T16:27:01.011-0500|INFO|glassfish3.1|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=135;_ThreadName=Thread-1;|Mark CONTAINERS_SETUP_DONE at 4795|#]

M3.[#|2011-01-04T16:27:01.016-0500|INFO|glassfish3.1|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=135;_ThreadName=Thread-1;|Container : web Mark START at 4880|#]

M4.[#|2011-01-04T16:27:01.017-0500|INFO|glassfish3.1|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=135;_ThreadName=Thread-1;|Container : web Mark STARTED at 8720|#]

Compared the two scenarios, the container initialization in the cluster scenario is about the same as the DAS-only scenario (slightly more), the application loading in the cluster scenario is about 0.5 second more than the DAS-only scenario, which accounts for the main difference of the console application loading time in the two scenarios.

Jason will try the similar in his environment to see how the numbers compare and break down there.


ludo added a comment - 04/Jan/11 04:06 PM

I reach more or less the same conclusion...8-10 sec is slow, but ok for most of us.

We are all testing on Mac or Linux right?

The next step is to use the Sparc t1000 systems (from QE?) and enable the trace flag and see if there is something specific to Sparc in this init process...


Hong Zhang added a comment - 04/Jan/11 05:48 PM

Yes, we will request SQE to try the same thing in his environment. And one thing to keep in mind is that the numbers SQE measured in issue 15393 is the overall console accessing time which includes display screens etc, and this particular sub issue is focusing on the application loading time, so even the application loading time is acceptable, there might be other aspects of console accessing time need to be looked at.


Tom Mueller added a comment - 05/Jan/11 12:55 PM

Here is some timing information gathered from glassfish-sparc-1.sfbay.sun.com, which is a Niagra system (SPARC, Solaris 10, 64 threads).

After starting the domain with no clusters, no instances, no secure admin, the first attempt to access the admin console takes 73 seconds (using a stopwatch) for the console page to display. Here are some timestamps from the log:

44:15 Admin Console is installing
44:17 Admin Console is already installed, but not yet loaded
44:21 Created HTTP listener (first of several messages associated with web container startup)
44:31 Initializing Mojarra 2.1.0 (SNAPSHOT b09)
44:37 Listening to REST requests
44:45 Loading application [__admingui]
44:45 Loading application __admingui done in 27,304 ms
45:04 Admin Console: Initializing Session Attributes...

The log messages do not account for all of the 73 seconds.

The next test consists of creating a cluster with 30 local instances, starting those instances, restarting the DAS, and then accessing the admin console. The time to access the admin console the first time after the DAS restart is 68 seconds. Again, the times from the log messages:

09:14 Admin Console is already installed, but not yet loaded
09:18 Created HTTP listener (first of several messages associated with web container startup)
09:33 Initializing Mojarra 2.1.0 (SNAPSHOT b09)
09:36 Listening to REST requests
09:43 Loading application [__admingui]
09:43 Loading application __admingui done in 28,819 ms
10:07 Admin Console: Initializing Session Attributes...

The next test consists of stopping the cluster, running enable-secure-admin, starting the cluster, restarting the DAS, and then checking the time needed to load the admin console. This time, the admin console loaded in 89 seconds. Again, the times from the log messages:

41:15 Admin Console is already installed, but not yet loaded
41:18 Created HTTP listener (first of several messages associated with web container startup)
41:29 Initializing Mojarra 2.1.0 (SNAPSHOT b09)
41:41 Loading application [__admingui]
41:41 Loading application __admingui done in 26,213 ms
42:04 Listening to REST requests
42:18 Admin Console: Initializing Session Attributes...

Note that in this test, the Loading application __admingui message came before the Listening to REST requests. In the other tests, they came in the other order.


Hong Zhang added a comment - 05/Jan/11 02:03 PM

Tom: thanks for measuring the numbers on a Nigara system. I don't know about the particular order of the log messages, Anissa might have some insights on this. From your measurements, the GUI application loading times seems to be consistent with what Jason and I have seen, no signifant difference between DAS-only and cluster scecenarios, falling in 27.3 + 1.5 second range. Harshad has re-measured the numbers with the tracing turned on today, and his numbers are still pretty different for DAS-only and cluster scenario 7.119 vs 41.302 (see more details in issue 15393). From the finer tracing messages in the server.log, each step just takes much longer especially the container initialization and component loading. Do you have some idea what might be different? I have asked him to share more details on his set up.


Tom Mueller added a comment - 05/Jan/11 02:38 PM

One thing to check is the memory on the system. If it is memory starved, everything is going to take longer, and if several local instances are started, that will consume even more memory.
The Niagra system I tested had plenty of memory (64GB or something like that).


Nazrul added a comment - 11/Jan/11 01:19 AM

This problem exist on one specific platform. Excluding from 3.1 count. Lets look at this during 3.2.


scatari added a comment - 03/Jun/11 02:49 PM - edited

Issue isolated to the particular machine. Cannot reproduce anywhere else.


Hong Zhang added a comment - 18/Oct/11 04:00 PM

Closing the old issue as it was only reproducible on the particular sqe machine.