Issue Details (XML | Word | Printable)

Key: GLASSFISH-19107
Type: Bug Bug
Status: Closed Closed
Resolution: Fixed
Priority: Critical Critical
Assignee: jwells
Reporter: Scott Oaks
Votes: 0
Watchers: 3
Operations

If you were logged in you would be able to see more operations.
glassfish

[PERF] Large Regression in DomainXml parsing

Created: 25/Sep/12 06:44 PM   Updated: 22/Feb/13 04:30 PM   Resolved: 18/Feb/13 02:56 PM
Component/s: hk2
Affects Version/s: 4.0_b55
Fix Version/s: 4.0

Time Tracking:
Not Specified

Issue Links:
Dependency
 

Tags: PSRBUG devx_web
Participants: jwells, Masoud Kalali, Scott Oaks and Tom Mueller


 Description  « Hide

Domain start time has regressed significantly on all platforms since around build 50 (late August). Profiles show that the largest part of the regression comes from org.glassfish.config.support.DomainXml.parseDomainXml – this used to take a less than 100 ms in previous builds and now takes half a second. The size of the domain.xml is larger in later builds, but even if we substitute the earlier domain.xml, parsing still takes half a second, so this seems to be related to the changes that were made for HK2 around the end of August.



Tom Mueller added a comment - 25/Sep/12 07:07 PM

Masoud, might this be related to config modularity?


Scott Oaks added a comment - 25/Sep/12 08:02 PM

Also, deployment and redeployment have severely regressed because of time spent in com.sun.enterprise.v3.server.ApplicationLifecycle.registerAppInDomain – and in particular in org.jvnet.hk2.config.Transaction.commit to write out the new domain.xml. I am assuming these are both related to however the XML is being parsed and handled and will have the same root cause.


Masoud Kalali added a comment - 25/Sep/12 09:42 PM

The config modularity has no effect on parseDomainXml method and thus the issue cannot be of the config modularity. Looking at the annotations the only changes in DomainXml in late august has been made by John. The org.jvnet.hk2.config.Transaction.commit is at HK2 level so the bug should be discussed with HK2 team.


Scott Oaks added a comment - 25/Sep/12 10:00 PM

Yes, that makes sense to me; I originally put this in the HK2 category but it got reassigned to Tom. I will re-categorize it and explicitly assign it to John.


jwells added a comment - 25/Sep/12 10:10 PM

I have added a caching mechanism into hk2 on Sept. 21. It may or may not affect this case. At the very least though, we will need to see if it does affect the case (enough). I have my doubts the cache will be that effective in this scenario. If it does not in fact help, I'll also need some more information about where the time is being spent.


Scott Oaks added a comment - 25/Sep/12 10:26 PM

The same caching that affects the ServiceLocatorImpl.getAllServiceHandles (Glassfish-18987)? That would seem unrelated, but we will test when your changes are integrated. When will that version of HK2 be integrated into a glassfish build?


jwells added a comment - 26/Sep/12 12:04 AM

It is there now.


Scott Oaks added a comment - 26/Sep/12 03:48 PM

In last night's nightly build, the parsing regressions are still present (no change at all).

I will send pointers to the profiles separately. To summarize, in parseDomainXML, the time breaks down 65% to org.jvnet.hjk2.config.ConfigParser.handleElement (though a long series of recursive calls) and 29% to DomDocument.getModelByElementName (though a series of recursive calls to DomDocument.buildModel). In previous builds, the recursion chain for handleElement is much shorter and the entire sequences takes 1/100th the time as current builds (well, the profiling overhead will have affected that to some degree...); previous builds also show no recursioon to buildModel. So either the recursion chains here are for some reason much longer, or the methods themselves are taking longer and the sampling catches the recursion now that it didn't used to catch.

In the Transaction.commit(), the bulk of the time (44%) is in DescriptorImpl.getRanking, which is used by TreeMap.put; that entire chain of creating the IndexedListData.getSortedList is 62% of the time. 11% of the time is spent in ServiceLocatorImpl.validate, 5% in ServiceLocatorUtilities.addOneDescriptor, and 5% in Transaction.canCommit.


Scott Oaks added a comment - 16/Oct/12 05:41 PM

I have done some additional analysis on the parsing.

It turns out, as I thought might be the case, that the recursion chain itself is about the same length (actually slightly shorter in GF 4.0), but that the operations in the recursion leaves are longer, which leads to the regression. Specifically, the difference is in DomDocument.buildModel(String s). In 3.1.2 (HK 2.1.14), that would call habitat.getInhabitantByAnnotation(s). On 4.0 (HK 2.2.N), that calls habitat.getBestDescriptor(s) and then Utilities.getInhabitantFromActiveDescriptor.

Some rudimentary timing with System.nanoTime yields that the average time per call for getInabitantByAnnotation is 550 microseconds, but the average time for the two calls in 4.x is 3.2 milliseconds – a 6x difference. And unfortunately while the difference in absolute terms seems very small, the code path is called about 190 times while parsing the domain.xml, and the net result is the half-second regression.


jwells added a comment - 17/Oct/12 02:07 PM

I've checked in a change to DomDocument.buildModel which should fix the performance difference in most cases. This will come into GlassFish after 2.1.39 is accepted by GlassFish (which could take a couple of days).


Tom Mueller added a comment - 15/Feb/13 09:42 PM

John, did the change you check in resolve this problem? If so, please mark this issue as resolved. Otherwise, please resolve the problem.

Thanks.
Tom


Scott Oaks added a comment - 15/Feb/13 09:50 PM

I have just verified that the parseDomainXML performance has returned to 3.1.2 levels.


jwells added a comment - 18/Feb/13 02:56 PM

Since this regression is back to 3.1 levels I am closing this bug!