[GLASSFISH-19107] [PERF] Large Regression in DomainXml parsing Created: 25/Sep/12  Updated: 22/Feb/13  Resolved: 18/Feb/13

Status: Closed
Project: glassfish
Component/s: hk2
Affects Version/s: 4.0_b55
Fix Version/s: 4.0

Type: Bug Priority: Critical
Reporter: Scott Oaks Assignee: jwells
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Dependency
blocks GLASSFISH-18693 [PERF] regression in startup/deployme... Resolved
Tags: PSRBUG, devx_web

 Description   

Domain start time has regressed significantly on all platforms since around build 50 (late August). Profiles show that the largest part of the regression comes from org.glassfish.config.support.DomainXml.parseDomainXml – this used to take a less than 100 ms in previous builds and now takes half a second. The size of the domain.xml is larger in later builds, but even if we substitute the earlier domain.xml, parsing still takes half a second, so this seems to be related to the changes that were made for HK2 around the end of August.



 Comments   
Comment by Tom Mueller [ 25/Sep/12 ]

Masoud, might this be related to config modularity?

Comment by Scott Oaks [ 25/Sep/12 ]

Also, deployment and redeployment have severely regressed because of time spent in com.sun.enterprise.v3.server.ApplicationLifecycle.registerAppInDomain – and in particular in org.jvnet.hk2.config.Transaction.commit to write out the new domain.xml. I am assuming these are both related to however the XML is being parsed and handled and will have the same root cause.

Comment by Masoud Kalali [ 25/Sep/12 ]

The config modularity has no effect on parseDomainXml method and thus the issue cannot be of the config modularity. Looking at the annotations the only changes in DomainXml in late august has been made by John. The org.jvnet.hk2.config.Transaction.commit is at HK2 level so the bug should be discussed with HK2 team.

Comment by Scott Oaks [ 25/Sep/12 ]

Yes, that makes sense to me; I originally put this in the HK2 category but it got reassigned to Tom. I will re-categorize it and explicitly assign it to John.

Comment by jwells [ 25/Sep/12 ]

I have added a caching mechanism into hk2 on Sept. 21. It may or may not affect this case. At the very least though, we will need to see if it does affect the case (enough). I have my doubts the cache will be that effective in this scenario. If it does not in fact help, I'll also need some more information about where the time is being spent.

Comment by Scott Oaks [ 25/Sep/12 ]

The same caching that affects the ServiceLocatorImpl.getAllServiceHandles (Glassfish-18987)? That would seem unrelated, but we will test when your changes are integrated. When will that version of HK2 be integrated into a glassfish build?

Comment by jwells [ 26/Sep/12 ]

It is there now.

Comment by Scott Oaks [ 26/Sep/12 ]

In last night's nightly build, the parsing regressions are still present (no change at all).

I will send pointers to the profiles separately. To summarize, in parseDomainXML, the time breaks down 65% to org.jvnet.hjk2.config.ConfigParser.handleElement (though a long series of recursive calls) and 29% to DomDocument.getModelByElementName (though a series of recursive calls to DomDocument.buildModel). In previous builds, the recursion chain for handleElement is much shorter and the entire sequences takes 1/100th the time as current builds (well, the profiling overhead will have affected that to some degree...); previous builds also show no recursioon to buildModel. So either the recursion chains here are for some reason much longer, or the methods themselves are taking longer and the sampling catches the recursion now that it didn't used to catch.

In the Transaction.commit(), the bulk of the time (44%) is in DescriptorImpl.getRanking, which is used by TreeMap.put; that entire chain of creating the IndexedListData.getSortedList is 62% of the time. 11% of the time is spent in ServiceLocatorImpl.validate, 5% in ServiceLocatorUtilities.addOneDescriptor, and 5% in Transaction.canCommit.

Comment by Scott Oaks [ 16/Oct/12 ]

I have done some additional analysis on the parsing.

It turns out, as I thought might be the case, that the recursion chain itself is about the same length (actually slightly shorter in GF 4.0), but that the operations in the recursion leaves are longer, which leads to the regression. Specifically, the difference is in DomDocument.buildModel(String s). In 3.1.2 (HK 2.1.14), that would call habitat.getInhabitantByAnnotation(s). On 4.0 (HK 2.2.N), that calls habitat.getBestDescriptor(s) and then Utilities.getInhabitantFromActiveDescriptor.

Some rudimentary timing with System.nanoTime yields that the average time per call for getInabitantByAnnotation is 550 microseconds, but the average time for the two calls in 4.x is 3.2 milliseconds – a 6x difference. And unfortunately while the difference in absolute terms seems very small, the code path is called about 190 times while parsing the domain.xml, and the net result is the half-second regression.

Comment by jwells [ 17/Oct/12 ]

I've checked in a change to DomDocument.buildModel which should fix the performance difference in most cases. This will come into GlassFish after 2.1.39 is accepted by GlassFish (which could take a couple of days).

Comment by Tom Mueller [ 15/Feb/13 ]

John, did the change you check in resolve this problem? If so, please mark this issue as resolved. Otherwise, please resolve the problem.

Thanks.
Tom

Comment by Scott Oaks [ 15/Feb/13 ]

I have just verified that the parseDomainXML performance has returned to 3.1.2 levels.

Comment by jwells [ 18/Feb/13 ]

Since this regression is back to 3.1 levels I am closing this bug!

Generated at Fri Feb 27 04:25:02 UTC 2015 using JIRA 6.2.3#6260-sha1:63ef1d6dac3f4f4d7db4c1effd405ba38ccdc558.