glassfish
  1. glassfish
  2. GLASSFISH-19107

[PERF] Large Regression in DomainXml parsing

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 4.0_b55
    • Fix Version/s: 4.0
    • Component/s: hk2
    • Labels:
      None

      Description

      Domain start time has regressed significantly on all platforms since around build 50 (late August). Profiles show that the largest part of the regression comes from org.glassfish.config.support.DomainXml.parseDomainXml – this used to take a less than 100 ms in previous builds and now takes half a second. The size of the domain.xml is larger in later builds, but even if we substitute the earlier domain.xml, parsing still takes half a second, so this seems to be related to the changes that were made for HK2 around the end of August.

        Issue Links

          Activity

          Hide
          Tom Mueller added a comment -

          Masoud, might this be related to config modularity?

          Show
          Tom Mueller added a comment - Masoud, might this be related to config modularity?
          Hide
          Scott Oaks added a comment -

          Also, deployment and redeployment have severely regressed because of time spent in com.sun.enterprise.v3.server.ApplicationLifecycle.registerAppInDomain – and in particular in org.jvnet.hk2.config.Transaction.commit to write out the new domain.xml. I am assuming these are both related to however the XML is being parsed and handled and will have the same root cause.

          Show
          Scott Oaks added a comment - Also, deployment and redeployment have severely regressed because of time spent in com.sun.enterprise.v3.server.ApplicationLifecycle.registerAppInDomain – and in particular in org.jvnet.hk2.config.Transaction.commit to write out the new domain.xml. I am assuming these are both related to however the XML is being parsed and handled and will have the same root cause.
          Hide
          Masoud Kalali added a comment -

          The config modularity has no effect on parseDomainXml method and thus the issue cannot be of the config modularity. Looking at the annotations the only changes in DomainXml in late august has been made by John. The org.jvnet.hk2.config.Transaction.commit is at HK2 level so the bug should be discussed with HK2 team.

          Show
          Masoud Kalali added a comment - The config modularity has no effect on parseDomainXml method and thus the issue cannot be of the config modularity. Looking at the annotations the only changes in DomainXml in late august has been made by John. The org.jvnet.hk2.config.Transaction.commit is at HK2 level so the bug should be discussed with HK2 team.
          Hide
          Scott Oaks added a comment -

          Yes, that makes sense to me; I originally put this in the HK2 category but it got reassigned to Tom. I will re-categorize it and explicitly assign it to John.

          Show
          Scott Oaks added a comment - Yes, that makes sense to me; I originally put this in the HK2 category but it got reassigned to Tom. I will re-categorize it and explicitly assign it to John.
          Hide
          jwells added a comment -

          I have added a caching mechanism into hk2 on Sept. 21. It may or may not affect this case. At the very least though, we will need to see if it does affect the case (enough). I have my doubts the cache will be that effective in this scenario. If it does not in fact help, I'll also need some more information about where the time is being spent.

          Show
          jwells added a comment - I have added a caching mechanism into hk2 on Sept. 21. It may or may not affect this case. At the very least though, we will need to see if it does affect the case (enough). I have my doubts the cache will be that effective in this scenario. If it does not in fact help, I'll also need some more information about where the time is being spent.
          Hide
          Scott Oaks added a comment -

          The same caching that affects the ServiceLocatorImpl.getAllServiceHandles (Glassfish-18987)? That would seem unrelated, but we will test when your changes are integrated. When will that version of HK2 be integrated into a glassfish build?

          Show
          Scott Oaks added a comment - The same caching that affects the ServiceLocatorImpl.getAllServiceHandles (Glassfish-18987)? That would seem unrelated, but we will test when your changes are integrated. When will that version of HK2 be integrated into a glassfish build?
          Hide
          jwells added a comment -

          It is there now.

          Show
          jwells added a comment - It is there now.
          Hide
          Scott Oaks added a comment -

          In last night's nightly build, the parsing regressions are still present (no change at all).

          I will send pointers to the profiles separately. To summarize, in parseDomainXML, the time breaks down 65% to org.jvnet.hjk2.config.ConfigParser.handleElement (though a long series of recursive calls) and 29% to DomDocument.getModelByElementName (though a series of recursive calls to DomDocument.buildModel). In previous builds, the recursion chain for handleElement is much shorter and the entire sequences takes 1/100th the time as current builds (well, the profiling overhead will have affected that to some degree...); previous builds also show no recursioon to buildModel. So either the recursion chains here are for some reason much longer, or the methods themselves are taking longer and the sampling catches the recursion now that it didn't used to catch.

          In the Transaction.commit(), the bulk of the time (44%) is in DescriptorImpl.getRanking, which is used by TreeMap.put; that entire chain of creating the IndexedListData.getSortedList is 62% of the time. 11% of the time is spent in ServiceLocatorImpl.validate, 5% in ServiceLocatorUtilities.addOneDescriptor, and 5% in Transaction.canCommit.

          Show
          Scott Oaks added a comment - In last night's nightly build, the parsing regressions are still present (no change at all). I will send pointers to the profiles separately. To summarize, in parseDomainXML, the time breaks down 65% to org.jvnet.hjk2.config.ConfigParser.handleElement (though a long series of recursive calls) and 29% to DomDocument.getModelByElementName (though a series of recursive calls to DomDocument.buildModel). In previous builds, the recursion chain for handleElement is much shorter and the entire sequences takes 1/100th the time as current builds (well, the profiling overhead will have affected that to some degree...); previous builds also show no recursioon to buildModel. So either the recursion chains here are for some reason much longer, or the methods themselves are taking longer and the sampling catches the recursion now that it didn't used to catch. In the Transaction.commit(), the bulk of the time (44%) is in DescriptorImpl.getRanking, which is used by TreeMap.put; that entire chain of creating the IndexedListData.getSortedList is 62% of the time. 11% of the time is spent in ServiceLocatorImpl.validate, 5% in ServiceLocatorUtilities.addOneDescriptor, and 5% in Transaction.canCommit.
          Hide
          Scott Oaks added a comment -

          I have done some additional analysis on the parsing.

          It turns out, as I thought might be the case, that the recursion chain itself is about the same length (actually slightly shorter in GF 4.0), but that the operations in the recursion leaves are longer, which leads to the regression. Specifically, the difference is in DomDocument.buildModel(String s). In 3.1.2 (HK 2.1.14), that would call habitat.getInhabitantByAnnotation(s). On 4.0 (HK 2.2.N), that calls habitat.getBestDescriptor(s) and then Utilities.getInhabitantFromActiveDescriptor.

          Some rudimentary timing with System.nanoTime yields that the average time per call for getInabitantByAnnotation is 550 microseconds, but the average time for the two calls in 4.x is 3.2 milliseconds – a 6x difference. And unfortunately while the difference in absolute terms seems very small, the code path is called about 190 times while parsing the domain.xml, and the net result is the half-second regression.

          Show
          Scott Oaks added a comment - I have done some additional analysis on the parsing. It turns out, as I thought might be the case, that the recursion chain itself is about the same length (actually slightly shorter in GF 4.0), but that the operations in the recursion leaves are longer, which leads to the regression. Specifically, the difference is in DomDocument.buildModel(String s). In 3.1.2 (HK 2.1.14), that would call habitat.getInhabitantByAnnotation(s). On 4.0 (HK 2.2.N), that calls habitat.getBestDescriptor(s) and then Utilities.getInhabitantFromActiveDescriptor. Some rudimentary timing with System.nanoTime yields that the average time per call for getInabitantByAnnotation is 550 microseconds, but the average time for the two calls in 4.x is 3.2 milliseconds – a 6x difference. And unfortunately while the difference in absolute terms seems very small, the code path is called about 190 times while parsing the domain.xml, and the net result is the half-second regression.
          Hide
          jwells added a comment -

          I've checked in a change to DomDocument.buildModel which should fix the performance difference in most cases. This will come into GlassFish after 2.1.39 is accepted by GlassFish (which could take a couple of days).

          Show
          jwells added a comment - I've checked in a change to DomDocument.buildModel which should fix the performance difference in most cases. This will come into GlassFish after 2.1.39 is accepted by GlassFish (which could take a couple of days).
          Hide
          Tom Mueller added a comment -

          John, did the change you check in resolve this problem? If so, please mark this issue as resolved. Otherwise, please resolve the problem.

          Thanks.
          Tom

          Show
          Tom Mueller added a comment - John, did the change you check in resolve this problem? If so, please mark this issue as resolved. Otherwise, please resolve the problem. Thanks. Tom
          Hide
          Scott Oaks added a comment -

          I have just verified that the parseDomainXML performance has returned to 3.1.2 levels.

          Show
          Scott Oaks added a comment - I have just verified that the parseDomainXML performance has returned to 3.1.2 levels.
          Hide
          jwells added a comment -

          Since this regression is back to 3.1 levels I am closing this bug!

          Show
          jwells added a comment - Since this regression is back to 3.1 levels I am closing this bug!

            People

            • Assignee:
              jwells
              Reporter:
              Scott Oaks
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: