GLASSFISH-18987

[PERF] ServiceHandle iterator has huge performance regression compared to injection

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0_b48
    • Fix Version/s: 4.0
    • Component/s: hk2
    • Labels: None

      Description

      In our EJB benchmarks, we are presently spending about 15% of our time in InvocationManagerImpl.preInvoke and postInvoke. Virtually all of this time is spent waiting for a lock in org.jvnet.hk2.internal.ServiceLocatorImpl.getAllServiceHandles as it iterates through the component invocation handlers.

      Presumably the service handles don't change frequently, so this lock could be changed to a read-write lock? Unless they don't change at all, in which case we could just initialize once...
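
      As a rough, hypothetical sketch of the read-write-lock idea (the names below are invented for illustration; the real ServiceLocatorImpl internals differ), the point is that many threads can iterate the handles concurrently under the read lock, while only the rare registry change takes the exclusive write lock:

      import java.util.ArrayList;
      import java.util.Collections;
      import java.util.List;
      import java.util.concurrent.locks.ReadWriteLock;
      import java.util.concurrent.locks.ReentrantReadWriteLock;

      // Hypothetical sketch only; not the actual ServiceLocatorImpl code.
      public class HandleRegistry<T> {
          private final ReadWriteLock lock = new ReentrantReadWriteLock();
          private List<T> handles = Collections.emptyList();

          // Readers (e.g. the preInvoke/postInvoke paths) can run
          // concurrently; they block only while a writer holds the write lock.
          public List<T> getAllHandles() {
              lock.readLock().lock();
              try {
                  return new ArrayList<T>(handles);
              } finally {
                  lock.readLock().unlock();
              }
          }

          // Registry changes are presumed rare, so taking the exclusive
          // write lock here should be cheap in practice.
          public void setHandles(List<T> newHandles) {
              lock.writeLock().lock();
              try {
                  handles = new ArrayList<T>(newHandles);
              } finally {
                  lock.writeLock().unlock();
              }
          }
      }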

        Activity

        jwells added a comment -

        I am closing this because we have fixed the specific issue described in this bug. If there are other instances of the old hk2 using an [] as an injection point that we can fix, let's open other bugs about the specific issues.

        jwells added a comment -

        I've done two of the three things suggested above.

        1. Pre-calculated the hashCode and constructed the CacheKey outside the lock
        2. Modified BaseContainer to only look up the service in the constructor

        The last thing (modifying SerialContext to use a static cache) does not seem right to me, as it is possible those services are called in a different context, and hence may be different on different calls.

        Item #1 has not yet been merged into GlassFish. #2 is a GlassFish change. I'd like to see how the numbers look after #1 has been merged back into GlassFish. I'll work on getting that merged back in today.
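
        As a rough illustration of item #1 (hypothetical fields; the real hk2 CacheKey has more state), the idea is to compute the hash eagerly in the constructor and to build the key before taking the cache lock, so that the only work done under the lock is the map lookup itself:

        // Hypothetical sketch of fix #1; the real hk2 CacheKey differs.
        final class CacheKey {
            private final Class<?> contract;
            private final String name;
            private final int hash; // computed once, outside any lock

            CacheKey(Class<?> contract, String name) {
                this.contract = contract;
                this.name = name;
                // Pre-calculating here means hashCode() does no work
                // while the cache lock is held.
                this.hash = 31 * contract.hashCode()
                        + (name == null ? 0 : name.hashCode());
            }

            @Override
            public int hashCode() {
                return hash;
            }

            @Override
            public boolean equals(Object o) {
                if (this == o) return true;
                if (!(o instanceof CacheKey)) return false;
                CacheKey other = (CacheKey) o;
                return contract.equals(other.contract)
                        && (name == null ? other.name == null
                                         : name.equals(other.name));
            }
        }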

        Scott Oaks added a comment -

        We are looking at some different test cases, which are showing us essentially the same thing but in a different context: ServiceLocatorImpl.getService() is proving to be far slower than Habitat.getComponent().

        For example, if we create and remove stateful beans in 4.0, there is about a 5-10% regression vs. 3.1.2. This all tracks back to extra time spent in getService() vs getComponent(). In this particular use case, the calls to getService come from four locations: two calls in com.sun.enterprise.naming.impl.SerialContext.<init> (the first to retrieve the ProcessEnvironment, the second to retrieve the common class loader), and two calls in com.sun.ejb.containers.BaseContainer – one in createEjbInstanceAndContext and one in injectEjbInstance, each of which is retrieving the JCDI service.

        Part of the regression is that there is still some lock contention on the service LRU cache. The remaining contention could be lessened somewhat if the CacheKey were constructed outside the lock, and if the CacheKey constructor generated the hashcode and stored it in an instance variable to be returned by the hashCode() method. I did that and shaved the regression down by 33% (to 6.6%). [Though to be clear, this is a pretty focused micro-benchmark, and lock contention points are greatly exaggerated by it.]

        I changed the GlassFish code for the four calls in question so that the first call saved the service as a static and subsequent calls used that; that eliminated the remaining regression for this test case. I'm still not quite clear on the semantics here; I guess that since these are not injected it is not similar to the InvocationManagerImpl case, and possibly in this case it isn't legal to cache the service retrieval like this. Still, the root of the iterator case seems to be this as well: the new, somewhat slower calls to the service locator. [Though if I'm wrong and it should be a different bug, let me know and I'll open another one.]
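
        For reference, here is a self-contained sketch of that static-caching experiment, one cache per call site (the Locator interface below is a stand-in for the real hk2 ServiceLocator; as noted above, this is only safe if the service really is the same on every call):

        // Hypothetical sketch; Locator stands in for the hk2 ServiceLocator.
        class StaticServiceCache {
            interface Locator {
                <T> T getService(Class<T> type);
            }

            private static volatile Object cachedService;

            @SuppressWarnings("unchecked")
            static <T> T lookupOnce(Locator locator, Class<T> type) {
                Object s = cachedService;
                if (s == null) {
                    // Racy but idempotent: two threads may both perform the
                    // lookup, which is harmless if the service is a singleton.
                    s = locator.getService(type);
                    cachedService = s;
                }
                return (T) s;
            }
        }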

        jwells added a comment -

        So I looked at the old old version of InvocationManagerImpl and realized that in the olden days HK2 used to inject a static array that would not change based on changes to the registry. Hence, before we moved to the new HK2 this code NEVER actually did an HK2 operation (and hence would never have picked up dynamic changes, but hey, that's the way the code used to work...)

        So I am going to make yet another change to InvocationManagerImpl to work more like the old way it worked, which should remove that entire regression (at least any part of it that comes from hk2).

        However, since you renamed the bug, and since this use case is presumably still somewhat slower than the old hk2, I guess we can leave this bug open...
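
        A self-contained, hypothetical sketch of that InvocationManagerImpl change (illustrative names only, not the real GlassFish types): the handler list is resolved exactly once at construction, so preInvoke/postInvoke just walk a plain array, much as the old static-array injection effectively did:

        import java.util.List;

        // Hypothetical sketch; deliberately snapshots the handlers once,
        // matching the old behavior of never seeing dynamic registry changes.
        class InvocationManagerSketch {
            interface InvocationHandler {
                void beforePreInvoke();
                void afterPostInvoke();
            }

            private final InvocationHandler[] handlers;

            InvocationManagerSketch(List<InvocationHandler> allHandlers) {
                // Resolved exactly once here; no per-invocation iteration
                // of live service handles.
                this.handlers = allHandlers.toArray(new InvocationHandler[0]);
            }

            void preInvoke() {
                for (InvocationHandler h : handlers) {
                    h.beforePreInvoke();
                }
            }

            void postInvoke() {
                for (InvocationHandler h : handlers) {
                    h.afterPostInvoke();
                }
            }
        }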

        Scott Oaks added a comment -

        Updating the subject since we now understand that the regression is in the iteration itself and not just contention.

        I've run some tests with the new InvocationManagerImpl, and as we would expect, it has brought down the regression by about 50% since we make only half as many calls to the iterator.

        On some newer tests, we started with a 42% regression from this; it will be down to 21% when we test the promoted build with this fix.


          People

          • Assignee:
            Scott Oaks
          • Reporter:
            Scott Oaks
          • Votes:
            0
          • Watchers:
            1
