glassfish / GLASSFISH-15592

[STRESS] Slow Memory growth observed over 24x7.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.1_b37
    • Fix Version/s: 3.1_b40
    • Component/s: failover
    • Labels:
      None

      Description

      Details of the scenario are in the parent issue for this bug: http://java.net/jira/browse/GLASSFISH-15423

      The re-run of this RichAccess 24x7 scenario with build 37 shows slow but evident memory growth. jmap -histo:live logs were taken from one of the instances (instance101) every 20 minutes. Three of those are attached here, as is the CPU/Mem plot. The jmap -dump and -histo:live logs will be sent directly to Mahesh.

      The jmap -histo:live log files from the instance indicate a slow rise in the number/size of the following data structures:

      • org.shoal.ha.cache.impl.store.DataStoreEntry
      • java.util.concurrent.ConcurrentHashMap$HashEntry
      • com.sun.ejb.base.sfsb.util.SimpleKeyGenerator$SimpleSessionKey

      The slow rate of the observed memory growth is perhaps due to the low number of simultaneous users (100 per instance).
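      For reference, a minimal sketch (in Java) of how the attached histograms can be compared. It assumes the standard four-column "num #instances #bytes class name" layout of jmap -histo:live output; the HistoDiff class name and the top-20 cutoff are illustrative, and the file names refer to the snapshots attached below rather than to any tooling used in the original analysis:

          import java.io.IOException;
          import java.nio.file.Files;
          import java.nio.file.Paths;
          import java.util.ArrayList;
          import java.util.HashMap;
          import java.util.List;
          import java.util.Map;

          // Sketch: diff two jmap -histo:live snapshots and print the classes
          // whose live instance counts grew the most between them.
          public class HistoDiff {

              private static Map<String, Long> parse(String file) throws IOException {
                  Map<String, Long> counts = new HashMap<>();
                  for (String line : Files.readAllLines(Paths.get(file))) {
                      // Data rows look like: "   1:   12345   678900  java.lang.String"
                      String[] cols = line.trim().split("\\s+");
                      if (cols.length == 4 && cols[0].endsWith(":")) {
                          counts.put(cols[3], Long.parseLong(cols[1]));
                      }
                  }
                  return counts;
              }

              public static void main(String[] args) throws IOException {
                  Map<String, Long> first = parse("instance101-jmap-live.out-1");
                  Map<String, Long> last = parse("instance101-jmap-live.out-67");

                  // Growth in live instance count per class between the two snapshots.
                  List<Map.Entry<String, Long>> growth = new ArrayList<>();
                  for (Map.Entry<String, Long> e : last.entrySet()) {
                      long delta = e.getValue() - first.getOrDefault(e.getKey(), 0L);
                      growth.add(new java.util.AbstractMap.SimpleEntry<>(e.getKey(), delta));
                  }
                  growth.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));

                  for (Map.Entry<String, Long> e : growth.subList(0, Math.min(20, growth.size()))) {
                      System.out.println(e.getValue() + "\t" + e.getKey());
                  }
              }
          }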

      1. instance101-jmap-live.out-1
        375 kB
        varunrupela
      2. instance101-jmap-live.out-34
        374 kB
        varunrupela
      3. instance101-jmap-live.out-67
        375 kB
        varunrupela
      4. cpu-mem.jpg
        267 kB

        Issue Links

          This issue blocks GLASSFISH-15423

          Activity

          varunrupela created issue -
          Mahesh Kannan added a comment -

          Varun,
          Did we see the growth on b36? There were no major changes in the replication module since b36.
          Can you point me to the server logs? One possibility is that instance1 could be acting as replica for more keys than the other two instances.

          FYI, SimpleSessionKeys are used as Keys for EJBs.

          Nazrul made changes -
          Component/s: failover
          varunrupela added a comment -

          Checked all the older runs; only one run, with the nightly build from Dec 21, showed a very slight hint of memory growth. No other runs show it.

          This issue is being reported with build 37 and on Windows 2008. The other run with build 37 on OEL does not show memory growth.

          Logs location is being sent by e-mail to you.

          Nazrul added a comment -

          Any update on the memory leak?

          Mahesh Kannan added a comment -

          I looked into the two jmap dumps (using jhat -baseline <first dump> <second dump>) and it looks like the EJB references are slowly leaking. All of these reside in the ReplicaStore.

          There are two reasons why they could be leaking in instance 1:

          1. instance 1 could be acting as the replica instance for more keys than the other two instances (though this would mean a non-uniform key distribution).

          2. The save commands could be arriving after the corresponding remove commands. One way to handle this is to maintain a set of keys that were removed in the recent past (say, the last 5 minutes); if save commands then arrive out of order, we could throw away those that arrive after the remove commands. Obviously, this can only be implemented in 3.2. The other option is to rely on the EJB idle processor to remove unused keys; running the longevity test with a lower removal-timeout-in-seconds might eliminate this slow growth.
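          A minimal sketch (not the actual Shoal/ReplicaStore code) of the "recently removed keys" idea in option 2; the RemovedKeyTracker class, its method names, and the retention window are hypothetical:

              import java.util.Iterator;
              import java.util.Map;
              import java.util.concurrent.ConcurrentHashMap;

              // Sketch: remember keys removed within a retention window so that a
              // save command arriving after the remove can be discarded instead of
              // re-creating (and leaking) the replica entry.
              public class RemovedKeyTracker<K> {

                  private final long retentionMillis;
                  // key -> time at which the remove command was processed
                  private final Map<K, Long> removedAt = new ConcurrentHashMap<>();

                  public RemovedKeyTracker(long retentionMillis) {
                      this.retentionMillis = retentionMillis;
                  }

                  // Record that a remove command was applied for this key.
                  public void onRemove(K key) {
                      removedAt.put(key, System.currentTimeMillis());
                  }

                  // True if a save command for this key should be dropped because a
                  // remove for the same key was seen within the retention window.
                  public boolean shouldDropSave(K key) {
                      Long t = removedAt.get(key);
                      return t != null && System.currentTimeMillis() - t <= retentionMillis;
                  }

                  // Periodically purge entries older than the retention window.
                  public void purgeExpired() {
                      long cutoff = System.currentTimeMillis() - retentionMillis;
                      Iterator<Map.Entry<K, Long>> it = removedAt.entrySet().iterator();
                      while (it.hasNext()) {
                          if (it.next().getValue() < cutoff) {
                              it.remove();
                          }
                      }
                  }
              }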

          Nazrul made changes -
          Tags 3_1-review
          Nazrul added a comment -

          Tracking bug at this point. Excluding from the un-scrubbed list.

          Mahesh Kannan added a comment -

          Elena was able to run RichAccess for 4 days without any memory leaks. The JVM crashed, though; I believe there is a separate issue for that.

          I am marking this issue as resolved. Please reopen if you see this issue again.

          Mahesh Kannan made changes -
          Status: Open → Resolved
          Fix Version/s: 3.1_b40
          Resolution: Fixed
          Mahesh Kannan added a comment -

          If you see growth again, please rerun the app with the following system property: -Dorg.shoal.ha.cache.mbean.register=true


            People

            • Assignee:
              Mahesh Kannan
            • Reporter:
              varunrupela
            • Votes:
              0
            • Watchers:
              1

              Dates

              • Created:
                Updated:
                Resolved: