sailfin / SAILFIN-1731

Slow but Steady memory growth [ST Subscribe-Refresh with failure injection]

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Incomplete
    • Affects Version/s: 2.0
    • Fix Version/s: milestone 1
    • Component/s: session_replication
    • Labels:
      None
    • Environment:

      Operating System: Linux
      Platform: Other

      Description

      Parent Issue 1727.

      See parent issue for basic scenario steps.

      Time Line:
      2009-04-22T17:06:02 - Traffic started
      2009-04-22T21:06:56 - FAILURE_EVENT detected (failure inject a few seconds
      before that on instance110)

      • Test was allowed to run for 24x1
      • The failed instance was NOT restarted.
      • Steady Memory growth was observed on all other instances after failure.

      Logs Location:
      /net/asqe-logs.sfbay.sun.com//export1/SailFin/Results/ST/2.0/subscribe-refresh-failure/fail/b10-1failures-mem-growth/

      top, jmap -histo and jstack logs (collected at different times) are available
      for each instance. Unzip the instance<x>-sift-agent-logs.zip file under each
      instance directory.

        Issue Links

          Activity

          varunrupela added a comment -

          added keyword and dependency

          Scott Oaks added a comment -

          I don't see any memory leak in the referenced logs.

          The thing that would actually show a memory leak is the gc.log file for each
          instance – in particular, for lines of this format:
          2009-04-22T17:04:54.270+0530: 29.593: [GC 29.593: [ParNew: 64640K->0K(65088K),
          0.0240510 secs] 139590K->80009K(6143552K), 0.0243850 secs] [Times: user=0.06
          sys=0.00, real=0.02 secs]

          The amount of space the heap is occupying is the 80009K number. There is a jump
          when the instance fails, of course, but that is to be expected. But, for example
          in the instance101 gc.log, that value settles down and fluctuates slightly
          around 3900000K. I will attach the graph that the JDK team provided me showing that.
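The extraction described above can be sketched as follows; this is a hypothetical helper (not part of the test harness), matching ParNew gc.log lines of the quoted format:

```python
import re

# Matches the whole-heap portion "139590K->80009K(6143552K)" of a ParNew
# gc.log line; it follows the closing "]" of the young-gen segment. The
# post-GC value (80009K in the sample) is the live heap occupancy.
HEAP_RE = re.compile(r"\]\s*(\d+)K->(\d+)K\((\d+)K\)")

def post_gc_heap_kb(line):
    """Return the post-GC whole-heap occupancy in KB, or None if the
    line carries no heap figures."""
    m = HEAP_RE.search(line)
    return int(m.group(2)) if m else None

sample = ("2009-04-22T17:04:54.270+0530: 29.593: [GC 29.593: "
          "[ParNew: 64640K->0K(65088K), 0.0240510 secs] "
          "139590K->80009K(6143552K), 0.0243850 secs]")

print(post_gc_heap_kb(sample))  # -> 80009
```

Plotting that value over time for each instance gives the curve the attached graph shows: a jump at the failure, then fluctuation around a stable level rather than steady growth.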

          The jmap output doesn't show anything much different. I'm not sure if the jmap
          output was taken with the live option or not; without the live option it is a
          little difficult to know if there is actually a leak or not. But of course,
          using the live option under load will create some errors during the GC pause.

          At any rate, if I look at the number of [B objects from instance101, it ranges
          from 4511249 objects to 4629249 objects, but the numbers are not monotonically
          increasing. That's true for all the key objects (ReplicationState in particular
          being the SSR-related one of interest) in all the instances. They go up a
          little, they go down a little, but they don't seem to be monotonically increasing.
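The "not monotonically increasing" check above can be sketched like this; the [B bounds are the range quoted, while the ReplicationState counts are purely illustrative:

```python
# Per-class object counts from successive jmap -histo snapshots, oldest
# first. Only the [B endpoints come from the comment above; the rest are
# made-up illustrative values.
snapshots = [
    {"[B": 4511249, "ReplicationState": 102000},
    {"[B": 4629249, "ReplicationState": 98000},
    {"[B": 4550000, "ReplicationState": 101500},
]

def strictly_increasing(counts):
    """True only if every snapshot grows over the previous one --
    the signature of a classic leak."""
    return all(b > a for a, b in zip(counts, counts[1:]))

leaking = {cls: strictly_increasing([s[cls] for s in snapshots])
           for cls in snapshots[0]}
# Counts go up a little and down a little, so nothing flags as leaking.
print(leaking)  # -> {'[B': False, 'ReplicationState': False}
```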

          Scott Oaks added a comment -

          Created an attachment (id=1005)
          GC Graph

          varunrupela added a comment -

          Can you also take a look at the top output that is attached?
          The resident memory (shown by top) continues to grow for the entire duration of
          the test after failure is injected. The growth is faster around the failure and
          then slows down. The resident memory grows up to 6.0g, while the maximum memory
          usage shown by gc is around 4322285k. Is this indicative of fragmentation?

          Scott Oaks added a comment -

          The resident memory is just what is loaded into physical memory. If there were a
          memory leak, it would show up in the virtual memory – the memory allocated to
          the process. Conceivably, there could be a native memory leak there that
          wouldn't show up in the heap stuff I looked at.

          But the virtual memory of the process is quite stable at around 6651m –
          sometimes it goes down to 6649m and sometimes it goes up to 6657m. That doesn't
          imply a leak to me either.
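The RES/VIRT distinction can be checked directly against the top output; a minimal sketch, assuming the standard Linux top column order (PID, USER, PR, NI, VIRT, RES, ...) and using an illustrative line built from the figures quoted above:

```python
def parse_size(field):
    """Convert a top(1) size field like '6651m' or '6.0g' to KB.
    A bare number is already in KB."""
    units = {"k": 1, "m": 1024, "g": 1024 * 1024}
    suffix = field[-1].lower()
    if suffix in units:
        return int(float(field[:-1]) * units[suffix])
    return int(field)

# Illustrative top line using the VIRT/RES figures quoted above; the
# other fields (pid, user, cpu, etc.) are made up.
line = "12345 appserv  20   0 6651m 6.0g  24m S 85 75.3 512:33.01 java"
fields = line.split()
virt_kb, res_kb = parse_size(fields[4]), parse_size(fields[5])

# VIRT (total address space) is the figure that would grow on a native
# leak; RES is just the resident subset and can climb as pages fault in.
print(virt_kb, res_kb)  # -> 6810624 6291456
```

Tracking VIRT across the run, as the comment above does, is what rules out a native leak here: it stays pinned near 6651m while RES drifts upward toward it.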

          Scott Oaks added a comment -

          Not an issue; both heap used and total memory used are stable.


            People

            • Assignee:
              Scott Oaks
              Reporter:
              varunrupela
            • Votes:
              0
              Watchers:
              0

              Dates

              • Created:
                Updated:
                Resolved: