sailfin / SAILFIN-1195

Too many open sockets with replication enabled

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Cannot Reproduce
    • Affects Version/s: 1.0
    • Fix Version/s: milestone 1
    • Component/s: session_replication
    • Labels:
      None
    • Environment:

      Operating System: All
      Platform: All

      Description

      We have a SailFin cluster with 10 instances. We are running some tests with SSR
      enabled. After about 12 hours, when I check the TCP connections on the instances,
      I see a lot of GMS connections (netstat -an | grep 9701): about 150-200
      connections between instances.

      On the DAS machine there are about 50 connections; the DAS has about 3-6
      connections with each instance.
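To see how connection counts like these break down per peer, the output of netstat -an can be tallied by remote host. The following is a minimal sketch, assuming Linux-style netstat columns (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); the function name count_gms_peers and the sample addresses are hypothetical, and other platforms (Solaris, for example, separates the port with a dot) format the output differently.

```python
from collections import Counter

GMS_PORT = "9701"  # the port grepped for in this report


def count_gms_peers(netstat_text, port=GMS_PORT):
    """Count TCP connections that use `port`, keyed by remote host.

    Assumes Linux-style `netstat -an` rows:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    peers = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers and non-TCP rows (UDP, unix sockets).
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        # Listening sockets are not connections to a peer.
        if fields[5] == "LISTEN":
            continue
        local_port = fields[3].rpartition(":")[2]
        foreign_host, _, foreign_port = fields[4].rpartition(":")
        # The GMS port may be on either end of the connection.
        if port in (local_port, foreign_port):
            peers[foreign_host] += 1
    return peers


# Hypothetical sample; real input would be captured from `netstat -an`.
netstat_sample = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address     Foreign Address   State
tcp        0      0 0.0.0.0:9701      0.0.0.0:*         LISTEN
tcp        0      0 10.0.0.1:9701     10.0.0.2:51234    ESTABLISHED
tcp        0      0 10.0.0.1:40001    10.0.0.2:9701     ESTABLISHED
tcp        0      0 10.0.0.1:40002    10.0.0.3:9701     TIME_WAIT
"""

for host, n in count_gms_peers(netstat_sample).most_common():
    print(host, n)
```

Comparing these per-peer counts across snapshots taken hours apart is one way to tell a stable per-peer connection pool from a count that keeps growing.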

      Attaching the netstat output from DAS and an instance.

      1. das-netstat.out
        4 kB
        sonymanuel
      2. histo.out
        310 kB
        sonymanuel
      3. instance1-netstat.out
        14 kB
        sonymanuel

        Activity

        Joe Fialli added a comment -

        There are 230 TcpMessengers in the jmap heap dump.
        This closely matches the 228 TCP sockets on port 9701 in this report.

        Approximately half were accounted for in my previous comments.

        The largest group of unaccounted-for instances is 117 TcpMessengers
        referenced by net.jxta.impl.endpoint.BlockingMessenger$1. 115 of these
        BlockingMessengers are referenced by java.util.TimerTask. I did not
        know what to make of these, so I had not added them to the report.
        They are probably related to input pipes, but I found only 64 InputPipes
        (a JXTA bidi pipe consists of both an output and an input pipe).
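A class histogram such as the attached histo.out gives per-class instance counts, though not the reference chains discussed above (those need a full heap dump and a heap analyzer). As a minimal sketch, assuming the usual jmap -histo column layout, the counts for the messenger classes could be pulled out like this; the function name instance_counts is hypothetical, the sample rows and numbers are invented, and the TcpMessenger package name shown is an assumption. Note that plain jmap -histo also counts unreachable objects; jmap -histo:live forces a full GC first and counts only live ones.

```python
def instance_counts(histo_text, needle):
    """Sum #instances per class whose name contains `needle`.

    Assumes `jmap -histo` rows of the form:
      <num>:  <#instances>  <#bytes>  <class name> [module]
    """
    counts = {}
    for line in histo_text.splitlines():
        fields = line.split()
        # Data rows start with "N:"; header, separator and "Total" rows do not.
        if len(fields) < 4 or not fields[0].endswith(":") or not fields[1].isdigit():
            continue
        class_name = fields[3]
        if needle in class_name:
            counts[class_name] = counts.get(class_name, 0) + int(fields[1])
    return counts


# Hypothetical sample rows; the numbers are made up for illustration.
histo_sample = """\
 num     #instances         #bytes  class name
----------------------------------------------
   1:          5000         240000  [C
   2:            40           1920  net.jxta.impl.endpoint.tcp.TcpMessenger
   3:            38           1216  net.jxta.impl.endpoint.BlockingMessenger$1
Total          5078         243136
"""

print(instance_counts(histo_sample, "Messenger"))
```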

        lwhite added a comment -

        Since this issue is not blocking anything and there is
        no resource leak, we are downgrading this to P3.

        Mahesh Kannan added a comment -

        Added to CC

        lwhite added a comment -

        Based on discussions, including consultation
        with Sriram, we are downgrading this issue to P4.

        The approach to this issue is to look to downgrade or close it.
        The number of sockets used by GMS and replication is known.
        When the issue was initially filed, there was a GMS issue that caused
        GMS to use more sockets than expected. This was fixed a long
        time ago.
        There is no clear definition of "too many", nor any exit criteria associated with it.
        We are committed to watching for a socket resource leak, though
        none has ever been reported.

        lwhite added a comment -

        This issue has been on watch for some time,
        and the problem has not been observed for many
        weeks, so we are closing it as
        WORKSFORME.


          People

          • Assignee:
            lwhite
            Reporter:
            sonymanuel
          • Votes:
            0
            Watchers:
            1
