shoal / SHOAL-75

messages not being delivered over jxta OutputPipe.send

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: current
    • Fix Version/s: 1.1
    • Component/s: GMS
    • Labels:
      None
    • Environment:

      Operating System: All
      Platform: All

    • Issuezilla Id:
      75

      Description

      This issue was reported in a Shoal developer forum post at
      https://shoal.dev.java.net/servlets/ReadMsg?list=dev&msgNo=111

      To summarize: Shoal needs a common place that checks whether a call to
      net.jxta.pipe.OutputPipe.send() returns true or false. When the method returns false,
      the caller should wait a short time and then try the send again. A return value of
      false means the send could not be attempted because the system was out of the
      resources needed to perform it; retrying the send will then work.

      Every place where Shoal calls OutputPipe.send() should therefore be changed to call
      this common method.

      The forum email confirms that OutputPipe.send() can return false; when that happens,
      a message that could have been delivered is simply never sent.

      The proposed fix is to add a method to com.sun.enterprise.jxtamanagment.JxtaUtil that
      every Shoal method currently calling net.jxta.pipe.OutputPipe.send() would call instead,
      so that the resend logic for OutputPipe.send() lives in one source location.

      Here is a first pass at that method, to be tried soon.

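      import java.io.IOException;
      import java.util.logging.Level;

      import net.jxta.endpoint.Message;
      import net.jxta.peer.PeerID;
      import net.jxta.pipe.OutputPipe;

      // ... inside JxtaUtil, assumed to declare a java.util.logging.Logger named LOG:

      public static boolean sendMessage(OutputPipe pipe, PeerID peerId, Message message)
              throws IOException {
          final int MAX_SEND_RETRIES = 3; // is this the right number of retries?
          final int RETRY_DELAY = XXX;    // in milliseconds; find out what this should be
          boolean result = pipe.send(message);
          int sendAttempts = 1;
          // one initial send plus up to MAX_SEND_RETRIES retries while send() returns false
          while (!result && sendAttempts <= MAX_SEND_RETRIES) {
              try {
                  Thread.sleep(RETRY_DELAY);
              } catch (InterruptedException ie) {
                  // restore the interrupt status and stop retrying
                  Thread.currentThread().interrupt();
                  break;
              }
              result = pipe.send(message);
              sendAttempts++;
          }
          if (!result && LOG.isLoggable(Level.FINE)) {
              final String to = peerId == null ? "<broadcast to cluster>" : peerId.toString();
              LOG.fine("unable to send message " + message + " to " + to
                      + " after " + sendAttempts + " attempts");
          }
          return result;
      }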
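      The same retry loop can be exercised without a JXTA runtime. The sketch below is a
      hypothetical, self-contained version: the Sender interface is an assumed stand-in for
      net.jxta.pipe.OutputPipe.send(Message), and the retry count and delay are passed in as
      parameters because the issue has not settled on values for them. Like the proposed
      method, it makes one initial send followed by up to maxRetries retries.

```java
import java.io.IOException;

final class RetrySendSketch {

    /** Hypothetical stand-in for net.jxta.pipe.OutputPipe.send(Message). */
    @FunctionalInterface
    interface Sender<M> {
        boolean send(M message) throws IOException;
    }

    /**
     * Sends once, then retries up to maxRetries more times while send() returns false,
     * sleeping retryDelayMillis before each retry. Returns whether a send succeeded.
     */
    static <M> boolean sendWithRetry(Sender<M> sender, M message,
                                     int maxRetries, long retryDelayMillis) throws IOException {
        boolean sent = sender.send(message);
        int retries = 0;
        while (!sent && retries < maxRetries) {
            try {
                Thread.sleep(retryDelayMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
                return false;                       // and stop retrying
            }
            sent = sender.send(message);
            retries++;
        }
        return sent;
    }
}
```

      Passing the delay in as a parameter leaves the tuning question open while still letting
      the loop be tested with a fake sender that fails a fixed number of times.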

        Activity

        shreedhar_ganapathy added a comment -

        I think the proposed fix can be addressed by the LWRMulticast class. For p2p
        messages, the recipient list can be a set of one member. I am not sure whether it
        uses a propagate pipe or a blocking wire output pipe (BWOP); it should preferably
        use a BWOP for reliability, retransmission and flow control.

        The retry logic within LWRMulticast should be wary of failures such as network
        failures or hardware failures of the recipient, so that it can come out of the
        TCP close wait. A send operation should therefore not block for the duration of
        the TCP retransmission timeout, and once it comes out of such a case it should
        not retry. Such protections may be necessary to make it more robust.
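        One possible reading of the protections suggested above, sketched under assumptions:
        each send attempt runs on an executor and is abandoned after a per-attempt timeout
        instead of blocking for the TCP retransmission timeout; a false return is retried
        within an overall time budget; a timeout or an IOException is treated as a peer or
        network failure and is not retried. BoundedSendSketch, Sender and Outcome are
        hypothetical names, and none of this is taken from LWRMulticast.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class BoundedSendSketch {

    /** Hypothetical stand-in for net.jxta.pipe.OutputPipe.send(Message). */
    @FunctionalInterface
    interface Sender<M> {
        boolean send(M message) throws IOException;
    }

    enum Outcome { SENT, GAVE_UP, FAILED }

    /**
     * Retries only while the sender returns false and the total budget allows it.
     * A timed-out or throwing attempt is treated as a peer/network failure: no retry.
     */
    static <M> Outcome send(ExecutorService executor, Sender<M> sender, M message,
                            long attemptTimeoutMillis, long totalBudgetMillis,
                            long retryDelayMillis) {
        final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(totalBudgetMillis);
        final Callable<Boolean> task = () -> sender.send(message);
        while (true) {
            Future<Boolean> attempt = executor.submit(task);
            try {
                if (attempt.get(attemptTimeoutMillis, TimeUnit.MILLISECONDS)) {
                    return Outcome.SENT;
                }
                // false: the send could not be attempted (e.g. out of resources); retry below
            } catch (TimeoutException | ExecutionException e) {
                // Blocked too long, or send() threw (e.g. IOException): do not retry.
                // cancel(true) interrupts the worker, but a blocked socket write may ignore it.
                attempt.cancel(true);
                return Outcome.FAILED;
            } catch (InterruptedException e) {
                attempt.cancel(true);
                Thread.currentThread().interrupt(); // preserve the caller's interrupt status
                return Outcome.FAILED;
            }
            // Stop if sleeping before the next retry would reach the end of the total budget.
            if (deadline - System.nanoTime() <= TimeUnit.MILLISECONDS.toNanos(retryDelayMillis)) {
                return Outcome.GAVE_UP;
            }
            try {
                Thread.sleep(retryDelayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return Outcome.FAILED;
            }
        }
    }
}
```

        Cancelling the Future only frees the caller: a thread stuck in a blocking socket write
        may ignore the interrupt and stay blocked until the operating system gives up, so a
        real implementation would also need to bound how many such threads can pile up.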

        Joe Fialli added a comment -

        Fix checked into the Shoal trunk and integrated into SailFin communication as the 1.5
        nightly.

          People

          • Assignee:
            Joe Fialli
            Reporter:
            Joe Fialli
          • Votes:
            0
            Watchers:
            1

            Dates

            • Created:
              Updated:
              Resolved: