glassfish
  1. glassfish
  2. GLASSFISH-7059

no exception thrown to client when error occurs while sending big data packet

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Minor Minor
    • Resolution: Unresolved
    • Affects Version/s: v2.1
    • Fix Version/s: 4.0
    • Component/s: orb
    • Labels:
      None
    • Environment:

      Operating System: All
      Platform: All

    • Issuezilla Id:
      7,059

      Description

      While sending a big data packet (250 MBytes)from an EJB client running in the
      ACC, I got the following message outputted in my window.
      No exception was thrown to the client code, so it was unaware that the IIOP
      request failed.
      Second problem is the unsubstituted

      {4} in the IOP00410227 message.

      [java] 2008/02/23 11:17:24
      com.sun.corba.ee.impl.transport.SocketOrChannelConnectionImpl write
      [java] 致命的: "IOP00410227: (COMM_FAILURE) Unexpected exception when writing
      with a temporary selector: bytes written = 0, total bytes requested to write =
      4,096, time spent waiting = 7,280 ms, max time to wait = {4}

      ."
      [java] org.omg.CORBA.COMM_FAILURE: vmcid: SUN minor code: 227 completed: No
      [java] at
      com.sun.corba.ee.impl.logging.ORBUtilSystemException.exceptionWhenWritingWithTemporarySelector(ORBUtilSystemException.java:3416)

      [java] at
      com.sun.corba.ee.impl.logging.ORBUtilSystemException.exceptionWhenWritingWithTemporarySelector(ORBUtilSystemException.java:3442)

      [java] at
      com.sun.corba.ee.impl.transport.SocketOrChannelConnectionImpl.write(SocketOrChannelConnectionImpl.java:741)

      [java] at
      com.sun.corba.ee.impl.encoding.CDROutputObject.writeTo(CDROutputObject.java:196)

      1. remarshalExc.patch
        3 kB
        Dies Koper
      2. stacktrace-client.txt
        27 kB
        kurti
      3. stacktrace-server.txt
        12 kB
        kurti

        Activity

        Dies Koper created issue -
        Hide
        Dies Koper added a comment -

        Created an attachment (id=2221)
        proposed patch (based on V3 src from Kenai)

        Show
        Dies Koper added a comment - Created an attachment (id=2221) proposed patch (based on V3 src from Kenai)
        Hide
        Dies Koper added a comment -

        Sometimes when I try to reproduce the error I get an IOP00410225 (exceeded TCP
        timeout) instead, but the problem seems to be as follows:

        It tries to marshal the 250MByte array, fails, retries, fails again and throws a
        RemarshalException. The exception is not propagated properly.

        The patch also fixes the trivial problem with the final argument in the message.

        Show
        Dies Koper added a comment - Sometimes when I try to reproduce the error I get an IOP00410225 (exceeded TCP timeout) instead, but the problem seems to be as follows: It tries to marshal the 250MByte array, fails, retries, fails again and throws a RemarshalException. The exception is not propagated properly. The patch also fixes the trivial problem with the final argument in the message.
        Hide
        Ken Cavanaugh added a comment -

        The patch for the exception messages is fine; I think
        the error may have been left over from an earlier fix
        that took care of a problem with lack of exception chaining.

        Your fix is interesting, but I'm not sure it's correct.
        Propagting RemarshalException through the ValueHandler seems
        unfortunate (the ValueHandler should be relatively independent
        of such details), but may be necessary. I'm more concerned
        about why the operation failed in the first place.

        Do you have more details on the stack trace when the operation fails?
        I'm particularly interested in what happens in the
        SocketOrChannelConnectionImpl.write call. It sounds like either the
        getSocketChannel().write() call fails, or else the timeout
        (which is configurable) trips, because the socket is so backed up
        with data that it does not accept more data for longer than the
        timeout.

        A 250 MB request will generate around 63000 4 KB GIOP messages and do it
        fairly quickly. I wonder what is happening at the OS level in this case?
        I understand getting held off because no kernel buffers are available, but
        why is there a failure here? More details would be very helpful.

        Show
        Ken Cavanaugh added a comment - The patch for the exception messages is fine; I think the error may have been left over from an earlier fix that took care of a problem with lack of exception chaining. Your fix is interesting, but I'm not sure it's correct. Propagting RemarshalException through the ValueHandler seems unfortunate (the ValueHandler should be relatively independent of such details), but may be necessary. I'm more concerned about why the operation failed in the first place. Do you have more details on the stack trace when the operation fails? I'm particularly interested in what happens in the SocketOrChannelConnectionImpl.write call. It sounds like either the getSocketChannel().write() call fails, or else the timeout (which is configurable) trips, because the socket is so backed up with data that it does not accept more data for longer than the timeout. A 250 MB request will generate around 63000 4 KB GIOP messages and do it fairly quickly. I wonder what is happening at the OS level in this case? I understand getting held off because no kernel buffers are available, but why is there a failure here? More details would be very helpful.
        Hide
        kurti added a comment -

        Created an attachment (id=2850)
        I get the same problem, but only, if I have I direct network LAN connection. If I am using a wireless connection - even if I am in the same IP subnet - the problem goes away. This is the stacktrace on the server.

        Show
        kurti added a comment - Created an attachment (id=2850) I get the same problem, but only, if I have I direct network LAN connection. If I am using a wireless connection - even if I am in the same IP subnet - the problem goes away. This is the stacktrace on the server.
        Hide
        kurti added a comment -

        Created an attachment (id=2851)
        And this the stacktrace on the client. Server is Ubuntu 7.10 i686, JDK 1.6.0_13; client is WindowsXP JDK 1.6.0_13

        Show
        kurti added a comment - Created an attachment (id=2851) And this the stacktrace on the client. Server is Ubuntu 7.10 i686, JDK 1.6.0_13; client is WindowsXP JDK 1.6.0_13
        Hide
        kurti added a comment -

        I think I found the reason and a workaround for this problem:

        The default Corba timeouts are 2 seconds. If the server is fast, the network
        connection is fast but the client is too slow and you have lots of data, the
        server hits this timeout and the exception occurs. This matches my observations
        that it works when connected via WLAN but does not work via LAN.

        The workaround is to increase the network timeout: open domain.xml, look for the
        jvm-options and add a line like this:

        <jvm-options>-Dcom.sun.corba.ee.transport.ORBTCPTimeouts=20000:60000:20</jvm-options>

        The first number is the timeout in milliseconds, the second the maximum timeout
        and the last an increase percent (default: 2000:6000:20). According to the
        original exception, the server only waited 2000ms, so something went wrong here.
        With 20000 it works for me. You can find an explanation of this option here:

        http://blogs.sun.com/ejcorba/entry/more_on_tcp_timeouts_in

        Hey bug owner, this is you own page. Thanks for that, it showed me the right
        direction.

        Show
        kurti added a comment - I think I found the reason and a workaround for this problem: The default Corba timeouts are 2 seconds. If the server is fast, the network connection is fast but the client is too slow and you have lots of data, the server hits this timeout and the exception occurs. This matches my observations that it works when connected via WLAN but does not work via LAN. The workaround is to increase the network timeout: open domain.xml, look for the jvm-options and add a line like this: <jvm-options>-Dcom.sun.corba.ee.transport.ORBTCPTimeouts=20000:60000:20</jvm-options> The first number is the timeout in milliseconds, the second the maximum timeout and the last an increase percent (default: 2000:6000:20). According to the original exception, the server only waited 2000ms, so something went wrong here. With 20000 it works for me. You can find an explanation of this option here: http://blogs.sun.com/ejcorba/entry/more_on_tcp_timeouts_in Hey bug owner, this is you own page. Thanks for that, it showed me the right direction.
        Hide
        kurti added a comment -

        Unfortunately, the command line option does not work as expected. I set it to
        60000, but after about 10 seconds I got the very same error message with "time
        spent waiting = 60.000 ms". So it looks as if the time is set but not used.

        Show
        kurti added a comment - Unfortunately, the command line option does not work as expected. I set it to 60000, but after about 10 seconds I got the very same error message with "time spent waiting = 60.000 ms". So it looks as if the time is set but not used.
        Hide
        kurti added a comment -

        I think I found the reason why the option does not work: issue #4382

        I get

        "IOP00110231: (BAD_PARAM) Timeout data must be 3 positive decimal integers
        separated by :"
        org.omg.CORBA.BAD_PARAM: vmcid: SUN minor code: 231 completed: No
        at
        com.sun.corba.ee.impl.logging.ORBUtilSystemException.badTimeoutDataLength(ORBUtilSystemException.java:2370)
        at
        com.sun.corba.ee.impl.logging.ORBUtilSystemException.badTimeoutDataLength(ORBUtilSystemException.java:2392)
        at
        com.sun.corba.ee.impl.transport.TcpTimeoutsImpl.<init>(TcpTimeoutsImpl.java:62)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at
        sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at
        sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at
        com.sun.corba.ee.spi.orb.OperationFactoryExt$ConvertAction.operate(OperationFactoryExt.java:73)
        at
        com.sun.corba.ee.impl.orb.NormalParserAction.apply(NormalParserAction.java:58)
        at com.sun.corba.ee.spi.orb.PropertyParser.parse(PropertyParser.java:81)
        at com.sun.corba.ee.spi.orb.ParserImplBase.init(ParserImplBase.java:81)
        at
        com.sun.corba.ee.impl.orb.ORBDataParserImpl.<init>(ORBDataParserImpl.java:481)
        at com.sun.corba.ee.impl.orb.ORBImpl.postInit(ORBImpl.java:587)
        at com.sun.corba.ee.impl.orb.ORBImpl.set_parameters(ORBImpl.java:701)
        at org.omg.CORBA.ORB.init(ORB.java:337)
        at com.sun.enterprise.util.ORBManager.initORB(ORBManager.java:546)
        at com.sun.enterprise.util.ORBManager.getORB(ORBManager.java:278)
        at com.sun.enterprise.util.ORBManager.getORB(ORBManager.java:289)
        at
        com.sun.enterprise.server.ondemand.EjbServiceGroup.createORB(EjbServiceGroup.java:511)
        at
        com.sun.enterprise.server.ondemand.EjbServiceGroup.startORB(EjbServiceGroup.java:437)
        at
        com.sun.enterprise.server.ondemand.EjbServiceGroup._start(EjbServiceGroup.java:156)
        at
        com.sun.enterprise.server.ondemand.EjbServiceGroup.start(EjbServiceGroup.java:143)
        at
        com.sun.enterprise.server.ondemand.ServiceGroup$1.run(ServiceGroup.java:193)
        at java.security.AccessController.doPrivileged(Native Method)
        at
        com.sun.enterprise.server.ondemand.ServiceGroup.startChildren(ServiceGroup.java:190)
        at
        com.sun.enterprise.server.ondemand.MainServiceGroup.start(MainServiceGroup.java:58)
        at
        com.sun.enterprise.server.ondemand.ServerEntryListenerImpl.notifyEntry(ServerEntryListenerImpl.java:85)
        at
        com.sun.enterprise.server.ondemand.entry.ServerEntryHelper.sendEvent(ServerEntryHelper.java:75)
        at
        com.sun.enterprise.server.ondemand.entry.ServerEntryHelper.generateStartUpEntryContext(ServerEntryHelper.java:64)
        at
        com.sun.enterprise.server.ondemand.OnDemandServer.generateEntryContext(OnDemandServer.java:154)
        at
        com.sun.enterprise.server.ondemand.OnDemandServer.onStartup(OnDemandServer.java:133)
        at com.sun.enterprise.server.PEMain.run(PEMain.java:409)
        at com.sun.enterprise.server.PEMain.main(PEMain.java:336)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
        sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at
        sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at com.sun.enterprise.server.PELaunch.main(PELaunch.java:415)

        Changing the line in domain.xml to

        <jvm-options>-Dcom.sun.corba.ee.transport.ORBTCPTimeouts=60000:180000:20:2000000</jvm-options>

        does not cause the Exception. I cannot test if anything differs now, as we
        reduced the object size in our code.

        Show
        kurti added a comment - I think I found the reason why the option does not work: issue #4382 I get "IOP00110231: (BAD_PARAM) Timeout data must be 3 positive decimal integers separated by :" org.omg.CORBA.BAD_PARAM: vmcid: SUN minor code: 231 completed: No at com.sun.corba.ee.impl.logging.ORBUtilSystemException.badTimeoutDataLength(ORBUtilSystemException.java:2370) at com.sun.corba.ee.impl.logging.ORBUtilSystemException.badTimeoutDataLength(ORBUtilSystemException.java:2392) at com.sun.corba.ee.impl.transport.TcpTimeoutsImpl.<init>(TcpTimeoutsImpl.java:62) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at com.sun.corba.ee.spi.orb.OperationFactoryExt$ConvertAction.operate(OperationFactoryExt.java:73) at com.sun.corba.ee.impl.orb.NormalParserAction.apply(NormalParserAction.java:58) at com.sun.corba.ee.spi.orb.PropertyParser.parse(PropertyParser.java:81) at com.sun.corba.ee.spi.orb.ParserImplBase.init(ParserImplBase.java:81) at com.sun.corba.ee.impl.orb.ORBDataParserImpl.<init>(ORBDataParserImpl.java:481) at com.sun.corba.ee.impl.orb.ORBImpl.postInit(ORBImpl.java:587) at com.sun.corba.ee.impl.orb.ORBImpl.set_parameters(ORBImpl.java:701) at org.omg.CORBA.ORB.init(ORB.java:337) at com.sun.enterprise.util.ORBManager.initORB(ORBManager.java:546) at com.sun.enterprise.util.ORBManager.getORB(ORBManager.java:278) at com.sun.enterprise.util.ORBManager.getORB(ORBManager.java:289) at com.sun.enterprise.server.ondemand.EjbServiceGroup.createORB(EjbServiceGroup.java:511) at com.sun.enterprise.server.ondemand.EjbServiceGroup.startORB(EjbServiceGroup.java:437) at com.sun.enterprise.server.ondemand.EjbServiceGroup._start(EjbServiceGroup.java:156) at com.sun.enterprise.server.ondemand.EjbServiceGroup.start(EjbServiceGroup.java:143) at com.sun.enterprise.server.ondemand.ServiceGroup$1.run(ServiceGroup.java:193) at java.security.AccessController.doPrivileged(Native Method) at com.sun.enterprise.server.ondemand.ServiceGroup.startChildren(ServiceGroup.java:190) at com.sun.enterprise.server.ondemand.MainServiceGroup.start(MainServiceGroup.java:58) at com.sun.enterprise.server.ondemand.ServerEntryListenerImpl.notifyEntry(ServerEntryListenerImpl.java:85) at com.sun.enterprise.server.ondemand.entry.ServerEntryHelper.sendEvent(ServerEntryHelper.java:75) at com.sun.enterprise.server.ondemand.entry.ServerEntryHelper.generateStartUpEntryContext(ServerEntryHelper.java:64) at com.sun.enterprise.server.ondemand.OnDemandServer.generateEntryContext(OnDemandServer.java:154) at com.sun.enterprise.server.ondemand.OnDemandServer.onStartup(OnDemandServer.java:133) at com.sun.enterprise.server.PEMain.run(PEMain.java:409) at com.sun.enterprise.server.PEMain.main(PEMain.java:336) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at com.sun.enterprise.server.PELaunch.main(PELaunch.java:415) Changing the line in domain.xml to <jvm-options>-Dcom.sun.corba.ee.transport.ORBTCPTimeouts=60000:180000:20:2000000</jvm-options> does not cause the Exception. I cannot test if anything differs now, as we reduced the object size in our code.
        Hide
        kumara added a comment -

        Changing version from 9.1.1 to v2.1 to reflect new name/version.

        Show
        kumara added a comment - Changing version from 9.1.1 to v2.1 to reflect new name/version.
        Hide
        Ken Cavanaugh added a comment -

        Not for the current v3 release.

        Show
        Ken Cavanaugh added a comment - Not for the current v3 release.
        Hide
        jagadesh added a comment -

        Will not be fixed for V2.1.1

        Show
        jagadesh added a comment - Will not be fixed for V2.1.1
        Hide
        Ken Cavanaugh added a comment -

        Given the number of votes on this issue, I'd like to know if there is still a
        problem on v3, which has the ORBTCPTimeouts parsing error fixed. I'll write an
        ORB test sending large amounts of data and see what happens in the current ORB.

        If the problem is still present, I can increase the timeouts, but sending
        250 MB messages to an EJB is unusual enough that perhaps adjusting the
        timeouts should be expected. Another possibility is to investigate
        automatic adjustment of the timeouts when a very large transfer is detected.

        Show
        Ken Cavanaugh added a comment - Given the number of votes on this issue, I'd like to know if there is still a problem on v3, which has the ORBTCPTimeouts parsing error fixed. I'll write an ORB test sending large amounts of data and see what happens in the current ORB. If the problem is still present, I can increase the timeouts, but sending 250 MB messages to an EJB is unusual enough that perhaps adjusting the timeouts should be expected. Another possibility is to investigate automatic adjustment of the timeouts when a very large transfer is detected.
        Hide
        Ken Cavanaugh added a comment -

        Moved to v3.1

        Show
        Ken Cavanaugh added a comment - Moved to v3.1
        Hide
        Ken Cavanaugh added a comment -

        No time to write the test for GF 3.1. Moving to 3.2.

        Show
        Ken Cavanaugh added a comment - No time to write the test for GF 3.1. Moving to 3.2.
        kenaiadmin made changes -
        Field Original Value New Value
        issue.field.bugzillaimportkey 7059 38663
        Jill Sato made changes -
        Fix Version/s 4.0 [ 10970 ]
        Fix Version/s 3.2 [ 10969 ]
        Ed Bratt made changes -
        Assignee Ken Cavanaugh [ kcavanaugh ] Harshad Vilekar [ hvilekar ]

          People

          • Assignee:
            Harshad Vilekar
            Reporter:
            Dies Koper
          • Votes:
            1 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated: