glassfish / GLASSFISH-17504

High Availability (HA) webapps slow, corrupted sessions, and java.util.concurrent.TimeoutException

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.1.1
    • Fix Version/s: 3.1.2_b17
    • Component/s: web_container
    • Labels:
      None
    • Environment:

      Description

      I set up a cluster, and deployed my JSP application onto it.
      It works great until I turn on high-availability for this application via the Admin console.
      Once I do that, it becomes very slow, and session state gets lost every 2 requests or so.
      Disabling high-availability cures the problem.

      I did run validate-multicast, GMS is running, cluster health is good, I followed the documentation,
      and I didn't do anything 'weird' or 'customized'.
      I also have in my web application.

      There are normally no errors in the log files. Once I turn on high-availability, I get this warning very frequently:
      [#|2011-03-06T02:13:00.297-0500|WARNING|glassfish3.1|org.shoal.ha.cache.command.load_request|_ThreadID=27;_ThreadName=Thread-1;|LoadRequestCommand timed out while waiting for result java.util.concurrent.TimeoutException|#]

        Activity

        Mahesh Kannan added a comment -

        Tested with the ctestservlet mentioned in 15575. The real issue here is that the app uses multiple GIFs/JPEGs, which causes the browser to make concurrent requests to the server. Because relaxVersionSemantics is absent from sun-web.xml, the web container makes approximately 7 load_requests to the replication layer for every page access!

        Some of the load_requests were lost because we batch them (using a map) keyed by sessionid.

        I have fixed the loss of load_requests with a fix to shoal (commit version 1732).
        After adding relaxVersionSemantics to the app, there was no session loss.

        <comment from submitter>
        cluster has 2 nodes, both are full (not virtual) machines. There is no traffic (test server) just sitting trying to use the app with one browser
        </comment from submitter>

        I would like to add that if there are multiple physical machines, you have to use a load balancer; otherwise the jsessionid cookie will not be automatically sent by the browser. This has nothing to do with replication or the web container. This is how browsers work.
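
        A minimal sketch of where that property would sit in sun-web.xml, for anyone who wants to try the workaround. It assumes the property name as spelled later in this thread (relaxCacheVersionSemantics), the usual session-manager / manager-properties nesting, and persistence-type="replicated". None of these details are confirmed in this issue, so check them against the descriptor reference for your GlassFish version.

```xml
<sun-web-app>
  <session-config>
    <!-- Assumption: in-memory replication is the persistence type in use -->
    <session-manager persistence-type="replicated">
      <manager-properties>
        <!-- Assumption: property name taken from the reporter's later comment -->
        <property name="relaxCacheVersionSemantics" value="true"/>
      </manager-properties>
    </session-manager>
  </session-config>
</sun-web-app>
```

        As far as I understand the setting, it lets concurrent requests for the same session be served from the version already in the active cache, which should reduce the number of load_requests per page access.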

        Joe Fialli added a comment -

        Shoal 1.6.17 integrated into bg trunk as part of svn version 52009 on January 10, 2012.
        Fix should be in next promoted build which is 4.0 b19.

        lprimak added a comment -

        Looks like this is confirmed fixed now. Thanks a lot for your efforts.
        I didn't even need to do this: <property name="relaxCacheVersionSemantics" value="true"/>
        and it still works great!

        lprimak added a comment -

        Looks like the replication problems are not fixed in 3.1.2 b20.
        Some session attributes are getting lost, seemingly being overwritten
        by another node in the cluster with older data.
        The TimeoutExceptions and slow performance are fixed though.

        I opened another issue regarding this:
        http://java.net/jira/browse/GLASSFISH-18322


          People

          • Assignee: Mahesh Kannan
          • Reporter: lprimak
          • Votes: 2
          • Watchers: 9