[GLASSFISH-20681] TimeoutException when delivering static web sites Created: 04/Jul/13  Updated: 19/Sep/14  Resolved: 15/Nov/13

Status: Resolved
Project: glassfish
Component/s: None
Affects Version/s: 4.1
Fix Version/s: 4.1

Type: Bug Priority: Major
Reporter: Wasomumba Assignee: Ryan Lubke
Resolution: Fixed Votes: 8
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:
  • Glassfish V4.0.1 B01
  • JDK V1.7.0_25 x64
  • Windows 2008R2

Attachments: Java Archive File nucleus-grizzly-all.jar    

 Description   

GF needs a long time to deliver some static files and throws timeout-exceptions.

Stacktrace:

[2013-06-27T19:58:07.615+0200] [glassfish 4.0] [WARNING] [] [javax.enterprise.web] [tid: _ThreadID=21 _ThreadName=http-listener-1(4)] [timeMillis: 1372355887615] [levelValue: 900] [[
  StandardWrapperValve[default]: Servlet.service() for servlet default threw exception
java.io.IOException: java.util.concurrent.TimeoutException
	at org.glassfish.grizzly.utils.Exceptions.makeIOException(Exceptions.java:81)
	at org.glassfish.grizzly.http.io.OutputBuffer.blockAfterWriteIfNeeded(OutputBuffer.java:958)
	at org.glassfish.grizzly.http.io.OutputBuffer.write(OutputBuffer.java:682)
	at org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:355)
	at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:342)
	at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:161)
	at org.apache.catalina.servlets.DefaultServlet.copy(DefaultServlet.java:2069)
	at org.apache.catalina.servlets.DefaultServlet.serveResource(DefaultServlet.java:1092)
	at org.apache.catalina.servlets.DefaultServlet.doGet(DefaultServlet.java:518)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
	at org.apache.catalina.core.StandardWrapper.service(StandardWrapper.java:1682)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:318)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:160)
	at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:734)
	at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:673)
	at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:99)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:174)
	at org.apache.catalina.connector.CoyoteAdapter.doService(CoyoteAdapter.java:357)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:260)
	at com.sun.enterprise.v3.services.impl.ContainerMapper.service(ContainerMapper.java:188)
	at org.glassfish.grizzly.http.server.HttpHandler.runService(HttpHandler.java:191)
	at org.glassfish.grizzly.http.server.HttpHandler.doHandle(HttpHandler.java:168)
	at org.glassfish.grizzly.http.server.HttpServerFilter.handleRead(HttpServerFilter.java:189)
	at org.glassfish.grizzly.filterchain.ExecutorResolver$9.execute(ExecutorResolver.java:119)
	at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:288)
	at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:206)
	at org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:136)
	at org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:114)
	at org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:77)
	at org.glassfish.grizzly.nio.transport.TCPNIOTransport.fireIOEvent(TCPNIOTransport.java:838)
	at org.glassfish.grizzly.strategies.AbstractIOStrategy.fireIOEvent(AbstractIOStrategy.java:113)
	at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.run0(WorkerThreadIOStrategy.java:115)
	at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.access$100(WorkerThreadIOStrategy.java:55)
	at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy$WorkerThreadRunnable.run(WorkerThreadIOStrategy.java:135)
	at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:564)
	at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:544)
	at java.lang.Thread.run(Unknown Source)
Caused by: java.util.concurrent.TimeoutException
	at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
	at java.util.concurrent.FutureTask.get(Unknown Source)
	at org.glassfish.grizzly.http.io.OutputBuffer.blockAfterWriteIfNeeded(OutputBuffer.java:951)
	... 36 more
]]

Steps to reproduce the issue:

1.) Set up a Win2008R2 Server

2.) Install JDK 1.7.0_25 x64

3.) Download the file http://dlc.sun.com.edgesuite.net/glassfish/4.0.1/promoted/glassfish-4.0.1-b01.zip

4.) Unzip the downloaded zip file to "c:\". This will result in a new folder "c:\glassfish4".

5.) Download the file http://eching-feg.de/docroot.zip

6.) The downloaded zip file contains a folder "docroot". Please replace the folder "c:\glassfish4\glassfish\domains\domain1\docroot" with the "docroot" folder from the zip file. After this step, you should have a folder "c:\glassfish4\glassfish\domains\domain1\docroot\ui".

7.) open a command shlell (cmd.exe)

8.) In the command shell, go to the folder "c:\glassfish4\glassfish\bin".

9.) In the command shell, start glassfish with the command "startserv"

10.) After glassfish has finished his startup process, open a web browser and go to http://127.0.0.1:4848/ and wait until the page has loaded

11.) Check that there is a new Folder "generated" in the folder "C:\glassfish4\glassfish\domains\domain1" and that the folder "C:\glassfish4\glassfish\domains\domain1\config" contains a file ".consolestate".

12.) Open http://127.0.0.1:8080/ui/index.html in your web browser

13.) Clear your browser cache

14.) reload the webpage from step 12

15.) now you will see, that it takes a long time (up to one minute) to load the web page

16.) After the page has loaded, you will see multiple timeoutexceptions in the console window.



 Comments   
Comment by wherisat [ 31/Jul/13 ]

I get this immediately after forcing the user to login. It happens sporadically (probably once in every 4 logins). After login the user should be redirected to our single page app, which is a bit large, but when successful it loads in about 2-3 seconds. I've increased the network listener timeout and disabled the upload timeout (for another potential issue) to no avail. I'm still seeing this happen. The exceptions show up in the glassfish server log about 20-30 seconds after the user logs in.

Comment by httpeter [ 11/Sep/13 ]

I've got exactly the same error. I'd like to add that it happens when I use http-listener on port 443 and especially when I create multiple sessions with different browsers from the same machine. Please fix this because it's a bit of a show stopper..

Comment by Ryan Lubke [ 21/Oct/13 ]

@wherisat, @httpeter: Could you please include details on your OS and JVM?

Comment by bw22942 [ 21/Oct/13 ]

OS: RHEL Linux 5.8
JVM: JDK 1.7_40

GlassFish 4 Community Edition. I use GlassFish4 for hosting my Google Web Toolkit application which comes in the form of a WAR file. When a user navigates to http://<IP>:8080/UI, it takes a long time initially and eventually the browser times out and shows a white blank, screen. A check of the logs shows the servlet exception with TimeoutException. I refresh the browser, and the application loads successfully.

Comment by Ryan Lubke [ 21/Oct/13 ]

So far, I've been unable to reproduce. Would it be possible for one of you, using wireshark, to capture the network traffic when the issue occurs and send the resulting capture to ryan.lubke@oracle.com?

Comment by bw22942 [ 21/Oct/13 ]

I am not permitted to run WireShark here. Is there another way I can assist?

Comment by wherisat [ 21/Oct/13 ]

OS: Win 7 64-bit. JAVA: JDK 1.7 update 11 or so. I've since increased all timeouts in glassfish, updated my JDK, and reduced the size of my WAR and I no longer see this exception. I don't know which step killed the exception but it's definitely gone. If it creeps back in I'll fire up wireshark, although I haven't used it before so I'm not sure what to grab for you.

Comment by Ryan Lubke [ 21/Oct/13 ]

@bw22942: If you're able to create a scaled down test case that demonstrates the issue in your environment, that would be extremely useful.

@wherisat: Wireshark is pretty easy to use. I would recommend reviewing [1].

[1] http://www.wireshark.org/docs/wsug_html_chunked/ChapterCapture.html

Comment by sventrax [ 22/Oct/13 ]

@wherisat: I have the same issue. Which timeout have you increased? Now I have updated to java jdk1.7.0_45.

Comment by bw22942 [ 22/Oct/13 ]

Would this have anything to do with org.apache.catalina.servlet.DefaultServlet defined in default-web.xml? I don't have a default servlet assigned in my web.xml (never have), but I noticed GlassFish has one that comes with it.

When I increased the input buffer size to 99999 and the output buffer size to 99999, the DefaultServlet does not cause a TimeoutException anymore. My guess is what happens is initially on doing a CTRL+F5 (refresh all resources from web site), the DefaultServlet can't swap out fast enough the number of GWT static resources, pictures, CSS, HTML, and javascript fast enough. Therefore, it times out. When you refresh again, most of the resources are already loaded so it works just fine.

I am not sure what changed between GlassFish 3 and GlassFish 4 to cause this behavior so that now I have to use a larger buffer size, but this seems to fit well with what a user said above where they "reduced the size of their application" and it worked.

Comment by Ryan Lubke [ 22/Oct/13 ]

Indeed. I may have not have had an issue with the test case provided as on my mac, the network buffer sizes, by default, are fairly large.

I've noticed that the blocking timeout is, incorrectly, governed by the transport read timeout (it should be write).
So setting the read timeout to be a large value would help. However, a large read timeout across the board may not be desired.
I'll make a change to that code to use the write timeout and attach a JAR here. I believe the write timeout is, by default, unlimited (I'll confirm when I attach the fix).

Comment by Ryan Lubke [ 22/Oct/13 ]

Attached JARs for GlassFish 4.0 and 4.0.1-b03.

The proper timeout is now used when writing. The default is 30 seconds for writes.
This can be increased or disabled (with a value of -1) by setting the write-timeout-millis attribute on the transport associated with the network-listener.

Comment by bw22942 [ 22/Oct/13 ]

Thanks, I will apply this fix and get back to you.

Comment by bw22942 [ 22/Oct/13 ]

I placed the updated nucleus-grizzly JAR file under /opt/glassfish4/glassfish/modules/nucleus-grizzly-all.jar. I renamed the original to nucleus-grizzly-all.jar.orig. Stopped domain. Commented out increased buffer sizes. Started domain. Did a CTRL+F5. Same problem.

Comment by Ryan Lubke [ 22/Oct/13 ]

As I stated, all I changed was which timeout was used. The default of 30 seconds still applies. If you set the write-timeout-millis as suggested to a higher value, or disable it, do you still see the issue?

Comment by bw22942 [ 22/Oct/13 ]

Ok, what is the asadmin command to do this? I have never done this before.

Comment by Ryan Lubke [ 22/Oct/13 ]

asadmin set configs.config.server-config.network-config.transports.transport.<transport-name>.write-timeout-millis=<value>

The default transport name in the domain.xml is tcp.

Comment by bw22942 [ 22/Oct/13 ]

Yep, that worked. Took about a minute to finally load my app. Once it is loaded, the wait time drops considerably on subsequent refreshes. I will have to evaluate whether this is desirable before fully upgrading to GF4. Any idea how to speed this up?

Comment by Ryan Lubke [ 22/Oct/13 ]

A few questions:

  • Using 3.1.2.2, how long did the initial request to load your application?
  • What is the socket write buffer size on your system?
Comment by bw22942 [ 22/Oct/13 ]

3.1.2.2, it was just a few seconds

cat /proc/sys/net/ipv4/tcp_rmem

4096 87380 4194304

cat /proc/sys/net/ipv4/tcp_wmem
4096 16384 4194304

Comment by bw22942 [ 23/Oct/13 ]

Is there anything in this guide that you think would help?

https://glassfish.java.net/docs/4.0/performance-tuning-guide.pdf

Comment by Ryan Lubke [ 23/Oct/13 ]

Well, you can certainly increase the buffer sizes as you did before, but I feel that's just a workaround as I still don't understand why the blocking is occurring to the extent you're describing (1 min to load the page).

When the page is loaded, what is the size, in bytes, of the page, the number of auxiliary resources that are downloaded as well as their sizes (should be easy to obtain with the developer plugins like firebug, etc).

Comment by Ryan Lubke [ 23/Oct/13 ]

Finally can reproduce. Will be digging in further now to understand what's going on.

Comment by bw22942 [ 24/Oct/13 ]

Great! Look forward to your findings.

Comment by Ryan Lubke [ 24/Oct/13 ]

Okay, I have some information to share.

First some background, which should offer some insight as to why there is a change in behavior between GF 3.x and 4.

GF 4 uses Grizzly 2.3, which, for all intents and purposes, was a re-write of Grizzly 1.9 (used in GF 3.x). Grizzly 2.3 is much more sensitive to memory use than 1.9, and in this case, it has kind of bitten us.

There are two issues in play that are causing the problem. First is the small socket write buffer size. The default write buffer size on RH and Ubuntu is 16K. In Grizzly, in order to conserve memory, will only
allow 4x the write buffer size to be queued before we block the writer. In the case of the attached test case, there are several artifacts that are over 200K. When these are written, the initial writes made by the default Servlet will exceed the max queued data size and cause the writer to block. Because all worker threads are blocked, the async write queue is no longer being drained and the page, as far as the user is concerned, hangs when loading. Here's where the second issue comes in. By default, there are only five worker threads. In the case of the test case, there are at least five requests coming in that are requesting large resources. Due to this, all of the worker threads are blocked and all I/O comes to a halt.

As was observed, increasing the write buffer size that GF uses, helps alleviate the issue, but at the cost of memory per connection. One could also increase the number of threads, but there's no guarantee that all of those threads wouldn't eventually be blocked when up scaling the number of clients.

So, the proper fix will be to make sure that when we do block, that all async queue I/O is performed in the selector vs the worker thread. However, we need to be careful of which thread we use to perform the notification to unblock the thread and continue writing. Additionally, we should probably consider re-writing the default servlet to leverage the non-blocking api added in 3.1 [1]. (NOTE: these details are more for me). Will be working on this fix today. Hopefully, I'll have something to test later today or tomorrow.

[1] https://java.net/jira/browse/GLASSFISH-20872

Comment by bw22942 [ 24/Oct/13 ]

Can Grizzly 1.9 be used in place of Grizzly 2.3 for GF 4.x?

Comment by Ryan Lubke [ 24/Oct/13 ]

It cannot.

Comment by Ryan Lubke [ 24/Oct/13 ]

Patch JAR for GlassFish 4.0.

Comment by Ryan Lubke [ 24/Oct/13 ]

@bw22942 and/or others. Please try the latest attach JAR file for Glassfish 4.0. You should be able to revert any timeout/buffer/thread count changes and see behavior as you did in 3.1.x.

Comment by sventrax [ 25/Oct/13 ]

I have a problem: I stopped my domain, replaced the old nucleus-grizzly-all.jar with your attachment and restarted the domain. After that, I'm not able to log in in my admin console.

With this exceptions:

[glassfish 4.0] [SEVERE] [java_security.expired_certificate] [javax.enterprise.system.ssl.security.com.sun.enterprise.security.ssl.impl] [tid: _ThreadID=671 _ThreadName=admin-listener(7)] [timeMillis: 1382702314841] [levelValue: 1000] [[
SEC5054: Certificate has expired....

[glassfish 4.0] [SEVERE] [] [javax.enterprise.admin.rest] [tid: _ThreadID=103 _ThreadName=admin-listener(4)] [timeMillis: 1382702314893] [levelValue: 1000] [[

java.lang.ClassNotFoundException: org.glassfish.admin.rest.resources.generatedASM.DomainResource not found by org.glassfish.main.admin.rest-service [235]....

[glassfish 4.0] [INFO] [NCLS-REST-00003] [javax.enterprise.admin.rest] [tid: _ThreadID=103 _ThreadName=admin-listener(4)] [timeMillis: 1382702314894] [levelValue: 800] [[
An error occurred while processing the request. Please see the server logs for details.
java.lang.IllegalArgumentException: Registered component class cannot be null.

What happens? How to solve this?

Comment by bw22942 [ 25/Oct/13 ]

As soon as I call start-domain, I get a lot of exceptions thrown:

Caused by: org.osgi.framework.BundleException: Unresolved constraint in bundle org.glassfish.main.core.kernel [150]: Unable to resolve 150.0: missing requirement [150.0] osgi.wiring.package...

Command start-domain failed.

Something must of gotten corrupted too because now I can't start my glassfish4 domain at all. I tried this on two different machines.

UPDATE: I think my osgi-cache is bad when the new grizzly module is used.

Comment by Ryan Lubke [ 25/Oct/13 ]

If you delete the contents of GF_HOME/domains/domain1/osgi-cache, do the wiring issues go away?

As to the cert issue - I'm not sure where that is coming from. I've tested the patch in the exact same fashion as you would and haven't seen that problem. I would recommend first clearing the osgi cache as well and go from there.

Comment by Ryan Lubke [ 25/Oct/13 ]

Additionally, if there was a corruption, it would have been in the download or upload of the artifact.

Here are the shasum and md5sum results of what I uploaded:

[524][catsnac.gonzo: /Users/catsnac]$ shasum nucleus-grizzly-all.jar
917d816a6714fd66c01fe08965bca2905f82f9ba nucleus-grizzly-all.jar

[525][catsnac.gonzo: /Users/catsnac]$ md5sum nucleus-grizzly-all.jar
a21cb6d556152a48752312a25b0ec2e3 nucleus-grizzly-all.jar

Comment by bw22942 [ 25/Oct/13 ]

MD5 checks out.. the wiring issues do not go away when I do rm -rf osgi-cache/

I have tons of dots on the screen when doing start-domain.

I did remove the wait-millis for the tcp transport layer from the domain.xml.

Comment by Ryan Lubke [ 25/Oct/13 ]

Most likely used the wrong GF4 build (too many versions lying around). Uploaded a new patch with the proper version. Do clear out the osgi-cache before restarting the domain after putting the new jar in place.

md5sums and shasums below.

[595][catsnac.gonzo: /Users/catsnac]$ md5sum nucleus-grizzly-all.jar
e8c22fc804f2b71210a208acd4c08abe nucleus-grizzly-all.jar

[596][catsnac.gonzo: /Users/catsnac]$ shasum nucleus-grizzly-all.jar
277d9a888ca8222fa6e448571979955cff5a5a43 nucleus-grizzly-all.jar

Comment by bw22942 [ 25/Oct/13 ]

Yes! This works.

I am seeing a new exception for:

java.lang.ClassNotFoundException: com.sun.enterprise.security.Audit

However, just after that it says "SEC1011: Security Service(s) started Successfully."

I am running JDK1.7.0_40. Is this the right JDK?

Also, will this patch be backported to the GlassFish 4.0 Community Edition? Currently it is on Build 89.

I also have the same problem, I get the login screen and after trying to login, the browser times out. I have secure-admin enabled along with weak ciphers removed.

Comment by Ryan Lubke [ 25/Oct/13 ]

I am seeing a new exception for:

java.lang.ClassNotFoundException: com.sun.enterprise.security.Audit

However, just after that it says "SEC1011: Security Service(s) started Successfully."

Hmm, all the patch does is integrate a new version of Grizzly - no GF4 code is modified. Not sure what the relation is. Might be worth logging a new issue.

I am running JDK1.7.0_40. Is this the right JDK?

Should be fine. Though _45 was released this week.

Also, will this patch be backported to the GlassFish 4.0 Community Edition? Currently it is on Build 89.

I'm not sure what the current patch plans are for 4.0. So for the time being, you'll have to use this patch. If you have a support contract, I'm sure something more official would be offered.

Comment by wherisat [ 04/Nov/13 ]

FWIW the patch worked for me and I don't see any exceptions out of the ordinary. I am using JDK 1.7_40 on win7 64-bit. Thank you so much!!!!

Comment by Ryan Lubke [ 15/Nov/13 ]

Integrated Grizzly 2.3.7 into the 4.0.1 line. Should be available in the latest promoted builds.

Generated at Tue Mar 28 19:04:04 UTC 2017 using JIRA 6.2.3#6260-sha1:63ef1d6dac3f4f4d7db4c1effd405ba38ccdc558.