
Key: JERSEY-2002
Type: Bug
Status: Resolved
Resolution: Fixed
Priority: Critical
Assignee: Jakub Podlesak
Reporter: kul
Votes: 0
Watchers: 2

Jersey 2.x has less than 10x performance compared to 1.7.1

Created: 06/Aug/13 08:00 AM   Updated: 03/Mar/14 10:49 PM   Resolved: 15/Oct/13 01:37 PM
Component/s: core
Affects Version/s: 2.1
Fix Version/s: 2.4

Time Tracking:
Original Estimate: Not Specified
Remaining Estimate: 0 minutes
Time Spent: 3 days, 22 hours

Environment:

Ubuntu 12.04, Java 1.7.0_25

Issue Links:
Related

Tags: jersey-grizzly2 jersey benchmark
Participants: Jakub Podlesak, kul and Matt Hauck


Description

Old configuration:
"com.sun.jersey" % "jersey-core" % "1.17.1",
"com.sun.jersey" % "jersey-server" % "1.17.1",
"com.sun.jersey" % "jersey-grizzly2" % "1.17.1"

New configuration:
"org.glassfish.jersey.containers" % "jersey-container-grizzly2-http" % "2.0",
"org.glassfish.jersey.media" % "jersey-media-multipart" % "2.0"

For the following REST resource:

package org.bench.rest

import javax.ws.rs.GET
import javax.ws.rs.Path

@Path("/ping")
class PingPong {

  @GET
  def pong = "pong"

}

The server is created using the usual ServerFactory method and benchmarked with wrk, using two machines: one serving the resource, the other running wrk.
Processor: i3, memory: 4 GB.
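For reference, the "usual ServerFactory method" for the 2.x line looks roughly like the following (a minimal sketch in Java for brevity, since Jersey itself is a Java framework; the package name and port are taken from the report above, and the `jersey-container-grizzly2-http` dependency from the "New configuration" is assumed to be on the classpath):

```java
// Hypothetical Jersey 2.x bootstrap sketch, not the reporter's actual code.
import java.net.URI;
import org.glassfish.jersey.grizzly2.httpserver.GrizzlyHttpServerFactory;
import org.glassfish.jersey.server.ResourceConfig;

public class Main {
    public static void main(String[] args) {
        // Scan the package containing the PingPong resource shown above.
        ResourceConfig config = new ResourceConfig().packages("org.bench.rest");
        // createHttpServer starts the Grizzly server immediately.
        GrizzlyHttpServerFactory.createHttpServer(
                URI.create("http://0.0.0.0:9999/"), config);
    }
}
```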

1.7.1 performance:

% ./wrk -t12 -c256 -d30s http://169.254.10.190:9999/ping
Running 30s test @ http://169.254.10.190:9999/ping
  12 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   ± Stdev
    Latency     3.44ms    6.64ms 283.34ms   95.08%
    Req/Sec     6.62k     1.22k   15.00k    74.44%
  2253249 requests in 30.00s, 266.62MB read
Requests/sec:  75114.77
Transfer/sec:      8.89MB

2.x performance:

% ./wrk -t12 -c256 -d30s http://169.254.10.190:9999/ping
Running 30s test @ http://169.254.10.190:9999/ping
  12 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   ± Stdev
    Latency    45.09ms   14.37ms 153.63ms   78.27%
    Req/Sec   490.59     96.81   735.00     76.91%
  172721 requests in 30.00s, 17.30MB read
Requests/sec:   5757.37
Transfer/sec:    590.67KB


Jakub Podlesak added a comment - 06/Aug/13 08:17 AM

We do continuous performance testing for Jersey 2 to make sure we do not introduce serious performance regressions.
Last year we compared Jersey 1 and Jersey 2, and the results were not that bad (about a 30 % drop).
So I am surprised to see this. Anyway, I am going to re-run my own Jersey 1/Jersey 2 performance comparison tests
and will get back.


Jakub Podlesak added a comment - 06/Aug/13 08:18 AM

These might be related.


Jakub Podlesak added a comment - 03/Sep/13 08:13 AM - edited

The above results seem scary; however, the numbers are IMHO impacted by an insufficient warm-up period.
The following are the results from my own measurements:

Jersey 2.3-SNAPSHOT:

jerseyrobot@japod-dell:~$ ./wrk -t16 -c256 -d180s http://10.163.76.25:8080/text
Running 3m test @ http://10.163.76.25:8080/text
  16 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    27.06ms   13.85ms 228.29ms   93.72%
    Req/Sec   639.75    113.57   700.00     92.85%
  1837643 requests in 3.00m, 184.14MB read
  Socket errors: connect 0, read 0, write 0, timeout 106
Requests/sec:  10209.26
Transfer/sec:      1.02MB

Jersey 1.17.1:

jerseyrobot@japod-dell:~$ ./wrk -t16 -c256 -d180s http://10.163.76.25:8080/text
Running 3m test @ http://10.163.76.25:8080/text
  16 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    10.82ms    9.06ms  83.61ms   90.01%
    Req/Sec     1.80k   434.70     2.16k    89.93%
  5157696 requests in 3.00m, 610.29MB read
  Socket errors: connect 0, read 0, write 0, timeout 91
Requests/sec:  28654.57
Transfer/sec:      3.39MB

After some hacking (improvements in HK2), I got the following (still the Jersey 2.3-SNAPSHOT version, but with an updated HK2):

jerseyrobot@japod-dell:~$ ./wrk -t16 -c256 -d180s http://10.163.76.25:8080/text
Running 3m test @ http://10.163.76.25:8080/text
  16 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    18.61ms    3.88ms  80.29ms   82.74%
    Req/Sec     0.88k   112.96     1.62k    79.39%
  2483412 requests in 3.00m, 248.85MB read
  Socket errors: connect 0, read 0, write 0, timeout 86
Requests/sec:  13796.77
Transfer/sec:      1.38MB

That is still about a 60 % drop, but no longer 10 times worse.

So far we have been testing internally with the Apache Benchmark tool, which now seems not to be generating enough load
in our test scenario; I am getting much higher numbers with the wrk tool.
Overall, the test setup has been giving skewed results so far.
I need to re-work the tests to get more precise results and then focus on improving the numbers.


Jakub Podlesak added a comment - 15/Oct/13 01:37 PM - edited

Jersey versions from 2.0 up to 2.3.2 suffer from over-synchronisation when running in a multithreaded environment.
This prevents the above-mentioned Jersey 2.x versions from serving as many requests in parallel as the previous Jersey 1.x versions
running in the same environment. The latency of individual responses has not been impacted as much, so this problem should mainly
concern users who experience a high volume of requests.
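A minimal illustration of the effect (a hypothetical sketch in plain Java, not Jersey's actual code path): with one coarse lock, worker threads queue on the monitor and throughput stops scaling with thread count, while a lock-free counter lets threads proceed mostly independently.

```java
// Illustrative sketch of lock contention limiting parallel throughput.
// Not Jersey code: THREADS, ITERS, and the counters are made up for the demo.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

public class Contention {
    static final int THREADS = 8;
    static final int ITERS = 100_000;
    static final Object LOCK = new Object();
    static long shared = 0;                         // guarded by LOCK
    static final LongAdder adder = new LongAdder(); // lock-free alternative

    // Run THREADS workers, each executing `unit` ITERS times; return elapsed ns.
    static long run(Runnable unit) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        long start = System.nanoTime();
        for (int t = 0; t < THREADS; t++) {
            pool.execute(() -> {
                for (int i = 0; i < ITERS; i++) unit.run();
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        // All threads serialize on one monitor vs. increment without blocking.
        long lockedNs = run(() -> { synchronized (LOCK) { shared++; } });
        long lockFreeNs = run(adder::increment);
        System.out.println("locked total=" + shared + " in " + lockedNs + " ns");
        System.out.println("lock-free total=" + adder.sum() + " in " + lockFreeNs + " ns");
    }
}
```

Both runs perform the same number of increments; the difference that matters here is how elapsed time grows as THREADS increases.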

The major obstacles have been identified and fixed in HK2 and Jersey (on the server side), so the
"performance drop" explained above is now less significant. Jersey 1.x can still serve about 1-3x more requests than Jersey 2.0-2.3.x,
depending on the concrete environment and use case, but the Jersey 2.x runtime should now scale up with the number of worker threads.

Closing this bug report as fixed, as I have not seen 10x worse performance in Jersey 2.4-SNAPSHOT since the fix was implemented.
When testing, please bear in mind that it can take the JIT several minutes to make all the necessary byte code optimisations, so be careful
when interpreting any numbers.
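The warm-up caveat can be made concrete with a simple harness (an illustrative sketch, not the actual Jersey test rig; the workload and iteration counts are made up): run the workload unmeasured first so the JIT can compile the hot path, then time only the steady-state phase.

```java
// Illustrative warm-up harness; `work` is a hypothetical stand-in
// for a request handler, not Jersey's benchmark code.
public class WarmupBench {
    // Cheap deterministic workload.
    static long work(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) acc += i * 31L;
        return acc;
    }

    // Discard warm-up iterations, then measure steady-state ns per operation.
    static double nsPerOp(int warmupIters, int measuredIters) {
        long sink = 0;
        for (int i = 0; i < warmupIters; i++) sink += work(10_000);   // warm-up, not timed
        long start = System.nanoTime();
        for (int i = 0; i < measuredIters; i++) sink += work(10_000); // timed phase
        long elapsed = System.nanoTime() - start;
        if (sink == 42) System.out.println(sink); // defeat dead-code elimination
        return elapsed / (double) measuredIters;
    }

    public static void main(String[] args) {
        System.out.println("ns/op after warm-up: " + nsPerOp(20_000, 20_000));
    }
}
```

Without the warm-up loop the first measured iterations include interpreter and compilation time, which is exactly the skew suspected in the original report.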

Further tasks will likely be filed to track additional performance improvement efforts in Jersey 2.


Matt Hauck added a comment - 03/Mar/14 09:56 PM

I am still seeing pretty bad performance numbers on jersey 2.6.

I have a simple "status" API that doesn't really do anything except return a java object with an "OK" status, which gets marshalled into JSON. Using an internal load tool, when I call this 1,000 times (w/ 5ms delay in between calls), I see an average time of 14ms, 90th percentile time of 26ms, and max time of 163ms. Running it against jersey 2, I am seeing 481ms average time, 862ms for 90th percentile, and max time of 1s 214ms. Both of these are results after the server is well-warmed up.

We are using jersey w/ spring integration. When I tried using the jersey-spring3 module, I was seeing horrendous results, at least double of what I listed above for jersey2 performance. So, I dropped that and wrote a simple integration to just pull beans from spring and skip any hk2 injection. This has improved the numbers some, but I'm still seeing a jump from 14ms to 481ms! That is more than 30x worse.

I will check with jersey 2.4 to see if a regression has happened since then, but I am very saddened by this. I was hoping to upgrade to jersey2 to get some of the newer async response features, but I'm not sure we'll be able to upgrade now...

(fwiw, I'm comparing with a rather old version of jersey: 1.3. Not sure how much changed in between 1.3 and 1.18 latest...)


Matt Hauck added a comment - 03/Mar/14 10:49 PM

Okay, actually, I think most of the badness is in the spring3 module. I realized shortly after I commented here that I was running my tomcat server w/ jprofiler enabled. =P

However, the reason I turned on jprofiler was because the performance was really quite bad with the spring3 module in place. Using my own spring integration has helped out much actually, and on 2.6 I'm getting pretty good numbers down in the teens and low tens of milliseconds.

Good news. =)