GLASSFISH-18895

[PERF] Post Parameter Handling has large regression

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0_b43
    • Fix Version/s: 4.0
    • Component/s: grizzly-kernel
    • Labels:
      None

      Description

      In the webapp atomics, processing of POST requests shows roughly a 10% regression in GF 4.0 compared to GF 3.x. The attached screenshots show the difference.

      The time spent in the actual character encoding is similar between the two tests (5K calls per test), but the extra buffer manipulation has slowed down the parameter handling.

      I tried implementing the processParameters method to do it in a single shot on the character array (since that method already exists, it was quite easy...). So essentially:
      byte[] b = ((org.glassfish.grizzly.memory.HeapBuffer) buffer).array();
      char[] c = new char[len];
      for (int i = start, pos = 0; pos < len; i++, pos++) {
          c[pos] = (char) b[i];
      }
      processParameters(c, 0, len);

      Except that we would really need to call the encoder; since my locale is 8-bit, this was fine for a quick test. That significantly improved the performance of the POST testing: GF 4 now comes in 10% faster than GF 3.1.2 (so even introducing the encoder will be fine at that point). This requires that array() be a public method on HeapBuffer (as it is on the NIO class in the first place...) or some other mechanism. Still, the point is that we need to pay the setup/teardown overhead of the buffer structure only once instead of on every parameter, as we do now.
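
      A minimal sketch of how the finished version might look with the encoder actually in place (the decodeAndProcess helper and its signature are invented for illustration; only processParameters comes from the snippet above, and the method is assumed to live in the same class):

      import java.nio.ByteBuffer;
      import java.nio.CharBuffer;
      import java.nio.charset.CharacterCodingException;
      import java.nio.charset.Charset;

      // Hypothetical helper: decode the entire POST body in one call, then hand
      // the resulting char[] to processParameters, so the decoder setup cost is
      // paid once rather than once per parameter.
      static void decodeAndProcess(byte[] b, int start, int len, Charset charset)
              throws CharacterCodingException {
          CharBuffer cb = charset.newDecoder().decode(ByteBuffer.wrap(b, start, len));
          char[] c = new char[cb.remaining()];
          cb.get(c);
          processParameters(c, 0, c.length);  // same call as in the snippet above
      }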

      1. gf3.png
        62 kB
      2. gf4.png
        67 kB

        Activity

        oleksiys added a comment -

        Scott, is it possible to check stats per thread (not grouped)? Maybe it's just one single thread that breaks the stats. Frankly, I don't see why those Buffer operations should take that long.

        To make it look more like Grizzly 1.9.x, we can make the following change to HeapBuffer:

            @Override
            public String toStringContent(Charset charset, final int position,
                    final int limit) {
                checkDispose();
                if (charset == null) {
                    charset = Charset.defaultCharset();
                }
        
        //        final boolean isRestoreByteBuffer = byteBuffer != null;
        //        int oldPosition = 0;
        //        int oldLimit = 0;
        //
        //        if (isRestoreByteBuffer) {
        //            // ByteBuffer can be used by outer code - so save its state
        //            oldPosition = byteBuffer.position();
        //            oldLimit = byteBuffer.limit();
        //        }
        //        
        //        final ByteBuffer bb = toByteBuffer0(position, limit, false);
                final ByteBuffer bb = ByteBuffer.wrap(heap, offset + position, limit - position);
        
        //        try {
                    return charset.decode(bb).toString();
        //        } finally {
        //            if (isRestoreByteBuffer) {
        //                Buffers.setPositionLimit(byteBuffer, oldPosition, oldLimit);
        //            }
        //        }
            }
        

        Can I ask you to apply this patch and check if it changes anything?

        Thanks!
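
        For context on why the change can help: ByteBuffer.wrap creates a fresh view over the backing array with its own position and limit, so there is no shared byteBuffer whose state has to be saved and restored around the decode. A standalone sketch of just that wrap-and-decode step (the values below are invented for illustration):

            import java.nio.ByteBuffer;
            import java.nio.charset.StandardCharsets;

            public class WrapDemo {
                public static void main(String[] args) {
                    // Stand-ins for HeapBuffer's backing array and bounds.
                    byte[] heap = "name=value".getBytes(StandardCharsets.ISO_8859_1);
                    int offset = 0, position = 0, limit = heap.length;

                    // A fresh view over the array: no copy, no shared state to restore.
                    ByteBuffer bb = ByteBuffer.wrap(heap, offset + position, limit - position);
                    System.out.println(StandardCharsets.ISO_8859_1.decode(bb)); // name=value
                }
            }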

        Scott Oaks added a comment -

        Do you mean to run the test with only a single user (one request thread)? I have done that; there is still a regression between 3.1.2 and 4.0. With the patch you have here, performance is little changed. The difference is really in the number of times we call the encoder – we get much, much better performance calling the encoder once for a long string than 100 times for a string chopped into individual pieces (which is what happens in this test, with its 50 POST parameters). That, by the way, is the same thing I think is hurting us in 18754.

        To be clearer about the single/multiple threads, here are the requests per second from 1 and 50 clients at a time, each sending the POST request with no sleep time:

                               1 user   50 users
        GF 4.0                   4460      26500
        GF 3.1.2                 5000      29000
        toStringContent patch    4900      26200
        processParameter patch   5579      31900
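
        A quick, standalone way to see the per-call decoder cost described above (everything here – names, sizes, iteration counts – is invented for illustration and is not part of the issue's tests):

            import java.nio.ByteBuffer;
            import java.nio.charset.Charset;
            import java.nio.charset.StandardCharsets;
            import java.util.Arrays;

            public class DecoderCallCost {
                public static void main(String[] args) {
                    Charset cs = StandardCharsets.ISO_8859_1;
                    byte[] body = new byte[2000];          // stand-in for a POST body
                    Arrays.fill(body, (byte) 'a');

                    long t0 = System.nanoTime();
                    for (int i = 0; i < 10_000; i++) {
                        cs.decode(ByteBuffer.wrap(body));  // one decode for the whole body
                    }
                    long oneShot = System.nanoTime() - t0;

                    t0 = System.nanoTime();
                    for (int i = 0; i < 10_000; i++) {
                        for (int p = 0; p < 100; p++) {    // 100 decodes of 20-byte slices
                            cs.decode(ByteBuffer.wrap(body, p * 20, 20));
                        }
                    }
                    long sliced = System.nanoTime() - t0;

                    System.out.printf("one-shot: %,d ns  sliced: %,d ns%n", oneShot, sliced);
                }
            }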
        
        
        oleksiys added a comment -

        fixed


          People

          • Assignee: oleksiys
          • Reporter: Scott Oaks
          • Votes: 0
          • Watchers: 0
