json-processing-spec / JSON_PROCESSING_SPEC-1

Do not require that all content end up in String or char[]

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: 1.0-pr
    • Labels:
      None

      Description

      In my work on the JRuby project, it has become painfully obvious that many Java APIs lose performance out of the gate because of the cost of decoding all incoming bytes to char[] before processing them. This also makes it difficult for JVM languages that use a different String representation to use those APIs.

      I propose that the JSON processing API for Java should not impose String or char[] on consumers unnecessarily. In the style of the "spymemcached" library, it should be possible to register a factor that can create strings of other forms directly from the incoming bytes, allowing for parsing and processing JSON without ever decoding. This would make it possible (and may be necessary) to match the performance of C-based libraries, and allows consumers that do not want decoded characters/strings to use the raw bytes directly.

      I will be monitoring this JSR once activity begins and discussions are made public.

        Activity

        headius added a comment -

        My description should read "register a factory" instead of "register a factor".

        jitu added a comment -

        Updating the issue with some related discussion:

        http://java.net/projects/json-processing-spec/lists/users/archive/2012-04/message/0
        http://java.net/projects/json-processing-spec/lists/users/archive/2012-04/message/2
        http://java.net/projects/json-processing-spec/lists/users/archive/2012-04/message/22
        http://java.net/projects/json-processing-spec/lists/users/archive/2012-04/message/23
        http://java.net/projects/json-processing-spec/lists/users/archive/2012-04/message/25

        jitu added a comment -

        We are supporting byte streams, but we are not exposing byte[] in the parser or in JsonString. I think that would be less useful for developers. Moreover, one needs to know the encoding of the underlying stream to use it.

        headius added a comment -

        I hope you can elaborate on that a bit. Supporting byte streams but still transcoding everything to UTF-16 char strings would defeat all the gains of being able to work directly with bytes.

        The factory suggestion still seems like the cleanest way. In an ideal world, we'd be able to register a factory that receives the incoming byte[] + offsets and we can then construct whatever string-like structure we want from that. It would avoid unnecessary transcoding for languages and libraries that can work directly with bytes, and it would eliminate lots of transient objects and overhead from going to String eagerly.

        I'd like to understand better what you mean by "supporting byte streams".
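
        The factory idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions: ByteStringFactory, Utf8Slice and the toy firstString scanner are hypothetical names invented for this sketch and are not part of any JSON processing API. The scanner ignores escape sequences and stands in for a real parser.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class FactorySketch {
    /**
     * Stand-in for a parser (hypothetical): finds the first quoted token and
     * hands its raw bytes to the factory. Escape sequences are ignored here.
     */
    static <T> T firstString(byte[] json, ByteStringFactory<T> factory) {
        int start = -1;
        for (int i = 0; i < json.length; i++) {
            if (json[i] == '"') {
                if (start < 0) {
                    start = i + 1;                      // first byte after the opening quote
                } else {
                    return factory.create(json, start, i - start);
                }
            }
        }
        throw new IllegalArgumentException("no string token found");
    }

    public static void main(String[] args) {
        byte[] json = "{\"name\":\"JRuby\"}".getBytes(StandardCharsets.UTF_8);
        // The constructor reference matches create(byte[], int, int).
        Utf8Slice key = firstString(json, Utf8Slice::new);
        System.out.println(key.byteLength() + " bytes: " + key); // prints "4 bytes: name"
    }
}

/** Hypothetical callback: builds a string-like value directly from the parser's bytes. */
interface ByteStringFactory<T> {
    T create(byte[] buf, int offset, int len);
}

/** Hypothetical byte-backed string, so no char[] is created unless requested. */
final class Utf8Slice {
    private final byte[] bytes;

    Utf8Slice(byte[] buf, int offset, int len) {
        // Copy the slice so the parser remains free to reuse its buffer.
        this.bytes = Arrays.copyOfRange(buf, offset, offset + len);
    }

    int byteLength() {
        return bytes.length;
    }

    /** Decoding to UTF-16 happens only when a java.lang.String is explicitly asked for. */
    @Override
    public String toString() {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

        In this sketch the consumer decides what a "string" is; a language runtime could plug in its own byte-based string type instead of Utf8Slice.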

        jitu added a comment -

        We are supporting creation of parser objects from byte streams such as InputStream (rather than character streams such as Reader). Some provider implementations take advantage of working with byte streams for certain encodings and don't convert to characters internally.

        At the application level, most users would consume values as String, and the provider implementations produce the String only when asked for it. The pull parser is lazy in that sense. You are suggesting adding something like one of the following approaches:

        1) A way to register/use a factory:

            JsonParser {
                void setFactory(SomeFactory<T> f)

                // valid in VALUE_STRING, KEY_NAME states
                // go through SomeFactory to create T
                T getStringObject(byte[] buf, int offset, int len)
            }

        or

        2) An alternative using the already existing JsonString interface:

            JsonParser {
              + JsonString getString();
            }

            // Maybe the existing JsonString is good enough (no need to add
            // additional methods like getBytes etc.)
            JsonString {
                ..
              + byte[] getBytes();
              + Charset getCharset();
            }

        I think exposing bytes doesn't work well at the application level for various reasons. One reason is character escaping: the internal buffer cannot be exposed directly as it is. I think a custom provider and a subtype of JsonParser would be good for this case.

        I will start a thread on the users list. Please follow there.
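
        The escaping concern can be illustrated with a small sketch. It assumes UTF-8 input and handles only the single-character escapes (the \uXXXX form is left out for brevity); EscapeSketch and unescape are hypothetical names, not part of the API under discussion. The point is that the bytes sitting in the input buffer are not the bytes of the string's value, so a slice of the buffer cannot simply be handed to the application.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class EscapeSketch {
    /**
     * Unescapes the raw bytes found between a JSON string's quotes.
     * Only single-character escapes are handled in this sketch.
     */
    static byte[] unescape(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(raw.length);
        for (int i = 0; i < raw.length; i++) {
            int c = raw[i];
            if (c != '\\') {
                out.write(c);                           // ordinary byte, copied through
                continue;
            }
            if (++i >= raw.length) {
                throw new IllegalArgumentException("dangling backslash");
            }
            c = raw[i];
            switch (c) {
                case 'n': out.write('\n'); break;
                case 't': out.write('\t'); break;
                case 'r': out.write('\r'); break;
                case 'b': out.write('\b'); break;
                case 'f': out.write('\f'); break;
                case '"':
                case '\\':
                case '/': out.write(c); break;
                default:
                    throw new IllegalArgumentException("unsupported escape: \\" + (char) c);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Buffer content: a \ " b \ n  (six bytes as they appear in the JSON text)
        byte[] raw = "a\\\"b\\n".getBytes(StandardCharsets.UTF_8);
        byte[] value = unescape(raw);
        // prints "raw: 6 bytes, value: 4 bytes"
        System.out.println("raw: " + raw.length + " bytes, value: " + value.length + " bytes");
    }
}
```

        A byte-oriented API would therefore either have to copy and unescape before handing bytes out, or document that callers receive the escaped form.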

        jitu added a comment -

        Resolving without any action as per the discussion:

        http://java.net/projects/json-processing-spec/lists/users/archive/2012-11/message/81

          People

          • Assignee: Unassigned
          • Reporter: headius
          • Votes: 0
          • Watchers: 0