[JSON_PROCESSING_SPEC-1] Do not require that all content end up in String or char[] Created: 04/Feb/12  Updated: 30/Nov/12  Resolved: 30/Nov/12

Status: Closed
Project: json-processing-spec
Component/s: None
Affects Version/s: None
Fix Version/s: 1.0-pr

Type: New Feature Priority: Major
Reporter: headius Assignee: Unassigned
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


In my work on the JRuby project, it has become painfully obvious that many Java APIs lose performance out of the gate because of the cost of decoding all incoming bytes to char[] before processing them. This also makes it difficult for JVM languages that use a different String representation to use those APIs.

I propose that the JSON processing API for Java should not impose String or char[] on consumers unnecessarily. In the style of the "spymemcached" library, it should be possible to register a factor that can create strings of other forms directly from the incoming bytes, allowing JSON to be parsed and processed without ever being decoded. This would make it possible to match the performance of C-based libraries (and may be necessary to do so), and would allow consumers that do not want decoded characters or strings to use the raw bytes directly.
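As a rough illustration of the proposal, the factory could be a single-method interface that receives a slice of the input bytes and decides what kind of string object to build. This is a minimal sketch, not spymemcached's mechanism or any JSON-P API: StringFactory, JavaStringFactory, ByteSlice and ByteSliceFactory are hypothetical names, and the byte offsets in the demo are hard-coded as if a parser had already located the string value.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical factory: receives a slice of the raw input bytes and decides
// what kind of "string" object to build from it.
interface StringFactory<T> {
    T create(byte[] buf, int offset, int len);
}

// Conventional behaviour: decode the UTF-8 slice into a java.lang.String.
final class JavaStringFactory implements StringFactory<String> {
    @Override
    public String create(byte[] buf, int offset, int len) {
        return new String(buf, offset, len, StandardCharsets.UTF_8);
    }
}

// Hypothetical byte-oriented string: a view over the input buffer, no decoding.
final class ByteSlice {
    final byte[] buf;
    final int offset;
    final int len;

    ByteSlice(byte[] buf, int offset, int len) {
        this.buf = buf;
        this.offset = offset;
        this.len = len;
    }

    int length() { return len; }

    byte byteAt(int i) { return buf[offset + i]; }
}

// Alternative for runtimes with byte-based strings: wrap the bytes as-is.
final class ByteSliceFactory implements StringFactory<ByteSlice> {
    @Override
    public ByteSlice create(byte[] buf, int offset, int len) {
        return new ByteSlice(buf, offset, len);
    }
}

final class StringFactoryDemo {
    public static void main(String[] args) {
        byte[] json = "{\"name\":\"caf\u00e9\"}".getBytes(StandardCharsets.UTF_8);
        // Pretend a parser found the value between the quotes: bytes 9..13 inclusive.
        String decoded = new JavaStringFactory().create(json, 9, 5);
        ByteSlice raw = new ByteSliceFactory().create(json, 9, 5);
        System.out.println(decoded.length() + " chars / " + raw.length() + " bytes");
    }
}
```

The ByteSlice view avoids both decoding and copying, but it is only safe as long as the parser does not reuse or refill that buffer; a real design would have to choose between sharing and copying.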

I will be monitoring this JSR once activity begins and discussions are made public.

Comment by headius [ 04/Feb/12 ]

My description should read "register a factory" instead of "register a factor".

Comment by jitu [ 13/Jun/12 ]

Updating the issue with some related discussion:


Comment by jitu [ 21/Nov/12 ]

We are supporting byte streams, but we are not exposing byte[] in the parser or in JsonString. I think that would be less useful for developers. Moreover, one needs to know the encoding of the underlying stream to use it.

Comment by headius [ 26/Nov/12 ]

I hope you can elaborate on that a bit. Supporting byte streams but still transcoding everything to UTF-16 char strings would defeat all the gains of being able to work directly with bytes.
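Working directly with bytes is practical for UTF-8 input because every byte of a multi-byte UTF-8 sequence has its high bit set, so ASCII structural characters such as {, [, : and , can never appear inside an encoded non-ASCII character. The hypothetical StructuralCount class below counts those characters once on the raw InputStream and once through an InputStreamReader that transcodes to UTF-16 chars; both give the same result, but only the second decodes anything. It is a sketch: it does not track whether it is inside a string, so structural characters inside string values are counted too, and it reads one byte at a time for clarity rather than speed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

final class StructuralCount {
    // Scans raw UTF-8 bytes; no byte >= 0x80 can equal an ASCII structural character.
    static int fromBytes(InputStream in) throws IOException {
        int count = 0;
        for (int b = in.read(); b != -1; b = in.read()) {
            if (isStructural(b)) count++;
        }
        return count;
    }

    // Same scan after transcoding every byte to UTF-16 chars.
    static int fromChars(Reader in) throws IOException {
        int count = 0;
        for (int c = in.read(); c != -1; c = in.read()) {
            if (isStructural(c)) count++;
        }
        return count;
    }

    private static boolean isStructural(int c) {
        return c == '{' || c == '}' || c == '[' || c == ']' || c == ':' || c == ',';
    }

    public static void main(String[] args) throws IOException {
        byte[] json = "{\"k\":[\"\u00e9\",1]}".getBytes(StandardCharsets.UTF_8);
        int viaBytes = fromBytes(new ByteArrayInputStream(json));
        int viaChars = fromChars(new InputStreamReader(new ByteArrayInputStream(json), StandardCharsets.UTF_8));
        System.out.println(viaBytes + " " + viaChars); // prints "6 6"
    }
}
```

The same trick does not work for an encoding such as UTF-16, where ASCII characters are interleaved with zero bytes, which is one reason byte-level processing only pays off for certain encodings.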

The factory suggestion still seems like the cleanest way. In an ideal world, we'd be able to register a factory that receives the incoming byte[] + offsets and we can then construct whatever string-like structure we want from that. It would avoid unnecessary transcoding for languages and libraries that can work directly with bytes, and it would eliminate lots of transient objects and overhead from going to String eagerly.
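A toy version of that idea might look like the following. It is a minimal sketch under several assumptions: well-formed UTF-8 input, no escape sequences inside strings, and a simplified event set. The names SomeFactory, setFactory and getStringObject, and the rule that string objects are only available on KEY_NAME and VALUE_STRING events, follow jitu's later summary of this proposal in the thread; MiniParser, Event and VALUE_OTHER are invented for the example. The factory is invoked only when the application asks, with the parser's own buffer and offsets, so no String exists unless the chosen factory creates one.

```java
import java.nio.charset.StandardCharsets;
import java.util.NoSuchElementException;

// Hypothetical factory: builds a T straight from a slice of the parser's bytes.
interface SomeFactory<T> {
    T getStringObject(byte[] buf, int offset, int len);
}

// Simplified event set: numbers and true/false/null all map to VALUE_OTHER.
enum Event { START_OBJECT, END_OBJECT, START_ARRAY, END_ARRAY, KEY_NAME, VALUE_STRING, VALUE_OTHER }

// Toy pull parser over UTF-8 bytes; assumes well-formed JSON without escapes.
final class MiniParser<T> {
    private final byte[] buf;
    private int pos;
    private Event event;
    private int tokStart;
    private int tokLen;
    private SomeFactory<T> factory;

    MiniParser(byte[] buf) { this.buf = buf; }

    void setFactory(SomeFactory<T> f) { this.factory = f; }

    boolean hasNext() {
        skipSeparators();
        return pos < buf.length;
    }

    Event next() {
        skipSeparators();
        if (pos >= buf.length) throw new NoSuchElementException();
        int b = buf[pos];
        switch (b) {
            case '{': pos++; return event = Event.START_OBJECT;
            case '}': pos++; return event = Event.END_OBJECT;
            case '[': pos++; return event = Event.START_ARRAY;
            case ']': pos++; return event = Event.END_ARRAY;
            case '"': return event = readString();
            default: // number or literal: just record its extent
                tokStart = pos;
                while (pos < buf.length && !isDelimiter(buf[pos])) pos++;
                tokLen = pos - tokStart;
                return event = Event.VALUE_OTHER;
        }
    }

    // Only legal on KEY_NAME and VALUE_STRING events; the factory decides what T is.
    T getStringObject() {
        if (event != Event.KEY_NAME && event != Event.VALUE_STRING) {
            throw new IllegalStateException("not on a string event: " + event);
        }
        if (factory == null) throw new IllegalStateException("no factory registered");
        return factory.getStringObject(buf, tokStart, tokLen);
    }

    private Event readString() {
        tokStart = ++pos; // first byte after the opening quote
        while (pos < buf.length && buf[pos] != '"') {
            if (buf[pos] == '\\') throw new UnsupportedOperationException("escapes not handled");
            pos++;
        }
        if (pos >= buf.length) throw new IllegalArgumentException("unterminated string");
        tokLen = pos - tokStart;
        pos++; // skip the closing quote
        int ahead = pos;
        while (ahead < buf.length && isWhitespace(buf[ahead])) ahead++;
        // A string followed by ':' is an object key.
        return (ahead < buf.length && buf[ahead] == ':') ? Event.KEY_NAME : Event.VALUE_STRING;
    }

    private void skipSeparators() {
        while (pos < buf.length && (isWhitespace(buf[pos]) || buf[pos] == ',' || buf[pos] == ':')) pos++;
    }

    private static boolean isWhitespace(byte c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    private static boolean isDelimiter(byte c) {
        return isWhitespace(c) || c == ',' || c == '}' || c == ']';
    }

    public static void main(String[] args) {
        byte[] json = "{\"a\": \"xy\", \"n\": 12}".getBytes(StandardCharsets.UTF_8);
        MiniParser<String> p = new MiniParser<>(json);
        p.setFactory((bytes, off, len) -> new String(bytes, off, len, StandardCharsets.UTF_8));
        while (p.hasNext()) {
            Event e = p.next();
            if (e == Event.KEY_NAME || e == Event.VALUE_STRING) {
                System.out.println(e + " " + p.getStringObject());
            }
        }
    }
}
```

Because the factory sees the parser's internal buffer, a real design would also have to specify how long those bytes stay valid once the parser moves on or refills its buffer.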

I'd like to understand better what you mean by "supporting byte streams".

Comment by jitu [ 26/Nov/12 ]

We are supporting the creation of parser objects from byte streams such as InputStream (rather than character streams such as Reader). Some provider implementations take advantage of working with byte streams for certain encodings and don't convert to characters internally.
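A provider that keeps byte input as bytes can still hand application code a String: it records where each string token lies in the buffer and decodes only when asked, which is the laziness described in the next paragraph. LazyStringToken below is a hypothetical illustration, not a class from any provider; it assumes UTF-8 input and a token that contains no escape sequences.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical provider-internal token: remembers where a string sits in the
// byte buffer and decodes it only when the application asks for a String.
final class LazyStringToken {
    private final byte[] buf;
    private final int offset;
    private final int len;
    private String cached; // stays null until getString() is first called

    LazyStringToken(byte[] buf, int offset, int len) {
        this.buf = buf;
        this.offset = offset;
        this.len = len;
    }

    boolean isDecoded() {
        return cached != null;
    }

    String getString() {
        if (cached == null) {
            cached = new String(buf, offset, len, StandardCharsets.UTF_8); // decode once
        }
        return cached;
    }

    public static void main(String[] args) {
        byte[] json = "[\"xy\",\"zz\"]".getBytes(StandardCharsets.UTF_8);
        LazyStringToken first = new LazyStringToken(json, 2, 2);
        LazyStringToken second = new LazyStringToken(json, 7, 2);
        System.out.println(first.getString()); // only this token is decoded
        System.out.println(first.isDecoded() + " " + second.isDecoded()); // prints "true false"
    }
}
```

Laziness avoids work for strings nobody reads, but any string that is read still ends up as UTF-16 chars, which is the cost the original request is trying to avoid.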

At the application level, most users would consume them as Strings, and the provider implementations produce the String only when it is requested; the pull parser is lazy in that sense. You are suggesting something like one of the following approaches:

1) A way to register and use a factory

{
    void setFactory(SomeFactory<T> f)

    // valid in VALUE_STRING, KEY_NAME states
    // go through SomeFactory to create T
    T getStringObject(byte[] buf, int offset, int len)
}


2) An alternative using the existing JsonString interface


{
    + JsonString getString();
}

// Maybe the existing JsonString is good enough (no need to add methods like getBytes, etc.)

{
    ..
    + byte[] getBytes();
    + Charset getCharset();
}
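For the second alternative, a byte-aware variant of JsonString would have to carry the charset alongside the bytes, since, as noted earlier in the thread, bytes are meaningless without their encoding. ByteAwareJsonString and SimpleByteAwareJsonString are illustrative names only and are not part of the JSON-P API. The demo interprets the same two bytes under two charsets to show why getCharset() would be needed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical JsonString-like interface with the two methods floated above.
interface ByteAwareJsonString {
    String getString();
    byte[] getBytes();
    Charset getCharset();
}

final class SimpleByteAwareJsonString implements ByteAwareJsonString {
    private final byte[] bytes;
    private final Charset charset;

    SimpleByteAwareJsonString(byte[] bytes, Charset charset) {
        this.bytes = bytes.clone(); // defensive copy so callers cannot mutate our state
        this.charset = charset;
    }

    @Override public String getString() { return new String(bytes, charset); }

    @Override public byte[] getBytes() { return bytes.clone(); }

    @Override public Charset getCharset() { return charset; }

    public static void main(String[] args) {
        byte[] raw = "\u00e9".getBytes(StandardCharsets.UTF_8); // two bytes: 0xC3 0xA9
        ByteAwareJsonString right = new SimpleByteAwareJsonString(raw, StandardCharsets.UTF_8);
        ByteAwareJsonString wrong = new SimpleByteAwareJsonString(raw, StandardCharsets.ISO_8859_1);
        System.out.println(right.getString().length() + " vs " + wrong.getString().length()); // prints "1 vs 2"
    }
}
```

The defensive clone() in the constructor and in getBytes() keeps the object immutable, but it is exactly the kind of extra copy the original request wants to avoid; returning the shared buffer would be cheaper and less safe.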

I think exposing bytes doesn't work well at the application level, for various reasons. One is character escaping: a JSON string in the input may contain escape sequences such as \n or \uXXXX, so the internal buffer cannot be exposed as-is. I think a custom provider and a subtype of JsonParser would be a better fit for this case.
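The escaping problem is easy to see at the byte level: the JSON text a\nb occupies four bytes between the quotes (a, backslash, n, b), while the string it denotes has three characters. The hypothetical JsonEscapes helper below (not from any library) reports whether a slice contains escapes, in which case the raw bytes could be handed out directly, and otherwise builds a new, unescaped UTF-8 byte array. It handles the short escapes from the JSON grammar and backslash-u escapes for characters outside the surrogate range; surrogate pairs are rejected rather than combined, to keep the sketch short.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class JsonEscapes {
    // A slice with no backslash already equals its decoded value and could be shared.
    static boolean hasEscapes(byte[] buf, int off, int len) {
        for (int i = off; i < off + len; i++) {
            if (buf[i] == '\\') return true;
        }
        return false;
    }

    // Copies the slice into a new UTF-8 byte array with escape sequences resolved.
    static byte[] unescape(byte[] buf, int off, int len) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(len);
        int end = off + len;
        for (int i = off; i < end; i++) {
            if (buf[i] != '\\') {
                out.write(buf[i]);
                continue;
            }
            if (++i >= end) throw new IllegalArgumentException("dangling backslash");
            int e = buf[i];
            switch (e) {
                case '"':  out.write('"'); break;
                case '\\': out.write('\\'); break;
                case '/':  out.write('/'); break;
                case 'b':  out.write('\b'); break;
                case 'f':  out.write('\f'); break;
                case 'n':  out.write('\n'); break;
                case 'r':  out.write('\r'); break;
                case 't':  out.write('\t'); break;
                case 'u': {
                    if (i + 4 >= end) throw new IllegalArgumentException("truncated escape");
                    int cp = Integer.parseInt(new String(buf, i + 1, 4, StandardCharsets.US_ASCII), 16);
                    if (Character.isSurrogate((char) cp)) {
                        throw new UnsupportedOperationException("surrogate pairs not handled in this sketch");
                    }
                    byte[] utf8 = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                    out.write(utf8, 0, utf8.length);
                    i += 4; // skip the four hex digits
                    break;
                }
                default: throw new IllegalArgumentException("invalid escape: " + (char) e);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = "a\\nb".getBytes(StandardCharsets.UTF_8); // JSON text a\nb, 4 bytes
        System.out.println(raw.length + " raw bytes, escapes: " + hasEscapes(raw, 0, raw.length));
        byte[] value = unescape(raw, 0, raw.length);
        System.out.println(value.length + " bytes after unescaping");
    }
}
```

Any byte-level API would therefore need either a copy like this for escaped strings or a way to tell the consumer that the bytes it receives are still escaped.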

I will start a thread on the users list. Please follow there.

Comment by jitu [ 30/Nov/12 ]

Resolving without any action, per the discussion.

Generated at Tue May 26 17:05:23 UTC 2015 using JIRA 6.2.3#6260-sha1:63ef1d6dac3f4f4d7db4c1effd405ba38ccdc558.