jaxb: JAXB-614

JAXB generates illegal XML characters

    Details

    • Issuezilla Id:
      614

      Description

      Some characters (such as 0x1f) that are legal in Java strings are illegal in XML
      (and XML 1.0 does not provide a way to escape such characters to make them legal).
      When JAXB marshals objects that contain these illegal characters in strings, it
      currently includes those characters in the XML, thus generating invalid XML.
      Later, when it comes time to unmarshal the XML back into objects, an exception
      is thrown because of the illegal character. This could spell disaster for a system
      that, for example, writes objects as XML or Fast Infoset into a database and then
      cannot read them back out later.
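      The read-side failure can be reproduced without JAXB. Below is a minimal sketch that uses only the JDK's built-in DOM parser (javax.xml.parsers). The class IllegalCharReadDemo and its parses method are hypothetical names, and the sketch assumes an XML 1.0 document with no prolog, which the parser reads as UTF-8. A raw 0x1F in element content is rejected. The numeric character reference &#x1F; is rejected too, so in XML 1.0 there is no way to escape the character.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether the JDK's DOM parser accepts a document.
class IllegalCharReadDemo {
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler ignores warnings and recoverable errors but rethrows
            // fatal (well-formedness) errors; it replaces the default console reporting.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed, e.g. it contains an illegal character
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        char us = (char) 0x1F; // the control character used in the report
        System.out.println(parses("<oneString>a</oneString>"));          // true
        System.out.println(parses("<oneString>" + us + "</oneString>"));  // false
        System.out.println(parses("<oneString>&#x1F;</oneString>"));      // false
    }
}
```

      Both illegal inputs surface as a SAXException. This is the same kind of well-formedness error the report describes hitting on unmarshal.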
      JAXB should not be allowed to ever generate invalid XML. If an exception is
      going to be thrown, it should be thrown when generating the XML, not when trying
      to decode it. So a minimum requirement should be that JAXB throw an exception
      when attempting to generate invalid XML, or that it should at least strip out
      the characters that would be invalid (or have a property on the marshaller that
      allows this to be set).
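      The two minimum behaviors asked for above (fail fast, or strip) can be sketched against the XML 1.0 Char production: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. This is an illustration under that assumption, not JAXB's API. XmlChars, requireLegal and stripIllegal are hypothetical names, and a caller would have to apply them to strings before marshalling.

```java
// Hypothetical helpers, not part of JAXB: validate or strip per the XML 1.0 Char production.
class XmlChars {
    // Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isLegal(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Fail fast: throw on the first illegal code point instead of emitting invalid XML.
    static String requireLegal(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i); // an unpaired surrogate is returned as itself and rejected
            if (!isLegal(cp)) {
                throw new IllegalArgumentException(
                        String.format("Illegal XML character U+%04X at index %d", cp, i));
            }
            i += Character.charCount(cp);
        }
        return s;
    }

    // Lossy fallback: drop every illegal code point.
    static String stripIllegal(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().filter(XmlChars::isLegal).forEach(sb::appendCodePoint);
        return sb.toString();
    }
}
```

      Stripping loses data, which is why the report treats it only as a fallback to throwing.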
      However, JAXB is also supposed to convert an object to XML and back
      losslessly, so an even better solution would be to escape the offending
      characters consistently, in such a way that when the strings are
      unmarshalled, the original string can be reconstructed.
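      A backslash-u scheme, like the one the reporter suggests next, becomes exactly lossless if the escape character itself is also escaped. Every backslash is written as two backslashes, and every illegal character is written as a backslash, 'u', and four hex digits. For any string that passed through escape(), decoding then recovers the original exactly, even if the string contained text that merely looks like an escape. XML produced by some other process is still not covered. This is a hypothetical sketch: LosslessXmlEscape is not part of JAXB, and the sketch assumes that both writer and reader apply it, for example from a custom XmlAdapter. Illegal characters never exceed U+FFFF, because all of U+10000 to U+10FFFF is legal, so four hex digits always suffice.

```java
// Hypothetical lossless escaping scheme, not part of JAXB.
class LosslessXmlEscape {
    // XML 1.0 Char production.
    private static boolean isLegal(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i); // an unpaired surrogate comes back as itself
            if (cp == '\\') {
                sb.append("\\\\"); // double every literal backslash
            } else if (!isLegal(cp)) {
                sb.append(String.format("\\u%04x", cp)); // backslash, 'u', 4 hex digits
            } else {
                sb.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    // Assumes input produced by escape(); malformed input throws a RuntimeException.
    static String unescape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\') {
                sb.append(c);
                continue;
            }
            char next = s.charAt(++i);
            if (next == '\\') {
                sb.append('\\');
            } else if (next == 'u') {
                sb.append((char) Integer.parseInt(s.substring(i + 1, i + 5), 16));
                i += 4; // skip the four hex digits
            } else {
                throw new IllegalArgumentException("unknown escape at index " + (i - 1));
            }
        }
        return sb.toString();
    }
}
```

      Because every literal backslash is doubled on the way out, a backslash followed by 'u' in the escaped text can only have come from an escaped illegal character.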
      It should be straightforward to come up with an escaping scheme that
      guarantees lossless translation from Strings to XML and back (e.g., convert 0x1f
      to "\u001f" or "JAXB_UNICODE_001f" or something unlikely to appear by
      accident). I don't know that it's possible to guarantee that XML generated
      through some other process won't ever be accidentally interpreted as containing
      "escaped" strings, but it can be made very unlikely.
      Below is the simplest unit test I could come up with that exposes the problem.

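      import java.io.ByteArrayInputStream;
      import java.io.ByteArrayOutputStream;
      import javax.xml.bind.JAXBContext;
      import javax.xml.bind.JAXBException;
      import javax.xml.bind.Marshaller;
      import javax.xml.bind.annotation.XmlRootElement;

      // JUnit test method; assertEquals comes from JUnit.
      public void testBinary() throws JAXBException {
          JAXBContext jxbc = JAXBContext.newInstance(OneString.class);
          OneString orig = new OneString();
          orig.setString("\u001f");   // legal in a Java String, illegal in XML 1.0
          ByteArrayOutputStream s = new ByteArrayOutputStream();
          Marshaller m = jxbc.createMarshaller();
          m.marshal(orig, s);         // succeeds, but writes the raw 0x1F character
          String xml = s.toString();
          OneString result = (OneString) jxbc.createUnmarshaller()
                  .unmarshal(new ByteArrayInputStream(xml.getBytes())); // throws here
          assertEquals("\u001f", result.getString());
      }

      @XmlRootElement(name = "oneString")
      private static class OneString {
          String string;

          public String getString() { return string; }

          public void setString(String s) { this.string = s; }
      }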

      There are workarounds for this issue (e.g., at http://tinyurl.com/cq9u58), but as
      things stand, this is a dangerous bug that can make data unreadable.

        Activity

        ranboii added a comment -

        Actually, I meant to post this URL demonstrating a workaround:
        http://blog.lesc.se/2009/03/escape-illegal-characters-with-jaxb-xml.html
        which at least illustrates I'm not the only one to have seen this issue. Of
        course, a fix would be much better than a workaround.

        Pavel Bucek added a comment -

        Partially fixed in trunk.

        An IllegalArgumentException should now be thrown whenever you try to marshal a
        string with invalid XML content. But there is a catch: invalid characters can
        occur when UTF-32 is used (this happens because its characters are encoded to
        UTF-16, which is Java's native encoding).

        Anyway, it is still far from perfect and needs some additional work.

        Adjusting priority and assigning to myself.
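        One plausible reading of the UTF-32 caveat is this. A character above U+FFFF, such as U+1F600, is stored in a Java String as a surrogate pair, and surrogate code units on their own are not legal XML characters. A check that tests each char separately would therefore reject legal text. A check over code points accepts the pair and still rejects an unpaired surrogate. The sketch below contrasts the two. The class and method names are hypothetical, and this is not the code that went into trunk.

```java
// Hypothetical comparison of two ways to validate a Java String against XML 1.0.
class SurrogateAwareCheck {
    // XML 1.0 Char production.
    static boolean isLegal(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Naive: tests each UTF-16 code unit, so both halves of a surrogate pair fail.
    static boolean allLegalPerChar(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!isLegal(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Decodes surrogate pairs into code points first; an unpaired surrogate still fails.
    static boolean allLegalPerCodePoint(String s) {
        return s.codePoints().allMatch(SurrogateAwareCheck::isLegal);
    }

    public static void main(String[] args) {
        String emoji = new String(Character.toChars(0x1F600)); // one character, two chars
        System.out.println(allLegalPerChar(emoji));      // false
        System.out.println(allLegalPerCodePoint(emoji)); // true
    }
}
```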

        Pavel Bucek added a comment -

        reassigning

        mnsam added a comment -

        Based on the workaround, is this the list of unsupported characters?

        "\u0000\u0001\u0002\u0003\u0004\u0005" +
        "\u0006\u0007\u0008\u000B\u000C\u000E\u000F\u0010\u0011\u0012" +
        "\u0013\u0014\u0015\u0016\u0017\u0018\u0019\u001A\u001B\u001C" +
        "\u001D\u001E\u001F\uFFFE\uFFFF"

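        When checked against the XML 1.0 Char production, the list above contains exactly the single BMP characters outside the surrogate block (U+D800 to U+DFFF) that are illegal. The sketch below derives the list mechanically, and IllegalBmpChars is a hypothetical name. A fixed list like this cannot express the surrogate case. A surrogate code unit is illegal when unpaired but fine as half of a valid pair, so a filter built only from this list would let unpaired surrogates through.

```java
// Hypothetical helper: derives the list of illegal BMP characters from the XML 1.0 Char production.
class IllegalBmpChars {
    static boolean isLegal(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    static String illegalNonSurrogateBmp() {
        StringBuilder sb = new StringBuilder();
        for (int c = 0; c <= 0xFFFF; c++) {
            if (Character.isSurrogate((char) c)) {
                continue; // legal only as half of a valid pair, so no fixed list can cover it
            }
            if (!isLegal(c)) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String illegal = illegalNonSurrogateBmp();
        System.out.println(illegal.length()); // 31
        illegal.chars().forEach(c -> System.out.print(Integer.toHexString(c) + " "));
        System.out.println();
    }
}
```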

          People

          • Assignee:
            Martin Grebac
            Reporter:
            ranboii
          • Votes:
            4
            Watchers:
            1
