JAXB-960

JAXB generates invalid XML (includes characters illegal in XML 1.0)

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.2.6
    • Fix Version/s: None
    • Component/s: runtime
    • Labels: None

      Description

      As per the XML spec [1], the following characters are legal in XML 1.0:

      #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
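The Char production above translates directly into a range check. The following is a minimal sketch, not part of JAXB; the class and method names (`Xml10Chars`, `isLegal`, `indexOfFirstIllegal`) are made up for illustration.

```java
final class Xml10Chars {
    private Xml10Chars() {}

    /** Returns true if the code point matches the XML 1.0 Char production. */
    static boolean isLegal(int codePoint) {
        return codePoint == 0x9
            || codePoint == 0xA
            || codePoint == 0xD
            || (codePoint >= 0x20 && codePoint <= 0xD7FF)
            || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
            || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    /** Returns the char index of the first illegal code point in s, or -1 if there is none. */
    static int indexOfFirstIllegal(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isLegal(cp)) {
                return i;
            }
            // Step over both chars of a surrogate pair when needed.
            i += Character.charCount(cp);
        }
        return -1;
    }
}
```

Iterating by code point means an unpaired surrogate is reported as illegal, while a well-formed surrogate pair is checked against the supplementary range.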

      However, JAXB accepts other, illegal characters in input strings (e.g. the bell character 0x0007 or the vertical tab 0x000B) and marshals them into the output XML without any error or warning.

      I know the solution is not to escape them, since they are illegal regardless of whether they are escaped or not (see JAXB-226), but the fact that JAXB generates invalid (and unparseable) XML without any sort of error or warning seems wrong to me.
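To illustrate the "unparseable" part, here is a small sketch that feeds such output to the JDK's own JAXP parser. It does not involve JAXB itself; the class name `IllegalCharParseCheck` and the helper `parses` are hypothetical, and the sample documents stand in for what a marshaller might have written.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class IllegalCharParseCheck {
    private IllegalCharParseCheck() {}

    /** Returns true if the JAXP parser accepts the document, false if it reports a parse error. */
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O problems are unrelated to the check itself.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("<a>ok</a>"));
        // Raw bell character (U+0007) in element content.
        System.out.println(parses("<a>" + (char) 0x7 + "</a>"));
        // The same character written as a numeric character reference.
        System.out.println(parses("<a>&#7;</a>"));
    }
}
```

Both the raw character and the numeric character reference are rejected for an XML 1.0 document, which is why escaping is not a fix.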

      There are a number of workarounds out in the wild [2, 3] that rely on replacing the illegal characters with legal characters (e.g. space 0x0020, or replacement character 0xFFFD). Another option would be to eat the illegal characters and just not write them to the output.
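Both workaround styles (replace or omit) can be sketched as a string pre-processing step applied before the value reaches the marshaller. This is only an illustration of the idea, not the code from the linked posts; `XmlSanitizer`, `replaceIllegal` and `omitIllegal` are hypothetical names.

```java
final class XmlSanitizer {
    private XmlSanitizer() {}

    private static boolean isLegalXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Replaces every code point that is illegal in XML 1.0 with the given replacement text. */
    static String replaceIllegal(String s, String replacement) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (isLegalXml10(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append(replacement);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    /** Drops every code point that is illegal in XML 1.0. */
    static String omitIllegal(String s) {
        return replaceIllegal(s, "");
    }
}
```

Omission is simply replacement with the empty string, so one loop covers both approaches. Whether to replace or omit is a policy choice: replacement keeps a visible trace that data was altered, omission does not.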

      Regardless of the approach, I think it would be a good idea to at least provide an out-of-the-box way for users to ensure the correctness of JAXB-generated XML. Some options:

      • Add another property that can be used via Marshaller.setProperty(String, Object) to replace invalid characters with another character ("com.sun.xml.bind.illegalCharacterReplacement"?)
      • Add another property that can be used via Marshaller.setProperty(String, Object) to eat invalid characters ("com.sun.xml.bind.omitIllegalCharacters"?)
      • Enhance the out-of-the-box CharacterEscapeHandler classes to allow for this sort of replacement / omission.
      • Something else?
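Until something like the options above exists, one stopgap that needs no JAXB-specific API is to filter the character stream the marshaller writes to. The sketch below is an assumption-laden illustration: `IllegalCharReplacingWriter` and its `filter` helper are hypothetical names, it assumes the marshaller writes the offending characters raw (not as numeric references), and it passes surrogates through on the assumption that they are correctly paired.

```java
import java.io.FilterWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

final class IllegalCharReplacingWriter extends FilterWriter {
    private final char replacement;

    IllegalCharReplacingWriter(Writer out, char replacement) {
        super(out);
        this.replacement = replacement;
    }

    // Char-level check; surrogates are let through (assumed to be correctly paired).
    private static boolean isAllowed(char c) {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || Character.isSurrogate(c)
            || (c >= 0xE000 && c <= 0xFFFD);
    }

    @Override
    public void write(int c) throws IOException {
        char ch = (char) c;
        out.write(isAllowed(ch) ? ch : replacement);
    }

    @Override
    public void write(char[] cbuf, int off, int len) throws IOException {
        // Copy so the caller's buffer is never modified.
        char[] copy = new char[len];
        for (int i = 0; i < len; i++) {
            char ch = cbuf[off + i];
            copy[i] = isAllowed(ch) ? ch : replacement;
        }
        out.write(copy, 0, len);
    }

    @Override
    public void write(String str, int off, int len) throws IOException {
        write(str.substring(off, off + len).toCharArray(), 0, len);
    }

    /** Convenience helper for trying the filter on a string. */
    static String filter(String text, char replacement) {
        StringWriter sink = new StringWriter();
        try (Writer w = new IllegalCharReplacingWriter(sink, replacement)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toString();
    }
}
```

Such a writer could be wrapped around the target passed to Marshaller.marshal(Object, Writer). It works on the serialized output rather than on the object model, so it cannot distinguish element content from markup; that is harmless here only because well-formed markup never contains the filtered characters.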

      [1] http://www.w3.org/TR/REC-xml/#NT-Char
      [2] http://blog.lesc.se/2009/03/escape-illegal-characters-with-jaxb-xml.html
      [3] http://camel.apache.org/jaxb.html#JAXB-IgnoringtheNonXMLCharacter

        Activity

        Martin Grebac added a comment -

        Yardo, correct me if I'm wrong but we use JAXP for validating what we read/write. Thus, if valid, I think the issue should be filed against JAXP instead?

          People

          • Assignee: Iaroslav Savytskyi
          • Reporter: gredler
          • Votes: 0
          • Watchers: 1

            Dates

            • Created:
              Updated: