Affects Version/s: 2.2.6
Fix Version/s: None
As per the XML spec, the following characters are legal in XML 1.0:
Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
However, JAXB accepts other, illegal characters in input strings (e.g. the bell character 0x0007 or the vertical tab 0x000B) and marshals them into the output XML without any error or warning.
I know escaping them is not the solution, since they are illegal whether escaped or not (see JAXB-226), but the fact that JAXB generates invalid (and unparseable) XML without any sort of error or warning seems wrong to me.
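For reference, the XML 1.0 Char production can be checked per Unicode code point. The following is a minimal Java sketch, not JAXB API: the class and method names (XmlChars, isLegalXml10, firstIllegalIndex) are hypothetical. It shows the kind of check that could let a marshaller fail with an error instead of emitting unparseable XML. It walks code points rather than chars, so valid surrogate pairs count as legal supplementary characters, while lone surrogates are reported as illegal.

```java
public final class XmlChars {
    private XmlChars() {}

    /** Hypothetical helper: true if cp matches the XML 1.0 Char production. */
    public static boolean isLegalXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Hypothetical helper: index of the first illegal code point in s, or -1 if none. */
    public static int firstIllegalIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i); // a lone surrogate is returned as-is and fails the check
            if (!isLegalXml10(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(isLegalXml10(0x0007));                // false (bell)
        System.out.println(isLegalXml10(0x000B));                // false (vertical tab)
        System.out.println(isLegalXml10(0x0020));                // true (space)
        System.out.println(firstIllegalIndex("ok\u0007bad"));    // 2
    }
}
```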
There are a number of workarounds out in the wild [2, 3] that rely on replacing the illegal characters with legal characters (e.g. space 0x0020, or replacement character 0xFFFD). Another option would be to eat the illegal characters and just not write them to the output.
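Both workarounds described above (replace or omit) can be sketched in plain Java. This is a minimal, self-contained illustration, not the code from the referenced workarounds and not a JAXB feature. The class name XmlSanitizer and the convention that a null replacement means "omit" are hypothetical choices made for this sketch. The replacement is validated so that it cannot reintroduce an illegal character.

```java
public final class XmlSanitizer {
    private XmlSanitizer() {}

    /** XML 1.0 Char production, checked per code point. */
    static boolean isLegalXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Hypothetical helper: returns s with every code point that is illegal in
     * XML 1.0 replaced by `replacement`, or dropped when `replacement` is null.
     */
    public static String sanitize(String s, String replacement) {
        if (replacement != null) {
            // Reject a replacement that would itself produce invalid XML.
            for (int j = 0; j < replacement.length(); ) {
                int rc = replacement.codePointAt(j);
                if (!isLegalXml10(rc)) {
                    throw new IllegalArgumentException("replacement contains an illegal XML 1.0 character");
                }
                j += Character.charCount(rc);
            }
        }
        StringBuilder sb = null; // allocated lazily on the first illegal code point
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (isLegalXml10(cp)) {
                if (sb != null) {
                    sb.appendCodePoint(cp);
                }
            } else {
                if (sb == null) {
                    sb = new StringBuilder(s.length());
                    sb.append(s, 0, i); // copy the legal prefix seen so far
                }
                if (replacement != null) {
                    sb.append(replacement);
                }
            }
            i += Character.charCount(cp);
        }
        return sb == null ? s : sb.toString(); // unchanged input is returned as-is
    }

    public static void main(String[] args) {
        System.out.println(sanitize("a\u0007b\u000Bc", " "));  // a b c
        System.out.println(sanitize("a\u0007b\u000Bc", null)); // abc
    }
}
```

How to wire this into JAXB is left open here; one option might be an XmlAdapter<String, String> whose marshal method calls sanitize, but that is an assumption on top of JAXB rather than something it provides out of the box, which is the gap this issue describes.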
Regardless of the approach, I think it would be a good idea to at least provide an out-of-the-box way for users to ensure that JAXB-generated XML is well-formed. Some options:
- Add another property that can be used via Marshaller.setProperty(String, Object) to replace invalid characters with another character ("com.sun.xml.bind.illegalCharacterReplacement"?)
- Add another property that can be used via Marshaller.setProperty(String, Object) to eat invalid characters ("com.sun.xml.bind.omitIllegalCharacters"?)
- Enhance the out-of-the-box CharacterEscapeHandler classes to allow for this sort of replacement / omission.
- Something else?