sjsxp / SJSXP-78

XMLStreamWriter emits unsupported character references for supplementary characters

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Environment:

      Java 1.6

      Description

      When writing code points in the supplementary range (U+10000 and above) to an encoding that cannot represent them (e.g. US-ASCII), XMLStreamWriter does not emit the correct character references.

      out.writeCharacters("\uD835\uDD0A"); //U+1D50A

      The code above should emit the character reference "&#x1D50A;" or "&#120074;". Instead, it emits two separate character references, one for each half of the surrogate pair (0xD835 and 0xDD0A). (See Supplementary.java)
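      The expected reference follows directly from the code point that the surrogate pair encodes. The sketch below is illustrative only and is not the attached Supplementary.java; the class name ExpectedReference and its helper methods are hypothetical.

```java
public class ExpectedReference {

    /** Hexadecimal character reference for a code point (hypothetical helper). */
    static String hexReference(int codePoint) {
        return String.format("&#x%X;", codePoint);
    }

    /** Decimal character reference for a code point (hypothetical helper). */
    static String decimalReference(int codePoint) {
        return "&#" + codePoint + ";";
    }

    public static void main(String[] args) {
        String text = "\uD835\uDD0A"; // U+1D50A as a surrogate pair

        // Two UTF-16 code units, but a single code point.
        System.out.println(text.length());                         // 2
        System.out.println(text.codePointCount(0, text.length())); // 1

        int codePoint = text.codePointAt(0);
        System.out.println(hexReference(codePoint));     // &#x1D50A;
        System.out.println(decimalReference(codePoint)); // &#120074;
    }
}
```

      A writer that escapes each char separately sees 0xD835 and 0xDD0A rather than 0x1D50A, which is consistent with the behaviour reported here.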

      The XML spec says of supported code points: "any Unicode character, excluding the surrogate blocks, FFFE, and FFFF." The surrogate character references emitted by the writer are marked as errors in the W3C online validator.
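      For reference, the rule quoted above corresponds to the Char production in XML 1.0. A minimal sketch of that check, assuming XML 1.0 rather than 1.1 (the class and method names are hypothetical):

```java
public class XmlChars {

    /**
     * Returns true if the code point matches the XML 1.0 Char production:
     * #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
     */
    static boolean isXmlChar(int codePoint) {
        return codePoint == 0x9
            || codePoint == 0xA
            || codePoint == 0xD
            || (codePoint >= 0x20 && codePoint <= 0xD7FF)
            || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
            || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    public static void main(String[] args) {
        System.out.println(isXmlChar(0x1D50A)); // true: supplementary code point
        System.out.println(isXmlChar(0xD835));  // false: high surrogate
        System.out.println(isXmlChar(0xDD0A));  // false: low surrogate
    }
}
```

      A character reference must also refer to a character matching this production, so a reference to either surrogate on its own is not well-formed.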

      See AsciiCanEncode.java for naive escape code that iterates over code points instead of code units.
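      The attachment itself is not reproduced on this page. As a rough idea of what iterating over code points could look like, here is a minimal sketch that uses CharsetEncoder.canEncode to decide when to escape. The class name CodePointEscaper and the escape method are hypothetical, and the sketch does not escape markup characters such as "&" or "<".

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class CodePointEscaper {

    /**
     * Replaces every code point the charset cannot encode with a
     * hexadecimal character reference. Illustrative only.
     */
    static String escape(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);      // joins a surrogate pair
            int length = Character.charCount(codePoint);
            if (codePoint >= 0xD800 && codePoint <= 0xDFFF) {
                // codePointAt returns a lone surrogate unchanged; it has no
                // valid XML representation, so reject it.
                throw new IllegalArgumentException("Unpaired surrogate at index " + i);
            }
            String unit = text.substring(i, i + length);
            if (encoder.canEncode(unit)) {
                out.append(unit);
            } else {
                out.append("&#x")
                   .append(Integer.toHexString(codePoint).toUpperCase())
                   .append(';');
            }
            i += length;                              // advance by code point, not by char
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Charset ascii = Charset.forName("US-ASCII");
        System.out.println(escape("A\uD835\uDD0AB", ascii)); // A&#x1D50A;B
    }
}
```

      The key point is that the loop advances by Character.charCount(codePoint), so both halves of a surrogate pair are consumed together and produce a single reference.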

      1. AsciiCanEncode.java
        0.7 kB
        mcdowell
      2. Supplementary.java
        0.4 kB
        mcdowell

        Activity

        There are no comments yet on this issue.

          People

          • Assignee:
            Unassigned
          • Reporter:
            mcdowell