[JAXP-76] StaX: data corruption when reading Unicode SMP characters in UTF-8 XML Created: 17/Jan/13  Updated: 09/Apr/15  Resolved: 09/Apr/15

Status: Closed
Project: jaxp
Component/s: None
Affects Version/s: current
Fix Version/s: None

Type: Bug
Priority: Major
Reporter: donvip
Assignee: Unassigned
Resolution: Won't Fix
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: JRE 7u11

 Description   

The attached small XML file contains a Chinese character and the first Gothic character (U+10330: http://www.unicode.org/charts/PDF/U10330.pdf).

When this file is parsed with StAX, the attribute value containing the Gothic character is corrupted: it also contains the Chinese character from the previous attribute.

See the console output, which prints each attribute value as its UTF-8 bytes (signed Java byte values):

From XML chinese:[-16, -92, -83, -94]
Expected chinese:[-16, -92, -83, -94]
From XML gothic:[-16, -92, -83, -94, -16, -112, -116, -80]
Expected gothic:[-16, -112, -116, -80]
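
For readers without access to the attachments, here is a minimal sketch of how such a comparison could be set up. It is not the original Test.java: the class name, the element name `tag`, the attribute names, and the in-memory XML layout are all assumptions made for illustration, and this layout is not guaranteed to trigger the corruption on an affected JRE. The Chinese code point U+24B62 is inferred from the bytes printed above; U+10330 is the Gothic character named in the description.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical reproduction sketch; names and XML layout are assumptions.
class SupplementaryAttributeCheck {
    // Both characters lie outside the BMP, so each is a surrogate pair in Java.
    static final String CHINESE = new String(Character.toChars(0x24B62));
    static final String GOTHIC = new String(Character.toChars(0x10330));

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Parses a one-element UTF-8 document with StAX and returns
    // the two attribute values as { chinese, gothic }.
    static String[] readAttributes() throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<tag chinese=\"" + CHINESE + "\" gothic=\"" + GOTHIC + "\"/>";
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new ByteArrayInputStream(utf8(xml)));
        String[] values = new String[2];
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    values[0] = reader.getAttributeValue(null, "chinese");
                    values[1] = reader.getAttributeValue(null, "gothic");
                }
            }
        } finally {
            reader.close();
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        String[] fromXml = readAttributes();
        System.out.println("From XML chinese:" + Arrays.toString(utf8(fromXml[0])));
        System.out.println("Expected chinese:" + Arrays.toString(utf8(CHINESE)));
        System.out.println("From XML gothic:" + Arrays.toString(utf8(fromXml[1])));
        System.out.println("Expected gothic:" + Arrays.toString(utf8(GOTHIC)));
    }
}
```

On a runtime whose parser handles supplementary characters correctly, each "From XML" line should match the "Expected" line that follows it; a mismatch on the gothic line would correspond to the corruption described here.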

This issue comes from the JOSM bug tracker: http://josm.openstreetmap.de/ticket/3290



 Comments   
Comment by donvip [ 17/Jan/13 ]

Sorry, how do we attach files? If I am not allowed to, they can be found here:

http://josm.openstreetmap.de/attachment/ticket/3290/gottic.osm
http://josm.openstreetmap.de/attachment/ticket/3290/Test.java

Comment by donvip [ 16/Sep/13 ]

Does anyone care about this public JIRA?

Comment by donvip [ 29/Nov/14 ]

This bug has finally been addressed through https://bugs.openjdk.java.net/browse/JDK-8058175. Any chance of seeing it backported to JDK 7 and JDK 8?

Comment by Joe Wang [ 09/Apr/15 ]

Please note that the JAXP standalone (https://jaxp.java.net/) was retired. Please report issues to the OpenJDK Bug System (https://bugs.openjdk.java.net).

Generated at Sat Jul 04 14:41:30 UTC 2015 using JIRA 6.2.3#6260-sha1:63ef1d6dac3f4f4d7db4c1effd405ba38ccdc558.