
Key: JAVACC-50
Type: Bug
Status: Closed
Resolution: Cannot Reproduce
Priority: Major
Assignee: sriram
Reporter: hfaber
Votes: 0
Watchers: 0
Environment:
Operating System: Linux
Platform: PC
When an ISO-8859-1 character that is not in the ASCII subset is used in the .jjt
file, JJTree replaces it in the generated .jj file with the Unicode escape sequence
\ufffd, which makes the language description incorrect.
Example:
In the .jjt file:
TOKEN :
{
<UPPERCASE_LETTER : ["A"-"Z",
"À", "Á", "Â", "Ã", "Ä", "Å", "Æ",
"Ç", "È", "É", "Ê", "Ë", "Ì", "Í", "Î", "Ï",
"Ð", "Ñ", "Ò", "Ó", "Õ", "Ô", "Ö", "Ø",
"Ù", "Ú", "Û", "Ü", "Ý", "Þ"] >
}
becomes the following in the generated .jj file, which is clearly not equivalent:
TOKEN :
{
<UPPERCASE_LETTER : ["A"-"Z",
"\ufffd", "\ufffd", "\ufffd", "\ufffd", "\ufffd",
"\ufffd", "\ufffd",
"\ufffd", "\ufffd", "\ufffd", "\ufffd", "\ufffd",
"\ufffd", "\ufffd", "\ufffd", "\ufffd",
"\ufffd", "\ufffd", "\ufffd", "\ufffd", "\ufffd",
"\ufffd", "\ufffd", "\ufffd",
"\ufffd", "\ufffd", "\ufffd", "\ufffd", "\ufffd",
"\ufffd"] >
}
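Every accented letter collapsing into the same \ufffd is consistent with the grammar's bytes being decoded using the wrong charset. This is an assumption about the cause, not a confirmed diagnosis of JJTree. The hypothetical Java sketch below illustrates the effect. It decodes the ISO-8859-1 bytes for "À" (0xC0) and "Á" (0xC1) once as ISO-8859-1 and once as UTF-8. Both bytes are invalid in UTF-8, so each one is replaced by U+FFFD. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Sketch (assumption): shows how ISO-8859-1 bytes can turn into U+FFFD when
// they are decoded with the wrong charset. All names here are hypothetical.
public class EncodingMismatchDemo {

    // Bytes 0xC0 and 0xC1 as they appear in an ISO-8859-1 encoded file
    // (the first two non-ASCII letters of the token above).
    static final byte[] LATIN1_BYTES = { (byte) 0xC0, (byte) 0xC1 };

    // Correct: decode with the charset the file was written in.
    static String decodeAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Mismatched: 0xC0 and 0xC1 are never valid in UTF-8, so the decoder
    // substitutes the replacement character U+FFFD for each of them.
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Renders each char as U+XXXX so the console output stays plain ASCII.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("ISO-8859-1: " + describe(decodeAsLatin1(LATIN1_BYTES)));
        System.out.println("UTF-8:      " + describe(decodeAsUtf8(LATIN1_BYTES)));
    }
}
```

Running it should print "ISO-8859-1: U+00C0 U+00C1" followed by "UTF-8:      U+FFFD U+FFFD". Once the bytes have been decoded this way, the original letters cannot be recovered.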
I am a bit confused. Are you using the non-ASCII characters literally in the
grammar file? If so, that could be the cause, because JavaCC does not open the
grammar file with a reader that uses the correct encoding. I suggest you use the
\uxxxx notation for all the non-ASCII characters and see if the problem goes away.
So while this is a bug in general, there may also be other issues in your grammar.
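The workaround suggested above keeps the grammar file pure ASCII by writing each non-ASCII character as a \uxxxx escape, for example "\u00c0" instead of "À". The helper below is hypothetical and not part of JavaCC; it is a minimal sketch that generates those escapes. As an example, it rebuilds the UPPERCASE_LETTER alternatives for U+00C0 to U+00DE and skips U+00D7, the multiplication sign, which is not a letter. That yields the 30 non-ASCII letters the token lists. The code uses lowercase hex digits, which Java-style Unicode escapes accept.

```java
// Hypothetical helper (not part of JavaCC): rewrites non-ASCII characters as
// Java-style unicode escapes so a grammar file can be kept pure ASCII.
public class GrammarEscaper {

    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                sb.append(c);                                  // ASCII: copy unchanged
            } else {
                sb.append(String.format("\\u%04x", (int) c)); // backslash, 'u', 4 hex digits
            }
        }
        return sb.toString();
    }

    // Builds the quoted, comma-separated alternatives for U+00C0..U+00DE,
    // skipping U+00D7 (the multiplication sign).
    static String latin1UppercaseAlternatives() {
        StringBuilder list = new StringBuilder();
        for (char c = 0xC0; c <= 0xDE; c++) {
            if (c == 0xD7) continue;
            if (list.length() > 0) list.append(", ");
            list.append('"').append(escapeNonAscii(String.valueOf(c))).append('"');
        }
        return list.toString();
    }

    public static void main(String[] args) {
        // Prints an ASCII-only version of the UPPERCASE_LETTER token body.
        System.out.println("<UPPERCASE_LETTER : [\"A\"-\"Z\", "
                + latin1UppercaseAlternatives() + "] >");
    }
}
```

Pasting the printed line into the grammar avoids relying on how JavaCC decodes the file. An ASCII-only file reads the same under ISO-8859-1, UTF-8 and most other common encodings.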