scripting / SCRIPTING-39

JRuby: wrong charset encoding used for writer

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: current
    • Fix Version/s: milestone 1
    • Component/s: www
    • Labels:
      None
    • Environment:

      Operating System: All
      Platform: All

    • Issuezilla Id:
      39

      Description

      Test program:

      import java.io.StringWriter;
      import javax.script.ScriptContext;
      import javax.script.ScriptEngine;
      import javax.script.ScriptEngineManager;
      import javax.script.SimpleScriptContext;

      public class EncodingTest { // class name is arbitrary

          public static void main(String[] args) throws Exception {
              ScriptEngineManager manager = new ScriptEngineManager();
              ScriptEngine engine = manager.getEngineByExtension("rb");
              String testString = "\u30C6\u30B9\u30C8"; // katakana "tesuto" ("test")
              System.out.println("FYI, file.encoding = " + System.getProperty("file.encoding"));

              // Works:
              System.out.print("Evaluating string and then writing to System.out: ");
              System.out.println(engine.eval("'" + testString + "'"));

              // Doesn't work:
              System.out.print("Evaluating puts of string: ");
              engine.eval("puts '" + testString + "'");

              // Doesn't work:
              ScriptContext context = new SimpleScriptContext();
              StringWriter stringWriter = new StringWriter();
              context.setWriter(stringWriter);
              engine.eval("puts '" + testString + "'", context);
              System.out.print("Evaluating puts of string to StringWriter and then writing to System.out: ");
              System.out.print(stringWriter.toString());
          }
      }

      All three of these should output the same string but in practice only the first
      one does.

      I ran a debugger and confirmed that WriterOutputStream is using Cp1252
      (windows-1252). On my system, sun.jnu.encoding is Cp1252, but file.encoding is
      UTF-8. The jumbled text looks like UTF-8 bytes decoded as Cp1252, i.e. JRuby
      writes UTF-8 as specified by file.encoding, while the engine's stream reads those
      bytes back as Cp1252. So the script engine's getEncoding() may be checking the
      two properties the wrong way around, and the logic should be swapped.
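The suspected mismatch can be reproduced without JRuby. This minimal sketch (not from the report) encodes the test string as UTF-8, as JRuby is assumed to do, and decodes the bytes as windows-1252, Java's canonical name for Cp1252, as the WriterOutputStream is reported to do. MojibakeDemo, misdecode and repair are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: reproduces the UTF-8 vs. Cp1252 mismatch without JRuby.
public class MojibakeDemo {

    static final Charset CP1252 = Charset.forName("windows-1252"); // alias: Cp1252

    // Bytes produced as UTF-8 but read back as Cp1252.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, CP1252);
    }

    // Undoes the damage: re-encode as Cp1252 to recover the original bytes, then decode as UTF-8.
    // This only round-trips when every byte has a Cp1252 mapping, which holds for this input.
    static String repair(String garbled) {
        return new String(garbled.getBytes(CP1252), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String testString = "\u30C6\u30B9\u30C8";
        String garbled = misdecode(testString);
        System.out.println("garbled:  " + garbled);
        System.out.println("repaired: " + repair(garbled));
    }
}
```

Each katakana character is three bytes in UTF-8, so it turns into three Cp1252 characters (nine in total), which fits the "jumbled text looks like UTF-8" observation.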

      A possible workaround is to set both properties to UTF-8, but I'm not sure what
      impact that would have, or whether it is safe to change the sun.jnu.encoding property.

        Activity

        trejkaz added a comment -

        Actually, on further testing, regardless of the values of these two properties,
        JRuby always outputs UTF-8 to the writer.

        So the fix would appear to be to ignore both system properties entirely and
        hard-code UTF-8.

        trejkaz added a comment -

        Nope, I was wrong. It is using file.encoding after all; it's just that changing
        file.encoding at the top of main does nothing. Passing a value on the command
        line (e.g. -Dfile.encoding=UTF-8) does take effect, and JRuby uses it. So the
        real fix seems to be making the script engine follow the same logic. The test
        then passes, as long as the file encoding can actually encode the text.
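The observation that setting file.encoding inside main has no effect can be checked directly. This sketch assumes an OpenJDK-style runtime, where Charset.defaultCharset() is computed once and then cached, so changing the property afterwards leaves it unchanged; a -Dfile.encoding=... option on the java command line is applied before any application code runs. FileEncodingCheck and runtimeChangeIgnored are hypothetical names.

```java
import java.nio.charset.Charset;

// Hypothetical check: does changing file.encoding at runtime change the default charset?
public class FileEncodingCheck {

    static boolean runtimeChangeIgnored() {
        Charset before = Charset.defaultCharset(); // cached after first use
        String original = System.getProperty("file.encoding");
        // Pick a value different from the current default.
        String other = before.name().equals("UTF-8") ? "ISO-8859-1" : "UTF-8";
        try {
            System.setProperty("file.encoding", other); // the "top of main" approach
            return Charset.defaultCharset().equals(before);
        } finally {
            // Restore the property so the rest of the program sees the original value.
            if (original == null) {
                System.clearProperty("file.encoding");
            } else {
                System.setProperty("file.encoding", original);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("file.encoding  = " + System.getProperty("file.encoding"));
        System.out.println("defaultCharset = " + Charset.defaultCharset());
        System.out.println("runtime change ignored: " + runtimeChangeIgnored());
        // To change the encoding for real, pass it at launch, e.g.:
        //   java -Dfile.encoding=UTF-8 FileEncodingCheck
    }
}
```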

        yokolet added a comment -

        Hi,

        I don't understand what value you passed on the command line or exactly what
        you did. Did you try it with JRuby itself rather than the JRuby engine?

        -Yoko

        yokolet added a comment -

        I updated the code in the CVS repository so that the Writer gets its encoding
        from the file.encoding system property. Does this fix the problem?
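The comment does not show the change itself. Below is a minimal sketch of the idea under stated assumptions: like the WriterOutputStream mentioned in the description, an OutputStream adapter decodes the bytes JRuby writes and forwards characters to the context's Writer, here using the charset named by file.encoding. DecodingOutputStream, EngineWriterSketch and their methods are hypothetical, not the code committed to CVS, and the fallback to Charset.defaultCharset() when the property is missing or unsupported is an added assumption.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;

// Hypothetical adapter: decodes bytes with one fixed charset and forwards the chars to a Writer.
class DecodingOutputStream extends OutputStream {
    private final Writer out;
    private final CharsetDecoder decoder;
    private final ByteBuffer pending = ByteBuffer.allocate(64); // bytes not yet decoded
    private final CharBuffer chars = CharBuffer.allocate(64);   // decoded chars not yet written

    DecodingOutputStream(Writer out, Charset charset) {
        this.out = out;
        this.decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
    }

    @Override
    public void write(int b) throws IOException {
        pending.put((byte) b);
        decode(false);
    }

    @Override
    public void flush() throws IOException {
        out.flush();
    }

    @Override
    public void close() throws IOException {
        decode(true); // leftover bytes are final input (replaced with U+FFFD if incomplete)
        CoderResult r;
        do {
            r = decoder.flush(chars);
            emit();
        } while (r.isOverflow());
        out.close();
    }

    // Decodes every complete character; an incomplete multi-byte sequence stays in `pending`.
    private void decode(boolean endOfInput) throws IOException {
        pending.flip();
        CoderResult r;
        do {
            r = decoder.decode(pending, chars, endOfInput);
            emit();
        } while (r.isOverflow());
        pending.compact();
    }

    // Writes whatever is in the char buffer to the target Writer and empties the buffer.
    private void emit() throws IOException {
        chars.flip();
        out.write(chars.array(), 0, chars.limit());
        chars.clear();
    }
}

// Hypothetical helpers mirroring the described fix.
public class EngineWriterSketch {

    // Takes the charset from file.encoding; falls back to the JVM default (an assumption).
    static Charset charsetFromFileEncoding() {
        try {
            return Charset.forName(System.getProperty("file.encoding"));
        } catch (IllegalArgumentException e) { // null, illegal or unsupported name
            return Charset.defaultCharset();
        }
    }

    // Writes raw bytes through the adapter into a StringWriter and returns the text.
    static String writeThrough(byte[] bytes, Charset charset) {
        StringWriter target = new StringWriter();
        try (OutputStream os = new DecodingOutputStream(target, charset)) {
            os.write(bytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // StringWriter itself never throws
        }
        return target.toString();
    }
}
```

Writing the UTF-8 bytes of the test string through writeThrough with UTF-8 should give back the original text, while passing windows-1252 instead reproduces the jumbled output from the report. Decoding incrementally with a CharsetDecoder, rather than calling new String(...) on each write, keeps multi-byte sequences intact when they are split across write calls.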

        yokolet added a comment -

        Although I haven't had any response about the fix, I'm going to mark this issue
        as fixed. Based on what the reporter said, the fix should work.


          People

          • Assignee:
            scripting-issues
            Reporter:
            trejkaz
          • Votes:
            0
            Watchers:
            0
