Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: current
    • Fix Version/s: milestone 1
    • Component/s: www
    • Labels:
      None
    • Environment:

      Operating System: All
      Platform: All

    • Issuezilla Id:
      13

      Description

      The current implementation ignores the actual stream's charset and uses the
      local host's default charset instead (e.g. Cp1252 on Windows in Western
      Europe). This can lead to very nasty problems when the output stream contains
      special characters such as umlauts, because the XSL transformer in use on the
      end customer's platform (e.g. SAXON instead of XALAN) will presumably rely on
      correctly encoded characters.
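      The effect can be reproduced in isolation. The sketch below is not the project's code, which is not shown in this issue. It assumes the implementation decodes the bytes with the platform default, as `new String(bytes)` does. To make the result independent of the host, it pins that default to windows-1252 (Cp1252). The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetPitfall {

    // Decodes with an explicit charset; `new String(bytes)` behaves the same
    // way, except that it silently picks the platform default charset.
    public static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // XML that declares UTF-8 and contains an umlaut (u-umlaut, U+00FC).
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>M\u00fcller</name>";
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);

        // What a Western European Windows host would do: ignore the
        // declaration and decode with Cp1252.
        String wrong = decode(bytes, Charset.forName("windows-1252"));

        // Decoding with the charset the content actually declares.
        String right = decode(bytes, StandardCharsets.UTF_8);

        // The two UTF-8 bytes of the umlaut (0xC3 0xBC) become two Cp1252 characters.
        System.out.println(wrong.contains("M\u00c3\u00bcller")); // true
        System.out.println(right.equals(xml));                   // true
    }
}
```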

      The core problem is that the byte stream gets translated into Java (UTF-16)
      characters using an unspecified code page, leaving it to the end user's
      platform to decide, per request, which code page to use. Unfortunately, that
      decision is not based on the content's XML encoding declaration but on the
      platform's default setting.

      A correct solution would be either to inspect the content's XML encoding
      declaration, or not to translate into Java characters at all (it is doubtful
      whether this translation is needed in the first place).

        Activity

        mkarg added a comment -

        Markus is working on this one already.

        mkarg added a comment -

        Created an attachment (id=5)
        Proposed solution: Do not translate into Java String (UTF-8 chars)

        mkarg added a comment -

        Added proposed solution: instead of translating the byte stream into a Java
        String (UTF-16 chars), the untouched byte array is passed directly to the
        transformer.
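        In JAXP terms, the fix amounts to wrapping the bytes in a `StreamSource` built from an `InputStream` rather than from a `String` or `Reader`. The XML parser then detects the encoding itself. The attachment is not reproduced here. This is a minimal sketch in which an identity transform stands in for the real XSL stylesheet, and `ByteStreamTransform` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ByteStreamTransform {

    // Hands the raw bytes to the transformer. Because the source is an
    // InputStream (not a Reader or String), the parser reads the BOM and
    // encoding declaration itself; no platform default charset is involved.
    public static byte[] transform(byte[] xml, Transformer transformer) throws TransformerException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new StreamSource(new ByteArrayInputStream(xml)), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Identity transform as a stand-in for the real (unspecified) stylesheet.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.ENCODING, "UTF-8");

        // Input declared and encoded as ISO-8859-1, containing an umlaut.
        String doc = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>M\u00fcller</name>";
        byte[] latin1 = doc.getBytes(StandardCharsets.ISO_8859_1);

        byte[] result = transform(latin1, identity);
        System.out.println(new String(result, StandardCharsets.UTF_8));
    }
}
```

        If a `Reader` or `String` were passed instead, the characters would already be decoded. The parser would then ignore the encoding declaration, which is exactly the situation described in this issue.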

        Not only does this solve the problem correctly, it should also bring a
        theoretical performance benefit. RAM consumption is lower, because a Java
        String stores each character as a 16-bit UTF-16 unit and is therefore
        typically larger than the original byte array. CPU cycles are saved too,
        since there are no code page lookups.
        In practice it solves the problem, and the result felt a bit faster (about
        half a second to refresh Vista's FileExplorer containing one thousand rows,
        using webdav-addressbook). However, this could not be measured objectively,
        mostly due to Java's unsteady performance behaviour, a typical problem of all
        micro benchmarks.


          People

          • Assignee:
            webdav-interop-issues
            Reporter:
            mkarg
          • Votes:
            0
            Watchers:
            0

            Dates

            • Created:
              Updated: