Details

    • Issuezilla Id:
      86

      Description

      If I retrieve this feed [see URL], which uses ISO-8859-1 encoding, with the
      FeedFetcher, the contents are not converted to normal Java (UTF-16) strings.
      The conversion has to be done manually:

      FeedFetcher fetcher = new HttpURLFeedFetcher(cache);
      SyndFeed feed = fetcher.retrieveFeed(url);
      ...
      String desc = entry.getDescription().getValue(); // desc is not UTF-16!
      desc = new String(desc.getBytes(Charset.forName("ISO-8859-1"))); // now OK

        Activity

        snoopdave added a comment -

        Indicating that this is (probably) a Fetcher bug.

        nicklothian added a comment -

        Not a fetcher bug. Here is some code to replicate it (run against
        http://www.expasy.org/spotlight/atom.xml):

        public static void main(String[] args) {
            boolean ok = false;
            if (args.length == 1) {
                try {
                    URL feedUrl = new URL(args[0]);
                    SyndFeedInput input = new SyndFeedInput();

                    // Bypass the Fetcher: parse the stream directly, passing the
                    // Content-Type value explicitly and lenient = true.
                    SyndFeed feed = input.build(new XmlReader(feedUrl.openStream(),
                            "text/html; charset=ISO-8859-1", true));

                    List entries = feed.getEntries();
                    for (Iterator iterator = entries.iterator(); iterator.hasNext();) {
                        SyndEntry entry = (SyndEntry) iterator.next();
                        String desc = entry.getDescription().getValue();
                        // Re-encode as ISO-8859-1, then decode with the platform default charset.
                        String encodedDesc = new String(desc.getBytes(Charset.forName("ISO-8859-1")));
                        System.out.println(desc);
                        System.out.println(encodedDesc);
                    }

                    ok = true;
                }
                catch (Exception ex) {
                    ex.printStackTrace();
                    System.out.println("ERROR: " + ex.getMessage());
                }
            }

            if (!ok) {
                System.out.println();
                System.out.println("FeedReader reads and prints any RSS/Atom feed type.");
                System.out.println("The first parameter must be the URL of the feed to read.");
                System.out.println();
            }
        }

        XmlReader seems to correctly detect that the feed is ISO-8859-1, but
        entry.getDescription().getValue() and the string rebuilt from
        desc.getBytes(Charset.forName("ISO-8859-1")) differ (I'm not entirely
        sure what should happen here).

        nicklothian added a comment -

        Thanks to Martin Kurz for taking a look at this.

        It appears that the RSS feed referenced isn't ISO-8859-1 encoded - it's probably WINDOWS-1252.

        ROME can't detect or correct that kind of problem - it needs to be fixed on the
        server side.
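
        To illustrate the mismatch described above, here is a minimal sketch that
        does not use ROME at all. It assumes the JRE ships the windows-1252 charset
        (it is not one of the charsets Java guarantees), and the class and method
        names are made up for this example. Bytes in the range 0x80-0x9F are
        printable characters in WINDOWS-1252 but C1 control codes in ISO-8859-1, so
        a parser that trusts an ISO-8859-1 declaration produces the wrong
        characters. Re-encoding as ISO-8859-1 and decoding as WINDOWS-1252 recovers
        them, which is likely why the reporter's workaround appeared to help on a
        machine whose default charset is WINDOWS-1252.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {
    // Assumption: windows-1252 is available in this JRE (not a guaranteed charset).
    static final Charset CP1252 = Charset.forName("windows-1252");

    /** Decodes raw bytes the way a parser would if it trusted the declared charset. */
    static String decode(byte[] raw, Charset declared) {
        return new String(raw, declared);
    }

    /** Recovers text that was decoded as ISO-8859-1 but was really windows-1252. */
    static String repair(String misdecoded) {
        // ISO-8859-1 maps every byte to the code point of the same value,
        // so this step gets the original bytes back unchanged.
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, CP1252);
    }

    public static void main(String[] args) {
        // 0x93 and 0x94 are curly double quotes in windows-1252.
        byte[] raw = {(byte) 0x93, 'h', 'i', (byte) 0x94};

        String wrong = decode(raw, StandardCharsets.ISO_8859_1);
        String right = repair(wrong);

        System.out.println((int) wrong.charAt(0)); // 147  (U+0093, a control code)
        System.out.println((int) right.charAt(0)); // 8220 (U+201C, left double quote)
    }
}
```

        Passing the charset explicitly, instead of calling new String(bytes) with
        the platform default, keeps the result the same on every machine.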


          People

          • Assignee:
            rome-issues
            Reporter:
            ejain
          • Votes:
            0
            Watchers:
            1

            Dates

            • Created:
              Updated:
              Resolved: