jaxp
  1. jaxp
  2. JAXP-68

Infinite do-while loop in XMLDocumentScannerImpl$PrologDriver.next

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Critical Critical
    • Resolution: Unresolved
    • Affects Version/s: current
    • Fix Version/s: milestone 1
    • Component/s: www
    • Labels:
      None
    • Environment:

      Operating System: All
      Platform: All

    • Issuezilla Id:
      68

      Description

      It appears that the do-while loop in XMLDocumentScannerImpl$PrologDriver.next
      causes an infinite loop since there are if-else conditions that do not change
      the "scanner state". Similar code in XercesJ-2
      XMLDocumentScannerImpl$PrologDispather.dispatch appears to have corrected the
      problem.

      Partial thread dump from JDK 1.6.0_15 (although current source code in JAXP
      1.4.4 does not appear to have changed):
      "validateClockAction-9197" daemon prio=10 tid=0x00002aab84013800 nid=0x780f
      runnable [0x00000000414a6000]
      java.lang.Thread.State: RUNNABLE
      at
      com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next
      (XMLDocumentScannerImpl.java:931)
      at
      com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next
      (XMLDocumentScannerImpl.java:648)
      at
      com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next
      (XMLNSDocumentScannerImpl.java:140)
      at
      com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next
      (XMLStreamReaderImpl.java:548)
      at org.codehaus.xfire.soap.handler.ReadHeadersHandler.invoke
      (ReadHeadersHandler.java:44)

      Source excerpt from JAXP:
      do {
      switch (fScannerState) {
      case SCANNER_STATE_PROLOG: {
      fEntityScanner.skipSpaces();
      if (fEntityScanner.skipChar('<'))

      { setScannerState(SCANNER_STATE_START_OF_MARKUP); }

      else if (fEntityScanner.skipChar('&'))

      { setScannerState(SCANNER_STATE_REFERENCE); }

      else

      { setScannerState(SCANNER_STATE_CONTENT); }

      break;
      }

      case SCANNER_STATE_START_OF_MARKUP: {
      fMarkupDepth++;

      if (fEntityScanner.skipChar('?'))

      { setScannerState(SCANNER_STATE_PI); }

      else if (fEntityScanner.skipChar('!')) {
      if (fEntityScanner.skipChar('-')) {
      if (!fEntityScanner.skipChar('-'))

      { reportFatalError("InvalidCommentStart", null); }

      setScannerState(SCANNER_STATE_COMMENT);
      } else if (fEntityScanner.skipString(DOCTYPE)) {
      setScannerState(SCANNER_STATE_DOCTYPE);
      Entity entity =
      fEntityScanner.getCurrentEntity();
      if(entity instanceof Entity.ScannedEntity)

      { fStartPos=((Entity.ScannedEntity) entity).position; }

      fReadingDTD=true;
      if(fDTDDecl == null)
      fDTDDecl = new XMLStringBuffer();
      fDTDDecl.append("<!DOCTYPE");

      } else

      { reportFatalError ("MarkupNotRecognizedInProlog", null); >>>>>no change in fScannerState }

      } else if (XMLChar.isNameStart
      (fEntityScanner.peekChar()))

      { setScannerState(SCANNER_STATE_ROOT_ELEMENT); setDriver(fContentDriver); //from now onwards this would be handled by fContentDriver,in the same next() call return fContentDriver.next(); }

      else

      { reportFatalError("MarkupNotRecognizedInProlog", null); >>>>>no change in fScannerState }

      break;
      }
      }
      } while (fScannerState == SCANNER_STATE_PROLOG || fScannerState
      == SCANNER_STATE_START_OF_MARKUP );

        Activity

        Hide
        Joe Wang added a comment -

        Can you attach the testcase along with the xml? Thanks.

        Show
        Joe Wang added a comment - Can you attach the testcase along with the xml? Thanks.
        Hide
        Joe Wang added a comment -

        Also, justify P1 priority, could you explain what programs have been blocked?
        Thanks.

        Show
        Joe Wang added a comment - Also, justify P1 priority, could you explain what programs have been blocked? Thanks.
        Hide
        stevehale added a comment -

        My apologies for the "P1" status, the help on Priority just says "Most
        Important" and an infinite loop is pretty important. However since it
        doesn't "break" in that sense, I have moved it to "P2" priority.

        Unfortunately I have been unsuccessful at capturing one of the SOAP messages
        that causes the infinite loop (mainly because they are in an infinite loop), I
        only have a thread dump that shows 12 threads spinning at the same line of
        code, but I'm still working on it.

        Show
        stevehale added a comment - My apologies for the "P1" status, the help on Priority just says "Most Important" and an infinite loop is pretty important. However since it doesn't "break" in that sense, I have moved it to "P2" priority. Unfortunately I have been unsuccessful at capturing one of the SOAP messages that causes the infinite loop (mainly because they are in an infinite loop), I only have a thread dump that shows 12 threads spinning at the same line of code, but I'm still working on it.
        Hide
        Joe Wang added a comment -

        No problem at all. I was just trying to clarify/understand the issue since
        internally, similar to Apache's "blocker", we treat P1 as must-fix before any
        release can go out. We also call it a stopper. So if it's P1, I would need to
        explain to approvers why it's a must-fix or what type of applications have been
        blocked. Thx.

        Show
        Joe Wang added a comment - No problem at all. I was just trying to clarify/understand the issue since internally, similar to Apache's "blocker", we treat P1 as must-fix before any release can go out. We also call it a stopper. So if it's P1, I would need to explain to approvers why it's a must-fix or what type of applications have been blocked. Thx.
        Hide
        Joe Wang added a comment -

        Hi Steve,

        You indicated that similar code in XercesJ-2
        XMLDocumentScannerImpl$PrologDispather.dispatch appears to have corrected the
        problem, that is, set scanner state in the two place where errors are reported. Looking at the current Xerces code, it seems to me there's no setting scanner state when reporting errors, copied from the current version:
        else

        { reportFatalError("MarkupNotRecognizedInProlog", null); }

        Were you looking at sth. different from this: http://svn.apache.org/viewvc/xerces/java/trunk/src/org/apache/xerces/impl/XMLDocumentScannerImpl.java?revision=572055&view=co?

        Show
        Joe Wang added a comment - Hi Steve, You indicated that similar code in XercesJ-2 XMLDocumentScannerImpl$PrologDispather.dispatch appears to have corrected the problem, that is, set scanner state in the two place where errors are reported. Looking at the current Xerces code, it seems to me there's no setting scanner state when reporting errors, copied from the current version: else { reportFatalError("MarkupNotRecognizedInProlog", null); } Were you looking at sth. different from this: http://svn.apache.org/viewvc/xerces/java/trunk/src/org/apache/xerces/impl/XMLDocumentScannerImpl.java?revision=572055&view=co?
        Hide
        stevehale added a comment -

        Hey Joe, thanks so much for looking into this!

        While it is true that the newer "dispatch" code also doesn't set the scanner state on errors, the new code does leave the new "again" boolean variable set to false which breaks it out of the do-while loop if the input argument "complete" is also false. Here's an example at the top of the do loop:
        do {
        again = false;
        switch (fScannerState) {
        case SCANNER_STATE_PROLOG: {
        fEntityScanner.skipSpaces();
        if (fEntityScanner.skipChar('<'))

        { setScannerState(SCANNER_STATE_START_OF_MARKUP); again = true; }

        else if (fEntityScanner.skipChar('&'))

        { setScannerState(SCANNER_STATE_REFERENCE); again = true; }

        else

        { setScannerState(SCANNER_STATE_CONTENT); again = true; }

        break;
        }
        case SCANNER_STATE_START_OF_MARKUP: {
        fMarkupDepth++;
        if (fEntityScanner.skipChar('!')) {
        if (fEntityScanner.skipChar('-')) {
        if (!fEntityScanner.skipChar('-'))

        { reportFatalError("InvalidCommentStart", null); }

        setScannerState(SCANNER_STATE_COMMENT);
        again = true;
        }
        else if (fEntityScanner.skipString("DOCTYPE"))

        { setScannerState(SCANNER_STATE_DOCTYPE); again = true; }

        else

        { reportFatalError("MarkupNotRecognizedInProlog", null); >>>>>"again" is left false right here }

        }
        else if (isValidNameStartChar(fEntityScanner.peekChar()))

        { setScannerState(SCANNER_STATE_ROOT_ELEMENT); setDispatcher(fContentDispatcher); return true; }
        else if (fEntityScanner.skipChar('?')) { setScannerState(SCANNER_STATE_PI); again = true; }
        else if (isValidNameStartHighSurrogate(fEntityScanner.peekChar())) { setScannerState(SCANNER_STATE_ROOT_ELEMENT); setDispatcher(fContentDispatcher); return true; }

        else

        { reportFatalError("MarkupNotRecognizedInProlog", null); >>>>>"again" is left false right here }

        break;
        }

        ...

        } while (complete || again);

        The do-while loop in the older JAXP code only checks the scanner state before looping back, but since the scanner state isn't changed then it looks like it might loop forever. As I said, I do not have a reproducible XML example so I cannot be sure this is where the infinite loop is occurring but the thread dump shows line 931 which is the while line in the old "next" method:

        do {
        switch (fScannerState) {
        case SCANNER_STATE_PROLOG: {
        fEntityScanner.skipSpaces();
        if (fEntityScanner.skipChar('<'))

        { setScannerState(SCANNER_STATE_START_OF_MARKUP); }

        else if (fEntityScanner.skipChar('&'))

        { setScannerState(SCANNER_STATE_REFERENCE); }

        else

        { setScannerState(SCANNER_STATE_CONTENT); }

        break;
        }

        case SCANNER_STATE_START_OF_MARKUP: {
        fMarkupDepth++;

        if (fEntityScanner.skipChar('?'))

        { setScannerState(SCANNER_STATE_PI); }

        else if (fEntityScanner.skipChar('!')) {
        if (fEntityScanner.skipChar('-')) {
        if (!fEntityScanner.skipChar('-'))

        { reportFatalError("InvalidCommentStart", null); }

        setScannerState(SCANNER_STATE_COMMENT);
        } else if (fEntityScanner.skipString(DOCTYPE)) {
        setScannerState(SCANNER_STATE_DOCTYPE);
        Entity entity = fEntityScanner.getCurrentEntity();
        if(entity instanceof Entity.ScannedEntity)

        { fStartPos=((Entity.ScannedEntity)entity).position; }

        fReadingDTD=true;
        if(fDTDDecl == null)
        fDTDDecl = new XMLStringBuffer();
        fDTDDecl.append("<!DOCTYPE");

        } else

        { reportFatalError("MarkupNotRecognizedInProlog", null); }

        } else if (XMLChar.isNameStart(fEntityScanner.peekChar()))

        { setScannerState(SCANNER_STATE_ROOT_ELEMENT); setDriver(fContentDriver); //from now onwards this would be handled by fContentDriver,in the same next() call return fContentDriver.next(); }

        else

        { reportFatalError("MarkupNotRecognizedInProlog", null); }

        break;
        }
        }
        } while (fScannerState == SCANNER_STATE_PROLOG || fScannerState == SCANNER_STATE_START_OF_MARKUP );
        >>>>>>while above is line 931, where the infinite looping thread is at according to the thread dump

        Show
        stevehale added a comment - Hey Joe, thanks so much for looking into this! While it is true that the newer "dispatch" code also doesn't set the scanner state on errors, the new code does leave the new "again" boolean variable set to false which breaks it out of the do-while loop if the input argument "complete" is also false. Here's an example at the top of the do loop: do { again = false; switch (fScannerState) { case SCANNER_STATE_PROLOG: { fEntityScanner.skipSpaces(); if (fEntityScanner.skipChar('<')) { setScannerState(SCANNER_STATE_START_OF_MARKUP); again = true; } else if (fEntityScanner.skipChar('&')) { setScannerState(SCANNER_STATE_REFERENCE); again = true; } else { setScannerState(SCANNER_STATE_CONTENT); again = true; } break; } case SCANNER_STATE_START_OF_MARKUP: { fMarkupDepth++; if (fEntityScanner.skipChar('!')) { if (fEntityScanner.skipChar('-')) { if (!fEntityScanner.skipChar('-')) { reportFatalError("InvalidCommentStart", null); } setScannerState(SCANNER_STATE_COMMENT); again = true; } else if (fEntityScanner.skipString("DOCTYPE")) { setScannerState(SCANNER_STATE_DOCTYPE); again = true; } else { reportFatalError("MarkupNotRecognizedInProlog", null); >>>>>"again" is left false right here } } else if (isValidNameStartChar(fEntityScanner.peekChar())) { setScannerState(SCANNER_STATE_ROOT_ELEMENT); setDispatcher(fContentDispatcher); return true; } else if (fEntityScanner.skipChar('?')) { setScannerState(SCANNER_STATE_PI); again = true; } else if (isValidNameStartHighSurrogate(fEntityScanner.peekChar())) { setScannerState(SCANNER_STATE_ROOT_ELEMENT); setDispatcher(fContentDispatcher); return true; } else { reportFatalError("MarkupNotRecognizedInProlog", null); >>>>>"again" is left false right here } break; } ... } while (complete || again); The do-while loop in the older JAXP code only checks the scanner state before looping back, but since the scanner state isn't changed then it looks like it might loop forever. As I said, I do not have a reproducible XML example so I cannot be sure this is where the infinite loop is occurring but the thread dump shows line 931 which is the while line in the old "next" method: do { switch (fScannerState) { case SCANNER_STATE_PROLOG: { fEntityScanner.skipSpaces(); if (fEntityScanner.skipChar('<')) { setScannerState(SCANNER_STATE_START_OF_MARKUP); } else if (fEntityScanner.skipChar('&')) { setScannerState(SCANNER_STATE_REFERENCE); } else { setScannerState(SCANNER_STATE_CONTENT); } break; } case SCANNER_STATE_START_OF_MARKUP: { fMarkupDepth++; if (fEntityScanner.skipChar('?')) { setScannerState(SCANNER_STATE_PI); } else if (fEntityScanner.skipChar('!')) { if (fEntityScanner.skipChar('-')) { if (!fEntityScanner.skipChar('-')) { reportFatalError("InvalidCommentStart", null); } setScannerState(SCANNER_STATE_COMMENT); } else if (fEntityScanner.skipString(DOCTYPE)) { setScannerState(SCANNER_STATE_DOCTYPE); Entity entity = fEntityScanner.getCurrentEntity(); if(entity instanceof Entity.ScannedEntity) { fStartPos=((Entity.ScannedEntity)entity).position; } fReadingDTD=true; if(fDTDDecl == null) fDTDDecl = new XMLStringBuffer(); fDTDDecl.append("<!DOCTYPE"); } else { reportFatalError("MarkupNotRecognizedInProlog", null); } } else if (XMLChar.isNameStart(fEntityScanner.peekChar())) { setScannerState(SCANNER_STATE_ROOT_ELEMENT); setDriver(fContentDriver); //from now onwards this would be handled by fContentDriver,in the same next() call return fContentDriver.next(); } else { reportFatalError("MarkupNotRecognizedInProlog", null); } break; } } } while (fScannerState == SCANNER_STATE_PROLOG || fScannerState == SCANNER_STATE_START_OF_MARKUP ); >>>>>>while above is line 931, where the infinite looping thread is at according to the thread dump
        Hide
        Joe Wang added a comment -

        I see. So what did you get when you run your test, Out Of Memory Errors or StackOverflowError? It would be nice if you could reproduce it and attach an xml instance file (without sensitive information). The reason I ask this is that the dead loop you've seen would not happen after reporting fatal error. You're using StreamReader, the StaxErrorReporter would throw exception on fatal errors.

        Show
        Joe Wang added a comment - I see. So what did you get when you run your test, Out Of Memory Errors or StackOverflowError? It would be nice if you could reproduce it and attach an xml instance file (without sensitive information). The reason I ask this is that the dead loop you've seen would not happen after reporting fatal error. You're using StreamReader, the StaxErrorReporter would throw exception on fatal errors.
        Hide
        stevehale added a comment - - edited

        I apologize for not being able to capture an example XML that would consistently reproduce an infinite loop. Unfortunately we have only seen this in a production environment with heavy load, and only very rarely (7 occurrences out of tens of thousands). It did not get an Out of Memory, StackOverflow, nor any other error or exception; the only sign of trouble is max'd CPU because a few threads go into a very tight infinite loop at line 931 (while condition). Because it is so rare, it may be a multi-threading synchronization issue which might not present itself with example XML.

        If it is true that reportFatalError always throws an exception, then I agree that the problem isn't at the reportFatalError lines. So at this point I am at a loss. For the while loop to continue, fScannerState must be one of the two cases in the switch; yet I don't see how it could progress through the switch cases without changing the scanner state or reporting a fatal error.

        I should have included the following line from the thread dump to confirm the java version, maybe we're looking at the wrong source code.
        Full thread dump Java HotSpot(TM) 64-Bit Server VM (14.1-b02 mixed mode):

        Our workaround: The JDK's StAX implementation in JAXP uses the JDK's SAX implementation classes directly which seems to have the problem. So although we upgraded Xerces to 2.10.0, we still had the problem. Finally we added the Woodstox StAX implementation jars to our WEB-INF/lib (already had Xerces 2.10.0 jars in there), and there have been no further issues. However we would prefer to use the StAX implementation provided by the JDK to remove the dependency on Woodstox.

        Show
        stevehale added a comment - - edited I apologize for not being able to capture an example XML that would consistently reproduce an infinite loop. Unfortunately we have only seen this in a production environment with heavy load, and only very rarely (7 occurrences out of tens of thousands). It did not get an Out of Memory, StackOverflow, nor any other error or exception; the only sign of trouble is max'd CPU because a few threads go into a very tight infinite loop at line 931 (while condition). Because it is so rare, it may be a multi-threading synchronization issue which might not present itself with example XML. If it is true that reportFatalError always throws an exception, then I agree that the problem isn't at the reportFatalError lines. So at this point I am at a loss. For the while loop to continue, fScannerState must be one of the two cases in the switch; yet I don't see how it could progress through the switch cases without changing the scanner state or reporting a fatal error. I should have included the following line from the thread dump to confirm the java version, maybe we're looking at the wrong source code. Full thread dump Java HotSpot(TM) 64-Bit Server VM (14.1-b02 mixed mode): Our workaround: The JDK's StAX implementation in JAXP uses the JDK's SAX implementation classes directly which seems to have the problem. So although we upgraded Xerces to 2.10.0, we still had the problem. Finally we added the Woodstox StAX implementation jars to our WEB-INF/lib (already had Xerces 2.10.0 jars in there), and there have been no further issues. However we would prefer to use the StAX implementation provided by the JDK to remove the dependency on Woodstox.
        Hide
        Joe Wang added a comment - - edited

        I looked around the code and created a test, not that I could reproduce the situation you encountered under heavy load. First, it's a simple test to show that an exception would be thrown if there's a fatal error. It can be done easily by adding a prolog like this:
        <!wrong document SYSTEM "ExternalDTD.dtd" [
        <!ENTITY max "Substituted text">
        ]>

        I found there is a possibility to continue the loop after fatal error, if continue-after-fatal-error, a Xerces feature, is somehow being true. Since you're using StAX, I would think it's not possible since JDK's StAX does not support such a feature. For example, if I try to set it on input factory, I would get an error: java.lang.IllegalArgumentException: Property http://apache.org/xml/features/continue-after-fatal-error is not supported

        Is it possible it's being set by another thread? I don't see that possibility either since the factory has a private instance of property manager.

        I can create a jaxp jar file removing the "continue-after-fatal-error" condition from StAX for you to test. Would you want me to send it to your java.net email?

        Show
        Joe Wang added a comment - - edited I looked around the code and created a test, not that I could reproduce the situation you encountered under heavy load. First, it's a simple test to show that an exception would be thrown if there's a fatal error. It can be done easily by adding a prolog like this: <!wrong document SYSTEM "ExternalDTD.dtd" [ <!ENTITY max "Substituted text"> ]> I found there is a possibility to continue the loop after fatal error, if continue-after-fatal-error, a Xerces feature, is somehow being true. Since you're using StAX, I would think it's not possible since JDK's StAX does not support such a feature. For example, if I try to set it on input factory, I would get an error: java.lang.IllegalArgumentException: Property http://apache.org/xml/features/continue-after-fatal-error is not supported Is it possible it's being set by another thread? I don't see that possibility either since the factory has a private instance of property manager. I can create a jaxp jar file removing the "continue-after-fatal-error" condition from StAX for you to test. Would you want me to send it to your java.net email?
        Hide
        stevehale added a comment -

        Unfortunately I have not been able to capture a test case, and our client is now using the Woodstox StAX implementation to avoid this issue. I could not use your modified jaxp.jar even if you sent it to me. However I do not believe it is an issue with "continue-after-fatal-error", but I don't have much else to go on.

        The only thing I have read recently is some mention of a possible infinite loop caused by a premature EOF since SJSXP avoids throwing exceptions after receiving an EOF (at least according to what I read). I have not persued this possibility in any way, it was just something I read, so thought it might help.

        Show
        stevehale added a comment - Unfortunately I have not been able to capture a test case, and our client is now using the Woodstox StAX implementation to avoid this issue. I could not use your modified jaxp.jar even if you sent it to me. However I do not believe it is an issue with "continue-after-fatal-error", but I don't have much else to go on. The only thing I have read recently is some mention of a possible infinite loop caused by a premature EOF since SJSXP avoids throwing exceptions after receiving an EOF (at least according to what I read). I have not persued this possibility in any way, it was just something I read, so thought it might help.
        Hide
        kln550 added a comment -

        During load tests,we are still seeing infinite loop issue with JDK 1.6 update 26 stax implementation.Any resolutions for this issue.

        Show
        kln550 added a comment - During load tests,we are still seeing infinite loop issue with JDK 1.6 update 26 stax implementation.Any resolutions for this issue.
        Hide
        stevehale added a comment -

        kln550, I could never capture a test case in DEV environment so it makes sense that this only happens under load. Without a way to reproduce it, no one would look into it. Our workaround was to use Woodstox StAX imnplementation instead of that provided by the JDK within JAXP (see prior entries), although that would have been preferred. Maybe it will be researched if you can provide more details on how to reproduce it.

        Show
        stevehale added a comment - kln550, I could never capture a test case in DEV environment so it makes sense that this only happens under load. Without a way to reproduce it, no one would look into it. Our workaround was to use Woodstox StAX imnplementation instead of that provided by the JDK within JAXP (see prior entries), although that would have been preferred. Maybe it will be researched if you can provide more details on how to reproduce it.
        Hide
        tbutcher added a comment -

        We are seeing very similar symptoms with on a high-load live website. We are using Xerces to 2.10.0. The thread dump is similar but we are not using stax.

        The problem seems to occur only under productions conditions and we have not been able to reproduce it in a test environment. We need to restart appservers occasionally when the number of looping threads is large enough to cause a significant CPU load.

        We are not using 'continue-after-fatal-error'.

        Show
        tbutcher added a comment - We are seeing very similar symptoms with on a high-load live website. We are using Xerces to 2.10.0. The thread dump is similar but we are not using stax. The problem seems to occur only under productions conditions and we have not been able to reproduce it in a test environment. We need to restart appservers occasionally when the number of looping threads is large enough to cause a significant CPU load. We are not using 'continue-after-fatal-error'.
        Hide
        Joe Wang added a comment -

        Thanks for the information.

        Are you using Xerces 2.10 only, or have you tried with/without the Xerces jar? If it's Xerces 2.10 only, have you reported the issue to Apache?

        Thanks.

        Show
        Joe Wang added a comment - Thanks for the information. Are you using Xerces 2.10 only, or have you tried with/without the Xerces jar? If it's Xerces 2.10 only, have you reported the issue to Apache? Thanks.

          People

          • Assignee:
            jaxp-issues
            Reporter:
            stevehale
          • Votes:
            4 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated: