XADISK-138

XASystemNoMoreAvailableException after interruption

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Works as designed
    • Affects Version/s: 1.2.1
    • Fix Version/s: None
    • Component/s: filesystem
    • Labels: None
    • Environment: Windows 7, JDK 7

      Description

      I got an XASystemNoMoreAvailableException thrown in NativeXAFileSystem.notifySystemFailure because of a Thread.interrupt() during completeReadOnlyTransaction(), which caused a ClosedByInterruptException.
      I couldn't reproduce it (it occurred only once, on a running test system), even though I tried with the test code below. That code produces a different, less problematic issue.

      Note that my environment relies heavily on interrupting cooperating threads that run for too long or have (partly) failed, so I cannot easily avoid triggering interruption.

      The test code below produces an issue during shutdown, but not during the transaction:

      System.out.println("Booting XADisk...");
      StandaloneFileSystemConfiguration configuration = new StandaloneFileSystemConfiguration(
      "target/xadisk-fail/system", "id-1");
      XAFileSystem xafs = XAFileSystemProxy.bootNativeXAFileSystem(configuration);
      xafs.waitForBootup(-1);
      System.out.println("XADisk is now available for use.");

      File testFile = File.createTempFile("xa-file", ".dat");
      testFile.deleteOnExit();
      long waitUntil = System.currentTimeMillis() + Times.MILLIS_PER_SECOND;
      while (System.currentTimeMillis() < waitUntil) {
          Session session = xafs.createSessionForLocalTransaction();
          XAFileOutputStream xaos = session.createXAFileOutputStream(testFile, false);
          xaos.write("Hello World!".getBytes(Misc.UTF_8));
          Thread.currentThread().interrupt();
          xaos.write("Hello World2!".getBytes(Misc.UTF_8));
          Thread.currentThread().interrupt();
          xaos.close();
          Thread.currentThread().interrupt();
          session.commit();
      }

      // Clear the interrupt flag of the current thread before shutting down.
      Thread.interrupted();

      xafs.shutdown();
      System.out.println("XADisk is down.");

        Activity

        Nitin Verma added a comment -

        A similar issue was reported by Ravi in this thread: https://groups.google.com/forum/?hl=en#!topic/xadisk/sqYH82vOhDw, where undeploying the application on JBoss 7.2 triggers an interrupt. In that case, the interrupted thread is the HornetQ resource adapter's worker thread, during delivery of a message to an MDB (the MDB performs some operations on XADisk).

        Nitin Verma added a comment -

        Hi Gunnar,

        I ran your test case and can confirm that I see the ClosedByInterruptException during session.commit(), and that the XADisk instance becomes unavailable after that.

        A detailed summary...

        When an application invokes XADisk operations locally (as opposed to remote XADisk invocation, where the application thread only does the network transfers for remote method calls), most of the work is done in the application thread; in addition, a few XADisk worker threads run asynchronously. XADisk makes heavy use of NIO FileChannels, which are interruptible. When an application thread is inside an XADisk API call that performs such a FileChannel operation and receives an interrupt, the channel is closed and an exception results. The channel may belong to the transaction logs, an application data file, etc. At that point the channel operation is incomplete (e.g., an unknown amount of data was written), and with its current design there is not much XADisk can do to continue working.
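
        The FileChannel behaviour described above can be reproduced without XADisk. The following is a minimal sketch using only the JDK; the class name InterruptClosesChannelDemo and the temp-file name are made up for illustration. It sets the interrupt flag and then writes to a plain FileChannel, which closes the channel and raises ClosedByInterruptException.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; not part of XADisk.
final class InterruptClosesChannelDemo {

    /** Writes to a FileChannel while the interrupt flag is set and reports what happened. */
    static String run() throws IOException {
        Path file = Files.createTempFile("interrupt-demo", ".dat");
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
            // Simulate an interrupt arriving just before the I/O call.
            Thread.currentThread().interrupt();
            try {
                channel.write(ByteBuffer.wrap("Hello".getBytes(StandardCharsets.UTF_8)));
                return "write succeeded";
            } catch (ClosedByInterruptException e) {
                // The channel itself has been closed, so later writes cannot succeed either.
                return "ClosedByInterruptException, channel open = " + channel.isOpen();
            }
        } finally {
            // Clear the interrupt flag again so the cleanup is not affected.
            Thread.interrupted();
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(run());
    }
}
```

        Because the channel is closed rather than merely signalled, a library holding that channel for its transaction log has no open handle left to continue with, which matches the failure reported here.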

        Altering the current implementation to handle such interrupts (detect the interrupt, clean up, and abort only the current transaction while leaving XADisk's in-memory and persistent data in a consistent state) looks quite complex.

        This should be a generic issue for any library that uses an interruptible I/O library such as NIO. I tried to find some workarounds...

        1. Disabling interrupts for a thread or for a FileChannel does not seem possible in current Java. Instead, the application can run such application threads with the interrupt() method of the Thread class overridden, so that the effective delivery of interrupts can be delayed. Note that this approach can interfere with some XADisk features such as deadlock handling, which uses thread interrupts internally (for a transaction waiting for a lock). I am not sure how prevalent this approach is; I just found it here and am sharing it: http://cs.oswego.edu/pipermail/concurrency-interest/2008-March/005064.html

        2. Using the approach in 1. would involve frequent calls to enable and disable interrupts. As an alternative, the application can do away with the interrupt approach altogether and simply use an application-level flag (instead of Thread.interrupt()) to communicate interrupts. Again, the application code would be filled with checks for this flag.

        3. Use XADisk's remote method invocation. This would decouple the application thread from the thread of the XADisk instance that actually does the work, but it would be slower because of the socket-based communication.
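
        Workaround 1 might look roughly like the sketch below. It is only an illustration of the idea, not code taken from the linked mailing-list post or from XADisk; the class DeferredInterruptThread and its shield()/unshield() methods are hypothetical names. Interrupts that arrive while the thread is shielded are remembered and delivered once the shield is lifted.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical helper; not part of XADisk or the JDK.
final class DeferredInterruptThread extends Thread {
    private final Object lock = new Object();
    private boolean shielded; // true while interrupts are being held back
    private boolean pending;  // an interrupt arrived while shielded

    DeferredInterruptThread(Runnable task) {
        super(task);
    }

    @Override
    public void interrupt() {
        synchronized (lock) {
            if (shielded) {
                pending = true; // remember it, deliver later
                return;
            }
        }
        super.interrupt();
    }

    /** Start holding back interrupts, e.g. before calling into interruptible I/O. */
    void shield() {
        synchronized (lock) {
            shielded = true;
        }
    }

    /** Stop holding back interrupts and deliver one if it arrived in the meantime. */
    void unshield() {
        boolean deliver;
        synchronized (lock) {
            shielded = false;
            deliver = pending;
            pending = false;
        }
        if (deliver) {
            super.interrupt();
        }
    }

    /** Returns {flag seen while shielded, flag seen after unshield}. */
    static boolean[] demo() throws InterruptedException {
        boolean[] seen = new boolean[2];
        CountDownLatch shieldOn = new CountDownLatch(1);
        CountDownLatch interruptSent = new CountDownLatch(1);
        DeferredInterruptThread worker = new DeferredInterruptThread(() -> {
            DeferredInterruptThread self = (DeferredInterruptThread) Thread.currentThread();
            self.shield();
            shieldOn.countDown();
            try {
                interruptSent.await(); // stands in for the interruptible I/O section
            } catch (InterruptedException e) {
                throw new IllegalStateException("interrupt leaked through the shield", e);
            }
            seen[0] = self.isInterrupted();
            self.unshield();
            seen[1] = self.isInterrupted();
        });
        worker.start();
        shieldOn.await();
        worker.interrupt(); // held back because the worker is shielded
        interruptSent.countDown();
        worker.join();
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] seen = demo();
        System.out.println("while shielded: " + seen[0] + ", after unshield: " + seen[1]);
    }
}
```

        As noted in item 1, holding back interrupts this way would also hold back any interrupt XADisk sends for its own purposes, such as deadlock handling, so the shielded section should be kept as short as possible.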

        Please feel free to comment.

        Thanks & Regards,
        Nitin

        gunnar_zarncke added a comment -

        What about a lightweight decoupling:

        4. Decouple the interruptible threads (my client logic) from the non-interruptible XA threads.
        Use one thread per XAFile(Input|Output)Stream (or a thread pool) to do the non-interruptible reading/writing and pass the data to/from the client (Input|Output)Stream via a java.util.concurrent.Exchanger.
        Then, if the client's thread gets interrupted, at worst the Exchanger call gets interrupted. The XA thread will of course fail to sync with the Exchanger, but this can be handled by just waiting (for a later commit/rollback).
        Advantages: less overhead than full remoting; fully compatible with Java interruption (provided the XA threads don't get interrupted; this can be achieved by putting them into a different ThreadGroup).
        This can be implemented as an alternative to XAFile(Input|Output)StreamWrapper.
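
        The decoupling idea can be illustrated with the JDK alone. The sketch below is not the implementation mentioned in this thread (which is not shown here): it uses a single-thread ExecutorService and a Future where the proposal uses an Exchanger, and a plain FileChannel stands in for the XADisk stream. The class name DecoupledWriter is hypothetical. All channel I/O happens on a dedicated thread, so an interrupt aimed at the caller never reaches the channel.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical wrapper; a plain FileChannel stands in for an XADisk stream.
final class DecoupledWriter implements AutoCloseable {
    // Dedicated I/O thread; callers never touch the channel directly.
    private final ExecutorService ioThread = Executors.newSingleThreadExecutor();
    private final FileChannel channel;

    DecoupledWriter(Path file) throws IOException {
        this.channel = FileChannel.open(file, StandardOpenOption.WRITE);
    }

    /** Hands the data to the I/O thread and waits for the write to finish. */
    int write(byte[] data) throws IOException {
        Callable<Integer> task = () -> {
            ByteBuffer buffer = ByteBuffer.wrap(data);
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
            return data.length;
        };
        Future<Integer> result = ioThread.submit(task);
        boolean interrupted = false;
        try {
            while (true) {
                try {
                    return result.get();
                } catch (InterruptedException e) {
                    // Only the wait was interrupted; the I/O thread keeps going.
                    interrupted = true;
                } catch (ExecutionException e) {
                    throw new IOException(e.getCause());
                }
            }
        } finally {
            if (interrupted) {
                // Preserve the caller's interrupt status for the client logic.
                Thread.currentThread().interrupt();
            }
        }
    }

    boolean isOpen() {
        return channel.isOpen();
    }

    @Override
    public void close() throws IOException {
        ioThread.shutdown();
        channel.close();
    }
}
```

        Here the caller keeps waiting for the write and restores its interrupt flag afterwards; a variant could instead return to the caller immediately on interrupt and let the worker finish in the background, which is closer to the "just waiting for a later commit/rollback" behaviour described above.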

        Nitin Verma added a comment -

        Hoping that this issue can be worked around by developers and realizing the complexity of a fix in xadisk for this, I am closing this issue for now.

        gunnar_zarncke added a comment -

        An implementation (beta) of approach 4 exists and was sent to Nitin for review.

        misisko added a comment -

        Hi @gunnar_zarncke, we have a big problem with this issue. What about your patch? How did you solve this problem? @Nitin, are developers working on a fix for this problem? I think this problem is very annoying.

        gunnar_zarncke added a comment -

        The mentioned fix has been in production for about a year and is fairly stable. In the context of that application I do experience some strange thread- and/or interrupt-related issues (thread pools not shutting down in time), but I'm fairly certain they are not due to my patch, and the XADisk transactions run fast and stable. So I'd call my fix (basically a thread-decoupling layer around the XADisk part) stable and reliable. It can be used without patching XADisk, but I didn't publish it because I expected Nitin to provide it as is in some subpackage. I can send it to you directly if you want; just mail me at gunnar dot zarncke at gmx dot de.

        Nitin Verma added a comment -

        Hi Misisko,

        Code modifications are done only by me; I have not planned any fix for this as of now.

        I just checked and found that I had missed reviewing the fix proposed by Gunnar. You may get it from Gunnar and see if it solves the problem.

        Thanks,
        Nitin

        misisko added a comment -

        Looking forward to it. Thanks for the information.


          People

          • Assignee: Nitin Verma
          • Reporter: gunnar_zarncke
          • Votes: 0
          • Watchers: 3