XADISK-76

Moving a file with xADiskConnection.moveFile(...) gets stuck if java.io.File.renameTo() fails

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: current
    • Fix Version/s: 1.2.2
    • Component/s: filesystem
    • Labels:
      None
    • Environment:

      WinXP

      Description

      Scenario:
      An EJB tries to rename a file ("testfile") with XADisk (xADiskConnection.moveFile(...)) while the file is already open elsewhere (in this case, a plain new FileReader(new File("testfile")) inside a test).

      Expected:
      I would expect some notification about XADisk being unable to rename the file.

      Result:
      Instead the test "hangs" until a CORBA-timeout (after 30 minutes) triggers, interrupting the loop (see below) that is trying to rename the file.

      Suggestion:
      I tracked down the issue to the class org.xadisk.filesystem.utilities.FileIOUtility, where there is the following method:

      public static void renameTo(File src, File dest) throws IOException

      {...}

      My problem here is that if the rename fails (for the above reason), the loop in there won't end until some external interruption occurs. I would expect the method to throw an IOException or similar if the rename/delete goes wrong.

      So my suggestion would be to implement a counter and run through the loop only a certain number of times (depending on what the method "makeSpaceForGC(...)" is supposed to do...). If, e.g., the file still could not be deleted by the 5th attempt, delete the second file (created by the renameTo() method) and throw an exception.
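The counter idea above could look roughly like the following sketch. It is not a reproduction of FileIOUtility.renameTo (whose body is elided in this report), and it leaves out makeSpaceForGC(...) and the cleanup of the second file. The class name BoundedRename, the method renameWithRetries and its parameters are hypothetical names chosen for illustration.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper, not part of XADisk: a rename that gives up after a fixed number of attempts.
final class BoundedRename {

    /**
     * Calls File.renameTo up to maxAttempts times, pausing between attempts,
     * and throws an IOException instead of looping forever.
     */
    static void renameWithRetries(File src, File dest, int maxAttempts, long pauseMillis)
            throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (src.renameTo(dest)) {
                return; // rename succeeded
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pauseMillis); // give the other holder a chance to release the file
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IOException("Interrupted while renaming " + src, e);
                }
            }
        }
        throw new IOException("Could not rename " + src + " to " + dest
                + " after " + maxAttempts + " attempts");
    }
}
```

With this shape, the caller gets an IOException after the last failed attempt, which is the kind of notification asked for under "Expected".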

        Activity

        Nitin Verma added a comment -

        Hi Julius,

        Thanks for your detailed and clear explanation.

        XADisk assumes that no other software operates on the files/directories that are being modified through its APIs.
        This assumption was made because, if we are operating on a set of files in a transactional manner, we won't (in most scenarios) have other applications operating on the same set of files in a non-transactional manner.

        For your specific scenario, we can do any of these:

        a. write the other application to use the XADisk API as well (XAFileInputStream, which you can wrap with Java's InputStreamReader) instead of using FileReader(). This way, when the first application goes to rename, it gets an exception (LockTimeOut) if the other one is reading the file. Just make sure that the two applications invoke the same XADisk instance. If the other application is running on some other JVM, it can use remoting to invoke APIs on the XADisk instance.

        b. in the first application, which is using XADisk, write a check to see whether the file is already locked (depending on the OS, a file read operation may not have locked the file), and proceed with the rename only if the file is not locked. This solution may not work when the other application can come and read the file at any time, because the check for the lock may pass and the other application may start reading the file right afterwards.

        c. if possible, modify the second application such that the FileReader is closed as soon as the reading job is done. This would let the rename continue and not wait for so long.

        Please feel free to write to me at [nitin_verma AT java.net] if you have any questions, suggestions or want to discuss more alternatives.

        Thanks,
        Nitin

        juliusblank added a comment -

        Hi Nitin,

        Time to reactivate this issue.
        I agree with your explanation: other applications should not mess around with files that are under XADisk's (transactional) management.
        As with databases or other data sources, it should be ensured by external means (i.e. access restrictions and such) that no unwanted modifications can be made to the data.

        As it is still bad for our application if we run into a (CORBA) timeout in this "strange" case, I'd like to make a suggestion that should not harm XADisk's design principles:
        How about introducing a configurable parameter that sets the number of retries in the mentioned scenario, with a default of, say, 0 (or whatever value is suitable for saying "off"), which would imply an unlimited number of retries and leave the current behaviour unchanged?
        That should solve our problem without interfering with XADisk too much.
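To make the proposed semantics concrete, here is a minimal sketch of such a parameter, where 0 means "off" (retry without limit) and any positive value caps the number of retries. RetryPolicy, maxRetries and run are hypothetical names, not existing XADisk configuration; a real implementation would presumably also pause between attempts.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of the suggested setting; not an existing XADisk class.
final class RetryPolicy {
    private final int maxRetries; // 0 means "off": keep retrying, as the current behaviour does

    RetryPolicy(int maxRetries) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        this.maxRetries = maxRetries;
    }

    /**
     * Runs the operation until it reports success or the retry budget is used up.
     * Returns true on success, false once the initial attempt plus maxRetries retries have all failed.
     */
    boolean run(BooleanSupplier operation) {
        int retries = 0;
        while (true) {
            if (operation.getAsBoolean()) {
                return true;
            }
            if (maxRetries != 0 && retries >= maxRetries) {
                return false; // budget exhausted; the caller can now raise an exception
            }
            retries++;
        }
    }
}
```

With the default of 0 the loop behaves as before, so only installations that opt in to a limit would see the new failure path.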

        What do you say?

        Cheers,
        Julius

        Nitin Verma added a comment -

        Hi Julius,

        I have implemented a fix for this. XADisk will now exit the commit/rollback when it detects IOExceptions like the one reported in this bug. It will provide an interface for obtaining the identifiers of such transactions and then removing them from XADisk's memory/transaction logs (so that XADisk can unlock those files if it is still running, or complete recovery if the failure happens during recovery). It will not automatically retry the commit or rollback of such transactions; the user needs to inspect whatever changes the transaction made and make them consistent manually. Once done, the user informs XADisk that the transaction can now be safely closed; in other words, marks the transaction as complete. This way, the rest of the transactions can keep going, unlike today, when we bring the whole XADisk instance down.
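The workflow described above (record the failed transaction, let the user look it up, then mark it complete) can be modelled with a small in-memory sketch. This is only an illustration of the idea: FailedTransactionRegistry and its methods are hypothetical names and do not represent the interface XADisk actually exposes, which the javadoc is to describe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the described workflow; not XADisk's real interface.
final class FailedTransactionRegistry {
    // transaction identifier -> reason the commit/rollback could not finish
    private final Map<String, String> failures = new ConcurrentHashMap<>();

    /** Called when a commit/rollback hits an IOException and gives up. */
    void recordFailure(String transactionId, String reason) {
        failures.put(transactionId, reason);
    }

    /** Lets the user find the transactions that need manual repair. */
    List<String> failedTransactionIds() {
        return new ArrayList<>(failures.keySet());
    }

    /** Called by the user after making the files consistent; returns false if the id is unknown. */
    boolean markComplete(String transactionId) {
        return failures.remove(transactionId) != null;
    }
}
```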

        I will update in the javadoc in more detail. Please feel free to question and comment.

        Thanks & Regards,
        Nitin

        Nitin Verma added a comment -

        Checked in the changes to trunk (revision #538). Javadoc pending.


          People

          • Assignee:
            Nitin Verma
            Reporter:
            juliusblank
          • Votes:
            0
            Watchers:
            1

            Dates

            • Created:
              Updated:
              Resolved: