TrueZIP / TRUEZIP-305

Preserve last-modified timestamp on archives when copying them with cp_rp

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Works as designed
    • Affects Version/s: TrueZIP 7.6.6
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      When recursively copying a ZIP file using TFile.cp_rp(TFile), the last-modified timestamp of the archive itself is not preserved. Neither is the last-modified timestamp of any archive within that ZIP file.

      The code I am executing looks like this:

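      import de.schlichtherle.truezip.file.TFile;

      // Recursive copy; cp_rp also attempts to preserve attributes such as the last-modified time.
      TFile in = new TFile("/home/vinny/PetClinic-1.0.war");
      TFile out = new TFile("/tmp/PetClinic-1.0.war");
      in.cp_rp(out);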

      If I execute that code, the last-modified timestamp of /tmp/PetClinic-1.0.war will not be the same as that of /home/vinny/PetClinic-1.0.war, even though the former is a copy of the latter. The timestamp of any archive inside PetClinic-1.0.war is touched as well, even though these archives are also copied as-is from the source to the target ZIP. I would expect the timestamp of a new archive (top-level or within another archive) to be touched when I invoke TFile.mkdir(), not when I invoke TFile.cp_rp().

      We have a use case where this behaviour causes problems. In Deployit (a deployment automation product we build), we use TrueZIP to process deployable archives such as WAR and EAR files, replacing placeholders with environment-specific values while we deploy them. A customer of ours uses that feature to modify configuration files inside the WEB-INF directory. But the last-modified timestamps of the JARs inside WEB-INF/lib get touched in the process, even though they are not modified by our placeholder replacer. That

      (as discussed on the users mailing list on November 9th 2012)

        Activity

        Christian Schlichtherle added a comment -

        Ok, because this issue is backed by a valid and interesting use case, it deserves a detailed answer why I am not going to support it. The answer comes down to four points:

        1) It would confuse users and tools.
        2) It would downgrade performance.
        3) It would still not be a verbatim copy.
        4) There is another way to support the use case.

        Before I go into the details, please note that this "feature" was once supported in older versions of TrueZIP. However, I removed it for the reasons that I am now going to explain:

        1) It would confuse users and tools:

        TrueZIP/TrueVFS maintains a virtual file system for archive files. Like in a real file system, the last modification time of a directory changes if and only if the list of its (direct) children changes. Let's consider the following virtual file system path:

        application.war/WEB-INF/lib/library.jar/META-INF/MANIFEST

        Now if a tool updated the contents of the MANIFEST file, its last modification time would change. However, if all parent path elements were regular directories, then neither the last modification time of the directory META-INF nor that of its ancestors would change.
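The directory behaviour described above can be reproduced with plain java.nio.file. This is only a minimal sketch on a real directory and does not touch TrueZIP at all; the class name DirectoryMtimeDemo and the file names are made up for illustration, and the result assumes a typical POSIX or NTFS file system.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

public class DirectoryMtimeDemo {

    /**
     * Returns two flags: [0] the directory time stayed the same after a child's
     * content was updated, [1] the directory time changed after a child was added.
     */
    static boolean[] run() {
        try {
            // Hypothetical stand-in for a regular META-INF directory.
            Path metaInf = Files.createTempDirectory("META-INF");
            Path manifest = metaInf.resolve("MANIFEST");
            Files.write(manifest, "v1".getBytes(StandardCharsets.UTF_8));

            // Pin the directory to a known, old timestamp (whole seconds).
            FileTime pinned = FileTime.fromMillis(1_000_000_000_000L);
            Files.setLastModifiedTime(metaInf, pinned);

            // Updating the content of an existing child leaves the directory alone.
            Files.write(manifest, "v2".getBytes(StandardCharsets.UTF_8));
            boolean unchanged =
                    Files.getLastModifiedTime(metaInf).toMillis() == pinned.toMillis();

            // Adding a child changes the directory's list of children.
            Files.createFile(metaInf.resolve("INDEX.LIST"));
            boolean changed =
                    Files.getLastModifiedTime(metaInf).toMillis() != pinned.toMillis();

            return new boolean[] { unchanged, changed };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("unchanged after content update: " + r[0]);
        System.out.println("changed after adding a child:   " + r[1]);
    }
}
```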

        Now imagine this would happen if application.war and library.jar are ZIP files rather than regular directories. Without using TrueZIP/TrueVFS, a tool or a user might not notice that the data in both ZIP files has changed. The side effect of this unnoticed change may be disastrous when using backup or synchronization tools because they might not notice that they have to process such an updated archive file - the application.war file in this case.
        To avoid these issues, whenever the state of the virtual file system changes, the TrueZIP/TrueVFS Kernel reassembles the updated archive file with an updated last modification time.

        Now the very same logic applies when copying archive files because the kernel doesn't know what the user's intention is, i.e. it cannot tell that you would prefer to have an exact copy of the time stamp.

        2) It would downgrade performance:

        Changing the last modification time of an entity in a regular file system is simple and fast. Just call TFile.setLastModified() and be done with it.

        TrueZIP/TrueVFS extends this feature to entries within archive files, too. Although this is still simple to use, it may not be fast at all: archive file formats such as ZIP require an entry's meta data to be written BEFORE the entry's content. That means you can't really change the last modification time of an entry AFTER you have written its contents, as you would when copying files in a regular directory. If you do this anyway, the kernel automatically syncs the archive file and starts another update cycle to ensure that the meta data is set correctly. However, this means the whole archive file gets copied, and if you do this in a loop while iterating over all archive entries, the run time complexity increases to O( s^2 ), where s is the total size of the archive file. Now consider a large archive file and it becomes clear that this would be a disaster.
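To give a rough feel for the cost, here is a minimal sketch that uses java.util.zip from the standard library rather than the TrueZIP kernel. It changes the time of one entry by rewriting the whole archive, and then does so once per entry. The class ZipTouchCost, the helper setEntryTime and the entry names are assumptions for illustration only; transferTo requires Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class ZipTouchCost {

    /** Sets the time of one entry by rewriting the archive; returns the content bytes copied. */
    static long setEntryTime(Path zip, String name, long millis) throws IOException {
        Path tmp = Files.createTempFile("rewrite", ".zip");
        long copied = 0;
        try (ZipFile in = new ZipFile(zip.toFile());
             ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(tmp))) {
            Enumeration<? extends ZipEntry> entries = in.entries();
            while (entries.hasMoreElements()) {
                ZipEntry old = entries.nextElement();
                ZipEntry copy = new ZipEntry(old.getName());
                // The time belongs to the header, which is written before the content.
                copy.setTime(old.getName().equals(name) ? millis : old.getTime());
                out.putNextEntry(copy);
                try (InputStream data = in.getInputStream(old)) {
                    copied += data.transferTo(out);
                }
                out.closeEntry();
            }
        }
        Files.move(tmp, zip, StandardCopyOption.REPLACE_EXISTING);
        return copied;
    }

    /** Returns {total content bytes copied, deviation in ms of the time read back}. */
    static long[] run() {
        try {
            Path zip = Files.createTempFile("demo", ".zip");
            byte[] payload = new byte[1000];
            try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip))) {
                for (int i = 0; i < 4; i++) {
                    out.putNextEntry(new ZipEntry("entry" + i));
                    out.write(payload);
                    out.closeEntry();
                }
            }
            long stamp = 1_352_462_400_000L; // 2012-11-09T12:00:00Z
            long total = 0;
            // Touching every entry in a loop rewrites the whole archive each time.
            for (int i = 0; i < 4; i++) {
                total += setEntryTime(zip, "entry" + i, stamp);
            }
            long readBack;
            try (ZipFile in = new ZipFile(zip.toFile())) {
                readBack = in.getEntry("entry0").getTime();
            }
            return new long[] { total, Math.abs(readBack - stamp) };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] r = run();
        System.out.println("content bytes copied for 4 entries of 1000 bytes: " + r[0]);
    }
}
```

With four entries of 1000 bytes each, every call copies all 4000 content bytes, so touching all four entries copies four times the archive content. The work grows with the number of entries times the archive size, which is the quadratic effect described above.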

        Now the access tier applies some dark art spells when talking to the kernel tier to ensure that this doesn't happen when using TFile.cp(...) and TFile.cp_rp(...). However, the spells have their limits when it comes to the virtual root directory of a nested archive file - for library.jar in the example.

        3) It would still not be a verbatim copy:

        Even if this could be solved (by applying even more dark art spells), then what you would get would still not be a verbatim copy. This is because all archive entry meta data needs to get processed by the driver tier and chances are that the support for a particular archive file format is incomplete. In fact, even the TrueZIP Driver ZIP supports only a fraction of the ZIP File Format Specification, so you may expect some "loss of precision" when copying archive files this way.

        4) There is another way to support the use case:

        To make a verbatim copy of an archive file, simply instruct the kernel to unmount its virtual file system and then perform the copy. Instructions for doing this in a safe, simple and fast way are provided in the Javadoc for the TFile class here: http://truezip.java.net/apidocs/de/schlichtherle/truezip/file/TFile.html#verbatimCopy
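The idea behind this approach can be illustrated with the standard library alone: treat the archive as an opaque file and copy it together with its attributes. The sketch below uses java.nio.file.Files.copy and is not the procedure from the TFile Javadoc; the class VerbatimCopyDemo, the helper verbatimCopy and the file names are hypothetical. When TrueZIP has the archive mounted, the instructions in the Javadoc linked above should be followed so that pending changes are synced first.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.FileTime;
import java.util.Arrays;

public class VerbatimCopyDemo {

    /** Copies the archive byte for byte, including its last-modified time. */
    static Path verbatimCopy(Path src, Path dst) throws IOException {
        return Files.copy(src, dst,
                StandardCopyOption.COPY_ATTRIBUTES,
                StandardCopyOption.REPLACE_EXISTING);
    }

    /** Returns true if both the content and the timestamp survived the copy. */
    static boolean run() {
        try {
            // Placeholder bytes stand in for a real WAR file.
            Path src = Files.createTempFile("PetClinic", ".war");
            Files.write(src, "placeholder archive bytes".getBytes(StandardCharsets.UTF_8));
            FileTime stamp = FileTime.fromMillis(1_352_462_400_000L); // 2012-11-09T12:00:00Z
            Files.setLastModifiedTime(src, stamp);

            Path dst = Files.createTempFile("PetClinic-copy", ".war");
            verbatimCopy(src, dst);

            boolean sameBytes = Arrays.equals(Files.readAllBytes(src), Files.readAllBytes(dst));
            boolean sameTime =
                    Files.getLastModifiedTime(dst).toMillis() == stamp.toMillis();
            return sameBytes && sameTime;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("verbatim copy preserved bytes and time: " + run());
    }
}
```

Because the bytes are identical, any nested archives such as the JARs in WEB-INF/lib keep their recorded timestamps as well.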

        I hope this helps.

        vpartington added a comment -

        Hi Christian,

        Pity you won't fix this, but thank you for the clear explanation. Reason #2 in particular is a compelling argument. Dynamically determining whether to copy the existing last-modified timestamp or to take the current system time would be too slow, so you have to pick one of the two at the moment the ZIP is created. In that case, picking the current system time seems the safest choice.

        Is there a chance I could convince you to add an option to control this?

        Regards, Vincent.

        Christian Schlichtherle added a comment - edited

        Well, I could sprinkle some more dark art spells over the code so you could have an FsSyncOption to control this. However, reasons #1, #3 and #4 would still apply. Even if you don't care about #1 and #3, #4 still gives you a viable alternative which is more accurate (no loss of precision in the meta data) and performs slightly better (no mounting required).

        So I wonder why you would still want it? As far as I know other users are happily applying the procedure in #4. I think this should work well for you, too.

        Please let me know your experience with it.

        vpartington added a comment -

        Hi Christian,

        Agreed. We'll try #4 first and let you know how it worked out. Thanks again for replying so quickly and thinking along with us.

        Regards, Vincent.


          People

          • Assignee:
            Christian Schlichtherle
            Reporter:
            vpartington
          • Votes:
            0
            Watchers:
            0