jsr-333
  1. jsr-333
  2. JSR_333-68

Specification is not clear on semantics of removing version histories

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: api, spec
    • Labels:
      None

      Description

      Section 15.8 of the JSR-283 specification covers removal of specific versions within a version history. However, I'm unable to find any information in the spec about removing entire version histories. There certainly may be regulatory and policy reasons for preventing the removal of some data, but likewise there may be regulatory and policy reasons for removing all store data relating to some particular topic. IMO, a repository implementation can provide the tools to allow both, but cannot choose one specific behavior. We have to allow applications to implement their own policies and behavior, and thus it needs to be addressed in the specification and possible API if there are changes to JavaDoc and/or interfaces.

      The other problem with removing only the versions is that it is likely that the root version remain. Thus, when frequently creating, modifying and removing versionable nodes, the amount of version history storage will continue to increase and cannot be compacted (at least through activities instigated by standard API calls).

      So I think there are a couple of issues.

      1. Once a node is removed, it is not possible to find the version history. That's because the VersionManager.getVersionHistory(...) takes the path of the versionable node. Thus a client can only find a version history for a removed node only if it somehow remembers the path. Plus, I would suspect that most implementations do not store version histories based upon the path, but rather the identifier. Should we change the JavaDoc on method to explicitly say identifier paths are accepted, or should we add VersionManager.getVersionHistoryById(String versionableId)? (Yes, a client still needs to know the identifier, but at least that is durable.)
      2. Removing version histories is complicated by the fact that multiple workspaces may have corresponding versionable nodes. Thus, removal of a versionable node from one workspace does not mean the version history is no longer used. However, if no other workspace has a corresponding versionable node, then it should be possible to completely remove a version history. Currently the TCK tests check that this is not permitted, because it checks that Item.remove() does not work on a VersionHistory node. Considering that the spec defines an explicit way to remove Version nodes (rather than allow Item.remove() to work), we should add a VersionManager.removeVersionHistory(String path) (and based on #1 above perhaps also a VersionManager.removeVersionHistoryById(String versionableId. These methods should fail if there is a versionable node with that identifier in at least one workspace.
      3. What happens to the version history when a versionable node is made to be no longer be versionable? For example, if a node is versionable only because it explicitly has "mix:versionable" in the list of mixin types, what happens when "mix:versionable" is removed from the node (assuming the version history was already created prior to this)? Is the version history even valid anymore? Can an older version be restored to a node that is no longer versionable? (If not, then we need to clarify this in the section that talks about restoring.)
      4. *Does it make sense to add "Node.remove(boolean removeVersionHistories)", so that a subgraph (of 1 or more potentially versionable nodes) can be removed and their version histories be removed at the same time (assuming the version histories can be removed because there are no corresponding nodes in other workspaces)?

        Activity

        Hide
        Peeter Piegaze added a comment -

        The getXXXById methods are not really needed, since you can always pass a JCR path in identifier-based form, e.g.:

        [f81d4fae-7dec-11d0-a765-00a0c91e6bf6]

        See section 3.4.4. I am pretty sure you are supposed to be able to pass any kind of path to any method, so this should work already.

        So this leaves the suggestion to add VersionManager.removeVersionHistory(String path) and
        Node.remove(boolean removeVersionHistories)

        I guess I don't have an objection to adding these, but what exactly is wrong with leaving this up to the implemention anyway? Isn't it pretty much impl-sensitive or at least "policy-sensitive" in the first place. Do we nee to put it in the spec?

        Show
        Peeter Piegaze added a comment - The getXXXById methods are not really needed, since you can always pass a JCR path in identifier-based form, e.g.: [f81d4fae-7dec-11d0-a765-00a0c91e6bf6] See section 3.4.4. I am pretty sure you are supposed to be able to pass any kind of path to any method, so this should work already. So this leaves the suggestion to add VersionManager.removeVersionHistory(String path) and Node.remove(boolean removeVersionHistories) I guess I don't have an objection to adding these, but what exactly is wrong with leaving this up to the implemention anyway? Isn't it pretty much impl-sensitive or at least "policy-sensitive" in the first place. Do we nee to put it in the spec?
        Hide
        rhauch added a comment -

        In 1) above I suggest that the JavaDoc be amended to say that identifier paths are also accepted.

        Regarding 3): it would be nice to see some clarification in the spec about this, or even some feedback here. Perhaps it's something that hasn't really been considered. IIUC, even the reference implementation does not provide a way of completely removing the version history for a node - there is always something left, even after removing all versions in a given history. For example, create 1M versionable nodes (and check them in), then remove all of those nodes and remove all of the versions from their histories, and version storage will still contain 1M almost empty version histories.

        Are you suggesting that implementations could implement Version.remove() and VersionHistory.remove()? As it stands now, the TCK tests check that these methods always throw an exception, so implementation cannot vary this behavior and fully pass the TCK. Perhaps it is a TCK error, in which case I can log it accordingly.

        I agree that removing version histories is "policy-sensitive", but no more so than removing versions. In fact, not providing a way to remove entire version histories seems to be assuming some policy. Currently there is no API-compatible way to even allow client applications to completely remove histories, so even if a client application had a policy that allowed removing data, there's no way for them to do it.

        Show
        rhauch added a comment - In 1) above I suggest that the JavaDoc be amended to say that identifier paths are also accepted. Regarding 3): it would be nice to see some clarification in the spec about this, or even some feedback here. Perhaps it's something that hasn't really been considered. IIUC, even the reference implementation does not provide a way of completely removing the version history for a node - there is always something left, even after removing all versions in a given history. For example, create 1M versionable nodes (and check them in), then remove all of those nodes and remove all of the versions from their histories, and version storage will still contain 1M almost empty version histories. Are you suggesting that implementations could implement Version.remove() and VersionHistory.remove()? As it stands now, the TCK tests check that these methods always throw an exception, so implementation cannot vary this behavior and fully pass the TCK. Perhaps it is a TCK error, in which case I can log it accordingly. I agree that removing version histories is "policy-sensitive", but no more so than removing versions. In fact, not providing a way to remove entire version histories seems to be assuming some policy. Currently there is no API-compatible way to even allow client applications to completely remove histories, so even if a client application had a policy that allowed removing data, there's no way for them to do it.
        Hide
        Peeter Piegaze added a comment -

        For

        VersionManager.getVersionHistory(String absPath)

        I have added a note in spec and javadoc clarifying that when a JCR path is passed if may be an "identifier-based absolute path" as defined in in spec section 3.4.4.1.2 and that a version without a corresponding versionable should still be maintained (and be accessible by the versionables identifier).

        I have also added the method

        void VersionManager.removeVersionHistory(String absPath)

        which (in those impl that support it) removes the VH as long as that VH has no exiting verisonables in any WS.

        Regarding the idea Node.remove(boolean removeVersionHistories):
        I don't think this is a good idea, since VH removal is a rather "big deal" and I don't hink it belongs aas flag on something as common as node removal.

        Show
        Peeter Piegaze added a comment - For VersionManager.getVersionHistory(String absPath) I have added a note in spec and javadoc clarifying that when a JCR path is passed if may be an "identifier-based absolute path" as defined in in spec section 3.4.4.1.2 and that a version without a corresponding versionable should still be maintained (and be accessible by the versionables identifier). I have also added the method void VersionManager.removeVersionHistory(String absPath) which (in those impl that support it) removes the VH as long as that VH has no exiting verisonables in any WS. Regarding the idea Node.remove(boolean removeVersionHistories): I don't think this is a good idea, since VH removal is a rather "big deal" and I don't hink it belongs aas flag on something as common as node removal.
        Hide
        rhauch added a comment -

        Thanks, Peeter. Your changes sound good, and I'm okay with not including the "Node.remove(boolean)" method.

        Show
        rhauch added a comment - Thanks, Peeter. Your changes sound good, and I'm okay with not including the "Node.remove(boolean)" method.

          People

          • Assignee:
            Unassigned
            Reporter:
            rhauch
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved: