jpa-spec
  1. jpa-spec
  2. JPA_SPEC-40

ON extension with Fetch Join in JPA 2.1 Proposal breaks the EntityManager

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Labels:
      None
    • Environment:

      N/A

      Description

      JPA 2.1 spec proposal extends the fetch joins with the addition of 'ON' specification.

      Fetch joins are for eager loading and plain joins are for resultset restrictions.

      According to specification when within a transaction, entities returned by a JPQL or Crtieria API are managed.

      So restricting fetch joins with ON extension alters the relation in a way that the original relations are represented with an inconsistent managed relations. An modification to the relations will create a corrupt the data graph.

      Example
      A Transaction is active
      Persons are queried and in ON clause their addresses are restricted to Address.primary = true
      One of the Person(s) has 2 addresses. Yet only the Address that is primary loaded as part of the query.
      The Business layer adds a new Address to Person's addresses collection.
      When the EntityManager is flushed inadvertently the non-primary address gets removed from the relation.

      If you work this around by marking the collection as 'unmanaged', then a further find() operation will return again inconsistent Person object with only one address and Person's addresses collection is refetched with full contents, then you will have an inconsistency in the EntityManager that same person with two collections representing different child entities.

      While this is the case with Fetch Joins, there should be no problem with plain Joins as they do not get used to populate the relations but merely restrict the ResultList.

        Activity

        Hide
        ldemichiel added a comment -

        The join_condition production should have been specified as
        join_condition ::= ON conditional_expression
        Conditional expressions are discussed in section 4.6++. These do not introduce new identifiation variables.

        You are right about single-valued associations. Because one can navigate through a single-valued-object-expression from an identification variable introduced on the left-hand side of the join, it is possible to retrieve a null address.

        The spec needs to warn against this. I think it should probably state that it is illegal to use a path expression that traverses to or through a single-valued-object-field in an outer fetch join ON condition.

        Show
        ldemichiel added a comment - The join_condition production should have been specified as join_condition ::= ON conditional_expression Conditional expressions are discussed in section 4.6++. These do not introduce new identifiation variables. You are right about single-valued associations. Because one can navigate through a single-valued-object-expression from an identification variable introduced on the left-hand side of the join, it is possible to retrieve a null address. The spec needs to warn against this. I think it should probably state that it is illegal to use a path expression that traverses to or through a single-valued-object-field in an outer fetch join ON condition.
        Hide
        ceylanh added a comment -

        Thank you, you are right that I missed that this only applies to single-valued associations. Sorry for making it more complex then it needs to be.

        I cannot resist to ask "what's the use case for 'on' condition after all"? As I understand now, this is even applicable only to single-valued associations, I think it creates more complexity then it helps. Correct me if I am wrong but, I cannot see that it introduces any new functionality that could not be achieved with 2.0 syntax.

        If the motivation is to make the syntax become even more SQL-like, I would argue that 'on' needed at all. It exists in SQL solely to join tables where as JPQL in its nature has that. IMHO, it is not a good practice even in SQL to mix restriction and join, this not only looks bad but also hard to read and error prone. But since SQL does not know about the metadata it simply needs a way to join the tables - and a distinction by 'on' and 'where' makes it more clear what condition is to join and what condition is to restrict.

        Whereas in JPQL, joins are implicit and what is left for the developer to create the restriction.

        Show
        ceylanh added a comment - Thank you, you are right that I missed that this only applies to single-valued associations. Sorry for making it more complex then it needs to be. I cannot resist to ask "what's the use case for 'on' condition after all"? As I understand now, this is even applicable only to single-valued associations, I think it creates more complexity then it helps. Correct me if I am wrong but, I cannot see that it introduces any new functionality that could not be achieved with 2.0 syntax. If the motivation is to make the syntax become even more SQL-like, I would argue that 'on' needed at all. It exists in SQL solely to join tables where as JPQL in its nature has that. IMHO, it is not a good practice even in SQL to mix restriction and join, this not only looks bad but also hard to read and error prone. But since SQL does not know about the metadata it simply needs a way to join the tables - and a distinction by 'on' and 'where' makes it more clear what condition is to join and what condition is to restrict. Whereas in JPQL, joins are implicit and what is left for the developer to create the restriction.
        Hide
        arjan tijms added a comment -

        Ceylanh, this is a good observation!

        Intuitively I would say that a way to limit the size of collections would have merit (I'm aware that the problem discussed here only applies to single-valued associations). Many times I've run into a modeling issue where entity A has a logical one to many relation to B, but I don't express that in the corresponding JPA Entities, since the size of the resulting collection would be way too big and there's no real way to limit it.

        Perhaps it's possible for a JPA provider to internally tag a collection as being a 'limited' one and thereafter keep track of adds and deletes. At flush time, the provider then knows it should not persist the collection as-is, but only process the elements that were added and deleted. Alternatively, it might also be an option to simply forbid modifying such 'limited' collection (e.g. throwing an IllegalStateException when at an attempt to add or delete an element).

        That does leave us with the question of L1 and L2 caching. If an Entity is first obtained while explicitly limiting at least one of its associations, what will a followup call to EntityManager#find() return?

        Show
        arjan tijms added a comment - Ceylanh, this is a good observation! Intuitively I would say that a way to limit the size of collections would have merit (I'm aware that the problem discussed here only applies to single-valued associations). Many times I've run into a modeling issue where entity A has a logical one to many relation to B , but I don't express that in the corresponding JPA Entities, since the size of the resulting collection would be way too big and there's no real way to limit it. Perhaps it's possible for a JPA provider to internally tag a collection as being a 'limited' one and thereafter keep track of adds and deletes. At flush time, the provider then knows it should not persist the collection as-is, but only process the elements that were added and deleted. Alternatively, it might also be an option to simply forbid modifying such 'limited' collection (e.g. throwing an IllegalStateException when at an attempt to add or delete an element). That does leave us with the question of L1 and L2 caching. If an Entity is first obtained while explicitly limiting at least one of its associations, what will a followup call to EntityManager#find() return?
        Hide
        ceylanh added a comment -

        I think query methods are there to handle that. IMHO, the managed entities must solely represent the DB data.
        But as you put, marking and defining the behaviour well in the meta data may be way to go.

        Having a second think, I also think the following 2.0 query has a flaw

        select c from Company c
        left join fetch c.manager
        inner join fetch c.manager.address

        Assume that these're all OneToOne relations. In that instance, a company with a manager with no address, will bring a Company instance to session with no manager. I think this is hard to keep track and error prone.

        So, I think, fetch joins should be left joins all the time and there should be no restrictions at all. Their sole use must be limited on the eager loading of the associations. Isn't that also follows the distinction between join and fetch join?

        I do appreciate your use case. Say we have a case BankCustomer – OneToMany --> MoneyTransaction. In that case if I were to list the MoneyTransactions I would go about that by selecting BankTransactions with pagination (or some sort of restrictions). I think the entity model should not try to replace the queries but should just provide the metadata to base the queries on top it.

        Show
        ceylanh added a comment - I think query methods are there to handle that. IMHO, the managed entities must solely represent the DB data. But as you put, marking and defining the behaviour well in the meta data may be way to go. Having a second think, I also think the following 2.0 query has a flaw select c from Company c left join fetch c.manager inner join fetch c.manager.address Assume that these're all OneToOne relations. In that instance, a company with a manager with no address, will bring a Company instance to session with no manager. I think this is hard to keep track and error prone. So, I think, fetch joins should be left joins all the time and there should be no restrictions at all. Their sole use must be limited on the eager loading of the associations. Isn't that also follows the distinction between join and fetch join? I do appreciate your use case. Say we have a case BankCustomer – OneToMany --> MoneyTransaction. In that case if I were to list the MoneyTransactions I would go about that by selecting BankTransactions with pagination (or some sort of restrictions). I think the entity model should not try to replace the queries but should just provide the metadata to base the queries on top it.
        Hide
        ldemichiel added a comment -

        Support for ON conditions with fetch joins removed as of the JPA 2.1 public draft

        Show
        ldemichiel added a comment - Support for ON conditions with fetch joins removed as of the JPA 2.1 public draft

          People

          • Assignee:
            Unassigned
            Reporter:
            ceylanh
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved: