Issue Details (XML | Word | Printable)

Key: JPA_SPEC-40
Type: Bug Bug
Status: Resolved Resolved
Resolution: Fixed
Priority: Blocker Blocker
Assignee: Unassigned
Reporter: ceylanh
Votes: 0
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
jpa-spec

ON extension with Fetch Join in JPA 2.1 Proposal breaks the EntityManager

Created: 24/Oct/12 07:40 AM   Updated: 17/Dec/12 07:58 PM   Resolved: 17/Dec/12 07:58 PM
Component/s: None
Affects Version/s: None
Fix Version/s: None

Time Tracking:
Not Specified

Environment:

N/A


Tags:
Participants: arjan tijms, ceylanh and ldemichiel


 Description  « Hide

JPA 2.1 spec proposal extends the fetch joins with the addition of 'ON' specification.

Fetch joins are for eager loading and plain joins are for resultset restrictions.

According to specification when within a transaction, entities returned by a JPQL or Crtieria API are managed.

So restricting fetch joins with ON extension alters the relation in a way that the original relations are represented with an inconsistent managed relations. An modification to the relations will create a corrupt the data graph.

Example
A Transaction is active
Persons are queried and in ON clause their addresses are restricted to Address.primary = true
One of the Person(s) has 2 addresses. Yet only the Address that is primary loaded as part of the query.
The Business layer adds a new Address to Person's addresses collection.
When the EntityManager is flushed inadvertently the non-primary address gets removed from the relation.

If you work this around by marking the collection as 'unmanaged', then a further find() operation will return again inconsistent Person object with only one address and Person's addresses collection is refetched with full contents, then you will have an inconsistency in the EntityManager that same person with two collections representing different child entities.

While this is the case with Fetch Joins, there should be no problem with plain Joins as they do not get used to populate the relations but merely restrict the ResultList.



ldemichiel added a comment - 29/Oct/12 09:38 PM

Fetch joins prohibit use of an identification variable to reference objects returned by the right hand side of the join, so I don't see how the described problem arises.

Please clarify and/or provide a code example.


ceylanh added a comment - 30/Oct/12 06:22 AM

4.14 BNF

fetch_join ::= join_spec FETCH join_association_path_expression [join_condition]
join_condition ::= ON condition

Note that I am reporting the bug against the version 2.1 proposal.


ceylanh added a comment - 30/Oct/12 06:29 AM

Example as per request.

Model:

Person { (...) Long id; List<Address> addresses; }

Address { (...) String name; boolean primary; }

Data:

Person 1 = { 1, (...) addresses[ {(...), 'address1', true}, { (...), 'address2', false} ]}



JPQL with and without ON clause and results:

select c from Customer c fetch join p.addresses --> { 1, (...) addresses[ {(...), 'address1', true}, { (...), 'address2', false} ]}

select c from Customer c fetch join p.addresses on p.addresses.primary = true --> { 1, (...) addresses[ {(...), 'address1', true} ]}


ldemichiel added a comment - 30/Oct/12 03:41 PM

That query is not legal. You can't navigate through a collection-valued attribute.


ceylanh added a comment - 30/Oct/12 08:34 PM

Are we sure we are talking about the V2.1 proposal?
As I copied from 2.1 Spec Proposal, the BNF seems to me allows it.

Also the proposal has the Fetch Interface with the new 'on' related methods.


ldemichiel added a comment - 30/Oct/12 08:48 PM

2.1 spec is quite clear that it is illegal to navigate through a collection-valued path expression (see section 4.4.4.1).

join-assocation-path-expression requires an identification variable to be able to
reference the right hand side of the FETCH. Use of an identification variable is supported for regular joins but not for fetch joins.


ceylanh added a comment - 30/Oct/12 09:18 PM

v2.0

fetch_join ::= [ LEFT [OUTER] | INNER ] JOIN FETCH join_association_path_expression

v2.1

fetch_join ::= [ LEFT [OUTER] | INNER ] JOIN FETCH join_association_path_expression join_condition
join_condition ::= ON condition

There is not much explanation on 'on' feature.
Wouldn't that create the case I brought up?

On top of that the same applies for a single value association. Assume the address is OneToOne then a disqualifying restriction in 'on' part would end up with Person with null address, right?


ldemichiel added a comment - 06/Nov/12 09:55 PM

The join_condition production should have been specified as
join_condition ::= ON conditional_expression
Conditional expressions are discussed in section 4.6++. These do not introduce new identifiation variables.

You are right about single-valued associations. Because one can navigate through a single-valued-object-expression from an identification variable introduced on the left-hand side of the join, it is possible to retrieve a null address.

The spec needs to warn against this. I think it should probably state that it is illegal to use a path expression that traverses to or through a single-valued-object-field in an outer fetch join ON condition.


ceylanh added a comment - 07/Nov/12 02:42 AM

Thank you, you are right that I missed that this only applies to single-valued associations. Sorry for making it more complex then it needs to be.

I cannot resist to ask "what's the use case for 'on' condition after all"? As I understand now, this is even applicable only to single-valued associations, I think it creates more complexity then it helps. Correct me if I am wrong but, I cannot see that it introduces any new functionality that could not be achieved with 2.0 syntax.

If the motivation is to make the syntax become even more SQL-like, I would argue that 'on' needed at all. It exists in SQL solely to join tables where as JPQL in its nature has that. IMHO, it is not a good practice even in SQL to mix restriction and join, this not only looks bad but also hard to read and error prone. But since SQL does not know about the metadata it simply needs a way to join the tables - and a distinction by 'on' and 'where' makes it more clear what condition is to join and what condition is to restrict.

Whereas in JPQL, joins are implicit and what is left for the developer to create the restriction.


arjan tijms added a comment - 08/Nov/12 11:20 PM

Ceylanh, this is a good observation!

Intuitively I would say that a way to limit the size of collections would have merit (I'm aware that the problem discussed here only applies to single-valued associations). Many times I've run into a modeling issue where entity A has a logical one to many relation to B, but I don't express that in the corresponding JPA Entities, since the size of the resulting collection would be way too big and there's no real way to limit it.

Perhaps it's possible for a JPA provider to internally tag a collection as being a 'limited' one and thereafter keep track of adds and deletes. At flush time, the provider then knows it should not persist the collection as-is, but only process the elements that were added and deleted. Alternatively, it might also be an option to simply forbid modifying such 'limited' collection (e.g. throwing an IllegalStateException when at an attempt to add or delete an element).

That does leave us with the question of L1 and L2 caching. If an Entity is first obtained while explicitly limiting at least one of its associations, what will a followup call to EntityManager#find() return?


ceylanh added a comment - 10/Nov/12 06:22 PM

I think query methods are there to handle that. IMHO, the managed entities must solely represent the DB data.
But as you put, marking and defining the behaviour well in the meta data may be way to go.

Having a second think, I also think the following 2.0 query has a flaw

select c from Company c
left join fetch c.manager
inner join fetch c.manager.address

Assume that these're all OneToOne relations. In that instance, a company with a manager with no address, will bring a Company instance to session with no manager. I think this is hard to keep track and error prone.

So, I think, fetch joins should be left joins all the time and there should be no restrictions at all. Their sole use must be limited on the eager loading of the associations. Isn't that also follows the distinction between join and fetch join?

I do appreciate your use case. Say we have a case BankCustomer – OneToMany --> MoneyTransaction. In that case if I were to list the MoneyTransactions I would go about that by selecting BankTransactions with pagination (or some sort of restrictions). I think the entity model should not try to replace the queries but should just provide the metadata to base the queries on top it.


ldemichiel added a comment - 17/Dec/12 07:58 PM

Support for ON conditions with fetch joins removed as of the JPA 2.1 public draft