Issue Details

Key: GLASSFISH-15558
Type: Bug
Status: Open
Priority: Major
Assignee: Nigel Deakin
Reporter: Nigel Deakin
Votes: 3
Watchers: 7

Caching JMS session in a session bean causes errors when invoked by an MDB when under load

Created: 13/Jan/11 07:00 AM   Updated: 21/Mar/13 05:04 PM
Component/s: jms
Affects Version/s: 3.1_b37
Fix Version/s: 4.0.1

Time Tracking:
Not Specified

File Attachments: 1. File ServerLog.odt (24 kB) 13/Jan/11 07:12 AM - Nigel Deakin
2. Zip Archive TransactionTests.zip (52 kB) 13/Jan/11 07:00 AM - Nigel Deakin


Tags: 3_1-exclude 3_1-release-note-added 3_1-release-notes 3_1_1-scrubbed 3_1_2-exclude
Participants: jthoennes, marina vatkina, Nazrul, Nigel Deakin, Paul Davies, Scott Fordin and theodor.richard


Description

This issue was originally reported by a user on the GlassFish developer list. Here is the thread:
http://www.java.net/forum/topic/glassfish/glassfish/jms-and-transaction-issue

The issue can be summarised as follows: an MDB consumes a message from an inbound queue, updates a database, then invokes a session bean which sends a message to an outbound queue. If 40 messages are placed on the inbound queue then a variety of error messages appear in the server log (see that thread), including this one:

625+0000|SEVERE|glassfish3.1|javax.resourceadapter.mqjmsra.outbound.connection|_ThreadID=36;_ThreadName=Thread-1;|commitTransaction (XA) on JMSService:jmsdirect failed for connectionId:950872495901869318 and onePhase:false due to Unknown JMSService server error ERROR: com.sun.messaging.jmq.jmsserver.util.BrokerException: Bad transaction state transition. Cannot perform operation COMMIT_TRANSACTION(46) (XAFlag=0x0:TMNOFLAGS) on a transaction in state COMPLETE(4).|#]

The error occurs ONLY if the session bean caches its JMS connection, session and producer in a field of the bean. This is valid, though it is contrary to the conventional practice, which is to create the connection, use it, and close it every time the session bean is invoked. If these objects are not cached then this bug is NOT seen. This bug therefore has a workaround.

The issue can be reproduced using JMS only (i.e. not using a database), though to see exactly the same errors as the user reported it is necessary to force the use of two-phase commits.

A simple NetBeans application is attached which demonstrates the issue. This consists of an Enterprise Application "TransactionTests" which is composed of an EJB application "TransactionTests-ejb" and a web application "TransactionTests-war".

Steps to reproduce:

1. Install the latest version of GlassFish 3.1 (I used build 37)
2. Before starting GlassFish, edit domain.xml to set the JVM option -Dimq.jmsra.isSameRMAllowed=false . This is needed to force two-phase transactions to be used. (If this is not done the application will still fail but you will get different errors).
3. Use NetBeans to build the application (which is an ear containing an ejb and a web app) and deploy it in GlassFish.
4. Visit http://localhost:8080/TransactionTests-war/ and click on "Run MDB Test 1". This causes a servlet to send 40 messages to the inbound queue.
5. Inspect the server log for errors



Nigel Deakin added a comment - 13/Jan/11 07:12 AM

The attached file ServerLog.odt is an extract from the server log, which includes logging in DirectXAResource and in the application.

Note particularly Thread 36 (highlighted in green) and Thread 51 (highlighted in red). This suggests that the session bean instance used by thread 36 was reused by thread 51 after the business method returned but before the MDB returned and the transaction was committed. This meant that the same JMS session object was being used by two threads at the same time, which caused the error.

(Full disclosure: to create this logging a modified version of MQ was used, with all uses of System.out in DirectXAResource changed to use JDK logging. This was necessary to ensure that such log messages were reported using the correct thread.)
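
The interleaving suggested by the log can be replayed in a small single-threaded model. This is only a sketch under stated assumptions, not GlassFish or MQ code: `ModelSession`, `ModelBean` and `PoolReuseModel` are hypothetical stand-ins for the JMS session, the stateless session bean and the container's instance pool, and the model assumes (as the log suggests) that an instance is handed to a second caller before the first caller's transaction has committed.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical model of the container's bean pool; not GlassFish code. */
final class PoolReuseModel {
    /** Replays: tx A calls the bean, the bean returns to the pool, tx B calls it, then both commit. */
    static boolean conflictOccurs(boolean cacheSession) {
        Deque<ModelBean> pool = new ArrayDeque<>();
        pool.push(new ModelBean(cacheSession));

        ModelBean forA = pool.pop();
        ModelSession sessionA = forA.businessMethod("txA");
        pool.push(forA);                 // instance released when the method returns, before commit

        ModelBean forB = pool.pop();     // the same instance is handed to a second caller
        ModelSession sessionB = forB.businessMethod("txB");
        pool.push(forB);

        sessionA.commit("txA");
        sessionB.commit("txB");
        return sessionA.conflictSeen || sessionB.conflictSeen;
    }

    public static void main(String[] args) {
        System.out.println("cached session conflict:   " + conflictOccurs(true));
        System.out.println("per-call session conflict: " + conflictOccurs(false));
    }
}

/** Hypothetical stand-in for a JMS session; it tolerates only one open transaction at a time. */
final class ModelSession {
    private String openTx;     // transaction with uncommitted work on this session, or null
    boolean conflictSeen;      // set when a second transaction uses the session too early

    void send(String tx) {
        if (openTx != null && !openTx.equals(tx)) {
            conflictSeen = true;
        }
        openTx = tx;
    }

    void commit(String tx) {
        if (tx.equals(openTx)) {
            openTx = null;
        }
    }
}

/** Hypothetical stateless bean: caches its session in a field, or creates one per call. */
final class ModelBean {
    private final boolean cacheSession;
    private ModelSession cached;

    ModelBean(boolean cacheSession) {
        this.cacheSession = cacheSession;
    }

    ModelSession businessMethod(String tx) {
        ModelSession session;
        if (cacheSession) {
            if (cached == null) {
                cached = new ModelSession();
            }
            session = cached;            // problem case: session survives across invocations
        } else {
            session = new ModelSession(); // workaround: a fresh session for every invocation
        }
        session.send(tx);
        return session;                  // returned only so the simulation can commit on it later
    }
}
```

In this model `conflictOccurs(true)` reports a conflict because both transactions end up on the one cached session, while `conflictOccurs(false)` does not, which matches the observation that the errors disappear when the objects are not cached.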


Nigel Deakin added a comment - 14/Jan/11 10:18 AM

This behaviour is also seen in GlassFish 2.1.1, so this is not a regression. There's also a workaround (and the workaround is generally considered better practice than the problem case). This bug therefore doesn't need to be fixed now, so I am setting the 3_1-exclude tag.


Nigel Deakin added a comment - 14/Jan/11 10:30 AM

Have created documentation bug
http://java.net/jira/browse/GLASSFISH-15579
to record this in the release note for 3.1.


Paul Davies added a comment - 14/Jan/11 10:46 AM

For the GlassFish 3.1 release notes add the following information:

A stateless session bean should not save JMS connections or sessions in fields of the bean. Applications that do so may encounter errors.

To avoid this issue, if a stateless session bean's business method requires the use of a JMS connection and session then the business method should create the JMS connection and session, use it to send or receive messages, and then close the connection and session before returning. This is GlassFish issue 15558.


theodor.richard added a comment - 14/Jan/11 11:59 AM

I'm the user who initially reported this issue on the mailing list. A problem with not caching the connection is that the maximum number of connections is reached quickly. I'm seeing the following exceptions in the log when sending 50 messages in a for loop, i.e. the method that acquires and releases the JMS connection is invoked 50 times in a row:

com.sun.messaging.jms.JMSException: MQRA:DCF:allocation failure:createConnection:Error in allocating a connection. Cause: In-use connections equal max-pool-size and expired max-wait-time. Cannot allocate more connections.

My max connection pool size has the default size of 32.


Nigel Deakin added a comment - 17/Jan/11 02:11 AM - edited

@theodor.richard: If you believe that managed connections are not being returned correctly to the pool (and this isn't because your pool simply isn't big enough), then please log this as a separate issue or raise it on the user list. Please keep this issue for discussions of the effect of caching the connection, session and producer.


Nigel Deakin added a comment - 25/Jan/11 06:12 AM

Analysis of the test case shows that the cause of the problem is that the container is reusing the session bean instance (and hence the connection's XAResource instance) after the business method has returned but before the transaction has been committed.

It is legal for the container to reuse the stateless session bean instance before the transaction has been committed: the EJB spec, section 4.7 "Stateless Session Beans" states that "the container may interleave requests from multiple transactions to the same instance".

However, doing so causes errors in the JMSRA resource adapter, because it is designed on the basis that the same XAResource instance is used for start, end, prepare and commit and that the instance will not be reused until the transaction is committed or rolled back.

That design is a breach of the JCA 1.5 spec, which states in section 7.3.2.1 "Implementation" that "A transaction manager can use any XAResource instance (if it refers to the proper resource manager instance) to initiate transaction completion. The XAResource instance used during the transaction completion process need not be the one initially enlisted with the transaction manager for this transaction".
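
One way to picture what the quoted requirement implies is an XAResource that keeps no per-transaction state in the instance, so that any instance referring to the same resource manager can complete a transaction. The following is a minimal sketch, not the JMSRA implementation and not the fix planned for this bug: `ModelResourceManager`, `SharedStateXAResource` and `ModelXid` are hypothetical names, and the state machine is reduced to a few string states. Only the `javax.transaction.xa` interfaces, which ship with Java SE, are real.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

final class XaCompletionDemo {
    /** Starts a branch on one XAResource instance and completes it on another. */
    static String run() {
        ModelResourceManager rm = new ModelResourceManager();
        XAResource enlisted = new SharedStateXAResource(rm);
        XAResource other = new SharedStateXAResource(rm);
        Xid xid = new ModelXid("tx-1");
        try {
            enlisted.start(xid, XAResource.TMNOFLAGS);
            enlisted.end(xid, XAResource.TMSUCCESS);
            other.prepare(xid);            // a different instance completes the transaction
            other.commit(xid, false);
        } catch (XAException e) {
            throw new IllegalStateException("XA error code " + e.errorCode, e);
        }
        return rm.states.get(xid);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}

/** Hypothetical resource manager: per-transaction state lives here, keyed by Xid. */
final class ModelResourceManager {
    final Map<Xid, String> states = new ConcurrentHashMap<>();

    /** Moves a transaction from one state to another, rejecting illegal transitions. */
    void transition(Xid xid, String from, String to) throws XAException {
        if (!from.equals(states.get(xid))) {
            throw new XAException(XAException.XAER_PROTO);
        }
        states.put(xid, to);
    }
}

/** XAResource that keeps no per-transaction state in the instance itself. */
final class SharedStateXAResource implements XAResource {
    private final ModelResourceManager rm;

    SharedStateXAResource(ModelResourceManager rm) {
        this.rm = rm;
    }

    @Override public void start(Xid xid, int flags) { rm.states.put(xid, "STARTED"); }
    @Override public void end(Xid xid, int flags) throws XAException {
        rm.transition(xid, "STARTED", "ENDED");
    }
    @Override public int prepare(Xid xid) throws XAException {
        rm.transition(xid, "ENDED", "PREPARED");
        return XA_OK;
    }
    @Override public void commit(Xid xid, boolean onePhase) throws XAException {
        rm.transition(xid, onePhase ? "ENDED" : "PREPARED", "COMMITTED");
    }
    @Override public void rollback(Xid xid) { rm.states.put(xid, "ROLLED_BACK"); }
    @Override public void forget(Xid xid) { rm.states.remove(xid); }
    @Override public Xid[] recover(int flag) { return new Xid[0]; }
    @Override public boolean isSameRM(XAResource other) {
        return other instanceof SharedStateXAResource && ((SharedStateXAResource) other).rm == rm;
    }
    @Override public int getTransactionTimeout() { return 0; }
    @Override public boolean setTransactionTimeout(int seconds) { return false; }
}

/** Minimal Xid with value-based equality so it can be used as a map key. */
final class ModelXid implements Xid {
    private final byte[] gtrid;

    ModelXid(String id) { gtrid = id.getBytes(StandardCharsets.UTF_8); }

    @Override public int getFormatId() { return 1; }
    @Override public byte[] getGlobalTransactionId() { return gtrid.clone(); }
    @Override public byte[] getBranchQualifier() { return new byte[0]; }
    @Override public boolean equals(Object o) {
        return o instanceof ModelXid && Arrays.equals(gtrid, ((ModelXid) o).gtrid);
    }
    @Override public int hashCode() { return Arrays.hashCode(gtrid); }
}
```

Because the state is looked up by Xid rather than stored in a field of the XAResource, it does not matter which instance the transaction manager picks for prepare and commit. In this sketch a second commit of the same Xid is rejected with `XAER_PROTO`, which is loosely analogous to the "Bad transaction state transition" error in the server log.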

This has been logged as internal (bugs.sun.com) bug 7014537.


jthoennes added a comment - 14/Apr/11 12:56 AM

Hello Nigel,

As we heavily use that kind of scenario, I would like to ask whether this issue will be fixed for 3.1.1 without raising a service request.

A quick answer is highly appreciated.

Thanks, Jörg


Nigel Deakin added a comment - 14/Apr/11 03:24 AM

This is currently scheduled for 3.2, though, as always, I can't make commitments as to the contents of future releases.

If you have a support licence and this issue is causing a problem then please contact your support representative (and let me know you've done so) since this would definitely affect the priority we give to fixing it.

Nigel


jthoennes added a comment - 14/Apr/11 04:10 AM

In reply to comment #9:
> If you have a support licence and this issue is causing a problem then please
> contact your support representative (and let me know you've done so) since this
> would definitely affect the priority we give to fixing it.

Thanks, Nigel. Yes, we have a support contract. What do you need if I file a service request on My Oracle Support (MOS)?
Do you have access to the service requests submitted?

Cheers, Jörg


Scott Fordin added a comment - 15/Apr/11 10:19 AM

Added issue to 3.1 Release Notes.


Nazrul added a comment - 21/Apr/11 10:58 AM

It would be good to take a look at this issue for 3.1.1.


Nigel Deakin added a comment - 03/May/11 03:19 AM

@jthoennes - Yes, please file an issue with Oracle support as you suggest. There is a separate engineering team to resolve customer issues, so raising it with support increases the resources available to address this issue.


Nigel Deakin added a comment - 03/May/11 09:50 AM

I have reviewed this bug for 3.1.1 and decided not to fix it in that version for the following reasons:

  • This bug is in older versions of GlassFish (including GlassFish 2.1.1) and so is not a regression
  • There is a workaround (see earlier comment)
  • The fix would require significant changes to the XAResource implementation classes in the JMSRA resource adapter. In addition to the work involved it would require a lot of testing to be sure that it does not introduce a regression. 3.2 will have much more testing than 3.1.1 and so, given that this is an old bug which has a workaround, I would like to defer fixing this bug until 3.2 so it can be properly tested.

Removing the 3_1_1-review tag.

@jthoennes - note that if you raise this issue with Oracle support this will still be reviewed by Oracle sustaining.


jthoennes added a comment - 27/May/11 07:26 AM

Filed Oracle Service Request "SR 3-3705874175: Resolve GLASSFISH-15558 for Glassfish 3.1.1" for this issue.


marina vatkina added a comment - 16/Nov/11 12:10 AM

Re EJB container behavior: In our current implementation, bean instances are returned to the pool at the end of method invocation. If we were to delay this until the transaction terminates, we would need more instances, because a transaction can last much longer than a single method invocation.


Nigel Deakin added a comment - 14/Dec/11 09:52 AM

Adding 3_1_2-exclude tag. Excluding from 3.1.2 for the same reason it was excluded from 3.1.1 (see my comment above).