Key: SAILFIN-1218
Type: Bug
Status: Resolved
Resolution: Fixed
Priority: Blocker
Assignee: Jagadish
Reporter: ekrisjo
Votes: 0
Watchers: 2

sailfin

XA deadlock in derby for EJBTimers

Created: 01/Oct/08 02:34 AM   Updated: 30/Oct/08 03:14 AM   Resolved: 30/Oct/08 03:14 AM
Component/s: build_system
Affects Version/s: 1.0
Fix Version/s: b57

Time Tracking:
Not Specified

File Attachments: 1. Text File derby.log (1 kB) 07/Oct/08 07:17 AM - ekrisjo
2. Text File jvm.log (244 kB) 01/Oct/08 03:53 AM - ekrisjo
3. Java Source File NetXAResource.java (43 kB) 01/Oct/08 03:55 AM - ekrisjo

Environment:

Operating System: Linux
Platform: Sun


Issuezilla Id: 1,218
Tags:
Participants: ehsroha, ekrisjo, Jagadish, Knut Anders Hatlen and prasads


Description

Hi,

We have a serious problem with our EJB Timers. It seems as if there is some kind
of deadlock when the application server is trying to acquire an XA connection
from Derby.

This problem has been observed multiple times.

We are running an SGCS b37g cluster with one server instance. The Derby database
is running in a separate JVM. The EJB timer JDBC pool is running XA.

Here is the specific RUNNABLE thread in SGCS that never releases its mutex lock
on a Vector. Multiple thread dumps have been taken in order to verify that this
particular RUNNABLE thread never releases the lock.

"p: thread-pool-1; w: 12" daemon prio=10 tid=0x00002aab7eff9000 nid=0x63fa runnable [0x000000004bae1000..0x000000004bae2aa0]
   java.lang.Thread.State: RUNNABLE
    at org.apache.derby.client.net.NetXAResource.initForReuse(Unknown Source)
    - locked <0x00002aaab17fd168> (a java.util.Vector)
    at org.apache.derby.client.ClientXAConnection.getConnection(Unknown Source)
    at com.sun.gjc.spi.ManagedConnection.getActualConnection(ManagedConnection.java:571)
    at com.sun.gjc.spi.ManagedConnection.getConnection(ManagedConnection.java:325)
    at com.sun.enterprise.resource.ConnectorAllocator.fillInResourceObjects(ConnectorAllocator.java:155)
    at com.sun.enterprise.resource.AbstractResourcePool.getResource(AbstractResourcePool.java:502)
    at com.sun.enterprise.resource.PoolManagerImpl.getResourceFromPool(PoolManagerImpl.java:248)
    at com.sun.enterprise.resource.PoolManagerImpl.getResource(PoolManagerImpl.java:176)
    at com.sun.enterprise.connectors.ConnectionManagerImpl.internalGetConnection(ConnectionManagerImpl.java:337)
    at com.sun.enterprise.connectors.ConnectionManagerImpl.allocateConnection(ConnectionManagerImpl.java:189)
    at com.sun.enterprise.connectors.ConnectionManagerImpl.allocateConnection(ConnectionManagerImpl.java:165)
    at com.sun.enterprise.connectors.ConnectionManagerImpl.allocateConnection(ConnectionManagerImpl.java:158)
    at com.sun.gjc.spi.base.DataSource.getConnection(DataSource.java:108)
    at com.sun.jdo.spi.persistence.support.sqlstore.ejb.TransactionHelperImpl.getConnection(TransactionHelperImpl.java:212)
    at com.sun.jdo.spi.persistence.support.sqlstore.ejb.EJBHelper.getConnection(EJBHelper.java:197)
    at com.sun.jdo.spi.persistence.support.sqlstore.impl.TransactionImpl.getConnectionInternal(TransactionImpl.java:1447)
    at com.sun.jdo.spi.persistence.support.sqlstore.impl.TransactionImpl.getConnection(TransactionImpl.java:1358)
    - locked <0x00002aaab1b96c38> (a com.sun.jdo.spi.persistence.support.sqlstore.impl.TransactionImpl)
    at com.sun.jdo.spi.persistence.support.sqlstore.SQLStoreManager.executeQuery(SQLStoreManager.java:447)
    at com.sun.jdo.spi.persistence.support.sqlstore.SQLStoreManager.retrieve(SQLStoreManager.java:376)
    at com.sun.jdo.spi.persistence.support.sqlstore.SQLStateManager.retrieve(SQLStateManager.java:2059)
    at com.sun.jdo.spi.persistence.support.sqlstore.SQLStateManager.reload(SQLStateManager.java:1197)
    at com.sun.jdo.spi.persistence.support.sqlstore.SQLStateManager.reload(SQLStateManager.java:1153)
    at com.sun.jdo.spi.persistence.support.sqlstore.impl.PersistenceManagerImpl.getObjectById(PersistenceManagerImpl.java:658)
    at com.sun.jdo.spi.persistence.support.sqlstore.impl.PersistenceManagerWrapper.getObjectById(PersistenceManagerWrapper.java:276)
    at com.sun.ejb.containers.TimerBean_2100919770_ConcreteImpl.ejbFindByPrimaryKey(TimerBean_2100919770_ConcreteImpl.java:1066)
    at sun.reflect.GeneratedMethodAccessor128.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at com.sun.enterprise.security.application.EJBSecurityManager.runMethod(EJBSecurityManager.java:1067)
    at com.sun.enterprise.security.SecurityUtil.invoke(SecurityUtil.java:176)
    at com.sun.ejb.containers.BaseContainer.invokeTargetBeanMethod(BaseContainer.java:2895)
    at com.sun.ejb.containers.EntityContainer.invokeFindByPrimaryKey(EntityContainer.java:803)
    at com.sun.ejb.containers.EJBLocalHomeInvocationHandler.invoke(EJBLocalHomeInvocationHandler.java:233)
    at $Proxy24.findByPrimaryKey(Unknown Source)
    at com.sun.ejb.containers.EJBTimerService.findTimer(EJBTimerService.java:1180)
    at com.sun.ejb.containers.EJBTimerService.getValidTimerFromDB(EJBTimerService.java:1591)
    at com.sun.ejb.containers.EJBTimerService.postEjbTimeout(EJBTimerService.java:1515)
    - locked <0x00002aaab74af7f8> (a com.sun.ejb.containers.RuntimeTimerState)
    at com.sun.ejb.containers.BaseContainer.callEJBTimeout(BaseContainer.java:2850)
    at com.sun.ejb.containers.EJBTimerService.deliverTimeout(EJBTimerService.java:1401)
    at com.sun.ejb.containers.EJBTimerService.access$100(EJBTimerService.java:99)
    at com.sun.ejb.containers.EJBTimerService$TaskExpiredWork.run(EJBTimerService.java:1952)
    at com.sun.ejb.containers.EJBTimerService$TaskExpiredWork.service(EJBTimerService.java:1948)
    at com.sun.ejb.containers.util.WorkAdapter.doWork(WorkAdapter.java:75)
    at com.sun.corba.ee.impl.orbutil.threadpool.ThreadPoolImpl$WorkerThread.run(ThreadPoolImpl.java:555)
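
For anyone who wants to repeat the "take several dumps and compare" check from inside the JVM, here is a minimal sketch using the standard java.lang.management API. It is not part of SGCS or Derby; the class name LockHolderReport is made up for illustration, and it assumes the JVM supports object monitor usage reporting (HotSpot does). It lists every RUNNABLE thread that owns a monitor, which is the same information as the "locked <...>" lines in the dump above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MonitorInfo;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of SGCS or Derby.
public class LockHolderReport {

    /** Returns one line per monitor owned by a thread that is currently RUNNABLE. */
    public static List<String> runnableLockHolders() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();
        // lockedMonitors=true requests the data a thread dump prints as "locked <...>".
        for (ThreadInfo info : threads.dumpAllThreads(true, false)) {
            if (info.getThreadState() != Thread.State.RUNNABLE) {
                continue;
            }
            for (MonitorInfo monitor : info.getLockedMonitors()) {
                report.add(info.getThreadName() + " holds " + monitor.getClassName()
                        + " at " + monitor.getLockedStackFrame());
            }
        }
        return report;
    }

    public static void main(String[] args) {
        runnableLockHolders().forEach(System.out::println);
    }
}
```

Calling runnableLockHolders() a few times some seconds apart and seeing the same thread with the same monitor each time would point to the same situation as described here.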


ekrisjo added a comment - 01/Oct/08 03:53 AM

Created an attachment (id=708)
threaddump of deadlock


ekrisjo added a comment - 01/Oct/08 03:55 AM

Created an attachment (id=709)
deadlock source in derby 10.2.2.0


ehsroha added a comment - 03/Oct/08 03:19 AM

Reassign to Prasad according to mail from Tim


ekrisjo added a comment - 03/Oct/08 03:41 AM

We managed to get rid of this problem by replacing the Derby DB 10.2 jars in
<SAILFIN_HOME>/javadb/lib with the 10.4.2.0 jars.

We do not know for sure if this is the right solution to the problem. But at
least we don't see this particular problem anymore.

Cheers,
-Kristoffer


prasads added a comment - 03/Oct/08 04:07 AM

Re-assign this to Jagadish Ramu


Jagadish added a comment - 05/Oct/08 10:09 PM

This seems to be related to the Derby issue
https://issues.apache.org/jira/browse/DERBY-2432
which is fixed in Derby 10.4.x.


Jagadish added a comment - 06/Oct/08 10:19 PM

I had a discussion with the Derby engineers: the JIRA issue DERBY-2432 takes
effect only when a transaction timeout is set on the XAResource. By default it
is not set by GlassFish (SailFin).

Could you post the <transaction-service> element in domain.xml ?
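
To make "a transaction timeout is set on the XAResource" concrete, here is a minimal sketch of the standard JTA call involved, XAResource.setTransactionTimeout(int seconds). It does not show how GlassFish or Derby actually do it. The class XaTimeoutSketch, the "0 or less means leave the default" convention, and the recording stub that stands in for a driver's XAResource are all assumptions made for illustration.

```java
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicInteger;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;

// Hypothetical sketch; not GlassFish or Derby code.
public class XaTimeoutSketch {

    /** Sets the timeout only when one is configured; returns whether the resource accepted it. */
    public static boolean applyTimeout(XAResource resource, int seconds) throws XAException {
        if (seconds <= 0) {
            return false; // assumed convention for this sketch: nothing configured, nothing set
        }
        return resource.setTransactionTimeout(seconds);
    }

    /** Stand-in for a driver's XAResource that only records the timeout it was given. */
    public static XAResource recordingResource(AtomicInteger lastTimeout) {
        return (XAResource) Proxy.newProxyInstance(
                XaTimeoutSketch.class.getClassLoader(),
                new Class<?>[] {XAResource.class},
                (proxy, method, args) -> {
                    if (method.getName().equals("setTransactionTimeout")) {
                        lastTimeout.set((Integer) args[0]);
                        return true;
                    }
                    throw new UnsupportedOperationException(method.getName());
                });
    }

    public static void main(String[] args) throws XAException {
        AtomicInteger seen = new AtomicInteger(-1);
        boolean accepted = applyTimeout(recordingResource(seen), 90); // 90 is just an example value
        System.out.println("accepted=" + accepted + " timeout=" + seen.get());
    }
}
```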


Jagadish added a comment - 06/Oct/08 11:17 PM

Also, do you have any settings in SailFin_Install_Dir/databases/derby.properties?
Can you also post the CallFlow Pool configuration?

Do you see any exceptions in the SailFin server.log when using the 10.4.x Derby
libraries?


ekrisjo added a comment - 07/Oct/08 01:51 AM

Here is the transaction-service element:

<transaction-service automatic-recovery="true"
heuristic-decision="rollback" keypoint-interval="65536"
retry-timeout-in-seconds="600" timeout-in-seconds="90"
tx-log-dir="${com.sun.aas.instanceRoot}/logs"/>


ekrisjo added a comment - 07/Oct/08 03:44 AM

I did not manage to find the derby.properties file in the location you
specified. However, I noticed that there is such a file located in
'javadb\demo\programs\simple', is that what you need?

The callflow pool looks as follows:

<jdbc-connection-pool allow-non-component-callers="false"
associate-with-thread="false" connection-creation-retry-attempts="0"
connection-creation-retry-interval-in-seconds="10"
connection-leak-reclaim="false" connection-leak-timeout-in-seconds="0"
connection-validation-method="auto-commit"
datasource-classname="org.apache.derby.jdbc.EmbeddedXADataSource"
fail-all-connections="false" idle-timeout-in-seconds="300"
is-connection-validation-required="false" is-isolation-level-guaranteed="true"
lazy-connection-association="false" lazy-connection-enlistment="false"
match-connections="false" max-connection-usage-count="0" max-pool-size="32"
max-wait-time-in-millis="60000" name="__CallFlowPool"
non-transactional-connections="false" pool-resize-quantity="2"
res-type="javax.sql.XADataSource" statement-timeout-in-seconds="-1"
steady-pool-size="8" validate-atmost-once-period-in-seconds="0"
wrap-jdbc-objects="false">
<property name="databaseName"
value="${com.sun.aas.instanceRoot}/lib/databases/sun-callflow"/>
<property name="connectionAttributes" value=";create=true"/>
</jdbc-connection-pool>


ekrisjo added a comment - 07/Oct/08 03:45 AM

There are no exceptions in the sailfin log after we updated to derby 10.4.x
libraries.


ekrisjo added a comment - 07/Oct/08 07:17 AM

Created an attachment (id=725)
derby log from when problem was observed


prasads added a comment - 10/Oct/08 02:06 AM

Changing to build system


Knut Anders Hatlen added a comment - 12/Oct/08 08:28 AM

I have found a couple of problems in org.apache.derby.client.net.NetXAResource
that could lead to symptoms like the ones described here. They have now been
logged in Derby's bug tracker: https://issues.apache.org/jira/browse/DERBY-3909

Those problems are also present in Derby 10.4, so I'm not sure why the upgrade
fixed it for you. May be coincidental, or it may be that you're seeing another
problem.


Knut Anders Hatlen added a comment - 15/Oct/08 06:31 AM

A fix for the hang (actually an infinite loop) has been checked in to Derby's
development sources. The hang may also occur with Derby 10.4, but there were
some timing changes that made it less likely. See the bug report in Derby's bug
tracker for more details, and for a stand-alone reproducible test case.
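
As a rough illustration of why an infinite loop shows up in a thread dump as a RUNNABLE thread that never gives up its lock, here is a minimal stand-alone sketch. It is not the Derby code and not the test case from the Derby bug tracker; the class SpinUnderLockSketch and its loop are invented for illustration. One thread spins inside a synchronized block on a Vector, and a second thread that wants the same Vector stays BLOCKED until the loop is ended.

```java
import java.util.Vector;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch; not the Derby code.
public class SpinUnderLockSketch {

    /** Returns {state of the spinning lock owner, state of the thread waiting for the lock}. */
    public static Thread.State[] observeStates() throws InterruptedException {
        Vector<Object> shared = new Vector<>();
        AtomicBoolean stop = new AtomicBoolean(false);
        CountDownLatch lockTaken = new CountDownLatch(1);

        Thread owner = new Thread(() -> {
            synchronized (shared) {
                lockTaken.countDown();
                // Stands in for a loop whose exit condition is never met.
                while (!stop.get()) { }
            }
        }, "owner");
        Thread waiter = new Thread(() -> {
            synchronized (shared) {
                // Only reached once the owner leaves its loop.
            }
        }, "waiter");

        owner.start();
        lockTaken.await();
        waiter.start();
        while (waiter.getState() != Thread.State.BLOCKED) {
            Thread.sleep(10);
        }
        Thread.State[] states = {owner.getState(), waiter.getState()};

        // End the loop so the sketch terminates; in the real hang nothing does this.
        stop.set(true);
        owner.join();
        waiter.join();
        return states;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread.State[] states = observeStates();
        System.out.println("owner=" + states[0] + " waiter=" + states[1]);
        // prints "owner=RUNNABLE waiter=BLOCKED"
    }
}
```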


Jagadish added a comment - 30/Oct/08 03:13 AM

Fix for the derby issue
https://issues.apache.org/jira/browse/DERBY-3909
is available with the latest GlassFish/SailFin builds.

Derby 10.4.2x is integrated with GlassFish/SailFin builds (b57 onwards) and the
issue reported is fixed.

Requesting the bug submitter to confirm.


Jagadish added a comment - 30/Oct/08 03:14 AM

fixed in b57