[GLASSFISH-18858] After the execution of versioning timer apps deployment, next timer apps deployment - failed. Created: 29/Jun/12  Updated: 16/Aug/12  Resolved: 16/Aug/12

Status: Resolved
Project: glassfish
Component/s: ejb_container
Affects Version/s: 4.0_b43
Fix Version/s: 4.0_b50_ms4

Type: Bug Priority: Major
Reporter: aelena Assignee: marina vatkina
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: XML File domain.xml     File setup_cl.pl     Text File timer.log     File workaround.sh    
Tags: 40-regression

 Description   

Promoted 4.0 build 43. OEL6 or Solaris Sparc 10.

Created a cluster with two instances.

The deployment of all timer apps to the cluster failed.

This happens if the following timer app deployment commands were executed beforehand:
asadmin deploy --target domain --name=timersession:1.0 --retrieve /opt/appserver-sqe/pe/deployment_v3 /opt/appserver-sqe/pe/deployment_v3/archives_nodb/timersession.ear

asadmin deploy --target my-c1 --name=timersession:1.0 --retrieve /opt/appserver-sqe/pe/deployment_v3 /opt/appserver-sqe/pe/deployment_v3/archives_nodb/timersession.ear

(The second deployment produced a syntax warning; see bug 18857.)

Then the apps were undeployed; cluster and instances were removed, recreated and started; DB restarted; domain restarted; all required resources recreated.

After that, the following deployment command was executed:

asadmin deploy --target my-c1 /opt/appserver-sqe/pe/deployment_v3/archives_nodb/timersession.ear

This deployment, and the deployment of all other timer apps, failed. I've attached the corresponding error messages from server.log.

If the first two versioned timer app deployments are not executed, the subsequent timer app deployments do not fail.

This is a regression: for example, the issue was not seen in b35.



 Comments   
Comment by marina vatkina [ 29/Jun/12 ]

Where does the TimerPool point to for the cluster-wide database access? Could you have accidentally removed the Timer table after you recreated the cluster?

Comment by aelena [ 29/Jun/12 ]

It was a regular automated test that I've run against 3.1.1 and 3.1.2. I did not remove any tables.

Comment by marina vatkina [ 29/Jun/12 ]

Please attach your domain.xml from before and after you created the cluster.

Comment by aelena [ 29/Jun/12 ]

The configuration was the same "before" and "after", because I used the same scripts to create it.

Comment by Hong Zhang [ 02/Jul/12 ]

Elena: I just checked in a fix for 18857, when you verify fix for that, please also re-run the tests for this issue to see if they are also addressed by that fix. Thanks.

Comment by aelena [ 03/Jul/12 ]

I've re-run the test against the latest nightly build. Bug 18857 is indeed fixed, but this issue still exists. The second deployment of the timer app, which produced the syntax error before, now produces the same "timer" error: "EJB Timer Service is not available"

Comment by Hong Zhang [ 05/Jul/12 ]

Yes, it seems the issue with the client jar retrieve is solved but there is some issue with timer specific things.

I was able to reproduce the problem by doing the following steps:

1. create a cluster with two instances, start cluster and start database
2. create a resource reference on cluster for timer pool:
asadmin create-resource-ref --target cluster1 jdbc/__TimerPool
3. deploy the timersession.ear attached in 18857 to the cluster:
asadmin deploy --target cluster1 --name=timersession:1.0 --retrieve . timersession.ear

and I got similar error messages with SQL/tables etc. I will let Marina do some further investigation from here.

Comment by marina vatkina [ 05/Jul/12 ]

Of course this would fail. You can't use default TimerPool in the cluster - it points to the embedded (i.e. each instance owned) derby, not a cluster-wide database.
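
For illustration, a minimal sketch of pointing a cluster's EJB Timer Service at a shared database instead of the default pool. It assumes that a JDBC resource backed by a cluster-wide database already exists. The resource name jdbc/ClusterTimerDS is hypothetical, cluster1 is the cluster name from the steps above, and cluster1-config assumes the default config name generated for that cluster. The attached workaround.sh is not reproduced in this thread and may do this differently. By default the script only prints the commands; set DRY_RUN=0 to run them.

```shell
#!/bin/sh
# Sketch only: names below are assumptions, not taken from workaround.sh.
DRY_RUN="${DRY_RUN:-1}"                                  # 1 = print commands, 0 = execute
CLUSTER="${CLUSTER:-cluster1}"                           # cluster name used in the steps above
TIMER_RESOURCE="${TIMER_RESOURCE:-jdbc/ClusterTimerDS}"  # hypothetical cluster-wide JDBC resource

# Print the asadmin command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "asadmin $*"
    else
        asadmin "$@"
    fi
}

configure_timer_datasource() {
    # Make the (already existing) JDBC resource available on the cluster.
    run create-resource-ref --target "$CLUSTER" "$TIMER_RESOURCE"
    # Point the cluster config's EJB Timer Service at that resource.
    # Assumes the config is named <cluster>-config.
    run set "configs.config.${CLUSTER}-config.ejb-container.ejb-timer-service.timer-datasource=${TIMER_RESOURCE}"
}

configure_timer_datasource
```

The timer application would be deployed to the cluster only after this configuration is in place, so that each instance uses the shared database rather than its own embedded Derby.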

Comment by Hong Zhang [ 05/Jul/12 ]

Ok, these are just the steps I used and I guess they are not correct.

Elena: could you provide the set of the steps you used (not to run the whole test suite, but just enough to reproduce the issue)?

Comment by aelena [ 05/Jul/12 ]

I've executed the following commands:

asadmin start-database
Then created a cluster with two instances, see setup_cl.pl
After that created timer configuration, see workaround.sh

Then the following can be executed:

asadmin deploy --target $CLUSTER --name=timersession:1.0 --retrieve $OUT_DIR $OUT_DIR/archives_nodb/timersession.ear

Comment by Hong Zhang [ 05/Jul/12 ]

Thanks Elena!

Marina: can you take a look at Elena's steps to see if they are the proper steps to deploy a timer-related application to a cluster?

Comment by marina vatkina [ 05/Jul/12 ]

Elena,

Does it work if you do not use versioning on the 1st deploy? We have tests with timer app deployed to a cluster and they work fine.

Comment by aelena [ 05/Jul/12 ]

As I mentioned in the description, without a versioned deployment everything works fine. But after a versioned deployment of the timer app, all other non-versioned timer app deployments failed, despite restarting the domain, DB and clustered instances after the versioned deployment.

And the versioned deployment of the timer app to the cluster failed, but the same deployment to domain did not fail (it was the first timer app deployment).

Comment by Hong Zhang [ 05/Jul/12 ]

Deploying to domain is probably OK because the initial deployment to domain does not load the application on any target, so that part of the code path is not executed yet.

Marina: if Elena's steps look OK to you, do you want to use them to look into why versioning would make any difference here? According to Elena, this used to work.

Comment by Alex Pineda [ 12/Jul/12 ]

Adding a regression tag for QA tracking purposes.

Comment by aelena [ 31/Jul/12 ]

Executed a test against b48, still see this issue.

Comment by Hong Zhang [ 31/Jul/12 ]

I tried this a few weeks ago: I set up the cluster, used your workaround.sh to set up resources, and then deployed timersession.ear to the cluster. I did not see any exception.

Can you remind me what's the simplest set of the steps to reproduce the problem again?

Do I have to deploy the timersession.ear to domain first, and then to the cluster target?

Comment by aelena [ 31/Jul/12 ]

I executed only the versioning test, which contains these commands for timersession:
===================================================
asadmin deploy --target domain --name=timersession:1.0 --retrieve /export/hudson/workspace/deployment-w/appserver-sqe/pe/deployment_v3 /export/hudson/workspace/deployment-w/appserver-sqe/pe/deployment_v3/archives_nodb/timersession.ear

asadmin undeploy --target domain timersession:*

asadmin deploy --target my-c1 --name=timersession:1.0 --retrieve /export/hudson/workspace/deployment-w/appserver-sqe/pe/deployment_v3 /export/hudson/workspace/deployment-w/appserver-sqe/pe/deployment_v3/archives_nodb/timersession.ear
asadmin undeploy --target my-c1 timersession:*
=================================================

All these commands were executed successfully.

Then I restarted the domain, cluster and DB. After that I executed this command:

=========================================================
asadmin deploy --target my-c1 --retrieve /export/hudson/workspace/deployment-w/appserver-sqe/pe/deployment_v3 /export/hudson/workspace/deployment-w/appserver-sqe/pe/deployment_v3/archives_nodb/timersession.ear
remote failure: Error occurred during deployment: Exception while deploying the app [timersession] : Failed to create automatic timers for TimerSessionEJB – null. Please see server.log for more details.
Command deploy failed.
=================================================

So this deployment failed.

Comment by Hong Zhang [ 01/Aug/12 ]

Thanks Elena! A couple of follow-up questions: if you don't restart the domain/cluster/DB before the failed deployment, will that deployment still fail?

If we execute the exact same sequence of steps (deploy/undeploy to domain, deploy/undeploy to cluster, restart everything, deploy to cluster again), just without using versioning, will that sequence also fail?

Comment by aelena [ 01/Aug/12 ]

I installed glassfish, created a cluster with two instances, executed workaround.sh, then deployed/undeployed to domain and then deployed to the cluster. The deployment to the cluster failed.
============================================
/export/hudson/workspace/deployment-w/glassfish3/glassfish/bin/asadmin deploy --target domain --retrieve . archives_nodb/timersession.ear
Application deployed with name timersession.

/export/hudson/workspace/deployment-w/glassfish3/glassfish/bin/asadmin undeploy --target domain timersession
Command undeploy executed successfully.

/export/hudson/workspace/deployment-w/glassfish3/glassfish/bin/asadmin deploy --target my-c1 --retrieve . archives_nodb/timersession.ear
Application deployed with name timersession.
WARNING: Command _deploy did not complete successfully on server instance my-in1: remote failure: Failed to load the application on instance my-in1. The application will not run properly. Please fix your application and redeploy.
Exception while loading the app : EJB Timer Service is not available. Please see server.log for more details.
WARNING: Command _deploy did not complete successfully on server instance my-in2: remote failure: Failed to load the application on instance my-in2. The application will not run properly. Please fix your application and redeploy.
Exception while loading the app : EJB Timer Service is not available. Please see server.log for more details.

Comment by Hong Zhang [ 01/Aug/12 ]

Thanks Elena! So from the commands you used, this problem does happen to non-versioned deployment as well?

Comment by aelena [ 01/Aug/12 ]

Yes.

Comment by Hong Zhang [ 02/Aug/12 ]

Elena, thanks for confirming. In this case, I will assign to the ejb team for further evaluation on this.

Comment by marina vatkina [ 13/Aug/12 ]

Elena, I'm confused. The comment http://java.net/jira/browse/GLASSFISH-18858?focusedCommentId=344054&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_344054 says that only versioned deploy fails.

Comment by aelena [ 13/Aug/12 ]

The problem happens when the timer app is first deployed to the domain. After that, the deployment of any timer app, for example to the cluster, fails. It doesn't matter whether the domain deployment of the timer app used versioning or not.

Also, once the domain deployment has happened, any timer app deployment fails, regardless of whether the domain/cluster/DB were restarted or not.
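
The sequence described in this comment can be sketched as a small script. It uses only the asadmin commands already quoted in this issue; the cluster name my-c1 and the relative archive path come from the earlier comments, while the DRY_RUN and APP_VERSION variables and the helper functions are hypothetical additions for illustration. By default the script only prints the commands; set DRY_RUN=0 to run them against a domain with the cluster and timer resources already set up.

```shell
#!/bin/sh
# Sketch of the reproduction sequence: domain deploy, domain undeploy, cluster deploy.
DRY_RUN="${DRY_RUN:-1}"                        # 1 = print commands, 0 = execute
CLUSTER="${CLUSTER:-my-c1}"                    # cluster name from the report
EAR="${EAR:-archives_nodb/timersession.ear}"   # archive path from the report
APP_VERSION="${APP_VERSION:-}"                 # empty = non-versioned deployment

# Print the asadmin command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "asadmin $*"
    else
        asadmin "$@"
    fi
}

reproduce() {
    # Step 1: deploy to and undeploy from target 'domain', with or without versioning.
    if [ -n "$APP_VERSION" ]; then
        run deploy --target domain "--name=timersession:$APP_VERSION" --retrieve . "$EAR"
        run undeploy --target domain "timersession:*"
    else
        run deploy --target domain --retrieve . "$EAR"
        run undeploy --target domain timersession
    fi
    # Step 2: the cluster deployment that was reported to fail afterwards.
    run deploy --target "$CLUSTER" --retrieve . "$EAR"
}

reproduce
```

Running it with APP_VERSION=1.0 prints the versioned variant of the first step; per this comment, both variants lead to the same failure on the cluster deployment.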

Comment by marina vatkina [ 14/Aug/12 ]

I'm surprised it ever worked. We do not support more than one configuration for the EJB TS on a domain, but domain undeploy tries to remove the timers, and not knowing that they never ran, or if the app was ever enabled in any instance, looks for the TS config on the domain, which points to the embedded pool. The next (real) deploy hits an existing TS and doesn't create the table in the cluster-specific resource.

Comment by aelena [ 14/Aug/12 ]

This is a regression issue. Everything worked fine in GF 3.1.2, and it also worked fine, for example, in GF 4.0 b35.

Comment by marina vatkina [ 15/Aug/12 ]

It was a false positive. The behavior was actually wrong. But I can restore it, and then create a feature request to do it right.

Comment by marina vatkina [ 16/Aug/12 ]

Fixed with rev 55512 by keeping the resource name null for target 'domain'. Timers can't be removed when the undeploy target is 'domain' (maybe we should document it).

Generated at Mon Aug 31 11:38:49 UTC 2015 using JIRA 6.2.3#6260-sha1:63ef1d6dac3f4f4d7db4c1effd405ba38ccdc558.