GLASSFISH-18858

After a versioned timer app deployment, the next timer app deployment fails.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0_b43
    • Fix Version/s: 4.0_b50_ms4
    • Component/s: ejb_container
    • Labels:
      None

      Description

      Promoted 4.0 build 43. OEL6 or Solaris Sparc 10.

      Created a cluster with two instances.

      The deployment of all timer apps to the cluster failed.

      It happened when the following timer app deployment commands had been executed beforehand:
      asadmin deploy --target domain --name=timersession:1.0 --retrieve /opt/appserver-sqe/pe/deployment_v3 /opt/appserver-sqe/pe/deployment_v3/archives_nodb/timersession.ear

      asadmin deploy --target my-c1 --name=timersession:1.0 --retrieve /opt/appserver-sqe/pe/deployment_v3 /opt/appserver-sqe/pe/deployment_v3/archives_nodb/timersession.ear

      (The second deployment produced a syntax warning; see bug 18857.)

      Then the apps were undeployed; cluster and instances were removed, recreated and started; DB restarted; domain restarted; all required resources recreated.

      After that, the following deployment command was executed:

      asadmin deploy --target my-c1 /opt/appserver-sqe/pe/deployment_v3/archives_nodb/timersession.ear

      This deployment and the deployment of all other timer apps failed. I've attached the corresponding error messages from server.log.

      If the first two versioned timer app deployments are not executed, the subsequent timer app deployments do not fail.

      This is a regression; for example, this issue was not seen in b35.

      1. domain.xml
        57 kB
        aelena
      2. setup_cl.pl
        1 kB
        aelena
      3. timer.log
        43 kB
        aelena
      4. workaround.sh
        0.4 kB
        aelena

        Activity

        aelena created issue -
        marina vatkina added a comment -

        Where does the TimerPool point to for the cluster-wide database access? Could you have accidentally removed the Timer table after you recreated the cluster?

        aelena added a comment -

        It was a regular automated test that I've also executed for 3.1.1 and 3.1.2. I did not remove any tables.

        marina vatkina added a comment -

        Please attach your domain.xml from before and after you created the cluster.

        aelena added a comment -

        "Before" and "after" the configuration was the same, because I've used the same scripts to create a configuration.

        aelena made changes -
        Field Original Value New Value
        Attachment domain.xml [ 50455 ]
        Hong Zhang added a comment -

        Elena: I just checked in a fix for 18857. When you verify that fix, please also re-run the tests for this issue to see whether it is addressed by the same fix. Thanks.

        aelena added a comment -

        I've re-run the test against the latest nightly build. Bug 18857 is indeed fixed, but this issue still exists. The second deployment of the timer app, which previously produced the syntax error, now produces the same "timer" error: "EJB Timer Service is not available"

        Hong Zhang added a comment -

        Yes, it seems the issue with the client jar retrieve is solved but there is some issue with timer specific things.

        I was able to reproduce the problem by doing the following steps:

        1. Create a cluster with two instances, start the cluster, and start the database.
        2. Create a resource reference on the cluster for the timer pool:
           asadmin create-resource-ref --target cluster1 jdbc/__TimerPool
        3. Deploy the timersession.ear attached to 18857 to the cluster:
           asadmin deploy --target cluster1 --name=timersession:1.0 --retrieve . timersession.ear

        and I got similar error messages with SQL/tables etc. I will let Marina do some further investigation from here.

        Hong Zhang made changes -
        Assignee Hong Zhang [ hzhang_jn ] marina vatkina [ mvatkina ]
        Component/s ejb_container [ 10596 ]
        Component/s deployment [ 10594 ]
        marina vatkina added a comment -

        Of course this would fail. You can't use the default TimerPool in a cluster: it points to the embedded Derby database (i.e. one owned by each instance), not a cluster-wide database.
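
A rough sketch of what pointing a cluster's EJB Timer Service at a cluster-wide resource could look like. This is not the content of the attached workaround.sh, which is not shown in this thread. The resource name jdbc/ClusterTimerPool is hypothetical, the script assumes that a JDBC pool and resource backed by a database reachable from all instances already exist, and it assumes the cluster uses the default config name `<cluster>-config`. The script only prints the commands so they can be reviewed before being run.

```shell
#!/bin/sh
# Sketch only: prints the asadmin commands instead of executing them.
# Assumptions (not from the thread): the jdbc/ClusterTimerPool resource name
# and the "<cluster>-config" config name.
ASADMIN="${ASADMIN:-asadmin}"

timer_config_commands() {
    cluster="$1"    # cluster name, e.g. cluster1
    resource="$2"   # JNDI name of a JDBC resource backed by a shared database

    # Make the shared JDBC resource visible to the cluster.
    echo "$ASADMIN create-resource-ref --target $cluster $resource"

    # Point the cluster config's EJB Timer Service at that resource
    # rather than the default, instance-local jdbc/__TimerPool.
    echo "$ASADMIN set configs.config.${cluster}-config.ejb-container.ejb-timer-service.timer-datasource=$resource"
}

timer_config_commands "${CLUSTER:-cluster1}" "${TIMER_RESOURCE:-jdbc/ClusterTimerPool}"
```

Printing rather than executing keeps the sketch safe to run on a machine without a domain; pipe the output to `sh` only after checking the names against your own setup.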

        marina vatkina made changes -
        Assignee marina vatkina [ mvatkina ] Hong Zhang [ hzhang_jn ]
        Component/s deployment [ 10594 ]
        Component/s ejb_container [ 10596 ]
        Hong Zhang added a comment -

        Ok, these are just the steps I used and I guess they are not correct.

        Elena: could you provide the set of the steps you used (not to run the whole test suite, but just enough to reproduce the issue)?

        aelena added a comment -

        I've executed the following commands:

        asadmin start-database
        Then I created a cluster with two instances (see setup_cl.pl).
        After that I created the timer configuration (see workaround.sh).

        Then the following can be executed:

        asadmin deploy --target $CLUSTER --name=timersession:1.0 --retrieve $OUT_DIR $OUT_DIR/archives_nodb/timersession.ear

        aelena made changes -
        Attachment workaround.sh [ 50491 ]
        Attachment setup_cl.pl [ 50492 ]
        Hong Zhang added a comment -

        Thanks Elena!

        Marina: can you take a look at Elena's steps to see whether they are the proper steps to deploy a timer-related application to a cluster?

        marina vatkina added a comment -

        Elena,

        Does it work if you do not use versioning on the 1st deploy? We have tests with timer app deployed to a cluster and they work fine.

        aelena added a comment -

        As I mentioned in the description, without the versioned deployment everything works fine. But after a versioned deployment of the timer app, all subsequent non-versioned timer app deployments failed, despite restarting the domain, DB, and clustered instances after the versioned deployment.

        Also, the versioned deployment of the timer app to the cluster failed, but the same deployment to the domain did not fail (it was the first timer app deployment).

        Hong Zhang added a comment -

        Deploying to the domain probably succeeds because the initial deployment to the domain does not load the application on any target, so that part of the code path is not executed yet.

        Marina: if Elena's steps look OK to you, do you want to use them to look into why versioning would make any difference here? According to Elena, this used to work.

        Alex Pineda added a comment -

        Adding a regression tag for QA tracking purposes.

        Alex Pineda made changes -
        Tags 40-regression
        aelena added a comment -

        Executed a test against b48, still see this issue.

        Hong Zhang added a comment -

        I tried a few weeks ago: I set up the cluster, used your workaround.sh to set up resources, and then deployed timersession.ear to the cluster. I did not see any exception.

        Can you remind me what's the simplest set of the steps to reproduce the problem again?

        Do I have to deploy the timersession.ear to domain first, and then to the cluster target?

        aelena added a comment -

        I've executed only the versioning test, which ran these commands for timersession:
        ===================================================
        asadmin deploy --target domain --name=timersession:1.0 --retrieve /export/hudson/workspace/deployment-w/appserver-sqe/pe/deployment_v3 /export/hudson/workspace/deployment-w/appserver-sqe/pe/deployment_v3/archives_nodb/timersession.ear

        asadmin undeploy --target domain timersession:*

        asadmin deploy --target my-c1 --name=timersession:1.0 --retrieve /export/hudson/workspace/deployment-w/appserver-sqe/pe/deployment_v3 /export/hudson/workspace/deployment-w/appserver-sqe/pe/deployment_v3/archives_nodb/timersession.ear

        asadmin undeploy --target my-c1 timersession:*
        =================================================

        All these commands were executed successfully.

        Then I restarted the domain, cluster and DB. After that I executed this command:

        =========================================================
        asadmin deploy --target my-c1 --retrieve /export/hudson/workspace/deployment-w/appserver-sqe/pe/deployment_v3 /export/hudson/workspace/deployment-w/appserver-sqe/pe/deployment_v3/archives_nodb/timersession.ear
        remote failure: Error occurred during deployment: Exception while deploying the app [timersession] : Failed to create automatic timers for TimerSessionEJB – null. Please see server.log for more details.
        Command deploy failed.
        =================================================

        So this deployment failed.

        Hong Zhang added a comment -

        Thanks Elena! A couple of follow-up questions. If you don't restart the domain/cluster/DB before the failed deployment, does that deployment still fail?

        If we execute the exact same sequence of steps (deploy/undeploy to domain, deploy/undeploy to cluster, restart everything, deploy to cluster again), just without using versioning, does that sequence also fail?

        aelena added a comment -

        I installed GlassFish, created a cluster with two instances, executed workaround.sh, then deployed/undeployed to the domain and then deployed to the cluster. The deployment to the cluster failed.
        ============================================
        /export/hudson/workspace/deployment-w/glassfish3/glassfish/bin/asadmin deploy --target domain --retrieve . archives_nodb/timersession.ear
        Application deployed with name timersession.

        /export/hudson/workspace/deployment-w/glassfish3/glassfish/bin/asadmin undeploy --target domain timersession
        Command undeploy executed successfully.

        /export/hudson/workspace/deployment-w/glassfish3/glassfish/bin/asadmin deploy --target my-c1 --retrieve . archives_nodb/timersession.ear
        Application deployed with name timersession.
        WARNING: Command _deploy did not complete successfully on server instance my-in1: remote failure: Failed to load the application on instance my-in1. The application will not run properly. Please fix your application and redeploy.
        Exception while loading the app : EJB Timer Service is not available. Please see server.log for more details.
        WARNING: Command _deploy did not complete successfully on server instance my-in2: remote failure: Failed to load the application on instance my-in2. The application will not run properly. Please fix your application and redeploy.
        Exception while loading the app : EJB Timer Service is not available. Please see server.log for more details.

        Hong Zhang added a comment -

        Thanks Elena! So from the commands you used, this problem does happen to non-versioned deployment as well?

        aelena added a comment -

        Yes.

        Hong Zhang added a comment -

        Elena, thanks for confirming. In this case, I will assign to the ejb team for further evaluation on this.

        Hong Zhang made changes -
        Assignee Hong Zhang [ hzhang_jn ] marina vatkina [ mvatkina ]
        Component/s ejb_container [ 10596 ]
        Component/s deployment [ 10594 ]
        marina vatkina added a comment -

        Elena, I'm confused. The comment http://java.net/jira/browse/GLASSFISH-18858?focusedCommentId=344054&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_344054 says that only versioned deploy fails.

        aelena added a comment -

        The problem occurs when a timer app is first deployed to the domain. After that, the deployment of any timer app, for example to the cluster, fails. It doesn't matter whether the domain deployment of the timer app used versioning or not.

        Also, once the domain deployment has happened, any timer app deployment fails, regardless of whether the domain/cluster/DB were restarted.
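
The minimal sequence described in this comment can be sketched as a small script. It uses only the asadmin subcommands and options that appear in this thread; the `ASADMIN`, `CLUSTER`, `EAR` and `DRY_RUN` variables and their defaults are assumptions added for illustration. By default it only prints the commands; set `DRY_RUN=0` to run them against a domain where the cluster and timer resources have already been set up.

```shell
#!/bin/sh
# Reproduction sketch: deploy to 'domain', undeploy, then deploy to the cluster.
# Assumptions (not from the thread): the variable names and defaults below.
ASADMIN="${ASADMIN:-asadmin}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

run() {
    # Print the command; execute it only when DRY_RUN is 0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

repro() {
    cluster="$1"
    ear="$2"
    app=$(basename "$ear" .ear)   # default application name, e.g. timersession

    # Step 1: deploy to the 'domain' target (not loaded on any instance).
    run "$ASADMIN" deploy --target domain --retrieve . "$ear" || return 1

    # Step 2: undeploy from 'domain'.
    run "$ASADMIN" undeploy --target domain "$app" || return 1

    # Step 3: deploy to the cluster. On affected builds this is the step
    # reported to fail with "EJB Timer Service is not available".
    run "$ASADMIN" deploy --target "$cluster" --retrieve . "$ear"
}

repro "${CLUSTER:-my-c1}" "${EAR:-archives_nodb/timersession.ear}"
```

Running the same script before and after a candidate fix gives a quick check of whether step 3 still fails.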

        marina vatkina added a comment -

        I'm surprised it ever worked. We do not support more than one configuration for the EJB TS on a domain, but domain undeploy tries to remove the timers, and not knowing that they never ran, or if the app was ever enabled in any instance, looks for the TS config on the domain, which points to the embedded pool. The next (real) deploy hits an existing TS and doesn't create the table in the cluster-specific resource.

        aelena added a comment -

        This is a regression. Everything worked fine in GF 3.1.2, and it also worked fine in, for example, GF 4.0 b35.

        marina vatkina added a comment -

        It was a false positive. The behavior was actually wrong. But I can restore it, and then create a feature request to do it right.

        marina vatkina added a comment -

        Fixed with rev 55512 by keeping the resource name null for target 'domain'. Timers can't be removed when the undeploy target is 'domain' (maybe we should document it).

        marina vatkina made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Fix Version/s 4.0_b50_ms4 [ 15639 ]
        Resolution Fixed [ 1 ]

          People

          • Assignee:
            marina vatkina
            Reporter:
            aelena