[GLASSFISH-20495] Clean up not happening when application fails to load Created: 08/May/13  Updated: 19/Sep/14

Status: Open
Project: glassfish
Component/s: None
Affects Version/s: 4.0_b87_RC3
Fix Version/s: 4.1

Type: Bug Priority: Major
Reporter: Mitesh Meswani Assignee: michael.y.chen
Resolution: Unresolved Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

When an exception happens while loading an app, the application is undeployed. The clean up logic for this use case needs to be updated to perform full clean up.



 Comments   
Comment by Mitesh Meswani [ 08/May/13 ]

Here is a mail thread between relevant engineers with analysis.

Hi, Marina
Here is the page:
https://wikis.oracle.com/display/GlassFish/3.1DeploymentInternalDesignDoc

Note for this particular use case we are talking about, it's not the normal deployment/undeployment, it's the rollback for the failed deployment. The entry point is in ApplicationLifecycle, the following block:

ProgressTracker tracker = new ProgressTracker() {
@Override
public void actOn(Logger logger)

{ ... }

Depending on the state of where the application failed deployment, the rollback logic will call each component(container)'s stop method, unload method and then clean method.

Yeah, I noticed the asymmetry part also but I don't have an explanation of it as I am not familiar with this part of the container/naming code so not sure what the expected behavior is..

Thanks,

  • Hong

On 5/7/2013 9:24 PM, Marina Vatkina wrote:
> Hi Hong,
>
> I'm not sure if it is part of the same problem or not, or if it had been fixed recently but was hitting a problem once in a while when a failed deployment of a @Startup @Singleton session bean left things behind so that only server restart allowed to deploy the fixed app.
>
> But before we all rush in investigating our parts, do you still have around the wiki page where you described the callbacks and events being called/filred by the dol infrastructure during the deploy/undeploy processing?
>
> What surprises me in the stack traces that Mitesh published, is the asymmetry of the register/unregister calls - there are 2 register at deploy calls from the ejb-container code, and no register calls from the web-container. But each one calls register on undeploy.
>
> thanks,
> -marina
>
> On 5/7/13 6:03 PM, Hong Zhang wrote:
>> Hi, Mitesh
>> Thanks for investigating further on this. I am adding Marina and Shingwai to the thread (and reattaching your earlier attachments so they could reproduce from their side).
>>
>> Looking at the two clean ups happened at normal undeployment [2], one is from EjbApplication.stop, and the other is from WebApplication.stop. And I traced the rollback logic (in ApplicationLifecycle.java) in this case when the application failed loading during server start up:
>>
>> ProgressTracker tracker = new ProgressTracker() {
>> @Override
>> public void actOn(Logger logger) {
>> //loaded is used instead of started to include more modules to
>> //stop. In some modules, the setup and cleanup steps are not
>> //fully symmetric, and to ensure thorough cleanup, we need to
>> //call module.stop() for started modules, and modules that are
>> //loaded but may not be started. Issue 18263
>> for (EngineRef module : get("loaded", EngineRef.class)) {
>> try

{ >> module.stop(context); >> }

catch (Exception e)

{ >> // ignore >> }

>> }
>>
>> The module.stop loop invokes EjbApplication.stop and WebApplication.stop as expected, but seems neither stop method invokes the JNDI clean up similarly as the undeployment code path. Probably this part of the code in EJB and web containers need to be updated to handle the deployment failure roll back case better. I will let Marina and Shingwai do some further investigation on this..
>>
>>
>> Thanks,
>>
>> - Hong
>>
>>
>> On 5/7/2013 4:35 PM, Mitesh Meswani wrote:
>>> I debugged this bit further and the root cause for this is RefCountJndiNameEnvironment for componentId "basicwebapp/_basicwebapp" not getting cleaned up from ComponentEnvManagerImpl.componentId2Env when the exception occurs in @Startup. This results in us reusing the old RefCountJndiNameEnvironment on next deploy and hence the old closed EMF.
>>>
>>> Here is bit more detail:
>>> -When the app is loaded, we register RefCountJndiNameEnvironment for componentId "basicwebapp/_basicwebapp" twice [1] contains the stack frames corresponding to those calls to register.
>>> This results in RefCountJndiNameEnvironment.refcnt being set to 2 for it.
>>> -When the app is normally undeployed, we unregister it twice as required. [2] contains the stack frames corresponding to those calls to unregister
>>> -When we run into the exception in @Singleton, we only unregister it once. [3] contains the stack frame corresponding to that call to unregister.
>>>
>>> Hong,
>>> From this data it seems that the WebContainer is not cleaning up in this situation. Does that analysis look correct to you? What do we need to do so that the web container cleans up in this case?
>>>
>>> [1]
>>> First registration of RefCountJndiNameEnvironment
>>> at com.sun.enterprise.container.common.impl.ComponentEnvManagerImpl.register(ComponentEnvManagerImpl.java:149)
>>> at com.sun.enterprise.container.common.impl.ComponentEnvManagerImpl.bindToComponentNamespace(ComponentEnvManagerImpl.java:267)
>>> at org.glassfish.ejb.startup.EjbDeployer.load(EjbDeployer.java:284)
>>> at org.glassfish.ejb.startup.EjbDeployer.load(EjbDeployer.java:99)
>>> at org.glassfish.internal.data.ModuleInfo.load(ModuleInfo.java:206)
>>> at org.glassfish.internal.data.ApplicationInfo.load(ApplicationInfo.java:313)
>>>
>>> refcnt++ for same RefCountJndiNameEnvironment
>>> at com.sun.enterprise.container.common.impl.ComponentEnvManagerImpl.register(ComponentEnvManagerImpl.java:149)
>>> at com.sun.enterprise.container.common.impl.ComponentEnvManagerImpl.bindToComponentNamespace(ComponentEnvManagerImpl.java:267)
>>> at com.sun.ejb.containers.BaseContainer.setupEnvironment(BaseContainer.java:3911)
>>> at com.sun.ejb.containers.BaseContainer.<init>(BaseContainer.java:786)
>>> at com.sun.ejb.containers.AbstractSingletonContainer.<init>(AbstractSingletonContainer.java:134)
>>> at com.sun.ejb.containers.CMCSingletonContainer.<init>(CMCSingletonContainer.java:76)
>>> at com.sun.ejb.containers.SingletonContainerFactory.createContainer(SingletonContainerFactory.java:68)
>>> at org.glassfish.ejb.startup.EjbApplication.loadContainers(EjbApplication.java:221)
>>> at org.glassfish.ejb.startup.EjbDeployer.load(EjbDeployer.java:291)
>>> at org.glassfish.ejb.startup.EjbDeployer.load(EjbDeployer.java:99)
>>> at org.glassfish.internal.data.ModuleInfo.load(ModuleInfo.java:206)
>>> at org.glassfish.internal.data.ApplicationInfo.load(ApplicationInfo.java:313)
>>>
>>> [2] Clean up when normal undeploy happens on the app.
>>> First unregister
>>> at com.sun.enterprise.container.common.impl.ComponentEnvManagerImpl.unregister(ComponentEnvManagerImpl.java:155)
>>> at com.sun.enterprise.container.common.impl.ComponentEnvManagerImpl.unbindFromComponentNamespace(ComponentEnvManagerImpl.java:414)
>>> at com.sun.enterprise.web.WebModuleContextConfig.unbindFromComponentNamespace(WebModuleContextConfig.java:446)
>>> at com.sun.enterprise.web.WebModuleContextConfig.stop(WebModuleContextConfig.java:439)
>>> at org.apache.catalina.startup.ContextConfig.lifecycleEvent(ContextConfig.java:488)
>>> at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:163)
>>> at org.apache.catalina.core.StandardContext.stop(StandardContext.java:6131)
>>> at com.sun.enterprise.web.WebModule.stop(WebModule.java:720)
>>> at org.apache.catalina.core.ContainerBase.removeChild(ContainerBase.java:1172)
>>> at com.sun.enterprise.web.WebContainer.unloadWebModule(WebContainer.java:2431)
>>> at com.sun.enterprise.web.WebContainer.unloadWebModule(WebContainer.java:2386)
>>> at com.sun.enterprise.web.WebApplication.stop(WebApplication.java:190)
>>> at org.glassfish.internal.data.EngineRef.stop(EngineRef.java:161)
>>> at org.glassfish.internal.data.ModuleInfo.stop(ModuleInfo.java:324)
>>> at org.glassfish.internal.data.ApplicationInfo.stop(ApplicationInfo.java:380)
>>>
>>> Second unregister
>>> at com.sun.enterprise.container.common.impl.ComponentEnvManagerImpl.unregister(ComponentEnvManagerImpl.java:154)
>>> at com.sun.enterprise.container.common.impl.ComponentEnvManagerImpl.unbindFromComponentNamespace(ComponentEnvManagerImpl.java:414)
>>> at com.sun.ejb.containers.BaseContainer.doContainerCleanup(BaseContainer.java:4332)
>>> at com.sun.ejb.containers.BaseContainer.onShutdown(BaseContainer.java:4200)
>>> at org.glassfish.ejb.startup.SingletonLifeCycleManager.doShutdown(SingletonLifeCycleManager.java:172)
>>> at org.glassfish.ejb.startup.EjbApplication.stop(EjbApplication.java:293)
>>> at org.glassfish.internal.data.EngineRef.stop(EngineRef.java:161)
>>> at org.glassfish.internal.data.ModuleInfo.stop(ModuleInfo.java:324)
>>> at org.glassfish.internal.data.ApplicationInfo.stop(ApplicationInfo.java:380)
>>>
>>>
>>> [3] The only clean up when the exception occurs
>>> at com.sun.enterprise.container.common.impl.ComponentEnvManagerImpl.unregister(ComponentEnvManagerImpl.java:154)
>>> at com.sun.enterprise.container.common.impl.ComponentEnvManagerImpl.unbindFromComponentNamespace(ComponentEnvManagerImpl.java:414)
>>> at com.sun.ejb.containers.BaseContainer.doContainerCleanup(BaseContainer.java:4332)
>>> at com.sun.ejb.containers.BaseContainer.undeploy(BaseContainer.java:4165)
>>> at org.glassfish.ejb.startup.EjbApplication.abortInitializationAfterException(EjbApplication.java:365)
>>> at org.glassfish.ejb.startup.EjbApplication.start(EjbApplication.java:169)
>>> at org.glassfish.internal.data.EngineRef.start(EngineRef.java:122)
>>> at org.glassfish.internal.data.ModuleInfo.start(ModuleInfo.java:291)
>>> at org.glassfish.internal.data.ApplicationInfo.start(ApplicationInfo.java:352)
>>> at com.sun.enterprise.v3.server.ApplicationLifecycle.deploy(ApplicationLifecycle.java:497)
>>> at com.sun.enterprise.v3.server.ApplicationLoaderService.processApplication(ApplicationLoaderService.java:407)
>>> at com.sun.enterprise.v3.server.ApplicationLoaderService.postConstruct(ApplicationLoaderService.java:243)
>>> at org.jvnet.hk2.internal.ClazzCreator.postConstructMe(ClazzCreator.java:281)
>>> at org.jvnet.hk2.internal.ClazzCreator.create(ClazzCreator.java:328)
>>> at org.jvnet.hk2.internal.SystemDescriptor.create(SystemDescriptor.java:448)
>>> at org.glassfish.hk2.runlevel.internal.AsyncRunLevelContext.findOrCreate(AsyncRunLevelContext.java:163)
>>> at org.jvnet.hk2.internal.Utilities.createService(Utilities.java:2204)
>>>
>>>
>>> Thanks,
>>> Mitesh
>>> On 5/7/2013 6:51 AM, Hong Zhang wrote:
>>>> Hi, Mitesh
>>>> I was able to reproduce the problem with the steps, but I am not sure yet what additionally need to be cleaned up to avoid the "attempting to execute operation on closed EMF" failure. The roll back logic we have today calls individual component of the application to stop, unload and clean and that part of the code seems to be executed as expected. I have added some additional rollback logic to clean up classloaders but that did not seem to help. Do we understand what condition would cause "attempting to execute operation on closed EMF" failure so I could look more specifically at those parts to see how the proper clean up could be done. I do see this event was sent out as expected in the roll back logic which is to trigger EMF close:
>>>> events.send(new Event<ApplicationInfo>(Deployment.APPLICATION_DISABLED, appInfo));
>>>>
>>>> Thanks,
>>>>
>>>> - Hong
>>>>
>>>>
>>>>
>>>> On 5/6/2013 10:04 PM, Mitesh Meswani wrote:
>>>>> Hi Hong,
>>>>>
>>>>> I am working with Reza to analyze issue #2 below. Attached is a small app that reproduces the issue. The core issue seems to be that if an application fails to load, we only do partial clean up and hence get into inconsistent state. For example, for the attached app, when the app fails to load, we close the EMF created for the app, however, we do not clean up the naming (and probably classloader). If we redeploy the app, we end up using the old artifacts and obviously get some strange exceptions.
>>>>>
>>>>> The app uses a @Singleton, @Startup bean that in its @PostConstruct method inserts some data into the DB. When we are using this pattern, if GF is restarted, the code in @PostConstruct is triggered again which results in a database exception (because of duplicate key) being thrown from the @PostConstruct method and the app loading fails. Now, if we redeploy this app, we see "Attempting to execute operation on closed EntityManagerFactor" exception. Here are the detailed steps:
>>>>>
>>>>> 1. Start GF
>>>>> -asadmin start-domain
>>>>> 2. Start Derby
>>>>> -asadmin start-database
>>>>> 3. Deploy
>>>>> -asadmin deploy basicwebapp.war #Deployment goes through without any issues
>>>>> 4. Restart GF
>>>>> -asadmin stop-domain
>>>>> -asadmin start-domain # You will see an exception during load because of duplicate key
>>>>> 5. asadmin deploy --force basicwebapp.war # you will see the exception about "attempting to execute operation on closed EMF"
>>>>>
>>>>> Can you please help analyze this further to see what needs to be done at step 4 above so that we do proper clean up.
>>>>>
>>>>> Thanks,
>>>>> Mitesh

Generated at Tue Jun 30 22:12:01 UTC 2015 using JIRA 6.2.3#6260-sha1:63ef1d6dac3f4f4d7db4c1effd405ba38ccdc558.