Bug 4098 - More detail needed on execution sequence during restart
Status: CLOSED FIXED
Product: jbatch
Classification: Unclassified
Component: source
Version: 1
Hardware: PC Windows
Importance: P5 enhancement
Target Milestone: ---
Assigned To: cvignola
Depends on:
Blocks:
Reported: 2012-09-05 14:56 UTC by ScottKurz
Modified: 2013-01-16 17:45 UTC
CC: 3 users

Description ScottKurz 2012-09-05 14:56:45 UTC
First, if this is the wrong place for a note like this, please let me know (e.g. if you'd rather I bring it up on the mailing list). This isn't a bug, just a suggestion, but I thought it would be helpful to capture it as an item to be addressed, so I'm starting here.

I think the spec takes too much for granted with respect to the execution sequence during restart.

To kick off the discussion, let me write some sample proposal text.   I'm not necessarily strongly arguing for the interpretations explained below, and I'm more interested in simply having the behavior clarified one way or the other.

Though I expect the examples below are less common or even rare use cases, I do think they are worth clarifying in the spec.

------------------------------------------------------------------------------

Suggested text:

==================
Restart processing
==================

The key idea to understand in restart processing is that while the business logic of a step may or may not need to be rerun (it is skipped if the step already ran to a COMPLETED state in an earlier execution), the decision logic WILL always rerun.

Though it will probably be an atypical case, the ability to use Job XML substitution, combined with the ability to use different job parameter values on restart (different from the parameter values on the original execution), means that the step sequence could be substantially different on the restart execution.

Though it's a complicated example, this sequence illustrates the point:

(Note in all these examples I'm using the default value for "allow-start-if-complete", which is 'false')

Say we start with:

<step id="step1">
  <next on="#{jobParameters['parm1']}" to="step2" />
  <next on="*" to="step3" />
  ...

<step id="step3">
  ...
  <stop on="#{jobParameters['parm2']}" restart="step1"/>
  
During the job execution for the original job submission, we have:
parm1 => set to "RC1.a". 
parm2 => set to "RC3.a". 

As it turns out, "step1" ends with exit status "RC1.b".  So execution proceeds
to "step3", which exits with "RC3.a", and so the job is stopped.

During restart, the value of parm1 is overridden to "RC1.b".   The business logic in "step1", having already completed, is not re-run,
and the earlier exit status of "RC1.b" is used.  

However, the first 'next on' clause now matches the "step1" exit status, and so execution proceeds to "step2", even though "step2" may not have run in any previous execution.

If execution ever proceeds to "step3", the business logic in "step3" will NOT be re-run, but the earlier exit status of "RC3.a" will
be used.  This is true even though the execution sequence landed on "step3" via a different route. Of course, if job parameter 'parm2' is not overridden, the job execution will again stop after "step3".
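To make the walkthrough above concrete, here is a minimal, self-contained Java sketch of the two rules at play (the class and method names are invented for illustration; this is not the spec API): a COMPLETED step's persisted exit status is reused, while the 'next on' patterns are re-resolved from the current job parameters on each execution.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RestartTransitionSketch {

    // Transitions in document order: resolved "on" value -> target step.
    // Only exact match and the "*" wildcard are modeled here.
    static String nextStep(String exitStatus, Map<String, String> transitions) {
        for (Map.Entry<String, String> t : transitions.entrySet()) {
            if (t.getKey().equals("*") || t.getKey().equals(exitStatus)) {
                return t.getValue();
            }
        }
        return null; // no transition matched
    }

    public static void main(String[] args) {
        // Original execution: parm1 = "RC1.a", but step1 exits "RC1.b",
        // so only the "*" wildcard matches and execution goes to step3.
        Map<String, String> original = new LinkedHashMap<>();
        original.put("RC1.a", "step2"); // #{jobParameters['parm1']} as resolved
        original.put("*", "step3");
        System.out.println(nextStep("RC1.b", original)); // step3

        // Restart: step1 is COMPLETED, so its persisted exit status "RC1.b"
        // is reused without rerunning the step. parm1 is overridden to
        // "RC1.b", the re-resolved pattern now matches, and we go to step2.
        Map<String, String> restart = new LinkedHashMap<>();
        restart.put("RC1.b", "step2");
        restart.put("*", "step3");
        System.out.println(nextStep("RC1.b", restart)); // step2
    }
}
```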

-------

Another point is that, in contrast to step business logic, the decider application logic is always rerun.

I.e. for something like:

<decision id="decision1" ref="MyDecider">
  <properties>
    <property name="prop1" value="#{jobParameters['d1.prop1Val']}" />
  </properties>
</decision>

then the logic in "MyDecider" will always rerun, and overridden job parameters in decision properties can similarly cause the execution sequence to differ from that of the original execution.
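A sketch of the contrast (invented names; the real decider SPI may differ): the decide logic is invoked on every execution, original or restart alike, with its property re-resolved from whatever job parameters are in effect for that execution.

```java
public class DeciderRerunSketch {

    // Stand-in for MyDecider's logic. "prop1" models the injected property
    // whose value was resolved from #{jobParameters['d1.prop1Val']} at the
    // start of the current execution.
    static String decide(String prop1) {
        return "TAKE_BRANCH".equals(prop1) ? "stepA" : "stepB";
    }

    public static void main(String[] args) {
        // Original execution: d1.prop1Val = "TAKE_BRANCH".
        System.out.println(decide("TAKE_BRANCH")); // stepA

        // Restart with d1.prop1Val overridden: the decider reruns and can
        // route execution differently than the original execution did.
        System.out.println(decide("OTHER")); // stepB
    }
}
```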


------------------------------------------------------------------------------

We could go even further.

One more quick example: be clear that if stepX runs for the first time in execution 2, and stepY for the first time in execution 3, then on execution 4 neither stepX nor stepY will rerun (i.e. it is not only the steps that ran in the most recent execution that are skipped). Maybe that's clear enough without additionally spelling it out, though.
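That rule, together with the "allow-start-if-complete" override of the default 'false' behavior, can be sketched as follows (invented names, not the spec API):

```java
import java.util.HashSet;
import java.util.Set;

public class RerunRuleSketch {

    // A step reruns on restart unless it reached COMPLETED in *any* prior
    // execution of the job instance -- not just the most recent one --
    // except when allow-start-if-complete="true" forces a rerun.
    static boolean shouldRun(String step, Set<String> everCompleted,
                             boolean allowStartIfComplete) {
        return allowStartIfComplete || !everCompleted.contains(step);
    }

    public static void main(String[] args) {
        Set<String> everCompleted = new HashSet<>();
        everCompleted.add("stepX"); // first completed in execution 2
        everCompleted.add("stepY"); // first completed in execution 3

        // Execution 4: neither reruns, even though only one of them
        // completed in the immediately preceding execution.
        System.out.println(shouldRun("stepX", everCompleted, false)); // false
        System.out.println(shouldRun("stepY", everCompleted, false)); // false

        // allow-start-if-complete="true" would rerun it anyway.
        System.out.println(shouldRun("stepX", everCompleted, true)); // true
    }
}
```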
Comment 1 cvignola 2012-10-05 21:44:56 UTC
Agreed.  Expect this in next rev of spec.
Comment 2 ScottKurz 2012-11-21 12:34:52 UTC
Just realizing that, in writing this up, we should be sure to incorporate allow-start-if-complete, as setting it to "true" will of course cause the application logic to rerun on restart as well.
Comment 3 mminella 2012-12-03 19:47:15 UTC
This is a big departure from what was expected (and what we thought was implied) in the current version of the spec.  It has been my expectation that since steps are independent components, once a step is complete, it should not be re-evaluated to determine the flow.

The example provided by Scott opens up a large can of worms. If we follow the logic that all decision logic is rerun, then in a 3-step job that fails in step 3, the job could in theory be restarted from any step on the fly. Spring Batch (for reference) does not allow parameter injection at this point for this reason. We do have an "allow-start-if-complete" for steps that should be rerun if the job is restarted.
Comment 4 waynexlund 2012-12-03 21:16:18 UTC
I think this example illuminates the difference in side effects between the natural-key approach to job identification and id generation. Now that you explain some scenarios, I think there are unwanted side effects for steps that will be more difficult to manage than immutable jobs/steps once a job has executed. I relented on job identification before because it seemed to be the general feeling of the group, but seeing the side effects makes me want to reconsider the discussion around job identification and what value allowing the changing of job parameters adds to the batch DSL. I remember one member (Tim?) described a scenario that he felt required upstream data cleanup that provoked job parameter changes, but there may be better ways to solve that problem.
Comment 5 cvignola 2012-12-06 23:53:04 UTC
Ok, well I didn't expect the firestorm that has now ensued :)  

A few points:

1) The use case for job parameters on restart exists, but is somewhat weak.  So it's not worth breaking a sweat over.

2) If we were determined to have job parameters on restart (which we are not) we could restrict substitution on flow control elements/attributes.  

3) Rather than heroic efforts, I think we should just drop job parameters on restart from the spec.  

4) Dropping job parameters on restart increases harmony with Spring Batch. If some vendor thinks this is important, they are free to add a non-standard extension if they think it adds value.

5) We are free to propose adding job parameters on restart in a future version of the spec if it ever becomes important.
Comment 6 cvignola 2012-12-06 23:56:12 UTC
So to conclude:

1) I retract my statements in comment #1. 

2) I propose we remove job parameters on restart from the spec.  

3) We still need to work out how restart behaves.  The objective in this instance is to follow the example offered by Spring Batch.  There are some questions about that behavior and some of that behavior may need to be described in the spec.  So that's what we have to work on.