Bugzilla – Bug 4274
Parallel step instances should not stop when another step fails
Last modified: 2013-01-16 16:07:20 UTC
Section 5.5 indicates that when parallel steps in a job are executing and one fails, the other should be marked as STOPPED. Why would the other step be marked as stopped if it is independent of the step that throws an error? Why not let it run to completion?
I agree with this assessment. The job would default to FAILED but the rest of the parallel steps may still succeed.
I've had multiple clients ask to stop partitioned execution completely and immediately if any one partition fails, so that they can trigger on the job failure and promptly begin problem resolution. That influenced me here. You can see there is room for policy, but I don't want to complicate matters. I agree the parallel execution units should be allowed to complete. There are sufficient listeners and partition callbacks for the user to interpose and issue a stop if they want to pre-empt processing.
The updated spec states that active partitions are allowed to complete before the job is marked FAILED.
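For illustration only, here is a minimal sketch of the agreed behavior using plain java.util.concurrent rather than the batch API: every parallel unit runs to completion, and only afterwards is the job marked FAILED if any unit threw. The names ParallelUnitsSketch, Status, Result, and runAll are hypothetical and do not come from the spec; a real batch runtime would handle this internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration; none of these names are part of the batch spec.
class ParallelUnitsSketch {

    enum Status { COMPLETED, FAILED }

    static final class Result {
        final Status jobStatus;
        final List<Status> unitStatuses;

        Result(Status jobStatus, List<Status> unitStatuses) {
            this.jobStatus = jobStatus;
            this.unitStatuses = unitStatuses;
        }
    }

    // Runs all units in parallel and waits for every one of them,
    // even if some fail. The job status is decided only at the end.
    static Result runAll(List<Callable<Void>> units) {
        List<Status> statuses = new ArrayList<>();
        if (units.isEmpty()) {
            return new Result(Status.COMPLETED, statuses);
        }
        ExecutorService pool = Executors.newFixedThreadPool(units.size());
        try {
            // invokeAll blocks until each unit has finished,
            // whether normally or by throwing an exception.
            List<Future<Void>> futures = pool.invokeAll(units);
            Status job = Status.COMPLETED;
            for (Future<Void> f : futures) {
                try {
                    f.get();
                    statuses.add(Status.COMPLETED);
                } catch (ExecutionException e) {
                    // This unit failed; the others were not stopped.
                    statuses.add(Status.FAILED);
                    job = Status.FAILED;
                }
            }
            return new Result(job, statuses);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for units", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With one unit that throws and one that succeeds, runAll reports the job as FAILED while the second unit's status is still COMPLETED. A user who wants the pre-emptive behavior described earlier would have to request a stop explicitly, which this sketch does not model.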