Bugzilla – Bug 5675
Metric values are 0 when batch runs in a partition
Last modified: 2015-09-01 21:05:52 UTC
When running a job with a partitioned step, the Metric array from the StepExecution has all its values equal to 0. With the partition removed, the Metric array shows the expected values for the data processed.
You can check a couple of test cases that reproduce the problem here:
And here is the code:
Since I believe it to be related, it's worth noting that the persistent user data is also not available for partitioned steps. I'm guessing that a solution won't be possible with the 1.0 specification (since the persistent data can't be aggregated like the Metrics can be, possibly).
Hi, I had realized the Metrics weren't set up correctly for partitioned steps. Thanks for opening the bug to track this.
Question to m_edgar: what issue are you having with the persistent user data?
I'm wondering if this is working as designed and your issue relates to the fact that each partition gets its own StepContext, and along with it its own persistent data, in addition to the top-level thread's StepContext and persistent data.
Could you give some sample code and explain?
(In reply to ScottKurz from comment #2)
The persistent data is working as expected within the partitions as they execute. However, my issue is with accessing the data via the job operator and StepExecution list. Since only the execution representing the parent thread of the step is returned, none of the details of the partitions is available.
It seems that the metrics missing from the step's parent execution are the same situation, since they are present on the child records as seen in the job repository.
Would a modified runtime which returns both the parent and the child step executions for a particular step still be in compliance with the specification? It seems to me that it (the spec) doesn't necessarily indicate whether the StepExecution list returned by the job operator is for the parent, the children, or both.
While we could improve the behavior in the RI alone without a spec update, this isn't the first time we've run into the question of how the spec should expose a partition-level equivalent of StepExecution.
Marking as "SPEC".
I'm breaking off the idea of a partition-level StepExecution into Bug 6490, since this would require a new API.
For this bug, 5675, we'll just fix the RI to aggregate the metrics. I started it but haven't finished yet.
The fix aggregates the partition metrics (not mentioned in the spec IIRC, but probably non-controversial). We only do the summation on a successful execution. If the step blows up before then, you'll never see the metrics (this avoids the need to understand when they need updating).
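For anyone wanting a picture of the aggregation described above, here is a minimal sketch in plain Java. It is not the RI code. MetricType here is a hypothetical stand-in for a subset of javax.batch.runtime.Metric.MetricType, PartitionResult and aggregate are made-up names, and treating "successful execution" as "every partition completed" is an assumption made for the example.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

class PartitionMetricsSketch {

    // Hypothetical stand-in for a subset of javax.batch.runtime.Metric.MetricType.
    enum MetricType { READ_COUNT, WRITE_COUNT }

    // Hypothetical holder for what one partition reports when it ends.
    static final class PartitionResult {
        final boolean completed;
        final Map<MetricType, Long> metrics;

        PartitionResult(boolean completed, Map<MetricType, Long> metrics) {
            this.completed = completed;
            this.metrics = metrics;
        }

        // Convenience factory for the example only.
        static PartitionResult of(boolean completed, long read, long write) {
            Map<MetricType, Long> m = new EnumMap<>(MetricType.class);
            m.put(MetricType.READ_COUNT, read);
            m.put(MetricType.WRITE_COUNT, write);
            return new PartitionResult(completed, m);
        }
    }

    // Sums per-partition metrics into step-level totals.
    // Assumption: the sum is only taken when every partition completed;
    // otherwise the step-level totals are left at 0.
    static Map<MetricType, Long> aggregate(List<PartitionResult> partitions) {
        Map<MetricType, Long> totals = new EnumMap<>(MetricType.class);
        for (MetricType type : MetricType.values()) {
            totals.put(type, 0L);
        }
        for (PartitionResult p : partitions) {
            if (!p.completed) {
                return totals; // no summation on an unsuccessful execution
            }
        }
        for (PartitionResult p : partitions) {
            for (Map.Entry<MetricType, Long> e : p.metrics.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        List<PartitionResult> partitions = Arrays.asList(
                PartitionResult.of(true, 10, 8),
                PartitionResult.of(true, 5, 5));
        System.out.println(aggregate(partitions));
    }
}
```

Summing once at the end, rather than updating the step-level values as each partition progresses, matches the trade-off noted above: the totals are simply absent after a failure, so there is no question of when they need updating.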
Extended fix to case where partition runs from a split-flow in:
Marking as resolved, now that the RI 1.0.1 version has been released.
I'll just note that the behavior of aggregating the metrics should probably be considered to be an RI-specific behavior at this time (not required by the standard). If someone feels a need to clarify this at the spec level, please raise a new issue. I think it's OK to leave this for now.