Bugzilla – Full Text Bug Listing
Description mminella 2012-09-28 15:51:14 UTC
* Parallelization (6.5) - Is the role of the PartitionMapper really to calculate the NUMBER of partitions, or the data that exists within each partition? I thought it was the latter…
* @Begin (126.96.36.199)
* Remove sub-job verbiage.
* How is the logical grouping denoted? In other words, who calls @Begin, etc., and when?
Comment 1 cvignola 2012-10-05 17:24:31 UTC
The role of the mapper is to decide which items exist in each partition. It does this by assigning properties to each partition, which indirectly defines the number of partitions. These properties tell each partition which items to process. Key ranges, line numbers, and file names are all examples of property values that might communicate the data range for a given partition. Sub-job verbiage will be removed from the next version of the spec. @Begin, etc. are invoked by the batch runtime. I will add a flow diagram to the spec to spell this out.
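To illustrate the point about properties, here is a minimal sketch of a mapper that assigns a key range to each partition. It is not the spec's PartitionMapper API: the class name `PartitionPlanSketch`, the method `planByKeyRange`, and the property names `firstKey` and `lastKey` are all hypothetical and chosen only for illustration. The number of partitions is never set directly; it is simply the number of property sets produced.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch only: this is NOT the JSR 352 PartitionMapper interface.
class PartitionPlanSketch {

    // Splits keys 0..totalItems-1 into contiguous ranges and returns one
    // Properties object per partition. The partition count is implied by
    // the size of the returned list.
    static List<Properties> planByKeyRange(int totalItems, int maxPartitions) {
        if (totalItems < 0 || maxPartitions < 1) {
            throw new IllegalArgumentException("totalItems must be >= 0 and maxPartitions >= 1");
        }
        List<Properties> plan = new ArrayList<>();
        // Ceiling division: the largest number of items any partition receives.
        int chunk = (totalItems + maxPartitions - 1) / maxPartitions;
        for (int first = 0; first < totalItems; first += chunk) {
            int last = Math.min(first + chunk, totalItems) - 1;
            Properties props = new Properties();
            props.setProperty("firstKey", Integer.toString(first)); // assumed property name
            props.setProperty("lastKey", Integer.toString(last));   // assumed property name
            plan.add(props);
        }
        return plan;
    }

    public static void main(String[] args) {
        List<Properties> plan = planByKeyRange(10, 3);
        for (int i = 0; i < plan.size(); i++) {
            Properties props = plan.get(i);
            System.out.println("partition " + i + ": keys "
                    + props.getProperty("firstKey") + ".." + props.getProperty("lastKey"));
        }
    }
}
```

In this sketch each partition would read its own `firstKey` and `lastKey` values to learn which items to process. The same pattern would apply to line numbers or file names as property values.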
Comment 2 cvignola 2013-01-16 15:29:02 UTC
I didn't add a diagram, but I did add a section that enumerates the sequence of events for partition execution.