Bug 5875

Summary: When the first readItem() in a chunk returns 'null', is this a zero-item chunk or is this not a new chunk after all?
Product: jbatch Reporter: ScottKurz
Component: TCK Assignee: ScottKurz
Status: NEW ---    
Severity: enhancement CC: issues
Priority: P5    
Version: 1   
Target Milestone: ---   
Hardware: PC   
OS: Windows   

Comment 1 ScottKurz 2014-03-27 17:31:39 UTC
Made two changes to the spec: one simple, and one too complicated to show more than a sample of.

The simple change was to the Javadoc for ItemWriter#writeItems.

Change to:
	 * The writeItems method writes a list of items
	 * for the current chunk.
	 * @param items specifies the list of items to write.
	 * This may be an empty list (e.g. if all the
	 * items have been filtered out by the 
	 * ItemProcessor).
	 * @throws Exception is thrown for any errors.
	public void writeItems(List<Object> items) throws Exception;
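
For illustration only (this is not RI code), a writer that tolerates an empty list might look like the sketch below. The class name BufferingWriter and its sink and calls fields are hypothetical; only the writeItems(List<Object>) shape comes from the API. The throws clause is omitted here, which an implementing method is free to do.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone writer; a real one would implement
// javax.batch.api.chunk.ItemWriter and its open/close/checkpointInfo methods.
public class BufferingWriter {
    // Hypothetical in-memory sink standing in for a real output resource.
    final List<Object> sink = new ArrayList<>();
    // Counts how many times the runtime invoked writeItems.
    int calls = 0;

    public void writeItems(List<Object> items) {
        calls++;
        // An empty list is legal (e.g. every item was filtered out by the
        // ItemProcessor), so there is simply nothing to write.
        if (items.isEmpty()) {
            return;
        }
        sink.addAll(items);
    }
}
```

The point of the sketch is only that the writer does not assume items.get(0) exists.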


The more complicated change was to the flow outlines in 11.6-11.10.

Decided that this was a zero-item chunk (rather than "not a chunk").  Illustrated most simply by 11.6 which will look like:

9.  <repeat until no more items (i.e. while readItem hasn't returned 'null') > {
    a.  <begin checkpoint interval [<begin chunk transaction>]>
    b.  <repeat until checkpoint criteria reached OR readItem returns 'null'> {
        i.  <->ItemReader.readItem // thread A
        ii. // if readItem returns non-null
            1.  <->ItemProcessor.processItem // thread A
            2.  // if processItem returns non-null, <add item to writeItems buffer>
    c.  }
    d.  // if at least one non-null value has been successfully read in the present chunk
        i.    <->ItemWriter.writeItems // thread A
    e.  <->[ItemReader.checkpointInfo] // thread A
    f.  <->[ItemWriter.checkpointInfo] // thread A
    g.  <Store StepContext persistent area>
    h.  [<commit chunk transaction>]
10. }
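
To make the zero-item chunk concrete, here is a rough simulation of the outline above. It is a sketch, not the RI: the class ChunkLoopSketch, the run method, and the simple item-count checkpoint criterion are assumptions for illustration, and processItem is assumed to pass every item through unfiltered.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ChunkLoopSketch {
    // Returns one entry per chunk: the number of items handed to writeItems,
    // or -1 when writeItems was skipped because nothing was read (step 9.d).
    static List<Integer> run(List<String> input, int itemCount) {
        Iterator<String> reader = input.iterator();
        List<Integer> chunks = new ArrayList<>();
        boolean endOfData = false;
        while (!endOfData) {                          // step 9
            List<Object> buffer = new ArrayList<>();  // 9.a begin checkpoint interval
            int read = 0;
            while (read < itemCount) {                // 9.b
                // 9.b.i: readItem, with null signalling end of data
                String item = reader.hasNext() ? reader.next() : null;
                if (item == null) {
                    endOfData = true;
                    break;
                }
                read++;
                buffer.add(item);                     // 9.b.ii: processed item buffered
            }
            // 9.d: writeItems only if at least one item was read in this chunk
            chunks.add(read > 0 ? buffer.size() : -1);
            // 9.e-9.h: checkpointInfo, persist, and commit still happen here,
            // even for a zero-item chunk.
        }
        return chunks;
    }
}
```

With four input items and an item count of 2, the sketch runs three chunks: two full ones and a final zero-item chunk in which the first readItem returns null.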


In addition, RI should be fixed to allow for an empty List being passed to ItemWriter#writeItems().

Comment 2 ScottKurz 2014-03-28 14:14:07 UTC
As noted in Brent's email, we have a TCK issue here as well.

Comment 3 ScottKurz 2014-04-04 14:36:44 UTC
In Draft 5, making one further change as discussed in my last post:

In Sec. "11.7 Partitioned Chunk Processing":
Deleting outline entry 5.e.ix, (otherwise leaving the rest unchanged).

The reason is:  now that we are clear that the collector will always be called at the end of the last chunk (even a zero-item chunk), there is no need for yet another call.

This leaves the TCK behaving per spec (in a somewhat roundabout way).

Comment 4 ScottKurz 2014-05-27 21:55:20 UTC
Noting that I messed up in producing the MR spec draft when reflecting the last change noted above.

In Section 11.7, it should read:

5. <->[PartitionMapper.mapPartitions] // thread A // per partition - on thread Px:
  a. [<begin transaction> ]
  b. <->ItemReader.open // thread Px
  c. <->ItemWriter.open // thread Px
  d. [<commit transaction> ] 
  e. <repeat until no more items (i.e. while readItem hasn't returned 'null') > { 
       i. <begin checkpoint interval [<begin chunk transaction>]> 
       vii. <Store (partition-local) StepContext persistent area>
       viii. [<commit chunk transaction>]
       ix. <->[PartitionCollector.collectPartitionData] // thread Px
  f. }
  g. [<begin transaction> ]
  h. <->ItemWriter.close // thread Px
  i. <->ItemReader.close // thread Px
  j. [<commit transaction> ]
6. [<begin transaction> ] // thread A
7. // Actions 9-12 run continuously until all partitions end.


That is, the collector gets called after the chunk transaction is committed, even if it is a "zero-item" chunk.   The collector does not get called after reader/writer close.
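
As a rough illustration of that ordering on a single partition thread, the sketch below logs the events in sequence. The class PartitionCollectorSketch, the runPartition method, and the item-count checkpoint criterion are hypothetical; it models only the order of commit, collector, and close, not real transactions.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionCollectorSketch {
    // Returns the ordered event log for one partition thread (Px).
    static List<String> runPartition(int totalItems, int itemCount) {
        List<String> log = new ArrayList<>();
        log.add("open");                      // 5.b-5.c reader/writer open
        int remaining = totalItems;
        boolean endOfData = false;
        while (!endOfData) {                  // 5.e
            int read = 0;
            while (read < itemCount) {
                if (remaining == 0) {         // readItem returned null
                    endOfData = true;
                    break;
                }
                remaining--;
                read++;
            }
            log.add("commit(" + read + ")");  // 5.e.viii commit chunk transaction
            log.add("collect");               // 5.e.ix collectPartitionData
        }
        log.add("close");                     // 5.h-5.i, with no collector call afterwards
        return log;
    }
}
```

For two items and an item count of 2, the log is open, commit(2), collect, commit(0), collect, close: the collector runs after the zero-item chunk commits and not again after close.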

In my last draft I had wrongly removed the collector call in 5.e.ix, when I meant to remove the collector call in 5.k.

Comment 5 ScottKurz 2014-10-30 21:00:00 UTC
Fixed a piece of this in the RI with:


Changed behavior so we don't call writeItems with an empty list.