Bugzilla – Bug 4532
buffer-items should be about reading and not writing
Last modified: 2013-02-01 23:23:26 UTC
In section 5.2.1 on page 18, the buffer-items attribute is described as buffering the items to be written; however, this is incorrect. This is probably my fault, but this attribute (assuming it is intended to replace the is-reader-transactional-queue attribute in SB) should indicate whether the input is transactional and therefore should not be buffered, not how many items to buffer before a write. The item-count attribute controls the buffer size for writing. If you don't want to buffer items to be written, item-count=1 should address this.
Yes, buffer-items is like SB is-reader-transactional-queue. It indicates whether to buffer items read and pass a list to ItemWriter or to call ItemWriter one item at a time.
The item-count attribute is like SB's commit-interval: it specifies the chunk size and therefore the commit frequency. So you wouldn't want to specify item-count=1 just to avoid buffering, because you'd get the unwanted side effect of checkpointing (committing) after every item.
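To put numbers on the commit-frequency point: assuming every chunk, including a final partial one, ends with exactly one checkpoint, the number of commits is the total item count divided by item-count, rounded up. A minimal Java sketch (the class and method names are hypothetical):

```java
// Hypothetical helper: counts checkpoints (commits) for a given item-count.
// Assumes one checkpoint per chunk, including a final partial chunk.
class CheckpointCount {

    static int checkpoints(int totalItems, int itemCount) {
        if (itemCount <= 0) {
            throw new IllegalArgumentException("item-count must be positive");
        }
        return (totalItems + itemCount - 1) / itemCount; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(checkpoints(1000, 10)); // 100 commits
        System.out.println(checkpoints(1000, 1));  // 1000 commits, one per item
    }
}
```

Dropping item-count from 10 to 1 multiplies the number of commits by ten, which is the side effect described above.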
So the intended behavior is:
1) buffer-items=true, item-count=10: read and buffer 10 items, call the ItemWriter once with a list of 10, then commit the checkpoint.
2) buffer-items=false, item-count=10: read an item and write it, one item at a time for 10 items, then commit the checkpoint.
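The two proposed behaviors can be sketched in Java as below. This is a hypothetical model, not spec or SB code: `ItemSink` stands in for the ItemWriter, and `runChunk` processes one chunk and reports how many writer calls it made.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the two buffer-items behaviors as originally proposed.
class BufferItemsSketch {

    interface ItemSink {
        void write(List<String> items); // one call = one ItemWriter invocation
    }

    /** Processes one chunk of up to itemCount items; returns the number of writer calls. */
    static int runChunk(List<String> input, int itemCount, boolean bufferItems, ItemSink sink) {
        int writerCalls = 0;
        List<String> buffer = new ArrayList<>();
        for (int i = 0; i < itemCount && i < input.size(); i++) {
            String item = input.get(i);       // "read" the next item
            if (bufferItems) {
                buffer.add(item);             // hold it until the checkpoint
            } else {
                sink.write(List.of(item));    // write immediately, one item per call
                writerCalls++;
            }
        }
        if (bufferItems && !buffer.isEmpty()) {
            sink.write(buffer);               // single call with the whole chunk
            writerCalls++;
        }
        // in both cases, the checkpoint would be committed here
        return writerCalls;
    }

    public static void main(String[] args) {
        List<String> input = new ArrayList<>();
        for (int i = 0; i < 10; i++) input.add("item" + i);
        ItemSink printer = items -> System.out.println("write " + items.size() + " item(s)");
        System.out.println(runChunk(input, 10, true, printer));  // 1 writer call
        System.out.println(runChunk(input, 10, false, printer)); // 10 writer calls
    }
}
```

Either way there is one checkpoint per 10 items; only the number of writer calls differs.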
Do you agree or disagree that we need those two behaviors?
Now, I'll concede that my description in the spec may or may not convey those facts, but that was the intention. For convenience, here's what the spec says about buffer-items:
"Specifies whether items are buffered until it is time to take a checkpoint. It must be the value 'true' or 'false'. It is an optional attribute. The default is true. When items are buffered, a single call to the item writer is made to write the items immediately before the next checkpoint is taken."
"Yes, buffer-items is like SB is-reader-transactional-queue. It indicates whether to buffer items read and pass a list to ItemWriter or to call ItemWriter one item at a time."
That is my point. SB's is-reader-transactional-queue does not have any impact on when the items are passed to the writer. This is controlled solely by the item-count attribute.
Internally, as we read items, we buffer them by default to save the I/O work of rereading them on a retry (among other reasons). So in your example, regardless of whether the is-reader-transactional-queue flag is true or false, the ItemWriter will always get 10 items.
This flag exists for reading from a JMS queue. Most ItemReaders can improve performance by caching the items they read, so that when a rollback occurs those items don't have to be reread from the resource. With a JMS queue, however, we can't do that, because the rollback puts the items back on the queue automatically (if we were to buffer these items, they would end up being read twice: once on the retry and once when we resumed reading from the queue).
OK, so is-reader-transactional-queue is really just a way to control a retry optimization: with is-reader-transactional-queue=false you retry items from the buffer; with is-reader-transactional-queue=true you retry by re-reading items from the resource.
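A rough Java model of that retry optimization, under stated assumptions: the resource is a simple in-memory queue, `resourceRedelivers` models a transactional JMS queue that takes uncommitted items back on rollback, and `useRetryBuffer` models the reader replaying its cached items (roughly is-reader-transactional-queue=false). All names are hypothetical; this is not SB or spec code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical model of a reader with a retry buffer over a possibly transactional queue.
class RetryBufferSketch {
    private final Deque<String> resource;       // stands in for the queue being read
    private final boolean resourceRedelivers;   // true: rollback returns items to the queue
    private final boolean useRetryBuffer;       // true: reader replays its cached items
    private final List<String> readThisChunk = new ArrayList<>();
    private final Deque<String> replay = new ArrayDeque<>();

    RetryBufferSketch(List<String> items, boolean resourceRedelivers, boolean useRetryBuffer) {
        this.resource = new ArrayDeque<>(items);
        this.resourceRedelivers = resourceRedelivers;
        this.useRetryBuffer = useRetryBuffer;
    }

    /** Returns the next item, preferring replayed items; null when exhausted. */
    String read() {
        String item = replay.isEmpty() ? resource.pollFirst() : replay.pollFirst();
        if (item != null) readThisChunk.add(item);
        return item;
    }

    /** Rolls back the current chunk; walks backwards so addFirst keeps the original order. */
    void rollback() {
        for (int i = readThisChunk.size() - 1; i >= 0; i--) {
            String item = readThisChunk.get(i);
            if (resourceRedelivers) resource.addFirst(item); // queue takes the item back
            if (useRetryBuffer) replay.addFirst(item);       // reader keeps its own copy too
        }
        readThisChunk.clear();
    }

    void commit() { readThisChunk.clear(); }

    /** Reads chunkSize items, rolls back, then reads everything that remains. */
    static List<String> retryScenario(List<String> items, int chunkSize,
                                      boolean resourceRedelivers, boolean useRetryBuffer) {
        RetryBufferSketch reader = new RetryBufferSketch(items, resourceRedelivers, useRetryBuffer);
        for (int i = 0; i < chunkSize; i++) reader.read();
        reader.rollback();
        List<String> seen = new ArrayList<>();
        for (String item = reader.read(); item != null; item = reader.read()) seen.add(item);
        reader.commit();
        return seen;
    }

    public static void main(String[] args) {
        List<String> items = List.of("a", "b", "c", "d");
        System.out.println(retryScenario(items, 2, false, true)); // [a, b, c, d]  replay from buffer
        System.out.println(retryScenario(items, 2, true, false)); // [a, b, c, d]  re-read from queue
        System.out.println(retryScenario(items, 2, true, true));  // [a, b, a, b, c, d]  read twice
    }
}
```

The third case is the double-read problem described above, which is why the buffer has to be switched off when the reader's resource is a transactional queue.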
So here's where I think this leaves us:
1) What you're really saying is that ItemWriter should always be called once with a list of all items in the current chunk.
2) If that is indeed what you're saying, it would imply we don't need the buffer-items option at all. We could remove it.
3) Whether retry reads from a buffer or from a resource is an implementation decision and if it needs any external control, that too should be implementation specific. An implementation could define a step-level property for such purposes.
Please indicate agreement or disagreement with points 1-3 above. Thanks.
I agree with all three of those points. Sorry for the confusion about this attribute.
Changes to spec resulting from this bug are:
1) The buffer-items attribute has been removed.
2) Section 5.2.1 states: "A single call is made to the ItemWriter per chunk."
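With buffer-items gone, the chunk loop reduces to: read up to item-count items, hand the whole list to the writer in a single call, then checkpoint. A minimal sketch, assuming a `java.util.Iterator` as the reader and a `Consumer<List<String>>` standing in for the ItemWriter (neither is the spec's API; the class name is hypothetical):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of chunk processing with one writer call per chunk.
class ChunkLoopSketch {

    /** Returns the sizes of the lists passed to the writer, one entry per chunk. */
    static List<Integer> run(Iterator<String> reader, int itemCount, Consumer<List<String>> writer) {
        if (itemCount <= 0) throw new IllegalArgumentException("item-count must be positive");
        List<Integer> writeSizes = new ArrayList<>();
        while (reader.hasNext()) {
            List<String> chunk = new ArrayList<>(itemCount);
            while (chunk.size() < itemCount && reader.hasNext()) {
                chunk.add(reader.next());      // read until the chunk is full or input ends
            }
            writer.accept(chunk);              // exactly one writer call per chunk
            writeSizes.add(chunk.size());
            // a real runtime would take a checkpoint and commit here
        }
        return writeSizes;
    }

    public static void main(String[] args) {
        List<String> items = new ArrayList<>();
        for (int i = 0; i < 25; i++) items.add("item" + i);
        System.out.println(run(items.iterator(), 10, chunk -> {})); // [10, 10, 5]
    }
}
```

The final, partial chunk still gets a single writer call, just with fewer items.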