Bugzilla – Bug 4776
SPEC - TransientUserData should be a key/value pair.
Last modified: 2014-02-25 03:43:08 UTC
I was under the understanding that the transient user data was going to be updated to be a key/value pair. Requiring a single unnamed object seems vague at best.
The same goes for the PersistentUserData
+1. I would again suggest we remvoe all the type parameters <T>, <T, P extends Serialiable> from JobContext and StepContext. user data is not an integral part of a context so IMO we shouldn't include user data information into a context's identity. Having these type parameters also clutters the api for developers and implementers alike.
Users should be able to save different types of data in a context, so restricting to 1 type of data is not helpful.
The single object could be a map, or it could be a single object. Doesn't that satisfy this concern?
The generic gets you a small extra in compile-time checking and saves a cast (though as it's currently holding up Glassfish integration due to a Weld bug I wouldn't argue too strongly against changing).
Serializable gives the user more latitude. You can always adopt a convention of HashMap<String,String> if you want key/value.
Alright, I concede the existing interfaces are little awkward to use as it pertains to transient and persistent user data types. Parameterized types force the developer to always choose types when dealing with JobContext, StepContext, and StepExecution. It's flexible and allows for strong typing, but it imposes a lot on the developer.
Map<Object,Object> among other things has been suggested previously. If we're going to do something like that, I suggest we impose a little more structure by defining a String keyed approach that allows any object for transient, but enforces Serializable for persistent, as follows:
Map<String,Object> for transient data
Map<String,Serializable> for persistent data
No promises, because it's getting late for changes; but this might be one that needs to be made to make the API set consumable.
Sounds good. For flexibility, we could use
Map<String, ?> for transient data
Map<String, ? extends Serializable> for persistent data,
so user can pass in Map<String, String>, or Map<String, Number> in as transient data, and Map<String, String> in as persistent data.
I'm not following the arguments... one angle seems to say using generics makes the API awkward.. but each of the proposals involves generics in some form.
Another angle seems to be that "context" = Map... and anything else makes the API surprising... I can appreciate that too much freedom could make the API look unwieldy at first when getting started.
But this isn't "just a refactoring" and it isn't strikingly better than the status quo.. it seems too late to change at this point.
It is a bit of a pain for the implementor to "do it right" and make all the warnings go away ... you'll see we haven't in the RI.
But the end user could do something like:
private StepContext<Object, HashMap<Object, Serializable>> stepCtx = null;
which doesn't seem too bad or any worse than the other suggestions.
Note that's HashMap, not Map which isn't Serializable. That also raises another question.. depending on whether MyMapType is or isn't Serializable, should we spec whether we do or don't serialize the whole MyMapType object
or just the entries ? I don't think we have time for even that questionn.
I believe I asked the question previously on what the generic type on the contexts themselves was. That didn't make sense to me. I'd rather see the following:
StepContext#setPersistentUserData(String key, Object value)
StepContext#setTransientUserData(String key, Object value)
and the same for the JobContexts. This remove the confusing typing at the class level and makes the API much more straight forward IMHO.
With regards to the serializable aspect, I'd be open to having the PersistentUserData require the value be Serializable as well...
Seems late for just a programming style preference adjustment..
Yes, we need to keep the existing interfaces. They do at least provide the developer with considerable latitude. Specifying JobContext<HashMap<String,String>> is an example of an easy convention for a key-based transient data.
Can we revisit this in 1.1? I think this is a very important part that we can improve to define a structure so various parts of the application can pass multiple values around without conflict.
It's true that application can stick to a convention to use Map<String, ?> as the container for multiple transient context data, but a convention is not a contract. When a batch application consists of components from various source, there is no good way to convey and enforce this convention.
For example, some spec implementations provide ready-to-use ItemReader and ItemWriter artifacts. There is the need to pass certain context data and processing instructions from reader to writer, but I can't reliably use step userTransientData because it may overwrite any existing data, and it may also be overwritten by ItemProcessor, and as a result the writer may never get the data from reader.
One option to support key-value pair while keeping backward compatible is to overload current methods, providing more api choices to users. There are valid use cases for both single-valued and multi-valued user data.
void addTransientUserData(String key, Object data);
//overload current method to take a key param
Object getTransientUserData(String key);
The same applies for persistentUserData, and the same applies for both StepContext and JobContext.