Welcome to this Wiki ! This project is still under construction.
In RapidMiner, sample data is available in processes in the form of ExampleSets. When entering the Octave script operator, data (an object instance of the ExampleSet java class) first needs to be transposed into one of Octave supported data types so that it is available in the Octave environment when the script executes (1). JavaOctave API provides the appropriate java classes to create Octave data, in particular OctaveCells and OctaveStructs, and to send these objects to Octave (2). Then the Octave script can be executed, and we need to redirect any output in RapidMiner’s log window during this execution. When execution is finished, results (3) need to be imported back into RapidMiner. Once again JavaOctave provides the API to create the appropriate java objects, and we can then build the appropriate ExampleSets based on the cell or struct’s content (4). Note: this mapping is described in details in the following chapters.
This new operator is quite intuitive for the analyst to handle. It accepts any number of inputs of type ExampleSet, converts them into Cell or Struct and makes them available in the Octave environment that will execute the script. The name of each input in the Octave environment, as well as the type of conversion (cell or struct) is defined in the corresponding “inputs” parameter.
The Octave script to be executed is stored in the “script” parameter, as shown below in Figure 4. The script may use any of the input variables defined in the inputs configuration. Note also that printing functions will be outputted correctly to RapidMiner log viewer.
Finally, the outputs to be imported back into RapidMiner must also be declared in the “results” parameters. Type (cell or struct) is directly inferred here so there is nothing than the name to fill (“data table” is currently the only possible mapping).
The RapidMiner ExampleSet data type contains very specific information (attribute names, roles, and optionally levels for nominal attributes) so there is a need to express this information in Octave in an importable and exportable way. There is no one best way to encode this in Octave so we suggest two methods, based on the Cell and Struct data types.
This is pretty straightforward – Name and roles are encoded by strings in the respective name and role fields of the resulting structure. Data is encoded in a matrix, but since only numerical data can fit an Octave matrix, nominal data is encoded on the fly using a dictionary stored in the “level” field.
The cell encoding is even simpler, as each attribute data is stored in a separate entry (cell or matrix) in the 3d row of the global cell.
Depending on the type of algorithm, a developer will typically prefer to use the struct encoding (for direct matrix manipulation) or the cell one (for straightforward nominal data handling). The output data format is free for the script developer to choose between the two above defined. The type will be inferred and the output(s) will be translated back into ExampleSet(s).