Section 3:

How to implement your own algorithm in a NodeModel with a NodeDialog

We start off by explaining a very simple binner. The bins are equally spaced, such that the whole range of a certain attribute is divided into n intervals. The data points with an attribute value within the k-th interval are considered to belong to the k-th bin. Therefore, the output is the original table with the binning information appended for each instance, i.e. row. The node also requires a dialog, as the user should be able to determine the number of bins and also specify the column on which the values should be binned.
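
The equal-width rule described above can be sketched in plain Java, independent of the KNIME classes. This is only an illustration under stated assumptions: the class EqualWidthBinSketch and the method binIndex are hypothetical names, the bins are numbered from 0, and a value lying exactly on an interval's upper bound is assigned to the lower bin (the same convention as the `<=` comparison in the getCell method shown later in this section).

```java
/** Hypothetical helper for illustration only; not part of the KNIME API. */
final class EqualWidthBinSketch {

    /**
     * Returns the zero-based index of the equal-width bin that contains
     * the given value when [min, max] is divided into numberOfBins intervals.
     */
    static int binIndex(double value, double min, double max, int numberOfBins) {
        if (numberOfBins < 1) {
            throw new IllegalArgumentException("numberOfBins must be >= 1");
        }
        if (!(max > min)) {
            // degenerate range: all values end up in the first bin
            return 0;
        }
        double width = (max - min) / numberOfBins;
        // a value on an upper bound belongs to the lower bin, hence ceil - 1
        int index = (int) Math.ceil((value - min) / width) - 1;
        // clamp so that min (and anything outside the range) gets a valid bin
        return Math.max(0, Math.min(numberOfBins - 1, index));
    }

    public static void main(String[] args) {
        // range [0, 10] divided into 5 bins of width 2
        System.out.println(binIndex(0.0, 0.0, 10.0, 5));  // prints 0
        System.out.println(binIndex(3.5, 0.0, 10.0, 5));  // prints 1
        System.out.println(binIndex(10.0, 0.0, 10.0, 5)); // prints 4
    }
}
```

Note that this sketch clamps values outside the range into the first or last bin, whereas the node developed below returns a missing cell for values above the last bound.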

NodeModel:

Before we start to implement the actual binning algorithm in the execute method, we have to define the fields we need in the NodeModel. (After creation, the NodeModel already contains exemplary code, which can be deleted.) A convenient way to exchange settings between the NodeModel and the NodeDialog is provided by the SettingsModel classes. As you will see later on, the NodeDialog also works with SettingsModels, which is why we use them for the number of bins and for the column on which the values should be binned:

	// the settings model for the number of bins 
	private final SettingsModelIntegerBounded m_numberOfBins =
		new SettingsModelIntegerBounded(NumericBinnerNodeModel.CFGKEY_NR_OF_BINS,
                    NumericBinnerNodeModel.DEFAULT_NR_OF_BINS,
                    1, Integer.MAX_VALUE);
	
	// the settings model storing the column to bin
	private final SettingsModelString m_column = new SettingsModelString(
            NumericBinnerNodeModel.CFGKEY_COLUMN_NAME, "");
	

In order to obtain the settings from the dialog, they must be written into a NodeSettings object. The NodeSettings transfer the settings from the dialog to the model and vice versa. A key is needed for each field to identify and retrieve it from the NodeSettings. It is good practice to define the static final string used as the key in the NodeModel.

    /** The config key for the number of bins. */ 
    public static final String CFGKEY_NR_OF_BINS = "numberOfbins"; 
    /** The config key for the selected column. */
    public static final String CFGKEY_COLUMN_NAME = "columnName";
	

The transfer of the settings between the NodeModel and the NodeDialog is realized by implementing the validateSettings, loadValidatedSettingsFrom and saveSettingsTo methods. All these methods can be safely delegated to the SettingsModels. In the validateSettings method a check is made to see whether the values are present and valid (for example, within a valid range).

    /**
     * @see org.knime.core.node.NodeModel
     *      #validateSettings(org.knime.core.node.NodeSettingsRO)
     */
    @Override
    protected void validateSettings(final NodeSettingsRO settings)
            throws InvalidSettingsException {
            
    	// delegate this to the settings models
    	
        m_numberOfBins.validateSettings(settings);
        m_column.validateSettings(settings);
    }
	

When the loadValidatedSettingsFrom method is called, the settings have already been validated and can be loaded into the local fields, which in this case are the SettingsModels for the number of bins and the selected column.

    /**
     * @see org.knime.core.node.NodeModel
     *      #loadValidatedSettingsFrom(org.knime.core.node.NodeSettingsRO)
     */
    @Override
    protected void loadValidatedSettingsFrom(final NodeSettingsRO settings)
            throws InvalidSettingsException {

        // Load the values from the settings into the models.
        // It can be safely assumed that the settings have already been
        // validated by the validateSettings method.
        
        m_numberOfBins.loadSettingsFrom(settings);
        m_column.loadSettingsFrom(settings);

    }
	

In the saveSettingsTo method the local fields are written into the settings so that the dialog displays the current values.

    /**
     * @see org.knime.core.node.NodeModel
     *      #saveSettingsTo(org.knime.core.node.NodeSettingsWO)
     */
    @Override
    protected void saveSettingsTo(final NodeSettingsWO settings) {

        // save settings to the config object.
    	
        m_numberOfBins.saveSettingsTo(settings);
        m_column.saveSettingsTo(settings);
    }
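
To see how the three methods fit together without the KNIME classes, the following plain-Java sketch mimics a bounded integer setting. It is a stand-in under stated assumptions, not the implementation of SettingsModelIntegerBounded: the class BoundedIntSettingSketch is hypothetical, a `Map<String, Object>` plays the role of the NodeSettings object, and IllegalArgumentException is used in place of InvalidSettingsException.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a bounded integer settings model. */
final class BoundedIntSettingSketch {
    private final String key;
    private final int min;
    private final int max;
    private int value;

    BoundedIntSettingSketch(String key, int defaultValue, int min, int max) {
        this.key = key;
        this.min = min;
        this.max = max;
        checkRange(defaultValue);
        this.value = defaultValue;
    }

    private void checkRange(int candidate) {
        if (candidate < min || candidate > max) {
            throw new IllegalArgumentException(
                    "Value " + candidate + " for '" + key + "' is out of range");
        }
    }

    /** Checks that the key is present and its value is valid; changes nothing. */
    void validateSettings(Map<String, Object> settings) {
        Object stored = settings.get(key);
        if (!(stored instanceof Integer)) {
            throw new IllegalArgumentException(
                    "Missing or non-integer value for '" + key + "'");
        }
        checkRange((Integer) stored);
    }

    /** Takes over the stored value; validates first to stay on the safe side. */
    void loadSettingsFrom(Map<String, Object> settings) {
        validateSettings(settings);
        value = (Integer) settings.get(key);
    }

    /** Writes the current value under the key. */
    void saveSettingsTo(Map<String, Object> settings) {
        settings.put(key, value);
    }

    int getIntValue() {
        return value;
    }

    public static void main(String[] args) {
        BoundedIntSettingSketch bins =
                new BoundedIntSettingSketch("numberOfbins", 5, 1, Integer.MAX_VALUE);
        Map<String, Object> settings = new HashMap<>();
        settings.put("numberOfbins", 10);   // what a dialog might have written
        bins.validateSettings(settings);    // passes: 10 lies within the bounds
        bins.loadSettingsFrom(settings);
        System.out.println(bins.getIntValue()); // prints 10
    }
}
```

The point of the separation is that validation never modifies the model, so invalid settings can be rejected without losing the values that are currently loaded.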
	

The methods described above are only one step in checking whether the node is executable with the current settings. It is also very important to check whether the node can work with the incoming data table. This is accomplished by the configure method, which is executed as soon as the inport has been connected. In the small example of our numeric binner, a check is performed to see if at least one numeric column is available and if the incoming data table contains a column with the selected column name. Otherwise the node is not executable. The DataTableSpec contains the required information and is passed to the configure method.

    /**
     * @see org.knime.core.node.NodeModel
     *      #configure(org.knime.core.data.DataTableSpec[])
     */
    @Override
    protected DataTableSpec[] configure(final DataTableSpec[] inSpecs)
            throws InvalidSettingsException {
        // first of all validate the incoming data table spec
        
        boolean hasNumericColumn = false;
        boolean containsName = false;
        for (int i = 0; i < inSpecs[IN_PORT].getNumColumns(); i++) {
            DataColumnSpec columnSpec = inSpecs[IN_PORT].getColumnSpec(i);
            // we can only work with it, if it contains at least one 
            // numeric column
            if (columnSpec.getType().isCompatible(DoubleValue.class)) {
                // found one numeric column
                hasNumericColumn = true;
            }
            // and if the column name is set it must be contained in the data 
            // table spec
            if (m_column != null 
                    && columnSpec.getName().equals(m_column.getStringValue())) {
                containsName = true;
            }
            
        }
        if (!hasNumericColumn) {
            throw new InvalidSettingsException("Input table must contain at " 
                    + "least one numeric column");
        }
        
        if (!containsName) {
            throw new InvalidSettingsException("Input table does not contain "
                    + "the column " + m_column.getStringValue()
                    + ". Please (re-)configure the node.");
        }
        
        
        // so far the input is checked and the algorithm can work with the 
        // incoming data
        ...
	

Just as we rely on the incoming specification of the data, the successor nodes also require information about the data format, which is provided after execution. For this reason, a specification for the output of our node must also be created in the configure method.

    ...
	// now produce the output table spec, 
    // i.e. specify the output of this node
    DataColumnSpec newColumnSpec = createOutputColumnSpec();
    // and the DataTableSpec for the appended part
    DataTableSpec appendedSpec = new DataTableSpec(newColumnSpec);
    // since it is only appended the new output spec contains both:
    // the original spec and the appended one
    DataTableSpec outputSpec = new DataTableSpec(inSpecs[IN_PORT],
            appendedSpec);
    return new DataTableSpec[]{outputSpec};
	...
	

Since a DataColumnSpec must be created for the newly appended column in both the configure and the execute method, the code that creates the DataColumnSpec is extracted into a separate method:

    private DataColumnSpec createOutputColumnSpec() {
        // we want to add a column with the number of the bin 
        DataColumnSpecCreator colSpecCreator = new DataColumnSpecCreator(
                "Bin Number", IntCell.TYPE);
        // if we know the number of bins we also know the number of possible
        // values of that new column
        DataColumnDomainCreator domainCreator = new DataColumnDomainCreator(
                new IntCell(0), new IntCell(m_numberOfBins.getIntValue() - 1));
        // and can add this domain information to the output spec
        colSpecCreator.setDomain(domainCreator.createDomain());
        // now the column spec can be created
        DataColumnSpec newColumnSpec = colSpecCreator.createSpec();
        return newColumnSpec;
    }
	

Once this has been completed and implemented, the actual algorithm for equidistant binning can be written. The algorithm operating on the data must be placed in the execute method. In this example only one column is appended to the original data. For this purpose the so-called ColumnRearranger is used. It requires a CellFactory, which returns the appended cells for a given row.

        ...        
	    // instantiate the cell factory
        CellFactory cellFactory = new NumericBinnerCellFactory(
               createOutputColumnSpec(), splitPoints, colIndex);
        // create the column rearranger
        ColumnRearranger outputTable = new ColumnRearranger(
                inData[IN_PORT].getDataTableSpec());
        // append the new column
        outputTable.append(cellFactory);
	    ...
	

Once the ColumnRearranger has been created, it is passed together with the input table to the ExecutionContext, which creates the BufferedDataTable that is returned by the execute method, i.e. provided at the outport. Each node buffers its data in a BufferedDataTable. The ColumnRearranger is used in order to avoid redundant buffering of the same data: in this way only the appended column is buffered in our node. That is why we have to retrieve the BufferedDataTable from the ExecutionContext:

	    ...
        // and create the actual output table
        BufferedDataTable bufferedOutput = exec.createColumnRearrangeTable(
                inData[IN_PORT], outputTable, exec);
        // return it
        return new BufferedDataTable[]{bufferedOutput};	
        ...
	

The CellFactory used above is a NumericBinnerCellFactory, which we have to implement ourselves. It extends the SingleCellFactory and only implements the getCell method. The method checks which bin contains the value of the passed row in the selected column and returns the number of that bin as a DataCell. If the value is missing or lies above the last upper bound, a missing cell is returned.

    /**
     * @see org.knime.core.data.container.SingleCellFactory#getCell(
     * org.knime.core.data.DataRow)
     */
    @Override
    public DataCell getCell(DataRow row) {
        DataCell currCell = row.getCell(m_colIndex);
		// check the cell for missing value
        if (currCell.isMissing()) {
            return DataType.getMissingCell();
        }
        double currValue = ((DoubleValue)currCell).getDoubleValue();
        int binNr = 0;
        for (Double intervalBound : m_intervalUpperBounds) {
            if (currValue <= intervalBound) {
                return new IntCell(binNr);
            }
            binNr++;
        }
        return DataType.getMissingCell();
    }
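
The fragments above use splitPoints and m_intervalUpperBounds without showing how they are computed. One possible way, sketched here in plain Java, is to derive the upper bound of each interval from the minimum and maximum of the selected column. The class UpperBoundLookupSketch and its methods are hypothetical, and taking the minimum and maximum as given is an assumption; the lookup loop mirrors the one in getCell, with -1 standing in for the missing cell.

```java
/** Hypothetical helper for illustration only; not part of the KNIME API. */
final class UpperBoundLookupSketch {

    /** Computes the upper bound of each of the numberOfBins equal-width bins. */
    static double[] upperBounds(double min, double max, int numberOfBins) {
        if (numberOfBins < 1) {
            throw new IllegalArgumentException("numberOfBins must be >= 1");
        }
        double width = (max - min) / numberOfBins;
        double[] bounds = new double[numberOfBins];
        for (int i = 0; i < numberOfBins; i++) {
            bounds[i] = min + (i + 1) * width;
        }
        // set the last bound explicitly so rounding cannot exclude the maximum
        bounds[numberOfBins - 1] = max;
        return bounds;
    }

    /** Returns the first bin whose upper bound is >= value, or -1 if none. */
    static int binOf(double value, double[] bounds) {
        for (int binNr = 0; binNr < bounds.length; binNr++) {
            if (value <= bounds[binNr]) {
                return binNr;
            }
        }
        return -1; // above the last bound; getCell returns a missing cell here
    }

    public static void main(String[] args) {
        double[] bounds = upperBounds(0.0, 10.0, 5); // 2, 4, 6, 8, 10
        System.out.println(binOf(2.0, bounds));  // prints 0
        System.out.println(binOf(2.1, bounds));  // prints 1
        System.out.println(binOf(10.5, bounds)); // prints -1
    }
}
```

As in getCell, values below the minimum fall into bin 0, because only upper bounds are compared.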
	

NodeDialog:

When the NumericBinnerNodeDialog is created, you will see that the constructor already contains some exemplary code. You may delete it and add the code for your desired control elements instead. For the NumericBinnerNodeDialog we need two GUI elements: one to set the number of bins and one to select the column for the binning.

The KNIME framework provides a very convenient way to add standard dialog elements to the NodeDialog: your NumericBinnerNodeDialog extends the DefaultNodeSettingsPane by default. If the default dialog components do not suit your needs, for example if some components should be enabled or disabled depending on the user's settings, you may extend the NodeDialogPane directly.

In our case a DialogComponentNumber for the number of bins and a DialogComponentColumnNameSelection for the column need to be added. Each component's constructor requires a new instance of a SettingsModel. The SettingsModel expects a string identifier, which it uses to store and load the value of the component, and a default value, which it holds until a new value is loaded. Additional parameters are necessary, depending on the type of component. Loading from and saving to the settings is done automatically via the key passed in the constructor. We recommend using the key defined in the NodeModel, which therefore has to be public.

public class NumericBinnerNodeDialog extends DefaultNodeSettingsPane {

    /**
     * New pane for configuring NumericBinner node dialog.
     * Contains control elements to adjust the number of bins 
     * and to select the column to bin.
     * Suppress warnings here: this is unavoidable since the
     * allowed types are passed as a generic array.
     */
    @SuppressWarnings("unchecked")
    protected NumericBinnerNodeDialog() {
        super();
        // nr of bins control element
        addDialogComponent(new DialogComponentNumber(
                new SettingsModelIntegerBounded(
                    NumericBinnerNodeModel.CFGKEY_NR_OF_BINS,
                    NumericBinnerNodeModel.DEFAULT_NR_OF_BINS,
                    1, Integer.MAX_VALUE),
                    "Number of bins:", /*step*/ 1));
        // column to bin
        addDialogComponent(new DialogComponentColumnNameSelection(
                new SettingsModelString(
                    NumericBinnerNodeModel.CFGKEY_COLUMN_NAME,
                    "Select a column"),
                    "Select the column to bin",
                    NumericBinnerNodeModel.IN_PORT,
                    DoubleValue.class));                    
    }
}

After you have created your node and implemented the NodeModel and the NodeDialog, don't forget to edit the node description in the XML file (which has exactly the same name as your NodeFactory). Describe your node, the dialog settings, the in- and outports and, later on, the view. This is explained in detail in Section 8.