Binner (Dictionary)
Categorizes values in a column according to a dictionary table
with min/max values. The table at the first input contains a
column with values to be categorized. The second table contains
a column with lower bound values, a column with upper bound
values and a column with label values. The latter will be used as
outcome in case a given value is between the corresponding lower
and upper bound. Each row in the second table represents a rule,
whereby the rules are evaluated top-down, i.e. rules with low
row index have higher priority than rules in the subsequent rows.
Either the lower or upper bound test can be disabled by unsetting
the corresponding checkbox in the dialog. A missing value in a
bound column always passes the corresponding bound check: a missing
lower bound is treated as smaller than any value, and a missing
upper bound as larger than any value. Missing values in the value
column (1st input) result in a missing output cell (no
categorization).
Note: The table containing bound and label information
(2nd input) will be read into memory during execution; it must be
a relatively small table!
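
The matching semantics described above can be illustrated with a minimal Python sketch. It is not the node's actual implementation: the function name `first_matching_label`, the use of `None` for missing cells, and the default settings of the two inclusive flags are assumptions made for illustration only. The sketch returns `None` when no rule matches and leaves the handling of that case aside.

```python
from typing import Optional, Sequence, Tuple

# A rule is (lower, upper, label); None stands in for a missing bound cell.
Rule = Tuple[Optional[float], Optional[float], str]


def first_matching_label(
    value: Optional[float],
    rules: Sequence[Rule],
    lower_inclusive: bool = True,   # assumed default, for illustration only
    upper_inclusive: bool = False,  # assumed default, for illustration only
) -> Optional[str]:
    """Return the label of the first rule (top-down) whose bounds contain value."""
    if value is None:
        return None  # missing input value -> missing output cell
    for lower, upper, label in rules:
        # A missing bound always passes its check.
        lower_ok = lower is None or (
            lower <= value if lower_inclusive else lower < value
        )
        upper_ok = upper is None or (
            value <= upper if upper_inclusive else value < upper
        )
        if lower_ok and upper_ok:
            return label  # earlier rows take priority over later ones
    return None  # no rule matched


# Hypothetical rule table; the second and third rules overlap on [5, 10).
rules = [
    (None, 0.0, "negative"),
    (0.0, 10.0, "low"),
    (5.0, None, "high"),
]
```

With these hypothetical rules, the value 7 satisfies both the second and the third rule; because rules are evaluated top-down, the label of the second rule is returned.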
Dialog Options
- Value Column to bin (1st port)
Select the column in the first input table containing the values
to be categorized.
- Lower Bound Column (2nd port)
If enabled, select the column containing the lower bound values.
Use the "Inclusive" checkbox to choose whether a value must be strictly
larger than the lower bound or may also be equal to it (if checked,
values equal to the lower bound match as well).
- Upper Bound Column (2nd port)
If enabled, select the column containing the upper bound values.
Use the "Inclusive" checkbox to choose whether a value must be strictly
smaller than the upper bound or may also be equal to it (if checked,
values equal to the upper bound match as well).
- Label Column (2nd port)
Select the label column from the 2nd input that will be appended as
category column to the output table.
- If no rule matches
Choose the behavior if none of the rules in the 2nd input fires for
a given value: Select "Fail" to make this node fail during execution
(reasonable when the rule table is assumed to cover the entire domain)
or "Insert Missing" to insert missing values as result.
- Search Pattern
Linear Search scans all rules sequentially in the order they are defined
in the rule table and returns the label of the first rule that matches.
Binary Search only works if both limits are specified (and not missing);
it sorts the rules by their lower and upper limits and performs a
binary search to find the matching label. If rules overlap, the result of
binary search might not be deterministic. Binary search is much faster than
linear search for large rule sets (tens of thousands or millions of rules).
If in doubt, use linear search.
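
As a rough illustration of the binary search option, the following Python sketch sorts a rule table once and then locates the candidate rule with the standard-library `bisect` module. It is a simplified sketch, not the node's implementation: the class name `SortedRuleTable` is hypothetical, the lower bound is fixed as inclusive and the upper bound as exclusive, and the rules are assumed not to overlap and to have no missing bounds.

```python
import bisect
from typing import List, Optional, Sequence, Tuple

# (lower, upper, label); both bounds must be present for binary search.
Rule = Tuple[float, float, str]


class SortedRuleTable:
    """Hypothetical helper: sorts rules once, then answers lookups in O(log n)."""

    def __init__(self, rules: Sequence[Rule]) -> None:
        # Sort by lower bound, then upper bound.
        self._rules: List[Rule] = sorted(rules, key=lambda r: (r[0], r[1]))
        self._lowers: List[float] = [r[0] for r in self._rules]

    def lookup(self, value: float) -> Optional[str]:
        # Index of the last rule whose lower bound is <= value.
        i = bisect.bisect_right(self._lowers, value) - 1
        if i < 0:
            return None  # value lies below every lower bound
        _lower, upper, label = self._rules[i]
        # Lower bound inclusive, upper bound exclusive (an assumption of this sketch).
        return label if value < upper else None


# Hypothetical, non-overlapping rules given in arbitrary order.
table = SortedRuleTable([(10.0, 20.0, "b"), (0.0, 10.0, "a"), (30.0, 40.0, "c")])
```

Because only the single candidate rule found by the bisection is tested, overlapping rules would not be resolved by row order here, which is one way to see why the result can differ from linear search when rules overlap.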
Ports
Input Ports
0 | Arbitrary input data with column to be binned.
1 | Table containing categorization rules with lower and upper bound and the label column.
Output Ports
0 | Input table amended by column with categorization values.
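
To illustrate the shape of the output and the two "If no rule matches" behaviors, here is a small Python sketch that appends a category column to a list of rows. The function `append_category`, the column name `category`, the policy strings, and the stand-in `classify` function are all hypothetical and only serve the example; they are not part of the node.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


def append_category(
    rows: List[Row],
    value_column: str,
    classify: Callable[[float], Optional[str]],
    on_no_match: str = "insert_missing",  # or "fail"
    category_column: str = "category",    # hypothetical output column name
) -> List[Row]:
    """Return copies of rows with an extra category column appended."""
    out: List[Row] = []
    for row in rows:
        value = row.get(value_column)
        if value is None:
            label = None  # missing value -> missing category, no rule lookup
        else:
            label = classify(value)
            if label is None and on_no_match == "fail":
                raise ValueError(f"No rule matches value {value!r}")
        out.append({**row, category_column: label})
    return out


def classify(value: float) -> Optional[str]:
    """Stand-in for a rule lookup: one rule covering [0, 10)."""
    return "small" if 0 <= value < 10 else None


rows = [{"id": 1, "x": 3.0}, {"id": 2, "x": None}, {"id": 3, "x": 99.0}]
binned = append_category(rows, "x", classify)
```

In this sketch the third row matches no rule, so it receives a missing category under the "insert_missing" policy, while the "fail" policy would raise an error for the same input.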
This node is contained in KNIME Base Nodes
provided by KNIME GmbH, Konstanz, Germany.