Table Validator

This node ensures a certain table structure and table content. The basis for a configuration is a reference specification, which must be connected to the input port during configuration and provides the template for the output table. The node ensures that the structure of the result table is mostly identical to the reference specification. It does so by re-sorting columns, inserting missing columns (filled with missing values) and optionally removing additional columns.

You can also choose for each column (or a group of columns) whether it is required and whether the data type or the domain should be checked/converted. To use this individual handling, select a column or a list of columns to be handled, drag them onto the "+" button that appears, and set the parameters. To remove the individual handling (and use the default handling instead), click the "Remove" button for this column.

If the validation succeeds, the data is output at the first port (potentially renamed, sorted according to the reference specification and with converted types). If the validation fails, then depending on the configured behavior either the node fails, or the first port is inactive and the second port contains a table that lists all conflicts. All options below that are marked with Data also force a traversal of the input data.
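The structural alignment described above can be illustrated with a minimal Python sketch. It is not the node's implementation: the function name `align_to_reference`, the dict-of-lists table model and the use of `None` for missing values are assumptions made for illustration only.

```python
# Illustrative sketch only; a table is modelled as a dict mapping
# column name -> list of cell values (an assumption, not the KNIME data model).
MISSING = None  # stand-in for a missing value


def align_to_reference(table, reference_columns, n_rows):
    """Return a new table whose columns follow the reference order."""
    aligned = {}
    # Columns named in the reference come first, in reference order.
    for name in reference_columns:
        if name in table:
            aligned[name] = list(table[name])
        else:
            # Columns absent from the input are inserted, filled with missing values.
            aligned[name] = [MISSING] * n_rows
    # Additional columns are kept here and follow the reference columns.
    for name, values in table.items():
        if name not in aligned:
            aligned[name] = list(values)
    return aligned


table = {"b": [1, 2], "x": ["p", "q"], "a": [3.0, 4.0]}
result = align_to_reference(table, ["a", "b", "c"], n_rows=2)
print(list(result))
```

In this sketch the reference columns "a", "b" and "c" lead the output, "c" is filled with missing values, and the additional column "x" follows at the end.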

Dialog Options

General settings
Behavior on validation issues
Defines how validation faults should influence the following workflow.
  • Fail node - Forces the node to fail; the exception carries an appropriate message containing detailed descriptions of the validation faults. The traversal of the data is skipped if the structural comparison has already failed.
  • Deactivate first output port - The node never fails, but the first output port is set inactive. Validation results are presented at the second output port as a data table that contains the Column name, an Error ID (one of: COLUMN_NOT_CONTAINED, CONTAINS_MISSING_VALUE, INVALID_DATATYPE, CONVERTION_FAILED, OUT_OF_DOMAIN) and a human-readable Description for each validation fault. The data is traversed completely, independent of potential structural differences. This option is useful if a complete validation of the input data is desired, for example to avoid trial-and-error passes when the workflow is used within the WebPortal.
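The difference between the two behaviors can be sketched as follows. This is a simplified model, not KNIME code: `Fault`, `ValidationError` and `finish` are hypothetical names, an inactive port is modelled as `None`, and treating the second port as inactive on success is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Fault:
    column: str
    error_id: str      # e.g. "COLUMN_NOT_CONTAINED"
    description: str


class ValidationError(Exception):
    """Hypothetical exception standing in for a failing node."""


def finish(table, faults, fail_node):
    """Return (first_port, second_port); None models an inactive port."""
    if not faults:
        # Assumption: on success only the first port carries data.
        return table, None
    if fail_node:
        # "Fail node": raise with a message describing every fault.
        message = "; ".join(f"{f.column}: {f.description}" for f in faults)
        raise ValidationError(message)
    # "Deactivate first output port": report the faults as rows instead of failing.
    rows = [(f.column, f.error_id, f.description) for f in faults]
    return None, rows


faults = [Fault("age", "CONTAINS_MISSING_VALUE", "missing value found")]
first_port, second_port = finish({"age": [None]}, faults, fail_node=False)
```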
Handling of unknown columns
Defines how columns that are not included in the reference table spec are handled.
  • Don't allow unknown columns - Unknown columns will cause a validation issue.
  • Remove unknown columns - Unknown columns will be removed.
  • Sort them to the end - Unknown columns will be shifted to the end of the table.
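The three options for unknown columns can be compared in a short sketch. The function `handle_unknown` and the policy labels "reject", "remove" and "sort_to_end" are hypothetical names chosen for this example; they are not identifiers from the node.

```python
def handle_unknown(input_columns, reference_columns, policy):
    """Return (output column order, unknown columns reported as issues).

    policy is one of the illustrative labels "reject", "remove", "sort_to_end".
    """
    known = [c for c in reference_columns if c in input_columns]
    unknown = [c for c in input_columns if c not in reference_columns]
    if policy == "reject":
        # Don't allow unknown columns: each one becomes a validation issue.
        return known, unknown
    if policy == "remove":
        # Remove unknown columns: they are dropped without an issue.
        return known, []
    if policy == "sort_to_end":
        # Sort them to the end: they are kept after the reference columns.
        return known + unknown, []
    raise ValueError(f"unknown policy: {policy}")


columns, issues = handle_unknown(["x", "b", "a"], ["a", "b"], "sort_to_end")
```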
Validation Settings
Fail if column is missing (Structure)
Ensures that the configured columns exist in the input table. If case-insensitive name matching is selected, the first matching column satisfies this condition.
Case insensitive name matching (Structure)
Columns whose names differ only in case are also validated according to this configuration. Use this option with care, because the assignment of a column to a configuration is computed at runtime and is not trivial. The rules are as follows.
  1. Exact name match - Assigns the configuration with the exact name. The name is marked as used and cannot match any following input columns again.
  2. First matching configuration - Assigns the first configuration with a matching name to the column. The name is marked as used and cannot match any following input columns again.
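One possible reading of the two rules above is sketched below. The two-pass order (all exact matches first, then case-insensitive matches) is an assumption of this sketch, and `assign_configurations` is a hypothetical helper, not part of the node.

```python
def assign_configurations(input_columns, configured_names):
    """Map input column -> configured name under one reading of the rules."""
    assignment = {}
    used = set()
    # Rule 1: exact name matches are assigned first and mark the name as used.
    for col in input_columns:
        if col in configured_names and col not in used:
            assignment[col] = col
            used.add(col)
    # Rule 2: each remaining column takes the first unused configuration
    # whose name matches when case is ignored.
    for col in input_columns:
        if col in assignment:
            continue
        for name in configured_names:
            if name not in used and name.lower() == col.lower():
                assignment[col] = name
                used.add(name)
                break
    return assignment


mapping = assign_configurations(["ID", "Name", "name"], ["id", "name"])
print(mapping)
```

In this example the input column "name" claims the configuration "name" by exact match, "ID" is assigned to "id" by case-insensitive match, and "Name" remains unassigned because its only candidate is already used.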
Fail on missing value (Data)
Fails if the column contains any missing value.
Check data type (Structure|Data)
Ensures a correct data type.
  • Fail if different - Fails if the reference data type is not a super type of the input column's data type. That is, it checks that the input column implements all DataValue classes that are also implemented by the reference column's data type.
  • Try to convert; fail if not compatible
  • Try to convert; insert missing if not compatible
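The three data type options can be contrasted in a simplified sketch. It checks Python types per cell, whereas the node compares column data types; the function `check_type` and the mode labels are assumptions made for illustration, and `None` stands in for a missing value.

```python
def check_type(values, target, mode):
    """Return (resulting values, row indices reported as faults).

    mode is one of the illustrative labels
    "fail_if_different", "convert_or_fail", "convert_or_missing".
    """
    out, failed = [], []
    for i, value in enumerate(values):
        if value is None or isinstance(value, target):
            out.append(value)  # already compatible (or missing)
            continue
        if mode == "fail_if_different":
            failed.append(i)   # no conversion is attempted
            out.append(value)
            continue
        try:
            out.append(target(value))  # try to convert
        except (TypeError, ValueError):
            if mode == "convert_or_fail":
                failed.append(i)
                out.append(value)
            else:
                out.append(None)  # insert a missing value instead
    return out, failed


converted, faults = check_type(["1", "x", 3], int, "convert_or_missing")
```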
Check possible values (Data)
Checks if each data object is contained in the possible values of the reference domain. The option is only enabled if any configured column defines possible values.
  • Fail if out of domain
  • Replace with missing values
Check min & max (Data)
Checks if each data object is between min and max defined by the domain of the reference specification. The option is only enabled if any configured column defines possible values.
  • Fail if out of domain
  • Replace with missing values
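Both domain checks (possible values and min & max) follow the same pattern, which the sketch below illustrates. The function `check_domain` is hypothetical; treating the bounds as inclusive and letting missing values (`None`) pass the check are assumptions of this sketch.

```python
def check_domain(values, allowed=None, lower=None, upper=None, replace=False):
    """Return (resulting values, row indices that are out of domain)."""
    out, faults = [], []
    for i, value in enumerate(values):
        in_domain = True
        if value is not None:  # assumption: missing values are not checked
            if allowed is not None and value not in allowed:
                in_domain = False
            if lower is not None and value < lower:
                in_domain = False
            if upper is not None and value > upper:
                in_domain = False
        if in_domain:
            out.append(value)
        elif replace:
            out.append(None)   # "Replace with missing values"
        else:
            out.append(value)  # "Fail if out of domain": record the fault
            faults.append(i)
    return out, faults


checked, out_of_domain = check_domain([1, 5, 12], lower=0, upper=10)
```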
Set input table as reference
Sets the input table specification as reference specification.
Reference Spec
Reference Spec
The reference specification.
Input Spec
The input specification. Only visible if it differs from the reference specification.

Ports

Input Ports
0 Table to be validated.
Output Ports
0 Table with corrected and validated structure. Depending on the validation result and the Behavior on validation issues setting, this port may be inactive.
1 Table that lists the validation faults (Column name, Error ID and Description). Depending on the validation result and the Behavior on validation issues setting, this port may be inactive.
This node is contained in KNIME Base Nodes provided by KNIME GmbH, Konstanz, Germany.