Logistic Regression Learner

Performs a multinomial logistic regression. In the dialog, select a target column (the combo box at the top), i.e. the response. The two lists in the center of the dialog let you restrict the model to certain columns, which serve as the (independent) variables. Make sure the columns you want to include are in the right-hand "include" list. See the Wikipedia article on logistic regression for an overview of the topic. This implementation computes the model with an iterative optimization procedure known as Fisher's scoring.
If the optional PMML input port is connected and contains preprocessing operations in the TransformationDictionary, those are added to the learned model.
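
To illustrate the Fisher's scoring procedure mentioned above, here is a minimal sketch in plain Python. It is not the node's implementation: it covers only the binary case (the node fits a multinomial model), and the function names, the iteration limit and the tolerance are assumptions chosen for the example. Each iteration solves the linear system formed by the Fisher information matrix and the score vector, then adds the resulting step to the coefficients.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fisher_scoring(rows, labels, max_iter=25, tol=1e-8):
    """Fit a binary logistic regression (hypothetical helper).

    rows: numeric feature lists without an intercept; labels: 0 or 1.
    Returns the coefficients with the intercept first.
    """
    xs = [[1.0] + list(r) for r in rows]  # prepend the intercept term
    k = len(xs[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        score = [0.0] * k                   # gradient of the log-likelihood
        info = [[0.0] * k for _ in range(k)]  # Fisher information matrix
        for x, y in zip(xs, labels):
            eta = sum(b * v for b, v in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            for i in range(k):
                score[i] += (y - p) * x[i]
                for j in range(k):
                    info[i][j] += w * x[i] * x[j]
        step = solve(info, score)           # scoring step: info^-1 * score
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta
```

If a column is constant or two columns are perfectly correlated, the information matrix in this sketch becomes singular and the solve step fails with a division by zero, which mirrors why such data can cause errors during model computation.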

Potential Errors and Error Handling

The computation of the model is an iterative optimization process that requires certain properties of the data set: a reasonable distribution of the target values and non-constant, uncorrelated columns. While some of these properties are checked during node execution, you may still run into errors during the computation. The list below gives some ideas of what might go wrong and how to avoid such situations.
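
As a rough illustration of the data properties named above, the following Python sketch flags constant columns and strongly correlated column pairs before training. It does not reproduce the checks the node performs; the helper names and the 0.95 threshold are assumptions made for this example.

```python
import math

def constant_columns(columns):
    """Return the names of columns whose values never change."""
    return [name for name, values in columns.items() if len(set(values)) <= 1]

def pearson(a, b):
    """Pearson correlation of two equally long, non-constant numeric lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def correlated_pairs(columns, threshold=0.95):
    """Return column pairs whose absolute correlation reaches the threshold.

    The threshold is an arbitrary example value, not a setting of the node.
    Constant columns are skipped because their correlation is undefined.
    """
    skip = set(constant_columns(columns))
    names = [n for n in columns if n not in skip]
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if abs(pearson(columns[first], columns[second])) >= threshold:
                pairs.append((first, second))
    return pairs
```

Columns reported by either helper are candidates for removal before the learner is executed.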

Dialog Options

Target
Select the target column. Only columns with nominal data are allowed. The reference category is empty if the domain of the target column is not available. In this case, the node determines the domain values right before computing the logistic regression model and chooses the last domain value as the target's reference category.
By default, the target domain values are sorted lexicographically in the output, but you can enforce that the order of the target column domain is preserved by checking the box.
Note that if a target reference category is selected in the dropdown, the checkbox has no influence on the coefficients of the model; only the output representation (e.g. the order of rows in the coefficient table) may vary.
Values
Specify the independent columns that should be included in the regression model. Numeric and nominal data can be included.
By default, the domain values (categories) of nominal-valued columns are sorted lexicographically, but you can check the box to use the order from the column domain instead. Please note that the first category is used as the reference when creating the dummy variables.
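
The effect of the category order on the dummy variables can be sketched in Python as follows. The function name and its parameters are hypothetical and serve only to illustrate the rule stated above: the first category becomes the reference and receives no dummy column.

```python
def dummy_encode(values, domain=None):
    """Create dummy variables for one nominal column (hypothetical helper).

    If domain is given, its order is used; otherwise the categories are
    sorted lexicographically. Returns the categories that received a dummy
    column and one 0/1 row per input value.
    """
    categories = list(domain) if domain is not None else sorted(set(values))
    others = categories[1:]  # categories[0] is the reference and gets no column
    rows = [[1 if value == category else 0 for category in others]
            for value in values]
    return others, rows
```

For the values red, blue, green, the lexicographic order makes blue the reference, while passing the domain order red, green, blue makes red the reference. The fitted model is equivalent in both cases, but the coefficients are expressed relative to a different category.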

Ports

Input Ports
0 Table on which to perform the regression. The input must not contain missing values; handle them beforehand, e.g. with the Missing Values node.
1 Optional PMML port object containing preprocessing operations.
Output Ports
0 Model to connect to a predictor node.
1 Coefficients and statistics of the logistic regression model.

Views

Logistic Regression Result View
Displays the estimated coefficients and error statistics. Note that the estimated coefficients are not reliable when the standard error is high.
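
One common way to judge a coefficient against its standard error is the Wald z-score with a two-sided p-value. The Python sketch below assumes this standard statistic; whether the view reports exactly these quantities is an assumption, and the function name is hypothetical.

```python
import math

def wald_test(coefficient, std_error):
    """Return the Wald z-score and its two-sided p-value under a standard normal."""
    z = coefficient / std_error
    # 2 * (1 - Phi(|z|)) expressed through the complementary error function
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value
```

For the same coefficient, a larger standard error gives a smaller absolute z-score and a larger p-value, so the estimate is harder to distinguish from zero.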
This node is contained in KNIME Base Nodes provided by KNIME GmbH, Konstanz, Germany.