Target Shuffling
This node performs target shuffling by randomly permuting the values
in one column of the input table. This breaks any connection
between the input variables (learning columns) and the response variable
(target column) while retaining the overall distribution of the
target variable. Target shuffling is used to estimate the baseline
performance of a predictive model. The quality of
a model (accuracy, area under the curve, R², ...) is expected to decrease
drastically when the target values are shuffled, because any relationship
between input and target has been removed.
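To make the effect on the table concrete, here is a minimal Python sketch of the idea, not the node's actual implementation. The function name `shuffle_column`, the list-of-dicts table layout, and the example column names are all assumptions made for illustration.

```python
import random
from collections import Counter


def shuffle_column(rows, column, seed=None):
    """Return a copy of `rows` with the values of `column` randomly permuted.

    Hypothetical helper: all other columns keep their original row order.
    """
    rng = random.Random(seed)
    values = [row[column] for row in rows]
    rng.shuffle(values)  # permute only the target values
    return [{**row, column: value} for row, value in zip(rows, values)]


# Illustrative table: two learning columns and one target column.
table = [
    {"age": 25, "income": 30, "churn": "no"},
    {"age": 41, "income": 52, "churn": "yes"},
    {"age": 33, "income": 47, "churn": "no"},
    {"age": 58, "income": 61, "churn": "yes"},
    {"age": 29, "income": 35, "churn": "no"},
]

shuffled = shuffle_column(table, "churn", seed=1)

# The target's overall distribution is retained; only the row pairing changes.
original_counts = Counter(row["churn"] for row in table)
shuffled_counts = Counter(row["churn"] for row in shuffled)
```

Because the values are only permuted, `original_counts` and `shuffled_counts` are identical, while the pairing of each target value with its learning columns is randomized.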
It is advisable to repeat this process (target shuffling + model
building + model evaluation) many times and record the result obtained
on the shuffled data each time, in order to get a good estimate of how well
the real model performs in comparison to models built on randomized data.
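The repeated procedure described above can be sketched in plain Python as follows. This is an illustration under stated assumptions: the "model" is a one-variable least-squares fit scored by R², the data is synthetic, and the names `r_squared` and `target_shuffling_test` are hypothetical. In a workflow, the model and the quality measure would be whatever learner and scorer are actually used.

```python
import random


def r_squared(x, y):
    """R² of a one-variable least-squares fit of y on x (stand-in model)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def target_shuffling_test(x, y, n_rounds=200, seed=0):
    """Score the real data, then score n_rounds copies with shuffled targets."""
    rng = random.Random(seed)
    real_score = r_squared(x, y)
    shuffled_scores = []
    for _ in range(n_rounds):
        y_shuffled = y[:]          # copy so the real target stays intact
        rng.shuffle(y_shuffled)    # target shuffling
        shuffled_scores.append(r_squared(x, y_shuffled))  # build + evaluate
    # Share of shuffled runs that did at least as well as the real model
    # (the +1 terms count the real run itself, a common convention).
    at_least_as_good = sum(s >= real_score for s in shuffled_scores)
    p_value = (at_least_as_good + 1) / (n_rounds + 1)
    return real_score, shuffled_scores, p_value


# Synthetic data with a genuine linear relationship plus noise.
data_rng = random.Random(42)
x = [float(i) for i in range(50)]
y = [2.0 * xi + data_rng.gauss(0, 5) for xi in x]

real_score, shuffled_scores, p_value = target_shuffling_test(x, y)
```

Comparing `real_score` with the distribution in `shuffled_scores` shows how far the real model stands above the baseline; a small `p_value` means that shuffled targets rarely matched the real model's quality.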
Target shuffling is sometimes called a randomization test or
y-scrambling. For more information, see also
Handbook of Statistical Analysis and Data Mining Applications
by Gary Miner, Robert Nisbet, and John Elder IV.
Dialog Options
- Column name
- Name of the column to shuffle
- Seed
- Enter a seed for the random number generator. Entering a seed
will cause the node to always shuffle the same input data the
same way (e.g. if you reset and execute the node). Disable
this option to use a different seed on every execution, so that
each run produces a different shuffle.
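The effect of the Seed option can be illustrated with Python's standard random number generator. This is only an analogy for the behaviour described above, not the node's own generator; the function name `permute` and the seed value are assumptions.

```python
import random


def permute(values, seed=None):
    """Return a shuffled copy of `values`.

    With a fixed seed the permutation is reproducible; with seed=None
    Python seeds the generator from the operating system or clock.
    """
    rng = random.Random(seed)
    result = list(values)
    rng.shuffle(result)
    return result


values = list(range(10))

# Same seed, same input: the same permutation on every run.
first = permute(values, seed=12345)
second = permute(values, seed=12345)

# No seed: the permutation will generally differ between runs.
unseeded = permute(values)
```

Here `first` and `second` are identical, which mirrors resetting and re-executing the node with a fixed seed, whereas `unseeded` corresponds to disabling the option.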
Ports
Output Ports
0 | Input table with values shuffled in one column
This node is contained in KNIME Base Nodes, provided by KNIME GmbH, Konstanz, Germany.