mlchem.ml.preprocessing.undersampling.undersample
- undersample(train_set: DataFrame, test_set: DataFrame, class_column: str, desired_proportion_majority: float, add_dropped_to_test: bool = False, random_seed: int | None = 1) → tuple[DataFrame, DataFrame]
Undersample the majority class in a training set to achieve a desired class balance.
- Parameters:
train_set (pandas.DataFrame) – The training dataset.
test_set (pandas.DataFrame) – The test dataset.
class_column (str) – Name of the column containing class labels.
desired_proportion_majority (float) – Target proportion of majority-class rows in the training set after undersampling (for example, 0.5 for an even split in a two-class problem).
add_dropped_to_test (bool, default=False) – Whether to append the rows dropped from the training set to the test set.
random_seed (int or None, default=1) – Random seed for reproducibility.
- Returns:
The undersampled training set and the updated test set.
- Return type:
tuple of pandas.DataFrame
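- Example:
The documented behaviour can be approximated with plain pandas. The sketch below is not mlchem's implementation: undersample_sketch is a hypothetical stand-in that mirrors the documented parameters. It assumes that the majority class is the most frequent label in class_column, that desired_proportion_majority lies strictly between 0 and 1, and that the training index is unique.

```python
import pandas as pd


def undersample_sketch(train_set, test_set, class_column,
                       desired_proportion_majority,
                       add_dropped_to_test=False, random_seed=1):
    """Hypothetical stand-in for undersample; not the library's code."""
    p = desired_proportion_majority
    if not 0 < p < 1:
        raise ValueError("desired_proportion_majority must be in (0, 1)")

    # Assumption: the majority class is the most frequent label.
    counts = train_set[class_column].value_counts()
    majority_label = counts.idxmax()
    n_majority = int(counts.max())
    n_other = len(train_set) - n_majority

    # Solve n_keep / (n_keep + n_other) = p for n_keep,
    # never keeping more majority rows than actually exist.
    n_keep = min(n_majority, round(p * n_other / (1 - p)))

    # Randomly pick the majority rows to drop (assumes a unique index).
    majority_rows = train_set[train_set[class_column] == majority_label]
    dropped = majority_rows.sample(n=n_majority - n_keep,
                                   random_state=random_seed)

    new_train = train_set.drop(index=dropped.index)
    new_test = pd.concat([test_set, dropped]) if add_dropped_to_test else test_set
    return new_train, new_test


# Toy data: 90 majority rows (label 0) and 10 minority rows (label 1).
train = pd.DataFrame({"x": range(100), "label": [0] * 90 + [1] * 10})
test = pd.DataFrame({"x": range(100, 110), "label": [0] * 5 + [1] * 5},
                    index=range(100, 110))

new_train, new_test = undersample_sketch(
    train, test, "label", 0.5, add_dropped_to_test=True
)
```

With these toy inputs the sketch keeps 10 rows of each class in new_train and moves the 80 dropped majority rows into new_test. Whether the library rounds the target count the same way is not stated in this reference, so exact row counts from undersample itself may differ.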