mlchem.ml.preprocessing.undersampling.undersample

undersample(train_set: DataFrame, test_set: DataFrame, class_column: str, desired_proportion_majority: float, add_dropped_to_test: bool = False, random_seed: int | None = 1) → tuple[DataFrame, DataFrame]

Undersample the majority class in a training set to achieve a desired class balance.

Parameters:

train_set (pandas.DataFrame) – The training dataset.
test_set (pandas.DataFrame) – The test dataset.
class_column (str) – Name of the column containing class labels.
desired_proportion_majority (float) – Target proportion of the majority class in the training set after undersampling.
add_dropped_to_test (bool, default=False) – If True, the samples dropped from the training set are added to the test set.
random_seed (int or None, default=1) – Random seed for reproducibility.

Returns:

The undersampled training set and the updated test set.

Return type:

tuple of pandas.DataFrame
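
Example:

The following is a minimal sketch of the behaviour described above, written with plain pandas. It is not mlchem's implementation: the function name undersample_sketch, the rounding rule, and the requirement that the index be unique are all assumptions made for illustration. It also assumes desired_proportion_majority is a fraction strictly between 0 and 1.

```python
import pandas as pd


def undersample_sketch(train_set, test_set, class_column,
                       desired_proportion_majority,
                       add_dropped_to_test=False, random_seed=1):
    """Hypothetical re-implementation for illustration; not mlchem's code."""
    counts = train_set[class_column].value_counts()
    majority_label = counts.idxmax()
    n_majority = int(counts[majority_label])
    n_other = len(train_set) - n_majority

    # Solve n_keep / (n_keep + n_other) = p for n_keep (assumes 0 < p < 1).
    p = desired_proportion_majority
    n_keep = min(int(round(p * n_other / (1 - p))), n_majority)

    majority_rows = train_set[train_set[class_column] == majority_label]
    kept = majority_rows.sample(n=n_keep, random_state=random_seed)
    # Dropping by label assumes train_set has a unique index.
    dropped = majority_rows.drop(kept.index)

    new_train = train_set.drop(dropped.index)
    if add_dropped_to_test:
        new_test = pd.concat([test_set, dropped])
    else:
        new_test = test_set
    return new_train, new_test


# Toy data: 8 majority-class rows (label 0) and 2 minority-class rows (label 1).
train = pd.DataFrame({"x": range(10), "label": [0] * 8 + [1] * 2})
test = pd.DataFrame({"x": [100, 101], "label": [0, 1]})

new_train, new_test = undersample_sketch(
    train, test, "label", 0.5, add_dropped_to_test=True
)
print(len(new_train), len(new_test))  # 4 8
```

With a target proportion of 0.5, two of the eight majority rows are kept so that both classes have two rows each, and the six dropped rows are appended to the test set. Setting random_seed to None in a function like this would make the selection differ from run to run.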
