mlchem.ml.feature_selection.filters.diversity_filter
- diversity_filter(df: DataFrame, threshold: float, target_variable: str = None) → DataFrame
Filter features based on diversity ratio using Shannon entropy.
Calculates the diversity ratio of each feature by comparing its Shannon entropy to that of an ideal uniform distribution. Retains features with diversity ratios above the specified threshold.
- Parameters:
df (pandas.DataFrame) – The input dataset.
threshold (float) – The minimum diversity ratio required to retain a feature.
target_variable (str, optional) – The name of the target variable, which is retained regardless of its diversity ratio.
- Returns:
A DataFrame containing only the columns whose diversity ratio is higher than the threshold, plus the target variable if one was given.
- Return type:
pandas.DataFrame
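The entropy-based filtering described above can be illustrated with a minimal sketch. This is not the mlchem implementation: the helper names `diversity_ratio` and `diversity_filter_sketch` are hypothetical, and the sketch assumes that the "ideal uniform distribution" is one in which every non-missing row holds a distinct value, so the ratio is H / log2(n) for n non-missing rows. The library may normalise differently.

```python
import numpy as np
import pandas as pd


def diversity_ratio(series: pd.Series) -> float:
    """Shannon entropy of a column divided by the entropy of a uniform
    distribution over its non-missing rows (an assumed definition)."""
    counts = series.value_counts(dropna=True).to_numpy(dtype=float)
    if counts.size <= 1:
        # Zero or one distinct value carries no information.
        return 0.0
    n = counts.sum()
    p = counts / n
    entropy = -np.sum(p * np.log2(p))
    max_entropy = np.log2(n)  # reached when every row is distinct
    return float(entropy / max_entropy)


def diversity_filter_sketch(df, threshold, target_variable=None):
    """Keep columns whose ratio exceeds the threshold; always keep the target."""
    keep = [
        col
        for col in df.columns
        if col == target_variable or diversity_ratio(df[col]) > threshold
    ]
    return df[keep]


df = pd.DataFrame(
    {
        "constant": [1, 1, 1, 1],         # ratio 0.0, dropped
        "binary": [0, 0, 1, 1],           # ratio 0.5, kept
        "unique": [0.1, 0.2, 0.3, 0.4],   # ratio 1.0, kept
        "y": [0, 0, 0, 0],                # target, kept regardless
    }
)
filtered = diversity_filter_sketch(df, threshold=0.4, target_variable="y")
print(list(filtered.columns))  # ['binary', 'unique', 'y']
```

Under this assumed normalisation, a constant column scores 0 and a column with all-distinct values scores 1, so the threshold can be read as a fraction of the maximum attainable entropy.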