mlchem.ml.feature_selection.filters.collinearity_filter
- collinearity_filter(df: DataFrame, threshold: float, target_variable: str = None, method: Literal['pearson', 'kendall', 'spearman'] = 'pearson', numeric_only: bool = False) → DataFrame
Filter features based on collinearity threshold.
Returns the subset of DataFrame columns whose pairwise squared correlations (R²) are below the specified threshold. If a target variable is provided, then among features that are collinear with one another, the function retains the one with the higher correlation to the target.
- Parameters:
df (pandas.DataFrame) – The input dataset.
threshold (float) – The maximum allowed squared correlation between features.
target_variable (str, optional) – The name of the target variable. If provided, it is used to resolve collinearity conflicts.
method ({'pearson', 'kendall', 'spearman'}, optional) – The correlation method to use. Default is ‘pearson’.
numeric_only (bool, optional) – Whether to include only numeric columns when computing correlations. Default is False.
- Returns:
A DataFrame containing the filtered columns.
- Return type:
pandas.DataFrame
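
The behaviour described above can be illustrated with a minimal sketch in plain pandas. This is not the mlchem implementation: the function name collinearity_filter_sketch, the greedy selection order, the strict "below threshold" comparison, and the choice to leave the target column out of the result are all assumptions made for illustration. It also assumes a pandas version in which DataFrame.corr accepts numeric_only.

```python
import pandas as pd


def collinearity_filter_sketch(df, threshold, target_variable=None,
                               method="pearson", numeric_only=False):
    """Illustrative greedy collinearity filter (hypothetical, not mlchem's code)."""
    # Squared pairwise correlations (R²) between all columns.
    r2 = df.corr(method=method, numeric_only=numeric_only) ** 2

    # Candidate features: every correlated column except the target, if given.
    features = [c for c in r2.columns if c != target_variable]

    # Assumption: when a target is given, visit features in decreasing order
    # of R² with the target, so the stronger predictor wins a conflict.
    if target_variable is not None:
        features.sort(key=lambda c: r2.loc[c, target_variable], reverse=True)

    kept = []
    for col in features:
        # Keep a feature only if its R² with every kept feature is below threshold.
        if all(r2.loc[col, k] < threshold for k in kept):
            kept.append(col)
    return df[kept]


# Toy data: "a" and "b" are nearly collinear; "y" plays the role of the target.
data = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [1.0, 2.0, 3.0, 4.0, 6.0],
    "c": [2.0, 1.0, 4.0, 3.0, 5.5],
    "y": [1.1, 2.0, 3.2, 4.1, 6.1],
})

filtered = collinearity_filter_sketch(data, threshold=0.9, target_variable="y")
no_target = collinearity_filter_sketch(data, threshold=0.9)
print(list(filtered.columns))
print(list(no_target.columns))
```

In this toy data, "a" and "b" have an R² of about 0.97, so only one of them survives a threshold of 0.9. With target_variable="y" the sketch keeps "b", which tracks "y" more closely; without a target it simply keeps whichever column comes first. The real collinearity_filter may resolve ties and handle the target column differently.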