mlchem.ml.feature_selection.filters.collinearity_filter

collinearity_filter(df: DataFrame, threshold: float, target_variable: str = None, method: Literal['pearson', 'kendall', 'spearman'] = 'pearson', numeric_only: bool = False) → DataFrame

Filter features based on collinearity threshold.

Returns the subset of DataFrame columns whose pairwise squared correlations (R²) are below the specified threshold. If a target variable is provided and two or more features are collinear, the function retains the feature most strongly correlated with the target.

Parameters:
  • df (pandas.DataFrame) – The input dataset.

  • threshold (float) – The maximum allowed squared correlation (R², between 0 and 1) between features.

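  • target_variable (str, optional) – The name of the target variable. If provided, it is used to resolve collinearity conflicts: among collinear features, the one more strongly correlated with the target is kept. Default is None.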

  • method ({'pearson', 'kendall', 'spearman'}, optional) – The correlation method to use. Default is 'pearson'.

  • numeric_only (bool, optional) – Whether to include only numeric columns. Default is False.

Returns:

A DataFrame containing the filtered columns.

Return type:

pandas.DataFrame
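
Example:

The behaviour described above can be illustrated with a small pandas sketch. This is not the mlchem implementation: the helper name collinearity_filter_sketch, the greedy keep-or-drop strategy, the exclusion of the target column from the result, and the toy data are all assumptions made for illustration, and the numeric_only option is left out.

```python
import pandas as pd


def collinearity_filter_sketch(df, threshold, target_variable=None, method="pearson"):
    """Hypothetical greedy filter; an illustration, not mlchem's actual code."""
    features = [c for c in df.columns if c != target_variable]
    # Pairwise squared correlation between candidate features.
    r2 = df[features].corr(method=method) ** 2

    if target_variable is not None:
        # Visit features in order of squared correlation with the target,
        # strongest first, so the more relevant of a collinear pair wins.
        relevance = df[features].corrwith(df[target_variable], method=method) ** 2
        order = relevance.sort_values(ascending=False).index.tolist()
    else:
        # Without a target, fall back to the original column order.
        order = features

    kept = []
    for col in order:
        # Keep a column only if its R² with every kept column is below threshold.
        if all(r2.loc[col, k] < threshold for k in kept):
            kept.append(col)

    # Return the surviving columns in their original order (assumption).
    return df[[c for c in df.columns if c in kept]]


# Toy data: "b" is almost a copy of "a"; "c" is unrelated to both.
demo = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "b": [1.1, 1.9, 3.2, 3.8, 5.1, 6.0],
    "c": [1.0, -2.0, 1.0, 1.0, -2.0, 1.0],
})
demo["y"] = 3 * demo["b"] + demo["c"]  # target depends on "b" and "c"

with_target = collinearity_filter_sketch(demo, 0.8, target_variable="y")
without_target = collinearity_filter_sketch(demo.drop(columns="y"), 0.8)
print(list(with_target.columns))     # ['b', 'c']
print(list(without_target.columns))  # ['a', 'c']
```

In this sketch, "a" and "b" have R² above 0.8, so only one of them survives. With the target supplied, "b" is kept because it correlates more strongly with "y"; without it, the first column encountered is kept. Based on the signature above, the equivalent call to the documented function would be collinearity_filter(demo, threshold=0.8, target_variable='y'); whether it returns the target column or resolves ties in the same way is not stated here.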