mlchem.ml.modelling.model_evaluation.y_scrambling

y_scrambling(estimator, train_set: ndarray | DataFrame, y_train: Iterable, test_set: ndarray | DataFrame, y_test: Iterable, metric_function: Callable, n_iter: int, plot: bool = True) → None

Perform y-scrambling to estimate how much of a model's performance could be due to chance.

This function evaluates the robustness of a model by randomly shuffling the target variable n_iter times and measuring performance on the test set after each shuffle. It then compares the distribution of scores obtained with scrambled targets to the performance of the model on the real targets. For background on the method, see https://doi.org/10.1021/ci700157b.
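The general idea can be sketched in a few lines of scikit-learn. This is an illustration of the technique only, not the mlchem implementation: the helper name y_scrambling_sketch, the seed argument, the choice to permute only the training targets, and the synthetic data are all assumptions made for the example.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def y_scrambling_sketch(estimator, X_train, y_train, X_test, y_test,
                        metric_function, n_iter, seed=0):
    """Hypothetical helper: return (true_score, scrambled_scores)."""
    rng = np.random.default_rng(seed)
    y_train = np.asarray(y_train)

    # Reference score: fit a fresh copy of the estimator on the real targets.
    true_model = clone(estimator).fit(X_train, y_train)
    true_score = metric_function(y_test, true_model.predict(X_test))

    scrambled_scores = []
    for _ in range(n_iter):
        # Assumption: only the training targets are permuted, which breaks
        # the link between features and targets while keeping their values.
        y_shuffled = rng.permutation(y_train)
        model = clone(estimator).fit(X_train, y_shuffled)
        scrambled_scores.append(metric_function(y_test, model.predict(X_test)))

    return true_score, np.array(scrambled_scores)


# Synthetic regression data, used here only to make the sketch runnable.
X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

true_score, scrambled = y_scrambling_sketch(
    Ridge(), X_tr, y_tr, X_te, y_te, r2_score, n_iter=20
)
print("real targets:", true_score)
print("scrambled targets (mean):", scrambled.mean())
```

If the model has learned a genuine relationship, the score on the real targets should lie well outside the distribution of scrambled scores. A real score that falls inside that distribution suggests the apparent performance may be a chance correlation.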

Parameters:
  • estimator (object) – A scikit-learn compatible estimator.

  • train_set (numpy.ndarray or pandas.DataFrame) – Training feature matrix.

  • y_train (iterable) – Target values for training.

  • test_set (numpy.ndarray or pandas.DataFrame) – Testing feature matrix.

  • y_test (iterable) – Target values for testing.

  • metric_function (callable) – A scoring function that accepts (y_true, y_pred) as arguments.

  • n_iter (int) – Number of shuffling iterations.

  • plot (bool, optional (default=True)) – Whether to display a histogram of the scrambled scores.

Return type:

None
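
As a sketch of what metric_function may look like, any callable with the (y_true, y_pred) signature should fit, including functions from sklearn.metrics or a custom one. The rmse helper and the toy values below are assumptions for illustration, and the commented call only mirrors the signature documented above; it has not been run against mlchem.

```python
import numpy as np
from sklearn.metrics import r2_score


def rmse(y_true, y_pred):
    """Root-mean-square error (hypothetical custom metric; lower is better)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Toy values to show that both callables share the same call signature.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 5.0]
rmse_value = rmse(y_true, y_pred)
r2_value = r2_score(y_true, y_pred)
print(rmse_value, r2_value)

# Assumed usage, following the documented signature:
# from mlchem.ml.modelling.model_evaluation import y_scrambling
# y_scrambling(estimator, train_set, y_train, test_set, y_test,
#              metric_function=rmse, n_iter=100, plot=True)
```

The direction of the metric matters when reading the result: with an error metric such as RMSE, a meaningful model should score below the scrambled distribution, whereas with R² it should score above it.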