mlchem.chem.calculator package¶

Submodules¶

mlchem.chem.calculator.descriptors module¶

get_EHT_descriptors(mol_input: Mol, conf_id: int = -1) → dict¶

Calculate quantum chemistry descriptors using Extended Hückel Theory (EHT).

This function computes various quantum chemistry properties for a 3D-embedded molecule using RDKit’s EHT implementation. The results include orbital energies, overlap matrices, and Mulliken charges.

More information: https://dasher.wustl.edu/chem478/reading/extended-huckel-lowe.pdf

Parameters:

mol_input (rdkit.Chem.rdchem.Mol) – RDKit Mol object with at least one conformer.
conf_id (int, optional) – Conformer ID to use. Default is -1 (use the first conformer).

Returns:

Dictionary containing quantum chemistry descriptors:

- AtomicCharges
- Hamiltonian
- OrbitalEnergies
- OverlapMatrix
- ReducedChargeMatrix
- ReducedOverlapPopulationMatrix
- FermiEnergy
- NumElectrons
- NumOrbitals
- TotalEnergy

Return type:

dict

Raises:

ValueError – If the molecule has no conformers.

Examples

>>> get_EHT_descriptors(mol_with_conformer)

get_allDesc(mol_input_list: list[str | Mol] | ndarray[str | Mol], include_3D: bool = False) → DataFrame¶

Calculate both Mordred and RDKit descriptors for a list of molecules.

This function computes both Mordred and RDKit descriptors for each molecule in the input list. If include_3D is True, 3D descriptors are included in both sets.

Parameters:

mol_input_list (list or np.ndarray of str or rdkit.Chem.rdchem.Mol) – List or array of molecules in SMILES format or as RDKit Mol objects.
include_3D (bool, optional) – Whether to include 3D descriptors. Default is False.

Returns:

DataFrame containing the combined descriptors for each molecule.

Return type:

pd.DataFrame

Examples

>>> get_allDesc(["CCO", "c1ccccc1"], include_3D=True)

get_atomicDesc(mol_input: str | Mol, atom_index: int) → DataFrame¶

Calculate atomic descriptors for a specific atom in a molecule.

This function computes a comprehensive set of atomic-level descriptors for a given atom in a molecule. These include properties related to bond types, hybridisation, charges, ring membership, and statistics on neighbouring atoms up to the third order.

Parameters:

mol_input (str or rdkit.Chem.rdchem.Mol) – Molecule in SMILES format or as an RDKit Mol object.
atom_index (int) – Index of the atom for which descriptors are calculated.

Returns:

A DataFrame containing the descriptors for the specified atom.

Return type:

pd.DataFrame

Raises:

RuntimeError – If the molecule cannot be created from the input.
IndexError – If the atom index is out of bounds.

Examples

>>> get_atomicDesc("CC(=O)O", atom_index=1)

get_chemotypes(mol_input_list: list | ndarray[str | Mol], chemotype_dict: dict | None = None) → DataFrame¶

Identify chemotypes for a list of molecules.

This function applies a dictionary of chemotype definitions to each molecule in the input list. Each chemotype is defined by a function and its arguments. If no dictionary is provided, a default one is used.

Parameters:

mol_input_list (list or np.ndarray of str or rdkit.Chem.rdchem.Mol) – List or array of molecules in SMILES format or as RDKit Mol objects.
chemotype_dict (dict, optional) – Dictionary of chemotype definitions. Each entry should be a key with a tuple of (function, argument_dict). If None, a default dictionary is used.

Returns:

DataFrame containing the identified chemotypes for each molecule.

Return type:

pd.DataFrame

Examples

>>> get_chemotypes(["CCO", "c1ccccc1"])
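
The (function, argument_dict) layout described for chemotype_dict can be illustrated with a small pure-Python sketch. Everything here is hypothetical: the matcher functions, the dictionary keys, and the assumption that each function receives the molecule as its first positional argument are not taken from mlchem. The toy matchers work on raw SMILES strings, which is chemically naive; a real definition would more likely use substructure matching.

```python
from typing import Callable


def count_substring(smiles: str, pattern: str) -> int:
    """Hypothetical matcher: count occurrences of a substring in a SMILES string."""
    return smiles.count(pattern)


def longer_than(smiles: str, min_length: int) -> bool:
    """Hypothetical matcher: flag SMILES strings with at least min_length characters."""
    return len(smiles) >= min_length


# Each entry maps a chemotype name to (function, argument_dict).
chemotype_dict: dict[str, tuple[Callable, dict]] = {
    "n_oxygen": (count_substring, {"pattern": "O"}),
    "is_long": (longer_than, {"min_length": 6}),
}


def apply_chemotypes(smiles_list, definitions):
    """Call every definition on every molecule; one result row per molecule."""
    rows = []
    for smiles in smiles_list:
        row = {
            name: func(smiles, **kwargs)
            for name, (func, kwargs) in definitions.items()
        }
        rows.append(row)
    return rows


table = apply_chemotypes(["CCO", "c1ccccc1"], chemotype_dict)
print(table)
```

The list of row dictionaries could be passed to pd.DataFrame to obtain a table with one column per chemotype.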

get_fingerprint(mol_input: Mol | str, fp_type: Literal['m', 'ap', 'rk', 'tt', 'mac'] = 'm', radius: int = 2, nBits: int = 2048, include_chirality: bool = False, include_bit_info: bool = False) → tuple | Mol¶

Generate a molecular fingerprint using RDKit.

This function generates a fingerprint for a molecule using one of several RDKit-supported types. Optionally, bit information can be returned for interpretability.

Parameters:

mol_input (str or rdkit.Chem.rdchem.Mol) – Molecule in SMILES format or as an RDKit Mol object.
fp_type ({'m', 'ap', 'rk', 'tt', 'mac'}, optional) – Type of fingerprint to generate. Default is ‘m’. Options:
- ‘m’: Morgan
- ‘ap’: Atom Pair
- ‘rk’: RDKit
- ‘tt’: Topological Torsion
- ‘mac’: MACCS keys
radius (int, optional) – Radius or path length depending on fingerprint type. Default is 2.
nBits (int, optional) – Size of the fingerprint. Default is 2048.
include_chirality (bool, optional) – Whether to include chirality. Default is False.
include_bit_info (bool, optional) – Whether to return bit information. Default is False.

Returns:

Fingerprint of the molecule. If include_bit_info is True, returns a tuple (fingerprint, bit_info_dict).

Return type:

tuple or rdkit.DataStructs.cDataStructs.ExplicitBitVect

Examples

>>> get_fingerprint("CCO", fp_type='m', include_bit_info=True)

get_fingerprint_df(mol_input_list: list[str | Mol] | ndarray[str | Mol], fp_type: Literal['m', 'ap', 'rk', 'tt', 'mac'] = 'm', radius: int = 2, nBits: int = 2048, include_chirality: bool = False, include_bit_info: bool = False) → DataFrame | tuple[DataFrame, dict]¶

Generate a DataFrame of fingerprints for a list of molecules.

This function computes fingerprints for each molecule in the input list and returns them as a DataFrame. Optionally, bit information can also be returned.

Parameters:

mol_input_list (list or np.ndarray of str or rdkit.Chem.rdchem.Mol) – List or array of molecules in SMILES format or as RDKit Mol objects.
fp_type ({'m', 'ap', 'rk', 'tt', 'mac'}, optional) – Type of fingerprint to generate. Default is ‘m’.
radius (int, optional) – Radius or path length depending on fingerprint type. Default is 2.
nBits (int, optional) – Size of the fingerprint. Default is 2048.
include_chirality (bool, optional) – Whether to include chirality. Default is False.
include_bit_info (bool, optional) – Whether to return bit information. Default is False.

Returns:

DataFrame of fingerprints. If include_bit_info is True, also returns a dictionary of bit information.

Return type:

pd.DataFrame or tuple of (pd.DataFrame, dict)

Examples

>>> get_fingerprint_df(["CCO", "c1ccccc1"], fp_type='m')

get_mordredDesc(mol_input_list: list | ndarray[str | Mol], include_3D: bool = False) → DataFrame¶

Calculate Mordred descriptors for a list of molecules.

This function computes Mordred descriptors for each molecule in the input list. If include_3D is True, 3D descriptors are included.

Parameters:

mol_input_list (list or np.ndarray of str or rdkit.Chem.rdchem.Mol) – List or array of molecules in SMILES format or as RDKit Mol objects.
include_3D (bool, optional) – Whether to include 3D descriptors. Default is False.

Returns:

DataFrame containing the descriptors for each molecule.

Return type:

pd.DataFrame

Examples

>>> get_mordredDesc(["CCO", "c1ccccc1"], include_3D=True)

get_rdkitDesc(mol_input_list: Iterable[str | Mol], include_3D: bool = False) → DataFrame¶

Calculate RDKit descriptors for a list of molecules.

This function computes 2D descriptors for each molecule in the input list. If include_3D is True, it also calculates 3D descriptors and merges them with the 2D descriptors.

Parameters:

mol_input_list (Iterable[str or rdkit.Chem.rdchem.Mol]) – List of molecules in SMILES format or as RDKit Mol objects.
include_3D (bool, optional) – Whether to include 3D descriptors. Default is False.

Returns:

DataFrame containing the descriptors for each molecule.

Return type:

pd.DataFrame

Examples

>>> get_rdkitDesc(["CCO", "c1ccccc1"], include_3D=False)

mlchem.chem.calculator.tools module¶

bernoulli(n: int, k: int, p: float) → float¶

Calculate the probability of exactly k successes in n independent Bernoulli trials.

This uses the binomial distribution formula, P(k) = C(n, k) p^k (1 - p)^(n - k).

Parameters:

n (int) – Number of trials.
k (int) – Number of successes.
p (float) – Probability of success on a single trial.

Returns:

Binomial probability of exactly k successes in n trials.

Return type:

float

Examples

>>> bernoulli(10, 3, 0.5)
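
As a minimal sketch of the binomial formula mentioned above, the following standalone function uses only the standard library. The name binomial_pmf is illustrative and is not part of mlchem; the library's own implementation may differ in input validation.

```python
from math import comb


def binomial_pmf(n: int, k: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    if not 0 <= k <= n:
        return 0.0  # impossible outcome
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


print(binomial_pmf(10, 3, 0.5))  # 0.1171875
```

Summing the function over k = 0..n returns 1, which is a quick sanity check for any chosen n and p.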

boltzmann_probability(energy_levels: Iterable[float], temperature: int | float, energy_unit: Literal['eV', 'J', 'cal', 'kJ', 'kcal', 'kJ/mol', 'kcal/mol'] = 'kcal/mol') → list[float]¶

Calculate the Boltzmann probability for a set of energy levels at a given temperature.

Parameters:

energy_levels (Iterable[float]) – A list or array of energy levels.
temperature (float or int) – Temperature in Kelvin.
energy_unit (str, optional) – Unit of energy levels. Default is ‘kcal/mol’. Supported units: ‘eV’, ‘J’, ‘cal’, ‘kJ’, ‘kcal’, ‘kJ/mol’, ‘kcal/mol’.

Returns:

Boltzmann probabilities for the given energy levels.

Return type:

list[float]

Examples

>>> boltzmann_probability([0, 1, 2], 298, 'kcal/mol')
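
The Boltzmann probability of level i is exp(-E_i / kT) divided by the sum of that factor over all levels. The sketch below shows this under stated assumptions: the function name and the constant table are illustrative, only four of the listed units are covered, and how mlchem converts the remaining units is not documented here.

```python
from math import exp

# Boltzmann (or gas) constant per kelvin in each unit, CODATA 2018 values.
K_B = {
    "kcal/mol": 1.987204259e-3,
    "kJ/mol": 8.314462618e-3,
    "eV": 8.617333262e-5,
    "J": 1.380649e-23,
}


def boltzmann_weights(energy_levels, temperature, energy_unit="kcal/mol"):
    """Normalised Boltzmann probabilities for the given energy levels."""
    if temperature <= 0:
        raise ValueError("temperature must be positive (kelvin)")
    energies = list(energy_levels)
    kt = K_B[energy_unit] * temperature
    e_min = min(energies)  # shift by the minimum to avoid overflow in exp
    factors = [exp(-(e - e_min) / kt) for e in energies]
    partition = sum(factors)
    return [f / partition for f in factors]


probs = boltzmann_weights([0, 1, 2], 298, "kcal/mol")
print(probs)
```

Shifting all energies by the minimum leaves the probabilities unchanged, because the common factor cancels between numerator and partition function.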

calc_centroid(coordinates: ndarray, masses: Iterable | None = None) → ndarray¶

Calculate the centroid of a set of points, optionally weighted by masses.

Parameters:

coordinates (np.ndarray) – A 2D array of shape (N, D), where N is the number of points and D is the number of dimensions.
masses (Iterable, optional) – An iterable of length N representing the masses of each point.

Returns:

The coordinates of the centroid.

Return type:

np.ndarray

Examples

>>> calc_centroid(np.array([[0, 0], [2, 0], [1, 2]]))
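
A mass-weighted centroid is the sum of m_i * r_i divided by the total mass; with no masses it reduces to the plain mean. The following pure-Python sketch illustrates this with nested lists instead of NumPy arrays. The function name is illustrative, not the library's.

```python
def centroid(points, masses=None):
    """Centroid of N points in D dimensions, optionally mass-weighted."""
    n = len(points)
    weights = [1.0] * n if masses is None else list(masses)
    if len(weights) != n:
        raise ValueError("masses must have one entry per point")
    total = sum(weights)
    dim = len(points[0])
    return [
        sum(w * p[d] for w, p in zip(weights, points)) / total
        for d in range(dim)
    ]


triangle = [[0, 0], [2, 0], [1, 2]]
print(centroid(triangle))                    # unweighted mean of the points
print(centroid(triangle, masses=[1, 1, 2]))  # [1.0, 1.0]
```

Doubling the mass of the apex pulls the centroid towards it, from y = 2/3 to y = 1.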

calc_gyration_tensor(coordinates: ndarray, masses: Iterable | None = None) → ndarray¶

Calculate the gyration tensor of a set of coordinates.

Parameters:

coordinates (np.ndarray) – A 2D array of shape (N, 3) representing spatial coordinates.
masses (Iterable, optional) – An iterable of length N representing the masses of each point.

Returns:

The 3x3 gyration tensor.

Return type:

np.ndarray

Examples

>>> calc_gyration_tensor(np.random.rand(5, 3))
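
One common definition of the gyration tensor is S_ab = sum_i w_i (r_ia - c_a)(r_ib - c_b) / sum_i w_i, where c is the (weighted) centroid. The sketch below implements that definition in pure Python; whether mlchem uses exactly this normalisation is an assumption, and the function name is illustrative.

```python
def gyration_tensor(points, masses=None):
    """3x3 gyration tensor of N points in 3D, optionally mass-weighted."""
    n = len(points)
    weights = [1.0] * n if masses is None else list(masses)
    total = sum(weights)
    # Weighted centroid of the point set.
    center = [
        sum(w * p[a] for w, p in zip(weights, points)) / total
        for a in range(3)
    ]
    tensor = [[0.0] * 3 for _ in range(3)]
    for w, p in zip(weights, points):
        d = [p[a] - center[a] for a in range(3)]  # displacement from centroid
        for a in range(3):
            for b in range(3):
                tensor[a][b] += w * d[a] * d[b] / total
    return tensor


# Four points in the xy-plane, more extended along y than along x.
flat = [[1, 0, 0], [-1, 0, 0], [0, 2, 0], [0, -2, 0]]
print(gyration_tensor(flat))
```

For this planar example the tensor is diagonal, with entries 0.5, 2.0 and 0.0, and its trace equals the squared radius of gyration.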

calc_logD_HH(pH: float, logP: float, pKa: float, behaviour: Literal['acid', 'base']) → tuple¶

Calculate the distribution coefficient (logD) at a given pH using the Henderson-Hasselbalch equation.

Parameters:

pH (float) – The pH at which to calculate the distribution coefficient.
logP (float) – The logarithm of the partition coefficient.
pKa (float) – The negative base-10 logarithm of the acid dissociation constant (Ka).
behaviour ({'acid', 'base'}) – The behaviour of the molecule.

Returns:

A tuple containing:

- Ion-neutral ratio (float)
- Ionised percentage (float)
- logD (float)

Return type:

tuple

Examples

>>> calc_logD_HH(7.4, 3.0, 4.5, 'acid')
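
The Henderson-Hasselbalch relation gives the ion-to-neutral ratio as 10^(pH - pKa) for an acid and 10^(pKa - pH) for a base. If only the neutral form is assumed to partition into the organic phase, logD = logP - log10(1 + ratio). The sketch below follows those textbook relations; the function name, the percentage scale (0 to 100), and the neutral-only partitioning are assumptions rather than confirmed details of mlchem.

```python
from math import log10


def logd_henderson_hasselbalch(ph, logp, pka, behaviour):
    """Return (ion/neutral ratio, ionised percentage, logD)."""
    if behaviour == "acid":
        ratio = 10 ** (ph - pka)  # [A-] / [HA]
    elif behaviour == "base":
        ratio = 10 ** (pka - ph)  # [BH+] / [B]
    else:
        raise ValueError("behaviour must be 'acid' or 'base'")
    ionised_percent = 100 * ratio / (1 + ratio)
    logd = logp - log10(1 + ratio)  # assumes only the neutral form partitions
    return ratio, ionised_percent, logd


ratio, ionised, logd = logd_henderson_hasselbalch(7.4, 3.0, 4.5, "acid")
print(ratio, ionised, logd)
```

For this acid, almost three pH units above its pKa, the compound is more than 99% ionised and logD drops from 3.0 to roughly 0.1.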

calc_shape_descriptors_from_gyration_tensor(gyration_tensor: ndarray) → dict¶

Calculate shape descriptors from a 3x3 gyration tensor.

Parameters:

gyration_tensor (np.ndarray) – A 3x3 gyration tensor.

Returns:

A dictionary containing:

- ‘moments_of_inertia’
- ‘principal_axes’
- ‘asphericity’
- ‘acylindricity’
- ‘relative_shape_anisotropy’

Return type:

dict

Examples

>>> tensor = calc_gyration_tensor(np.random.rand(5, 3))
>>> calc_shape_descriptors_from_gyration_tensor(tensor)
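
The scalar shape descriptors are conventionally defined from the three eigenvalues l1 <= l2 <= l3 of the gyration tensor: asphericity b = l3 - (l1 + l2) / 2, acylindricity c = l2 - l1, and relative shape anisotropy (b^2 + 3c^2/4) / (l1 + l2 + l3)^2. The sketch below applies those standard definitions to eigenvalues supplied by the caller (obtainable, for example, with numpy.linalg.eigvalsh). It is not the library's code, and the dictionary keys only mirror a subset of the documented ones.

```python
def shape_descriptors(eigenvalues):
    """Shape descriptors from the three eigenvalues of a gyration tensor."""
    l1, l2, l3 = sorted(eigenvalues)
    rg_squared = l1 + l2 + l3  # squared radius of gyration
    if rg_squared == 0:
        raise ValueError("all eigenvalues are zero; shape is undefined")
    asphericity = l3 - 0.5 * (l1 + l2)
    acylindricity = l2 - l1
    anisotropy = (asphericity**2 + 0.75 * acylindricity**2) / rg_squared**2
    return {
        "asphericity": asphericity,
        "acylindricity": acylindricity,
        "relative_shape_anisotropy": anisotropy,
    }


print(shape_descriptors([1.0, 1.0, 1.0]))  # sphere-like: all descriptors are 0
print(shape_descriptors([0.0, 0.0, 1.0]))  # rod-like: anisotropy is 1
```

The relative shape anisotropy ranges from 0 for a spherically symmetric point set to 1 for points on a line, with 0.25 for a symmetric planar arrangement.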

logit_to_proba(logit: float) → float¶

Convert a logit value to probability.

This applies the logistic (sigmoid) function.

Parameters:

logit (float) – Logit value to be converted.

Returns:

Corresponding probability.

Return type:

float

Examples

>>> logit_to_proba(0)
0.5
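
The logistic function maps a logit x to 1 / (1 + exp(-x)). A direct implementation can overflow for large negative inputs, so the sketch below branches on the sign of the input. The function name is illustrative; whether mlchem guards against overflow in the same way is not stated in this reference.

```python
from math import exp


def sigmoid(logit: float) -> float:
    """Numerically stable logistic function."""
    if logit >= 0:
        return 1.0 / (1.0 + exp(-logit))
    z = exp(logit)  # safe: logit < 0, so exp cannot overflow
    return z / (1.0 + z)


print(sigmoid(0))  # 0.5
```

Both branches compute the same mathematical function; only the arrangement of terms changes so that exp is never called with a large positive argument.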

pairwise_euclidean_distance(matrix: ndarray) → ndarray¶

Calculate the pairwise Euclidean distances between rows of a matrix.

Uses SciPy’s pdist and squareform functions.

Parameters:

matrix (np.ndarray) – Input 2D array of shape (N, D), where N is the number of points and D is the number of dimensions.

Returns:

2D array of shape (N, N) containing the pairwise Euclidean distances.

Return type:

np.ndarray

Examples

>>> pairwise_euclidean_distance(np.array([[0, 0], [1, 0], [0, 1]]))
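
For reference, the same square distance matrix can be built without SciPy using math.dist from the standard library (Python 3.8 or later). This is an illustrative equivalent for small inputs, not the library's implementation, and the function name is hypothetical.

```python
from math import dist


def pairwise_distances(rows):
    """Square matrix of Euclidean distances between all pairs of rows."""
    return [[dist(a, b) for b in rows] for a in rows]


matrix = pairwise_distances([[0, 0], [1, 0], [0, 1]])
for row in matrix:
    print(row)
```

The result is symmetric with zeros on the diagonal; here the two unit-length legs are at distance 1 from the origin and at distance sqrt(2) from each other.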

shannon_entropy(vector: ndarray) → float¶

Calculate Shannon entropy of a vector.

This function computes the entropy based on the frequency of unique elements in the input array.

Parameters:

vector (np.ndarray) – Input array for which Shannon entropy is calculated.

Returns:

Shannon entropy of the input vector.

Return type:

float

Examples

>>> shannon_entropy(np.array([1, 1, 2, 2, 3, 3]))
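
Entropy from element frequencies can be sketched as H = -sum_i p_i log(p_i), where p_i is the relative frequency of each unique value. The version below uses base-2 logarithms (bits); the logarithm base used by mlchem is not stated in this reference, so that choice and the function name are assumptions.

```python
from collections import Counter
from math import log2


def shannon_entropy_bits(values):
    """Shannon entropy (in bits) of the value frequencies in a sequence."""
    counts = Counter(values)
    n = sum(counts.values())
    return -sum((c / n) * log2(c / n) for c in counts.values())


print(shannon_entropy_bits([1, 1, 2, 2, 3, 3]))  # log2(3), about 1.585 bits
```

Three equally frequent values give the maximum entropy for three symbols, log2(3), while a constant sequence gives zero.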
