Tautomer and Protomer Analysis for Drug Discovery
This notebook covers the chemistry of tautomerism and protonation states, which are critical for:
Molecular docking - Wrong tautomer = wrong binding pose
pKa prediction - Understanding ionization at physiological pH
SAR analysis - Tautomeric preference affects activity
Crystal structure interpretation - Hydrogen positions often ambiguous
Chemistry Background
Tautomerism
Tautomers are constitutional isomers that interconvert rapidly via proton transfer:
Keto ⇌ Enol (carbonyl compounds)
Lactam ⇌ Lactim (cyclic amides)
Amino ⇌ Imino (aminoheterocycles)
Thione ⇌ Thiol (thiocarbonyls)
The tautomeric ratio depends on:
Thermodynamic stability (ΔG)
Solvent effects
Intramolecular hydrogen bonding
Protonation States (Protomers)
At physiological pH (7.4), functional groups are protonated or deprotonated:
Group |
Typical pKa |
State at pH 7.4 |
|---|---|---|
Carboxylic acid |
4-5 |
Deprotonated (COO⁻) |
Aliphatic amine |
9-11 |
Protonated (NH₃⁺) |
Aromatic amine |
4-5 |
Neutral (NH₂) |
Phenol |
9-10 |
Neutral (OH) |
Imidazole |
6-7 |
Mixed |
[ ]:
import os
import tempfile
from pathlib import Path
import Auto3D
from Auto3D import Auto3DOptions, main
from rdkit import Chem
from rdkit.Chem import AllChem, Draw
from rdkit.Chem.MolStandardize import rdMolStandardize
import pandas as pd
print(f"Auto3D version: {Auto3D.__version__}")
1. Common Tautomeric Systems in Drugs
Many drug molecules contain tautomerizable groups. Understanding which tautomer is bioactive is crucial.
[ ]:
# Classic tautomeric examples from medicinal chemistry
tautomeric_examples = {
# Keto-enol tautomerism
"warfarin_keto": "CC(=O)CC(C1=CC=CC=C1)C2=C(O)C3=CC=CC=C3OC2=O",
"warfarin_enol": "CC(O)=CC(C1=CC=CC=C1)C2=C(O)C3=CC=CC=C3OC2=O",
# Lactam-lactim in nucleobases
"guanine_keto": "NC1=NC2=C(N=CN2)C(=O)N1", # Dominant form
"guanine_enol": "NC1=NC2=C(N=CN2)C(O)=N1", # Rare form
# Thione-thiol in antithyroid drugs
"methimazole_thione": "CN1C=CN=C1S", # Actually C=S
# Amino-imino in sulfonamides
"sulfamethoxazole": "CC1=CC(NS(=O)(=O)C2=CC=C(N)C=C2)=NO1",
# Imidazole tautomerism (histidine)
"histamine_N1H": "NCCC1=CNC=N1", # N1-H tautomer
"histamine_N3H": "NCCC1=CN=CN1", # N3-H tautomer
}
print("Key tautomeric systems in drugs:")
for name, smi in tautomeric_examples.items():
mol = Chem.MolFromSmiles(smi)
if mol:
print(f" {name}: {Chem.MolToSmiles(mol)}")
2. Enumerating Tautomers with Auto3D
Auto3D can enumerate tautomers before 3D generation using the enumerate_tautomer=True option.
[ ]:
# Let's analyze warfarin - an important anticoagulant
# The keto-enol equilibrium affects its binding to vitamin K epoxide reductase
warfarin = "CC(=O)CC(C1=CC=CC=C1)C2=C(O)C3=CC=CC=C3OC2=O"
# Use RDKit's tautomer enumerator to see all forms
enumerator = rdMolStandardize.TautomerEnumerator()
mol = Chem.MolFromSmiles(warfarin)
tautomers = list(enumerator.Enumerate(mol))
print(f"Warfarin tautomers found: {len(tautomers)}")
print()
for i, taut in enumerate(tautomers):
smi = Chem.MolToSmiles(taut)
print(f"Tautomer {i+1}: {smi}")
[ ]:
# Generate 3D structures for all tautomers to compare energies
# Write tautomers to file
with tempfile.NamedTemporaryFile(mode='w', suffix='.smi', delete=False) as f:
for i, taut in enumerate(tautomers[:5]): # Limit to top 5
smi = Chem.MolToSmiles(taut)
f.write(f"{smi} warfarin_taut_{i+1}\n")
taut_file = f.name
print(f"Wrote {min(5, len(tautomers))} tautomers to {taut_file}")
[ ]:
# Run Auto3D on tautomers
if __name__ == "__main__":
config = Auto3DOptions(
path=taut_file,
k=1, # Lowest energy conformer per tautomer
optimizing_engine="AIMNET",
use_gpu=True,
)
taut_output = main(config)
print(f"Output: {taut_output}")
3. Calculating Tautomer Stability
The most stable tautomer has the lowest energy. We can rank tautomers by their electronic energy from Auto3D.
[ ]:
def analyze_tautomer_energies(sdf_path):
"""
Compare energies of different tautomers.
The energy difference determines the equilibrium ratio:
K = exp(-ΔG/RT)
At 298K, 1.36 kcal/mol difference = 10:1 ratio
"""
mols = list(Chem.SDMolSupplier(sdf_path, removeHs=False))
data = []
for mol in mols:
if mol is None:
continue
name = mol.GetProp("_Name")
# Get energy in Hartree
if mol.HasProp("E_hartree"):
e_hartree = float(mol.GetProp("E_hartree"))
elif mol.HasProp("E_tot"):
e_hartree = float(mol.GetProp("E_tot")) / 27.2114 # eV to Hartree
else:
continue
data.append({"Name": name, "E_hartree": e_hartree})
df = pd.DataFrame(data)
# Calculate relative energies in kcal/mol
e_min = df["E_hartree"].min()
df["E_rel_kcal"] = (df["E_hartree"] - e_min) * 627.509 # Hartree to kcal/mol
# Estimate Boltzmann populations at 298K
import numpy as np
RT = 0.001987 * 298 # kcal/mol
df["Boltzmann_weight"] = np.exp(-df["E_rel_kcal"] / RT)
df["Population_%"] = 100 * df["Boltzmann_weight"] / df["Boltzmann_weight"].sum()
return df.sort_values("E_rel_kcal")
if 'taut_output' in dir():
df_taut = analyze_tautomer_energies(taut_output)
print("Tautomer Stability Analysis:")
print(df_taut[["Name", "E_rel_kcal", "Population_%"]].to_string(index=False))
4. Protonation State Analysis
For docking and simulations, we need the correct protonation state at the target pH.
[ ]:
# Common ionizable groups and their typical pKa values
ionizable_examples = {
# Acids - deprotonated at pH 7.4
"benzoic_acid": "C1=CC=C(C=C1)C(=O)O", # pKa ~4.2
"phenol": "C1=CC=C(C=C1)O", # pKa ~10
"sulfonamide": "CC1=CC=C(C=C1)S(=O)(=O)N", # pKa ~10
# Bases - protonated at pH 7.4
"piperidine": "C1CCNCC1", # pKa ~11
"aniline": "NC1=CC=CC=C1", # pKa ~4.6
"imidazole": "C1=CN=CN1", # pKa ~7.0
"pyridine": "C1=CC=NC=C1", # pKa ~5.2
# Zwitterions
"glycine": "NCC(=O)O", # pKa1 ~2.3, pKa2 ~9.6
}
def predict_protonation_state(smiles, pH=7.4):
"""
Simple rule-based prediction of protonation state.
For more accurate predictions, use:
- Dimorphite-DL
- OpenEye's pKa tools
- Epik (Schrodinger)
"""
mol = Chem.MolFromSmiles(smiles)
# Count ionizable groups
carboxylic = len(mol.GetSubstructMatches(Chem.MolFromSmarts("C(=O)[OH]")))
aliphatic_N = len(mol.GetSubstructMatches(Chem.MolFromSmarts("[NX3;H2,H1;!$(NC=O)]")))
aromatic_N = len(mol.GetSubstructMatches(Chem.MolFromSmarts("[nH]")))
state = {
"carboxylic_acids": carboxylic,
"aliphatic_amines": aliphatic_N,
"aromatic_N": aromatic_N,
"predicted_charge": aliphatic_N - carboxylic # Simplified
}
return state
print("Ionizable group analysis at pH 7.4:")
print("-" * 60)
for name, smi in ionizable_examples.items():
state = predict_protonation_state(smi)
print(f"{name:20s}: charge = {state['predicted_charge']:+d}")
5. pH-Dependent Conformer Generation
Different protonation states can have different conformational preferences due to:
Electrostatic effects - charged groups repel/attract
Hydrogen bonding - protonated groups form different H-bonds
Tautomeric shifts - pH affects tautomer equilibrium
[ ]:
# Compare neutral vs zwitterionic glycine
glycine_forms = {
"glycine_neutral": "NCC(=O)O", # Low pH
"glycine_zwitterion": "[NH3+]CC(=O)[O-]", # Physiological pH
"glycine_anion": "NCC(=O)[O-]", # High pH
}
# Write to file
with tempfile.NamedTemporaryFile(mode='w', suffix='.smi', delete=False) as f:
for name, smi in glycine_forms.items():
f.write(f"{smi} {name}\n")
glycine_file = f.name
print("Glycine protonation states:")
for name, smi in glycine_forms.items():
mol = Chem.MolFromSmiles(smi)
charge = Chem.GetFormalCharge(mol)
print(f" {name}: charge = {charge:+d}")
[ ]:
# Generate 3D structures for each protonation state
if __name__ == "__main__":
config = Auto3DOptions(
path=glycine_file,
k=1,
optimizing_engine="AIMNET",
use_gpu=True,
)
glycine_output = main(config)
print(f"Output: {glycine_output}")
6. Histidine: A Complex Case
Histidine has three ionizable groups and two tautomeric forms, making it one of the most complex amino acids:
Carboxyl: pKa ~1.8
Amino: pKa ~9.2
Imidazole: pKa ~6.0
At pH 7.4, ~10% of histidine’s imidazole is protonated.
[ ]:
# All relevant histidine forms for drug design
histidine_forms = {
# Neutral imidazole tautomers
"His_Nδ1H": "[NH3+][C@@H](Cc1c[nH]cn1)C(=O)[O-]", # δ-tautomer
"His_Nε2H": "[NH3+][C@@H](Cc1cnc[nH]1)C(=O)[O-]", # ε-tautomer
# Protonated imidazole (positively charged)
"His_charged": "[NH3+][C@@H](Cc1c[nH+]c[nH]1)C(=O)[O-]",
}
print("Histidine forms relevant at physiological pH:")
for name, smi in histidine_forms.items():
mol = Chem.MolFromSmiles(smi)
if mol:
charge = Chem.GetFormalCharge(mol)
print(f" {name}: net charge = {charge:+d}")
7. Integration with Auto3D Tautomer Enumeration
Auto3D has built-in tautomer enumeration that can be enabled during conformer generation.
CLI Usage
# Enable tautomer enumeration from command line
auto3d run drugs.smi --k=3 --enumerate-tautomer --gpu
# With ANI2xt (recommended for tautomers)
auto3d run drugs.smi --k=3 --enumerate-tautomer --engine=ANI2xt --gpu
# Using configuration file for full control
auto3d run drugs.smi -c tautomer_config.yaml
Example tautomer_config.yaml:
k: 3
enumerate_tautomer: true
tauto_engine: rdkit
pKaNorm: true
optimizing_engine: ANI2xt
use_gpu: true
[ ]:
# Example: Run Auto3D with tautomer enumeration
# This generates conformers for ALL tautomers, then ranks by energy
drug_with_tautomers = {
"sulfasalazine": "OC1=CC=C(N=NC2=CC=C(NC3=CC=CC=N3)C=C2)C(=C1)C(=O)O",
"allopurinol": "O=C1NC=NC2=C1NNC=2", # Xanthine oxidase inhibitor
"omeprazole": "COC1=CC2=NC(CS(=O)C3=NC4=CC=CC=C4N3)=NC2=CC1OC",
}
with tempfile.NamedTemporaryFile(mode='w', suffix='.smi', delete=False) as f:
for name, smi in drug_with_tautomers.items():
f.write(f"{smi} {name}\n")
drug_file = f.name
print(f"Drugs with complex tautomerism: {len(drug_with_tautomers)}")
[ ]:
# Run with tautomer enumeration enabled
if __name__ == "__main__":
config = Auto3DOptions(
path=drug_file,
k=3, # Top 3 conformers (may include different tautomers)
enumerate_tautomer=True, # Enable tautomer enumeration
tauto_engine="rdkit", # Use RDKit tautomer engine
pKaNorm=True, # Normalize to dominant protonation state
optimizing_engine="AIMNET",
use_gpu=True,
)
drug_output = main(config)
print(f"Output: {drug_output}")
8. Practical Guidelines
For Docking
Generate multiple tautomers if the molecule has tautomerizable groups
Use the dominant protomer at target tissue pH (usually 7.4)
Consider both forms for ambiguous cases (e.g., imidazole pKa ~7)
For QSAR/ML
Use canonical tautomer for consistency
Consider tautomeric preference as a descriptor
Be aware that training data may have inconsistent tautomers
For Thermodynamics
Calculate ΔG for each tautomer/protomer
Use Boltzmann averaging for ensemble properties
Account for pH in free energy calculations
[ ]:
# Cleanup
import os
for f in [taut_file, glycine_file, drug_file]:
if f in dir() and os.path.exists(f):
os.unlink(f)
Summary
This tutorial covered:
Tautomerism fundamentals - keto-enol, lactam-lactim, amino-imino
Tautomer enumeration - using RDKit and Auto3D
Energy-based ranking - identifying the dominant tautomer
Protonation states - pH-dependent ionization
Complex cases - histidine, drugs with multiple ionizable groups
Key takeaway: Always consider tautomers and protonation states when preparing molecules for computational chemistry.