{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tautomer and Protomer Analysis for Drug Discovery\n", "\n", "This notebook covers the chemistry of tautomerism and protonation states, which are critical for:\n", "\n", "- **Molecular docking** - Wrong tautomer = wrong binding pose\n", "- **pKa prediction** - Understanding ionization at physiological pH\n", "- **SAR analysis** - Tautomeric preference affects activity\n", "- **Crystal structure interpretation** - Hydrogen positions often ambiguous\n", "\n", "## Chemistry Background\n", "\n", "### Tautomerism\n", "\n", "Tautomers are constitutional isomers that interconvert rapidly via proton transfer:\n", "\n", "```\n", "Keto ⇌ Enol (carbonyl compounds)\n", "Lactam ⇌ Lactim (cyclic amides)\n", "Amino ⇌ Imino (aminoheterocycles)\n", "Thione ⇌ Thiol (thiocarbonyls)\n", "```\n", "\n", "The **tautomeric ratio** depends on:\n", "- Thermodynamic stability (ΔG)\n", "- Solvent effects\n", "- Intramolecular hydrogen bonding\n", "\n", "### Protonation States (Protomers)\n", "\n", "At physiological pH (7.4), functional groups are protonated or deprotonated:\n", "\n", "| Group | Typical pKa | State at pH 7.4 |\n", "|-------|-------------|------------------|\n", "| Carboxylic acid | 4-5 | Deprotonated (COO⁻) |\n", "| Aliphatic amine | 9-11 | Protonated (NH₃⁺) |\n", "| Aromatic amine | 4-5 | Neutral (NH₂) |\n", "| Phenol | 9-10 | Neutral (OH) |\n", "| Imidazole | 6-7 | Mixed |" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import tempfile\n", "from pathlib import Path\n", "\n", "import Auto3D\n", "from Auto3D import Auto3DOptions, main\n", "from rdkit import Chem\n", "from rdkit.Chem import AllChem, Draw\n", "from rdkit.Chem.MolStandardize import rdMolStandardize\n", "import pandas as pd\n", "\n", "print(f\"Auto3D version: {Auto3D.__version__}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Common Tautomeric Systems in Drugs\n", "\n", "Many drug molecules contain tautomerizable groups. Understanding which tautomer is bioactive is crucial." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Classic tautomeric examples from medicinal chemistry\n", "tautomeric_examples = {\n", " # Keto-enol tautomerism\n", " \"warfarin_keto\": \"CC(=O)CC(C1=CC=CC=C1)C2=C(O)C3=CC=CC=C3OC2=O\",\n", " \"warfarin_enol\": \"CC(O)=CC(C1=CC=CC=C1)C2=C(O)C3=CC=CC=C3OC2=O\",\n", " \n", " # Lactam-lactim in nucleobases\n", " \"guanine_keto\": \"NC1=NC2=C(N=CN2)C(=O)N1\", # Dominant form\n", " \"guanine_enol\": \"NC1=NC2=C(N=CN2)C(O)=N1\", # Rare form\n", " \n", " # Thione-thiol in antithyroid drugs\n", " \"methimazole_thione\": \"CN1C=CN=C1S\", # Actually C=S\n", " \n", " # Amino-imino in sulfonamides\n", " \"sulfamethoxazole\": \"CC1=CC(NS(=O)(=O)C2=CC=C(N)C=C2)=NO1\",\n", " \n", " # Imidazole tautomerism (histidine)\n", " \"histamine_N1H\": \"NCCC1=CNC=N1\", # N1-H tautomer\n", " \"histamine_N3H\": \"NCCC1=CN=CN1\", # N3-H tautomer\n", "}\n", "\n", "print(\"Key tautomeric systems in drugs:\")\n", "for name, smi in tautomeric_examples.items():\n", " mol = Chem.MolFromSmiles(smi)\n", " if mol:\n", " print(f\" {name}: {Chem.MolToSmiles(mol)}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Enumerating Tautomers with Auto3D\n", "\n", "Auto3D can enumerate tautomers before 3D generation using the `enumerate_tautomer=True` option." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Let's analyze warfarin - an important anticoagulant\n", "# The keto-enol equilibrium affects its binding to vitamin K epoxide reductase\n", "\n", "warfarin = \"CC(=O)CC(C1=CC=CC=C1)C2=C(O)C3=CC=CC=C3OC2=O\"\n", "\n", "# Use RDKit's tautomer enumerator to see all forms\n", "enumerator = rdMolStandardize.TautomerEnumerator()\n", "mol = Chem.MolFromSmiles(warfarin)\n", "\n", "tautomers = list(enumerator.Enumerate(mol))\n", "print(f\"Warfarin tautomers found: {len(tautomers)}\")\n", "print()\n", "\n", "for i, taut in enumerate(tautomers):\n", " smi = Chem.MolToSmiles(taut)\n", " print(f\"Tautomer {i+1}: {smi}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Generate 3D structures for all tautomers to compare energies\n", "\n", "# Write tautomers to file\n", "with tempfile.NamedTemporaryFile(mode='w', suffix='.smi', delete=False) as f:\n", " for i, taut in enumerate(tautomers[:5]): # Limit to top 5\n", " smi = Chem.MolToSmiles(taut)\n", " f.write(f\"{smi} warfarin_taut_{i+1}\\n\")\n", " taut_file = f.name\n", "\n", "print(f\"Wrote {min(5, len(tautomers))} tautomers to {taut_file}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run Auto3D on tautomers\n", "if __name__ == \"__main__\":\n", " config = Auto3DOptions(\n", " path=taut_file,\n", " k=1, # Lowest energy conformer per tautomer\n", " optimizing_engine=\"AIMNET\",\n", " use_gpu=True,\n", " )\n", " \n", " taut_output = main(config)\n", " print(f\"Output: {taut_output}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Calculating Tautomer Stability\n", "\n", "The most stable tautomer has the lowest energy. We can rank tautomers by their electronic energy from Auto3D." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def analyze_tautomer_energies(sdf_path):\n", " \"\"\"\n", " Compare energies of different tautomers.\n", " \n", " The energy difference determines the equilibrium ratio:\n", " K = exp(-ΔG/RT)\n", " \n", " At 298K, 1.36 kcal/mol difference = 10:1 ratio\n", " \"\"\"\n", " mols = list(Chem.SDMolSupplier(sdf_path, removeHs=False))\n", " \n", " data = []\n", " for mol in mols:\n", " if mol is None:\n", " continue\n", " name = mol.GetProp(\"_Name\")\n", " \n", " # Get energy in Hartree\n", " if mol.HasProp(\"E_hartree\"):\n", " e_hartree = float(mol.GetProp(\"E_hartree\"))\n", " elif mol.HasProp(\"E_tot\"):\n", " e_hartree = float(mol.GetProp(\"E_tot\")) / 27.2114 # eV to Hartree\n", " else:\n", " continue\n", " \n", " data.append({\"Name\": name, \"E_hartree\": e_hartree})\n", " \n", " df = pd.DataFrame(data)\n", " \n", " # Calculate relative energies in kcal/mol\n", " e_min = df[\"E_hartree\"].min()\n", " df[\"E_rel_kcal\"] = (df[\"E_hartree\"] - e_min) * 627.509 # Hartree to kcal/mol\n", " \n", " # Estimate Boltzmann populations at 298K\n", " import numpy as np\n", " RT = 0.001987 * 298 # kcal/mol\n", " df[\"Boltzmann_weight\"] = np.exp(-df[\"E_rel_kcal\"] / RT)\n", " df[\"Population_%\"] = 100 * df[\"Boltzmann_weight\"] / df[\"Boltzmann_weight\"].sum()\n", " \n", " return df.sort_values(\"E_rel_kcal\")\n", "\n", "\n", "if 'taut_output' in dir():\n", " df_taut = analyze_tautomer_energies(taut_output)\n", " print(\"Tautomer Stability Analysis:\")\n", " print(df_taut[[\"Name\", \"E_rel_kcal\", \"Population_%\"]].to_string(index=False))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Protonation State Analysis\n", "\n", "For docking and simulations, we need the correct protonation state at the target pH." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Common ionizable groups and their typical pKa values\n", "ionizable_examples = {\n", " # Acids - deprotonated at pH 7.4\n", " \"benzoic_acid\": \"C1=CC=C(C=C1)C(=O)O\", # pKa ~4.2\n", " \"phenol\": \"C1=CC=C(C=C1)O\", # pKa ~10\n", " \"sulfonamide\": \"CC1=CC=C(C=C1)S(=O)(=O)N\", # pKa ~10\n", " \n", " # Bases - protonated at pH 7.4 \n", " \"piperidine\": \"C1CCNCC1\", # pKa ~11\n", " \"aniline\": \"NC1=CC=CC=C1\", # pKa ~4.6\n", " \"imidazole\": \"C1=CN=CN1\", # pKa ~7.0\n", " \"pyridine\": \"C1=CC=NC=C1\", # pKa ~5.2\n", " \n", " # Zwitterions\n", " \"glycine\": \"NCC(=O)O\", # pKa1 ~2.3, pKa2 ~9.6\n", "}\n", "\n", "def predict_protonation_state(smiles, pH=7.4):\n", " \"\"\"\n", " Simple rule-based prediction of protonation state.\n", " \n", " For more accurate predictions, use:\n", " - Dimorphite-DL\n", " - OpenEye's pKa tools\n", " - Epik (Schrodinger)\n", " \"\"\"\n", " mol = Chem.MolFromSmiles(smiles)\n", " \n", " # Count ionizable groups\n", " carboxylic = len(mol.GetSubstructMatches(Chem.MolFromSmarts(\"C(=O)[OH]\")))\n", " aliphatic_N = len(mol.GetSubstructMatches(Chem.MolFromSmarts(\"[NX3;H2,H1;!$(NC=O)]\")))\n", " aromatic_N = len(mol.GetSubstructMatches(Chem.MolFromSmarts(\"[nH]\")))\n", " \n", " state = {\n", " \"carboxylic_acids\": carboxylic,\n", " \"aliphatic_amines\": aliphatic_N,\n", " \"aromatic_N\": aromatic_N,\n", " \"predicted_charge\": aliphatic_N - carboxylic # Simplified\n", " }\n", " \n", " return state\n", "\n", "\n", "print(\"Ionizable group analysis at pH 7.4:\")\n", "print(\"-\" * 60)\n", "for name, smi in ionizable_examples.items():\n", " state = predict_protonation_state(smi)\n", " print(f\"{name:20s}: charge = {state['predicted_charge']:+d}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. pH-Dependent Conformer Generation\n", "\n", "Different protonation states can have different conformational preferences due to:\n", "\n", "- **Electrostatic effects** - charged groups repel/attract\n", "- **Hydrogen bonding** - protonated groups form different H-bonds\n", "- **Tautomeric shifts** - pH affects tautomer equilibrium" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Compare neutral vs zwitterionic glycine\n", "glycine_forms = {\n", " \"glycine_neutral\": \"NCC(=O)O\", # Low pH\n", " \"glycine_zwitterion\": \"[NH3+]CC(=O)[O-]\", # Physiological pH\n", " \"glycine_anion\": \"NCC(=O)[O-]\", # High pH\n", "}\n", "\n", "# Write to file\n", "with tempfile.NamedTemporaryFile(mode='w', suffix='.smi', delete=False) as f:\n", " for name, smi in glycine_forms.items():\n", " f.write(f\"{smi} {name}\\n\")\n", " glycine_file = f.name\n", "\n", "print(\"Glycine protonation states:\")\n", "for name, smi in glycine_forms.items():\n", " mol = Chem.MolFromSmiles(smi)\n", " charge = Chem.GetFormalCharge(mol)\n", " print(f\" {name}: charge = {charge:+d}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Generate 3D structures for each protonation state\n", "if __name__ == \"__main__\":\n", " config = Auto3DOptions(\n", " path=glycine_file,\n", " k=1,\n", " optimizing_engine=\"AIMNET\",\n", " use_gpu=True,\n", " )\n", " \n", " glycine_output = main(config)\n", " print(f\"Output: {glycine_output}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. Histidine: A Complex Case\n", "\n", "Histidine has three ionizable groups and two tautomeric forms, making it one of the most complex amino acids:\n", "\n", "- **Carboxyl**: pKa ~1.8\n", "- **Amino**: pKa ~9.2 \n", "- **Imidazole**: pKa ~6.0\n", "\n", "At pH 7.4, ~10% of histidine's imidazole is protonated." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# All relevant histidine forms for drug design\n", "histidine_forms = {\n", " # Neutral imidazole tautomers\n", " \"His_Nδ1H\": \"[NH3+][C@@H](Cc1c[nH]cn1)C(=O)[O-]\", # δ-tautomer\n", " \"His_Nε2H\": \"[NH3+][C@@H](Cc1cnc[nH]1)C(=O)[O-]\", # ε-tautomer\n", " \n", " # Protonated imidazole (positively charged)\n", " \"His_charged\": \"[NH3+][C@@H](Cc1c[nH+]c[nH]1)C(=O)[O-]\",\n", "}\n", "\n", "print(\"Histidine forms relevant at physiological pH:\")\n", "for name, smi in histidine_forms.items():\n", " mol = Chem.MolFromSmiles(smi)\n", " if mol:\n", " charge = Chem.GetFormalCharge(mol)\n", " print(f\" {name}: net charge = {charge:+d}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": "## 7. Integration with Auto3D Tautomer Enumeration\n\nAuto3D has built-in tautomer enumeration that can be enabled during conformer generation.\n\n### CLI Usage\n\n```bash\n# Enable tautomer enumeration from command line\nauto3d run drugs.smi --k=3 --enumerate-tautomer --gpu\n\n# With ANI2xt (recommended for tautomers)\nauto3d run drugs.smi --k=3 --enumerate-tautomer --engine=ANI2xt --gpu\n\n# Using configuration file for full control\nauto3d run drugs.smi -c tautomer_config.yaml\n```\n\nExample `tautomer_config.yaml`:\n\n```yaml\nk: 3\nenumerate_tautomer: true\ntauto_engine: rdkit\npKaNorm: true\noptimizing_engine: ANI2xt\nuse_gpu: true\n```" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Example: Run Auto3D with tautomer enumeration\n", "# This generates conformers for ALL tautomers, then ranks by energy\n", "\n", "drug_with_tautomers = {\n", " \"sulfasalazine\": \"OC1=CC=C(N=NC2=CC=C(NC3=CC=CC=N3)C=C2)C(=C1)C(=O)O\",\n", " \"allopurinol\": \"O=C1NC=NC2=C1NNC=2\", # Xanthine oxidase inhibitor\n", " \"omeprazole\": \"COC1=CC2=NC(CS(=O)C3=NC4=CC=CC=C4N3)=NC2=CC1OC\",\n", "}\n", "\n", "with tempfile.NamedTemporaryFile(mode='w', suffix='.smi', delete=False) as f:\n", " for name, smi in drug_with_tautomers.items():\n", " f.write(f\"{smi} {name}\\n\")\n", " drug_file = f.name\n", "\n", "print(f\"Drugs with complex tautomerism: {len(drug_with_tautomers)}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run with tautomer enumeration enabled\n", "if __name__ == \"__main__\":\n", " config = Auto3DOptions(\n", " path=drug_file,\n", " k=3, # Top 3 conformers (may include different tautomers)\n", " enumerate_tautomer=True, # Enable tautomer enumeration\n", " tauto_engine=\"rdkit\", # Use RDKit tautomer engine\n", " pKaNorm=True, # Normalize to dominant protonation state\n", " optimizing_engine=\"AIMNET\",\n", " use_gpu=True,\n", " )\n", " \n", " drug_output = main(config)\n", " print(f\"Output: {drug_output}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 8. Practical Guidelines\n", "\n", "### For Docking\n", "1. Generate **multiple tautomers** if the molecule has tautomerizable groups\n", "2. Use the **dominant protomer** at target tissue pH (usually 7.4)\n", "3. Consider **both forms** for ambiguous cases (e.g., imidazole pKa ~7)\n", "\n", "### For QSAR/ML\n", "1. Use **canonical tautomer** for consistency\n", "2. Consider tautomeric preference as a **descriptor**\n", "3. Be aware that training data may have inconsistent tautomers\n", "\n", "### For Thermodynamics\n", "1. Calculate ΔG for each tautomer/protomer\n", "2. Use **Boltzmann averaging** for ensemble properties\n", "3. Account for pH in free energy calculations" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Cleanup\n", "import os\n", "for f in [taut_file, glycine_file, drug_file]:\n", " if f in dir() and os.path.exists(f):\n", " os.unlink(f)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary\n", "\n", "This tutorial covered:\n", "\n", "1. **Tautomerism fundamentals** - keto-enol, lactam-lactim, amino-imino\n", "2. **Tautomer enumeration** - using RDKit and Auto3D\n", "3. **Energy-based ranking** - identifying the dominant tautomer\n", "4. **Protonation states** - pH-dependent ionization\n", "5. **Complex cases** - histidine, drugs with multiple ionizable groups\n", "\n", "Key takeaway: **Always consider tautomers and protonation states** when preparing molecules for computational chemistry." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python", "version": "3.10.0" } }, "nbformat": 4, "nbformat_minor": 4 }