Save MOL2 file from RDkit Mol

My company are now using a docking engine which requires MOL2 as input. Since I use rdkit alot, I want to use rdkit to save mol2 file, while rdkit doesn’t support Tripos MOL2 format in current release. I don’t want to move to OpenBabel or other packages because I’m lazy ;P. So how about write one MOL2 writer?

I have found many email or issue of rdkit, and I found a pull request https://github.com/rdkit/rdkit/pull/415 after I write this script (Sad!). When I wrote these script, I haven’t notict this pull request, but I have read MOL2 scripts of UCSF Chimera and an introduction for MOL2 format. Chimera has many internal atom types which is perfectly matching MOL2 supported atom types, while rdkit don’t. For example, the amide bond. If I directly write bond type of amide bond, most MOL2 reader take this a rotatable single bond… Thus the only solution is use SMARTS pattern matching to assign MOL2 atom types.

MOL2 atom and bond types

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22

rdkit2sybyl = {
'6SP3False': 'C.3', '6SP2False': 'C.2', '6SPFalse': 'C.1', '6SP2True': 'C.ar',
'7SP3False': 'N.3', '7SP3False_4': 'N.4',
'7SP2False': 'N.2', '7SP2False_4':'N.2', '7SP2True': 'N.ar', '7SP2True_4': 'N.ar',
'7SP2False_am': 'N.am', 'N_pl':'N.pl',
'7SPFalse': 'N.1', '7SPFalse_4': 'N.1',
'8SP3False': 'O.3', '8SP2False': 'O.2', '8SP2False_co2': 'O.co2',
'9SP3False': 'F', '17SP3False':'Cl', '35SP3False': 'Br', '53SP3False': 'I',
'15SP3False':'P.3',
'16SP3False':'S.3', '16SP2False':'S.2', # '16SP2False':'S.O', '16SP2False':'S.O2',
'1UNSPECIFIEDFalse': 'H',
'11SFalse': 'Na', '12SFalse': 'Mg', '13SFalse': 'Al', '19SFalse': 'K', '20SFalse': 'Ca',
'26SFalse': 'Fe', '29SFalse': 'Cu', '30SFalse': 'Zn',
}

idx2element = {1:'H', 6:'C', 7:'N', 8:'O', 9:'F', 15:'P', 16:'S',
11:'Na', 12:'Mg', 13:'Al', 14:'Si', 3:'Li', 4:'P', 5:'B',
19:'K', 20:'Ca', 26:'Fe', 29:'Cu', 30:'Zn',
17:'Cl', 35:'Br', 53:'I'}

bond2type = {1.0:'1', 1.5:'ar', 2.0:'2', 3.0:'3'}

I use a structure like {AtomIdx}{SP?}{IsAromatic} to identify sybyl atom type, like ‘6SP3False’ is a SP3 carboclic carbon. But these kind of symbl is not enough for many special type of atoms, like amide bond and planar amide. I have to use predifined pattern to recognize these special atom type, like:

1
2
3
4
5
6
amide_smarts = '[OX1]=CN'
amide_smarts = Chem.MolFromSmarts(amide_smarts)
amide_pl_smarts = '[#7]-,=[#6]-,=[#7]'
amide_pl_smarts = Chem.MolFromSmarts(amide_pl_smarts)
carboxylate_smarts = '[#6]C(=O)[O;H,-]'
carboxylate_smarts = Chem.MolFromSmarts(carboxylate_smarts)

MOL2 format

Detailed MOL2 format are refereced from here, or download from this link. I decide to use a simple MOL2 format, which means I ignore most parts of MOL2 file, and only preserve @<TRIPOS>MOLECULE, @<TRIPOS>ATOM, @<TRIPOS>BOND and @<TRIPOS>SUBSTRUCTURE. For atoms and bonds, I travese all of them for labeling special atoms and bonds.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109

# atom_id atom_name x y z atom_type subst_id subst_name charge
atom_base_str = '{:>7d} {:<4s} {:>9.4f} {:>9.4f} {:>9.4f} {:<5s} {:>2d} {:>3s} {:>9.4f}\n'

# bond_id origin_atom_id target_atom_id bond_type
bond_base_str = '{:>6d} {:>4d} {:>4d} {:<2s}\n'


def mol_to_mol2_block(mol, mol_name):
"""Write molecule in MOL2 file as string

Args:
mol: rdMol, molecule
mol_name: str, molecule name to set as name in mol2 file
"""
mol2_blk = ''

mol2_blk += '@<TRIPOS>MOLECULE\n{}\n{:<3d} {:<3d} 1 0 0\nSMALL\nGASTEIGER\n'.format(
mol_name, mol.GetNumAtoms(), mol.GetNumBonds())

conf = mol.GetConformer(-1)

mol2_blk += '@<TRIPOS>ATOM\n'

# processing guadinium bonds TODO
# pass

# nitrogen trigonal planar
if mol.HasSubstructMatch(amide_pl_smarts):
amide_pl_flag = True
amide_pl_matched_atoms = []
for pair in mol.GetSubstructMatches(amide_pl_smarts):
amide_pl_matched_atoms.extend(pair)
else:
amide_pl_flag = False

# processing amide bonds
if mol.HasSubstructMatch(amide_smarts):
amide_flag = True
amide_matched_atoms = []
for pair in mol.GetSubstructMatches(amide_smarts):
amide_matched_atoms.extend(pair)
else:
amide_flag = False

# processing carboxyllate O
if mol.HasSubstructMatch(carboxylate_smarts):
carboxy_flag = True
carboxy_matched_atoms = []
for pair in mol.GetSubstructMatches(carboxylate_smarts):
carboxy_matched_atoms.extend(pair)
else:
carboxy_flag = False

# travese all atoms and set atom type
for atom in mol.GetAtoms():
#if atom.GetAtomicNum() != 8: continue
key = str(atom.GetAtomicNum()) + str(atom.GetHybridization()) + str(atom.GetIsAromatic())

# N.4
if atom.GetAtomicNum() == 7:
if atom.GetFormalCharge()==1:
key += '_4' # N.4 and N.pl
elif amide_flag:
if atom.GetIdx() in amide_matched_atoms:
key += '_am' # N.am
elif amide_pl_flag:
if atom.GetIdx() in amide_pl_matched_atoms:
key = 'N_pl' # N.pl


# O in CO2
if atom.GetAtomicNum() == 8 and carboxy_flag:
if atom.GetIdx() in carboxy_matched_atoms:
key += '_co2'

atom_id = atom.GetIdx() + 1
atom_name = idx2element[atom.GetAtomicNum()]
coord = conf.GetAtomPosition(atom.GetIdx())
atom_type = rdkit2sybyl[key]
subst_id = 1
subst_name = 'UNL'
charge = 0.00

mol2_blk += atom_base_str.format(atom_id, atom_name, coord.x, coord.y, coord.z,
atom_type, subst_id, subst_name, charge)

mol2_blk += '@<TRIPOS>BOND\n'

# traverse all bonds
for bond in mol.GetBonds():
bond_id = bond.GetIdx()+1
origin_atom_id = bond.GetBeginAtomIdx()+1
target_atom_id = bond.GetEndAtomIdx()+1
bond_type = bond2type[bond.GetBondTypeAsDouble()]
if amide_flag:
origin_atom = mol.GetAtomWithIdx(origin_atom_id-1).GetAtomicNum()
target_atom = mol.GetAtomWithIdx(target_atom_id-1).GetAtomicNum()
if bond.GetBeginAtomIdx() in amide_matched_atoms \
and bond.GetEndAtomIdx() in amide_matched_atoms \
and (origin_atom == 7 or target_atom == 7):
bond_type = 'am'

mol2_blk += bond_base_str.format(bond_id, origin_atom_id, target_atom_id, bond_type)

mol2_blk += '@<TRIPOS>SUBSTRUCTURE\n 1 UNL 1\n'

return mol2_blk

3D coordinates and Gastiger charge support

I also found another email of rdkit asking ‘writing a Tripos MOL2 file with charges’, and I have tried to add rdkit calculated Gasteiger charges in MOL2 file.

1
2
3
4
5
6
7
8
9
10
11
12
13
def smi2conf(smiles, charge=True):
mol = Chem.MolFromSmiles(smiles)
if mol is not None:
mol = Chem.AddHs(mol)
AllChem.EmbedMolecule(mol)
AllChem.MMFFOptimizeMolecule(mol, maxIters=1000)
mol.SetProp("SMILES", smiles)
# compute Gasteiger charges
if charge:
AllChem.ComputeGasteigerCharges(mol)
else:
return None
return mol

Then add atom charges in MOL2 file:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26

def mol_to_mol2_block(mol, mol_name):

......

if mol.HasSubstructMatch(amide_smarts):

# travese all atoms and set atom type
for atom in mol.GetAtoms():
#if atom.GetAtomicNum() != 8: continue
key = str(atom.GetAtomicNum()) + str(atom.GetHybridization()) + str(atom.GetIsAromatic())

......

atom_id = atom.GetIdx() + 1
atom_name = idx2element[atom.GetAtomicNum()]
coord = conf.GetAtomPosition(atom.GetIdx())
atom_type = rdkit2sybyl[key]
subst_id = 1
subst_name = 'UNL'
try:
charge = atom.GetPropsAsDict()['_GasteigerCharge']
except KeyError:
charge = 0.00

......

Future plan

In this script, I have make a function to convert rdkit.Mol to a Tripos MOL2 file format. It support:

  • most atom types and bond types
  • three major parts of MOL2 file
  • 3D conformation calculation
  • Gasteiger charge

There are still many unsupported features and weakness:

  • some atom types, including S.O, S.O2 and many metal ions
  • time consuming in conformation generation with metal ions (if input is SMILES)
  • each atom types require manual recording

It seems working fine in a small sample of drug-like molecules, you can download all code here. The scripts of # 415 is much clever and stable then mine, I shall update my scrips and test in more solid samples, like ZINC drug-like as the metioned in # 415. This pull request has been developed almost 4 years, since the developer is busy for his work, it might stopped there and integrated in rdkit.Contrib… I hope I can help to contribute some codes for debuging it.