Saving a MOL2 file from an RDKit Mol
My company is now using a docking engine that requires MOL2 as input. Since I use RDKit a lot, I want to use it to save MOL2 files, but RDKit doesn't support writing the Tripos MOL2 format in the current release. I don't want to move to OpenBabel or other packages because I'm lazy ;P. So how about writing a MOL2 writer myself?
I searched the RDKit mailing list and issue tracker, and only after writing this script did I find a pull request, https://github.com/rdkit/rdkit/pull/415 (sad!). Before writing it, I had read the MOL2 scripts of UCSF Chimera and an introduction to the MOL2 format. Chimera has many internal atom types that map directly onto the atom types MOL2 supports, while RDKit doesn't. Take the amide bond as an example: if I write out the bond type that RDKit reports for an amide bond directly, most MOL2 readers treat it as a rotatable single bond… So the only solution is to use SMARTS pattern matching to assign MOL2 atom types.
MOL2 atom and bond types
I use a key of the form {AtomIdx}{SP?}{IsAromatic} (atomic number, hybridization, aromaticity flag) to identify the Sybyl atom type; for example, '6SP3False' is a non-aromatic SP3 carbon. But this kind of key is not enough for many special atom types, such as the amide bond and the planar amide. I have to use predefined patterns to recognize these special atom types, like:
amide_smarts = '[OX1]=CN'
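To make the idea concrete, here is a minimal sketch of the two-step typing: a key lookup followed by a SMARTS override. The table `KEY_TO_SYBYL` and the function `sybyl_types` are hypothetical names with only a handful of illustrative entries, not the full table from my script, and the fallback to the bare element symbol is an assumption.

```python
from rdkit import Chem

# Hypothetical lookup table with a few illustrative entries only.
# Key layout: atomic number + hybridization + aromatic flag.
KEY_TO_SYBYL = {
    "6SP3False": "C.3",
    "6SP2False": "C.2",
    "6SPFalse": "C.1",
    "6SP2True": "C.ar",
    "7SP3False": "N.3",
    "7SP2True": "N.ar",
    "8SP3False": "O.3",
    "8SP2False": "O.2",
}

# The amide pattern quoted above; query atoms are ordered O, C, N.
AMIDE = Chem.MolFromSmarts("[OX1]=CN")


def sybyl_types(mol):
    """Return one Sybyl-style atom type per atom of an RDKit Mol."""
    types = []
    for atom in mol.GetAtoms():
        key = "{}{}{}".format(
            atom.GetAtomicNum(),
            str(atom.GetHybridization()),  # e.g. "SP3"
            atom.GetIsAromatic(),          # True / False
        )
        # Assumption: unknown keys fall back to the element symbol.
        types.append(KEY_TO_SYBYL.get(key, atom.GetSymbol()))

    # Special case: the nitrogen of every amide match becomes N.am.
    for _o_idx, _c_idx, n_idx in mol.GetSubstructMatches(AMIDE):
        types[n_idx] = "N.am"
    return types
```

For acetamide, `sybyl_types(Chem.MolFromSmiles("CC(N)=O"))` types the nitrogen as `N.am` regardless of what the generic key would have produced. The same override loop could be repeated for other special patterns.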
MOL2 format
The details of the MOL2 format are taken from here, or can be downloaded from this link. I decided to write a simple MOL2 file, which means I ignore most sections of the format and only keep @<TRIPOS>MOLECULE, @<TRIPOS>ATOM, @<TRIPOS>BOND and @<TRIPOS>SUBSTRUCTURE. For atoms and bonds, I traverse all of them to label the special atoms and bonds.
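As an illustration of how little is needed for those four sections, here is a minimal sketch that builds the text from plain Python records. It is not the code from my script: the function name `mol2_block`, the residue name `UNL`, the molecule type `SMALL` and the column widths are all assumptions chosen for the example.

```python
def mol2_block(name, atoms, bonds, charge_type="NO_CHARGES"):
    """Build a minimal MOL2 string with four record types.

    atoms: list of (atom_name, x, y, z, sybyl_type, charge)
    bonds: list of (begin_id, end_id, bond_type) with 1-based atom ids;
           bond_type is a MOL2 bond type such as "1", "2", "3", "am", "ar".
    """
    lines = [
        "@<TRIPOS>MOLECULE",
        name,
        # Counts: atoms, bonds, substructures, features, sets.
        f"{len(atoms)} {len(bonds)} 1 0 0",
        "SMALL",       # molecule type (assumed)
        charge_type,   # e.g. NO_CHARGES, GASTEIGER, USER_CHARGES
        "",
    ]

    lines.append("@<TRIPOS>ATOM")
    for i, (aname, x, y, z, atype, charge) in enumerate(atoms, start=1):
        # id, name, x, y, z, type, substructure id, substructure name, charge
        lines.append(
            f"{i:>7} {aname:<8}{x:>10.4f}{y:>10.4f}{z:>10.4f} "
            f"{atype:<6}{1:>4} UNL {charge:>10.4f}"
        )

    lines.append("@<TRIPOS>BOND")
    for i, (begin, end, btype) in enumerate(bonds, start=1):
        lines.append(f"{i:>6}{begin:>6}{end:>6} {btype:>4}")

    lines.append("@<TRIPOS>SUBSTRUCTURE")
    # One substructure rooted at atom 1 (assumed layout).
    lines.append("     1 UNL         1 GROUP")
    return "\n".join(lines) + "\n"
```

With an RDKit Mol, the `atoms` list would be filled from the conformer coordinates and the assigned Sybyl types, and the `bonds` list from `mol.GetBonds()` after relabeling amide and aromatic bonds as `am` and `ar`.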
3D coordinates and Gasteiger charge support
I also found another RDKit mailing-list email asking about 'writing a Tripos MOL2 file with charges', so I tried adding RDKit-calculated Gasteiger charges to the MOL2 file.
def smi2conf(smiles, charge=True):
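Only the signature is shown above, so here is a rough sketch of what a function with that signature could do. It is named `smi2conf_sketch` to make clear it is not the original; the `seed` parameter, the MMFF clean-up step and the error handling are my assumptions for the example.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def smi2conf_sketch(smiles, charge=True, seed=42):
    """SMILES -> RDKit Mol with hydrogens, one 3D conformer, optional charges."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")

    # Explicit hydrogens are needed for a sensible 3D geometry.
    mol = Chem.AddHs(mol)

    # EmbedMolecule returns the new conformer id, or -1 on failure.
    if AllChem.EmbedMolecule(mol, randomSeed=seed) < 0:
        raise RuntimeError(f"3D embedding failed for {smiles!r}")

    # Optional force-field clean-up (assumption); the return code is ignored,
    # so molecules without MMFF parameters keep the embedded geometry.
    AllChem.MMFFOptimizeMolecule(mol)

    if charge:
        # Stores a "_GasteigerCharge" property on every atom.
        AllChem.ComputeGasteigerCharges(mol)
    return mol
```

Afterwards the coordinates are available through `mol.GetConformer().GetAtomPosition(idx)` and each charge through `atom.GetDoubleProp("_GasteigerCharge")`.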
Then write the atom charges into the charge column of the @<TRIPOS>ATOM records.
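A small sketch of that last step, kept free of RDKit so it can be tried on its own. The helper names `charge_fields` and `atom_record` are hypothetical, and replacing non-finite charges with 0.0 is an assumption on my side, meant as a guard in case a charge calculation does not return a usable number for some atom.

```python
import math


def charge_fields(charges):
    """Return (charge_type, cleaned_charges) for the MOLECULE and ATOM records."""
    if charges is None:
        return "NO_CHARGES", None
    # Assumption: a non-finite charge (nan/inf) is written as 0.0.
    cleaned = [c if math.isfinite(c) else 0.0 for c in charges]
    return "GASTEIGER", cleaned


def atom_record(idx, name, xyz, sybyl_type, charge=None):
    """Format one ATOM line; the charge column is added only when given."""
    x, y, z = xyz
    record = f"{idx:>7} {name:<8}{x:>10.4f}{y:>10.4f}{z:>10.4f} {sybyl_type:<6}"
    if charge is not None:
        # The charge follows the substructure id and name ("1 UNL", assumed).
        record += f"{1:>4} UNL {charge:>10.4f}"
    return record
```

The charge type returned by `charge_fields` goes on the charge-type line of the @<TRIPOS>MOLECULE record, so that readers know how to interpret the last column of each atom line.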
Future plan
In this script, I wrote a function that converts an rdkit.Mol to the Tripos MOL2 file format. It supports:
- most atom types and bond types
- three major parts of MOL2 file
- 3D conformation calculation
- Gasteiger charge
There are still many unsupported features and weaknesses:
- some atom types are missing, including S.O, S.O2 and many metal ions
- conformation generation is time-consuming with metal ions (if the input is SMILES)
- each atom type has to be recorded manually
It seems to work fine on a small sample of drug-like molecules; you can download all the code here. The script in #415 is much cleverer and more stable than mine. I shall update my script and test it on a more solid sample, such as the ZINC drug-like set mentioned in #415. That pull request has been in development for almost 4 years; since the developer is busy with his work, it might stop there and be integrated into rdkit.Contrib… I hope I can contribute some code to help debug it.