Shape Similarity and Electroshape Similarity Calculation

Now I want to introduce the molecular shape comparison functions of the Open Drug Discovery Toolkit (oddt). The calculation requires a 3D structure for each molecule; I used Maestro to generate MOL2 files of the example molecules.

USR (Ultrafast Shape Recognition) - function usr(molecule)

Ballester PJ, Richards WG (2007). Ultrafast shape recognition to search compound databases for similar molecular shapes. Journal of computational chemistry, 28(10):1711-23. http://dx.doi.org/10.1002/jcc.20681

USRCAT (USR with Credo Atom Types) - function usr_cat(molecule)

Adrian M Schreyer, Tom Blundell (2012). USRCAT: real-time ultrafast shape recognition with pharmacophoric constraints. Journal of Cheminformatics, 2012 4:27. http://dx.doi.org/10.1186/1758-2946-4-27

Electroshape - function electroshape(molecule)

Armstrong, M. S. et al. (2010). ElectroShape: fast molecular similarity calculations incorporating shape, chirality and electrostatics. Journal of Computer-Aided Molecular Design, 24:789-801. http://dx.doi.org/10.1007/s10822-010-9374-0
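To make the idea behind USR concrete, here is a minimal pure-Python sketch of a 12-value USR-style descriptor. It follows the scheme described in the USR paper: distances from every atom to four reference points (the centroid, the atom closest to the centroid, the atom farthest from the centroid, and the atom farthest from that one), each distance distribution summarized by three moments. The function names `usr_sketch` and `_moments` are hypothetical, and the moment definitions used here (mean, standard deviation, signed cube root of the third central moment) are an assumption; the paper and individual toolkits, oddt included, may normalize them differently.

```python
import math


def _moments(values):
    """Return mean, standard deviation and signed cube root of the
    third central moment of a list of distances (assumed definitions)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    # Signed cube root so the value keeps the units of a distance
    cbrt = math.copysign(abs(third) ** (1.0 / 3.0), third)
    return [mean, math.sqrt(var), cbrt]


def usr_sketch(coords):
    """12-value USR-style descriptor from a list of (x, y, z) atom positions."""
    n = len(coords)
    # Reference point 1: molecular centroid
    ctd = tuple(sum(c[k] for c in coords) / n for k in range(3))
    d_ctd = [math.dist(c, ctd) for c in coords]
    # Reference points 2 and 3: atoms closest to and farthest from the centroid
    cst = coords[d_ctd.index(min(d_ctd))]
    fct = coords[d_ctd.index(max(d_ctd))]
    # Reference point 4: atom farthest from reference point 3
    d_fct = [math.dist(c, fct) for c in coords]
    ftf = coords[d_fct.index(max(d_fct))]

    descriptor = []
    for ref in (ctd, cst, fct, ftf):
        descriptor += _moments([math.dist(c, ref) for c in coords])
    return descriptor
```

Because only interatomic distances enter the descriptor, it does not change when the molecule is translated or rotated, which is why no alignment step is needed before comparing two molecules.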

Calculate the ElectroShape of the molecules

Next, we import the oddt package and read the files into Python. toolkit.readfile returns a generator, so we use next to take the first molecule from each file and then calculate its ElectroShape descriptor. All compounds are co-crystallized ligands of the human Smoothened receptor (SMO); you can find them on RCSB PDB. I extracted their 3D coordinates from the PDB files and saved them as MOL2 files.

from oddt import toolkit
from oddt import shape

sant1 = shape.electroshape(next(toolkit.readfile('sdf', 'mol/sant1.sdf')))
cyc = shape.electroshape(next(toolkit.readfile('sdf', 'mol/cyc.sdf')))
ly = shape.electroshape(next(toolkit.readfile('sdf', 'mol/ly.sdf')))
sag = shape.electroshape(next(toolkit.readfile('sdf', 'mol/sag.sdf')))
vis = shape.electroshape(next(toolkit.readfile('sdf', 'mol/vis.sdf')))
chl = shape.electroshape(next(toolkit.readfile('sdf', 'mol/chl.sdf')))
ohc = shape.electroshape(next(toolkit.readfile('sdf', 'mol/ohc.sdf')))

# Descriptors and display names, kept in the same order
ligfps = (sant1, ly, sag, vis, cyc, chl, ohc)
ligname = ('Sant-1', 'LY2940680', 'SAG1.5', 'Vismodegib', 'Cyclopamine', 'Cholesterol', '20(S)-OHC')

Then we use shape.usr_similarity to calculate the similarity of each pair of compounds.

import matplotlib.pyplot as plt
import numpy as np

# 7 x 7 matrix of pairwise similarity scores
mat = np.zeros((7, 7))
for i in range(7):
    for j in range(7):
        mat[i, j] = shape.usr_similarity(ligfps[i], ligfps[j])

# Darker cells mean higher similarity
plt.pcolor(mat, cmap=plt.cm.Blues)
plt.show()
plt.close()

Fig. 1
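For orientation, the score defined in the USR paper is an inverse Manhattan distance between two descriptor vectors, so identical descriptors score 1 and the score falls towards 0 as they diverge. The sketch below is a plain-Python illustration of that formula, not a copy of oddt's code. The name `usr_score` is hypothetical, and dividing by the vector length (rather than a fixed 12) is my assumption for descriptors that are not 12 values long; shape.usr_similarity may weight or normalize differently.

```python
def usr_score(a, b):
    """Inverse Manhattan-distance score between two equal-length descriptors."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    manhattan = sum(abs(x - y) for x, y in zip(a, b))
    # Assumption: normalize by the number of descriptor values
    return 1.0 / (1.0 + manhattan / len(a))
```

The score is symmetric, so the similarity matrix is symmetric too, and its diagonal is always 1.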

Conclusion

LY2940680 is clearly the most distinct ligand among these structures. All compounds except cholesterol and 20(S)-OHC differ from each other, which reflects how the ligands were selected for crystallization. Another key conclusion is that cyclopamine is similar to cholesterol and 20(S)-OHC (20(S)-hydroxycholesterol), which is in accordance with the biochemical and crystallographic evidence.
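Reading the "most distinct ligand" off a heatmap can be backed up with a number: the mean similarity of each ligand to all the others, where the lowest mean marks the most distinct one. The sketch below shows one way to do that. The helper name `mean_similarity_to_others` is hypothetical, and the three-by-three matrix is made-up toy data with placeholder names, not the values from the figure.

```python
def mean_similarity_to_others(names, matrix):
    """Average each row of a similarity matrix, skipping the diagonal."""
    n = len(names)
    result = {}
    for i, name in enumerate(names):
        others = [matrix[i][j] for j in range(n) if j != i]
        result[name] = sum(others) / len(others)
    return result


# Toy data for illustration only (not results from this post)
toy_names = ("A", "B", "C")
toy_matrix = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.3],
    [0.2, 0.3, 1.0],
]

scores = mean_similarity_to_others(toy_names, toy_matrix)
most_distinct = min(scores, key=scores.get)  # "C" for the toy matrix
```

With the variables from this post, the same function could be called with ligname and mat.tolist().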

USR and USR_CAT calculation

Next, similar results for the other comparison functions:

USR

sant1 = shape.usr(next(toolkit.readfile('sdf', 'mol/sant1.sdf')))
cyc = shape.usr(next(toolkit.readfile('sdf', 'mol/cyc.sdf')))
ly = shape.usr(next(toolkit.readfile('sdf', 'mol/ly.sdf')))
sag = shape.usr(next(toolkit.readfile('sdf', 'mol/sag.sdf')))
vis = shape.usr(next(toolkit.readfile('sdf', 'mol/vis.sdf')))
chl = shape.usr(next(toolkit.readfile('sdf', 'mol/chl.sdf')))
ohc = shape.usr(next(toolkit.readfile('sdf', 'mol/ohc.sdf')))

ligfps = (sant1, ly, sag, vis, cyc, chl, ohc)
ligname = ('Sant-1', 'LY2940680', 'SAG1.5', 'Vismodegib', 'Cyclopamine', 'Cholesterol', '20(S)-OHC')

# 7 x 7 matrix of pairwise USR similarity scores
mat = np.zeros((7, 7))
for i in range(7):
    for j in range(7):
        mat[i, j] = shape.usr_similarity(ligfps[i], ligfps[j])

plt.pcolor(mat, cmap=plt.cm.Blues)
plt.show()
plt.close()

Fig. 2
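The same double loop is used for every descriptor type, so it can be factored into a small helper that takes the list of descriptors and the scoring function as arguments. This is only a sketch: `pairwise_matrix` is a hypothetical name, and it returns nested lists rather than a NumPy array so that it has no dependencies.

```python
def pairwise_matrix(items, similarity):
    """Return a square nested list with similarity(a, b) for every pair of items."""
    return [[similarity(a, b) for b in items] for a in items]
```

In this post's setting it would be called as pairwise_matrix(ligfps, shape.usr_similarity), and wrapping the result in np.array gives a matrix that can be passed to plt.pcolor as before.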

USR_CAT

sant1 = shape.usr_cat(next(toolkit.readfile('sdf', 'mol/sant1.sdf')))
cyc = shape.usr_cat(next(toolkit.readfile('sdf', 'mol/cyc.sdf')))
ly = shape.usr_cat(next(toolkit.readfile('sdf', 'mol/ly.sdf')))
sag = shape.usr_cat(next(toolkit.readfile('sdf', 'mol/sag.sdf')))
vis = shape.usr_cat(next(toolkit.readfile('sdf', 'mol/vis.sdf')))
chl = shape.usr_cat(next(toolkit.readfile('sdf', 'mol/chl.sdf')))
ohc = shape.usr_cat(next(toolkit.readfile('sdf', 'mol/ohc.sdf')))

ligfps = (sant1, ly, sag, vis, cyc, chl, ohc)
ligname = ('Sant-1', 'LY2940680', 'SAG1.5', 'Vismodegib', 'Cyclopamine', 'Cholesterol', '20(S)-OHC')

# 7 x 7 matrix of pairwise USRCAT similarity scores
mat = np.zeros((7, 7))
for i in range(7):
    for j in range(7):
        mat[i, j] = shape.usr_similarity(ligfps[i], ligfps[j])

plt.pcolor(mat, cmap=plt.cm.Blues)
plt.show()
plt.close()

Fig. 3

We get the same result as with ElectroShape similarity. Although you can use Tanimoto similarity on 2D molecular fingerprints, I would still recommend the ElectroShape comparison function in oddt as an option.
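For comparison, Tanimoto similarity on 2D fingerprints is the number of shared "on" bits divided by the number of bits set in either fingerprint. A minimal sketch, assuming each fingerprint is given as a collection of on-bit indices; the function name `tanimoto` and the convention of returning 1.0 for two empty fingerprints are my own choices, not something taken from oddt.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity between two collections of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Convention chosen here: two empty fingerprints count as identical
        return 1.0
    return len(a & b) / len(union)
```

Unlike the shape descriptors above, this score depends only on which substructure bits are present, so it carries no information about 3D conformation.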