The water solubility of a compound significantly affects its druggability and its absorption and distribution properties, such as oral bioavailability, intestinal absorption, and blood-brain barrier (BBB) penetration. Typically, low solubility goes along with poor absorption, so the general aim is to avoid poorly soluble compounds. For convenience, water solubility (in mol/L) is converted to its base-10 logarithm, LogS.
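As a quick illustration of the conversion, here is a minimal sketch using only the Python standard library. The function names are made up for this example and are not part of any toolkit.

```python
import math


def solubility_to_logs(solubility_mol_per_l: float) -> float:
    """Convert an aqueous solubility in mol/L to LogS (base-10 logarithm)."""
    if solubility_mol_per_l <= 0:
        raise ValueError("solubility must be a positive number in mol/L")
    return math.log10(solubility_mol_per_l)


def logs_to_solubility(logs: float) -> float:
    """Convert LogS back to a solubility in mol/L."""
    return 10.0 ** logs


# A compound that dissolves at 0.001 mol/L has a LogS of about -3.
print(solubility_to_logs(0.001))
```

Every unit of LogS is a tenfold change in solubility, so a compound with LogS = -5 is a hundred times less soluble than one with LogS = -3.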

There are two major approaches to predicting LogS: atom contribution methods and machine learning methods. An atom contribution method predicts solubility with an increment system, summing the contributions assigned to each atom according to its atom type. A machine learning method uses 2D or 3D features generated from molecular structures to fit a regression model for prediction.
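To make the increment idea concrete, here is a minimal sketch of an atom contribution calculation. The atom type names, the increment values, and the constant term are all hypothetical placeholders chosen for illustration; they are not fitted parameters from any published model, and a real method would assign atom types from the molecular structure automatically.

```python
from collections import Counter

# Hypothetical increments per atom type (placeholders, NOT fitted values).
ATOM_CONTRIBUTIONS = {
    "C.aliphatic": -0.5,
    "C.aromatic": -0.25,
    "O.hydroxyl": 1.0,
    "N.amine": 0.75,
}
BASE_VALUE = 0.5  # hypothetical constant term of the increment system


def predict_logs(atom_types):
    """Sum the per-atom increments for a molecule given as a list of atom types."""
    counts = Counter(atom_types)
    unknown = sorted(set(counts) - set(ATOM_CONTRIBUTIONS))
    if unknown:
        raise KeyError(f"no contribution defined for atom types: {unknown}")
    return BASE_VALUE + sum(ATOM_CONTRIBUTIONS[t] * n for t, n in counts.items())


# Heavy atoms of an ethanol-like molecule: two aliphatic carbons and one hydroxyl oxygen.
print(predict_logs(["C.aliphatic", "C.aliphatic", "O.hydroxyl"]))
```

The hard part of a real atom contribution method is the table itself: defining chemically meaningful atom types and fitting their increments to experimental data.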

The atom contribution method requires solid domain knowledge of cheminformatics, while the machine learning method can use an out-of-the-box cheminformatics toolkit to generate features for fitting models. Sounds easy, right? 😉

Here, we use Python with RDKit and scikit-learn (sklearn) to build a LogS prediction model trained on a public water solubility dataset.

Read more »

Computers handle two kinds of images: bitmaps and vector graphics. Vector graphics are well suited to illustrating chemical structures, which can be drawn easily with lines and text.

The RDKit package can draw chemical structures as bitmaps or vector graphics with only a few lines of code. Sometimes we want to dynamically render high-quality figures of molecules on the web, and the SVG format is the best choice for this.
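To show why a vector format fits chemical structures so well, here is a minimal hand-rolled sketch that builds an SVG from nothing but lines (bonds) and text (atom labels). It deliberately does not use RDKit's drawing API; the function name `molecule_to_svg` and the 2D coordinates are made up for illustration.

```python
from xml.sax.saxutils import escape


def molecule_to_svg(atoms, bonds, size=200):
    """Build an SVG string.

    atoms: list of (label, x, y); an empty label marks an unlabeled carbon vertex.
    bonds: list of (i, j) index pairs into atoms.
    """
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" '
        f'height="{size}" viewBox="0 0 {size} {size}">'
    ]
    # Bonds are plain line segments between atom positions.
    for i, j in bonds:
        _, x1, y1 = atoms[i]
        _, x2, y2 = atoms[j]
        parts.append(
            f'<line x1="{x1}" y1="{y1}" x2="{x2}" y2="{y2}" '
            f'stroke="black" stroke-width="2"/>'
        )
    # Labeled atoms get a white disc behind the text so bonds do not cross it.
    for label, x, y in atoms:
        if label:
            parts.append(f'<circle cx="{x}" cy="{y}" r="14" fill="white"/>')
            parts.append(
                f'<text x="{x}" y="{y}" text-anchor="middle" '
                f'dominant-baseline="central" font-size="16">{escape(label)}</text>'
            )
    parts.append("</svg>")
    return "\n".join(parts)


# Ethanol as a zigzag skeleton with made-up coordinates.
atoms = [("", 40, 120), ("", 100, 80), ("OH", 160, 120)]
bonds = [(0, 1), (1, 2)]
svg = molecule_to_svg(atoms, bonds)
print(svg)
```

Because the output is plain text, it can be embedded directly in an HTML page and scales to any size without losing sharpness.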

Read more »

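I recently read a paper from AstraZeneca R&D that combines deep learning with cheminformatics: Molecular De-Novo Design through Deep Reinforcement Learning. In this paper, the author Marcus Olivecrona uses an RNN to perform inverse QSAR and thereby design new active drug molecules. The method can also add various constraints through reinforcement learning, which allows precise control over the generated molecules. The author's GitHub project is here. Since the paper is fairly complex, I am taking some notes here to study it in depth.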

Read more »

Why We Hate Cilantro

I want to analyze SNPs in the coding regions of olfactory receptors. In 2012, researchers studied SNPs correlated with cilantro preference and found that one SNP, rs72921001, influences how people perceive cilantro. They reported that rs72921001 is a frequently occurring SNP (an A -> C change in the DNA sequence) in OR6A2, an olfactory receptor.

Reasonable, right? My first thought was that this could cause a missense mutation. However, when I searched for this SNP in Ensembl, I found that it is located in the upstream flanking region of OR10A2, and OR6A2 is a different gene… Now I think maybe the gene name has changed since then, and rs72921001 might influence the expression of OR10A2 at the gene level… Umm.

Can I get more information about SNPs in olfactory receptors? Let’s try with R!

Read more »

A protein contact map is a very helpful tool for representing a 3D protein structure in a 2D matrix format. More detail about contact maps can be found here. Contact maps can be used not only as illustrations of proteins but also as tools for predicting the structures of homologous proteins, especially those with low evolutionary homology. Some scientists have already applied deep learning to this problem and developed programs for structure prediction, such as RaptorX from TTIC.

There are several tools and programs on the internet for calculating and displaying contact maps; here is a collection of them. I had a project several months ago in which I wanted to generate contact maps for a molecular dynamics trajectory, so I wrote a Python script to do it. I love Python because I can always find useful packages on the internet. Here I found a package named MDTraj, and a package from Benjamin Rafferty (doi:10.4231/D35M62761)! Thanks to them, I could easily implement my idea.
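The idea behind a contact map can be sketched in a few lines of standard-library Python. This is not the MDTraj-based script from the post: it assumes one representative coordinate per residue (for example the Cα atom) and an 8 Å distance cutoff, which is a common convention but an assumption here. The coordinates below are made up, spaced 3.8 Å apart along a line like consecutive Cα atoms.

```python
import math


def contact_map(coords, cutoff=8.0):
    """Return a binary matrix: 1 where two residues lie within `cutoff` angstroms.

    coords: one (x, y, z) tuple per residue, e.g. the C-alpha position.
    cutoff: distance threshold in angstroms (8.0 is an assumed default).
    """
    n = len(coords)
    return [
        [1 if math.dist(coords[i], coords[j]) <= cutoff else 0 for j in range(n)]
        for i in range(n)
    ]


# Four made-up residues on a straight line, 3.8 angstroms apart.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
cmap = contact_map(coords)
for row in cmap:
    print(row)
```

For a trajectory, the same function would be applied to the coordinates of each frame, giving one matrix per frame or, after averaging, a contact frequency map.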

Read more »

Recently, I read a post on iwatobipen's blog and found a Python package for drug design called the Open Drug Discovery Toolkit (ODDT). Built on RDKit and Open Babel, it has two amazing functions:

  • ligand-protein interaction fingerprint
  • electronic shape similarity calculation

Here I want to show the interaction calculation functions. All kinds of interactions are covered, even halogen bonds, pi-pi stacking, and cation-pi interactions.

Read more »