March 29, 2008
Structure searching supported compound database based on MS Access
Why based on MS Access
MS Access is wildly used, easily available and almost the most user friendly database product. I know many stuffs are working on MS Excel for compound information management and there exists such commercial solutions, the reasons coincide.
The solution described here is for
- Small account of products catalog.
- Fast development needed.
- User interface and operation are crucial, or chemical information data is trivial part of the database.
Demonstration of functions
Platform and tools used
- Windows
- MS Access
- checkmol/matchmol
- JME Molecular Editor
- ChemAxon, Marvin, Molconvert
- Python Imaging Library (PIL)
Solution step by step
-
Sample data
SDF files are downloaded directly from Pubchem ftp, it’s free and public.
-
Import SDF file into Access
As the structure description part (mol file) exists inside each compound, it’s not easy to parse the SDF file to csv and other table based format. I write a simple Python script to transform SDF file to Access accepted XML file. You can create a desired table in Access database and export it to XML format to get the XML template.
When data are ready, just run import command from Access.
You can also try RDkit to manipuate SDF and structure information in Python, it’s professional.
-
Generation of structure pictures
When transforming SDF file with Python, some further more actions be taken to generate BMP structure pictures.
Use Molconvert to convert the mol file into PNG format
os.system( “molconvert png %s/%s.mol -o %s/%s.png” %(folder, molid, folder, molid) )
Use PIL to convert PNG to BMP
Image.open( “%s/%s.png”%(folder, molid)).save(”%s/%s.bmp”%(folder,molid))
BMP pictures are needed because native Access only accepts BMP picture as OLE object and displayed on form control.
-
Embed Ole object with VBA
Public Sub load_img()
Me.Recordset.MoveFirst
While Not Me.Recordset.EOF
Me.pic.OLETypeAllowed = acOLELinked
Me.pic.SourceDoc = CurrentProject.Path & “img” & Me.cid & “.bmp”
Me.pic.Action = acOLECreateLink
Me.pic.SizeMode = acOLESizeStretch
Me.Recordset.MoveNext
Wend
End Sub
-
JME as structure query input
JME is a java package and I don’t know any way to combine it with Access directly. So an webcontrol is introduced to the Access form and JME embedded on the source page of the webcontrol. The dom object visit is more simple than I had imaged.
mol = Me.WebBrowser2.Document.applets.Item(0).MolFile()
-
Implentation of sturcture search function
This is truely the key technic of the solution. It is powered by opensourced checkmol/machmol. Acturally, just after read the usage part of source code of its dll version, I begun to think about to do something on Access. Access database uses JETSQL engine, it’s not as powerful as T-SQL, but it supports VBA functions. VBA code can easily visit exernal dll function, so JETSQL could be extended greatly.
So, the sturcture search code is really simple
select * from sample_data where MatchMol(query_mol, mol) > 0
MatchMol function is pre-defined by dll apis of checkmol/matchmol.
Code copied from MATCHMOLDLL.pas
——————————————————————————–
Private Declare Sub mm_SetMol Lib “matchmolDLL.dll” (ByVal st As String)
Private Declare Sub mm_SetCurrentMolAsQuery Lib “matchmolDLL.dll” ()
Private Declare Function mm_Match Lib “matchmolDLL.dll” (ByVal Exact As Boolean) As Long
Private Declare Function mm_GetRings Lib “matchmolDLL.dll” () As Long
Private Declare Function mm_GetAtomRing Lib “matchmolDLL.dll” (ByVal AtomNumber As Long) As Long
Private Declare Sub mm_Version Lib “matchmolDLL.dll” (ByVal st As String)
Public Function MatchMol(Needle As String, Haystack As String, Optional ExactMatch As Boolean = False) As Boolean
Static oldNeedle As String
If oldNeedle <> Needle Then
oldNeedle = Needle
mm_SetMol Needle
mm_SetCurrentMolAsQuery
End If
mm_SetMol Haystack
If mm_Match(ExactMatch) <> 0 Then MatchMol = True
End Function
——————————————————————————–
Performance issues
As a small database, there are about 2700 compounds are imported.
- Ole picture in BMP format consumes much space. The database file size grows to 600MB.
- Substructure searching is too fast to recognize delay.
- If function group or fingerprint is introduced as database index, the substructure searching could archive great performance on large scale dataset.
- The data importing process is slow and the structure pictures generation process costs much more time and CPU. It took about half an hour to convert the 2700 BMP files on my thinkpad.
Other References
Appendix 1
Such Access DB can NOT serves for Asp.net or other applications connect through ODBC/DAO, for “ODBC and DAO do not use or know anything about the code modules inserted into an .mdb file by Access. Only Access recognizes the modules. ” announced by MSDN KB [Q166113] You cannot use user-defined modules through ODBC or DAO.
Filed by
charlie
at 11:29 pm under English, MS_ACCESS, chemoinformatics



我想做一个基于SQL server的子结构检索,现在刚刚开始,什么也不懂,能不能给我点建议,谢谢!
借鉴开源的思路。比如:
http://pgfoundry.org/docman/?group_id=1000059
http://mychem.sourceforge.net/
分别是基于PostgreSQL和MySQL的开源项目,实际上它们都是使用了OpenBabel作为化学计算的基础。
在SQL Server中可以使用Extended stored procedure或者 2005版本支持的CLR进行实现。
A good example.
I made a chemical databases by using MDL Direct, which is not a free software.