秃瓢

之前 之后
之前 之后

没哭,非常配合,完全出乎爸爸和姥姥的预料。不过也说明爸爸手艺高超。

嘿嘿,既然给我剃了个少林寺头,那我就练个无影手。

无影手

Structure searching supported compound database based on MS Access




Why based on MS Access

MS Access is wildly used, easily available and almost the most user friendly database product. I know many stuffs are working on MS Excel for compound information management and there exists such commercial solutions, the reasons coincide.

The solution described here is for

  • Small account of products catalog.
  • Fast development needed.
  • User interface and operation are crucial, or chemical information data is trivial part of the database.

Demonstration of functions


Platform and tools used

Solution step by step

  • Sample data

SDF files are downloaded directly from Pubchem ftp, it’s free and public.

  • Import SDF file into Access

As the structure description part (mol file) exists inside each compound, it’s not easy to parse the SDF file to csv and other table based format. I write a simple Python script to transform SDF file to Access accepted XML file. You can create a desired table in Access database and export it to XML format to get the XML template.

When data are ready, just run import command from Access.

You can also try RDkit to manipuate SDF and structure information in Python, it’s professional.

  • Generation of structure pictures

When transforming SDF file with Python, some further more actions be taken to generate BMP structure pictures.

Use Molconvert to convert the mol file into PNG format

os.system( “molconvert png %s/%s.mol -o %s/%s.png” %(folder, molid, folder, molid) )

Use PIL to convert PNG to BMP

Image.open( “%s/%s.png”%(folder, molid)).save(”%s/%s.bmp”%(folder,molid))

BMP pictures are needed because native Access only accepts BMP picture as OLE object and displayed on form control.

  • Embed Ole object with VBA

Public Sub load_img()
Me.Recordset.MoveFirst
While Not Me.Recordset.EOF
Me.pic.OLETypeAllowed = acOLELinked
Me.pic.SourceDoc = CurrentProject.Path & “img” & Me.cid & “.bmp”
Me.pic.Action = acOLECreateLink
Me.pic.SizeMode = acOLESizeStretch
Me.Recordset.MoveNext
Wend
End Sub

  • JME as structure query input

JME is a java package and I don’t know any way to combine it with Access directly. So an webcontrol is introduced to the Access form and JME embedded on the source page of the webcontrol. The dom object visit is more simple than I had imaged.

mol = Me.WebBrowser2.Document.applets.Item(0).MolFile()

  • Implentation of sturcture search function

This is truely the key technic of the solution. It is powered by opensourced checkmol/machmol. Acturally, just after read the usage part of source code of its dll version, I begun to think about to do something on Access. Access database uses JETSQL engine, it’s not as powerful as T-SQL, but it supports VBA functions. VBA code can easily visit exernal dll function, so JETSQL could be extended greatly.

So, the sturcture search code is really simple


select * from sample_data where MatchMol(query_mol, mol) > 0


MatchMol function is pre-defined by dll apis of checkmol/matchmol.

Code copied from MATCHMOLDLL.pas

——————————————————————————–
Private Declare Sub mm_SetMol Lib “matchmolDLL.dll” (ByVal st As String)
Private Declare Sub mm_SetCurrentMolAsQuery Lib “matchmolDLL.dll” ()
Private Declare Function mm_Match Lib “matchmolDLL.dll” (ByVal Exact As Boolean) As Long

Private Declare Function mm_GetRings Lib “matchmolDLL.dll” () As Long
Private Declare Function mm_GetAtomRing Lib “matchmolDLL.dll” (ByVal AtomNumber As Long) As Long
Private Declare Sub mm_Version Lib “matchmolDLL.dll” (ByVal st As String)

Public Function MatchMol(Needle As String, Haystack As String, Optional ExactMatch As Boolean = False) As Boolean
Static oldNeedle As String
If oldNeedle <> Needle Then
oldNeedle = Needle
mm_SetMol Needle
mm_SetCurrentMolAsQuery
End If

mm_SetMol Haystack
If mm_Match(ExactMatch) <> 0 Then MatchMol = True

End Function
——————————————————————————–

Performance issues

As a small database, there are about 2700 compounds are imported.

  • Ole picture in BMP format consumes much space. The database file size grows to 600MB.
  • Substructure searching is too fast to recognize delay.
  • If function group or fingerprint is introduced as database index, the substructure searching could archive great performance on large scale dataset.
  • The data importing process is slow and the structure pictures generation process costs much more time and CPU. It took about half an hour to convert the 2700 BMP files on my thinkpad.

Other References


Appendix 1

Such Access DB can NOT serves for Asp.net or other applications connect through ODBC/DAO, for “ODBC and DAO do not use or know anything about the code modules inserted into an .mdb file by Access. Only Access recognizes the modules. ” announced by MSDN KB [Q166113] You cannot use user-defined modules through ODBC or DAO.

windows search 4.0预览版发布

Windows Search 4.0 Preview

windows search以前叫msn desktop。自打vista,微软就很重视桌面搜索了。与之对应的更著名的桌面搜索软件是google desktop,实际上后者也更强大,更开放(windows search也是通过ifilter开放的)。

其实windows内置的搜索,很早很早以前就有了,叫index service。index service/ msn desktop/ vista/ windows search这一系列产品,貌似都是同样的技术基础。但是一个产品定位总是不准,形象一直变化,就显得不那么高档。这一点上来说,微软的市场能力还真不一定就比google强。

着两种桌面搜索产品以前都用过。说不上什么确切的理由,但是比较自然地用windows search替换了GDS。windows search在使用上,的确更贴近普通用户。贴两张图吧。

desktop-search-windows.png
desktop-search-google.png

Access中的事件和委托

实际上对.Net中的“委托(delegate)”的概念并不很懂。如果仅理解成自定义事件的话,在Access中也可以部分实现。

窗体2中的自定义事件FireFromF2被窗体1捕获处理,参数被传递。用法很简单,注意WithEvents关键字的使用。

accessevent1.png
accesseventf1.png

accesseventf21.png

上面的例子要求先f2开启状态下再打开f1才能成功注册事件(原因见最后的总结)。如果是子窗体的事件,就简单一些,应用更常见。

accesseventf31.png

总结

  • WithEvents设置了事件监听的钩子,这个钩子针对的是Object,是实例,而不是Class或类型。
  • 所以可以监听Application这样的全局物件,也可以监听某个具体的Form。但是不能对所有的Form(Access.Form类型)起作用。
  • VBA中不支持自动的up-casting;WithEvents也不支持对象数组。所以后期绑定的方法也基本行不通。

Random posts

  • Gridview排序和分页的基本机制
  • Python中的and和or
  • 心有灵犀啊
  • 一些厂商提供的数据仓库工具
  • 晨雾回龙观