April 15, 2007
Notes on Chemical Structure Similarity
The original SMILES specification was developed by Arthur Weininger and David Weininger in the late 1980s. It has since been modified and extended by others, most notably by Daylight Chemical Information Systems Inc.
SMILES also has a wide base of software support, with extensive theoretical (e.g., graph theory) backing.
A common application of Canonical SMILES is for indexing and ensuring uniqueness of molecules in a database.
In terms of a graph-based computational procedure, SMILES is a string obtained by printing the symbol nodes encountered in a depth-first tree traversal of a chemical graph. The chemical graph is first trimmed to remove hydrogen atoms and cycles are broken to turn it into a spanning tree.
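The procedure described above (trim hydrogens, break cycles into a spanning tree, print atoms in depth-first order) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `to_smiles` and the `(atoms, bonds)` input layout are made up for this note, the molecule is assumed to be connected, and only unbracketed atoms with single, double, or triple bonds are handled (no aromaticity, charges, or stereochemistry). The output is a valid SMILES string but not a canonical one, since it depends on the input atom order.

```python
from collections import defaultdict

BOND_SYMBOL = {1: "", 2: "=", 3: "#"}

def to_smiles(atoms, bonds):
    """atoms: list of element symbols; bonds: list of (i, j, order) tuples."""
    # Trim hydrogens: keep only heavy atoms and the bonds between them.
    heavy = [i for i, element in enumerate(atoms) if element != "H"]
    heavy_set = set(heavy)
    adjacency = defaultdict(list)
    for i, j, order in bonds:
        if i in heavy_set and j in heavy_set:
            adjacency[i].append((j, order))
            adjacency[j].append((i, order))

    visited = set()
    seen_edges = set()
    children = defaultdict(list)   # spanning-tree edges, in visit order
    closures = defaultdict(list)   # ring-closure labels attached to each atom
    ring_count = 0

    def build_tree(node):
        # First pass: depth-first search that separates tree edges from
        # the edges that must be broken to remove cycles.
        nonlocal ring_count
        visited.add(node)
        for neighbor, order in adjacency[node]:
            edge = frozenset((node, neighbor))
            if edge in seen_edges:
                continue
            seen_edges.add(edge)
            if neighbor in visited:
                # Back edge: break the cycle here and label both ends.
                ring_count += 1
                label = str(ring_count) if ring_count < 10 else f"%{ring_count}"
                closures[neighbor].append(BOND_SYMBOL[order] + label)
                closures[node].append(label)
            else:
                children[node].append((neighbor, order))
                build_tree(neighbor)

    def emit(node):
        # Second pass: print the atom, its ring-closure labels, then its
        # subtrees; every subtree except the last goes in parentheses.
        text = atoms[node] + "".join(closures[node])
        branches = children[node]
        for position, (child, order) in enumerate(branches):
            piece = BOND_SYMBOL[order] + emit(child)
            is_last = position == len(branches) - 1
            text += piece if is_last else f"({piece})"
        return text

    start = heavy[0]  # assumes at least one heavy atom
    build_tree(start)
    return emit(start)

# Ethanol with one explicit hydrogen on the oxygen, acetic acid, cyclohexane.
ethanol = to_smiles(["C", "C", "O", "H"], [(0, 1, 1), (1, 2, 1), (2, 3, 1)])
acetic_acid = to_smiles(["C", "C", "O", "O"], [(0, 1, 1), (1, 2, 2), (1, 3, 1)])
cyclohexane = to_smiles(["C"] * 6, [(i, (i + 1) % 6, 1) for i in range(6)])
print(ethanol, acetic_acid, cyclohexane)  # CCO CC(=O)O C1CCCCC1
```

Two passes are used because a ring-closure digit has to be written at the first atom of the broken bond, before the traversal has reached the second one.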
SMARTS is a modification of SMILES that allows, in addition to the SMILES elements, the specification of wildcard atoms and bonds. It is widely used to specify query structures in chemical database search applications.
Improved SMILES Substructure Searching, by Daylight
Daylight Theory Manual - Covering general information on representing molecules and an in-depth discussion of SMILES™, SMARTS®, SMIRKS®, fingerprints, THOR database concepts, and Merlin analysis
SMARTS - A Language for Describing Molecular Patterns
Fingerprints - Screening and Similarity
OpenBabel, including an implementation of the Daylight SMARTS molecular matching syntax
makefp is a command-line program that computes hashed path fingerprints from input SMILES or from other file formats such as SDF or MOL files.
Checkmol is a command-line utility program which reads molecular structure files in different formats (see below) and analyzes the input molecule for the presence of various functional groups and structural elements.
Search by Functional groups
PubChem's similarity search finds chemical structures similar to the provided query. Similarity is measured using the Tanimoto equation and a binary fingerprint computed for every structure in the PubChem Compound database. This fingerprint consists of a series of chemical substructure “keys”. Each key denotes the presence or absence of a particular substructure in a molecule. The fingerprint does not consider variation in stereochemical or isotopic information. Collectively, these binary keys provide a “fingerprint” of a particular chemical structure's valence-bond form.
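For two binary fingerprints, the Tanimoto coefficient is c / (a + b - c), where a and b are the numbers of keys set in each fingerprint and c is the number set in both. A minimal sketch follows, assuming each fingerprint is stored as a Python integer used as a bit set. The function name `tanimoto` and the 6-bit example fingerprints are made up for illustration and are not PubChem's actual keys. Returning 0.0 when both fingerprints are empty is a convention chosen here, not something the source specifies.

```python
def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto similarity of two binary fingerprints stored as integers."""
    common = bin(fp_a & fp_b).count("1")   # keys set in both (c)
    union = bin(fp_a | fp_b).count("1")    # keys set in either (a + b - c)
    if union == 0:
        return 0.0  # assumed convention for two empty fingerprints
    return common / union

# Hypothetical 6-key fingerprints: 3 keys in common, 5 keys set overall.
query_fp = 0b101101
candidate_fp = 0b100111
score = tanimoto(query_fp, candidate_fp)
print(score)  # 0.6
```

A score of 1.0 means the two fingerprints are identical, which does not guarantee identical molecules, since different structures can set the same keys.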
PubChem's substructure search locates chemical structures that contain the connectivity and valence-bond pattern provided in the query. For example, a substructure search for ethanol (SMILES: OCC) would return, among others, acetic acid (SMILES: OC(=O)C), since ethanol's O-C-C skeleton is a substructure of acetic acid.
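The ethanol example can be checked with a naive backtracking matcher over hand-built molecular graphs. This is only a sketch of the idea, not PubChem's method: the names `make_bonds` and `find_match` are made up for this note, hydrogens are left out of the graphs, and atoms match only when element and bond order agree exactly. Real toolkits add fingerprint screening and much smarter matching on top of this.

```python
def make_bonds(triples):
    """Turn (i, j, order) triples into a lookup keyed by the unordered atom pair."""
    return {frozenset((i, j)): order for i, j, order in triples}

def find_match(q_atoms, q_bonds, t_atoms, t_bonds):
    """Return a {query_atom: target_atom} mapping, or None if there is no match."""
    mapping = {}

    def compatible(qi, ti):
        # Every query bond from qi to an already-mapped atom must exist
        # in the target with the same bond order.
        for qj, tj in mapping.items():
            wanted = q_bonds.get(frozenset((qi, qj)))
            if wanted is not None and wanted != t_bonds.get(frozenset((ti, tj))):
                return False
        return True

    def extend(qi):
        if qi == len(q_atoms):
            return True  # all query atoms are placed
        for ti, element in enumerate(t_atoms):
            if element != q_atoms[qi] or ti in mapping.values():
                continue
            if compatible(qi, ti):
                mapping[qi] = ti
                if extend(qi + 1):
                    return True
                del mapping[qi]  # backtrack and try the next target atom
        return False

    return dict(mapping) if extend(0) else None

# Ethanol (OCC) and acetic acid (OC(=O)C), heavy atoms only.
ethanol_atoms = ["O", "C", "C"]
ethanol_bonds = make_bonds([(0, 1, 1), (1, 2, 1)])
acid_atoms = ["O", "C", "O", "C"]
acid_bonds = make_bonds([(0, 1, 1), (1, 2, 2), (1, 3, 1)])

hit = find_match(ethanol_atoms, ethanol_bonds, acid_atoms, acid_bonds)
miss = find_match(acid_atoms, acid_bonds, ethanol_atoms, ethanol_bonds)
print(hit, miss)  # {0: 0, 1: 1, 2: 3} None
```

The target may have extra atoms and bonds (here the C=O), which is why ethanol is found in acetic acid but not the other way round.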
OpenEye software
Roll Your Own Chemical Database With Free Components
Creating a Web-based, Searchable Molecular Structure Database Using Free Software
How to create a web-based molecular structure database with free software, a fine presentation to read.
Filed by
charlie
at 4:28 pm under Engineer, chemoinformatics, resource

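Hello: I read your article and would like to ask you something. You seem to be very familiar with SMILES; do you know how to implement structure search based on SMILES strings?
SMILES cannot be used directly for structure search. Daylight developed SMARTS on top of SMILES: “an expansion of SMILE allowing specification of molecular patterns and properties for substructure searching with varying levels of specificity”. However, SMARTS queries over large amounts of data are bound to be slow.
SMILES is an encoding method first developed by Dr. David Weininger, who is also the President of Daylight.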