Chemical Structure Similarity Notes

SMILES on Wikipedia

The original SMILES specification was developed by Arthur Weininger and David Weininger in the late 1980s. It has since been modified and extended by others, most notably by Daylight Chemical Information Systems Inc.

SMILES also has a wide base of software support and extensive theoretical backing (e.g., graph theory).

A common application of Canonical SMILES is for indexing and ensuring uniqueness of molecules in a database.

In terms of a graph-based computational procedure, SMILES is a string obtained by printing the symbol nodes encountered in a depth-first tree traversal of a chemical graph. The chemical graph is first trimmed to remove hydrogen atoms and cycles are broken to turn it into a spanning tree.
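
The traversal described above can be sketched in a few lines of Python. This is only a minimal illustration, not a full SMILES writer: the function name `write_smiles` and the input layout (a list of element symbols plus a list of `(i, j, bond_order)` tuples) are assumptions made for this example, and aromaticity, charges, stereochemistry, canonical atom ordering, and rings beyond nine closures are all ignored.

```python
from collections import defaultdict

BOND_SYMBOL = {1: "", 2: "=", 3: "#"}  # single bonds are implicit in SMILES


def write_smiles(atoms, bonds, start=0):
    """Write a SMILES-like string for a hydrogen-suppressed molecular graph.

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (i, j, order) tuples over atom indices
    """
    adj = defaultdict(list)
    for i, j, order in bonds:
        adj[i].append((j, order))
        adj[j].append((i, order))

    # Pass 1: depth-first search. Bonds that reach a new atom become tree
    # edges; bonds that reach an already visited atom close a ring and are
    # "broken" by giving both ends the same ring-closure digit.
    parent = {start: None}
    children = defaultdict(list)
    closures = defaultdict(list)  # atom index -> [(digit, bond symbol)]
    closed = set()
    next_digit = 1

    def explore(a):
        nonlocal next_digit
        for b, order in adj[a]:
            if b == parent[a]:
                continue
            if b not in parent:
                parent[b] = a
                children[a].append((b, order))
                explore(b)
            elif frozenset((a, b)) not in closed:
                closed.add(frozenset((a, b)))
                closures[a].append((next_digit, BOND_SYMBOL[order]))
                closures[b].append((next_digit, ""))
                next_digit += 1

    explore(start)

    # Pass 2: print the spanning tree. Every child except the last one is
    # written as a parenthesised branch.
    def emit(a):
        out = atoms[a] + "".join(f"{sym}{d}" for d, sym in closures[a])
        kids = children[a]
        for k, (b, order) in enumerate(kids):
            piece = BOND_SYMBOL[order] + emit(b)
            out += piece if k == len(kids) - 1 else f"({piece})"
        return out

    return emit(start)


acetic_acid = write_smiles(["C", "C", "O", "O"], [(0, 1, 1), (1, 2, 2), (1, 3, 1)])
cyclohexane = write_smiles(["C"] * 6, [(i, (i + 1) % 6, 1) for i in range(6)])
print(acetic_acid)   # CC(=O)O
print(cyclohexane)   # C1CCCCC1
```

Starting the traversal from a different atom gives a different but equally valid string for the same molecule, which is why a canonical form is needed before SMILES can be used as a database key.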

SMARTS is a modification of SMILES that allows, in addition to the SMILES elements, the specification of wildcard atoms and bonds. This is used in specifying search structures and is widely used in chemical database search applications.

Improved SMILES Substructure Searching, by Daylight

Daylight Theory Manual - Covering general information on representing molecules and an in-depth discussion of SMILES™, SMARTS®, SMIRKS®, fingerprints, THOR database concepts, and Merlin analysis (available as HTML and PDF)

SMARTS - A Language for Describing Molecular Patterns

Fingerprints - Screening and Similarity

OpenBabel, including Implementation of Daylight SMARTS molecular matching syntax

makefp is a command-line program that computes hashed path fingerprints from input SMILES or from other file formats such as SDF or MOL files.
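
To make "hashed path fingerprint" concrete, here is a rough Python sketch of the general idea: enumerate every linear path of up to a few atoms, turn each path into a string, and hash that string to a bit position. It is not the algorithm used by makefp or Daylight; the function name `path_fingerprint`, the parameters `max_atoms` and `n_bits`, the graph layout, and the choice of CRC32 as the hash are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict

BOND = {1: "-", 2: "=", 3: "#"}


def path_fingerprint(atoms, bonds, max_atoms=4, n_bits=64):
    """Return the set of bit positions set by all linear paths of up to
    max_atoms atoms in a hydrogen-suppressed molecular graph."""
    adj = defaultdict(list)
    for i, j, order in bonds:
        adj[i].append((j, order))
        adj[j].append((i, order))

    bits = set()

    def walk(path, tokens):
        # A path read forwards or backwards is the same path, so keep the
        # lexicographically smaller of the two spellings before hashing.
        key = min("".join(tokens), "".join(reversed(tokens)))
        bits.add(zlib.crc32(key.encode()) % n_bits)
        if len(path) == max_atoms:
            return
        for nxt, order in adj[path[-1]]:
            if nxt not in path:  # linear paths only, no revisits
                walk(path + [nxt], tokens + [BOND[order], atoms[nxt]])

    for a in range(len(atoms)):
        walk([a], [atoms[a]])
    return bits


ethanol = path_fingerprint(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
acetic_acid = path_fingerprint(["C", "C", "O", "O"], [(0, 1, 1), (1, 2, 2), (1, 3, 1)])

# Screening: every path of ethanol also occurs in acetic acid, so every
# bit of the ethanol fingerprint must also be set for acetic acid.
print(ethanol <= acetic_acid)
```

The subset check is what makes such fingerprints useful as a screen: if a query sets a bit that a database molecule does not, that molecule cannot contain the query and the expensive atom-by-atom match can be skipped. Because several paths may hash to the same bit, passing the screen does not prove a match.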

Checkmol is a command-line utility program which reads molecular structure files in different formats and analyzes the input molecule for the presence of various functional groups and structural elements.

Search by Functional groups

PubChem Similarity search allows you to find chemical structures similar to the provided query. Similarity is measured using the Tanimoto equation and a binary fingerprint computed for every structure in the PubChem Compound database. This fingerprint consists of a series of chemical substructure “keys”. Each key denotes the presence or absence of a particular substructure in a molecule. The fingerprint does not consider variation in stereochemical or isotopic information. Collectively, these binary keys provide a “fingerprint” of a particular chemical structure valence-bond form.
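
For two binary fingerprints, the Tanimoto coefficient is T = c / (a + b - c), where a and b are the numbers of bits set in each fingerprint and c is the number of bits set in both. A small Python sketch follows; representing a fingerprint as a set of "on" bit positions and returning 0.0 for two empty fingerprints are conventions chosen for this example, not something PubChem specifies.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as collections of
    the bit positions that are set."""
    a, b = set(fp_a), set(fp_b)
    if not a and not b:
        return 0.0  # assumed convention for two empty fingerprints
    common = len(a & b)
    return common / (len(a) + len(b) - common)


# Two toy fingerprints sharing bits 3 and 4: T = 2 / (4 + 3 - 2)
print(tanimoto({1, 2, 3, 4}, {3, 4, 5}))  # 0.4
```

The value ranges from 0 (no bits in common) to 1 (identical fingerprints), so a similarity search amounts to keeping the database entries whose score against the query exceeds a chosen threshold.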

PubChem Substructure search allows you to locate chemical structures that contain the particular connectivity and valence bond pattern that you provide in your query. For example, a substructure search of ethanol (SMILES: OCC) would return, among others, acetic acid (SMILES: OC(=O)C), since ethanol is a substructure of acetic acid.
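
The ethanol example can be reproduced with a naive backtracking matcher. This is only a sketch of the idea, not how PubChem implements its search: the function name `has_substructure` and the graph layout (element symbols plus `(i, j, bond_order)` tuples) are assumptions, hydrogens are left implicit, and real systems add fingerprint screening and much smarter atom ordering.

```python
def has_substructure(q_atoms, q_bonds, t_atoms, t_bonds):
    """Return True if the query graph can be embedded in the target graph
    with matching element symbols and bond orders."""

    def bond_map(bonds):
        m = {}
        for i, j, order in bonds:
            m[(i, j)] = m[(j, i)] = order
        return m

    qb, tb = bond_map(q_bonds), bond_map(t_bonds)

    def extend(mapping):
        # mapping[p] is the target atom assigned to query atom p
        q = len(mapping)
        if q == len(q_atoms):
            return True
        for t in range(len(t_atoms)):
            if t in mapping or t_atoms[t] != q_atoms[q]:
                continue
            # Every query bond from q to an already placed atom must
            # exist in the target with the same bond order.
            if all(tb.get((t, mapping[p])) == qb[(q, p)]
                   for p in range(q) if (q, p) in qb):
                if extend(mapping + [t]):
                    return True
        return False

    return extend([])


ethanol = (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])                   # CCO
acetic_acid = (["C", "C", "O", "O"], [(0, 1, 1), (1, 2, 2), (1, 3, 1)])  # CC(=O)O

print(has_substructure(*ethanol, *acetic_acid))  # True
print(has_substructure(*acetic_acid, *ethanol))  # False
```

The match succeeds only because hydrogen counts are ignored: the C-C-O skeleton of ethanol maps onto the methyl carbon, carboxyl carbon, and hydroxyl oxygen of acetic acid.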

OpenEye software

Roll Your Own Chemical Database With Free Components

Creating a Web-based, Searchable Molecular Structure Database Using Free Software

How to create a web-based molecular structure database with free software, a fine presentation to read.

2 Responses to “Chemical Structure Similarity Notes”

  1. October 7th, 2007 | 7:15 pm

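    Hello, I read your post and would like to ask a question. You seem to be very familiar with SMILES; do you know how to implement structure search based on SMILES strings?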

  2. October 8th, 2007 | 5:32 pm

    Structure search cannot be done directly on SMILES strings. Daylight developed SMARTS on top of SMILES: “an expansion of SMILES allowing specification of molecular patterns and properties for substructure searching with varying levels of specificity”. That said, SMARTS queries over large amounts of data are bound to be slow.
    SMILES is the encoding originally developed by Dr. David Weininger, who is also the President of Daylight.
