Databases


 * 1) Wiki
 * 2) Zinc
 * 3) PubChem
 * 4) DUD
 * 5) CDD
 * 6) e-Molecules
 * 7) PDB
 * 8) Drug Bank
 * 9) Chem Spider
 * 10) PubMed
 * 11) RISM-MOL
 * 12) PAN Pesticide Database
 * 13) MONARPOP
 * 14) ChemDB
 * 15) ChemNavigator
 * 16) Ligand.Info
 * 17) ILThermo
 * 18) pKa in non-aqueous media
 * 19) Open Notebook Science Challenge
 * 20) NIST
   * Fundamental Physical Constants
   * Online Databases
 * 21) MolPort
 * 22) Catalytic Site Atlas database

__Wiki__
@http://www.wikipedia.org/
Example of **INFORMATION** (infobox for acetylsalicylic acid):
||~ Systematic (IUPAC) name || 2-acetoxybenzoic acid ||
||~ Identifiers || ||
|| CAS number || 50-78-2 ||
|| ATC code || A01AD05, B01AC06, N02BA01 ||
|| PubChem || CID 2244 ||
|| DrugBank || DB00945 ||
|| ChemSpider || 2157 ||
||~ Chemical data || ||
|| Formula || C9H8O4 ||
|| Mol. mass || 180.157 g/mol ||
|| SMILES || eMolecules & PubChem ||
|| Synonyms || 2-acetyloxybenzoic acid, acetylsalicylate, acetylsalicylic acid, O-acetylsalicylic acid ||
||~ Physical data || ||
|| Density || 1.40 g/cm³ ||
|| Melt. point || 135 °C (275 °F) ||
|| Boiling point || 140 °C (284 °F) (decomposes) ||
|| Solubility in water || 3 mg/mL (20 °C) ||
||~ Pharmacokinetic data || ||
|| Bioavailability || Rapidly and completely absorbed ||
|| Protein binding || 99.6% ||
|| Metabolism || Hepatic ||
|| Half-life || 300–650 mg dose: 3.1–3.2 h; 1 g dose: 5 h; 2 g dose: 9 h ||
|| Excretion || Renal ||
||~ Therapeutic considerations || ||

__Zinc__
@http://zinc.docking.org/
**SEARCH:** One can compose a query by specifying molecular properties (net charge, xLogP, rotatable bonds, H-donors, polar surface area, molecular weight, etc.) or molecule constitution (SMILES/SMARTS). One may also specify ZINC IDs and original catalog numbers.
**INFORMATION:**
 * Supplier information; representations (links to other databases)
 * Properties: xLogP, apolar and polar desolvation, HBD, HBA, charge, Mwt, NRB (//comment//: no units, no solvent data, no descriptions)
 * Annotations; similarity

__PubChem__
@http://pubchem.ncbi.nlm.nih.gov/
**SEARCH:** 1) simple (chemical name); 2) advanced (Chemical Properties, Stereochemistry, BioAssays, Links, Elements)
**INFORMATION:**
 * BioMedical Annotation
 * BioAssay Results
 * Protein Structures with compound
 * Synonyms
 * Properties: Molecular Weight, Molecular Formula, XLogP3, H-Bond Donor, H-Bond Acceptor, Rotatable Bond Count, Exact Mass, MonoIsotopic Mass, Topological Polar Surface Area, Heavy Atom Count, Formal Charge, Complexity, Isotope Atom Count, Defined Atom StereoCenter Count, Undefined Atom StereoCenter Count, Defined Bond StereoCenter Count, Undefined Bond StereoCenter Count, Covalently-Bonded Unit Count
 * Descriptors (IUPAC Name, Canonical SMILES, InChI, InChIKey)
 * Compound Information
 * Substance Information (Chemical Reactions, Journal Publishers, Metabolic Pathways, NIH Molecular Libraries, Physical Properties, etc.)
 * Exports
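Many of the properties listed above can also be requested programmatically through PubChem's PUG REST interface. The following is a minimal sketch that only builds the request URL for one compound ID. The helper name `pubchem_property_url` is an assumption made for this illustration, and the property names follow PUG REST conventions, which differ slightly from the labels shown on the web page.

```bash
#!/bin/bash
# Build a PUG REST URL requesting selected computed properties for one CID.
# Usage: pubchem_property_url CID PROPERTY[,PROPERTY...]
# (pubchem_property_url is a hypothetical helper, not part of any PubChem tool.)
pubchem_property_url() {
    local cid=$1 props=$2
    echo "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/${cid}/property/${props}/JSON"
}

# CID 2244 is acetylsalicylic acid, the compound used in the Wiki example.
url=$(pubchem_property_url 2244 "MolecularFormula,MolecularWeight,XLogP,TPSA")
echo "$url"

# To actually fetch the data (needs network access):
#   curl -s "$url"
```

Building the URL separately from the download keeps the example usable offline; the commented `curl` line shows where the request would be made.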

__DUD__
@http://dud.docking.org/
@http://pubs.acs.org/doi/abs/10.1021/jm0608356

**DUD, a directory of useful decoys for benchmarking virtual screening.** DUD is designed to help test docking algorithms by providing challenging decoys. It contains:
 * A total of 2,950 active compounds against a total of 40 targets
 * For each active, 36 "decoys" with similar physical properties (e.g. molecular weight, calculated LogP) but dissimilar topology.

Every ligand has 36 decoy molecules that are physically similar but topologically distinct, leading to a database of 98,266 compounds. For most targets, enrichment was at least half a log better with uncorrected databases such as the MDDR than with DUD, evidence of bias in the former. These calculations also allowed 40 × 40 cross-docking, where the enrichments of each ligand set could be compared for all 40 targets, enabling a specificity metric for the docking screens.
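To make "half a log better" concrete: on a log10 scale, half a log unit corresponds to a factor of about 3.2 in enrichment. The sketch below assumes the commonly used definition of the enrichment factor (fraction of actives among the top-ranked compounds divided by the fraction of actives in the whole database). The function name and the hit counts are hypothetical and are not DUD results.

```bash
#!/bin/bash
# enrichment_factor ACTIVES_FOUND N_SELECTED ACTIVES_TOTAL N_TOTAL
# EF = (actives_found / n_selected) / (actives_total / n_total)
# (assumed definition; hypothetical helper for illustration only)
enrichment_factor() {
    awk -v a="$1" -v n="$2" -v A="$3" -v N="$4" \
        'BEGIN { printf "%.2f\n", (a / n) / (A / N) }'
}

# Hypothetical screen: 80 actives with 36 decoys each gives 80 * 37 = 2960
# compounds; 30 of the actives are recovered in the top 100.
enrichment_factor 30 100 80 2960    # prints 11.10

# Half a log unit on a log10 scale is a factor of 10^0.5.
awk 'BEGIN { printf "%.2f\n", 10 ^ 0.5 }'    # prints 3.16
```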

__CDD__
@http://www.collaborativedrug.com/


**Collaborative Drug Discovery's** web-based software organizes preclinical research data to help scientists advance new drug candidates more effectively.

__e-Molecules__
@http://www.emolecules.com/


**INFORMATION:**
 * Structure (2D)
 * Known Names
 * SMILES
 * Molecular weight
 * ACD log P
 * Supplier, Supplier's ID

__PDB__
@http://www.rcsb.org/pdb/home/home.do

The PDB archive contains information about experimentally-determined structures of proteins, nucleic acids, and complex assemblies. As a member of the wwPDB, the RCSB PDB curates and annotates PDB data according to agreed upon standards. The RCSB PDB also provides a variety of tools and resources. Users can perform simple and advanced searches based on annotations relating to sequence, structure and function. These molecules are visualized, downloaded, and analyzed by users who range from students to specialized scientists.

__Drug Bank__
@http://www.drugbank.ca/

The DrugBank database is a unique bioinformatics and cheminformatics resource that combines detailed drug (i.e. chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e. sequence, structure, and pathway) information. The database contains nearly 4800 drug entries including >1,350 FDA-approved small molecule drugs, 123 FDA-approved biotech (protein/peptide) drugs, 71 nutraceuticals and >3,243 experimental drugs.
**SEARCH:** 1) simple, 2) advanced
**INFORMATION:**
 * Name, Synonyms
 * Drug Type
 * Brand Names, Brand Mixtures
 * Chemical IUPAC Name, Chemical Formula, Chemical Structure
 * RxList Link, PDRhealth Link, Wikipedia Link
 * Melting Point, Water Solubility (Experimental, Predicted), LogP/Hydrophobicity (Experimental, Predicted), LogS (Experimental, Predicted), Caco2 Permeability, pKa/Isoelectric Point
 * Structures: 2D, 3D, MOL, SDF, PDB, SMILES
 * Pharmacology, Mechanism of Action, Absorption, Toxicity, Protein Binding, Biotransformation, Half Life, Pathways, Patient Information

__Chem Spider__

ChemSpider links together compound information across the web, providing free text and structure search access of millions of chemical structures. With an abundance of additional property information, tools to upload, curate and use the data, and integration to a multitude of other online services, ChemSpider is the richest single source of structure-based chemistry information. It is provided to the community by the Royal Society of Chemistry.


**SEARCH:** 1) simple, 2) advanced
**INFORMATION:**
 * Empirical Formula, Molecular Weight, Mass
 * Wikipedia Article(s)
 * Associated data sources (links), Patents, Articles
 * Properties: Predicted (logP, LogD, volume, SASA, melting point, solubility, etc.) and Experimental
 * Spectra, Images, Curation

__PubMed__

PubMed comprises more than 19 million citations for biomedical literature from MEDLINE, life science journals, and online books. Citations may include links to full-text content from PubMed Central and publisher web sites.

__RISM-MOL__
 * 1. To add a key to key files for selection. Example:

>> P = getDataSets(DataSetsPath,'Frolov_TestSet','c');
>> [FL,KL,KFL,PL]=multiSelect(P);
>> do_something('key.Closure=PLHNC; writeKey(keyfile,key)',FL,KL,KFL,PL);

 * 2. The script:

Usage: ./script MolIndex
 * takes as command line input the index of the molecule (in the order they appear in the *.m file) to start the calculations from
 * gets molecule names from the /net/maxwell/people/frolov/distr/RISM_MOL/LastReleaseRismMol/Systems/TestSet_script_full.m file
 * runs a loop over the molecules in the list, starting with the one at the specified index
 * runs MATLAB, which executes all the commands in the RunSet_TestSet.m file (with the molecule index as input)
 * waits until the MATLAB process disappears from the list of running processes or the MATLAB execution time exceeds 5 min. If the process is still in the list after 5 min, it is killed and a warning is written.

#!/bin/bash
echo "Usage script StartInd"
StartInd=$1
list=`cat /net/maxwell/people/frolov/distr/RISM_MOL/LastReleaseRismMol/Systems/TestSet_script_full.m`
ind=0
for val in $list ; do
    ind=$(($ind+1))
    # skip the "TestSet={..." token and every molecule before StartInd
    if [ $ind -ge $(($StartInd+1)) ]; then
        # strip the leading quote and the trailing "',..."
        mol=${val:1:$(( ${#val}-6 ))}
        MolInd=$(($ind-1))
        echo "Running mol: " $MolInd " " $mol

        # start MATLAB in the background and remember its pid
        /opt/local/bin/matlab -r "RunSet_TestSet($MolInd) " -nojvm -nosplash &
        pid=$!
        echo $pid > tmp_pid
        i='0'
        p='1'
        # poll every 5 s until MATLAB exits or about 5 min have passed
        while [ $i -le 299 -a $p -gt 0 ] ; do
            p=`ps -u frolov | grep $pid | wc -l`
            sleep 5
            i=$(($i+5))
            echo $i " "$p
        done

        # still running after the time limit: kill it
        if [ $p -gt 0 ] ; then
            echo "WARNING!!! KILLING pid: "$pid
            kill $pid
        fi
    fi
done
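The wait-and-kill step can also be written without parsing `ps` output. The following is a minimal sketch of that pattern, not the script used here: it polls with `kill -0`, which sends no signal and only checks whether the process still exists. The function name `run_with_timeout` and the return code 124 are choices made for this illustration.

```bash
#!/bin/bash
# Run a command in the background and kill it if it exceeds a time limit.
# Usage: run_with_timeout LIMIT_SECONDS POLL_SECONDS COMMAND [ARGS...]
# Returns the command's exit status, or 124 if it had to be killed.
run_with_timeout() {
    local limit=$1 poll=$2
    shift 2
    "$@" &
    local pid=$!
    local elapsed=0
    # kill -0 only tests for the existence of the process.
    while kill -0 "$pid" 2>/dev/null && [ "$elapsed" -lt "$limit" ]; do
        sleep "$poll"
        elapsed=$((elapsed + poll))
    done
    if kill -0 "$pid" 2>/dev/null; then
        echo "WARNING: killing pid $pid after ${elapsed}s" >&2
        kill "$pid"
        wait "$pid" 2>/dev/null
        return 124
    fi
    wait "$pid"
}

# Example: a 30 s job with a 2 s limit, polled every second.
run_with_timeout 2 1 sleep 30 || echo "timed out"
```

Checking the pid directly avoids false matches that `grep $pid` can produce when the pid happens to be a substring of another process's pid.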

The contents of the "/net/maxwell/people/frolov/distr/RISM_MOL/LastReleaseRismMol/Systems/TestSet_script_full.m":

TestSet={...
'1,1,1,2-tetrachloroethane',...
'1,1,1-trichloroethane',...
'1,1,2,2-tetrachloroethane',...

...

'trichloromethane',...
'undecan-2-one',...
'tmp'};

Note: a dummy 'tmp' entry has to be placed at the end so that the last real molecule keeps its trailing ",..." (the script strips a fixed number of characters from each entry). After calculating all molecules the script will try to calculate the "t" ("tmp") molecule and will fail. This is OK.
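The reason 'tmp' turns into "t" is the fixed-length substring expansion the script applies to every entry: it drops the first character and the last five. A small sketch that isolates that expansion (the function name `strip_entry` is hypothetical):

```bash
#!/bin/bash
# Same expansion as in the script: skip 1 leading character and keep
# (length - 6) characters, i.e. drop the opening quote and 5 trailing ones.
strip_entry() {
    local val=$1
    echo "${val:1:$(( ${#val} - 6 ))}"
}

strip_entry "'trichloromethane',..."   # prints trichloromethane
strip_entry "'tmp'};"                  # prints t
```

A regular entry ends in `',...` (5 characters), so the name survives intact. The final entry ends in `'};` (3 characters), so two extra characters of the name are cut off.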

The contents of "RunSet_TestSet.m":
 * sets the path to the database functions
 * runs the script with the cell array of molecule names
 * sets the DataSetsPath
 * runs a loop for one entry: gets the RISM input, then runs it with "do_something" and "StartRISMScript".

function RunSet_any(ind)
% add the database functions to the MATLAB path
path(path,'/net/v215-2/data4/fedorov-group/Database/bin/');
% load the cell array of molecule names (defines TestSet)
run '/net/maxwell/people/frolov/distr/RISM_MOL/LastReleaseRismMol/Systems/TestSet_script_full.m';
Set=TestSet;
%run /net/maxwell/people/frolov/distr/RISM_MOL/LastReleaseRismMol/Systems/TestSet_script.m
%Set=TestSet;
N=length(Set);
DataSetsPath = '/net/v215-2/data4/fedorov-group/Database/DataSets';
P = getDataSets(DataSetsPath,'Frolov_TestSet','System');
ou = struct('Distance','Angstr','Energy','kcal_mol');
% process only the molecule with index ind
for i=ind:ind
    i
    name=TestSet{i};
    [FL,KL,KFL,PL]=multiSelect(P,'Name',name);
    FL;
    do_something(StartRISMScript(DataSetsPath,ou,'user_Closure=HNC; user_MixingRules=LorentzBerthelot; user_LambdaCoupling=0.5;'),FL,KL,KFL,PL);
end
quit
end

Hope this helps!

__PAN Pesticide Database__
@http://www.pesticideinfo.org/

__MONARPOP__
@http://www.monarpop.at/

__ChemDB__
@http://cdb.ics.uci.edu/
|| ChemicalSearch: Find Chemicals by Various Criteria || Find a chemical by basic criteria like molecular weight and predicted logP, or by the more abstract notion of structural similarity. ||
|| Virtual Chemical Space: Retro-Synthesis and Combinatorial Library Design || Interactively deconstruct target compounds into component precursors and reconstruct similar building-blocks into combinatorial libraries representing the "virtual chemical space" near the target compound. ||
|| Reaction Explorer: Synthesis Explorer and Mechanism Explorer || Interactive system for learning and practicing reactions, syntheses and mechanisms in organic chemistry, with advanced support for the automatic generation of random problems, curved-arrow mechanism diagrams, and inquiry-based learning. ||
|| Datasets: For Machine Learning and Searching Experiments || Various available chemical datasets annotated with interesting properties to train and test machine-learning prediction and searching methods. ||
|| Supplements: Articles and Support Material || Online articles relating to the system with supplementary data and figures referenced in them. ||

||~ Toolkits || ||
|| COSMOS || Predicts 3D Structure from SMILES ||
|| Smi2Depict || Generates 2D Images from SMILES ||
|| Babel || Molecule File Format Converter ||
|| MolInfo || Calculate / Predict Molecular Properties ||
|| Reaction Processor || Product Library Generation ||
|| Pattern Match Counter || Counts Functional Groups (sub-structures) ||
|| Pattern Count Screen || Screens Molecules by Functional Group Count ||
|| MSFragment || Fragments Molecules for Mass Spec Analysis ||
|| Mass2Structure || Searches ChemDB by Monoisotopic Mass and Substructure Filtering ||

__ChemNavigator__
@http://www.chemnavigator.com/

__Ligand.Info__
@http://ligand.info/

Ligand.Info is a compilation of various publicly available databases of small molecules such as ChemBank, ChemPDB, KEGG, NCI, AKos GmbH, Asinex Ltd, and TimTec. The total size of the Meta-Database is 1 million entries. The compound records contain calculated three-dimensional coordinates and sometimes information about biological activity. Some molecules have information about FDA drug approval status or about anti-HIV activity. The Meta-Database can be downloaded in SDF format and used for virtual high-throughput screening of new potential drugs. The database can also be screened using a Java-based tool.

__ILThermo__
@http://ilthermo.boulder.nist.gov/ILThermo/

ILThermo is a free web research tool that allows users to access an up-to-date data collection from publications on experimental investigations of thermodynamic and transport properties of ionic liquids, as well as binary and ternary mixtures containing ionic liquids.

__pKa in non-aqueous media__
@http://tera.chem.ut.ee/~ivo/HA_UT/

A kind of database: it contains a set of links to sources with pKa values in non-aqueous media.

__Open Notebook Science Challenge__
@http://onschallenge.wikispaces.com/

Physicochemical properties.

__NIST__
@http://www.nist.gov/srd/

==Fundamental Physical Constants==
@http://physics.nist.gov/cuu/Constants/index.html

==Online Databases==
@http://www.nist.gov/srd/onlinelist.cfm

__MolPort__
@http://www.molport.com/buy-chemicals/moleculelink/N-2-hydroxyphenyl-acetamide/900799

__Catalytic Site Atlas database__
@http://www.ebi.ac.uk/thornton-srv/databases/CSA/

__**Introduction**__
The Catalytic Site Atlas (CSA) is a database documenting enzyme active sites and catalytic residues in enzymes of known 3D structure. We defined a classification of catalytic residues which includes only those residues thought to be directly involved in some aspect of the reaction catalysed by an enzyme. For a full description of the classification, see Reference 2.

The CSA contains 2 types of entry:
 * 1) Original hand-annotated entries, derived from the primary literature. References for these entries are given.
 * 2) Homologous entries, found by PSI-BLAST alignment (using an e-value cut-off of 0.00005) to one of the original entries. The equivalent residues, which align in sequence to the catalytic residues found in the original entry, are documented.

Access to the CSA is via PDB code, SWISS-PROT entry or E.C. number. Accessing via PDB code takes you straight to the CSA entry for that PDB, while accessing via SWISS-PROT or E.C. number gives a list of all PDB codes for structures assigned that particular SWISS-PROT identifier or E.C. number. Structures with entries in the CSA are given as hyperlinks.

Each CSA entry lists the catalytic residues found in that entry, using PDB residue numbering. Each site is also marked with an evidence tag, which is either "Literature reference" or "PSI-BLAST hit". If the entry is a PSI-BLAST hit you can follow the link to the original entry. The active site can be visualised using RasMol.

Each entry contains a link to a list of homologous entries found by PSI-BLAST, and a link to other PDB structures with identical E.C. numbers or SWISS-PROT identifier to the entry you are viewing.

//**References:**//
 * 1) The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data. **Craig T. Porter, Gail J. Bartlett, and Janet M. Thornton (2004) Nucleic Acids Res. 32:D129-D133.**
 * 2) Analysis of Catalytic Residues in Enzyme Active Sites. **Gail J. Bartlett, Craig T. Porter, Neera Borkakoti, and Janet M. Thornton (2002) J Mol Biol 324:105-121.**
 * 3) Using a Library of Structural Templates to Recognise Catalytic Sites and Explore their Evolution in Homologous Families. **James W. Torrance, Gail J. Bartlett, Craig T. Porter, and Janet M. Thornton (2005) J Mol Biol 347:565-581.**