1. Wiki

  2. Zinc

  3. PubChem

  4. DUD

  5. CDD

  6. e-Molecules

  7. PDB

  8. Drug Bank

  9. Chem Spider

  10. PubMed

  11. RISM-MOL

  12. PAN Pesticide Database

  13. MONARPOP

  14. ChemDB

  15. ChemNavigator

  16. Ligand.Info

  17. ILThermo

  18. pKa in non-aqueous media

  19. Open Notebook Science Challenge

  20. NIST

  21. MolPort

  22. Catalytic Site Atlas database


Wiki

http://www.wikipedia.org/
Example of INFORMATION:
Systematic (IUPAC) name
2-acetoxybenzoic acid
Identifiers
CAS number
50-78-2
ATC code
A01AD05, B01AC06, N02BA01
PubChem
CID 2244
DrugBank
DB00945
ChemSpider
2157
Chemical data
Formula
C9H8O4
Mol. mass
180.157 g/mol
SMILES
eMolecules & PubChem
Synonyms
2-acetyloxybenzoic acid
acetylsalicylate
acetylsalicylic acid
O-acetylsalicylic acid
Physical data
Density
1.40 g/cm³
Melt. point
135 °C (275 °F)
Boiling point
140 °C (284 °F) (decomposes)
Solubility in water
3 mg/mL (20 °C)
Pharmacokinetic data
Bioavailability
Rapidly and completely absorbed
Protein binding
99.6%
Metabolism
Hepatic
Half-life
300–650 mg dose: 3.1–3.2 h
1 g dose: 5 h
2 g dose: 9 h
Excretion
Renal
Therapeutic considerations
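
The molar mass in the example above can be checked directly from the formula C9H8O4. The following is a minimal sketch: the function name `molar_mass_cho` is made up for illustration, and the atomic weights are conventional values assumed here, not taken from the page.

```shell
#!/bin/bash
# molar_mass_cho C H O
# Prints the molar mass (g/mol) of a compound containing only C, H and O.
# Atomic weights are assumed conventional values: C 12.0107, H 1.00794, O 15.9994.
molar_mass_cho() {
    awk -v c="$1" -v h="$2" -v o="$3" 'BEGIN {
        printf "%.3f\n", c * 12.0107 + h * 1.00794 + o * 15.9994
    }'
}

# Aspirin, C9H8O4
molar_mass_cho 9 8 4   # prints 180.157
```

The result agrees with the 180.157 g/mol listed above; other tables of atomic weights may differ in the last digit.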

Zinc

http://zinc.docking.org/
SEARCH: Compose a query by specifying molecular properties (net charge, xLogP, rotatable bonds, H-bond donors, polar surface area, molecular weight, etc.) or molecular constitution (SMILES/SMARTS). Queries can also specify ZINC IDs and original catalog numbers.
INFORMATION:
  • Supplier information; Representations (links to other databases)
  • Properties: xLogP, ap & p desolvation, HBD, HBA, Charge, Mwt, NRB (comment: no units, no solvent data, no descriptions)
  • Annotations; Similarity

PubChem

http://pubchem.ncbi.nlm.nih.gov/
SEARCH: 1) simple (chemical name); 2) advanced (Chemical Properties, Stereochemistry, BioAssays, Links, Elements)
INFORMATION:
  • BioMedical Annotation
  • BioAssay Results
  • Protein Structures with compound
  • Synonyms
  • Properties: Molecular Weight, Molecular Formula, XLogP3, H-Bond Donor, H-Bond Acceptor, Rotatable Bond Count, Exact Mass, MonoIsotopic Mass, Topological Polar Surface Area, Heavy Atom Count, Formal Charge, Complexity, Isotope Atom Count, Defined Atom StereoCenter Count, Undefined Atom StereoCenter Count, Defined Bond StereoCenter Count, Undefined Bond StereoCenter Count, Covalently-Bonded Unit Count
  • Descriptors (IUPAC Name, Canonical SMILES, InChI, InChIKey)
  • Compound Information
  • Substance Information (Chemical Reactions, Journal Publishers, Metabolic Pathways, NIH Molecular Libraries, Physical Properties, etc.)
  • Exports
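
Besides the web search, PubChem offers a programmatic interface (PUG REST) that can return some of the properties listed above. The following is a minimal sketch that only builds the request URL. The helper name `pubchem_property_url` is made up, and the property names should be checked against the current PUG REST documentation before relying on them.

```shell
#!/bin/bash
# pubchem_property_url CID PROPERTIES
# Builds a PUG REST URL requesting the comma-separated PROPERTIES of the
# compound with the given CID, returned as CSV. Only the URL is built here;
# nothing is downloaded.
pubchem_property_url() {
    local cid=$1 props=$2
    printf 'https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/%s/property/%s/CSV\n' \
        "$cid" "$props"
}

# Aspirin has CID 2244 (see the Wiki example above).
url=$(pubchem_property_url 2244 "MolecularFormula,MolecularWeight,XLogP,TPSA")
echo "$url"

# To fetch the data (requires network access and curl):
# curl -s "$url"
```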

DUD

http://dud.docking.org/
http://pubs.acs.org/doi/abs/10.1021/jm0608356

DUD, a directory of useful decoys for benchmarking virtual screening. DUD is designed to help test docking algorithms by providing challenging decoys. It contains:
  • A total of 2,950 active compounds against a total of 40 targets
  • For each active, 36 "decoys" with similar physical properties (e.g. molecular weight, calculated LogP) but dissimilar topology.

Every ligand has 36 decoy molecules that are physically similar but topologically distinct, leading to a database of 98,266 compounds. For most targets, enrichment was at least half a log better with uncorrected databases such as the MDDR than with DUD, evidence of bias in the former. These calculations also allowed 40 × 40 cross-docking, where the enrichments of each ligand set could be compared for all 40 targets, enabling a specificity metric for the docking screens.
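
"Half a log better" refers to the base-10 logarithm of an enrichment value. The following is a minimal sketch, assuming the common definition EF = (actives in the top-ranked subset / subset size) / (total actives / database size). The DUD paper may use a different metric, and the function name and the numbers are made up for illustration.

```shell
#!/bin/bash
# enrichment HITS SUBSET ACTIVES TOTAL
# Prints the enrichment factor and its log10, assuming the common definition
#   EF = (HITS / SUBSET) / (ACTIVES / TOTAL).
# All four arguments must be positive; no input checking is done.
enrichment() {
    awk -v h="$1" -v s="$2" -v a="$3" -v n="$4" 'BEGIN {
        ef = (h / s) / (a / n)
        printf "%.2f %.2f\n", ef, log(ef) / log(10)
    }'
}

# Hypothetical screen: 10 of 100 actives found in the top 37 of 3700 compounds.
enrichment 10 37 100 3700   # prints 10.00 1.00
```

On this scale, a difference of half a log corresponds to a factor of about 3.16 in EF.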

CDD

http://www.collaborativedrug.com/

Collaborative Drug Discovery's web-based software organizes preclinical research data to help scientists advance new drug candidates more effectively.

e-Molecules

http://www.emolecules.com/

INFORMATION:
  • Structure (2D)
  • Known Names
  • SMILES
  • Molecular weight
  • ACD log P
  • Supplier, Supplier's ID

PDB

http://www.rcsb.org/pdb/home/home.do

The PDB archive contains information about experimentally-determined structures of proteins, nucleic acids, and complex assemblies. As a member of the wwPDB, the RCSB PDB curates and annotates PDB data according to agreed upon standards. The RCSB PDB also provides a variety of tools and resources. Users can perform simple and advanced searches based on annotations relating to sequence, structure and function. These molecules are visualized, downloaded, and analyzed by users who range from students to specialized scientists.

Drug Bank

http://www.drugbank.ca/

The DrugBank database is a unique bioinformatics and cheminformatics resource that combines detailed drug (i.e. chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e. sequence, structure, and pathway) information. The database contains nearly 4800 drug entries including >1,350 FDA-approved small molecule drugs, 123 FDA-approved biotech (protein/peptide) drugs, 71 nutraceuticals and >3,243 experimental drugs.
SEARCH: 1) simple, 2) advanced
INFORMATION:
  • Name, Synonyms
  • Drug Type
  • Brand Names, Brand Mixtures
  • Chemical IUPAC Name, Chemical Formula, Chemical Structure
  • RxList Link, PDRhealth Link, Wikipedia Link
  • Melting Point, Water Solubility (Experimental, Predicted), LogP/Hydrophobicity (Experimental, Predicted), LogS (Experimental, Predicted), Caco2 Permeability, pKa/Isoelectric Point
  • Structures: 2D, 3D, MOL, SDF, PDB, SMILES
  • Pharmacology, Mechanism of Action, Absorption, Toxicity, Protein Binding, Biotransformation, Half Life, Pathways, Patient Information

Chem Spider

http://www.chemspider.com/

ChemSpider links together compound information across the web, providing free text and structure search access of millions of chemical structures. With an abundance of additional property information, tools to upload, curate and use the data, and integration to a multitude of other online services, ChemSpider is the richest single source of structure-based chemistry information. It is provided to the community by the Royal Society of Chemistry.

SEARCH: 1) simple 2) advanced
INFORMATION:
  • Empirical Formula, Molecular Weight, Mass
  • Wikipedia Article(s)
  • Associated data sources (links), Patents, Articles
  • Properties: Predicted (logP, LogD, volume, SASA, melting point, solubility, etc.) and Experimental
  • Spectra, Images, Curation

PubMed

http://www.ncbi.nlm.nih.gov/pubmed

PubMed comprises more than 19 million citations for biomedical literature from MEDLINE, life science journals, and online books. Citations may include links to full-text content from PubMed Central and publisher web sites.


RISM-MOL

*1. To add a key to key files for selection. Example:
    • P = getDataSets(DataSetsPath,'Frolov_TestSet','c');
    • [FL,KL,KFL,PL]=multiSelect(P);
    • do_something('key.Closure=''PLHNC''; writeKey(keyfile,key)',FL,KL,KFL,PL);

*2. The script:
Usage: ./script MolIndex
- takes as command-line input the index of the molecule (in the order the molecules appear in the *.m file) from which to start the calculations
- gets the molecule names from the /net/maxwell/people/frolov/distr/RISM_MOL/LastReleaseRismMol/Systems/TestSet_script_full.m file
- runs a loop over the molecules in the list, starting with the one at the specified index
- runs MATLAB, which executes all the commands in the RunSet_TestSet.m file (with the molecule index as input)
- waits until the MATLAB process disappears from the list of running processes or the MATLAB execution time exceeds 5 min. If the process is still in the list after 5 min, it is killed and a warning is written.

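#!/bin/bash
# Runs RunSet_TestSet(MolInd) in MATLAB for every molecule, starting at index StartInd.
echo "Usage script StartInd"
StartInd=$1
# Word splitting yields one token per line of the .m file.
list=$(cat /net/maxwell/people/frolov/distr/RISM_MOL/LastReleaseRismMol/Systems/TestSet_script_full.m)
ind=0
for val in $list ; do
    ind=$((ind+1))
    # Token 1 is the "TestSet={..." header, so molecule k is token k+1.
    if [ "$ind" -ge $((StartInd+1)) ]; then
        # Strip the leading quote and the trailing "',..." (5 characters).
        mol=${val:1:$(( ${#val}-6 ))}
        MolInd=$((ind-1))
        echo "Running mol: $MolInd $mol"

        /opt/local/bin/matlab -r "RunSet_TestSet($MolInd) " -nojvm -nosplash & pid=$!
        echo "$pid" > tmp_pid
        i=0
        p=1
        # Poll every 5 s until MATLAB exits or 5 min (300 s) have passed.
        while [ "$i" -le 299 ] && [ "$p" -gt 0 ] ; do
            # Look the process up by PID only: 1 while it runs, 0 afterwards.
            p=$(ps -p "$pid" -o pid= | grep -c .)
            sleep 5
            i=$((i+5))
            echo "$i $p"
        done

        if [ "$p" -gt 0 ] ; then
            echo "WARNING!!! KILLING pid: $pid"
            kill "$pid"
        fi
    fi
done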
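
The waiting-and-killing part of the script above can also be written as a reusable function. The following is a minimal sketch, not the original author's code: `run_with_timeout` is a made-up name, the limit is given in seconds, and the process is polled once per second instead of every 5 s.

```shell
#!/bin/bash
# run_with_timeout SECONDS CMD [ARGS...]
# Starts CMD in the background and polls it once per second.
# Returns 0 if CMD finished within the limit, 1 if it had to be killed.
run_with_timeout() {
    local limit=$1 waited=0 pid
    shift
    "$@" &
    pid=$!
    # kill -0 sends no signal; it only tests whether the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            echo "WARNING: killing pid $pid after ${limit}s" >&2
            kill "$pid"
            wait "$pid" 2>/dev/null || true   # reap the killed process
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Example with a 5 min limit, as in the script above:
# run_with_timeout 300 /opt/local/bin/matlab -r "RunSet_TestSet(1) " -nojvm -nosplash
```

Testing for the process with `kill -0` avoids parsing `ps` output and does not depend on a particular user name.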

The contents of the "/net/maxwell/people/frolov/distr/RISM_MOL/LastReleaseRismMol/Systems/TestSet_script_full.m":
TestSet={...
'1,1,1,2-tetrachloroethane',...
'1,1,1-trichloroethane',...
'1,1,2,2-tetrachloroethane',...

...

'trichloromethane',...
'undecan-2-one',...
'tmp'};

Note: the 'tmp' name must be placed at the end so that the previous molecule keeps its trailing ",...". After calculating all the molecules, the script will try to calculate the "t" ("tmp") molecule and will fail. This is OK.
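
The reason "tmp" turns into "t" is the substring expansion the script uses to cut the molecule name out of each line. A small sketch of that step in isolation (the function name `extract_name` is made up for illustration):

```shell
#!/bin/bash
# extract_name TOKEN
# Same expansion as in the script: skip the first character (the opening
# quote) and drop the last five characters (the closing quote plus ",...").
extract_name() {
    local val=$1
    printf '%s\n' "${val:1:$(( ${#val} - 6 ))}"
}

extract_name "'trichloromethane',..."   # prints trichloromethane
extract_name "'tmp'};"                  # prints t
```

The last entry ends in `'};` rather than `',...`, so five characters are still removed from the end and only "t" is left.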

The contents of "RunSet_TestSet.m":
- sets the path to the database functions
- runs the script that defines the cell array of molecule names
- sets the DataSetsPath
- runs a loop over a single entry: gets the RISM input and runs it with "do_something" and "StartRISMScript".

function RunSet_any(ind)
% Runs the RISM calculation for molecule number ind of the test set.
% MATLAB dispatches on the file name (RunSet_TestSet.m), not on the name above.
path(path,'/net/v215-2/data4/fedorov-group/Database/bin/');   % database functions
% The script below defines the cell array TestSet with the molecule names.
run '/net/maxwell/people/frolov/distr/RISM_MOL/LastReleaseRismMol/Systems/TestSet_script_full.m';
Set=TestSet;
%run /net/maxwell/people/frolov/distr/RISM_MOL/LastReleaseRismMol/Systems/TestSet_script.m
%Set=TestSet;
N=length(Set);
DataSetsPath = '/net/v215-2/data4/fedorov-group/Database/DataSets';
P = getDataSets(DataSetsPath,'Frolov_TestSet','System');
ou = struct('Distance','Angstr','Energy','kcal_mol');   % units for distance and energy
for i=ind:ind   % loop over a single entry
    i           % no semicolon: echoes the index to the console
    name=TestSet{i};
    [FL,KL,KFL,PL]=multiSelect(P,'Name',name);
    FL;
    do_something(StartRISMScript(DataSetsPath,ou,'user_Closure=''HNC''; user_MixingRules=''LorentzBerthelot''; user_LambdaCoupling=0.5;'),FL,KL,KFL,PL);

end
quit
end



PAN Pesticide Database

http://www.pesticideinfo.org/


MONARPOP

http://www.monarpop.at/

ChemDB

http://cdb.ics.uci.edu/
ChemicalSearch: Find Chemicals by Various Criteria
Find a chemical by basic criteria like molecular weight and predicted logP, or by the more abstract notion of structural similarity.
Virtual Chemical Space: Retro-Synthesis and Combinatorial Library Design
Interactively deconstruct target compounds into component precursors and reconstruct similar building-blocks into combinatorial libraries representing the "virtual chemical space" near the target compound.
Reaction Explorer: Synthesis Explorer and Mechanism Explorer
Interactive system for learning and practicing reactions, syntheses and mechanisms in organic chemistry, with advanced support for the automatic generation of random problems, curved-arrow mechanism diagrams, and inquiry-based learning.
Datasets: For Machine Learning and Searching Experiments
Various available chemical datasets annotated with interesting properties to train and test machine-learning prediction and searching methods.
Supplements: Articles and Support Material
Online articles relating to the system with supplementary data and figures referenced in them.

Toolkits
  • COSMOS: Predicts 3D Structure from SMILES
  • Smi2Depict: Generates 2D Images from SMILES
  • Babel: Molecule File Format Converter
  • MolInfo: Calculate / Predict Molecular Properties
  • Reaction Processor: Product Library Generation
  • Pattern Match Counter: Counts Functional Groups (sub-structures)
  • Pattern Count Screen: Screens Molecules by Functional Group Count
  • MSFragment: Fragments Molecules for Mass Spec Analysis
  • Mass2Structure: Searches ChemDB by Monoisotopic Mass and Substructure Filtering

ChemNavigator

http://www.chemnavigator.com/

Ligand.Info

http://ligand.info/
Ligand.Info is a compilation of various publicly available databases of small molecules such as ChemBank, ChemPDB, KEGG, NCI, AKos GmbH, Asinex Ltd, and TimTec. The total size of the Meta-Database is 1 million entries. The compound records contain calculated three-dimensional coordinates and sometimes information about biological activity. Some molecules have information about FDA drug approval status or about anti-HIV activity. The Meta-Database can be downloaded in SDF format and used for virtual high-throughput screening of new potential drugs. The database can also be screened using a Java-based tool.
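
After downloading an SDF file, a quick sanity check is to count its records. In the SD file format each record ends with a line starting with `$$$$`, so counting those lines gives the number of molecules. The following is a minimal sketch: the function name `count_sdf_records` is made up, and the two-record file is a stand-in, not real Ligand.Info data.

```shell
#!/bin/bash
# count_sdf_records FILE
# Prints the number of records in an SD file by counting "$$$$" delimiter lines.
count_sdf_records() {
    grep -c '^\$\$\$\$' "$1"
}

# Stand-in file with two (empty) records, for demonstration only.
demo=$(mktemp)
printf 'mol1\n\n\nM  END\n$$$$\nmol2\n\n\nM  END\n$$$$\n' > "$demo"
count_sdf_records "$demo"   # prints 2
rm -f "$demo"
```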

ILThermo

http://ilthermo.boulder.nist.gov/ILThermo/
ILThermo is a free web research tool that gives users access to an up-to-date collection of published experimental data on the thermodynamic and transport properties of ionic liquids, as well as of binary and ternary mixtures containing ionic liquids.

pKa in non-aqueous media


http://tera.chem.ut.ee/~ivo/HA_UT/
A kind of database: a set of links to sources of pKa values in non-aqueous media.

Open Notebook Science Challenge

http://onschallenge.wikispaces.com/
Phys-Chem properties

NIST

http://www.nist.gov/srd/

  • Fundamental Physical Constants

http://physics.nist.gov/cuu/Constants/index.html
  • Online Databases

http://www.nist.gov/srd/onlinelist.cfm

MolPort

http://www.molport.com/buy-chemicals/moleculelink/N-2-hydroxyphenyl-acetamide/900799


Catalytic Site Atlas database

http://www.ebi.ac.uk/thornton-srv/databases/CSA/
Introduction
The Catalytic Site Atlas (CSA) is a database documenting enzyme active sites and catalytic residues in enzymes of 3D structure. We defined a classification of catalytic residues which includes only those residues thought to be directly involved in some aspect of the reaction catalysed by an enzyme. For a full description of the classification, see Reference 2.
The CSA contains 2 types of entry:
  1. Original hand-annotated entries, derived from the primary literature. References for these entries are given.
  2. Homologous entries, found by PSI-BLAST alignment (using an e value cut-off of 0.00005) to one of the original entries. The equivalent residues, which align in sequence to the catalytic residues found in the original entry are documented.
Access to the CSA is via PDB code, SWISS-PROT entry or E.C. number. Accessing via PDB code takes you straight to the CSA entry for that PDB, while accessing via SWISS-PROT or E.C. number gives a list of all PDB codes for structures assigned that particular SWISS-PROT identifier or E.C. number. Structures with entries in the CSA are given as hyperlinks.
Each CSA entry lists the catalytic residues found in that entry, using PDB residue numbering. Each site is also marked with an evidence tag, which is either "Literature reference" or "PSI-BLAST hit". If the entry is a PSI-BLAST hit you can follow the link to the original entry. The active site can be visualised using RasMol.
Each entry contains a link to a list of homologous entries found by PSI-BLAST, and a link to other PDB structures with identical E.C. numbers or SWISS-PROT identifier to the entry you are viewing.
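
The e-value cut-off used for the homologous entries can be illustrated with a small filter. The following is a minimal sketch, assuming a simplified two-column input ("identifier e-value" per line), which is not the actual PSI-BLAST output format. The function name `filter_by_evalue` and the identifiers are made up.

```shell
#!/bin/bash
# filter_by_evalue [CUTOFF]
# Reads "identifier e-value" lines on stdin and prints the identifiers whose
# e-value is at or below CUTOFF (default 0.00005, the cut-off quoted above).
filter_by_evalue() {
    awk -v cutoff="${1:-0.00005}" '($2 + 0) <= (cutoff + 0) { print $1 }'
}

# Hypothetical hits: the first two pass the cut-off, the third does not.
printf '1abcA 1e-30\n2xyzB 0.00004\n3pqrC 0.001\n' | filter_by_evalue
```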
References:
  1. The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data. Craig T. Porter, Gail J. Bartlett, and Janet M. Thornton (2004) Nucl. Acids. Res. 32: D129-D133.
  2. Analysis of Catalytic Residues in Enzyme Active Sites. Gail J. Bartlett, Craig T. Porter, Neera Borkakoti, and Janet M. Thornton (2002) J Mol Biol 324:105-121.
  3. Using a Library of Structural Templates to Recognise Catalytic Sites and Explore their Evolution in Homologous Families. James W. Torrance, Gail J. Bartlett, Craig T. Porter, Janet M. Thornton (2005) J Mol Biol. 347:565-81