Title: | Quantitative Structure Activity Relationship (QSAR) Data Sets |
---|---|
Description: | Molecular descriptors and outcomes for several public domain data sets |
Authors: | Max Kuhn |
Maintainer: | Max Kuhn |
License: | GPL |
Version: | 1.02 |
Built: | 2024-11-01 11:16:44 UTC |
Source: | https://github.com/r-forge/qsardata |
These data were compiled and described by He and Jurs (2005). The data set consists of 322 compounds whose toxicity was assessed experimentally. The outcome is the negative log of activity (although it is labeled "activity"). The structures and outcomes were obtained from http://www.qsarworld.com/index.php.
This data set includes nine sets of molecular descriptors: atom pair distances, Daylight fingerprints (http://www.daylight.com/dayhtml/doc/theory/theory.finger.html), Dragon descriptors (http://www.talete.mi.it/products/dragon_plus.htm), LCALC descriptors, MOE2D, MOE2D fingerprints, MOE3D, PipelinePilot fingerprints (http://accelrys.com/products/pipeline-pilot/) and QuickProp descriptors (http://www.schrodinger.com/products/14/17/).
For fingerprints, the 500 most variable bits were selected whenever possible.
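The bit-selection rule above can be sketched in R. This is a minimal illustration, not the procedure actually used to build the package: it assumes "most variable" means largest sample variance, and the function name `top_variable_bits()` and the toy matrix are hypothetical. It expects a numeric 0/1 matrix, so the "Molecule" column of a fingerprint data frame would need to be dropped first.

```r
# Sketch: keep the k most variable columns of a binary fingerprint matrix.
# Assumption: "most variable" = largest sample variance; the package
# documentation does not state the exact criterion.
top_variable_bits <- function(fp, k = 500) {
  bit_var <- apply(fp, 2, var)          # variance of each bit (column)
  k <- min(k, ncol(fp))                 # fewer bits are kept if fewer exist
  keep <- order(bit_var, decreasing = TRUE)[seq_len(k)]
  fp[, keep, drop = FALSE]
}

# Toy fingerprint matrix (hypothetical data): 6 compounds x 4 bits
fp <- cbind(b1 = c(1, 1, 1, 1, 1, 1),   # constant bit, zero variance
            b2 = c(1, 0, 1, 0, 1, 0),   # balanced bit, highest variance
            b3 = c(1, 0, 0, 0, 0, 0),
            b4 = c(1, 1, 0, 0, 0, 0))

selected <- top_variable_bits(fp, k = 2)
colnames(selected)  # "b2" "b4"
```

Constant or nearly constant bits carry little information for modeling, which is the usual motivation for a variance-based filter like this one.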
data(AquaticTox)
The data consist of several data frames. The first column of each descriptor data frame, called "Molecule", identifies the compound.
Atom pair descriptors
Daylight fingerprints (http://www.daylight.com/dayhtml/doc/theory/theory.finger.html)
Dragon descriptors (http://www.talete.mi.it/products/dragon_plus.htm)
LCALC descriptors
2 dimensional MOE descriptors
2 dimensional MOE fingerprints
3 dimensional MOE descriptors
PipelinePilot fingerprints (http://accelrys.com/products/pipeline-pilot/)
QuickProp descriptors
a factor with levels "Crosses" and "DoesNot"
a data frame with columns for the molecule name and the outcome (for merging)
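Because the outcome data frame shares the "Molecule" column with the descriptor data frames, the two can be joined with base R's `merge()`. The sketch below uses small hypothetical stand-ins; the outcome column name `Activity` and all values are assumptions, not the package's actual contents. With the package loaded, the real objects from `data(AquaticTox)` would take their place.

```r
# Hypothetical stand-ins for a descriptor data frame and the outcome data frame
descriptors <- data.frame(Molecule = c("m1", "m2", "m3"),
                          desc_a   = c(0.5, 1.2, 3.4),
                          desc_b   = c(10, 20, 30),
                          stringsAsFactors = FALSE)
outcome <- data.frame(Molecule = c("m3", "m1", "m2"),   # deliberately in a different order
                      Activity = c(2.1, 0.7, 1.5),
                      stringsAsFactors = FALSE)

# Join on the shared "Molecule" column so each row pairs descriptors with its outcome
model_data <- merge(descriptors, outcome, by = "Molecule")
model_data
```

`merge()` matches rows by key, so the join is correct even when the two data frames list compounds in different orders; by default the result is sorted by `Molecule`.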
He and Jurs. Assessing the reliability of a QSAR model's predictions. Journal of Molecular Graphics and Modelling (2005) vol. 23 (6) pp. 503-523
data(AquaticTox)
head(AquaticTox_Outcome)
These data were compiled and described by Burns et al. (2004). The data set consists of 80 compounds that were labeled as either crossing or not crossing the blood-brain barrier. The structures and outcomes were obtained from http://www.qsarworld.com/index.php.
This data set includes nine sets of molecular descriptors: atom pair distances, Daylight fingerprints (http://www.daylight.com/dayhtml/doc/theory/theory.finger.html), Dragon descriptors (http://www.talete.mi.it/products/dragon_plus.htm), LCALC descriptors, MOE2D, MOE2D fingerprints, MOE3D, PipelinePilot fingerprints (http://accelrys.com/products/pipeline-pilot/) and QuickProp descriptors.
For fingerprints, the 500 most variable bits were selected whenever possible.
Some compounds have missing values for some descriptors.
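A quick way to see which descriptors are affected, along with two common ways of handling them, is sketched below on a hypothetical data frame laid out like the descriptor tables (a "Molecule" column followed by numeric descriptors). The column names and values are made up; which approach is appropriate (dropping compounds, dropping descriptors, or imputing) depends on the analysis.

```r
# Hypothetical descriptor data frame with a "Molecule" ID column and some NAs
descr <- data.frame(Molecule = c("c1", "c2", "c3", "c4"),
                    d1 = c(1.0, NA, 2.5, 3.1),
                    d2 = c(0.2, 0.4, NA, NA),
                    d3 = c(5, 6, 7, 8),
                    stringsAsFactors = FALSE)

# Number of missing values per descriptor (ID column excluded)
na_per_column <- colSums(is.na(descr[, -1]))
na_per_column

# Option 1: keep only compounds with no missing descriptors
complete_rows <- descr[complete.cases(descr), ]

# Option 2: keep the ID column plus descriptors that have no missing values
complete_cols <- descr[, c(TRUE, na_per_column == 0)]
```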
The "2" in the name distinguishes this data set from another blood-brain barrier data set in the caret package (which has numeric outcomes). The two contain completely different sets of compounds and are unrelated.
data(bbb2)
The data consist of several data frames. The first column of each descriptor data frame, called "Molecule", identifies the compound.
Atom pair descriptors
Daylight fingerprints (http://www.daylight.com/dayhtml/doc/theory/theory.finger.html)
Dragon descriptors (http://www.talete.mi.it/products/dragon_plus.htm)
LCALC descriptors
2 dimensional MOE descriptors
2 dimensional MOE fingerprints
3 dimensional MOE descriptors
PipelinePilot fingerprints (http://accelrys.com/products/pipeline-pilot/)
QuickProp descriptors
a factor with levels "Crosses" and "DoesNot"
a data frame with columns for the molecule name and the outcome (for merging)
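Before modeling a two-class outcome like this one, it can help to check the class balance. The sketch below uses a short hypothetical factor with the same levels ("Crosses" and "DoesNot"); with the package loaded, `table()` and `prop.table()` would be applied to the actual class factor instead.

```r
# Hypothetical outcome vector with the same levels as the bbb2 class factor
bbb_class <- factor(c("Crosses", "DoesNot", "Crosses", "Crosses", "DoesNot"),
                    levels = c("Crosses", "DoesNot"))

counts <- table(bbb_class)    # number of compounds per class
props  <- prop.table(counts)  # class proportions (sum to 1)
props
```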
Burns et al. A mathematical model for prediction of drug molecule diffusion across the blood-brain barrier. The Canadian Journal of Neurological Sciences (2004) vol. 31 (4) pp. 520-527
data(bbb2)
head(bbb2_Outcome)
Karthikeyan et al. (2005) presented data in which chemical descriptors were used to model the melting point of compounds (i.e., the transition from the solid to the liquid state). They assembled 4,401 compounds: 4,126 for model training and 275 as a final validation set. They calculated 2D and 3D MOE chemical descriptors.
data(MeltingPoint)
The descriptors are contained in a data frame called MP_Descriptors and the melting points are in a numeric vector MP_Outcome. The original data set indicators are in a factor vector called MP_Data with levels "Test" and "Train".
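The indicator factor makes it straightforward to recover the original training and validation split. The helper `split_by_indicator()` below is hypothetical, and it assumes that the rows of `MP_Descriptors`, the elements of `MP_Outcome` and the elements of `MP_Data` are in the same order; the toy inputs only mimic their shapes.

```r
# Split descriptors and outcomes into training and test sets using an indicator factor.
# Assumption: descriptor rows, outcome values and indicator values are aligned.
split_by_indicator <- function(descriptors, outcome, indicator) {
  stopifnot(nrow(descriptors) == length(outcome),
            length(outcome) == length(indicator))
  is_train <- indicator == "Train"
  list(train_x = descriptors[is_train, , drop = FALSE],
       train_y = outcome[is_train],
       test_x  = descriptors[!is_train, , drop = FALSE],
       test_y  = outcome[!is_train])
}

# Toy stand-ins (hypothetical values) shaped like MP_Descriptors, MP_Outcome, MP_Data
toy_descr <- data.frame(d1 = c(1.1, 2.2, 3.3, 4.4), d2 = c(0, 1, 0, 1))
toy_mp    <- c(120, 85, 210, 64)
toy_set   <- factor(c("Train", "Test", "Train", "Train"), levels = c("Test", "Train"))

parts <- split_by_indicator(toy_descr, toy_mp, toy_set)
nrow(parts$train_x)  # 3
parts$test_y         # 85

# With the package installed, the corresponding call would be:
# library(QSARdata); data(MeltingPoint)
# mp <- split_by_indicator(MP_Descriptors, MP_Outcome, MP_Data)
```

The length check inside the helper guards against silently misaligned inputs, which would otherwise produce a split without any error.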
Karthikeyan et al. General melting point prediction based on a diverse compound data set and artificial neural networks. Journal of Chemical Information and Modeling (2005) vol. 45 (3) pp. 581-590
data(MeltingPoint)
head(MP_Descriptors)
Kazius et al. (2005) investigated using chemical structure to predict mutagenicity (an increase in mutations caused by damage to genetic material). An Ames test was used to evaluate the mutagenic potential of various chemicals. The data set includes 4,337 compounds with a mutagenicity rate of 55.3%. For these compounds, the DragonX software (http://www.talete.mi.it/) was used to generate a baseline set of 1,579 predictors, including constitutional, topological and connectivity descriptors, among others. These variables consist of basic numeric variables (such as molecular weight) and count variables (e.g., number of halogen atoms).
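The distinction between continuous descriptors and count descriptors can matter for preprocessing, for example when choosing transformations. One rough way to flag count-like columns is sketched below. The rule "all non-missing values are non-negative whole numbers" is a heuristic assumption, not how the descriptors are documented, and the function and column names are hypothetical.

```r
# Heuristic (assumption): a descriptor is count-like if every non-missing
# value is a non-negative whole number.
is_count_like <- function(x) {
  x <- x[!is.na(x)]
  is.numeric(x) && length(x) > 0 && all(x >= 0) && all(x == round(x))
}

# Hypothetical descriptors: molecular weight (continuous) and a halogen count
descr <- data.frame(Molecule  = c("a", "b", "c"),
                    MW        = c(180.16, 46.07, 78.11),
                    n_halogen = c(0, 2, 1),
                    stringsAsFactors = FALSE)

# Apply the heuristic to every descriptor column (ID column excluded)
count_cols <- names(Filter(is_count_like, descr[, -1]))
count_cols  # "n_halogen"
```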
data(Mutagen)
The descriptors are contained in a data frame called Mutagen_Dragon and the outcomes are in a factor vector Mutagen_Outcomes with levels "mutagen" and "nonmutagen".
Kazius et al. Derivation and validation of toxicophores for mutagenicity prediction. Journal of Medicinal Chemistry (2005) vol. 48 (1) pp. 312-320
data(Mutagen)
head(Mutagen_Dragon)