Title: | Partial LeAst Squares for Multiomic Analysis |
---|---|
Description: | Contains tools for supervised analyses of incomplete, overlapping multiomics datasets. Applies partial least squares in multiple steps to find models that predict survival outcomes. See Yamaguchi et al. (2023) <doi:10.1101/2023.03.10.532096>. |
Authors: | Kevin R. Coombes [cre, aut], Kyoko Yamaguchi [aut], Salma Abdelbaky [aut] |
Maintainer: | Kevin R. Coombes <[email protected]> |
License: | Apache License (== 2.0) |
Version: | 1.1.3 |
Built: | 2024-11-10 03:01:04 UTC |
Source: | https://github.com/r-forge/oompa |
"CombinedWeights"
The CombinedWeights
object class merges the weight matrices for
all data sets in a plasma object.
combineAllWeights(pl)
## S4 method for signature 'CombinedWeights'
summary(object, ...)
## S4 method for signature 'CombinedWeights'
image(x, ...)
stdize(object, type = c("standard", "robust"))
interpret(object, component, alpha = 0.05)
pl |
An object of the |
object |
An object of the |
x |
An object of the |
type |
A single character string indicating how to standardize the object. Legal values are "standard" or "robust". |
component |
A single character string; which component should be interpreted. |
alpha |
A single numerical value between 0 and 1; what significance level should be used to select important features. |
... |
Ignored; potentially, extra arguments to the summary or image methods. |
The combineAllWeights function returns a newly constructed object of the CombinedWeights class. The summary method returns a list containing four matrices. Each matrix has one row for each omics data set and one column for each model component. The four matrices contain different summary statistics: the Mean, SD, Median, and MAD.
Objects are defined using the combineAllWeights function. Simply supply an object of class plasma.
combined
:a matrix of the original variables in
dataset N
as rows and the PLS components M
as
columns.
featureSize
:a numeric (usually integer) vector that stores the number of features in each omics data set.
dataSource
:a factor indicating which omics data set each feature came from.
summary
:outputs summary statistics for the contributions of dataset N
to components from all datasets in the case of getAllWeights
or dataset M
in the case of getCompositeWeights
.
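As an illustration of the layout of the summary result, the following base R sketch computes four matrices (Mean, SD, Median, MAD), each with one row per data set and one column per component. The toy objects combined and dataSource only mimic the slots described above, and perSource is a hypothetical helper; this is not the package's implementation.

```r
set.seed(1)
# Toy stand-in for the 'combined' slot: 7 features (rows) by 3 components (columns).
combined <- matrix(rnorm(21), nrow = 7,
                   dimnames = list(paste0("feat", 1:7), paste0("comp", 1:3)))
# Toy stand-in for the 'dataSource' slot: which data set each feature came from.
dataSource <- factor(rep(c("RPPA", "miRSeq"), times = c(4, 3)))

# Hypothetical helper: apply one statistic to every (data set, component) pair.
perSource <- function(stat) {
  sapply(colnames(combined),
         function(cn) tapply(combined[, cn], dataSource, stat))
}

# One matrix per statistic, each with data sets as rows and components as columns.
stats <- list(Mean = perSource(mean), SD = perSource(sd),
              Median = perSource(median), MAD = perSource(mad))
round(stats$Mean, 3)
```

Each element of stats is a 2 x 3 matrix here, because the toy data contain two data sets and three components.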
Kevin R. Coombes, Kyoko Yamaguchi
fls <- try(loadESCAdata())
if (inherits(fls, "try-error")) {
  stop("Unable to load data from remote server.")
}
# restrict data set size
MO <- with(plasmaEnv,
           prepareMultiOmics(
             assemble[c("ClinicalBin", "ClinicalCont", "RPPA")],
             Outcome))
splitVec <- with(plasmaEnv, rbinom(nrow(Outcome), 1, 0.6))
trainD <- MO[, splitVec == 1]
testD <- MO[, splitVec == 0]
firstPass <- fitCoxModels(trainD, "Days", "vital_status", "dead")
pl <- plasma(object = trainD, multi = firstPass)
getCompositeWeights(object = pl, N = "ClinicalBin", M = "RPPA")
cbin <- getAllWeights(object = pl, N = "ClinicalBin")
summary(cbin)
image(cbin)
heat(cbin, cexCol = 0.5)
cbin01 <- pickSignificant(object = cbin, alpha = 0.01)
image(cbin01)
heat(cbin01, cexCol = 0.5)
getTop(object = cbin01, N = 3)
"Contribution"
The Contribution
object class contains the weight matrix between variables and the PLS components. The values in the weight matrix are a numeric representation of how much a variable from the omics datasets contributed to defining the final PLS components.
getCompositeWeights(object, N, M)
getAllWeights(object, N)
getFinalWeights(object)
getTop(object, N = 1)
pickSignificant(object, alpha)
## S4 method for signature 'Contribution'
summary(object, ...)
## S4 method for signature 'Contribution'
image(x, col = viridis(64), mai = c(1.82, 1.52, 0.32, 0.32), ...)
## S4 method for signature 'Contribution'
heat(object, main = "Contributions", col = viridis(64),
     mai = c(1.52, 0.32, 0.82, 1.82), ...)
object |
In the first four functions, an object of the
|
N |
in the function |
M |
name of the dataset being modeled pairwise with dataset |
alpha |
level of significance used in the |
... |
other graphical parameters. |
x |
an object of the |
main |
A character vector of length one; the main plot title. |
col |
A vector of color descriptors. |
mai |
A vector of four nonnegative numbers. |
The plasma
function returns a newly constructed object of the
plasma
class.
Objects are defined using the getAllWeights, getCompositeWeights, getTop, or pickSignificant functions. In the simplest scenario, one would enter an object of class plasma and any specific parameters associated with the function (see arguments section for more info).
contrib
:a matrix of the original variables in dataset N
as rows and the PLS components M
as columns.
datasets
:a character vector that stores the names of the datasets that were specified for the function.
summary
:outputs summary statistics for the contributions of dataset N
to components from all datasets in the case of getAllWeights
or dataset M
in the case of getCompositeWeights
.
image
:outputs a heatmap of the transposed contrib
matrix.
heat
:outputs a clustered heatmap of the contrib
matrix.
Kevin R. Coombes, Kyoko Yamaguchi
fls <- try(loadESCAdata())
if (inherits(fls, "try-error")) {
  stop("Unable to load data from remote server.")
}
# restrict data set size
MO <- with(plasmaEnv,
           prepareMultiOmics(
             assemble[c("ClinicalBin", "ClinicalCont", "RPPA")],
             Outcome))
splitVec <- with(plasmaEnv, rbinom(nrow(Outcome), 1, 0.6))
trainD <- MO[, splitVec == 1]
testD <- MO[, splitVec == 0]
firstPass <- fitCoxModels(trainD, "Days", "vital_status", "dead")
pl <- plasma(object = trainD, multi = firstPass)
getCompositeWeights(object = pl, N = "ClinicalBin", M = "RPPA")
cbin <- getAllWeights(object = pl, N = "ClinicalBin")
summary(cbin)
image(cbin)
heat(cbin, cexCol = 0.5)
cbin01 <- pickSignificant(object = cbin, alpha = 0.01)
image(cbin01)
heat(cbin01, cexCol = 0.5)
getTop(object = cbin01, N = 3)
The tfESCA and mirESCA data sets classify esophageal cancer samples by subtype, based on transcription factor and microRNA data, respectively.
data(tfESCA)
data(mirESCA)
Both tfESCA and mirESCA are data frames containing two columns. The first column is an ID column containing the TCGA sample barcode for an esophageal cancer sample. The second column, called Type, identifies the sample as either "squamous" (for likely squamous cell carcinomas that cluster near head and neck cancers) or "adeno" (for likely adenocarcinomas that cluster near stomach cancers).
Kevin R. Coombes, Kyoko Yamaguchi
All data supplied here are based upon esophageal cancer data generated by the TCGA Research Network (https://www.cancer.gov/tcga).
The transcription factor classifications of 196 esophageal cancer samples into squamous cell carcinoma or adenocarcinoma are taken from work published by Abrams and colleagues in BMC Genomics.
The microRNA classifications of 195 esophageal cancer samples into squamous cell carcinoma or adenocarcinoma are taken from work published by Asiaee and colleagues in J Comput Biol.
Abrams ZB, Zucker M, Wang M, Asiaee Taheri A, Abruzzo LV, Coombes KR.
Thirty biologically interpretable clusters of transcription
factors distinguish cancer type.
BMC Genomics. 2018 Oct 11;19(1):738. doi: 10.1186/s12864-018-5093-z.
Asiaee A, Abrams ZB, Nakayiza S, Sampath D, Coombes KR.
Explaining Gene Expression Using Twenty-One MicroRNAs.
J Comput Biol. 2020 Jul;27(7):1157-1170. doi: 10.1089/cmb.2019.0321.
Functions to impute missing data in omics data sets.
meanModeImputer(X)
samplingImputer(X)
X |
A numeric matrix, where the columns represent independent observations (patients or samples) and the rows represent measured features (genes, proteins, clinical variables, etc). |
We recommend imputing small amounts of missing data in the input data
sets when using the plasma
package. The underlying issue is
that the PLS models we use for individual omics data sets will not be
able to make predictions on a sample if even one data point is
missing. As a result, if a sample is missing at least one data point in
every omics data set, then it will be impossible to use that sample at
all.
For a range of available imputation methods and R packages, consult
the CRAN Task
View on Missing Data. We also recommend the
R-miss-tastic web site on
missing data. Their simulations suggest that, for purposes of
producing predictive models from omics data, the imputation method is
not particularly important. Because of the latter finding, we have
only implemented two simple imputation methods in the plasma
package:
The meanModeImputer
function will replace any missing data by the
mean value of the observed data if there are more than five
distinct values; otherwise, it will replace missing data by the
mode. This approach works relatively well for both continuous
data and for binary or small categorical data.
The samplingImputer function replaces missing values by sampling randomly from the observed data distribution.
Both functions return a numeric matrix of the same size and with the same row and column names as the input variable.
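The two rules described above can be sketched in base R as follows. The helpers meanModeSketch and samplingSketch are hypothetical names written only for illustration; they assume features are stored as rows and are not the package's own code, which may handle ties and edge cases differently.

```r
# Hypothetical re-implementation of the mean/mode rule, one feature (row) at a time.
meanModeSketch <- function(X) {
  for (i in seq_len(nrow(X))) {
    miss <- is.na(X[i, ])
    if (!any(miss) || all(miss)) next      # nothing to impute, or nothing observed
    obs <- X[i, !miss]
    if (length(unique(obs)) > 5) {
      fill <- mean(obs)                    # continuous-looking feature: use the mean
    } else {
      tab <- table(obs)                    # few distinct values: use the mode
      fill <- as.numeric(names(tab)[which.max(tab)])
    }
    X[i, miss] <- fill
  }
  X
}

# Hypothetical re-implementation of the sampling rule.
samplingSketch <- function(X) {
  for (i in seq_len(nrow(X))) {
    miss <- is.na(X[i, ])
    if (!any(miss) || all(miss)) next
    obs <- X[i, !miss]
    # sample.int avoids the surprise of sample() on a length-one vector
    X[i, miss] <- obs[sample.int(length(obs), sum(miss), replace = TRUE)]
  }
  X
}

X <- rbind(binary = c(0, 1, 1, NA, 1, 0, 1, 1),
           cont   = c(1.5, 2.5, NA, 4.0, 5.5, 6.5, 7.0, 8.0))
colnames(X) <- paste0("S", 1:8)
imp1 <- meanModeSketch(X)   # binary row filled with mode 1, cont row with mean 5
set.seed(7)
imp2 <- samplingSketch(X)   # missing entries drawn from each row's observed values
```

In both cases the result keeps the dimensions and the row and column names of the input.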
Kevin R. Coombes, Kyoko Yamaguchi
loadESCAdata()
imputed <- with(plasmaEnv, lapply(assemble, samplingImputer))
imputed <- with(plasmaEnv, lapply(assemble, meanModeImputer))
"MultiOmics"
The prepareMultiOmics function returns a new object of MultiOmics class for use in fitCoxModels.
prepareMultiOmics(datalist, outcome)
## S4 method for signature 'MultiOmics'
summary(object, ...)
## S4 method for signature 'MultiOmics,missing'
plot(x, y, ...)
datalist |
a list of dataframes formatted to have variables as rows (dimension D) and samples as columns (dimension N). |
outcome |
a dataframe of clinical outcomes formatted to have sample names as row indexes and variable names as column indexes |
object |
An object of the |
x |
An object of the |
y |
Nothing; ignored. |
... |
Extra graphical or other parameters. |
The prepareMultiOmics
function returns a new object of the MultiOmics
class.
Objects should be defined using the prepareMultiOmics
constructor. In
the simplest case, you enter two objects: a list of dataframes and a dataframe of clinical outcomes.
data
:A list of dataframes with variables as rows (the number of which varies between datasets) and samples as columns (of uniform number N across datasets), where N is the maximum number of non-missing samples in any given dataset. Note that NAs have been added to “pad” the data so that the column length is uniform across data types.
outcome
:A dataframe of clinical outcomes with variables as columns and samples as rows.
plot
:Produces a visual representation of the dimensionalities of each dataframe in datalist. D corresponds to the number of variables in each omics dataframe, and N corresponds to samples (or members) whose variable is not entirely missing. Gray areas correspond to missing samples.
summary
:Produces summary tables corresponding to datasets and outcomes.
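The padding described for the data slot can be pictured with a small base R sketch. The helper padToCommonSamples is hypothetical, and it assumes that padding is done over the union of sample names; the package's constructor may differ in detail.

```r
# Two toy assays measured on overlapping but different sets of samples
# (features as rows, samples as columns).
rna  <- matrix(1:6, nrow = 2,
               dimnames = list(c("geneA", "geneB"), c("S1", "S2", "S3")))
prot <- matrix(c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6), nrow = 3,
               dimnames = list(c("p1", "p2", "p3"), c("S2", "S4")))

# Hypothetical helper: give every data set the same sample columns, filling with NA.
padToCommonSamples <- function(datalist) {
  allSamples <- unique(unlist(lapply(datalist, colnames)))
  lapply(datalist, function(d) {
    out <- matrix(NA_real_, nrow = nrow(d), ncol = length(allSamples),
                  dimnames = list(rownames(d), allSamples))
    out[, colnames(d)] <- as.matrix(d)   # copy the observed samples into place
    out
  })
}

padded <- padToCommonSamples(list(RNA = rna, Protein = prot))
sapply(padded, dim)   # both data sets now have 4 sample columns
```

After padding, the number of rows still differs between data sets, while the columns are the same samples in the same order.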
Kevin R. Coombes, Kyoko Yamaguchi
fls <- try(loadESCAdata())
if (inherits(fls, "try-error")) {
  stop("Unable to load data from remote server.")
}
MO <- with(plasmaEnv,
           prepareMultiOmics(datalist = assemble, outcome = Outcome))
plot(MO)
summary(MO)
"MultiplePLSCoxModels"
The MultiplePLSCoxModels
object class ...
The validMultipleCoxModels
function checks if each data set contains the same set of samples.
The fitCoxModels function fits many plsRcoxmodel objects and returns an S4 object of class MultiplePLSCoxModels.
The getSizes function returns a matrix whose row names are the names of the dataframes in the MultiOmics object and whose columns contain NT, cNT, and p-values.
fitCoxModels(multi, timevar, eventvar, eventvalue, verbose)
## S4 method for signature 'MultiplePLSCoxModels'
summary(object, ...)
## S4 method for signature 'MultiplePLSCoxModels,missing'
plot(x, y, col = c("blue", "red"), lwd = 2, xlab = "",
     ylab = "Fraction Surviving", mark.time = TRUE, legloc = "topright", ...)
## S4 method for signature 'MultiplePLSCoxModels'
predict(object, newdata, type = c("components", "risk", "split", "survfit"), ...)
multi |
an object of class |
timevar |
a column in the |
eventvar |
a column in the |
eventvalue |
a character string specifying the value of the event in |
verbose |
logical; should the function report progress. |
object |
an object of class |
x |
an object of class |
y |
An ignored argument for the plot method. |
col |
A vector of color specifications. Default is c(“blue”, “red”). |
lwd |
A vector specifying the line width. Default is “2”. |
xlab |
A character string to label the x-axis. Default is “”. |
ylab |
A character string to label the y-axis. Default is “Fraction Surviving”. |
mark.time |
A logical value; should tickmarks indicate censored data? Default is TRUE. |
legloc |
A character string indicating where to put the legend. Default is “topright”. |
... |
Other graphical parameters. |
newdata |
A |
type |
An enumerated character value. |
The fitCoxModels
function returns a newly constructed object of
the MultiplePLSCoxModels
class. The plot
method
invisibly returns the object on which it was invoked. The
summary
method returns no value. The predict method returns a
list of prediction results, each of which comes from the
predict
method for the SingleModel-class.
models
:A list of SingleModel
objects, one for each assay.
timevar
:A character matching the name of the column containing the time-to-event.
eventvar
:A character matching the name of the column containing the event.
eventvalue
:A character specifying the event in eventvar.
plot
:Plots Kaplan-Meier curves for each omics dataset split into Low Risk and High Risk groups.
summary
:Returns a description of the
MultiplePLSCoxModels
object and the names of the omics
datasets used to build the model.
predict
:Usually returns a list of numeric vectors of predicted risk per data type. When type = "survfit", returns a list of survfit objects.
Kevin R. Coombes, Kyoko Yamaguchi
fitSingleModel
fls <- try(loadESCAdata())
if (inherits(fls, "try-error")) {
  stop("Unable to load data from remote server.")
}
# restrict data set size
MO <- with(plasmaEnv,
           prepareMultiOmics(
             assemble[c("ClinicalBin", "ClinicalCont", "RPPA")],
             Outcome))
splitVec <- with(plasmaEnv, rbinom(nrow(Outcome), 1, 0.6))
trainD <- MO[, splitVec == 1]
testD <- MO[, splitVec == 0]
firstPass <- fitCoxModels(trainD, "Days", "vital_status", "dead")
summary(firstPass)
plot(firstPass)
getSizes(firstPass)
pre1 <- predict(firstPass, testD)
"plasma"
The plasma
object class is returned after running the plasma
function.
The plasma
function uses the PLSRCox
components from one
dataset as the predictor variables and the PLSRCox
components
of another dataset as the response variables to fit a partial least
squares regression (plsr
) model. Then, we take the mean of the
predictions to create a final matrix of samples versus components.
The matrix of components described earlier is then used to fit a Cox
Proportional Hazards (coxph
) model with AIC stepwise variable
selection to return a final object of class plasma
which
includes a coxph
model with a reduced number of predictors.
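The averaging step in this description can be illustrated with a short base R sketch. It uses toy prediction matrices and assumes that missing predictions are simply ignored when taking the mean; the names predA, predB, and meanPred are made up for the illustration and do not come from the package.

```r
# Toy predictions of the same 2 components for 3 samples from two data sets.
# NA marks a sample that one data set could not predict.
predA <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3,
                dimnames = list(paste0("S", 1:3), c("comp1", "comp2")))
predB <- matrix(c(3, NA, 5, 6, 7, NA), nrow = 3, dimnames = dimnames(predA))
preds <- list(A = predA, B = predB)

# Stack into a samples x components x data sets array ...
stacked <- array(unlist(preds), dim = c(dim(predA), length(preds)),
                 dimnames = c(dimnames(predA), list(names(preds))))
# ... then average over the data-set dimension, skipping missing predictions.
meanPred <- apply(stacked, c(1, 2), mean, na.rm = TRUE)
meanPred
```

The result is a matrix of samples versus components, which is the shape of input that the final Cox model step expects.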
plasma(object, multi)
## S4 method for signature 'plasma,missing'
plot(x, y, ...)
## S4 method for signature 'plasma'
barplot(height, source, n, direction = c("both", "up", "down"),
        lhcol = c("cyan", "red"), wt = c("raw", "std"), ...)
## S4 method for signature 'plasma'
predict(object, newdata = NULL, type = c("components", "risk", "split"), ...)
multi |
an object of the |
object |
an object of the |
height |
an object of the |
x |
an object of class |
y |
An ignored argument for the plot method. |
source |
A length-one character vector; the name of a data set in
a |
n |
A length-one integer vector; the number of high-weight features to display. |
direction |
A length-one character vector; show features with positive weights (up), negative (down), or both. |
lhcol |
A character vector of length 2, indicating the preferred colors for low (negative) or high (positive) weights. |
wt |
A character string indicating whether to plot raw weights or standardized weights. |
newdata |
A |
type |
An enumerated character value. |
... |
Additional graphical parameters. |
The plasma
function returns a newly constructed object of the plasma
class. The plot
method invisibly returns the object on which it was invoked. The predict
method returns an object of the plasmaPredictions
class.
Objects should be defined using the plasma
function.
traindata
:An object of class MultiOmics
used for training the model.
compModels
:A list containing objects in the form of plsr
.
fullModel
:A coxph object with variables (components) selected via AIC stepwise selection.
plot
:Plots a Kaplan-Meier curve of the final coxph
model, with samples categorized as “high risk” or “low risk” based on whether their predicted risk is higher or lower, respectively, than the median value of risk.
predict
:creates an object of class plasmaPredictions
.
barplot
:Produces a barplot of the n
largest
weights assigned to features from the appropriate data source
.
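To make the roles of n and direction concrete, here is a base R sketch of selecting the n largest weights. The helper topWeights and the feature names are hypothetical, and ranking by absolute value is an assumption about what "largest" means; the barplot method may rank differently.

```r
# Hypothetical helper: pick the n weights that are largest in absolute value,
# optionally restricted to positive ("up") or negative ("down") weights.
topWeights <- function(w, n, direction = c("both", "up", "down")) {
  direction <- match.arg(direction)
  keep <- switch(direction,
                 both = w,
                 up   = w[w > 0],
                 down = w[w < 0])
  keep <- keep[order(abs(keep), decreasing = TRUE)]
  head(keep, n)
}

# Made-up weights for five features of one data set.
w <- c(featA = 0.8, featB = -1.2, featC = 0.1, featD = -0.4, featE = 0.6)
topWeights(w, 3)           # featB, featA, featE
topWeights(w, 2, "up")     # featA, featE
topWeights(w, 2, "down")   # featB, featD
```

A vector selected this way could be handed to the base graphics barplot function, coloring negative and positive bars with the two entries of lhcol.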
Kevin R. Coombes, Kyoko Yamaguchi
plasmaPredictions, plsr
fls <- try(loadESCAdata())
if (inherits(fls, "try-error")) {
  stop("Unable to load data from remote server.")
}
# restrict data set size
MO <- with(plasmaEnv,
           prepareMultiOmics(
             assemble[c("ClinicalBin", "ClinicalCont", "RPPA")],
             Outcome))
splitVec <- with(plasmaEnv, rbinom(nrow(Outcome), 1, 0.6))
trainD <- MO[, splitVec == 1]
testD <- MO[, splitVec == 0]
firstPass <- fitCoxModels(trainD, "Days", "vital_status", "dead")
pl <- plasma(object = trainD, multi = firstPass)
plot(pl, legloc = "topright", main = "Training Data")
barplot(pl, "RPPA", 6)
barplot(pl, "RPPA", 10, "up")
"plasmaPredictions"
The plasmaPredictions
object class is returned when running the
predict
method on an object of class plasma
.
## S4 method for signature 'plasmaPredictions,missing'
plot(x, y, col = c("blue", "red"), lwd = 2, xlab = "",
     ylab = "Fraction Surviving", mark.time = TRUE, legloc = "topright", ...)
x |
An object of the |
y |
An ignored argument for the plot method. |
col |
A vector of color specifications. Default is c(“blue”, “red”). |
lwd |
A vector specifying the line width. Default is “2”. |
xlab |
A character string to label the x-axis. Default is “”. |
ylab |
A character string to label the y-axis. Default is “Fraction Surviving”. |
mark.time |
A logical value; should tickmarks indicate censored data? Default is TRUE. |
legloc |
A character string indicating where to put the legend. Default is “topright”. |
... |
Other graphical parameters. |
The predict
method on an object of the plasma
class returns an object of the plasmaPredictions
class. The plot
method invisibly returns the value on which it
was invoked.
Users should not create objects of this class directly. They are created automatically when the predict method is applied to a fully worked out plasma model.
meanPredictions
:A matrix with samples as rows and factors as columns that is a result of taking the mean of the PLS component predictions from each dataset.
riskDF
:Object of type data.frame
containing
the original outcome
dataframe and additional columns for
"Risk", and "Split", corresponding to the risk of the event
calculated by the model, and patient assignment to low versus
high-risk groups, respectively.
riskModel
:Object of type coxph
that uses
predicted Risk (continuous) as the predictor variable and
survival as the response variable. See documentation for coxph.
splitModel
:Object of type coxph
that uses
predicted Split (predicted Risk categorized into “high”
and “low” risk by the median predicted Risk) as the
predictor variable and survival as the response variable. See documentation for coxph.
SF
:Object of type survfit
which is used by
the plot
method to plot Kaplan-Meier curves grouped by
predicted Split. See documentation for survfit.
plot
:Produces Kaplan-Meier curves for the low risk and high risk groups.
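The relationship between the Risk and Split columns and the three fitted objects can be sketched with simulated data and the survival package. All data below are simulated, the column names Days and Event are placeholders, and treating values strictly above the median as "high" is an assumed convention rather than a statement about the package internals.

```r
library(survival)  # recommended package distributed with R

set.seed(42)
n <- 60
# Simulated stand-ins: a continuous predicted risk and survival outcomes.
risk   <- rnorm(n)
days   <- rexp(n, rate = 0.01 * exp(0.8 * risk))
status <- rbinom(n, 1, 0.7)          # 1 = event observed, 0 = censored

riskDF <- data.frame(Days = days, Event = status, Risk = risk)
# Assumed convention: strictly above the median predicted risk is "high".
riskDF$Split <- factor(ifelse(riskDF$Risk > median(riskDF$Risk), "high", "low"),
                       levels = c("low", "high"))

# Continuous risk as predictor, dichotomized risk as predictor, and the
# Kaplan-Meier fit grouped by the dichotomized risk.
riskModel  <- coxph(Surv(Days, Event) ~ Risk,  data = riskDF)
splitModel <- coxph(Surv(Days, Event) ~ Split, data = riskDF)
SF         <- survfit(Surv(Days, Event) ~ Split, data = riskDF)
table(riskDF$Split)
```

Calling plot on a survfit object such as SF draws the two Kaplan-Meier curves, which is the kind of display the plot method produces for the low risk and high risk groups.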
An object of plasmaPredictions
class contains many models that
are similar to an object of MultiplePLSCoxModels
class.
Kevin R. Coombes, Kyoko Yamaguchi
plasma
fls <- try(loadESCAdata())
if (inherits(fls, "try-error")) {
  stop("Unable to load data from remote server.")
}
# restrict data set size
MO <- with(plasmaEnv,
           prepareMultiOmics(
             assemble[c("ClinicalBin", "ClinicalCont", "RPPA")],
             Outcome))
splitVec <- with(plasmaEnv, rbinom(nrow(Outcome), 1, 0.6))
trainD <- MO[, splitVec == 1]
testD <- MO[, splitVec == 0]
firstPass <- fitCoxModels(trainD, "Days", "vital_status", "dead")
pl <- plasma(object = trainD, multi = firstPass)
testpred <- predict(pl, testD)
plot(testpred, main = "Testing", xlab = "Time (Days)")
"SingleModel"
The fitSingleModel
function takes in an object of
MultiOmics
class and returns a new object of
SingleModel
class.
fitSingleModel(multi, N, timevar, eventvar, eventvalue)
## S4 method for signature 'SingleModel'
summary(object, ...)
## S4 method for signature 'SingleModel,missing'
plot(x, y, col = c("blue", "red"), lwd = 2, xlab = "",
     ylab = "Fraction Surviving", mark.time = TRUE, legloc = "topright", ...)
## S4 method for signature 'SingleModel'
predict(object, newdata, type = c("components", "risk", "split", "survfit"), ...)
multi |
an object of class |
N |
A character string identifying the data set being modeled. |
timevar |
a column in the |
eventvar |
a column in the |
eventvalue |
a character string specifying the value of the event. |
x |
an object of class |
y |
An ignored argument for the plot method. |
col |
A vector of color specifications. |
lwd |
A vector specifying the line width. |
xlab |
A character string to label the x-axis. |
ylab |
A character string to label the y-axis. |
mark.time |
A logical value; should tickmarks indicate censored data? |
legloc |
A character string indicating where to put the legend. |
object |
an object of class |
newdata |
A |
type |
An enumerated character value. |
... |
other parameters used in graphing or prediction. |
The fitSingleModel
function returns a newly constructed object
of the SingleModel
class. The plot
method invisibly
returns the value on which it was invoked. The summary
method
returns an object summarizing the final model produced by PLS R cox
regression. The predict
method returns either a vector or
matrix depending on the type of predictions requested.
plsmod
:Object of class plsRcoxmodel
containing the fitted model.
Xout
:Object of type data.frame
containing
the original outcome
dataframe and additional columns for
"Risk", and "Split", corresponding to the risk of the event
calculated by the model, and patient assignment to low versus
high-risk groups, respectively.
dsname
:A character vector of length one; the name
of the data set being modeled from a MultiOmics
object.
SF
:Object of type survfit
which is used by the plot
method to plot Kaplan-Meier curves grouped by predicted Split. See documentation for survfit.
riskModel
:Object of type coxph
that uses predicted Risk (continuous) as the predictor variable and survival as the response variable. See documentation for coxph.
splitModel
:Object of type coxph
that uses predicted Split (predicted Risk categorized into “high” and “low” risk by the median predicted Risk) as the predictor variable and survival as the response variable. See documentation for coxph.
plot
:Plots Kaplan-Meier curves for each omics dataset split into Low Risk and High Risk groups.
summary
:Returns a description of the
MultiplePLSCoxModels
object and the names of the omics
datasets used to build the model.
predict
:Usually, a numeric vector containing the predicted risk values. However, when using type = "survfit", the return value is a survfit object from the survival package.
Kevin R. Coombes, Kyoko Yamaguchi
fls <- try(loadESCAdata())
if (inherits(fls, "try-error")) {
  stop("Unable to load data from remote server.")
}
MO <- with(plasmaEnv, prepareMultiOmics(assemble, Outcome))
MO <- MO[c("ClinicalBin", "ClinicalCont", "RPPA"),]
set.seed(98765)
splitVec <- with(plasmaEnv, rbinom(nrow(Outcome), 1, 0.6))
trainD <- MO[, splitVec == 1]
testD <- MO[, splitVec == 0]
zerothPass <- fitSingleModel(trainD, N = "RPPA",
                             timevar = "Days", eventvar = "vital_status",
                             eventvalue = "dead")
summary(zerothPass)
plot(zerothPass)
pre0 <- predict(zerothPass, testD)
The TCGA-ESCA
dataset contains the objects assemble
,
Outcome
, and m450info
for building the MultiOmics
object. Because its size exceeds the CRAN limits, the data is stored on
a remote server and must be loaded using the function
loadESCAdata
.
The TCGA-LUSC1
dataset is a parallel object for lung
squamous cell carcinoma (LUSC) data, which must be loaded using the
loadLUSCdata
function.
loadESCAdata(env = plasmaEnv)
loadLUSCdata(env = plasmaEnv)
env |
an environment in which to load the data. The default
value is a private environment in the package, accessible as
|
The “TCGA-ESCA” dataset contains the following:
assemble
A list of 7 different omics dataframes with varying numbers of features as rows (D) and varying numbers of patients as columns (N). Note that some of these omics dataframes have been manipulated to contain NAs, even though the corresponding data may be complete on the GDC Data Portal from which they originally came. This was done to illustrate the capability of the plasma package to work with missing data.
ClinicalBin
a dataframe (53x185) of clinical binary values.
ClinicalCont
a dataframe (6x185) of clinical continuous values.
MAF
a dataframe (566x184) of minor allele frequencies (MAF) that have been converted to binary based on whether they had a MAF greater than 0.03 (1) or not (0).
Meth450
a dataframe (1454x185) of continuous beta values from the Illumina Infinium HumanMethylation450 arrays. The features in this dataframe have been filtered on mean greater than 0.15 and a standard deviation greater than 0.3.
miRSeq
a dataframe (926x166) of continuous counts values from microRNA (miRNA) sequencing. The features in this dataframe have been filtered on a standard deviation of 0.05.
mRNASeq
a dataframe (2520x157) of continuous counts values from mRNA sequencing data. The features in this dataframe have been filtered on a mean greater than 4 and a standard deviation greater than 0.7.
RPPA
a dataframe (192x126) of continuous protein expression values from reverse phase protein array (RPPA) assays.
Outcome
a dataframe (185x5) containing the survival
outcomes for the patients in assemble
.
m450info
a dataframe (1454x3) containing gene symbol,
chromosome number, and genomic coordinate IDs corresponding to the
features (or “probes”) in Meth450
.
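The kind of feature filtering and binarization mentioned in these descriptions can be sketched in base R. The helper filterFeatures and the toy values are hypothetical; the thresholds match those quoted for Meth450 and MAF, but this is not the script that produced the distributed data.

```r
# Hypothetical helper: keep features (rows) whose mean and SD exceed thresholds.
filterFeatures <- function(X, minMean = -Inf, minSD = 0) {
  mu    <- rowMeans(X, na.rm = TRUE)
  sigma <- apply(X, 1, sd, na.rm = TRUE)
  X[mu > minMean & sigma > minSD, , drop = FALSE]
}

# Toy beta values: three probes (rows) measured on four samples (columns).
beta <- rbind(probe1 = c(0.05, 0.10, 0.08, 0.06),   # low mean, low SD
              probe2 = c(0.10, 0.90, 0.15, 0.95),   # mean 0.525, SD about 0.46
              probe3 = c(0.50, 0.52, 0.49, 0.51))   # high mean, low SD
kept <- filterFeatures(beta, minMean = 0.15, minSD = 0.3)
rownames(kept)            # "probe2"

# Binarizing allele frequencies at 0.03, as described for MAF.
maf <- c(0.00, 0.02, 0.03, 0.25)
as.integer(maf > 0.03)    # 0 0 0 1
```

Only probe2 passes both thresholds, since probe1 fails on the mean and probe3 fails on the standard deviation.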
Kevin R. Coombes, Kyoko Yamaguchi
https://portal.gdc.cancer.gov/projects/TCGA-ESCA
fls <- try(loadESCAdata())
if (inherits(fls, "try-error")) {
  stop("Unable to load data from remote server.")
}