Package 'PreProcess' reference manual

Package 'PreProcess'

Title:	Basic Functions for Pre-Processing Microarrays
Description:	Provides classes to pre-process microarray gene expression data as part of the OOMPA collection of packages described at <http://oompa.r-forge.r-project.org/>.
Authors:	Kevin R. Coombes
Maintainer:	Kevin R. Coombes <[email protected]>
License:	Apache License (== 2.0)
Version:	3.1.7
Built:	2025-03-07 03:03:16 UTC
Source:	https://github.com/r-forge/oompa

Class "Channel"

Description

An object of the Channel class represents a single kind of measurement performed at all spots of a microarray channel. These objects are essentially vectors of data, with length equal to the number of spots on the microarray and with some extra metadata attached.

Usage

Channel(parent, name, type, vec)
## S4 method for signature 'Channel,missing'
plot(x, y, ...)
## S4 method for signature 'Channel'
hist(x, breaks=67, xlab=x@name, main=x@parent, ...)
## S4 method for signature 'Channel'
summary(object, ...)
## S4 method for signature 'Channel'
print(x, ...)
## S4 method for signature 'Channel'
show(object)
## S4 method for signature 'Channel'
image(x, main=x@name, sub=NULL, ...)

Arguments

`parent`	character string representing the name of a parent object from which this object was derived
`name`	character string with a displayable name for this object
`type`	object of class `ChannelType`
`vec`	numeric vector
`x`	object of class `Channel`
`y`	not used; listed only because the Rd format requires documenting the missing parameter in the `plot` signature
`breaks`	see the documentation for the default `hist`
`xlab`	character string specifying the label for x axis
`main`	character string specifying the main title for the plot
`sub`	character string specifying subtitle for the plot
`object`	object of class `Channel`
`...`	extra arguments for generic or plotting routines

Details

As described in the help pages for ChannelType, each microarray hybridization experiment produces one or more channels of data. Channel objects represent a single measurement performed at spots in one microarray channel. The raw data from a full experiment typically contains multiple measurements in multiple channels.

The full set of measurements is often highly processed (by, for example, background subtraction, normalization, log transformation, etc.) before it becomes useful. We have added a history slot that keeps track of how a Channel was produced. By allowing each object to maintain a record of its history, it becomes easier to document the processing when writing up the methods for reports or papers. The history slot of the object is updated using the generic function process together with a Processor object.

Value

The plot, hist, and image methods all invisibly return the Channel object on which they were invoked.

The print and summary methods return nothing.

Slots

parent: character string representing the name of a parent object from which this object was derived
name: character string with a displayable name for this object
type: object of class ChannelType
x: numeric vector
history: list that keeps a record of the calls used to produce this object

Methods

print(x, ...): Print all the data on the object. Since this includes the entire data vector, you rarely want to do this.
show(object): Print all the data on the object. Since this includes the entire data vector, you rarely want to do this.
summary(object, ...): Write out a summary of the object.
plot(x, y, ...): Produce a scatter plot of the measurement values in the x slot of the object against their index, which serves as a surrogate for the position on the microarray. Additional graphical parameters are passed along.
hist(x, ...): Produce a histogram of the data values in the x slot of the object. Additional graphical parameters are passed along.
image(x, ...): Produce a two-dimensional "cartoon" image of the measurement values, with the position in the cartoon corresponding to the two-dimensional arrangement of spots on the actual microarray. Additional graphical parameters are passed along.

Author(s)

Kevin R. Coombes [email protected], P. Roebuck [email protected]

Examples

showClass("Channel")

## simulate a moderately realistic looking microarray
nc <- 100			# number of columns
nr <- 100			# number of rows
v <- rexp(nc*nr, 1/1000)	# "true" signal intensity (vol)
b <- rnorm(nc*nr, 80, 10)	# background noise
s <- sapply(v-b, max, 1)	# corrected signal intensity (svol)
ct <- ChannelType('user', 'random', nc, nr,  'fake')
raw <- Channel(name='fraud', type=ct, parent='', vec=v)
subbed <- Channel(name='fraud', parent='', type=ct, vec=s)
rm(nc, nr, v, b, s)		# clean some stuff

summary(subbed)
summary(raw)

par(mfrow=c(2,1))
plot(raw)
hist(raw)

par(mfrow=c(1,1))
image(raw)

## finish the cleanup
rm(ct, raw, subbed)

Method "channelize"

Description

channelize is a generic function used to propagate the class of derived objects through a processing pipeline.

Usage

## S4 method for signature 'ANY'
channelize(object, ...)

Arguments

`object`	an object for which pipeline propagation is desired
`...`	additional arguments passed on to specific methods

Details

Having abstracted away the notion of extracting a particular measurement from a CompleteChannel object and producing a simple Channel, we need a way to allow object-oriented programming and derived classes to work with our Processor and Pipeline routines. The underlying idea is that specific kinds of microarrays or specific software to quantify microarrays might have special properties that should be exploited in processing. For example, the first few generations of microarrays printed at M.D. Anderson spotted every cDNA clone in duplicate. The analysis of such arrays should exploit this additional structure. In order to do so, we must derive classes from CompleteChannel and Channel and ensure that the classes of extracted objects are propagated correctly through the processing pipeline. The channelize method achieves this goal.

Value

Returns a string, which represents the name of a class (suitable for passing to the new constructor) extracted from an object belonging to a class derived from CompleteChannel.

Note

The sections above document the method's usage by OOMPA's pipeline, not the actual intent of the generic itself.

Author(s)

Kevin R. Coombes [email protected], P. Roebuck [email protected]

Class "ChannelType"

Description

This class represents the "type" of a microarray channel.

Usage

ChannelType(mk, md, nc, nr, gl, design="")
setDesign(object, design)
getDesign(object)
## S4 method for signature 'ChannelType'
print(x, ...)
## S4 method for signature 'ChannelType'
show(object)
## S4 method for signature 'ChannelType'
summary(object, ...)

Arguments

`mk`	character string specifying the name of the manufacturer of the microarray (e.g., 'Affymetrix')
`md`	character string specifying the model of the microarray (e.g., 'Hu95A')
`nc`	scalar integer specifying the number of columns in the array
`nr`	scalar integer specifying the number of rows in the array
`gl`	character string specifying the material used to label samples
`design`	character string containing the name of an object describing details about the design of the microarray
`object`	object of class `ChannelType`
`x`	object of class `ChannelType`
`...`	extra arguments for generic or plotting routines

Details

Microarrays come in numerous flavors. At present, the two most common types are the synthesized oligonucleotide arrays produced by Affymetrix and the printed cDNA arrays on glass, which started in Pat Brown's lab at Stanford. In earlier days, it was also common to find nylon microarrays, with the samples labeled using a radioactive isotope. The glass arrays are distinguished from other kinds of arrays in that they typically cohybridize two different samples simultaneously, using two different fluorescent dyes. The fluorescence from each dye is scanned separately, producing two images and thus two related sets of data from the same microarray. We refer to these parallel data sets within an array as “channels”.

An object of the ChannelType class represents a combination of the kind of microarray along with the kind of labeling procedure. These objects are intended to be passed around as part of more complex objects representing the actual gene expression data collected from particular experiments, in order to be able to eventually tie back into the description of what spots were laid down when the array was produced.

The ChannelType object only contains a high level description of the microarray, however. Detailed information about what biological material was laid down at each spot on the microarray is stored elsewhere, in a “design” object. Within a ChannelType object, the design is represented simply by a character string. This string should be the name of a separate object containing the detailed design information. This implementation allows us to defer the design details until later. It also saves space by putting the details in a single object instead of copying them into every microarray. Finally, it allows that single object to be updated when better biological annotations are available, with the benefits spreading immediately to all the microarray projects that use that design.

Value

The ChannelType constructor returns a valid object of the class.

The setDesign function invisibly returns the ChannelType object on which it was invoked.

The getDesign function returns the design object referred to by the design slot in the ChannelType object. If this string does not evaluate to the name of an object, then getDesign returns a NULL value.
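
The name-based indirection described above can be illustrated without the package. The following is only a sketch under stated assumptions, not the implementation of getDesign: the helper lookupDesign and the object fake.design are hypothetical names introduced for illustration.

```r
# Hypothetical helper (not part of PreProcess): resolve a design name to the
# object it names, returning NULL when no such object exists. This mirrors
# the documented behaviour of getDesign, not its actual code.
lookupDesign <- function(designName, envir = globalenv()) {
  if (nzchar(designName) && exists(designName, envir = envir, inherits = FALSE)) {
    get(designName, envir = envir, inherits = FALSE)
  } else {
    NULL
  }
}

# A private environment stands in for wherever design objects are stored.
env <- new.env()
assign("fake.design",
       data.frame(spot = 1:3, gene = c("A", "B", "C")),
       envir = env)

found  <- lookupDesign("fake.design", envir = env)     # the data frame
absent <- lookupDesign("no.such.design", envir = env)  # NULL
```

Because only the name is stored, updating the single design object would be seen by every lookup that uses that name.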

Slots

maker: character string specifying the name of the manufacturer of the microarray
model: character string specifying the model of the microarray
nCol: scalar integer specifying number of columns in the array
nRow: scalar integer specifying number of rows in the array
glow: character string specifying the material used to label samples
design: character string containing the name of an object describing details about the design of the microarray

Methods

print(x, ...): Prints all the information in the object
show(object): Prints all the information in the object
summary(object, ...): Writes out a summary of the object

Author(s)

Kevin R. Coombes [email protected], P. Roebuck [email protected]

Examples

showClass("ChannelType")

x <- ChannelType('Affymetrix', 'oligo', 100, 100, 'fluor')
x
print(x)
summary(x)

y <- setDesign(x, 'fake.design')
print(y)
summary(y)
d <- getDesign(y)
d

rm(d, x, y) # cleanup

Class "CompleteChannel"

Description

An object of the CompleteChannel class represents one channel (red or green) of a two-color fluorescence microarray experiment. Alternatively, it can also represent the entirety of a radioactive microarray experiment. Affymetrix experiments produce data with a somewhat different structure because they use multiple probes for each target gene.

Usage

CompleteChannel(name, type, data)
## S4 method for signature 'CompleteChannel'
print(x, ...)
## S4 method for signature 'CompleteChannel'
show(object)
## S4 method for signature 'CompleteChannel'
summary(object, ...)
## S4 method for signature 'CompleteChannel'
as.data.frame(x, row.names=NULL, optional=FALSE)
## S4 method for signature 'CompleteChannel,missing'
plot(x, main=x@name, useLog=FALSE, ...)
## S4 method for signature 'CompleteChannel'
image(x, ...)
## S4 method for signature 'CompleteChannel'
analyze(object, useLog=FALSE, ...)
## S4 method for signature 'CompleteChannel,Processor'
process(object, action, parameter)
## S4 method for signature 'CompleteChannel'
channelize(object, ...)

Arguments

`name`	character string specifying the name of the object
`type`	object of class `ChannelType`
`data`	data frame. For the pre-defined “extraction” processors to work correctly, this should include columns called `vol`, `bkgd`, `svol`, `SD`, and `SN`.
`x`	object of class `CompleteChannel`
`object`	object of class `CompleteChannel`
`main`	character string specifying the title for the plot
`useLog`	logical scalar. If `TRUE`, convert to logarithmic values.
`action`	object of class `Processor` used to process a `CompleteChannel`
`parameter`	any object that makes sense as a parameter to the function represented by the `Processor` `action`
`row.names`	See `as.data.frame`
`optional`	See `as.data.frame`
`...`	extra arguments for generic or plotting routines

Details

The column names expected in data (vol, bkgd, svol, SD, and SN) come from the default column names in the ArrayVision software package used at M.D. Anderson for quantifying glass or nylon microarrays. Column names used by other software packages should be mapped to these.

Value

The analyze method returns a list of three density functions.

The return value of the process function depends on the Processor performing the action, but is typically a Channel object.

Graphical methods invisibly return the object on which they were invoked.

Slots

name: character string containing the name of the object
type: object of class ChannelType
data: data frame
history: list that keeps a record of the calls used to produce this object

Methods

print(x, ...): Print all the data on the object. Since this includes the data frame, you rarely want to do this.
show(object): Print all the data on the object. Since this includes the data frame, you rarely want to do this.
summary(object, ...): Write out a summary of the object.
as.data.frame(x, row.names=NULL, optional=FALSE): Convert the CompleteChannel object into a data frame. As you might expect, this simply returns the data frame in the data slot of the object.
plot(x, useLog=FALSE, ...): Produces three estimated density plots: one for the signal, one for the background, and one for the background-corrected signal. Additional graphical parameters are passed along. The logical flag useLog determines whether the data are log-transformed before estimating and plotting densities.
analyze(object, useLog=FALSE, ...): This method computes the estimated probability density functions for the three data components (signal, background, and background-corrected signal), and returns them as a list.
image(x, ...): Uses the image method for Channel objects to produce geographically aligned images of the log-transformed intensity and background estimates.
channelize(object, ...): Returns a character string giving the name of the class of the channel that is produced when you process a CompleteChannel object.
process(object, action, parameter=NULL): Use the Processor action to process the CompleteChannel object. Returns an object of the class described by channelize, which defaults to Channel.

Pre-defined Processors

The library comes with several Processor objects already defined; each one takes a CompleteChannel as input, extracts a single value per spot, and produces a Channel as output.

PROC.BACKGROUND: Extract the vector of local background measurements.
PROC.SIGNAL: Extract the vector of foreground signal intensity measurements.
PROC.CORRECTED.SIGNAL: Extract the vector of background-corrected signal measurements. Note that many software packages automatically truncate these values below at zero, so this need not be the same as SIGNAL - BACKGROUND.
PROC.NEG.CORRECTED.SIGNAL: Extract the vector of background-corrected signal intensities by subtracting the local background from the observed foreground, without truncation.
PROC.SD.SIGNAL: Extract the vector of pixel standard deviations of the signal intensity.
PROC.SIGNAL.TO.NOISE: Extract the vector of signal-to-noise ratios, defined as CORRECTED.SIGNAL divided by the standard deviation of the background pixels.
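
The difference between the truncated and untruncated background corrections can be sketched in base R. This is an illustration on simulated data using the documented column names (vol, bkgd, svol); it assumes the quantification software truncates svol at zero and does not call the package's processors.

```r
# Simulated per-spot quantifications using the documented column names.
set.seed(1)
dat <- data.frame(vol  = rexp(1000, 1/1000),   # foreground signal
                  bkgd = rnorm(1000, 60, 6))   # local background
# ASSUMPTION: the quantification software truncates svol below at zero.
dat$svol <- pmax(dat$vol - dat$bkgd, 0)

corrected    <- dat$svol             # values PROC.CORRECTED.SIGNAL is described as extracting
negCorrected <- dat$vol - dat$bkgd   # values PROC.NEG.CORRECTED.SIGNAL is described as computing

# The two agree wherever signal exceeds background and differ elsewhere.
differs <- corrected != negCorrected
```

Spots flagged in differs are exactly those where the background estimate exceeds the observed foreground.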

Author(s)

Kevin R. Coombes [email protected], P. Roebuck [email protected]

Examples

showClass("CompleteChannel")

## simulate a complete channel object
v <- rexp(10000, 1/1000)
b <- rnorm(10000, 60, 6)
s <- sapply(v-b, function(x) {max(0, x)})
ct <- ChannelType('user', 'random', 100, 100, 'fake')
x <- CompleteChannel(name='fraud', type=ct,
                      data=data.frame(vol=v, bkgd=b, svol=s))
rm(v, b, s, ct)

summary(x)

opar <- par(mfrow=c(2,3))
plot(x)
plot(x, main='Log Scale', useLog=TRUE)
par(opar)

opar <- par(mfrow=c(2,1))
image(x)
par(opar)

b <- process(x, PROC.NEG.CORRECTED.SIGNAL)
summary(b)

q <- process(b, PIPELINE.STANDARD)
summary(q)

q <- process(x, PIPELINE.MDACC.DEFAULT)
summary(q)

## cleanup
rm(x, b, q, opar)

Methods "process" and "analyze"

Description

New generic functions for processing and analyzing microarrays.

Usage

## S4 method for signature 'ANY'
process(object, action, parameter=NULL)
## S4 method for signature 'ANY'
analyze(object, ...)

Arguments

`object`	any OOMPA class representing a microarray or a set of microarrays
`action`	the action used to process the object
`parameter`	any parameters needed to execute the process
`...`	extra arguments for generic routines

Details

In general, the analyze method represents an expensive computational step carried out in preparation for a graphical display, but the semantics may differ from class to class. The default implementation of the method performs the null analysis; that is, the return value is identical to the object that is passed in as the first argument.

The process method represents a function that acts on the data of some object to process it in some way. For example, normalizing a set of microarray data is typically one processing step in a long series that is required to take the raw data and turn it into something useful.

Value

The form of the value returned by either process or analyze depends on the class of its argument. See the documentation of the particular methods for details of what is produced by that method.

Author(s)

Kevin R. Coombes [email protected], P. Roebuck [email protected]

OOMPA graphical utility functions

Description

Utility functions for graphics.

Usage

ellipse(a, b, x0=0, y0=0, ...)
f.qq(x, main="", cut=0, ...)
f.qt(x, df, main="", cut=0, ...)

Arguments

`a`	Half the length of the elliptical axis in the x-direction
`b`	Half the length of the elliptical axis in the y-direction
`x0`	X-coordinate of the center of the ellipse
`y0`	Y-coordinate of the center of the ellipse
`main`	A text string
`cut`	A real number
`df`	An integer; the number of degrees of freedom in the t-test
`...`	Additional graphical parameters passed on to lower-level functions
`x`	A numeric vector

Details

The ellipse function draws an ellipse on an existing plot. The ellipses produced by this function are oriented with their major and minor axes parallel to the coordinate axes. The current implementation uses points internally.

The function f.qq is a wrapper that combines qqnorm and qqline into a single function call.

The function f.qt is a wrapper that produces quantile-quantile plots comparing the observed vector x with a t-distribution.
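
The geometry of an axis-aligned ellipse can be sketched as a parametric curve. The helper below is hypothetical (ellipseCoords and its argument n are not part of the package) and only computes coordinates; it is an assumption about one way to obtain such points, not the implementation of ellipse.

```r
# Hypothetical helper: coordinates of an axis-aligned ellipse with half-axes
# a (x-direction) and b (y-direction), centered at (x0, y0).
ellipseCoords <- function(a, b, x0 = 0, y0 = 0, n = 200) {
  theta <- seq(0, 2 * pi, length.out = n)   # angle parameter around the curve
  data.frame(x = x0 + a * cos(theta),
             y = y0 + b * sin(theta))
}

xy <- ellipseCoords(3, 2, x0 = 1, y0 = -1)
# To overlay on an existing plot, one could call:
#   points(xy$x, xy$y, type = "l")
```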

Author(s)

Kevin R. Coombes [email protected]

Examples

x <- rnorm(1000, 1, 2)
y <- rnorm(1000, 1, 2)
plot(x,y)
ellipse(1, 1, col=6, type='l', lwd=2)
ellipse(3, 2, col=6, type='l', lwd=2)
f.qq(x, main='Demo', col='blue')
f.qq(x, cut=3)
f.qt(x, df=3)
f.qt(x, df=40)

OOMPA Matrix Utility Functions

Description

Utility functions for manipulating matrices.

Usage

flipud(x)
fliplr(x)

Arguments

`x`	a matrix

Value

The flipud function returns a matrix the same size as x, with the order of the rows reversed, so the matrix has been flipped vertically. The fliplr function returns a matrix the same size as x but flipped horizontally, with the order of the columns reversed.
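
For readers without the package at hand, the same effect can be obtained with base-R indexing. This is a sketch of the documented behaviour; the package's own implementation may differ.

```r
# Base-R index expressions matching the documented behaviour of flipud and
# fliplr (a sketch, not the package code).
mat <- matrix(1:6, 2, 3)

upDown    <- mat[rev(seq_len(nrow(mat))), , drop = FALSE]  # rows reversed (vertical flip)
leftRight <- mat[, rev(seq_len(ncol(mat))), drop = FALSE]  # columns reversed (horizontal flip)
```

The drop = FALSE argument keeps the result a matrix even when x has a single row or column.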

Author(s)

Kevin R. Coombes [email protected]

Examples

mat <- matrix(1:6, 2, 3)
mat
flipud(mat)
fliplr(mat)

Class "Pipeline"

Description

A Pipeline represents a standard multi-step procedure for processing microarray data, implemented as a series of Processors that are applied in order. You can think of a pipeline as a completely defined (and reusable) set of transformations that is applied uniformly to every microarray in a data set.

Usage

## S4 method for signature 'ANY,Pipeline'
process(object, action, parameter=NULL)
## S4 method for signature 'Pipeline'
summary(object, ...)
makeDefaultPipeline(ef = PROC.SIGNAL, ep = 0,
                    nf = PROC.GLOBAL.NORMALIZATION, np = 0,
                    tf = PROC.THRESHOLD, tp = 25,
                    lf = PROC.LOG.TRANSFORM, lp = 2,
                    name = "standard pipe",
                    description = "my method")

Arguments

`object`	In the `process` method, any object appropriate for the input to the `Pipeline`. In the `summary` method, a `Pipeline` object.
`action`	A `Pipeline` object used to process an object.
`parameter`	Irrelevant, since the `Pipeline` ignores the parameter when `process` is invoked.
`...`	Additional arguments are as in the underlying generic methods.
`ef`	“Extractor function”: First `Processor` in the `Pipeline`, typically a method that extracts a single kind of raw measurement from a microarray.
`ep`	Default parameter value for `ef`
`nf`	“Normalization function”: Second `Processor` in the `Pipeline`, typically a normalization step.
`np`	Default parameter value for `nf`
`tf`	“Threshold function”: Third `Processor` in the `Pipeline`, typically a step that truncates data below at some threshold.
`tp`	Default parameter value for `tf`
`lf`	“Log function”: Fourth `Processor` in the `Pipeline`, typically a log transformation.
`lp`	Default parameter value for `lf`
`name`	A string; the name of the pipeline
`description`	A string; a longer description of the pipeline

Details

A key feature of a Pipeline is that it is supposed to represent a standard algorithm that is applied to all objects when processing a microarray data set. For that reason, the parameter that can be passed to the process function is ignored, ensuring that the same parameter values are used to process all objects. By contrast, each Processor that is inserted into a Pipeline allows the user to supply a parameter that overrides its default value.

We provide a single constructor, makeDefaultPipeline, to build a specialized kind of Pipeline, tailored to the analysis of fluorescently labeled single channels in a microarray experiment. More general Pipelines can be constructed using new.

Value

The return value of the generic function process is always an object related to its input, which keeps a record of its history. The precise class of the result depends on the functions used to create the Pipeline.

Slots

proclist: A list of Processor objects
name: A string containing the name of the object
description: A string containing a longer description of the object

Methods

process(object, action, parameter): Apply the series of functions represented by the Pipeline action to the object, updating its history appropriately. The parameter is ignored, since the Pipeline always uses its default values.
summary(object, ...): Write out a summary of the object.

Pre-defined Pipelines

The library comes with two Pipeline objects already defined:

PIPELINE.STANDARD: Takes a Channel object as input. Performs global normalization by rescaling the 75th percentile to 1000, truncates below at 25, then performs log (base-two) transformation.
PIPELINE.MDACC.DEFAULT: Takes a CompleteChannel as input, extracts the raw signal intensity, and then performs the same processing as PIPELINE.STANDARD.
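
The arithmetic described for PIPELINE.STANDARD can be written out in base R on a plain numeric vector. This is a sketch of the three described steps under the stated numbers (75th percentile rescaled to 1000, threshold 25, base-two logarithm); it does not use Channel objects or record any history, and the variable names are illustrative.

```r
set.seed(42)
x <- rexp(10000, 1/1000)                 # simulated raw intensities

# Step 1: global normalization, rescaling the 75th percentile to 1000.
p75        <- quantile(x, 0.75, names = FALSE)
normalized <- x / p75 * 1000

# Step 2: truncate below at 25 (smaller values are replaced by 25).
truncated  <- pmax(normalized, 25)

# Step 3: base-two log transformation.
logged     <- log2(truncated)
```

Truncating before the log step keeps the transformed values bounded below by log2(25), which avoids taking logarithms of zero or near-zero intensities.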

Author(s)

Kevin R. Coombes [email protected]

Examples

showClass("Pipeline")

## simulate a moderately realistic looking microarray
nc <- 100
nr <- 100
v <- rexp(nc*nr, 1/1000)
b <- rnorm(nc*nr, 80, 10)
s <- sapply(v-b, max, 1)
ct <- ChannelType('user', 'random', nc, nr,  'fake')
subbed <- Channel(name='fraud', parent='', type=ct, vec=s)
rm(ct, nc, nr, v, b, s)		# clean some stuff

## example of standard data processing
processed <- process(subbed, PIPELINE.STANDARD)

summary(processed)

par(mfrow=c(2,1))
plot(processed)
hist(processed)

par(mfrow=c(1,1))
image(processed)

rm(subbed, processed)

Class "Processor"

Description

A Processor represents a function that acts on the data of some object to process it in some way. The result is always another related object, which should record some history about exactly how it was processed.

Usage

## S4 method for signature 'Channel,Processor'
process(object, action, parameter=NULL)
## S4 method for signature 'Processor'
summary(object, ...)

Arguments

`object`	In the `process` method, a `Channel` object. In the `summary` method, a `Processor` object
`action`	A `Processor` object used to process a `Channel`.
`parameter`	Any object that makes sense as a parameter to the function represented by the `Processor` `action`
`...`	Additional arguments are as in the underlying generic methods.

Value

The return value of the generic function process is always an object related to its Channel input, which keeps a record of its history. The precise class of the result depends on the function used to create the Processor.

Slots

f: A function that will be used to process a microarray-related object
default: The default value of the parameters to the function f
name: A string containing the name of the object
description: A string containing a longer description of the object

Methods

process(object, action, parameter): Apply the function represented by action to the Channel object, updating the history appropriately. If the parameter is NULL, then use the default value.
summary(object, ...): Write out a summary of the object.

Pre-defined Processors

The library comes with several Processor objects already defined; each one takes a Channel as input and produces a modified Channel as output.

PROC.SUBTRACTOR: Subtracts a global constant (default: 0) from the data vector in the Channel.
PROC.THRESHOLD: Truncates the data vector below, replacing the values below a threshold (default: 0) with the threshold value.
PROC.GLOBAL.NORMALIZATION: Normalizes the data vector in the Channel by dividing by a global constant. If the parameter takes on its default value of 0, then divide by the 75th percentile.
PROC.LOG.TRANSFORM: Performs a log transformation of the data vector. The parameter specifies the base of the logarithm (default: 2).
PROC.MEDIAN.EXPRESSED.NORMALIZATION: Normalizes the data vector by dividing by the median of the expressed genes, where “expressed” is taken to mean “greater than zero”.
PROC.SUBSET.NORMALIZATION: Normalizes the data vector by dividing by the median of a subset of genes. When the parameter has its default value of 0, this method uses the global median. Otherwise, the parameter should be set to a logical or numerical vector that selects the subset of genes to be used for normalization.
PROC.SUBSET.MEAN.NORMALIZATION: Normalizes the data vector by dividing by the mean of a subset of genes. When the parameter has its default value of 0, this method uses the global mean. Otherwise, the parameter should be set to a logical or numerical vector that selects the subset of genes to be used for normalization.
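
The subset-based normalizations can be sketched on a plain numeric vector. The helper normalizeBySubset is hypothetical and is not one of the package's Processor objects; note also that it uses NULL, rather than the documented default of 0, to mean "use all values".

```r
set.seed(7)
x <- rexp(1000, 1/1000)   # simulated intensities

# Hypothetical helper: divide by the center (median by default) of a subset.
# `subset` may be a logical or numeric index vector.
# ASSUMPTION: NULL means "use all values" here; the package uses 0 for that.
normalizeBySubset <- function(x, subset = NULL, center = median) {
  ref <- if (is.null(subset)) x else x[subset]
  x / center(ref)
}

globalNorm <- normalizeBySubset(x)                     # global median
subsetNorm <- normalizeBySubset(x, subset = 1:100)     # median of the first 100 spots
meanNorm   <- normalizeBySubset(x, subset = x > 500,   # mean of a logical subset
                                center = mean)
```

After normalization, the chosen subset has median (or mean) equal to 1, while the remaining values are expressed relative to it.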

Author(s)

Kevin R. Coombes [email protected]

Examples

showClass("Processor")

## simulate a moderately realistic looking microarray
nc <- 100
nr <- 100
v <- rexp(nc*nr, 1/1000)
b <- rnorm(nc*nr, 80, 10)
s <- sapply(v-b, max, 1)
ct <- ChannelType('user', 'random', nc, nr,  'fake')
subbed <- Channel(name='fraud', parent='', type=ct, vec=s)
rm(ct, nc, nr, v, b, s)		# clean some stuff

## example of standard data processing
nor <- process(subbed, PROC.GLOBAL.NORMALIZATION)
thr <- process(nor, PROC.THRESHOLD, 25)
processed <- process(thr, PROC.LOG.TRANSFORM, 2)

summary(processed)

par(mfrow=c(2,1))
plot(processed)
hist(processed)

par(mfrow=c(1,1))
image(processed)

rm(nor, thr, subbed, processed)

OOMPA Statistical Utility Functions

Description

Utility functions for statistical computations.

Usage

f.above.thresh(a, t)
f.cord(x, y, inf.rm)
f.oneway.rankings(r, s)

Arguments

`a`	a vector
`t`	a real number
`x`	a vector
`y`	a vector
`inf.rm`	a logical value
`r`	vector
`s`	vector

Value

f.above.thresh returns the fraction of elements in the vector a that are greater than the threshold t.

f.cord returns the concordance coefficient between the two input vectors x and y. If inf.rm is true, then infinite values are removed before computing the concordance; missing values are always removed.

f.oneway.rankings is implemented as order(s)[r]; the reason it was defined and where it was used are no longer recalled.
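
The quantities described above can be sketched in base R. The helpers fracAbove and linConcordance are hypothetical names; in particular, the concordance sketch assumes that the coefficient meant is Lin's concordance correlation coefficient computed from sample moments, which the documentation does not confirm, and it omits the inf.rm handling.

```r
# Fraction of elements above a threshold, as described for f.above.thresh.
fracAbove <- function(a, t) mean(a > t)

# ASSUMPTION: Lin's concordance correlation coefficient (sample-moment form),
# one common definition of a "concordance coefficient"; f.cord may differ.
linConcordance <- function(x, y) {
  ok <- !(is.na(x) | is.na(y))   # drop pairs with missing values
  x <- x[ok]
  y <- y[ok]
  2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2)
}

fracAbove(c(-1, 0, 1, 2), 0)     # 0.5

s <- c(30, 10, 20)
order(s)[c(1, 3)]                # 2 1, the documented form of f.oneway.rankings
```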

Author(s)

Kevin R. Coombes [email protected]

Examples

x <- rnorm(1000, 1, 2)
y <- rnorm(1000, 1, 2)
f.above.thresh(x, 0)
f.above.thresh(y, 0)
f.cord(x, y)
