Title: | Rmetrics - Regression Based Decision and Prediction |
---|---|
Description: | A collection of functions for linear and non-linear regression modelling. It implements a wrapper for several regression models available in the base and contributed packages of R. |
Authors: | Diethelm Wuertz [aut], Tobias Setz [aut], Yohan Chalabi [aut], Paul J. Northrop [cre, ctb] |
Maintainer: | Paul J. Northrop <[email protected]> |
License: | GPL (>= 2) |
Version: | 4021.83.9000 |
Built: | 2024-12-14 03:02:47 UTC |
Source: | https://github.com/r-forge/rmetrics |
The Rmetrics "fRegression" package is a collection of functions for linear and non-linear regression modelling.
Package: | fRegression |
Type: | Package |
Version: | R 3.0.1 |
Date: | 2014 |
License: | GPL Version 2 or later |
Copyright: | (c) 1999-2014 Rmetrics Association |
Repository: | R-FORGE |
URL: | https://www.rmetrics.org |
Regression modelling, especially linear modelling (LM), is widely used in financial engineering. In finance it mostly appears in the form that a variable is modelled as a linear, or more complex, function of other variables. For example, the buy or sell decision in a trading model may be triggered by the outcome of a regression model; neural networks are a well-known tool in this field.
Rmetrics has built a unique interface to several regression models available in the base and contributed packages of R. The following regression models are interfaced and available through the common function regFit. The argument use allows selecting the desired model:
regFit fits regression models:
lm | fits a linear model [stats] |
rlm | fits a LM by robust regression [MASS] |
glm | fits a generalized linear model [stats] |
gam | fits a generalized additive model [mgcv] |
ppr | fits a projection pursuit regression model [stats] |
nnet | fits a single hidden-layer neural network model [nnet] |
polymars | fits an adaptive polynomial spline regression [polspline] |
An advantage of the regFit function is that all the underlying functions of its family can be called with the same list of arguments, and the value returned is always a unique object, an object of class "fREG" with the following slots: @call, @formula, @method, @data, @fit, @residuals, @fitted, @title, and @description.
Furthermore, independent of the selected regression model, we can use the same S4 methods for all types of regressions. These include the print, plot, summary, predict, fitted, residuals, coef, vcov, and formula methods.
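As a minimal sketch of this common interface (it assumes the fRegression package and its dependencies are installed), the same extractor methods can be applied whatever model the use argument selects:

```r
## Sketch, assuming fRegression is installed: the extractors below
## work unchanged for every value of 'use'.
library(fRegression)

x <- regSim(model = "LM3", n = 50)
for (model in c("lm", "rlm", "glm")) {
    fit <- regFit(Y ~ X1 + X2 + X3, data = x, use = model)
    # identical calls regardless of the underlying fitting function:
    print(coef(fit))
    print(head(residuals(fit)))
}
```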
It is possible to add further regression models to this framework, either from one's own implementations or from implementations available through other contributed R packages. Suggestions include biglm and earth, amongst others.
The package also contains a function to simulate artificial regression models, mostly used for testing:
regSim | simulates artificial regression model data sets |
These generic functions are:
fitted | extracts fitted values from a fitted 'fREG' object |
residuals | extracts residuals from a fitted 'fREG' object |
coef | extracts coefficients from a fitted 'fREG' object |
formula | extracts the formula expression from a fitted 'fREG' object |
vcov | extracts the variance-covariance matrix of fitted parameters |
The function predict returns predicted values based on the fitted model object:
predict | forecasts from an object of class 'fREG' |
For printing and plotting use the functions:
print | prints the results from a regression fit |
plot | plots the results from a regression fit |
summary | returns a summary report |
The fRegression Rmetrics package is written for educational support in teaching "Computational Finance and Financial Engineering" and is licensed under the GPL.
Extracts coefficients from a fitted regression model.
Generic function.
Extractor function for coefficients.
coef is a generic function which extracts the coefficients from objects returned by modeling functions, here the regFit and gregFit parameter estimation functions.
Diethelm Wuertz for the Rmetrics R-port.
## regSim -
x = regSim(model = "LM3", n = 50)
## regFit -
fit = regFit(Y ~ X1 + X2 + X3, data = x, use = "lm")
## coef -
coef(fit)
Extracts fitted values from a fitted regression model.
Generic function
Extractor function for fitted values.
fitted is a generic function which extracts fitted values from objects returned by modeling functions, here the regFit and gregFit parameter estimation functions. The class of the fitted values is the same as the class of the data input to the function regFit or gregFit. In contrast, the slot fitted returns a numeric vector.
Diethelm Wuertz for the Rmetrics R-port.
## regSim -
x.df = regSim(model = "LM3", n = 50)
## regFit -
# Use data.frame input:
fit = regFit(Y ~ X1 + X2 + X3, data = x.df, use = "lm")
## fitted -
val = slot(fit, "fitted")
head(val)
class(val)
val = fitted(fit)
head(val)
class(val)
## regFit -
# Convert to dummy timeSeries object:
library(timeSeries)
x.tS = as.timeSeries(x.df)
fit = regFit(Y ~ X1 + X2 + X3, data = x.tS, use = "lm")
## fitted -
val = slot(fit, "fitted")
head(val)
class(val)
val = fitted(fit)
head(val)
class(val)
Extracts formula from a fitted regression model.
Generic function
formula is a generic function which extracts the formula expression from objects returned by modeling functions, here the regFit and gregFit parameter estimation functions.
Diethelm Wuertz for the Rmetrics R-port.
## regSim -
x = regSim(model = "LM3", n = 50)
## regFit -
fit = regFit(Y ~ X1 + X2 + X3, data = x, use = "lm")
## formula -
formula(fit)
The class 'fREG' represents a fitted regression model.
Objects can be created by calls to the function regFit. The returned object represents parameter estimates of linear and generalized linear models.
call: Object of class "call", the call of the regFit function.
formula: Object of class "formula", the formula used in parameter estimation.
family: Object of class "character". The family objects provide a convenient way to specify the details of the models used by the function gregFit. For details we refer to the documentation of the function glm in R's stats package on how such model fitting takes place.
method: Object of class "character", a string denoting the regression model in use, i.e. one of those listed in the use argument of the function regFit or gregFit.
data: Object of class "list", a list with at least two entries: x, containing the data frame used for the estimation, and data, with the rectangular input data object.
fit: Object of class "list", a list with the results from the parameter estimation. The entries of the list depend on the selected algorithm, see below.
residuals: Object of class "numeric", a numeric vector with the residual values.
fitted: Object of class "numeric", a numeric vector with the fitted values.
title: Object of class "character", a title string.
description: Object of class "character", a string with a brief description.
print: signature(object = "fREG"): prints an object of class 'fREG'.
plot: signature(x = "fREG", y = "missing"): plots an object of class 'fREG'.
summary: signature(object = "fREG"): summarizes results and diagnostic analysis of an object of class 'fREG'.
predict: signature(object = "fREG"): returns predicted values from an object of class 'fREG'.
fitted: signature(object = "fREG"): extracts fitted values from an object of class 'fREG'.
residuals: signature(object = "fREG"): extracts residuals from an object of class 'fREG'.
coef: signature(object = "fREG"): extracts fitted coefficients from an object of class 'fREG'.
formula: signature(x = "fREG"): extracts the formula expression from an object of class 'fREG'.
Diethelm Wuertz and Rmetrics Core Team.
Plots results obtained from a fitted regression model.
## S4 method for signature 'fREG,missing'
plot(x, which = "ask", ...)
x | an object of class 'fREG'. |
which | a character string selecting which plot should be displayed. By default which = "ask". |
... | additional arguments to be passed to the underlying plot functions. |
The plots are a set of graphs which are common to the regression models implemented in the function regFit. This includes linear regression models (use = "lm"), robust linear regression models (use = "rlm"), generalized linear regression models (use = "glm"), generalized additive regression models (use = "gam"), projection pursuit regression models (use = "ppr"), neural network regression models (use = "nnet"), and polychotomous MARS models (use = "polymars").
In addition one can also use the plot functions of the original models, e.g. plot(slot(object, "fit")).
Generic function.
Plot function to display results obtained from a fitted regression model.
Diethelm Wuertz for the Rmetrics R-port.
## regSim -
x = regSim(model = "LM3", n = 50)
## regFit -
fit = regFit(Y ~ X1 + X2 + X3, data = x, use = "lm")
## plot -
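A brief sketch of both plotting routes (it assumes the fRegression package is installed): besides the S4 plot method, the plot method of the object stored in the @fit slot can be called directly.

```r
## Sketch, assuming fRegression is installed.
library(fRegression)

x   <- regSim(model = "LM3", n = 50)
fit <- regFit(Y ~ X1 + X2 + X3, data = x, use = "lm")

plot(fit, which = "ask")   # S4 method, interactive plot selection
plot(slot(fit, "fit"))     # diagnostics of the underlying stats::lm fit
```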
Predicts a time series from a fitted regression model.
## S4 method for signature 'fREG'
predict(object, newdata, se.fit = FALSE, type = "response", ...)
object | an object of class 'fREG'. |
newdata | new data. |
se.fit | a logical flag. Should standard errors be included? By default se.fit = FALSE. |
type | a character string, by default "response". |
... | arguments to be passed. |
returns ...
Generic function
Predict method for regression models.
Diethelm Wuertz for the Rmetrics R-port.
## regSim -
x <- regSim(model = "LM3", n = 50)
## regFit -
fit <- regFit(Y ~ X1 + X2 + X3, data = x, use = "lm")
Estimates the parameters of a regression model.
regFit(formula, data, family = gaussian,
    use = c("lm", "rlm", "glm", "gam", "ppr", "nnet", "polymars"),
    title = NULL, description = NULL, ...)
data | any rectangular object which can be transformed by the function as.data.frame into a data frame with named columns, e.g. an object of class "timeSeries". |
description | a brief description of the project of type character. |
family | a description of the error distribution and link function to be used in the model; see glm. |
formula | a symbolic description of the model to be fit. |
use | a character string denoting the regression method used to fit the model, one of "lm", "rlm", "glm", "gam", "ppr", "nnet", or "polymars". |
title | a character string which allows for a project title. |
... | additional optional arguments to be passed to the underlying functions. For details we refer to the help pages of the selected regression function. |
The function regFit was created to provide a selection of regression models working together with Rmetrics' "timeSeries" objects and providing a common S4 object as the returned value. These models include linear modelling, robust linear modelling, generalized linear modelling, generalized additive modelling, projection pursuit regression, neural networks, and polychotomous MARS models.
LM – Linear Modelling:
Univariate linear regression analysis is a statistical methodology
that assumes a linear relationship between some predictor variables
and a response variable. The goal is to estimate the coefficients
and to predict new data from the estimated linear relationship.
R's base function
lm(formula, data, subset, weights, na.action, method = "qr",
    model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE,
    contrasts = NULL, offset, ...)
is used to fit linear models. It can be used to carry out regression, single stratum analysis of variance and analysis of covariance, although aov may provide a more convenient interface for these.
Rmetrics' function
regFit(formula, data, use = "lm", ...)
calls R's base function lm, but with the difference that the data argument may be any rectangular object which can be transformed by the function as.data.frame into a data frame with named columns, e.g. an object of class "timeSeries".
The function regFit returns an S4 object of class "fREG" whose slot @fit is the object as returned by the function lm. In addition we have the S4 methods fitted and residuals which allow retrieving the fitted values and the residuals as objects of the same class as defined by the argument data.
The function plot.lm provides four plots: a plot of residuals against fitted values, a scale-location plot of sqrt(|residuals|) against fitted values, a normal QQ plot, and a plot of Cook's distances versus row labels. [stats:lm]
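A short sketch of this equivalence (assuming the fRegression package is installed): since the @fit slot holds the plain "lm" object, the wrapper and a direct call to stats::lm agree.

```r
## Sketch, assuming fRegression is installed.
library(fRegression)

x   <- regSim(model = "LM3", n = 50)
fit <- regFit(Y ~ X1 + X2 + X3, data = x, use = "lm")

# The common S4 extractor and a direct lm() fit give the same estimates:
coef(fit)
coef(lm(Y ~ X1 + X2 + X3, data = x))
```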
RLM – Robust Linear Modelling:
To fit a linear model by robust regression using an M estimator, R offers the function
rlm(formula, data, weights, ..., subset, na.action,
    method = c("M", "MM", "model.frame"),
    wt.method = c("inv.var", "case"),
    model = TRUE, x.ret = TRUE, y.ret = FALSE, contrasts = NULL)
from the package MASS. Again we can use the Rmetrics wrapper
regFit(formula, data, use = "rlm", ...)
which allows us, for example, to use S4 timeSeries objects as input and to get the output as an S4 object with the known slots. [MASS::rlm]
GLM – Generalized Linear Models:
Generalized linear modelling extends the linear model in two directions: (i) with a monotonic differentiable link function describing how the expected values are related to the linear predictor, and (ii) with response variables having a probability distribution from an exponential family.
R's package stats comes with the function
glm(formula, family = gaussian, data, weights, subset,
    na.action, start = NULL, etastart, mustart, offset,
    control = glm.control(...), model = TRUE, method = "glm.fit",
    x = FALSE, y = TRUE, contrasts = NULL, ...)
Again we can use the Rmetrics wrapper
regFit(formula, data, use = "glm", ...)
[stats::glm]
GAM – Generalized Additive Models:
An additive model generalizes a linear model by smoothing each predictor term individually. A generalized additive model extends the additive model in the same spirit as the generalized linear model extends the linear model, namely by allowing a link function and non-normal distributions from the exponential family. [mgcv:gam]
PPR – Projection Pursuit Regression:
The basic method is given by Friedman (1984), and is essentially the same code used by S-PLUS's ppreg. It is observed that this code is extremely sensitive to the compiler used. The algorithm first adds up to max.terms (by default ppr.nterms) ridge terms one at a time; it will use fewer if it is unable to find a term to add that makes sufficient difference. The levels of optimization (argument optlevel, by default 2) differ in how thoroughly the models are refitted during this process. At level 0 the existing ridge terms are not refitted. At level 1 the projection directions are not refitted, but the ridge functions and the regression coefficients are. Levels 2 and 3 refit all the terms; level 3 is more careful to re-balance the contributions from each regressor at each step and so is a little less likely to converge to a saddle point of the sum of squares criterion. The plot method plots the ridge functions of the projection pursuit regression fit. [stats:ppr]
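As a small illustration of the underlying fitter, the following sketch calls stats::ppr directly on simulated data (the data-generating model here is made up for illustration):

```r
## Sketch using stats::ppr directly: fit two ridge terms
## to a partly non-linear relationship.
set.seed(42)
x1 <- rnorm(100)
x2 <- rnorm(100)
y  <- sin(x1) + 0.5 * x2 + rnorm(100, sd = 0.1)
d  <- data.frame(y, x1, x2)

# nterms ridge terms are kept; up to max.terms are tried during fitting:
fit <- ppr(y ~ x1 + x2, data = d, nterms = 2, max.terms = 4)
summary(fit)
```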
POLYMARS – Polychotomous MARS:
The algorithm employed by polymars is different from the MARS(tm) algorithm of Friedman (1991), though it has many similarities. Also, the name polymars has been used for this algorithm well before MARS was trademarked. [polyclass:polymars]
NNET – Feedforward Neural Network Regression:
If the response in formula is a factor, an appropriate classification network is constructed; this has one output and entropy fit if the number of levels is two, and a number of outputs equal to the number of classes and a softmax output stage for more levels. If the response is not a factor, it is passed on unchanged to nnet.default. A quasi-Newton optimizer is used, written in C. [nnet:nnet]
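For the regression case, the following sketch calls nnet::nnet directly; linout = TRUE requests a linear output unit instead of the default logistic one, which is what a regression (non-factor) response needs.

```r
## Sketch using nnet::nnet directly for regression.
library(nnet)

set.seed(42)
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.1)

# size: hidden units; linout = TRUE: linear output for regression
fit <- nnet(y ~ x, size = 5, linout = TRUE, trace = FALSE, maxit = 500)
mean((predict(fit) - y)^2)   # in-sample mean squared error
```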
The function returns an S4 object of class "fREG".
The R core team for the lm functions from R's base package,
B.R. Ripley for the glm functions from R's base package,
S.N. Wood for the gam functions from R's mgcv package,
N.N. for the ppr functions from R's modreg package,
M. O'Connors for the polymars functions from R's ? package,
The R core team for the nnet functions from R's nnet package,
Diethelm Wuertz for the Rmetrics R-port.
Belsley D.A., Kuh E., Welsch R.E. (1980); Regression Diagnostics; Wiley, New York.
Dobson, A.J. (1990); An Introduction to Generalized Linear Models; Chapman and Hall, London.
Draper N.R., Smith H. (1981); Applied Regression Analysis; Wiley, New York.
Friedman, J.H. (1991); Multivariate Adaptive Regression Splines (with discussion), The Annals of Statistics 19, 1–141.
Friedman J.H., and Stuetzle W. (1981); Projection Pursuit Regression; Journal of the American Statistical Association 76, 817-823.
Friedman J.H. (1984); SMART User's Guide; Laboratory for Computational Statistics, Stanford University Technical Report No. 1.
Green, Silverman (1994); Nonparametric Regression and Generalized Linear Models; Chapman and Hall.
Gu, Wahba (1991); Minimizing GCV/GML Scores with Multiple Smoothing Parameters via the Newton Method; SIAM J. Sci. Statist. Comput. 12, 383-398.
Hastie T., Tibshirani R. (1990); Generalized Additive Models; Chapman and Hall, London.
Kooperberg Ch., Bose S., and Stone C.J. (1997); Polychotomous Regression, Journal of the American Statistical Association 92, 117–127.
McCullagh P., Nelder, J.A. (1989); Generalized Linear Models; Chapman and Hall, London.
Myers R.H. (1986); Classical and Modern Regression with Applications; Duxbury, Boston.
Rousseeuw P.J., Leroy, A. (1987); Robust Regression and Outlier Detection; Wiley, New York.
Seber G.A.F. (1977); Linear Regression Analysis; Wiley, New York.
Stone C.J., Hansen M., Kooperberg Ch., and Truong Y.K. (1997); The use of polynomial splines and their tensor products in extended linear modeling (with discussion).
Venables, W.N., Ripley, B.D. (1999); Modern Applied Statistics with S-PLUS; Springer, New York.
Wahba (1990); Spline Models of Observational Data; SIAM.
Weisberg S. (1985); Applied Linear Regression; Wiley, New York.
Wood (2000); Modelling and Smoothing Parameter Estimation with Multiple Quadratic Penalties; JRSSB 62, 413-428.
Wood (2001); mgcv: GAMs and Generalized Ridge Regression for R. R News 1, 20-25.
Wood (2001); Thin Plate Regression Splines.
There exists a vast literature on regression. The references listed above are just a small sample of what is available. The book by Myers is an introductory textbook that covers discussions of much of the recent advances in regression technology. Seber's book is at a higher mathematical level and covers much of the classical theory of least squares.
## regSim -
x <- regSim(model = "LM3", n = 100)
# LM
regFit(Y ~ X1 + X2 + X3, data = x, use = "lm")
# RLM
regFit(Y ~ X1 + X2 + X3, data = x, use = "rlm")
# GAM
regFit(Y ~ X1 + X2 + X3, data = x, use = "gam")
# PPR
regFit(Y ~ X1 + X2 + X3, data = x, use = "ppr")
# NNET
regFit(Y ~ X1 + X2 + X3, data = x, use = "nnet")
# POLYMARS
regFit(Y ~ X1 + X2 + X3, data = x, use = "polymars")
A collection and description of functions to test linear regression models, including tests for higher order serial correlation, for heteroskedasticity, for autocorrelation of disturbances, for linearity, and for functional relations.
The methods are:
"bg" | Breusch-Godfrey test for higher order serial correlation, |
"bp" | Breusch-Pagan test for heteroskedasticity, |
"dw" | Durbin-Watson test for autocorrelation of disturbances, |
"gq" | Goldfeld-Quandt test for heteroskedasticity, |
"harv" | Harvey-Collier test for linearity, |
"hmc" | Harrison-McCabe test for heteroskedasticity, |
"rain" | Rainbow test for linearity, and |
"reset" | Ramsey's RESET test for functional relation. |
There is nothing new here; these are just wrappers for the underlying test functions from R's contributed package lmtest. The functions are available as "Builtin" functions. Nevertheless, the user can still install and use the original functions from R's lmtest package.
lmTest(formula, method = c("bg", "bp", "dw", "gq", "harv", "hmc", "rain", "reset"),
    data = list(), ...)
bgTest(formula, order = 1, type = c("Chisq", "F"), data = list())
bpTest(formula, varformula = NULL, studentize = TRUE, data = list())
dwTest(formula, alternative = c("greater", "two.sided", "less"),
    iterations = 15, exact = NULL, tol = 1e-10, data = list())
gqTest(formula, point = 0.5, order.by = NULL, data = list())
harvTest(formula, order.by = NULL, data = list())
hmcTest(formula, point = 0.5, order.by = NULL, simulate.p = TRUE,
    nsim = 1000, plot = FALSE, data = list())
rainTest(formula, fraction = 0.5, order.by = NULL, center = NULL, data = list())
resetTest(formula, power = 2:3, type = c("fitted", "regressor", "princomp"),
    data = list())
alternative | [dwTest] - |
center | [rainTest] - |
data | an optional data frame containing the variables in the model. By default the variables are taken from the environment from which the function is called. |
exact | [dwTest] - |
formula | a symbolic description for the linear model to be tested. |
fraction | [rainTest] - |
iterations | [dwTest] - |
method | the test method which should be applied. |
nsim | [hmcTest] - |
order | [bgTest] - |
order.by | [gqTest][harvTest] - |
plot | [hmcTest] - |
point | [gqTest][hmcTest] - |
power | [resetTest] - |
simulate.p | [hmcTest] - |
studentize | [bpTest] - |
tol | [dwTest] - |
type | [bgTest] - |
varformula | [bpTest] - |
... | [regTest] - |
bg – Breusch-Godfrey Test:
Under the null hypothesis the test statistic is asymptotically chi-squared distributed with degrees of freedom as given in parameter. If type is set to "F", the function returns the exact F statistic which, under the null hypothesis, follows an F distribution with degrees of freedom as given in parameter. The starting values for the lagged residuals in the supplementary regression are chosen to be 0. [lmtest:bgtest]
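The auxiliary-regression idea can be sketched in base R (a simplified LM-statistic version for order 1, not the package's implementation): regress the OLS residuals on the regressors plus lagged residuals, padded with the starting value 0, and use n times the R-squared as an asymptotically chi-squared statistic.

```r
## Sketch of the Breusch-Godfrey LM statistic for order = 1, base R only.
set.seed(1)
x <- rep(c(1, -1), 50)
y <- 1 + x + rnorm(100)

e  <- residuals(lm(y ~ x))
e1 <- c(0, head(e, -1))            # lag-1 residuals, starting value 0
aux <- lm(e ~ x + e1)              # supplementary regression

lm_stat <- length(e) * summary(aux)$r.squared
p.val   <- pchisq(lm_stat, df = 1, lower.tail = FALSE)  # asymptotic p value
```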
bp – Breusch-Pagan Test:
The Breusch-Pagan test fits a linear regression model to the residuals of a linear regression model (by default the same explanatory variables are taken as in the main regression model) and rejects if too much of the variance is explained by the additional explanatory variables. Under the null hypothesis the test statistic of the Breusch-Pagan test follows a chi-squared distribution with parameter (the number of regressors without the constant in the model) degrees of freedom. [lmtest:bptest]
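A base-R sketch of the studentized (Koenker) variant of this statistic: n times the R-squared from regressing the squared residuals on the original regressors.

```r
## Sketch of the studentized Breusch-Pagan statistic, base R only.
set.seed(1)
x <- rep(c(-1, 1), 50)
y <- 1 + x + rnorm(100, sd = rep(c(1, 2), 50))   # heteroskedastic errors

e   <- residuals(lm(y ~ x))
aux <- lm(e^2 ~ x)                 # auxiliary regression on the regressors

bp    <- length(e) * summary(aux)$r.squared
p.val <- pchisq(bp, df = 1, lower.tail = FALSE)  # 1 regressor besides constant
```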
dw – Durbin-Watson Test:
The Durbin-Watson test has the null hypothesis that the autocorrelation of the disturbances is 0; it can be tested against the alternative that it is greater than, not equal to, or less than 0. This can be specified by the alternative argument.
The null distribution of the Durbin-Watson test statistic is a linear combination of chi-squared distributions. The p value is computed using a Fortran version of the Applied Statistics Algorithm AS 153 by Farebrother (1980, 1984). This algorithm is called "pan" or "gradsol". For large sample sizes the algorithm might fail to compute the p value; in that case a warning is printed and an approximate p value is given, computed using a normal approximation with the mean and variance of the Durbin-Watson test statistic. [lmtest:dwtest]
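While the null distribution is the hard part, the Durbin-Watson statistic itself is simple: the sum of squared successive residual differences divided by the residual sum of squares, as this base-R sketch shows.

```r
## Sketch: computing the Durbin-Watson statistic itself in base R.
set.seed(1)
x <- rep(c(-1, 1), 50)
y <- 1 + x + rnorm(100)

e  <- residuals(lm(y ~ x))
dw <- sum(diff(e)^2) / sum(e^2)
dw   # values near 2 indicate no first-order autocorrelation
```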
gq – Goldfeld-Quandt Test:
The Goldfeld-Quandt test compares the variances of two submodels divided by a specified breakpoint and rejects if the variances differ. Under the null hypothesis the test statistic of the Goldfeld-Quandt test follows an F distribution with the degrees of freedom as given in parameter. [lmtest:gqtest]
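The idea can be sketched in base R: fit the model separately on each side of the breakpoint and compare the residual variances via an F ratio.

```r
## Sketch of the Goldfeld-Quandt idea, base R only, breakpoint at 0.5.
set.seed(1)
x <- rep(c(-1, 1), 50)
y <- 1 + x + c(rnorm(50, sd = 1), rnorm(50, sd = 2))

f1 <- lm(y[1:50]   ~ x[1:50])      # submodel before the breakpoint
f2 <- lm(y[51:100] ~ x[51:100])    # submodel after the breakpoint

# F ratio of the two residual variances (deviance() returns the RSS):
F.stat <- (deviance(f2) / df.residual(f2)) / (deviance(f1) / df.residual(f1))
p.val  <- pf(F.stat, df.residual(f2), df.residual(f1), lower.tail = FALSE)
```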
harv – Harvey-Collier Test:
The Harvey-Collier test performs a t-test (with parameter degrees of freedom) on the recursive residuals. If the true relationship is not linear but convex or concave, the mean of the recursive residuals should differ from 0 significantly. [lmtest:harvtest]
hmc – Harrison-McCabe Test:
The Harrison-McCabe test statistic is the fraction of the residual sum of squares that relates to the fraction of the data before the breakpoint. Under the null hypothesis the test statistic should be close to the size of this fraction, e.g. in the default case close to 0.5. The null hypothesis is rejected if the statistic is too small. [lmtest:hmctest]
rain – Rainbow Test:
The basic idea of the Rainbow test is that even if the true relationship is non-linear, a good linear fit can be achieved on a subsample in the "middle" of the data. The null hypothesis is rejected whenever the overall fit is significantly inferior to the fit on the subsample. Under the null hypothesis the test statistic follows an F distribution with parameter degrees of freedom. [lmtest:raintest]
reset – Ramsey's RESET Test:
The RESET test is a popular means of diagnosing the correctness of functional form. The basic assumption is that under the alternative the model can be written as the regression y = X beta + Z gamma + u, where Z is generated by taking powers either of the fitted response, of the regressor variables, or of the first principal component of X. A standard F test is then applied to determine whether these additional variables have significant influence. Under the null hypothesis the test statistic follows an F distribution with parameter degrees of freedom. [lmtest:reset]
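A base-R sketch of the fitted-response variant: augment the model with powers of the fitted values and F-test the augmented model against the original with anova.

```r
## Sketch of a RESET test (type = "fitted"), base R only.
set.seed(1)
x <- 1:30
y <- 1 + x + x^2 + rnorm(30)       # truly quadratic relationship

f0 <- lm(y ~ x)                    # misspecified linear model
z  <- fitted(f0)
f1 <- lm(y ~ x + I(z^2) + I(z^3))  # augmented with powers of fitted values

res <- anova(f0, f1)               # F test of the added terms
res
```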
A list with class "htest" containing the following components:
statistic | the value of the test statistic. |
parameter | the lag order. |
p.value | the p-value of the test. |
method | a character string indicating what type of test was performed. |
data.name | a character string giving the name of the data. |
alternative | a character string describing the alternative hypothesis. |
The underlying lmtest package comes with a lot of helpful examples. We highly recommend installing the lmtest package and studying the examples given therein.
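As a starting point, the original functions can be called directly once lmtest is installed (this sketch assumes an installed lmtest package):

```r
## Sketch using the original lmtest package directly.
library(lmtest)

set.seed(123)
x <- rep(c(1, -1), 50)
y <- 1 + x + rnorm(100)

dwtest(y ~ x)             # Durbin-Watson
bptest(y ~ x)             # Breusch-Pagan
bgtest(y ~ x, order = 4)  # Breusch-Godfrey, 4th order
```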
Achim Zeileis and Torsten Hothorn for the lmtest
package,
Diethelm Wuertz for the Rmetrics R-port.
Breusch, T.S. (1979); Testing for Autocorrelation in Dynamic Linear Models, Australian Economic Papers 17, 334–355.
Breusch T.S. and Pagan A.R. (1979); A Simple Test for Heteroscedasticity and Random Coefficient Variation, Econometrica 47, 1287–1294
Durbin J. and Watson G.S. (1950); Testing for Serial Correlation in Least Squares Regression I, Biometrika 37, 409–428.
Durbin J. and Watson G.S. (1951); Testing for Serial Correlation in Least Squares Regression II, Biometrika 38, 159–178.
Durbin J. and Watson G.S. (1971); Testing for Serial Correlation in Least Squares Regression III, Biometrika 58, 1–19.
Farebrother R.W. (1980); Pan's Procedure for the Tail Probabilities of the Durbin-Watson Statistic, Applied Statistics 29, 224–227.
Farebrother R.W. (1984); The Distribution of a Linear Combination of chi-squared Random Variables, Applied Statistics 33, 366–369.
Godfrey, L.G. (1978); Testing Against General Autoregressive and Moving Average Error Models when the Regressors Include Lagged Dependent Variables, Econometrica 46, 1293–1302.
Goldfeld S.M. and Quandt R.E. (1965); Some Tests for Homoskedasticity Journal of the American Statistical Association 60, 539–547.
Harrison M.J. and McCabe B.P.M. (1979); A Test for Heteroscedasticity based on Ordinary Least Squares Residuals Journal of the American Statistical Association 74, 494–499.
Harvey A. and Collier P. (1977); Testing for Functional Misspecification in Regression Analysis, Journal of Econometrics 6, 103–119.
Johnston, J. (1984); Econometric Methods, Third Edition, McGraw Hill Inc.
Kraemer W. and Sonnberger H. (1986); The Linear Regression Model under Test, Heidelberg: Physica.
Racine J. and Hyndman R. (2002); Using R To Teach Econometrics, Journal of Applied Econometrics 17, 175–189.
Ramsey J.B. (1969); Tests for Specification Error in Classical Linear Least Squares Regression Analysis, Journal of the Royal Statistical Society, Series B 31, 350–371.
Utts J.M. (1982); The Rainbow Test for Lack of Fit in Regression, Communications in Statistics - Theory and Methods 11, 1801–1815.
## bg | dw -
# Generate a Stationary and an AR(1) Series:
x = rep(c(1, -1), 50)
y1 = 1 + x + rnorm(100)
# Perform Breusch-Godfrey Test for 1st order serial correlation:
lmTest(y1 ~ x, "bg")
# ... or for fourth order serial correlation:
lmTest(y1 ~ x, "bg", order = 4)
# Compare with Durbin-Watson Test Results:
lmTest(y1 ~ x, "dw")
y2 = filter(y1, 0.5, method = "recursive")
lmTest(y2 ~ x, "bg")

## bp -
# Generate a Regressor:
x = rep(c(-1, 1), 50)
# Generate Heteroskedastic and Homoskedastic Disturbances:
err1 = rnorm(100, sd = rep(c(1, 2), 50))
err2 = rnorm(100)
# Generate a Linear Relationship:
y1 = 1 + x + err1
y2 = 1 + x + err2
# Perform Breusch-Pagan Test:
bp = lmTest(y1 ~ x, "bp")
bp
# Calculate Critical Value for 0.05 Level:
qchisq(0.95, bp$parameter)
lmTest(y2 ~ x, "bp")

## dw -
# Generate two AR(1) Error Terms with parameter
# rho = 0 (white noise) and rho = 0.9 respectively:
err1 = rnorm(100)
# Generate Regressor and Dependent Variable:
x = rep(c(-1, 1), 50)
y1 = 1 + x + err1
# Perform Durbin-Watson Test:
lmTest(y1 ~ x, "dw")
err2 = filter(err1, 0.9, method = "recursive")
y2 = 1 + x + err2
lmTest(y2 ~ x, "dw")

## gq -
# Generate a Regressor:
x = rep(c(-1, 1), 50)
# Generate Heteroskedastic and Homoskedastic Disturbances:
err1 = c(rnorm(50, sd = 1), rnorm(50, sd = 2))
err2 = rnorm(100)
# Generate a Linear Relationship:
y1 = 1 + x + err1
y2 = 1 + x + err2
# Perform Goldfeld-Quandt Test:
lmTest(y1 ~ x, "gq")
lmTest(y2 ~ x, "gq")

## harv -
# Generate a Regressor and Dependent Variable:
x = 1:50
y1 = 1 + x + rnorm(50)
y2 = y1 + 0.3*x^2
# Perform Harvey-Collier Test:
harv = lmTest(y1 ~ x, "harv")
harv
# Calculate Critical Value for 0.05 Level:
qt(0.95, harv$parameter)
lmTest(y2 ~ x, "harv")

## hmc -
# Generate a Regressor:
x = rep(c(-1, 1), 50)
# Generate Heteroskedastic and Homoskedastic Disturbances:
err1 = c(rnorm(50, sd = 1), rnorm(50, sd = 2))
err2 = rnorm(100)
# Generate a Linear Relationship:
y1 = 1 + x + err1
y2 = 1 + x + err2
# Perform Harrison-McCabe Test:
lmTest(y1 ~ x, "hmc")
lmTest(y2 ~ x, "hmc")

## rain -
# Generate Series:
x = c(1:30)
y = x^2 + rnorm(30, 0, 2)
# Perform Rainbow Test:
rain = lmTest(y ~ x, "rain")
rain
# Compute Critical Value:
qf(0.95, rain$parameter[1], rain$parameter[2])

## reset -
# Generate Series:
x = c(1:30)
y1 = 1 + x + x^2 + rnorm(30)
y2 = 1 + x + rnorm(30)
# Perform RESET Test:
lmTest(y1 ~ x, "reset", power = 2, type = "regressor")
lmTest(y2 ~ x, "reset", power = 2, type = "regressor")
Simulates regression models.
regSim(model = "LM3", n = 100, ...)

LM3(n = 100, seed = 4711)
LOGIT3(n = 100, seed = 4711)
GAM3(n = 100, seed = 4711)
model | a character string defining the function name from which the regression model will be simulated. |
n | an integer setting the length, i.e. the number of records, of the output series; by default n = 100. |
seed | an integer value, the recommended way to specify seeds for reproducible random number generation. |
... | arguments to be passed to the underlying function specified by the model argument. |
The function regSim allows one to simulate from various regression
models, defined either by one of the three example functions LM3,
LOGIT3 and GAM3, or by a user-specified function.
The example models are defined in the following way:
# LM3:
> y = 0.75 * x1 + 0.25 * x2 - 0.5 * x3 + 0.1 * eps
# LOGIT3:
> y = 1 / (1 + exp(- 0.75 * x1 + 0.25 * x2 - 0.5 * x3 + eps))
# GAM3:
> y = scale(scale(sin(2 * pi * x1)) + scale(exp(x2)) + scale(x3))
> y = y + 0.1 * rnorm(n, sd = sd(y))
"LM3" models a linear regression model, "LOGIT3" a generalized
linear regression model expressed by a logit model, and "GAM3" an
additive model. x1, x2, x3, and eps are random
normal deviates of length n.
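As a hedged sketch of the data-generating process described above, the documented "LM3" model can be reproduced in a few lines. The function name lm3Sketch and the column names are illustrative assumptions; the actual LM3() in fRegression may differ in such details.

```r
# Hypothetical re-implementation of the documented LM3 process:
# y = 0.75 * x1 + 0.25 * x2 - 0.5 * x3 + 0.1 * eps
lm3Sketch <- function(n = 100, seed = 4711) {
  set.seed(seed)                        # fixed seed for reproducible draws
  x1  <- rnorm(n)
  x2  <- rnorm(n)
  x3  <- rnorm(n)
  eps <- rnorm(n)                       # innovations, scaled by 0.1 below
  y   <- 0.75 * x1 + 0.25 * x2 - 0.5 * x3 + 0.1 * eps
  # Return a rectangular object accepted by the fitting functions:
  data.frame(Y = y, X1 = x1, X2 = x2, X3 = x3)
}
```

The returned data.frame can then be handed to a fitting call such as regFit(Y ~ X1 + X2 + X3, data = lm3Sketch(100), use = "lm").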
The model function should return a rectangular series defined
as an object of class data.frame, timeSeries or
mts, which can be accepted by the parameter estimation
functions regFit and gregFit.
The function regSim returns an object of the same class
as that returned by the underlying function match.fun(model).
This may be an object of class data.frame, timeSeries or
mts.
This function is still under development. In the future we plan
for the function regSim to be able to generate more general
regression models.
Diethelm Wuertz for the Rmetrics R-port.
## LM2 -
# Data for a user-defined linear regression model:
LM2 = function(n) {
  x = rnorm(n)
  y = rnorm(n)
  eps = 0.1 * rnorm(n)
  z = 0.5 + 0.75 * x + 0.25 * y + eps
  data.frame(Z = z, X = x, Y = y)
}
for (FUN in c("LM2", "LM3")) {
  cat(FUN, ":\n", sep = "")
  print(regSim(model = FUN, n = 10))
}
Extracts residuals from a fitted regression object.
## S4 method for signature 'fREG'
residuals(object)
object | an object of class 'fREG'. |
Generic function.

residuals is a generic function which extracts residual values
from objects returned by modeling functions.
Diethelm Wuertz for the Rmetrics R-port.
## regSim -
x = regSim(model = "LM3", n = 50)
## regFit -
fit = regFit(Y ~ X1 + X2 + X3, data = x, use = "lm")
## residuals -
residuals(fit)
Show methods for regression modelling.
The show or print method returns the same information for all
supported regression models through the use argument in
the function regFit: the 'title', the 'formula', the 'family'
and the 'model parameters'.
Generic function.
Print method for objects of class 'fREG'.
Diethelm Wuertz for the Rmetrics R-port.
## regSim -
x <- regSim(model = "LM3", n = 50)
## regFit -
fit <- regFit(Y ~ X1 + X2 + X3, data = x, use = "lm")
## print -
print(fit)
Summary methods for regression modelling.
Generic function.
Summary method for objects of class 'fREG'.
Diethelm Wuertz for the Rmetrics R-port.
## regSim -
x <- regSim(model = "LM3", n = 50)
## regFit -
fit <- regFit(Y ~ X1 + X2 + X3, data = x, use = "lm")
## summary -
summary(fit)
Plots results obtained from a fitted regression model.
## S3 method for class 'fREG'
termPlot(model, ...)
model |
an object of class 'fREG'. |
... |
additional arguments to be passed to the underlying functions. |
Generic function.
Term plot function.
Diethelm Wuertz for the Rmetrics R-port.
## regSim -
x <- regSim(model = "LM3", n = 50)
## regFit -
fit <- regFit(Y ~ X1 + X2 + X3, data = x, use = "lm")
## termPlot -
termPlot(fit)
Extracts terms from a fitted regression model.
## S4 method for signature 'fREG'
terms(x, ...)
x |
an object of class 'fREG'. |
... |
additional arguments to be passed to the underlying functions. |
Generic function.
Terms extractor function.
Diethelm Wuertz for the Rmetrics R-port.
## regSim -
x <- regSim(model = "LM3", n = 50)
## regFit -
fit <- regFit(Y ~ X1 + X2 + X3, data = x, use = "lm")
## terms -
terms(fit)
Extracts the variance-covariance matrix from a fitted regression model.
Generic function.

Extractor function for the variance-covariance matrix.

vcov is a generic function which extracts the variance-covariance
matrix of the estimated parameters from objects returned by modeling
functions, here the regFit and gregFit parameter
estimation functions.
Diethelm Wuertz for the Rmetrics R-port.
## regSim -
x <- regSim(model = "LM3", n = 50)
## regFit -
fit <- regFit(Y ~ X1 + X2 + X3, data = x, use = "lm")
## vcov -
vcov(fit)