Title: | Censored Regression with Conditional Heteroscedasticity |
---|---|
Description: | Different approaches to censored or truncated regression with conditional heteroscedasticity are provided. First, continuous distributions can be used for the (right and/or left censored or truncated) response with separate linear predictors for the mean and variance. Second, cumulative link models for ordinal data (obtained by interval-censoring continuous data) can be employed for heteroscedastic extended logistic regression (HXLR). In the latter type of models, the intercepts depend on the thresholds that define the intervals. Infrastructure for working with censored or truncated normal, logistic, and Student-t distributions, i.e., d/p/q/r functions and distributions3 objects. |
Authors: | Achim Zeileis [aut, cre] , Jakob W. Messner [aut] , Reto Stauffer [aut] , Ioannis Kosmidis [ctb] , Georg J. Mayr [ctb] |
Maintainer: | Achim Zeileis <[email protected]> |
License: | GPL-2 | GPL-3 |
Version: | 1.2-1 |
Built: | 2024-11-20 05:51:33 UTC |
Source: | https://github.com/r-forge/topmodels |
Class and methods for left-, right-, and interval-censored logistic distributions using the workflow from the distributions3 package.
CensoredLogistic(location = 0, scale = 1, left = -Inf, right = Inf)
CensoredLogistic(location = 0, scale = 1, left = -Inf, right = Inf)
location |
numeric. The location parameter of the underlying uncensored
logistic distribution, typically written |
scale |
numeric. The scale parameter (standard deviation) of
the underlying uncensored logistic distribution,
typically written |
left |
numeric. The left censoring point. Can be any real number,
defaults to |
right |
numeric. The right censoring point. Can be any real number,
defaults to |
The constructor function CensoredLogistic
sets up a distribution
object, representing the censored logistic probability distribution by the
corresponding parameters: the latent mean location
= and
latent standard deviation
scale
= (i.e., the parameters
of the underlying uncensored logistic variable), the
left
censoring
point (with -Inf
corresponding to uncensored), and the
right
censoring point (with Inf
corresponding to uncensored).
The censored logistic distribution has probability density function (PDF) :
|
if
|
|
if
|
|
otherwise |
where and
are the cumulative distribution function
and probability density function of the standard logistic distribution,
respectively.
All parameters can also be vectors, so that it is possible to define a vector of censored logistic distributions with potentially different parameters. All parameters need to have the same length or must be scalars (i.e., of length 1) which are then recycled to the length of the other parameters.
For the CensoredLogistic
distribution objects there is a wide range
of standard methods available to the generics provided in the distributions3
package: pdf
and log_pdf
for the (log-)density (PDF), cdf
for the probability
from the cumulative distribution function (CDF), quantile
for quantiles,
random
for simulating random variables,
crps
for the continuous ranked probability score
(CRPS), and support
for the support interval
(minimum and maximum). Internally, these methods rely on the usual d/p/q/r
functions provided for the censored logistic distributions in the crch
package, see dclogis
, and the crps_clogis
function from the scoringRules package.
The methods is_discrete
and is_continuous
can be used to query whether the distributions are discrete on the entire support
(always FALSE
) or continuous on the entire support (only TRUE
if
there is no censoring, i.e., if both left
and right
are infinite).
See the examples below for an illustration of the workflow for the class and methods.
A CensoredLogistic
distribution object.
dclogis
, Logistic
, TruncatedLogistic
,
CensoredNormal
, CensoredStudentsT
## package and random seed library("distributions3") set.seed(6020) ## three censored logistic distributions: ## - uncensored standard logistic ## - left-censored at zero with latent location = 1 and scale = 1 ## - interval-censored in [0, 5] with latent location = 2 and scale = 1 X <- CensoredLogistic( location = c( 0, 1, 2), scale = c( 1, 1, 1), left = c(-Inf, 0, 0), right = c( Inf, Inf, 5) ) X ## compute mean of the censored distribution mean(X) ## higher moments (variance, skewness, kurtosis) are not implemented yet ## support interval (minimum and maximum) support(X) ## simulate random variables random(X, 5) ## histograms of 1,000 simulated observations x <- random(X, 1000) hist(x[1, ], main = "uncensored") hist(x[2, ], main = "left-censored at zero") hist(x[3, ], main = "interval-censored in [0, 5]") ## probability density function (PDF) and log-density (or log-likelihood) x <- c(0, 0, 1) pdf(X, x) pdf(X, x, log = TRUE) log_pdf(X, x) ## cumulative distribution function (CDF) cdf(X, x) ## quantiles quantile(X, 0.5) ## cdf() and quantile() are inverses (except at censoring points) cdf(X, quantile(X, 0.5)) quantile(X, cdf(X, 1)) ## all methods above can either be applied elementwise or for ## all combinations of X and x, if length(X) = length(x), ## also the result can be assured to be a matrix via drop = FALSE p <- c(0.05, 0.5, 0.95) quantile(X, p, elementwise = FALSE) quantile(X, p, elementwise = TRUE) quantile(X, p, elementwise = TRUE, drop = FALSE) ## compare theoretical and empirical mean from 1,000 simulated observations cbind( "theoretical" = mean(X), "empirical" = rowMeans(random(X, 1000)) ) ## evaluate continuous ranked probability score (CRPS) using scoringRules library("scoringRules") crps(X, x)
## package and random seed library("distributions3") set.seed(6020) ## three censored logistic distributions: ## - uncensored standard logistic ## - left-censored at zero with latent location = 1 and scale = 1 ## - interval-censored in [0, 5] with latent location = 2 and scale = 1 X <- CensoredLogistic( location = c( 0, 1, 2), scale = c( 1, 1, 1), left = c(-Inf, 0, 0), right = c( Inf, Inf, 5) ) X ## compute mean of the censored distribution mean(X) ## higher moments (variance, skewness, kurtosis) are not implemented yet ## support interval (minimum and maximum) support(X) ## simulate random variables random(X, 5) ## histograms of 1,000 simulated observations x <- random(X, 1000) hist(x[1, ], main = "uncensored") hist(x[2, ], main = "left-censored at zero") hist(x[3, ], main = "interval-censored in [0, 5]") ## probability density function (PDF) and log-density (or log-likelihood) x <- c(0, 0, 1) pdf(X, x) pdf(X, x, log = TRUE) log_pdf(X, x) ## cumulative distribution function (CDF) cdf(X, x) ## quantiles quantile(X, 0.5) ## cdf() and quantile() are inverses (except at censoring points) cdf(X, quantile(X, 0.5)) quantile(X, cdf(X, 1)) ## all methods above can either be applied elementwise or for ## all combinations of X and x, if length(X) = length(x), ## also the result can be assured to be a matrix via drop = FALSE p <- c(0.05, 0.5, 0.95) quantile(X, p, elementwise = FALSE) quantile(X, p, elementwise = TRUE) quantile(X, p, elementwise = TRUE, drop = FALSE) ## compare theoretical and empirical mean from 1,000 simulated observations cbind( "theoretical" = mean(X), "empirical" = rowMeans(random(X, 1000)) ) ## evaluate continuous ranked probability score (CRPS) using scoringRules library("scoringRules") crps(X, x)
Class and methods for left-, right-, and interval-censored normal distributions using the workflow from the distributions3 package.
CensoredNormal(mu = 0, sigma = 1, left = -Inf, right = Inf)
CensoredNormal(mu = 0, sigma = 1, left = -Inf, right = Inf)
mu |
numeric. The location parameter of the underlying uncensored
normal distribution, typically written |
sigma |
numeric. The scale parameter (standard deviation) of
the underlying uncensored normal distribution,
typically written |
left |
numeric. The left censoring point. Can be any real number,
defaults to |
right |
numeric. The right censoring point. Can be any real number,
defaults to |
The constructor function CensoredNormal
sets up a distribution
object, representing the censored normal probability distribution by the
corresponding parameters: the latent mean mu
= and
latent standard deviation
sigma
= (i.e., the parameters
of the underlying uncensored normal variable), the
left
censoring
point (with -Inf
corresponding to uncensored), and the
right
censoring point (with Inf
corresponding to uncensored).
The censored normal distribution has probability density function (PDF) :
|
if
|
|
if
|
|
otherwise |
where and
are the cumulative distribution function
and probability density function of the standard normal distribution
respectively.
All parameters can also be vectors, so that it is possible to define a vector of censored normal distributions with potentially different parameters. All parameters need to have the same length or must be scalars (i.e., of length 1) which are then recycled to the length of the other parameters.
For the CensoredNormal
distribution objects there is a wide range
of standard methods available to the generics provided in the distributions3
package: pdf
and log_pdf
for the (log-)density (PDF), cdf
for the probability
from the cumulative distribution function (CDF), quantile
for quantiles,
random
for simulating random variables,
crps
for the continuous ranked probability score
(CRPS), and support
for the support interval
(minimum and maximum). Internally, these methods rely on the usual d/p/q/r
functions provided for the censored normal distributions in the crch
package, see dcnorm
, and the crps_cnorm
function from the scoringRules package.
The methods is_discrete
and is_continuous
can be used to query whether the distributions are discrete on the entire support
(always FALSE
) or continuous on the entire support (only TRUE
if
there is no censoring, i.e., if both left
and right
are infinite).
See the examples below for an illustration of the workflow for the class and methods.
A CensoredNormal
distribution object.
dcnorm
, Normal
, TruncatedNormal
,
CensoredLogistic
, CensoredStudentsT
## package and random seed library("distributions3") set.seed(6020) ## three censored normal distributions: ## - uncensored standard normal ## - left-censored at zero (Tobit) with latent mu = 1 and sigma = 1 ## - interval-censored in [0, 5] with latent mu = 1 and sigma = 2 X <- CensoredNormal( mu = c( 0, 1, 1), sigma = c( 1, 1, 2), left = c(-Inf, 0, 0), right = c( Inf, Inf, 5) ) X ## compute mean and variance of the censored distribution mean(X) variance(X) ## higher moments (skewness, kurtosis) are not implemented yet ## support interval (minimum and maximum) support(X) ## simulate random variables random(X, 5) ## histograms of 1,000 simulated observations x <- random(X, 1000) hist(x[1, ], main = "uncensored") hist(x[2, ], main = "left-censored at zero") hist(x[3, ], main = "interval-censored in [0, 5]") ## probability density function (PDF) and log-density (or log-likelihood) x <- c(0, 0, 1) pdf(X, x) pdf(X, x, log = TRUE) log_pdf(X, x) ## cumulative distribution function (CDF) cdf(X, x) ## quantiles quantile(X, 0.5) ## cdf() and quantile() are inverses (except at censoring points) cdf(X, quantile(X, 0.5)) quantile(X, cdf(X, 1)) ## all methods above can either be applied elementwise or for ## all combinations of X and x, if length(X) = length(x), ## also the result can be assured to be a matrix via drop = FALSE p <- c(0.05, 0.5, 0.95) quantile(X, p, elementwise = FALSE) quantile(X, p, elementwise = TRUE) quantile(X, p, elementwise = TRUE, drop = FALSE) ## compare theoretical and empirical mean from 1,000 simulated observations cbind( "theoretical" = mean(X), "empirical" = rowMeans(random(X, 1000)) ) ## evaluate continuous ranked probability score (CRPS) using scoringRules library("scoringRules") crps(X, x)
## package and random seed library("distributions3") set.seed(6020) ## three censored normal distributions: ## - uncensored standard normal ## - left-censored at zero (Tobit) with latent mu = 1 and sigma = 1 ## - interval-censored in [0, 5] with latent mu = 1 and sigma = 2 X <- CensoredNormal( mu = c( 0, 1, 1), sigma = c( 1, 1, 2), left = c(-Inf, 0, 0), right = c( Inf, Inf, 5) ) X ## compute mean and variance of the censored distribution mean(X) variance(X) ## higher moments (skewness, kurtosis) are not implemented yet ## support interval (minimum and maximum) support(X) ## simulate random variables random(X, 5) ## histograms of 1,000 simulated observations x <- random(X, 1000) hist(x[1, ], main = "uncensored") hist(x[2, ], main = "left-censored at zero") hist(x[3, ], main = "interval-censored in [0, 5]") ## probability density function (PDF) and log-density (or log-likelihood) x <- c(0, 0, 1) pdf(X, x) pdf(X, x, log = TRUE) log_pdf(X, x) ## cumulative distribution function (CDF) cdf(X, x) ## quantiles quantile(X, 0.5) ## cdf() and quantile() are inverses (except at censoring points) cdf(X, quantile(X, 0.5)) quantile(X, cdf(X, 1)) ## all methods above can either be applied elementwise or for ## all combinations of X and x, if length(X) = length(x), ## also the result can be assured to be a matrix via drop = FALSE p <- c(0.05, 0.5, 0.95) quantile(X, p, elementwise = FALSE) quantile(X, p, elementwise = TRUE) quantile(X, p, elementwise = TRUE, drop = FALSE) ## compare theoretical and empirical mean from 1,000 simulated observations cbind( "theoretical" = mean(X), "empirical" = rowMeans(random(X, 1000)) ) ## evaluate continuous ranked probability score (CRPS) using scoringRules library("scoringRules") crps(X, x)
Class and methods for left-, right-, and interval-censored t distributions using the workflow from the distributions3 package.
CensoredStudentsT(df, location = 0, scale = 1, left = -Inf, right = Inf)
CensoredStudentsT(df, location = 0, scale = 1, left = -Inf, right = Inf)
df |
numeric. The degrees of freedom of the underlying uncensored
t distribution. Can be any positive number, with |
location |
numeric. The location parameter of the underlying uncensored
t distribution, typically written |
scale |
numeric. The scale parameter (standard deviation) of
the underlying uncensored t distribution,
typically written |
left |
numeric. The left censoring point. Can be any real number,
defaults to |
right |
numeric. The right censoring point. Can be any real number,
defaults to |
The constructor function CensoredStudentsT
sets up a distribution
object, representing the censored t probability distribution by the
corresponding parameters: the degrees of freedom df
, the latent mean
location
= and latent scale parameter
scale
=
(i.e., the parameters of the underlying uncensored t variable),
the
left
censoring point (with -Inf
corresponding to uncensored),
and the right
censoring point (with Inf
corresponding to uncensored).
The censored t distribution has probability density function (PDF) :
|
if
|
|
if
|
|
otherwise |
where and
are the cumulative distribution function
and probability density function of the standard t distribution with
df
degrees of freedom, respectively.
All parameters can also be vectors, so that it is possible to define a vector of censored t distributions with potentially different parameters. All parameters need to have the same length or must be scalars (i.e., of length 1) which are then recycled to the length of the other parameters.
For the CensoredStudentsT
distribution objects there is a wide range
of standard methods available to the generics provided in the distributions3
package: pdf
and log_pdf
for the (log-)density (PDF), cdf
for the probability
from the cumulative distribution function (CDF), quantile
for quantiles,
random
for simulating random variables,
crps
for the continuous ranked probability score
(CRPS), and support
for the support interval
(minimum and maximum). Internally, these methods rely on the usual d/p/q/r
functions provided for the censored t distributions in the crch
package, see dct
, and the crps_ct
function from the scoringRules package.
The methods is_discrete
and is_continuous
can be used to query whether the distributions are discrete on the entire support
(always FALSE
) or continuous on the entire support (only TRUE
if
there is no censoring, i.e., if both left
and right
are infinite).
See the examples below for an illustration of the workflow for the class and methods.
A CensoredStudentsT
distribution object.
dct
, StudentsT
, TruncatedStudentsT
,
CensoredNormal
, CensoredLogistic
## package and random seed library("distributions3") set.seed(6020) ## three censored t distributions: ## - uncensored standard t with 5 degrees of freedom ## - left-censored at zero with 5 df, latent location = 1 and scale = 1 ## - interval-censored in [0, 5] with 5 df, latent location = 2 and scale = 2 X <- CensoredStudentsT( df = c( 5, 5, 5), location = c( 0, 1, 2), scale = c( 1, 1, 2), left = c(-Inf, 0, 0), right = c( Inf, Inf, 5) ) X ## compute mean of the censored distribution mean(X) ## higher moments (variance, skewness, kurtosis) are not implemented yet ## support interval (minimum and maximum) support(X) ## simulate random variables random(X, 5) ## histograms of 1,000 simulated observations x <- random(X, 1000) hist(x[1, ], main = "uncensored") hist(x[2, ], main = "left-censored at zero") hist(x[3, ], main = "interval-censored in [0, 5]") ## probability density function (PDF) and log-density (or log-likelihood) x <- c(0, 0, 1) pdf(X, x) pdf(X, x, log = TRUE) log_pdf(X, x) ## cumulative distribution function (CDF) cdf(X, x) ## quantiles quantile(X, 0.5) ## cdf() and quantile() are inverses (except at censoring points) cdf(X, quantile(X, 0.5)) quantile(X, cdf(X, 1)) ## all methods above can either be applied elementwise or for ## all combinations of X and x, if length(X) = length(x), ## also the result can be assured to be a matrix via drop = FALSE p <- c(0.05, 0.5, 0.95) quantile(X, p, elementwise = FALSE) quantile(X, p, elementwise = TRUE) quantile(X, p, elementwise = TRUE, drop = FALSE) ## compare theoretical and empirical mean from 1,000 simulated observations cbind( "theoretical" = mean(X), "empirical" = rowMeans(random(X, 1000)) ) ## evaluate continuous ranked probability score (CRPS) using scoringRules library("scoringRules") crps(X, x)
## package and random seed library("distributions3") set.seed(6020) ## three censored t distributions: ## - uncensored standard t with 5 degrees of freedom ## - left-censored at zero with 5 df, latent location = 1 and scale = 1 ## - interval-censored in [0, 5] with 5 df, latent location = 2 and scale = 2 X <- CensoredStudentsT( df = c( 5, 5, 5), location = c( 0, 1, 2), scale = c( 1, 1, 2), left = c(-Inf, 0, 0), right = c( Inf, Inf, 5) ) X ## compute mean of the censored distribution mean(X) ## higher moments (variance, skewness, kurtosis) are not implemented yet ## support interval (minimum and maximum) support(X) ## simulate random variables random(X, 5) ## histograms of 1,000 simulated observations x <- random(X, 1000) hist(x[1, ], main = "uncensored") hist(x[2, ], main = "left-censored at zero") hist(x[3, ], main = "interval-censored in [0, 5]") ## probability density function (PDF) and log-density (or log-likelihood) x <- c(0, 0, 1) pdf(X, x) pdf(X, x, log = TRUE) log_pdf(X, x) ## cumulative distribution function (CDF) cdf(X, x) ## quantiles quantile(X, 0.5) ## cdf() and quantile() are inverses (except at censoring points) cdf(X, quantile(X, 0.5)) quantile(X, cdf(X, 1)) ## all methods above can either be applied elementwise or for ## all combinations of X and x, if length(X) = length(x), ## also the result can be assured to be a matrix via drop = FALSE p <- c(0.05, 0.5, 0.95) quantile(X, p, elementwise = FALSE) quantile(X, p, elementwise = TRUE) quantile(X, p, elementwise = TRUE, drop = FALSE) ## compare theoretical and empirical mean from 1,000 simulated observations cbind( "theoretical" = mean(X), "empirical" = rowMeans(random(X, 1000)) ) ## evaluate continuous ranked probability score (CRPS) using scoringRules library("scoringRules") crps(X, x)
Density, distribution function, quantile function, and random generation for the left and/or right censored logistic distribution.
dclogis(x, location = 0, scale = 1, left = -Inf, right = Inf, log = FALSE) pclogis(q, location = 0, scale = 1, left = -Inf, right = Inf, lower.tail = TRUE, log.p = FALSE) qclogis(p, location = 0, scale = 1, left = -Inf, right = Inf, lower.tail = TRUE, log.p = FALSE) rclogis(n, location = 0, scale = 1, left = -Inf, right = Inf)
dclogis(x, location = 0, scale = 1, left = -Inf, right = Inf, log = FALSE) pclogis(q, location = 0, scale = 1, left = -Inf, right = Inf, lower.tail = TRUE, log.p = FALSE) qclogis(p, location = 0, scale = 1, left = -Inf, right = Inf, lower.tail = TRUE, log.p = FALSE) rclogis(n, location = 0, scale = 1, left = -Inf, right = Inf)
x , q
|
vector of quantiles. |
p |
vector of probabilities. |
n |
number of observations. If |
location |
location parameter. |
scale |
scale parameter. |
left |
left censoring point. |
right |
right censoring point. |
log , log.p
|
logical; if TRUE, probabilities p are given as log(p). |
lower.tail |
logical; if TRUE (default), probabilities are P[X <= x] otherwise, P[X > x]. |
If location
or scale
are not specified they assume the default values
of 0
and 1
, respectively. left
and right
have the defaults -Inf
and Inf
respectively.
The censored logistic distribution has density :
|
if
|
|
if
|
|
if
|
where and
are the cumulative distribution function
and probability density function of the standard logistic distribution
respectively,
is the location of the distribution, and
the scale.
dclogis
gives the density, pclogis
gives the distribution
function, qclogis
gives the quantile function, and rclogis
generates random deviates.
Density, distribution function, quantile function, and random generation for the left and/or right censored normal distribution.
dcnorm(x, mean = 0, sd = 1, left = -Inf, right = Inf, log = FALSE) pcnorm(q, mean = 0, sd = 1, left = -Inf, right = Inf, lower.tail = TRUE, log.p = FALSE) qcnorm(p, mean = 0, sd = 1, left = -Inf, right = Inf, lower.tail = TRUE, log.p = FALSE) rcnorm(n, mean = 0, sd = 1, left = -Inf, right = Inf)
dcnorm(x, mean = 0, sd = 1, left = -Inf, right = Inf, log = FALSE) pcnorm(q, mean = 0, sd = 1, left = -Inf, right = Inf, lower.tail = TRUE, log.p = FALSE) qcnorm(p, mean = 0, sd = 1, left = -Inf, right = Inf, lower.tail = TRUE, log.p = FALSE) rcnorm(n, mean = 0, sd = 1, left = -Inf, right = Inf)
x , q
|
vector of quantiles. |
p |
vector of probabilities. |
n |
number of observations. If |
mean |
vector of means. |
sd |
vector of standard deviations. |
left |
left censoring point. |
right |
right censoring point. |
log , log.p
|
logical; if TRUE, probabilities p are given as log(p). |
lower.tail |
logical; if TRUE (default), probabilities are P[X <= x] otherwise, P[X > x]. |
If mean
or sd
are not specified they assume the default values
of 0
and 1
, respectively. left
and right
have the defaults -Inf
and Inf
respectively.
The censored normal distribution has density :
|
if
|
|
if
|
|
if
|
where and
are the cumulative distribution function
and probability density function of the standard normal distribution
respectively,
is the mean of the distribution, and
the standard deviation.
dcnorm
gives the density, pcnorm
gives the distribution
function, qcnorm
gives the quantile function, and rcnorm
generates random deviates.
Methods for extracting information from fitted crch
objects.
## S3 method for class 'crch' coef(object, model = c("full", "location", "scale", "df"), ...) ## S3 method for class 'crch' vcov(object, model = c("full", "location", "scale", "df"), ...) ## S3 method for class 'crch' terms(x, model = c("location", "scale", "full"), ...) ## S3 method for class 'crch' fitted(object, type = c("location", "scale"), ...)
## S3 method for class 'crch' coef(object, model = c("full", "location", "scale", "df"), ...) ## S3 method for class 'crch' vcov(object, model = c("full", "location", "scale", "df"), ...) ## S3 method for class 'crch' terms(x, model = c("location", "scale", "full"), ...) ## S3 method for class 'crch' fitted(object, type = c("location", "scale"), ...)
object , x
|
an object of class |
model |
model for which coefficients shall be returned. |
type |
type of fitted values. |
... |
further arguments passed to or from other methods. |
In addition to the methods above, a set of standard extractor functions for
"crch"
objects is available, including methods to the generic
functions print
, summary
,
logLik
, and residuals
.
Methods for extracting information from fitted crch.boost
objects.
## S3 method for class 'crch.boost' coef(object, model = c("full", "location", "scale", "df"), mstop = NULL, zero.coefficients = FALSE, standardize = FALSE, ...) ## S3 method for class 'crch.boost' print(x, digits = max(3, getOption("digits") - 3), mstop = NULL, zero.coefficients = FALSE, ...) ## S3 method for class 'crch.boost' summary(object, mstop = NULL, zero.coefficients = FALSE, ...) ## S3 method for class 'crch.boost' logLik(object, mstop = NULL, ...)
## S3 method for class 'crch.boost' coef(object, model = c("full", "location", "scale", "df"), mstop = NULL, zero.coefficients = FALSE, standardize = FALSE, ...) ## S3 method for class 'crch.boost' print(x, digits = max(3, getOption("digits") - 3), mstop = NULL, zero.coefficients = FALSE, ...) ## S3 method for class 'crch.boost' summary(object, mstop = NULL, zero.coefficients = FALSE, ...) ## S3 method for class 'crch.boost' logLik(object, mstop = NULL, ...)
object , x
|
an object of class |
model |
model for which coefficients shall be returned. |
mstop |
stopping iteration for which coefficients shall be returned.
Can be either a character ( |
zero.coefficients |
logical whether zero coefficients are returned. |
standardize |
logical whether coefficients shall be standardized. |
digits |
the number of significant digits to use when printing. |
... |
further arguments passed to or from other methods. |
In addition to the methods above, the "crch"
methods
terms
, model.frame
, model.matrix
,
residuals
, and fitted
can be used also for
"crch.boost"
objects .
Methods for extracting information from fitted hxlr
objects.
## S3 method for class 'hxlr' coef(object, model = c("full", "intercept", "location", "scale"), type = c("CLM", "latent"), ...) ## S3 method for class 'hxlr' vcov(object, model = c("full", "intercept", "location", "scale"), type = c("CLM", "latent"), ...) ## S3 method for class 'hxlr' terms(x, model = c("full", "location", "scale"), ...)
## S3 method for class 'hxlr' coef(object, model = c("full", "intercept", "location", "scale"), type = c("CLM", "latent"), ...) ## S3 method for class 'hxlr' vcov(object, model = c("full", "intercept", "location", "scale"), type = c("CLM", "latent"), ...) ## S3 method for class 'hxlr' terms(x, model = c("full", "location", "scale"), ...)
object , x
|
an object of class |
model |
model for which coefficients shall be returned. |
type |
type of coefficients. Default are CLM type coefficients. For
type |
... |
further arguments passed to or from other methods. |
In addition to the methods above, a set of standard extractor functions for
"hxlr"
objects is available, including methods to the generic
functions print
, summary
, and
logLik
.
Fitting censored (tobit) or truncated regression models with conditional heteroscedasticy.
crch(formula, data, subset, na.action, weights, offset, link.scale = c("log", "identity", "quadratic"), dist = c("gaussian", "logistic", "student"), df = NULL, left = -Inf, right = Inf, truncated = FALSE, type = c("ml", "crps"), control = crch.control(...), model = TRUE, x = FALSE, y = FALSE, ...) trch(formula, data, subset, na.action, weights, offset, link.scale = c("log", "identity", "quadratic"), dist = c("gaussian", "logistic", "student"), df = NULL, left = -Inf, right = Inf, truncated = TRUE, type = c("ml", "crps"), control = crch.control(...), model = TRUE, x = FALSE, y = FALSE, ...) crch.fit(x, z, y, left, right, truncated = FALSE, dist = "gaussian", df = NULL, link.scale = "log", type = "ml", weights = NULL, offset = NULL, control = crch.control())
crch(formula, data, subset, na.action, weights, offset, link.scale = c("log", "identity", "quadratic"), dist = c("gaussian", "logistic", "student"), df = NULL, left = -Inf, right = Inf, truncated = FALSE, type = c("ml", "crps"), control = crch.control(...), model = TRUE, x = FALSE, y = FALSE, ...) trch(formula, data, subset, na.action, weights, offset, link.scale = c("log", "identity", "quadratic"), dist = c("gaussian", "logistic", "student"), df = NULL, left = -Inf, right = Inf, truncated = TRUE, type = c("ml", "crps"), control = crch.control(...), model = TRUE, x = FALSE, y = FALSE, ...) crch.fit(x, z, y, left, right, truncated = FALSE, dist = "gaussian", df = NULL, link.scale = "log", type = "ml", weights = NULL, offset = NULL, control = crch.control())
formula |
a formula expression of the form |
data |
an optional data frame containing the variables occurring in the formulas. |
subset |
an optional vector specifying a subset of observations to be used for fitting. |
na.action |
a function which indicates what should happen when the data
contain |
weights |
optional case weights in fitting. |
offset |
optional numeric vector with a priori known component to
be included in the linear predictor for the location. For |
link.scale |
character specification of the link function in
the scale model. Currently, |
dist |
assumed distribution for the dependent variable |
df |
optional degrees of freedom for |
left |
left limit for the censored dependent variable |
right |
right limit for the censored dependent variable |
truncated |
logical. If |
type |
loss function to be optimized. Can be either |
control |
a list of control parameters passed to |
model |
logical. If |
x , y
|
for |
z |
a design matrix with regressors for the scale. |
... |
arguments to be used to form the default |
crch
fits censored (tobit) or truncated regression models with conditional
heteroscedasticy with maximum likelihood estimation. Student-t, Gaussian, and
logistic distributions can be fitted to left- and/or right censored or
truncated responses. Different regressors can be used to model the location
and the scale of this distribution. If control=crch.boost()
optimization is performed by boosting.
trch
is a wrapper function for crch
with default
truncated = TRUE
.
crch.fit
is the lower level function where the actual
fitting takes place.
An object of class "crch"
or "crch.boost"
, i.e., a list with the
following elements.
coefficients |
list of coefficients for location, scale, and df. Scale and df coefficients are in log-scale. |
df |
if |
residuals |
the residuals, that is response minus fitted values. |
fitted.values |
list of fitted location and scale parameters. |
dist |
assumed distribution for the dependent variable |
cens |
list of censoring points. |
optim |
output from optimization from |
method |
optimization method used for |
type |
used loss function (maximum likelihood or minimum CRPS). |
control |
list of control parameters passed to |
start |
starting values of coefficients used in the optimization. |
weights |
case weights used for fitting. |
offset |
list of offsets for location and scale. |
n |
number of observations. |
nobs |
number of observations with non-zero weights. |
loglik |
log-likelihood. |
vcov |
covariance matrix. |
link |
a list with element |
truncated |
logical indicating wheter a truncated model has been fitted. |
converged |
logical variable whether optimization has converged or not. |
iterations |
number of iterations in optimization. |
call |
function call. |
formula |
the formula supplied. |
terms |
the |
levels |
list of levels of the factors used in fitting for location and scale respectively. |
contrasts |
(where relevant) the contrasts used. |
y |
if requested, the response used. |
x |
if requested, the model matrix used. |
model |
if requested, the model frame used. |
stepsize , mstop , mstopopt , standardize
|
return values of boosting
optimization. See |
Messner JW, Mayr GJ, Zeileis A (2016). Heteroscedastic Censored and Truncated Regression with crch. The R Journal, 3(1), 173–181. doi:10.32614/RJ-2016-012
Messner JW, Zeileis A, Broecker J, Mayr GJ (2014). Probabilistic Wind Power Forecasts with an Inverse Power Curve Transformation and Censored Regression. Wind Energy, 17(11), 1753–1766. doi:10.1002/we.1666
predict.crch
, crch.control
, crch.boost
data("RainIbk", package = "crch") ## mean and standard deviation of square root transformed ensemble forecasts RainIbk$sqrtensmean <- apply(sqrt(RainIbk[,grep('^rainfc',names(RainIbk))]), 1, mean) RainIbk$sqrtenssd <- apply(sqrt(RainIbk[,grep('^rainfc',names(RainIbk))]), 1, sd) ## fit linear regression model with Gaussian distribution CRCH <- crch(sqrt(rain) ~ sqrtensmean, data = RainIbk, dist = "gaussian") ## same as lm? all.equal( coef(lm(sqrt(rain) ~ sqrtensmean, data = RainIbk)), head(coef(CRCH), -1), tol = 1e-6) ## print CRCH ## summary summary(CRCH) ## left censored regression model with censoring point 0: CRCH2 <- crch(sqrt(rain) ~ sqrtensmean, data = RainIbk, dist = "gaussian", left = 0) ## left censored regression model with censoring point 0 and ## conditional heteroscedasticy: CRCH3 <- crch(sqrt(rain) ~ sqrtensmean|sqrtenssd, data = RainIbk, dist = "gaussian", left = 0) ## left censored regression model with censoring point 0 and ## conditional heteroscedasticy with logistic distribution: CRCH4 <- crch(sqrt(rain) ~ sqrtensmean|sqrtenssd, data = RainIbk, dist = "logistic", left = 0) ## compare AIC AIC(CRCH, CRCH2, CRCH3, CRCH4)
data("RainIbk", package = "crch") ## mean and standard deviation of square root transformed ensemble forecasts RainIbk$sqrtensmean <- apply(sqrt(RainIbk[,grep('^rainfc',names(RainIbk))]), 1, mean) RainIbk$sqrtenssd <- apply(sqrt(RainIbk[,grep('^rainfc',names(RainIbk))]), 1, sd) ## fit linear regression model with Gaussian distribution CRCH <- crch(sqrt(rain) ~ sqrtensmean, data = RainIbk, dist = "gaussian") ## same as lm? all.equal( coef(lm(sqrt(rain) ~ sqrtensmean, data = RainIbk)), head(coef(CRCH), -1), tol = 1e-6) ## print CRCH ## summary summary(CRCH) ## left censored regression model with censoring point 0: CRCH2 <- crch(sqrt(rain) ~ sqrtensmean, data = RainIbk, dist = "gaussian", left = 0) ## left censored regression model with censoring point 0 and ## conditional heteroscedasticy: CRCH3 <- crch(sqrt(rain) ~ sqrtensmean|sqrtenssd, data = RainIbk, dist = "gaussian", left = 0) ## left censored regression model with censoring point 0 and ## conditional heteroscedasticy with logistic distribution: CRCH4 <- crch(sqrt(rain) ~ sqrtensmean|sqrtenssd, data = RainIbk, dist = "logistic", left = 0) ## compare AIC AIC(CRCH, CRCH2, CRCH3, CRCH4)
Auxiliary functions to fit crch
models via boosting
crch.boost(maxit = 100, nu = 0.1, start = NULL, dot = "separate", mstop = c("max", "aic", "bic", "cv"), nfolds = 10, foldid = NULL, maxvar = NULL) crch.boost.fit(x, z, y, left, right, truncated = FALSE, dist = "gaussian", df = NULL, link.scale = "log", type = "ml", weights = NULL, offset = NULL, control = crch.boost())
crch.boost(maxit = 100, nu = 0.1, start = NULL, dot = "separate", mstop = c("max", "aic", "bic", "cv"), nfolds = 10, foldid = NULL, maxvar = NULL) crch.boost.fit(x, z, y, left, right, truncated = FALSE, dist = "gaussian", df = NULL, link.scale = "log", type = "ml", weights = NULL, offset = NULL, control = crch.boost())
maxit |
the maximum number of boosting iterations. |
nu |
boosting step size. Default is 0.1. |
start |
a previously boosted but not converged |
dot |
character specifying how to process formula parts with a dot
( |
mstop |
method to find optimum stopping iteration. Default is |
nfolds |
if |
foldid |
if |
maxvar |
Positive |
x , z , y , left , right , truncated , dist , df , link.scale , type , weights , offset , control
|
see |
crch.boost
extends crch
to fit censored (tobit) or
truncated regression models with conditional heteroscedasticy by
boosting. If crch.boost()
is supplied as control
in
crch
then crch.boost.fit
is used as lower level fitting
function. Note that crch.control()
with method=boosting
is equivalent to crch.boost()
. Thus, boosting can more
conveniently be called with crch(..., method = "boosting")
.
For crch.boost
: A list with components named as the arguments.
For crch.boost.fit
: An object of class "crch.boost"
,
i.e., a list with the following elements.
coefficients |
list of coefficients for location and scale. Scale
coefficients are in log-scale. Coefficients are of optimum stopping
stopping iteration specified by |
df |
if |
residuals |
the residuals, that is response minus fitted values. |
fitted.values |
list of fitted location and scale parameters at
optimum stopping iteration specified by |
dist |
assumed distribution for the dependent variable |
cens |
list of censoring points. |
control |
list of control parameters. |
weights |
case weights used for fitting. |
offset |
list of offsets for location and scale. |
n |
number of observations. |
nobs |
number of observations with non-zero weights. |
loglik |
log-likelihood. |
link |
a list with element |
truncated |
logical indicating wheter a truncated model has been fitted. |
iterations |
number of boosting iterations. |
stepsize |
boosting stepsize |
mstop |
criterion used to find optimum stopping iteration. |
mstopopt |
optimum stopping iterations for different criteria. |
standardize |
list of center and scale values used to standardize response and regressors. |
Messner JW, Mayr GJ, Zeileis A (2017). Non-Homogeneous Boosting for Predictor Selection in Ensemble Post-Processing. Monthly Weather Review, 145(1), 137–147. doi:10.1175/MWR-D-16-0088.1
# generate data suppressWarnings(RNGversion("3.5.0")) set.seed(5) x <- matrix(rnorm(1000*20),1000,20) y <- rnorm(1000, 1 + x[,1] - 1.5 * x[,2], exp(-1 + 0.3*x[,3])) y <- pmax(0, y) data <- data.frame(cbind(y, x)) # fit model with maximum likelihood CRCH <- crch(y ~ .|., data = data, dist = "gaussian", left = 0) # fit model with boosting boost <- crch(y ~ .|., data = data, dist = "gaussian", left = 0, control = crch.boost(mstop = "aic")) # more conveniently, the same model can also be fit through # boost <- crch(y ~ .|., data = data, dist = "gaussian", left = 0, # method = "boosting", mstop = "aic") # AIC comparison AIC(CRCH, boost) # summary summary(boost) # plot plot(boost)
# generate data suppressWarnings(RNGversion("3.5.0")) set.seed(5) x <- matrix(rnorm(1000*20),1000,20) y <- rnorm(1000, 1 + x[,1] - 1.5 * x[,2], exp(-1 + 0.3*x[,3])) y <- pmax(0, y) data <- data.frame(cbind(y, x)) # fit model with maximum likelihood CRCH <- crch(y ~ .|., data = data, dist = "gaussian", left = 0) # fit model with boosting boost <- crch(y ~ .|., data = data, dist = "gaussian", left = 0, control = crch.boost(mstop = "aic")) # more conveniently, the same model can also be fit through # boost <- crch(y ~ .|., data = data, dist = "gaussian", left = 0, # method = "boosting", mstop = "aic") # AIC comparison AIC(CRCH, boost) # summary summary(boost) # plot plot(boost)
Auxiliary function for crch
fitting. Specifies a list of values passed
to optim
.
crch.control(method = "BFGS", maxit = NULL, hessian = NULL, trace = FALSE, start = NULL, dot = "separate", lower = -Inf, upper = Inf, ...)
crch.control(method = "BFGS", maxit = NULL, hessian = NULL, trace = FALSE, start = NULL, dot = "separate", lower = -Inf, upper = Inf, ...)
method |
optimization method passed to |
maxit |
the maximum number of iterations. Default is 5000 except for
|
hessian |
logical or NULL. If TRUE the numerical Hessian matrix from the
|
trace |
non-negative integer. If positive, tracing information on the progress of the optimization is produced. |
start |
initial values for the parameters to be optimized over. |
dot |
character specifying how to process formula parts with a dot
( |
lower , upper
|
bounds on the variables for the |
... |
additional parameters passed to |
A list with components named as the arguments.
Auxilirary function which allows to do stability selection on heteroscedastic
crch
models based on crch.boost
.
crch.stabsel(formula, data, ..., nu = 0.1, q, B = 100, thr = 0.9, maxit = 2000, data_percentage = 0.5)
crch.stabsel(formula, data, ..., nu = 0.1, q, B = 100, thr = 0.9, maxit = 2000, data_percentage = 0.5)
formula |
a formula expression of the form |
data |
an optional data frame containing the variables occurring in the formulas. |
... |
Additional attributes to control the |
nu |
Boosting step size (see |
q |
Positive |
B |
|
thr |
|
maxit |
Positive |
data_percentage |
Percentage of data which should be sampled in each of the
iterations. Default (and suggested) is |
crch.boost
allows to perform gradient boosting on heteroscedastic
additive models. crch.stabsel
is a wrapper around the core crch.boost
algorithm to perform stability selection (see references).
Half of the data set (data
) is sampled B
times to perform boosting
(based on crch.boost
). Rather than perform the boosting iterations
until a certain stopping criterion is reached (e.g., maximum number of iterations
maxit
) the algorithm stops as soon as q
parameters have been selected.
The number of parameters is computed across both parameters location and scale.
Intercepts are not counted.
Returns an object of class "stabsel.crch"
containing the stability
selection summary and the new formula based on the stability selection.
table |
A table object containing the parameters which have been selected and the corresponding frequency of selection. |
formula.org |
Original formula used to perform the stability selection. |
formula.new |
New formula based including the coefficients selected during stability selection. |
family |
A list object which contains the distribution-specification from
the |
parameter |
List with the parameters used to perform the stability selection
including |
Meinhausen N, Buehlmann P (2010). Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4), 417–473. doi:10.1111/j.1467-9868.2010.00740.x.
# generate data suppressWarnings(RNGversion("3.5.0")) set.seed(5) x <- matrix(rnorm(1000*20),1000,20) y <- rnorm(1000, 1 + x[,1] - 1.5 * x[,2], exp(-1 + 0.3*x[,3])) y <- pmax(0, y) data <- data.frame(cbind(y, x)) # fit model with maximum likelihood CRCH1 <- crch(y ~ .|., data = data, dist = "gaussian", left = 0) # Perform stability selection stabsel <- crch.stabsel(y ~ .|., data = data, dist = "gaussian", left = 0, q = 8, B = 5) # Show stability selection summary print(stabsel); plot(stabsel) CRCH2 <- crch(stabsel$formula.new, data = data, dist = "gaussian", left = 0 ) BOOST <- crch(stabsel$formula.new, data = data, dist = "gaussian", left = 0, control = crch.boost() ) ### AIC comparison sapply( list(CRCH1,CRCH2,BOOST), logLik )
# generate data suppressWarnings(RNGversion("3.5.0")) set.seed(5) x <- matrix(rnorm(1000*20),1000,20) y <- rnorm(1000, 1 + x[,1] - 1.5 * x[,2], exp(-1 + 0.3*x[,3])) y <- pmax(0, y) data <- data.frame(cbind(y, x)) # fit model with maximum likelihood CRCH1 <- crch(y ~ .|., data = data, dist = "gaussian", left = 0) # Perform stability selection stabsel <- crch.stabsel(y ~ .|., data = data, dist = "gaussian", left = 0, q = 8, B = 5) # Show stability selection summary print(stabsel); plot(stabsel) CRCH2 <- crch(stabsel$formula.new, data = data, dist = "gaussian", left = 0 ) BOOST <- crch(stabsel$formula.new, data = data, dist = "gaussian", left = 0, control = crch.boost() ) ### AIC comparison sapply( list(CRCH1,CRCH2,BOOST), logLik )
Density, distribution function, quantile function, and random generation
for the left and/or right censored student-t distribution with df
degrees of freedom.
dct(x, location = 0, scale = 1, df, left = -Inf, right = Inf, log = FALSE) pct(q, location = 0, scale = 1, df, left = -Inf, right = Inf, lower.tail = TRUE, log.p = FALSE) qct(p, location = 0, scale = 1, df, left = -Inf, right = Inf, lower.tail = TRUE, log.p = FALSE) rct(n, location = 0, scale = 1, df, left = -Inf, right = Inf)
dct(x, location = 0, scale = 1, df, left = -Inf, right = Inf, log = FALSE) pct(q, location = 0, scale = 1, df, left = -Inf, right = Inf, lower.tail = TRUE, log.p = FALSE) qct(p, location = 0, scale = 1, df, left = -Inf, right = Inf, lower.tail = TRUE, log.p = FALSE) rct(n, location = 0, scale = 1, df, left = -Inf, right = Inf)
x , q
|
vector of quantiles. |
p |
vector of probabilities. |
n |
number of observations. If |
location |
location parameter. |
scale |
scale parameter. |
df |
degrees of freedom (> 0, maybe non-integer). |
left |
left censoring point. |
right |
right censoring point. |
log , log.p
|
logical; if TRUE, probabilities p are given as log(p). |
lower.tail |
logical; if TRUE (default), probabilities are P[X <= x] otherwise, P[X > x]. |
If location
or scale
are not specified they assume the default values
of 0
and 1
, respectively. left
and right
have the defaults -Inf
and Inf
respectively.
The censored student-t distribution has density :
|
if
|
|
if
|
|
if
|
where and
are the cumulative distribution function
and probability density function of the student-t distribution with
df
degrees of freedom respectively, is the location of the
distribution, and
the scale.
dct
gives the density, pct
gives the distribution
function, qct
gives the quantile function, and rct
generates random deviates.
This is a wrapper function for clm
(from package
ordinal) to fit (heteroscedastic) extended logistic regression (HXLR)
models (Messner et al. 2013).
hxlr(formula, data, subset, na.action, weights, thresholds, link, control, ...)
hxlr(formula, data, subset, na.action, weights, thresholds, link, control, ...)
formula |
a formula expression of the form |
data |
an optional data frame containing the variables occurring in the formulas. |
subset |
an optional vector specifying a subset of observations to be used for fitting. |
na.action |
a function which indicates what should happen when the data
contain |
weights |
optional case weights in fitting. |
thresholds |
vector of (transformed) thresholds that are used to cut the continuous response into categories. Data frames or matrices with multiple columns are allowed as well. Then each column is used as separate predictor variable for the intercept model. |
link |
link function, i.e., the type of location-scale distribution
assumed for the latent distribution. Default is |
control |
a list of control parameters passed to |
... |
arguments to be used to form the default |
Extended logistic regression (Wilks 2009) extends binary logistic regression to multi-category responses by including the thresholds, that are used to cut a continuous variable into categories, in the regression equation. Heteroscedastic extended logistic regression (Messner et al. 2013) extends this model further and allows to add additional predictor variables that are used to predict the scale of the latent logistic distribution.
An object of class "hxlr"
, i.e., a list with the following elements.
coefficients |
list of CLM coefficients for intercept, location, and scale model. |
fitted.values |
list of fitted location and scale parameters. |
optim |
output from optimization from |
method |
Optimization method used for |
control |
list of control parameters passed to |
start |
starting values of coefficients used in the optimization. |
weights |
case weights used for fitting. |
n |
number of observations. |
nobs |
number of observations with non-zero weights. |
loglik |
log-likelihood. |
vcov |
covariance matrix. |
converged |
logical variable whether optimization has converged or not. |
iterations |
number of iterations in optimization. |
call |
function call. |
scale |
the formula supplied. |
terms |
the |
levels |
list of levels of the factors used in fitting for location and scale respectively. |
thresholds |
the thresholds supplied. |
Messner JW, Mayr GJ, Zeileis A, Wilks DS (2014). Extending Extended Logistic Regression to Effectively Utilize the Ensemble Spread. Monthly Weather Review, 142, 448–456. doi:10.1175/MWR-D-13-00271.1.
Wilks DS (2009). Extending Logistic Regression to Provide Full-Probability-Distribution MOS Forecasts. Meteorological Applications, 368, 361–368.
data("RainIbk", package = "crch") ## mean and standard deviation of square root transformed ensemble forecasts RainIbk$sqrtensmean <- apply(sqrt(RainIbk[,grep('^rainfc',names(RainIbk))]), 1, mean) RainIbk$sqrtenssd <- apply(sqrt(RainIbk[,grep('^rainfc',names(RainIbk))]), 1, sd) ## climatological deciles q <- unique(quantile(RainIbk$rain, seq(0.1, 0.9, 0.1))) ## fit ordinary extended logistic regression with ensemble mean as ## predictor variable XLR <- hxlr(sqrt(rain) ~ sqrtensmean, data = RainIbk, thresholds = sqrt(q)) ## print XLR ## summary summary(XLR) ## fit ordinary extended logistic regression with ensemble mean ## and standard deviation as predictor variables XLRS <- hxlr(sqrt(rain) ~ sqrtensmean + sqrtenssd, data = RainIbk, thresholds = sqrt(q)) ## fit heteroscedastic extended logistic regression with ensemble ## standard deviation as predictor for the scale HXLR <- hxlr(sqrt(rain) ~ sqrtensmean | sqrtenssd, data = RainIbk, thresholds = sqrt(q)) ## compare AIC of different models AIC(XLR, XLRS, HXLR) ## XLRS and HXLR are nested in XLR -> likelihood-ratio-tests if(require("lmtest")) { lrtest(XLR, XLRS) lrtest(XLR, HXLR) } ## Not run: ################################################################### ## Cross-validation and bootstrapping RPS for different models ## (like in Messner 2013). N <- NROW(RainIbk) ## function that returns model fits fits <- function(data, weights = rep(1, N)) { list( "XLR" = hxlr(sqrt(rain) ~ sqrtensmean, data = data, weights = weights, thresholds = sqrt(q)), "XLR:S" = hxlr(sqrt(rain) ~ sqrtensmean + sqrtenssd, data = data, weights = weights, thresholds = sqrt(q)), "XLR:SM" = hxlr(sqrt(rain) ~ sqrtensmean + I(sqrtensmean*sqrtenssd), data = data, weights = weights, thresholds = sqrt(q)), "HXLR" = hxlr(sqrt(rain) ~ sqrtensmean | sqrtenssd, data = data, weights = weights, thresholds = sqrt(q)), "HXLR:S" = hxlr(sqrt(rain) ~ sqrtensmean + sqrtenssd | sqrtenssd, data = data, weights = weights, thresholds = sqrt(q)) ) } ## cross validation id <- sample(1:10, N, replace = TRUE) obs <- NULL pred <- list(NULL) for(i in 1:10) { ## splitting into test and training data set trainIndex <- which(id != i) testIndex <- which(id == i) ## weights that are used for fitting the models weights <- as.numeric(table(factor(trainIndex, levels = c(1:N)))) ## testdata testdata <- RainIbk[testIndex,] ## observations obs <- c(obs, RainIbk$rain[testIndex]) ## estimation modelfits <- fits(RainIbk, weights) ## Prediction pred2 <- lapply(modelfits, predict, newdata = testdata, type = "cumprob") pred <- mapply(rbind, pred, pred2, SIMPLIFY = FALSE) } names(pred) <- c(names(modelfits)) ## function to compute RPS rps <- function(pred, obs) { OBS <- NULL for(i in 1:N) OBS <- rbind(OBS, rep(0:1, c(obs[i] - 1, length(q) - obs[i] + 1))) apply((OBS-pred)^2, 1, sum) } ## compute rps RPS <- lapply(pred, rps, obs = as.numeric(cut(obs, c(-Inf, q, Inf)))) ## bootstrapping mean rps rpsall <- NULL for(i in 1:250) { index <- sample(length(obs), replace = TRUE) rpsall <- rbind(rpsall, sapply(RPS, function(x) mean(x[index]))) } rpssall <- 1 - rpsall/rpsall[,1] boxplot(rpssall[,-1], ylab = "RPSS", main = "RPSS relative to XLR") abline(h = 0, lty = 2) ## End(Not run)
data("RainIbk", package = "crch") ## mean and standard deviation of square root transformed ensemble forecasts RainIbk$sqrtensmean <- apply(sqrt(RainIbk[,grep('^rainfc',names(RainIbk))]), 1, mean) RainIbk$sqrtenssd <- apply(sqrt(RainIbk[,grep('^rainfc',names(RainIbk))]), 1, sd) ## climatological deciles q <- unique(quantile(RainIbk$rain, seq(0.1, 0.9, 0.1))) ## fit ordinary extended logistic regression with ensemble mean as ## predictor variable XLR <- hxlr(sqrt(rain) ~ sqrtensmean, data = RainIbk, thresholds = sqrt(q)) ## print XLR ## summary summary(XLR) ## fit ordinary extended logistic regression with ensemble mean ## and standard deviation as predictor variables XLRS <- hxlr(sqrt(rain) ~ sqrtensmean + sqrtenssd, data = RainIbk, thresholds = sqrt(q)) ## fit heteroscedastic extended logistic regression with ensemble ## standard deviation as predictor for the scale HXLR <- hxlr(sqrt(rain) ~ sqrtensmean | sqrtenssd, data = RainIbk, thresholds = sqrt(q)) ## compare AIC of different models AIC(XLR, XLRS, HXLR) ## XLRS and HXLR are nested in XLR -> likelihood-ratio-tests if(require("lmtest")) { lrtest(XLR, XLRS) lrtest(XLR, HXLR) } ## Not run: ################################################################### ## Cross-validation and bootstrapping RPS for different models ## (like in Messner 2013). N <- NROW(RainIbk) ## function that returns model fits fits <- function(data, weights = rep(1, N)) { list( "XLR" = hxlr(sqrt(rain) ~ sqrtensmean, data = data, weights = weights, thresholds = sqrt(q)), "XLR:S" = hxlr(sqrt(rain) ~ sqrtensmean + sqrtenssd, data = data, weights = weights, thresholds = sqrt(q)), "XLR:SM" = hxlr(sqrt(rain) ~ sqrtensmean + I(sqrtensmean*sqrtenssd), data = data, weights = weights, thresholds = sqrt(q)), "HXLR" = hxlr(sqrt(rain) ~ sqrtensmean | sqrtenssd, data = data, weights = weights, thresholds = sqrt(q)), "HXLR:S" = hxlr(sqrt(rain) ~ sqrtensmean + sqrtenssd | sqrtenssd, data = data, weights = weights, thresholds = sqrt(q)) ) } ## cross validation id <- sample(1:10, N, replace = TRUE) obs <- NULL pred <- list(NULL) for(i in 1:10) { ## splitting into test and training data set trainIndex <- which(id != i) testIndex <- which(id == i) ## weights that are used for fitting the models weights <- as.numeric(table(factor(trainIndex, levels = c(1:N)))) ## testdata testdata <- RainIbk[testIndex,] ## observations obs <- c(obs, RainIbk$rain[testIndex]) ## estimation modelfits <- fits(RainIbk, weights) ## Prediction pred2 <- lapply(modelfits, predict, newdata = testdata, type = "cumprob") pred <- mapply(rbind, pred, pred2, SIMPLIFY = FALSE) } names(pred) <- c(names(modelfits)) ## function to compute RPS rps <- function(pred, obs) { OBS <- NULL for(i in 1:N) OBS <- rbind(OBS, rep(0:1, c(obs[i] - 1, length(q) - obs[i] + 1))) apply((OBS-pred)^2, 1, sum) } ## compute rps RPS <- lapply(pred, rps, obs = as.numeric(cut(obs, c(-Inf, q, Inf)))) ## bootstrapping mean rps rpsall <- NULL for(i in 1:250) { index <- sample(length(obs), replace = TRUE) rpsall <- rbind(rpsall, sapply(RPS, function(x) mean(x[index]))) } rpssall <- 1 - rpsall/rpsall[,1] boxplot(rpssall[,-1], ylab = "RPSS", main = "RPSS relative to XLR") abline(h = 0, lty = 2) ## End(Not run)
Auxiliary function for hxlr
fitting. Specifies a list of values passed
to optim
.
hxlr.control(method = "BFGS", maxit = 5000, hessian = TRUE, trace = FALSE, start = NULL, ...)
hxlr.control(method = "BFGS", maxit = 5000, hessian = TRUE, trace = FALSE, start = NULL, ...)
method |
optimization method used in |
maxit |
the maximum number of iterations. |
hessian |
logical. Should a numerically differentiated Hessian matrix be returned? |
trace |
non-negative integer. If positive, tracing information on the progress of the optimization is produced. |
start |
initial values for the parameters to be optimized over. |
... |
Additional parameters passed to |
A list with components named as the arguments.
Plot paths of coefficients or log-likelihood contributions for crch.boost
models.
## S3 method for class 'crch.boost' plot(x, loglik = FALSE, standardize = TRUE, which = c("both", "location", "scale"), mstop = NULL, coef.label = TRUE, col = NULL, ...)
## S3 method for class 'crch.boost' plot(x, loglik = FALSE, standardize = TRUE, which = c("both", "location", "scale"), mstop = NULL, coef.label = TRUE, col = NULL, ...)
x |
an object of class |
loglik |
logical whether log-likelihood contribution shall be plotted instead of coefficient value. |
standardize |
logical whether coefficients shall be standardized.
Not used if |
which |
which coefficients: |
mstop |
Stopping iteration for which a vertical line is plotted.
Possible choices are |
coef.label |
logical whether paths shall be labeled. |
col |
Color(s) for the paths. If |
... |
further arguments passed to |
Obtains various types of predictions for crch
models.
## S3 method for class 'crch' predict(object, newdata = NULL, type = c("location", "scale", "response", "parameter", "density", "probability", "quantile", "crps"), na.action = na.pass, at = 0.5, left = NULL, right = NULL, ...) ## S3 method for class 'crch' prodist(object, newdata = NULL, na.action = na.pass, left = NULL, right = NULL, ...)
## S3 method for class 'crch' predict(object, newdata = NULL, type = c("location", "scale", "response", "parameter", "density", "probability", "quantile", "crps"), na.action = na.pass, at = 0.5, left = NULL, right = NULL, ...) ## S3 method for class 'crch' prodist(object, newdata = NULL, na.action = na.pass, left = NULL, right = NULL, ...)
object |
an object of class |
newdata |
an optional data frame in which to look for variables with which to predict. |
type |
type of prediction: |
na.action |
a function which indicates what should happen when the data
contain |
at |
a vector of values to evaluate the predictive density ( |
left |
left censoring or truncation point. Only used for |
right |
right censoring or truncation point. Only used for |
... |
further arguments passed to or from other methods. |
The predict
method, for type "response"
, "location"
, or "scale"
,
returns a vector with either the location or the scale of the predicted distribution.
For type "quantile"
a matrix of predicted quantiles each column
corresponding to an element of at
.
The prodist
method returns the fitted/predict probability distribution object.
Obtains various types of predictions for crch.boost
models.
## S3 method for class 'crch.boost' predict(object, newdata = NULL, mstop = NULL, ...)
## S3 method for class 'crch.boost' predict(object, newdata = NULL, mstop = NULL, ...)
object |
an object of class |
newdata |
an optional data frame in which to look for variables with which to predict. |
mstop |
stopping iteration. Can be either a character ( |
... |
further arguments passed to or from other methods. |
For type "response"
, "location"
, or "scale"
a vector with
either the location or the scale of the predicted distribution.
For type "quantile"
a matrix of predicted quantiles each column
corresponding to an element of at
.
Obtains various types of predictions/fitted values for heteroscedastic extended logistic regression (HXLR) models.
## S3 method for class 'hxlr' predict(object, newdata = NULL, type = c("class", "probability", "cumprob", "location", "scale"), thresholds = object$thresholds, na.action = na.pass, ...) ## S3 method for class 'hxlr' fitted(object, type = c("class", "probability", "cumprob", "location", "scale"), ...)
## S3 method for class 'hxlr' predict(object, newdata = NULL, type = c("class", "probability", "cumprob", "location", "scale"), thresholds = object$thresholds, na.action = na.pass, ...) ## S3 method for class 'hxlr' fitted(object, type = c("class", "probability", "cumprob", "location", "scale"), ...)
object |
an object of class |
newdata |
an optional data frame in which to look for variables with which to predict. |
type |
type of prediction: |
thresholds |
optional thresholds used for defining the thresholds for
types |
na.action |
A function which indicates what should happen when the data
contain |
... |
further arguments passed to or from other methods. |
For type "prob"
a matrix with number of intervals (= number of
thresholds + 1) columns is produced. Each row corresponds to a row in newdata
and contains the predicted probabilities to fall in the corresponding interval.
For type "cumprob"
a matrix with number of thresholds columns is
produced. Each row corresponds to a row in newdata
and contains the
predicted probabilities to fall below the corresponding threshold.
For types "class"
, "location"
, and "scale"
a vector is
returned respectively with either the most probable categories ("class"
)
or the location ("location"
) or scale (scale
) of the latent
distribution.
Accumulated 5-8 days precipitation amount for Innsbruck. Data includes GEFS reforecasts (Hamill et al. 2013) and observations from SYNOP station Innsbruck Airport (11120) from 2000-01-01 to 2013-09-17.
data("RainIbk", package = "crch")
data("RainIbk", package = "crch")
A data frame with 4977 rows. The first column (rain
) are 3 days
accumulated precipitation amount observations, Columns 2-12 (rainfc
)
are 5-8 days accumulated precipitation amount forecasts from the individual
ensemble members.
Observations: https://www.ogimet.com/synops.phtml.en
Reforecasts: https://psl.noaa.gov/forecasts/reforecast2/
Hamill TM, Bates GT, Whitaker JS, Murray DR, Fiorino M, Galarneau Jr TJ, Zhu Y, Lapenta W (2013). NOAA's Second-Generation Global Medium-Range Ensemble Reforecast Data Set. Bulletin of the American Meteorological Society, 94(10), 1553-1565.
## Spread skill relationship ## ## load and prepare data data("RainIbk", package = "crch") ## mean and standard deviation of square root transformed ensemble forecasts RainIbk$sqrtensmean <- apply(sqrt(RainIbk[,grep('^rainfc',names(RainIbk))]), 1, mean) RainIbk$sqrtenssd <- apply(sqrt(RainIbk[,grep('^rainfc',names(RainIbk))]), 1, sd) ## quintiles of sqrtenssd sdcat <- cut(RainIbk$sqrtenssd, c(-Inf, quantile(RainIbk$sqrtenssd, seq(0.2,0.8,0.2)), Inf), labels = c(1:5)) ## mean forecast errors for each quintile m <- NULL for(i in levels(sdcat)) { m <- c(m, mean((sqrt(RainIbk$rain)[sdcat == i] - RainIbk$sqrtensmean[sdcat == i])^2, na.rm = TRUE)) } ## plot boxplot((sqrt(rain) - sqrtensmean)^2~sdcat, RainIbk, xlab = "Quintile of ensemble standard deviation", ylab = "mean squared error", main = "Spread skill relationship")
## Spread skill relationship ## ## load and prepare data data("RainIbk", package = "crch") ## mean and standard deviation of square root transformed ensemble forecasts RainIbk$sqrtensmean <- apply(sqrt(RainIbk[,grep('^rainfc',names(RainIbk))]), 1, mean) RainIbk$sqrtenssd <- apply(sqrt(RainIbk[,grep('^rainfc',names(RainIbk))]), 1, sd) ## quintiles of sqrtenssd sdcat <- cut(RainIbk$sqrtenssd, c(-Inf, quantile(RainIbk$sqrtenssd, seq(0.2,0.8,0.2)), Inf), labels = c(1:5)) ## mean forecast errors for each quintile m <- NULL for(i in levels(sdcat)) { m <- c(m, mean((sqrt(RainIbk$rain)[sdcat == i] - RainIbk$sqrtensmean[sdcat == i])^2, na.rm = TRUE)) } ## plot boxplot((sqrt(rain) - sqrtensmean)^2~sdcat, RainIbk, xlab = "Quintile of ensemble standard deviation", ylab = "mean squared error", main = "Spread skill relationship")
Density, distribution function, quantile function, and random generation for the left and/or right truncated logistic distribution.
dtlogis(x, location = 0, scale = 1, left = -Inf, right = Inf, log = FALSE) ptlogis(q, location = 0, scale = 1, left = -Inf, right = Inf, lower.tail = TRUE, log.p = FALSE) qtlogis(p, location = 0, scale = 1, left = -Inf, right = Inf, lower.tail = TRUE, log.p = FALSE) rtlogis(n, location = 0, scale = 1, left = -Inf, right = Inf)
dtlogis(x, location = 0, scale = 1, left = -Inf, right = Inf, log = FALSE) ptlogis(q, location = 0, scale = 1, left = -Inf, right = Inf, lower.tail = TRUE, log.p = FALSE) qtlogis(p, location = 0, scale = 1, left = -Inf, right = Inf, lower.tail = TRUE, log.p = FALSE) rtlogis(n, location = 0, scale = 1, left = -Inf, right = Inf)
x , q
|
vector of quantiles. |
p |
vector of probabilities. |
n |
number of observations. If |
location |
location parameter. |
scale |
scale parameter. |
left |
left truncation point. |
right |
right truncation point. |
log , log.p
|
logical; if TRUE, probabilities p are given as log(p). |
lower.tail |
logical; if TRUE (default), probabilities are P[X <= x] otherwise, P[X > x]. |
If location
or scale
are not specified they assume the default values
of 0
and 1
, respectively. left
and right
have the defaults -Inf
and Inf
respectively.
The truncated logistic distribution has density
for , and 0 otherwise.
and
are the cumulative distribution function
and probability density function of the standard logistic distribution
respectively,
is the location of the distribution, and
the scale.
dtlogis
gives the density, ptlogis
gives the distribution
function, qtlogis
gives the quantile function, and rtlogis
generates random deviates.
Density, distribution function, quantile function, and random generation for the left and/or right truncated normal distribution.
dtnorm(x, mean = 0, sd = 1, left = -Inf, right = Inf, log = FALSE) ptnorm(q, mean = 0, sd = 1, left = -Inf, right = Inf, lower.tail = TRUE, log.p = FALSE) qtnorm(p, mean = 0, sd = 1, left = -Inf, right = Inf, lower.tail = TRUE, log.p = FALSE) rtnorm(n, mean = 0, sd = 1, left = -Inf, right = Inf)
dtnorm(x, mean = 0, sd = 1, left = -Inf, right = Inf, log = FALSE) ptnorm(q, mean = 0, sd = 1, left = -Inf, right = Inf, lower.tail = TRUE, log.p = FALSE) qtnorm(p, mean = 0, sd = 1, left = -Inf, right = Inf, lower.tail = TRUE, log.p = FALSE) rtnorm(n, mean = 0, sd = 1, left = -Inf, right = Inf)
x , q
|
vector of quantiles. |
p |
vector of probabilities. |
n |
number of observations. If |
mean |
vector of means. |
sd |
vector of standard deviations. |
left |
left censoring point. |
right |
right censoring point. |
log , log.p
|
logical; if TRUE, probabilities p are given as log(p). |
lower.tail |
logical; if TRUE (default), probabilities are P[X <= x] otherwise, P[X > x]. |
If mean
or sd
are not specified they assume the default values
of 0
and 1
, respectively. left
and right
have the defaults -Inf
and Inf
respectively.
The truncated normal distribution has density
for , and 0 otherwise.
and
are the cumulative distribution function
and probability density function of the standard normal distribution
respectively,
is the mean of the distribution, and
the standard deviation.
dtnorm
gives the density, ptnorm
gives the distribution
function, qtnorm
gives the quantile function, and rtnorm
generates random deviates.
Class and methods for left-, right-, and interval-truncated logistic distributions using the workflow from the distributions3 package.
TruncatedLogistic(location = 0, scale = 1, left = -Inf, right = Inf)
TruncatedLogistic(location = 0, scale = 1, left = -Inf, right = Inf)
location |
numeric. The location parameter of the underlying untruncated
logistic distribution, typically written |
scale |
numeric. The scale parameter (standard deviation) of
the underlying untruncated logistic distribution,
typically written |
left |
numeric. The left truncation point. Can be any real number,
defaults to |
right |
numeric. The right truncation point. Can be any real number,
defaults to |
The constructor function TruncatedLogistic
sets up a distribution
object, representing the truncated logistic probability distribution by the
corresponding parameters: the latent mean location
= and
latent standard deviation
scale
= (i.e., the parameters
of the underlying untruncated logistic variable), the
left
truncation
point (with -Inf
corresponding to untruncated), and the
right
truncation point (with Inf
corresponding to untruncated).
The truncated logistic distribution has probability density function (PDF):
for , and 0 otherwise,
where
and
are the cumulative distribution function
and probability density function of the standard logistic distribution,
respectively.
All parameters can also be vectors, so that it is possible to define a vector of truncated logistic distributions with potentially different parameters. All parameters need to have the same length or must be scalars (i.e., of length 1) which are then recycled to the length of the other parameters.
For the TruncatedLogistic
distribution objects there is a wide range
of standard methods available to the generics provided in the distributions3
package: pdf
and log_pdf
for the (log-)density (PDF), cdf
for the probability
from the cumulative distribution function (CDF), quantile
for quantiles,
random
for simulating random variables,
crps
for the continuous ranked probability score
(CRPS), and support
for the support interval
(minimum and maximum). Internally, these methods rely on the usual d/p/q/r
functions provided for the truncated logistic distributions in the crch
package, see dtlogis
, and the crps_tlogis
function from the scoringRules package.
The methods is_discrete
and is_continuous
can be used to query whether the distributions are discrete on the entire support
(always FALSE
) or continuous on the entire support (always TRUE
).
See the examples below for an illustration of the workflow for the class and methods.
A TruncatedLogistic
distribution object.
dtlogis
, Logistic
, CensoredLogistic
,
TruncatedNormal
, TruncatedStudentsT
## package and random seed library("distributions3") set.seed(6020) ## three truncated logistic distributions: ## - untruncated standard logistic ## - left-truncated at zero with latent location = 1 and scale = 1 ## - interval-truncated in [0, 5] with latent location = 2 and scale = 1 X <- TruncatedLogistic( location = c( 0, 1, 2), scale = c( 1, 1, 1), left = c(-Inf, 0, 0), right = c( Inf, Inf, 5) ) X ## compute mean of the truncated distribution mean(X) ## higher moments (variance, skewness, kurtosis) are not implemented yet ## support interval (minimum and maximum) support(X) ## simulate random variables random(X, 5) ## histograms of 1,000 simulated observations x <- random(X, 1000) hist(x[1, ], main = "untruncated") hist(x[2, ], main = "left-truncated at zero") hist(x[3, ], main = "interval-truncated in [0, 5]") ## probability density function (PDF) and log-density (or log-likelihood) x <- c(0, 0, 1) pdf(X, x) pdf(X, x, log = TRUE) log_pdf(X, x) ## cumulative distribution function (CDF) cdf(X, x) ## quantiles quantile(X, 0.5) ## cdf() and quantile() are inverses (except at truncation points) cdf(X, quantile(X, 0.5)) quantile(X, cdf(X, 1)) ## all methods above can either be applied elementwise or for ## all combinations of X and x, if length(X) = length(x), ## also the result can be assured to be a matrix via drop = FALSE p <- c(0.05, 0.5, 0.95) quantile(X, p, elementwise = FALSE) quantile(X, p, elementwise = TRUE) quantile(X, p, elementwise = TRUE, drop = FALSE) ## compare theoretical and empirical mean from 1,000 simulated observations cbind( "theoretical" = mean(X), "empirical" = rowMeans(random(X, 1000)) ) ## evaluate continuous ranked probability score (CRPS) using scoringRules library("scoringRules") crps(X, x)
## package and random seed library("distributions3") set.seed(6020) ## three truncated logistic distributions: ## - untruncated standard logistic ## - left-truncated at zero with latent location = 1 and scale = 1 ## - interval-truncated in [0, 5] with latent location = 2 and scale = 1 X <- TruncatedLogistic( location = c( 0, 1, 2), scale = c( 1, 1, 1), left = c(-Inf, 0, 0), right = c( Inf, Inf, 5) ) X ## compute mean of the truncated distribution mean(X) ## higher moments (variance, skewness, kurtosis) are not implemented yet ## support interval (minimum and maximum) support(X) ## simulate random variables random(X, 5) ## histograms of 1,000 simulated observations x <- random(X, 1000) hist(x[1, ], main = "untruncated") hist(x[2, ], main = "left-truncated at zero") hist(x[3, ], main = "interval-truncated in [0, 5]") ## probability density function (PDF) and log-density (or log-likelihood) x <- c(0, 0, 1) pdf(X, x) pdf(X, x, log = TRUE) log_pdf(X, x) ## cumulative distribution function (CDF) cdf(X, x) ## quantiles quantile(X, 0.5) ## cdf() and quantile() are inverses (except at truncation points) cdf(X, quantile(X, 0.5)) quantile(X, cdf(X, 1)) ## all methods above can either be applied elementwise or for ## all combinations of X and x, if length(X) = length(x), ## also the result can be assured to be a matrix via drop = FALSE p <- c(0.05, 0.5, 0.95) quantile(X, p, elementwise = FALSE) quantile(X, p, elementwise = TRUE) quantile(X, p, elementwise = TRUE, drop = FALSE) ## compare theoretical and empirical mean from 1,000 simulated observations cbind( "theoretical" = mean(X), "empirical" = rowMeans(random(X, 1000)) ) ## evaluate continuous ranked probability score (CRPS) using scoringRules library("scoringRules") crps(X, x)
Class and methods for left-, right-, and interval-truncated normal distributions using the workflow from the distributions3 package.
TruncatedNormal(mu = 0, sigma = 1, left = -Inf, right = Inf)
TruncatedNormal(mu = 0, sigma = 1, left = -Inf, right = Inf)
mu |
numeric. The location parameter of the underlying untruncated
normal distribution, typically written |
sigma |
numeric. The scale parameter (standard deviation) of
the underlying untruncated normal distribution,
typically written |
left |
numeric. The left truncation point. Can be any real number,
defaults to |
right |
numeric. The right truncation point. Can be any real number,
defaults to |
The constructor function TruncatedNormal
sets up a distribution
object, representing the truncated normal probability distribution by the
corresponding parameters: the latent mean mu
= and
latent standard deviation
sigma
= (i.e., the parameters
of the underlying untruncated normal variable), the
left
truncation
point (with -Inf
corresponding to untruncated), and the
right
truncation point (with Inf
corresponding to untruncated).
The truncated normal distribution has probability density function (PDF):
for , and 0 otherwise,
where
and
are the cumulative distribution function
and probability density function of the standard normal distribution
respectively.
All parameters can also be vectors, so that it is possible to define a vector of truncated normal distributions with potentially different parameters. All parameters need to have the same length or must be scalars (i.e., of length 1) which are then recycled to the length of the other parameters.
For the TruncatedNormal
distribution objects there is a wide range
of standard methods available to the generics provided in the distributions3
package: pdf
and log_pdf
for the (log-)density (PDF), cdf
for the probability
from the cumulative distribution function (CDF), quantile
for quantiles,
random
for simulating random variables,
crps
for the continuous ranked probability score
(CRPS), and support
for the support interval
(minimum and maximum). Internally, these methods rely on the usual d/p/q/r
functions provided for the truncated normal distributions in the crch
package, see dtnorm
, and the crps_tnorm
function from the scoringRules package.
The methods is_discrete
and is_continuous
can be used to query whether the distributions are discrete on the entire support
(always FALSE
) or continuous on the entire support (always TRUE
).
See the examples below for an illustration of the workflow for the class and methods.
A TruncatedNormal
distribution object.
dtnorm
, Normal
, CensoredNormal
,
TruncatedLogistic
, TruncatedStudentsT
## package and random seed library("distributions3") set.seed(6020) ## three truncated normal distributions: ## - untruncated standard normal ## - left-truncated at zero with latent mu = 1 and sigma = 1 ## - interval-truncated in [0, 5] with latent mu = 1 and sigma = 2 X <- TruncatedNormal( mu = c( 0, 1, 1), sigma = c( 1, 1, 2), left = c(-Inf, 0, 0), right = c( Inf, Inf, 5) ) X ## compute mean and variance of the truncated distribution mean(X) variance(X) ## higher moments (skewness, kurtosis) are not implemented yet ## support interval (minimum and maximum) support(X) ## simulate random variables random(X, 5) ## histograms of 1,000 simulated observations x <- random(X, 1000) hist(x[1, ], main = "untruncated") hist(x[2, ], main = "left-truncated at zero") hist(x[3, ], main = "interval-truncated in [0, 5]") ## probability density function (PDF) and log-density (or log-likelihood) x <- c(0, 0, 1) pdf(X, x) pdf(X, x, log = TRUE) log_pdf(X, x) ## cumulative distribution function (CDF) cdf(X, x) ## quantiles quantile(X, 0.5) ## cdf() and quantile() are inverses cdf(X, quantile(X, 0.5)) quantile(X, cdf(X, 1)) ## all methods above can either be applied elementwise or for ## all combinations of X and x, if length(X) = length(x), ## also the result can be assured to be a matrix via drop = FALSE p <- c(0.05, 0.5, 0.95) quantile(X, p, elementwise = FALSE) quantile(X, p, elementwise = TRUE) quantile(X, p, elementwise = TRUE, drop = FALSE) ## compare theoretical and empirical mean from 1,000 simulated observations cbind( "theoretical" = mean(X), "empirical" = rowMeans(random(X, 1000)) ) ## evaluate continuous ranked probability score (CRPS) using scoringRules library("scoringRules") crps(X, x)
## package and random seed library("distributions3") set.seed(6020) ## three truncated normal distributions: ## - untruncated standard normal ## - left-truncated at zero with latent mu = 1 and sigma = 1 ## - interval-truncated in [0, 5] with latent mu = 1 and sigma = 2 X <- TruncatedNormal( mu = c( 0, 1, 1), sigma = c( 1, 1, 2), left = c(-Inf, 0, 0), right = c( Inf, Inf, 5) ) X ## compute mean and variance of the truncated distribution mean(X) variance(X) ## higher moments (skewness, kurtosis) are not implemented yet ## support interval (minimum and maximum) support(X) ## simulate random variables random(X, 5) ## histograms of 1,000 simulated observations x <- random(X, 1000) hist(x[1, ], main = "untruncated") hist(x[2, ], main = "left-truncated at zero") hist(x[3, ], main = "interval-truncated in [0, 5]") ## probability density function (PDF) and log-density (or log-likelihood) x <- c(0, 0, 1) pdf(X, x) pdf(X, x, log = TRUE) log_pdf(X, x) ## cumulative distribution function (CDF) cdf(X, x) ## quantiles quantile(X, 0.5) ## cdf() and quantile() are inverses cdf(X, quantile(X, 0.5)) quantile(X, cdf(X, 1)) ## all methods above can either be applied elementwise or for ## all combinations of X and x, if length(X) = length(x), ## also the result can be assured to be a matrix via drop = FALSE p <- c(0.05, 0.5, 0.95) quantile(X, p, elementwise = FALSE) quantile(X, p, elementwise = TRUE) quantile(X, p, elementwise = TRUE, drop = FALSE) ## compare theoretical and empirical mean from 1,000 simulated observations cbind( "theoretical" = mean(X), "empirical" = rowMeans(random(X, 1000)) ) ## evaluate continuous ranked probability score (CRPS) using scoringRules library("scoringRules") crps(X, x)
Class and methods for left-, right-, and interval-truncated t distributions using the workflow from the distributions3 package.
TruncatedStudentsT(df, location = 0, scale = 1, left = -Inf, right = Inf)
TruncatedStudentsT(df, location = 0, scale = 1, left = -Inf, right = Inf)
df |
numeric. The degrees of freedom of the underlying untruncated
t distribution. Can be any positive number, with |
location |
numeric. The location parameter of the underlying untruncated
t distribution, typically written |
scale |
numeric. The scale parameter (standard deviation) of
the underlying untruncated t distribution,
typically written |
left |
numeric. The left truncation point. Can be any real number,
defaults to |
right |
numeric. The right truncation point. Can be any real number,
defaults to |
The constructor function TruncatedStudentsT
sets up a distribution
object, representing the truncated t probability distribution by the
corresponding parameters: the degrees of freedom df
, the latent mean
location
= and latent scale parameter
scale
=
(i.e., the parameters of the underlying untruncated t variable),
the
left
truncation point (with -Inf
corresponding to untruncated),
and the right
truncation point (with Inf
corresponding to untruncated).
The truncated t distribution has probability density function (PDF) :
for , and 0 otherwise,
where
and
are the cumulative distribution function
and probability density function of the standard t distribution with
df
degrees of freedom, respectively.
All parameters can also be vectors, so that it is possible to define a vector of truncated t distributions with potentially different parameters. All parameters need to have the same length or must be scalars (i.e., of length 1) which are then recycled to the length of the other parameters.
For the TruncatedStudentsT
distribution objects there is a wide range
of standard methods available to the generics provided in the distributions3
package: pdf
and log_pdf
for the (log-)density (PDF), cdf
for the probability
from the cumulative distribution function (CDF), quantile
for quantiles,
random
for simulating random variables,
crps
for the continuous ranked probability score
(CRPS), and support
for the support interval
(minimum and maximum). Internally, these methods rely on the usual d/p/q/r
functions provided for the truncated t distributions in the crch
package, see dtt
, and the crps_tt
function from the scoringRules package.
The methods is_discrete
and is_continuous
can be used to query whether the distributions are discrete on the entire support
(always FALSE
) or continuous on the entire support (always TRUE
).
See the examples below for an illustration of the workflow for the class and methods.
A TruncatedStudentsT
distribution object.
dtt
, StudentsT
, CensoredStudentsT
,
TruncatedNormal
, TruncatedLogistic
## package and random seed library("distributions3") set.seed(6020) ## three truncated t distributions: ## - untruncated standard t with 5 degrees of freedom ## - left-truncated at zero with 5 df, latent location = 1 and scale = 1 ## - interval-truncated in [0, 5] with 5 df, latent location = 2 and scale = 2 X <- TruncatedStudentsT( df = c( 5, 5, 5), location = c( 0, 1, 2), scale = c( 1, 1, 2), left = c(-Inf, 0, 0), right = c( Inf, Inf, 5) ) X ## compute mean of the truncated distribution mean(X) ## higher moments (variance, skewness, kurtosis) are not implemented yet ## support interval (minimum and maximum) support(X) ## simulate random variables random(X, 5) ## histograms of 1,000 simulated observations x <- random(X, 1000) hist(x[1, ], main = "untruncated") hist(x[2, ], main = "left-truncated at zero") hist(x[3, ], main = "interval-truncated in [0, 5]") ## probability density function (PDF) and log-density (or log-likelihood) x <- c(0, 0, 1) pdf(X, x) pdf(X, x, log = TRUE) log_pdf(X, x) ## cumulative distribution function (CDF) cdf(X, x) ## quantiles quantile(X, 0.5) ## cdf() and quantile() are inverses (except at truncation points) cdf(X, quantile(X, 0.5)) quantile(X, cdf(X, 1)) ## all methods above can either be applied elementwise or for ## all combinations of X and x, if length(X) = length(x), ## also the result can be assured to be a matrix via drop = FALSE p <- c(0.05, 0.5, 0.95) quantile(X, p, elementwise = FALSE) quantile(X, p, elementwise = TRUE) quantile(X, p, elementwise = TRUE, drop = FALSE) ## compare theoretical and empirical mean from 1,000 simulated observations cbind( "theoretical" = mean(X), "empirical" = rowMeans(random(X, 1000)) ) ## evaluate continuous ranked probability score (CRPS) using scoringRules library("scoringRules") crps(X, x)
## package and random seed library("distributions3") set.seed(6020) ## three truncated t distributions: ## - untruncated standard t with 5 degrees of freedom ## - left-truncated at zero with 5 df, latent location = 1 and scale = 1 ## - interval-truncated in [0, 5] with 5 df, latent location = 2 and scale = 2 X <- TruncatedStudentsT( df = c( 5, 5, 5), location = c( 0, 1, 2), scale = c( 1, 1, 2), left = c(-Inf, 0, 0), right = c( Inf, Inf, 5) ) X ## compute mean of the truncated distribution mean(X) ## higher moments (variance, skewness, kurtosis) are not implemented yet ## support interval (minimum and maximum) support(X) ## simulate random variables random(X, 5) ## histograms of 1,000 simulated observations x <- random(X, 1000) hist(x[1, ], main = "untruncated") hist(x[2, ], main = "left-truncated at zero") hist(x[3, ], main = "interval-truncated in [0, 5]") ## probability density function (PDF) and log-density (or log-likelihood) x <- c(0, 0, 1) pdf(X, x) pdf(X, x, log = TRUE) log_pdf(X, x) ## cumulative distribution function (CDF) cdf(X, x) ## quantiles quantile(X, 0.5) ## cdf() and quantile() are inverses (except at truncation points) cdf(X, quantile(X, 0.5)) quantile(X, cdf(X, 1)) ## all methods above can either be applied elementwise or for ## all combinations of X and x, if length(X) = length(x), ## also the result can be assured to be a matrix via drop = FALSE p <- c(0.05, 0.5, 0.95) quantile(X, p, elementwise = FALSE) quantile(X, p, elementwise = TRUE) quantile(X, p, elementwise = TRUE, drop = FALSE) ## compare theoretical and empirical mean from 1,000 simulated observations cbind( "theoretical" = mean(X), "empirical" = rowMeans(random(X, 1000)) ) ## evaluate continuous ranked probability score (CRPS) using scoringRules library("scoringRules") crps(X, x)
Density, distribution function, quantile function, and random generation
for the left and/or right truncated student-t distribution with df
degrees of freedom.
dtt(x, location = 0, scale = 1, df, left = -Inf, right = Inf, log = FALSE) ptt(q, location = 0, scale = 1, df, left = -Inf, right = Inf, lower.tail = TRUE, log.p = FALSE) qtt(p, location = 0, scale = 1, df, left = -Inf, right = Inf, lower.tail = TRUE, log.p = FALSE) rtt(n, location = 0, scale = 1, df, left = -Inf, right = Inf)
dtt(x, location = 0, scale = 1, df, left = -Inf, right = Inf, log = FALSE) ptt(q, location = 0, scale = 1, df, left = -Inf, right = Inf, lower.tail = TRUE, log.p = FALSE) qtt(p, location = 0, scale = 1, df, left = -Inf, right = Inf, lower.tail = TRUE, log.p = FALSE) rtt(n, location = 0, scale = 1, df, left = -Inf, right = Inf)
x , q
|
vector of quantiles. |
p |
vector of probabilities. |
n |
number of observations. If |
location |
location parameter. |
scale |
scale parameter. |
df |
degrees of freedom (> 0, maybe non-integer). |
left |
left censoring point. |
right |
right censoring point. |
log , log.p
|
logical; if TRUE, probabilities p are given as log(p). |
lower.tail |
logical; if TRUE (default), probabilities are P[X <= x] otherwise, P[X > x]. |
If location
or scale
are not specified they assume the default values
of 0
and 1
, respectively. left
and right
have the defaults -Inf
and Inf
respectively.
The truncated student-t distribution has density
for , and 0 otherwise.
where and
are the cumulative distribution function
and probability density function of the student-t distribution with
df
degrees of freedom respectively, is the location of the
distribution, and
the scale.
dtt
gives the density, ptt
gives the distribution
function, qtt
gives the quantile function, and rtt
generates random deviates.