Title: | Analysis of Count Time Series |
---|---|
Description: | Likelihood-based methods for model fitting and assessment, prediction and intervention analysis of count time series following generalized linear models are provided, see Liboschik et al. (2017) <doi:10.18637/jss.v082.i05>. Models with the identity and with the logarithmic link function are allowed. The conditional distribution can be Poisson or Negative Binomial. |
Authors: | Tobias Liboschik [aut, cre], Roland Fried [aut], Konstantinos Fokianos [aut], Philipp Probst [aut], Jonathan Rathjens [ctb], Nicolò Rubattu [ctb] |
Maintainer: | Tobias Liboschik <[email protected]> |
License: | GPL-2 | GPL-3 |
Version: | 1.4.4 |
Built: | 2024-11-24 06:26:17 UTC |
Source: | https://github.com/r-forge/tscount |
Collection of R functions for analysis of count time series. Currently the focus is on count time series following generalised linear models.
See the main function tsglm
for more details on the usage of the package. There is a vignette available which introduces the functionality of the package and its underlying statistical methods (vignette("tsglm", package="tscount")
).
Tobias Liboschik <[email protected]>
Christou, V. and Fokianos, K. (2014) Quasi-likelihood inference for negative binomial time series models. Journal of Time Series Analysis 35(1), 55–78, http://dx.doi.org/10.1002/jtsa.12050.
Christou, V. and Fokianos, K. (2015) Estimation and testing linearity for non-linear mixed poisson autoregressions. Electronic Journal of Statistics 9, 1357–1377, http://dx.doi.org/10.1214/15-EJS1044.
Ferland, R., Latour, A. and Oraichi, D. (2006) Integer-valued GARCH process. Journal of Time Series Analysis 27(6), 923–942, http://dx.doi.org/10.1111/j.1467-9892.2006.00496.x.
Fokianos, K. and Fried, R. (2010) Interventions in INGARCH processes. Journal of Time Series Analysis 31(3), 210–225, http://dx.doi.org/10.1111/j.1467-9892.2010.00657.x.
Fokianos, K., and Fried, R. (2012) Interventions in log-linear Poisson autoregression. Statistical Modelling 12(4), 299–322. http://dx.doi.org/10.1177/1471082X1201200401.
Fokianos, K., Rahbek, A. and Tjostheim, D. (2009) Poisson autoregression. Journal of the American Statistical Association 104(488), 1430–1439, http://dx.doi.org/10.1198/jasa.2009.tm08270.
Fokianos, K. and Tjostheim, D. (2011) Log-linear Poisson autoregression. Journal of Multivariate Analysis 102(3), 563–578, http://dx.doi.org/10.1016/j.jmva.2010.11.002.
Liboschik, T. (2016) Modelling count time series following generalized linear models. PhD Thesis TU Dortmund University, http://dx.doi.org/10.17877/DE290R-17191.
Liboschik, T., Kerschke, P., Fokianos, K. and Fried, R. (2016) Modelling interventions in INGARCH processes. International Journal of Computer Mathematics 93(4), 640–657, http://dx.doi.org/10.1080/00207160.2014.949250.
Liboschik, T., Fokianos, K. and Fried, R. (2017) tscount: An R package for analysis of count time series following generalized linear models. Journal of Statistical Software 82(5), 1–51, http://dx.doi.org/10.18637/jss.v082.i05.
Time series with the number of cases of campylobacter infections in the north of the province Quebec (Canada) in four week intervals from January 1990 to the end of October 2000. It has 13 observations per year and 140 observations in total. Campylobacterosis is an acute bacterial infectious disease attacking the digestive system.
campy
campy
A time series of class "ts"
.
Ferland, R., Latour, A. and Oraichi, D. (2006) Integer-valued GARCH process. Journal of Time Series Analysis 27(6), 923–942, http://dx.doi.org/10.1111/j.1467-9892.2006.00496.x.
ecoli
, ehec
, influenza
, measles
in this package, polio
in package gamlss.data
plot(campy) #Fit the INGARCH model used in Ferland et al. (2006): campyfit <- tsglm(ts=campy, model=list(past_obs=1, past_mean=c(7,13))) summary(campyfit) plot(campyfit) #Note that these parameter estimations differ from those obtained by #Ferland et al. (2006). This might be due to a different initialisation #of pre-sample values and different optimisation algorithms (they use #Microsoft Excel Solver Macro).
plot(campy) #Fit the INGARCH model used in Ferland et al. (2006): campyfit <- tsglm(ts=campy, model=list(past_obs=1, past_mean=c(7,13))) summary(campyfit) plot(campyfit) #Note that these parameter estimations differ from those obtained by #Ferland et al. (2006). This might be due to a different initialisation #of pre-sample values and different optimisation algorithms (they use #Microsoft Excel Solver Macro).
Density, distribution function, quantile function, random generation, standard deviation and Anscombe residuals for some count data distributions. These auxiliary functions are used by several functions of the tscount
package.
ddistr(x, meanvalue, distr=c("poisson", "nbinom"), distrcoefs, ...) pdistr(q, meanvalue, distr=c("poisson", "nbinom"), distrcoefs, ...) qdistr(p, meanvalue, distr=c("poisson", "nbinom"), distrcoefs, ...) rdistr(n, meanvalue, distr=c("poisson", "nbinom"), distrcoefs) sddistr(meanvalue, distr=c("poisson", "nbinom"), distrcoefs) ardistr(response, meanvalue, distr=c("poisson", "nbinom"), distrcoefs) checkdistr(distr=c("poisson", "nbinom"), distrcoefs)
ddistr(x, meanvalue, distr=c("poisson", "nbinom"), distrcoefs, ...) pdistr(q, meanvalue, distr=c("poisson", "nbinom"), distrcoefs, ...) qdistr(p, meanvalue, distr=c("poisson", "nbinom"), distrcoefs, ...) rdistr(n, meanvalue, distr=c("poisson", "nbinom"), distrcoefs) sddistr(meanvalue, distr=c("poisson", "nbinom"), distrcoefs) ardistr(response, meanvalue, distr=c("poisson", "nbinom"), distrcoefs) checkdistr(distr=c("poisson", "nbinom"), distrcoefs)
x |
vector of (non-negative integer) quantiles. |
q |
vector of quantiles. |
p |
vector of probabilities. |
n |
positive integer value giving the number of random values to return. |
response |
vector of true observations for calculation of residuals. |
meanvalue |
non-negative numeric vector of means. |
distr |
character value giving the distribution. Possible values are currently |
distrcoefs |
vector of additional distribution coefficients. For the Poisson distribution this argument can be omitted. For the negative binomial distribution it needs to be a vector of length one giving the value for the parameter |
... |
additional arguments |
Basically, these function are wrappers for specific functions for the respective distribution. The function ddistr
gives the density of the specified distribution, pdistr
the distribution function, qdistr
the quantile function and rdistr
generates random deviates from this distribution. These functions are a generalisation of the respective functions where distr
is replaced by either pois
or nbinom
. The function sddistr
returns the standard deviation of the specified distribution. The function ardistr
calculates Anscombe residuals for given values of the response. The function checkdistr
is for verification of the arguments distr
and distrcoefs
.
Tobias Liboschik
Poisson
for the Poisson distribution and NegBinomial
for the negative binomial distribution.
tsglm
for fitting a more genereal GLM for time series of counts.
Weekly number of reported disease cases caused by Escherichia coli in the state of North Rhine-Westphalia (Germany) from January 2001 to May 2013, excluding cases of EHEC and HUS.
ecoli
ecoli
A data frame with variables year
and week
giving the year and calendar week of observation, and with a variable cases
giving the number of reported cases in the respective week.
Robert Koch Institute: SurvStat@RKI, https://survstat.rki.de, accessed on 10th June 2013.
The data are provided with kind permission of the Robert Koch Institute. Further details and terms of usage are given at https://survstat.rki.de. More data reported under the German Infectious Diseases Protection Act is available via the SurvStat@RKI web application linked above.
campy
, ehec
, influenza
, measles
in this package, polio
in package gamlss.data
Weekly number of reported EHEC/HUS infections in the state of North Rhine-Westphalia (Germany) from January 2001 to May 2013.
ehec
ehec
A data frame with variables year
and week
giving the year and calendar week of observation, and with a variable cases
giving the number of reported cases in the respective week.
Robert Koch Institute: SurvStat@RKI, https://survstat.rki.de, accessed on 10th June 2013.
The data are provided with kind permission of the Robert Koch Institute. Further details and terms of usage are given at https://survstat.rki.de. More data reported under the German Infectious Diseases Protection Act is available via the SurvStat@RKI web application linked above.
campy
, ecoli
, influenza
, measles
in this package, polio
in package gamlss.data
Weekly number of reported influenza cases in the state of North Rhine-Westphalia (Germany) from January 2001 to May 2013.
influenza
influenza
A data frame with variables year
and week
giving the year and calendar week of observation, and with a variable cases
giving the number of reported cases in the respective week.
Robert Koch Institute: SurvStat@RKI, https://survstat.rki.de, accessed on 10th June 2013.
The data are provided with kind permission of the Robert Koch Institute. Further details and terms of usage are given at https://survstat.rki.de. More data reported under the German Infectious Diseases Protection Act is available via the SurvStat@RKI web application linked above.
campy
, ecoli
, ehec
, measles
in this package, polio
in package gamlss.data
Functions to calculate the analytical mean, variance and autocorrelation / partial autocorrelation / autocovariance function of an integer-valued generalised autoregressive conditional heteroscedasticity (INGARCH) process.
ingarch.mean(intercept, past_obs=NULL, past_mean=NULL) ingarch.var(intercept, past_obs=NULL, past_mean=NULL) ingarch.acf(intercept, past_obs=NULL, past_mean=NULL, lag.max=10, type=c("acf", "pacf", "acvf"), plot=TRUE, ...)
ingarch.mean(intercept, past_obs=NULL, past_mean=NULL) ingarch.var(intercept, past_obs=NULL, past_mean=NULL) ingarch.acf(intercept, past_obs=NULL, past_mean=NULL, lag.max=10, type=c("acf", "pacf", "acvf"), plot=TRUE, ...)
intercept |
numeric positive value for the intercept |
past_obs |
numeric non-negative vector containing the coefficients |
past_mean |
numeric non-negative vector containing the coefficients |
lag.max |
integer value indicating how many lags of the (partial) autocorrelation / autocovariance function should be calculated. |
type |
character. If |
plot |
logical. If |
... |
additional arguments to be passed to function |
The INGARCH model of order and
used here follows the definition
where is the history of the process up to time
and
is the Poisson distribution parametrised by its mean (cf. Ferland et al., 2006).
The conditional mean
is given by
The function ingarch.acf
depends on the function tacvfARMA
from package ltsa
, which needs to be installed.
Tobias Liboschik
Ferland, R., Latour, A. and Oraichi, D. (2006) Integer-valued GARCH process. Journal of Time Series Analysis 27(6), 923–942, http://dx.doi.org/10.1111/j.1467-9892.2006.00496.x.
tsglm
for fitting a more genereal GLM for time series of counts of which this INGARCH model is a special case. tsglm.sim
for simulation from such a model.
ingarch.mean(0.3, c(0.1,0.1), 0.1) ## Not run: ingarch.var(0.3, c(0.1,0.1), 0.1) ingarch.acf(0.3, c(0.1,0.1,0.1), 0.1, type="acf", lag.max=15) ## End(Not run)
ingarch.mean(0.3, c(0.1,0.1), 0.1) ## Not run: ingarch.var(0.3, c(0.1,0.1), 0.1) ingarch.acf(0.3, c(0.1,0.1,0.1), 0.1, type="acf", lag.max=15) ## End(Not run)
Generates covariates describing certain types of intervention effects according to the definition by Fokianos and Fried (2010).
interv_covariate(n, tau, delta)
interv_covariate(n, tau, delta)
n |
integer value giving the number of observations the covariates should have. |
tau |
integer vector giving the times where intervention effects occur. |
delta |
numeric vector with constants specifying the type of intervention (see Details). Must be of the same length as |
The intervention effect occuring at time is described by the covariate
where is the indicator function which is 0 for
and 1 for
. The constant
with
specifies the type of intervention. For
the intervention has an effect only at the time of its occurence, for
the effect decays exponentially and for
there is a persistent effect of the intervention after its occurence.
If tau
and delta
are vectors, one covariate is generated with tau[1]
as and
delta[1]
as , another covariate for the second elements and so on.
A matrix with n
rows and length(tau)
columns. The generated covariates describing the interventions are the columns of the matrix.
Tobias Liboschik
Fokianos, K. and Fried, R. (2010) Interventions in INGARCH processes. Journal of Time Series Analysis 31(3), 210–225, http://dx.doi.org/10.1111/j.1467-9892.2010.00657.x.
Fokianos, K., and Fried, R. (2012) Interventions in log-linear Poisson autoregression. Statistical Modelling 12(4), 299–322. http://dx.doi.org/10.1177/1471082X1201200401.
Liboschik, T. (2016) Modelling count time series following generalized linear models. PhD Thesis TU Dortmund University, http://dx.doi.org/10.17877/DE290R-17191.
Liboschik, T., Kerschke, P., Fokianos, K. and Fried, R. (2016) Modelling interventions in INGARCH processes. International Journal of Computer Mathematics 93(4), 640–657, http://dx.doi.org/10.1080/00207160.2014.949250.
tsglm
for fitting a GLM for time series of counts.
interv_test
, interv_detect
and interv_multiple
for tests and detection procedures for intervention effects.
interv_covariate(n=140, tau=c(84,100), delta=c(1,0))
interv_covariate(n=140, tau=c(84,100), delta=c(1,0))
Detection procedure for an intervention of given type occuring at unknown time as proposed by Fokianos and Fried (2010, 2012).
## S3 method for class 'tsglm' interv_detect(fit, taus=2:length(fit$ts), delta, external=FALSE, B=NULL, info=c("score"), start.control_bootstrap, final.control_bootstrap, inter.control_bootstrap, parallel=FALSE, est_interv=TRUE, ...)
## S3 method for class 'tsglm' interv_detect(fit, taus=2:length(fit$ts), delta, external=FALSE, B=NULL, info=c("score"), start.control_bootstrap, final.control_bootstrap, inter.control_bootstrap, parallel=FALSE, est_interv=TRUE, ...)
fit |
an object of class |
taus |
integer vector of time points which are considered for the possible intervention to occur. Default is to consider all possible time points. |
delta |
numeric value that determines the type of intervention (see Details). |
external |
logical value specifying wether the intervention's effect is external or not (see Details). |
B |
positive integer value giving the number of bootstrap samples for estimation of the p-value. For |
info |
character value that determines how to calculate the information matrix, see |
start.control_bootstrap |
named list that determines how to make initial estimation in the bootstrap, see argument |
final.control_bootstrap |
named list that determines how to make final maximum likelihood estimation in the bootstrap, see argument |
inter.control_bootstrap |
named list determining how to maximise the log-likelihood function in an intermediate step, see argument |
parallel |
logical value. If |
est_interv |
logical value. If |
... |
additional arguments passed to the fitting function |
For each time in taus
the score test statistic for an intervention effect occuring at that time is computed, see interv_test
. The time with the maximum test statistic is considered as a candidate for a possible intervention effect at that time. The type of the intervention effect is specified by delta
as described in interv_covariate
. The intervention is included as an additional covariate according to the definition in tsglm
. It can have an internal (the default) or external (external=TRUE
) effect (see Liboschik et al., 2014).
If argument B
is not NULL
, the null hypothesis that there is no intervention effect at any time is tested. Test statistic for this test is the maximum test statistic of the score test (see above). The p-value is computed by a parametric bootstrap with B
bootstrap samples. It is recommended to use at least several hundred bootstrap samples. Note that this bootstrap procedure is very time-consuming.
An object of class "interv_detect"
, which is a list with at least the following components:
test_statistic |
maximum value of the score test statistics for all considered times in |
test_statistic_tau |
numeric vector of all score test statistics at the considered times in |
tau_max |
time at which the score test statistic has its maximum. |
fit_H0 |
object of class |
model_interv |
model specification of the model with the specified intervention at time |
If argument est_interv=TRUE
(the default), the following component is additionally returned:
fit_interv |
object of class |
Tobias Liboschik, Philipp Probst, Konstantinos Fokianos and Roland Fried
Fokianos, K. and Fried, R. (2010) Interventions in INGARCH processes. Journal of Time Series Analysis 31(3), 210–225, http://dx.doi.org/10.1111/j.1467-9892.2010.00657.x.
Fokianos, K., and Fried, R. (2012) Interventions in log-linear Poisson autoregression. Statistical Modelling 12(4), 299–322. http://dx.doi.org/10.1177/1471082X1201200401.
Liboschik, T. (2016) Modelling count time series following generalized linear models. PhD Thesis TU Dortmund University, http://dx.doi.org/10.17877/DE290R-17191.
Liboschik, T., Kerschke, P., Fokianos, K. and Fried, R. (2016) Modelling interventions in INGARCH processes. International Journal of Computer Mathematics 93(4), 640–657, http://dx.doi.org/10.1080/00207160.2014.949250.
tsglm
for fitting a GLM for time series of counts.
interv_test
for testing on intervention effects and interv_multiple
for iterative detection of multiple interventions of unknown types. interv_covariate
for generation of deterministic covariates describing intervention effects.
###Campylobacter infections in Canada (see help("campy")) #Searching for a potential intervention effect: campyfit <- tsglm(ts=campy, model=list(past_obs=1, past_mean=c(7,13))) campyfit_intervdetect <- interv_detect(fit=campyfit, taus=80:120, delta=1) campyfit_intervdetect plot(campyfit_intervdetect) #Additionally computing a p-value with the bootstrap procedure based on 500 #replications would take about 20 minutes in this example on a single #processing unit, of course depending on its speed. ## Not run: #Parallel computation for shorter run time on a cluster: library(parallel) ntasks <- 3 clust <- makeCluster(ntasks) setDefaultCluster(cl=clust) interv_detect(fit=campyfit, taus=80:120, delta=1, B=500, parallel=TRUE) ## End(Not run)
###Campylobacter infections in Canada (see help("campy")) #Searching for a potential intervention effect: campyfit <- tsglm(ts=campy, model=list(past_obs=1, past_mean=c(7,13))) campyfit_intervdetect <- interv_detect(fit=campyfit, taus=80:120, delta=1) campyfit_intervdetect plot(campyfit_intervdetect) #Additionally computing a p-value with the bootstrap procedure based on 500 #replications would take about 20 minutes in this example on a single #processing unit, of course depending on its speed. ## Not run: #Parallel computation for shorter run time on a cluster: library(parallel) ntasks <- 3 clust <- makeCluster(ntasks) setDefaultCluster(cl=clust) interv_detect(fit=campyfit, taus=80:120, delta=1, B=500, parallel=TRUE) ## End(Not run)
Iterative detection procedure for multiple interventions of unknown types occuring at unknown times as proposed by Fokianos and Fried (2010, 2012).
## S3 method for class 'tsglm' interv_multiple(fit, taus=2:length(fit$ts), deltas=c(0,0.8,1), external=FALSE, B=10, signif_level=0.05, start.control_bootstrap, final.control_bootstrap, inter.control_bootstrap, parallel=FALSE, ...)
## S3 method for class 'tsglm' interv_multiple(fit, taus=2:length(fit$ts), deltas=c(0,0.8,1), external=FALSE, B=10, signif_level=0.05, start.control_bootstrap, final.control_bootstrap, inter.control_bootstrap, parallel=FALSE, ...)
fit |
an object of class |
taus |
integer vector of times which are considered for the possible intervention to occur. Default is to consider all times. |
deltas |
numeric vector that determines the types of intervention to be considered (see Details). |
external |
logical value specifying wether the interventions effect is external or not (see Details). |
B |
positive integer value giving the number of bootstrap samples for estimation of the p-value. |
signif_level |
numeric value with |
start.control_bootstrap |
named list that determines how to make initial estimation in the bootstrap, see argument |
final.control_bootstrap |
named list that determines how to make final maximum likelihood estimation in the bootstrap, see argument |
inter.control_bootstrap |
named list determining how to maximise the log-likelihood function in an intermediate step, see argument |
parallel |
logical value. If |
... |
additional arguments passed to the function for detection of single intervention effects |
This function performs an iterative procedure for detection of multiple intervention effects. In each step the function interv_detect
is applied for each of the possible intervention types provided in the argument deltas
. If there is (after a Bonferroni correction) no significant intervention effect the procedure stops. Otherwise the type of intervention with the minimum p-value is chosen. In case of equal p-values preference is given to a level shift (i.e. ) and then to the type of intervention with the largest test statistic. The effect of the chosen intervention is removed from the time series. The time series cleaned from the intervention effect is tested for further interventions in a next step.
For each time in taus
the test statistic of a score test on an intervention effect occuring at that time is computed, see interv_test
. The time with the maximum test statistic is considered as a candidate for a possible intervention effect at that time. The type of the intervention effect is specified by delta
as described in interv_covariate
. The intervention is included as an additional covariate according to the definition in tsglm
. It can have an internal (the default) or external (external=TRUE
) effect (see Liboschik et al., 2014).
All p-values given in the output are multiplied by the number of intervention types considered to account for the multiple testing in each step by a Bonferroni correction. Note that this correction can lead to p-values greater than one.
Note that this bootstrap procedure is very time-consuming.
An object of class "interv_multiple"
, which is a list with the following components:
interventions |
data frame giving the detected interventions, which has the variables |
fit_H0 |
object of class |
fit_cleaned |
object of class |
model_interv |
model specification of the model with all detected interventions at their respective times. |
fit_interv |
object of class |
track |
named list of matrices with the detailed results of the iterative detection procedure. Element |
Tobias Liboschik, Philipp Probst, Konstantinos Fokianos and Roland Fried
Fokianos, K. and Fried, R. (2010) Interventions in INGARCH processes. Journal of Time Series Analysis 31(3), 210–225, http://dx.doi.org/10.1111/j.1467-9892.2010.00657.x.
Fokianos, K., and Fried, R. (2012) Interventions in log-linear Poisson autoregression. Statistical Modelling 12(4), 299–322. http://dx.doi.org/10.1177/1471082X1201200401.
Liboschik, T. (2016) Modelling count time series following generalized linear models. PhD Thesis TU Dortmund University, http://dx.doi.org/10.17877/DE290R-17191.
Liboschik, T., Kerschke, P., Fokianos, K. and Fried, R. (2016) Modelling interventions in INGARCH processes. International Journal of Computer Mathematics 93(4), 640–657, http://dx.doi.org/10.1080/00207160.2014.949250.
tsglm
for fitting a GLM for time series of counts.
interv_test
for testing for intervention effects and interv_detect
for detection of single interventions of given type. interv_covariate
for generation of deterministic covariates describing intervention effects.
## Not run: ###Campylobacter infections in Canada (see help("campy")) #Searching for potential intervention effects (runs several hours!): campyfit <- tsglm(ts=campy, model=list(past_obs=1, past_mean=c(7,13))) campyfit_intervmultiple <- interv_multiple(fit=campyfit, taus=80:120, deltas=c(0,0.8,1), B=500, signif_level=0.05) campyfit_intervmultiple plot(campyfir_intervmultiple) #Parallel computation for shorter run time on a cluster: library(parallel) ntasks <- 3 clust <- makeCluster(ntasks) setDefaultCluster(cl=clust) interv_multiple(fit=campyfit, taus=80:120, deltas=c(0,0.8,1), B=500, signif_level=0.05, parallel=TRUE) ## End(Not run)
## Not run: ###Campylobacter infections in Canada (see help("campy")) #Searching for potential intervention effects (runs several hours!): campyfit <- tsglm(ts=campy, model=list(past_obs=1, past_mean=c(7,13))) campyfit_intervmultiple <- interv_multiple(fit=campyfit, taus=80:120, deltas=c(0,0.8,1), B=500, signif_level=0.05) campyfit_intervmultiple plot(campyfir_intervmultiple) #Parallel computation for shorter run time on a cluster: library(parallel) ntasks <- 3 clust <- makeCluster(ntasks) setDefaultCluster(cl=clust) interv_multiple(fit=campyfit, taus=80:120, deltas=c(0,0.8,1), B=500, signif_level=0.05, parallel=TRUE) ## End(Not run)
Test for one or more interventions of given type at given time as proposed by Fokianos and Fried (2010, 2012).
## S3 method for class 'tsglm' interv_test(fit, tau, delta, external, info=c("score"), est_interv=FALSE, ...)
## S3 method for class 'tsglm' interv_test(fit, tau, delta, external, info=c("score"), est_interv=FALSE, ...)
fit |
an object of class |
tau |
integer vector of times at which the interventions occur which are tested for. |
delta |
numeric vector that determines the types of the interventions (see Details). Must be of the same length as |
external |
logical vector of length |
info |
character value that determines how to calculate the information matrix, see |
est_interv |
logical value. If |
... |
additional arguments passed to the fitting function |
A score test on the null hypothesis of no interventions is done. The null hypothesis is that the data are generated from the model specified in the argument model
, see definition in tsglm
. Under the alternative there are one or more intervention effects occuring at times tau
. The types of the intervention effects are specified by delta
as defined in interv_covariate
. The interventions are included as additional covariates according to the definition in tsglm
. It can have an internal (the default) or external (external=TRUE
) effect (see Liboschik et al., 2014).
Under the null hypothesis the test statistic has asymptotically a chi-square distribution with length(tau)
(i.e. the number of breaks) degrees of freedom. The returned p-value is based on this and approximately valid for long time series, i.e. when length(ts)
large.
An object of class "interv_test"
, which is a list with at least the following components:
test_statistic |
value of the test statistic. |
df |
degrees of freedom of the chi-squared distribution the test statistic is compared with. |
p_value |
p-value of the test. |
fit_H0 |
object of class |
model_interv |
model specification of the model with the specified interventions. |
If argument est_interv=TRUE
, the following component is additionally returned:
fit_interv |
object of class |
Tobias Liboschik, Philipp Probst, Konstantinos Fokianos and Roland Fried
Fokianos, K. and Fried, R. (2010) Interventions in INGARCH processes. Journal of Time Series Analysis 31(3), 210–225, http://dx.doi.org/10.1111/j.1467-9892.2010.00657.x.
Fokianos, K., and Fried, R. (2012) Interventions in log-linear Poisson autoregression. Statistical Modelling 12(4), 299–322. http://dx.doi.org/10.1177/1471082X1201200401.
Liboschik, T. (2016) Modelling count time series following generalized linear models. PhD Thesis TU Dortmund University, http://dx.doi.org/10.17877/DE290R-17191.
Liboschik, T., Kerschke, P., Fokianos, K. and Fried, R. (2016) Modelling interventions in INGARCH processes. International Journal of Computer Mathematics 93(4), 640–657, http://dx.doi.org/10.1080/00207160.2014.949250.
S3 method print
.
tsglm
for fitting a GLM for time series of counts.
interv_detect
for detection of single interventions of given type and interv_multiple
for iterative detection of multiple interventions of unknown types. interv_covariate
for generation of deterministic covariates describing intervention effects.
###Campylobacter infections in Canada (see help("campy")) #Test for the intervention effects which were found in Fokianos und Fried (2010): campyfit <- tsglm(ts=campy, model=list(past_obs=1, past_mean=c(7,13))) campyfit_intervtest <- interv_test(fit=campyfit, tau=c(84,100), delta=c(1,0)) campyfit_intervtest
###Campylobacter infections in Canada (see help("campy")) #Test for the intervention effects which were found in Fokianos und Fried (2010): campyfit <- tsglm(ts=campy, model=list(past_obs=1, past_mean=c(7,13))) campyfit_intervtest <- interv_test(fit=campyfit, tau=c(84,100), delta=c(1,0)) campyfit_intervtest
Stable function for computing a covariance matrix from a given Fisher information matrix by inversion.
invertinfo(mat, silent=TRUE, stopOnError=FALSE)
invertinfo(mat, silent=TRUE, stopOnError=FALSE)
mat |
a Fisher Information Matrix. |
silent |
logical value. If |
stopOnError |
logical value. If |
A Cholesky decomposition is used to obtain the covariance matrix. This can be done because the Fisher information matrix is symmetric and positive definite.
This function is meant to be a more stable alternative to the function solve
, which does not take into account, that the matrix is symmetric and positive definite.
A list containing the following components:
vcov |
the covariance matrix. |
error_message |
possible error messages that occured when inverting the Fisher information matrix. |
Tobias Liboschik and Philipp Probst
library(Matrix) invertinfo(Hilbert(5), stopOnError=TRUE) invertinfo(Hilbert(100)) invertinfo(Hilbert(100), silent=FALSE) ## Not run: invertinfo(Hilbert(100), stopOnError=TRUE)
library(Matrix) invertinfo(Hilbert(5), stopOnError=TRUE) invertinfo(Hilbert(100)) invertinfo(Hilbert(100), silent=FALSE) ## Not run: invertinfo(Hilbert(100), stopOnError=TRUE)
The function produces a marginal calibration plot.
## S3 method for class 'tsglm' marcal(object, plot=TRUE, ...) ## Default S3 method: marcal(response, pred, distr=c("poisson", "nbinom"), distrcoefs, plot=TRUE, ...)
## S3 method for class 'tsglm' marcal(object, plot=TRUE, ...) ## Default S3 method: marcal(response, pred, distr=c("poisson", "nbinom"), distrcoefs, plot=TRUE, ...)
object |
an object of class |
plot |
logical. If |
response |
integer vector. Vector of observed values. |
pred |
numeric vector. Vector of predicted values. |
distr |
character giving the conditional distribution. Currently implemented are the Poisson ( |
distrcoefs |
numeric vector of additional coefficients specifying the conditional distribution. For |
... |
additional arguments to be passed to |
Marginal Calibration can be assessed by taking the difference between the average predictive cumulative distribution function (c.d.f.) and the empirical c.d.f. of the observations. Minor fluctuations about zero are expected if the marginal calibration hypothesis is true. For more information about marginal calibration see the refererences listed below.
Produces a plot of the difference between the average predictive cumulative distribution function (c.d.f.) and the empirical c.d.f. of the observations at each value between the highest and lowest observation of the time series (only for plot=TRUE
).
Returns a list with elements x
and y
, where x
are the threshold values and y
the respective differences of predictive and empirical cumulative distribution function (invisibly for plot=TRUE
).
Philipp Probst and Tobias Liboschik
Christou, V. and Fokianos, K. (2013) On count time series prediction. Journal of Statistical Computation and Simulation (published online), http://dx.doi.org/10.1080/00949655.2013.823612.
Czado, C., Gneiting, T. and Held, L. (2009) Predictive model assessment for count data. Biometrics 65, 1254–1261, http://dx.doi.org/10.1111/j.1541-0420.2009.01191.x.
Gneiting, T., Balabdaoui, F. and Raftery, A.E. (2007) Probabilistic forecasts, calibration and sharpness. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 69, 243–268, http://dx.doi.org/10.1111/j.1467-9868.2007.00587.x.
tsglm
for fitting a GLM for time series of counts.
pit
and scoring
for other predictive model assessment tools.
###Campylobacter infections in Canada (see help("campy")) campyfit <- tsglm(ts=campy, model=list(past_obs=1, past_mean=c(7,13))) marcal(campyfit)
###Campylobacter infections in Canada (see help("campy")) campyfit <- tsglm(ts=campy, model=list(past_obs=1, past_mean=c(7,13))) marcal(campyfit)
Weekly number of reported measles infections in the state of North Rhine-Westphalia (Germany) from January 2001 to May 2013.
measles
measles
A data frame with variables year
and week
giving the year and calendar week of observation, and with a variable cases
giving the number of reported cases in the respective week.
Robert Koch Institute: SurvStat@RKI, https://survstat.rki.de, accessed on 10th June 2013.
The data are provided with kind permission of the Robert Koch Institute. Further details and terms of usage are given at https://survstat.rki.de. More data reported under the German Infectious Diseases Protection Act is available via the SurvStat@RKI web application linked above.
campy
, ecoli
, ehec
, influenza
in this package, polio
in package gamlss.data
The function allows a probabilistic calibration check with a Probability Integral Transform (PIT) histogram.
## S3 method for class 'tsglm' pit(object, bins=10, ...) ## Default S3 method: pit(response, pred, distr=c("poisson", "nbinom"), distrcoefs, bins=10, ...)
## S3 method for class 'tsglm' pit(object, bins=10, ...) ## Default S3 method: pit(response, pred, distr=c("poisson", "nbinom"), distrcoefs, bins=10, ...)
object |
an object of class |
bins |
number of bins in the histogram. Default value is 10. |
response |
integer vector. Vector of observed values. |
pred |
numeric vector. Vector of predicted values. |
distr |
character giving the conditional distribution. Currently implemented are the Poisson ( |
distrcoefs |
numeric vector of additional coefficients specifying the conditional distribution. For |
... |
additional arguments passed to |
A PIT histogram is a tool for evaluating the statistical consistency between the probabilistic forecast and the observation. The predictive distributions of the observations are compared with the actual observations. If the predictive distribution is ideal the result should be a flat PIT histogram with no bin having an extraordinary high or low level. For more information about PIT histograms see the references listed below.
Philipp Probst and Tobias Liboschik
Christou, V. and Fokianos, K. (2013) On count time series prediction. Journal of Statistical Computation and Simulation (published online), http://dx.doi.org/10.1080/00949655.2013.823612.
Czado, C., Gneiting, T. and Held, L. (2009) Predictive model assessment for count data. Biometrics 65, 1254–1261, http://dx.doi.org/10.1111/j.1541-0420.2009.01191.x.
Gneiting, T., Balabdaoui, F. and Raftery, A.E. (2007) Probabilistic forecasts, calibration and sharpness. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 69, 243–268, http://dx.doi.org/10.1111/j.1467-9868.2007.00587.x.
tsglm
for fitting a GLM for time series of counts.
marcal
and scoring
for other predictive model assessment tools.
###Campylobacter infections in Canada (see help("campy")) campyfit <- tsglm(ts=campy, model=list(past_obs=1, past_mean=c(7,13))) pit(campyfit)
###Campylobacter infections in Canada (see help("campy")) campyfit <- tsglm(ts=campy, model=list(past_obs=1, past_mean=c(7,13))) pit(campyfit)
Provides a plot of the test statistics of a test on an intervention in GLM-type count time series (as returned by interv_detect.tsglm
) against time.
## S3 method for class 'interv_detect' plot(x, ...)
## S3 method for class 'interv_detect' plot(x, ...)
x |
an object of class |
... |
additional arguments to be passed to function |
Tobias Liboschik and Philipp Probst
interv_detect
for detecting an intervention effect in GLM-type count time series and tsglm
for fitting such a model.
## Not run: ###Campylobacter infections in Canada (see help("campy")) campyfit <- tsglm(ts=campy, model=list(past_obs=1, past_mean=c(7,13))) campyfit_intervdetect <- interv_detect(fit=campyfit, taus=80:120, delta=1, external=FALSE) #This example runs about 20 minutes on a single processing unit, #of course depending on its speed. plot(campyfit_intervdetect) ## End(Not run)
## Not run: ###Campylobacter infections in Canada (see help("campy")) campyfit <- tsglm(ts=campy, model=list(past_obs=1, past_mean=c(7,13))) campyfit_intervdetect <- interv_detect(fit=campyfit, taus=80:120, delta=1, external=FALSE) #This example runs about 20 minutes on a single processing unit, #of course depending on its speed. plot(campyfit_intervdetect) ## End(Not run)
Provides a plot with the intervention effects detected by an iterative procedure (as returned by interv_multiple.tsglm
) and the time series cleaned from these intervention effects.
## S3 method for class 'interv_multiple' plot(x, ...)
## S3 method for class 'interv_multiple' plot(x, ...)
x |
an object of class |
... |
additional arguments to be passed to function |
The vertical red lines indicate where possible interventions were found and the dashed blue line is the time series cleaned from all detected intervention effects.
Tobias Liboschik and Philipp Probst
interv_multiple
for detecting multiple intervention effects in GLM-type count time series and tsglm
for fitting such a model.
## Not run: ###Campylobacter infections in Canada (see help("campy")) campyfit <- tsglm(ts=campy, model=list(past_obs=1, past_mean=c(7,13))) campyfit_intervmultiple <- interv_multiple(fit=campyfit, taus=80:120, deltas=c(0,0.8,1), external=FALSE, B=2, signif_level=0.05) #runs several hours! plot(campyfit_intervmultiple) ## End(Not run)
## Not run: ###Campylobacter infections in Canada (see help("campy")) campyfit <- tsglm(ts=campy, model=list(past_obs=1, past_mean=c(7,13))) campyfit_intervmultiple <- interv_multiple(fit=campyfit, taus=80:120, deltas=c(0,0.8,1), external=FALSE, B=2, signif_level=0.05) #runs several hours! plot(campyfit_intervmultiple) ## End(Not run)
Produces several diagnostic plots to asses the fit of a GLM-type model for time series of counts.
## S3 method for class 'tsglm' plot(x, ask = TRUE, ...)
## S3 method for class 'tsglm' plot(x, ask = TRUE, ...)
x |
an object of class |
ask |
logical value. If |
... |
further arguments are currently ignored. Only for compatibility with generic function. |
Produces plots of the acf of the Pearson residuals, the Pearson residuals plotted against time, a cumulative periodogramm of the Pearson residuals, a probability integral transform (PIT) histogram (see function pit
) and a marginal calibration plot (see function marcal
). The cumulative periodogramm is plotted with the function cpgram
from package MASS
and is omitted with a warning if this package is not available.
Tobias Liboschik and Philipp Probst
tsglm
for fitting a GLM for time series of counts.
###Campylobacter infections in Canada (see help("campy")) interventions <- interv_covariate(n=length(campy), tau=c(84, 100), delta=c(1, 0)) #detected by Fokianos and Fried (2010, 2012) #Linear link function with Negative Binomial distribution: campyfit <- tsglm(campy, model=list(past_obs=1, past_mean=13), xreg=interventions, dist="nbinom") plot(campyfit)
###Campylobacter infections in Canada (see help("campy")) interventions <- interv_covariate(n=length(campy), tau=c(84, 100), delta=c(1, 0)) #detected by Fokianos and Fried (2010, 2012) #Linear link function with Negative Binomial distribution: campyfit <- tsglm(campy, model=list(past_obs=1, past_mean=13), xreg=interventions, dist="nbinom") plot(campyfit)
Predict future observations based on a fitted GLM-type model for time series of counts.
## S3 method for class 'tsglm' predict(object, n.ahead=1, newobs=NULL, newxreg=NULL, level=0.95, global=FALSE, type=c("quantiles", "shortest", "onesided"), method=c("conddistr", "bootstrap"), B=1000, estim=c("ignore", "bootstrap", "normapprox", "given"), B_estim=B, coefs_given, ...)
## S3 method for class 'tsglm' predict(object, n.ahead=1, newobs=NULL, newxreg=NULL, level=0.95, global=FALSE, type=c("quantiles", "shortest", "onesided"), method=c("conddistr", "bootstrap"), B=1000, estim=c("ignore", "bootstrap", "normapprox", "given"), B_estim=B, coefs_given, ...)
object |
an object of class |
n.ahead |
positive integer value giving the number of steps ahead for which predictions should be made. |
newobs |
integer vector of known future observations of the time series. This argument is only relevant if more than one observation ahead is to be predicted ( |
newxreg |
matrix or vector containing new values for the covariates to be used for prediction. If |
level |
numeric value determining the desired coverage rate of prediction intervals. If |
global |
logical value saying whether the coverage rate for |
type |
character value saying how the prediction interval shall be constructed. If |
method |
character value saying which method to be used for computing the prediction intervals. If |
B |
positive integer value giving the number of samples of a parametric bootstrap to use for numerical determination of prediction intervals (only necessary if argument |
estim |
character value saying how the prediction intervals shall account for the additional uncertainty induced by the parameter estimation. This is particularly important if the model was fitted on a short time series. If |
B_estim |
positive integer value giving the number of parameters used for resampling to account for estimation uncertainty. Only necessary for |
coefs_given |
table with parameters in the rows. Only necessary for |
... |
further arguments are currently ignored. Only for compatibility with generic function. |
Returns predictions for the n.ahead
observations following the fitted time series contained in argument object
. The 1-step-ahead prediction is the conditional expectation of the observation to be predicted given the past. The true parameters are replaced by their estimations given in argument object
. For a 2-step-ahead-prediction the true previous observation is used when given in argument newobs
, otherwise it is replaced by the 1-step-ahead prediction computed before. For a 3-step-prediction this holds for the previous two observations, which are replaced by their respective predictions if not available, and so on.
Unless level=0
, the function also returns prediction intervals. Read the description of the arguments type
andmethod
for further details on the computation. Note that the prediction intervals do not reflect the additional uncertainty induced by the parameter estimation. However, for sufficiently long time series used for model fitting, it is expected that this uncertainty is negligible compared to the uncertainty of the predictive distribution. The argument estim
allows to account fot this additional estimation uncertainty if method="bootstrap"
, see the description of this argument.
If prediction intervals are computed the function additionally returns the median of the predictive distribution. If method="conddistr"
this is the analytical median of the conditional distribution, otherwise the empirical median of the simulated distribution.
A list with at least the following element:
pred |
a numeric vector of the predictions. Has class |
If prediction intervals are calculated, the list has the additional element:
interval |
a matrix with the columns |
level |
a numeric value determining the desired coverage rate of prediction intervals. |
global |
a logical value saying whether the coverage rate |
type |
a character value saying how the prediction intervals were computed. Possible values are |
method |
a character value saying which method were used for computation of prediction intervals. Possible values are |
B |
an integer value giving the number of bootstrap samples which were used for computing prediction intervals. Is |
estim |
a character value saying how the prediction intervals account for estimation uncertainty of the model parameters. Possible values are |
B_estim |
an integer value giving the number of parameter values used for resampling to account for estimation uncertainty. This value is zero if the estimation uncertainty is ignored. |
warning_messages |
a character vector containing warning messages. This should be |
median |
a vector giving the median of the predictive distribution for each of the future time points. Has class |
futureobs |
a matrix ( |
Tobias Liboschik and Philipp Probst
Liboschik, T., Fokianos, K. and Fried, R. (2017) tscount: An R package for analysis of count time series following generalized linear models. Journal of Statistical Software 82(5), 1–51, http://dx.doi.org/10.18637/jss.v082.i05.
tsglm
for fitting a GLM for time series of counts.
###Campylobacter infections in Canada (see help("campy")) campyfit <- tsglm(ts=campy, model=list(past_obs=1, past_mean=c(7,13))) predict(campyfit, n.ahead=1) #prediction interval using conditional distribution predict(campyfit, n.ahead=5, global=TRUE) #prediction intervals using parametric bootstrap
###Campylobacter infections in Canada (see help("campy")) campyfit <- tsglm(ts=campy, model=list(past_obs=1, past_mean=c(7,13))) predict(campyfit, n.ahead=1) #prediction interval using conditional distribution predict(campyfit, n.ahead=5, global=TRUE) #prediction intervals using parametric bootstrap
The function computes the Quasi Information Criterion (QIC) of a generalised linear model for time series of counts.
## S3 method for class 'tsglm' QIC(object, ...)
## S3 method for class 'tsglm' QIC(object, ...)
object |
an object of class |
... |
additional arguments passed to |
The quasi information criterion (QIC) has been proposed by Pan (2001) as alternative to Akaike's information criterion (AIC) which is properly adjusted for regression analysis based on the generalized estimating equations (GEE).
This function computes the QIC of a generalised linear model for time series of counts. In case of models with the Poisson distribution the QIC has approximately the same value as the AIC. However, in case of models with another distribution it can be a more adequate alternative to the AIC.
Tobias Liboschik
Pan, W. (2001) Akaike's Information Criterion in Generalized Estimating Equations. Biometrics 57, 120–125, http://dx.doi.org/10.1111/j.0006-341X.2001.00120.x.
tsglm
for fitting a GLM for time series of counts.
AIC
and BIC
for other information criteria.
###Campylobacter infections in Canada (see help("campy")) campyfit <- tsglm(ts=campy, model=list(past_obs=1, past_mean=c(7,13)), distr="nbinom") QIC(campyfit) AIC(campyfit)
###Campylobacter infections in Canada (see help("campy")) campyfit <- tsglm(ts=campy, model=list(past_obs=1, past_mean=c(7,13)), distr="nbinom") QIC(campyfit) AIC(campyfit)
Returns the residuals of a fitted GLM-type model for time series of counts.
## S3 method for class 'tsglm' residuals(object, type = c("response", "pearson", "anscombe"), ...)
## S3 method for class 'tsglm' residuals(object, type = c("response", "pearson", "anscombe"), ...)
object |
an object of class |
type |
character value giving the type of residuals which should be returned. Choose |
... |
further arguments are currently ignored. Only for compatibility with generic function. |
Computes a vector with the respective residuals of the fit given in argument object
.
Numerical vector of the residuals.
Tobias Liboschik and Philipp Probst
tsglm
for fitting a GLM for time series of counts.
###Campylobacter infections in Canada (see help("campy")) campyfit <- tsglm(ts=campy, model=list(past_obs=1, past_mean=c(7,13))) campyfit_resid <- residuals(campyfit, type="pearson") plot(campyfit_resid) acf(campyfit_resid)
###Campylobacter infections in Canada (see help("campy")) campyfit <- tsglm(ts=campy, model=list(past_obs=1, past_mean=c(7,13))) campyfit_resid <- residuals(campyfit, type="pearson") plot(campyfit_resid) acf(campyfit_resid)
Computes scores for the assessment of sharpness of a fitted model for time series of counts.
## S3 method for class 'tsglm' scoring(object, individual=FALSE, cutoff=1000, ...) ## Default S3 method: scoring(response, pred, distr=c("poisson", "nbinom"), distrcoefs, individual=FALSE, cutoff=1000, ...)
## S3 method for class 'tsglm' scoring(object, individual=FALSE, cutoff=1000, ...) ## Default S3 method: scoring(response, pred, distr=c("poisson", "nbinom"), distrcoefs, individual=FALSE, cutoff=1000, ...)
object |
an object of class |
individual |
logical. If |
cutoff |
positive integer. Summation over the infinite sample space {0,1,2,...} of a distribution is cut off at this value. This affects the quadratic, spherical and ranked probability score. |
response |
integer vector. Vector of observed values |
pred |
numeric vector. Vector of predicted values |
distr |
character giving the conditional distribution. Currently implemented are the Poisson ( |
distrcoefs |
numeric vector of additional coefficients specifying the conditional distribution. For |
... |
further arguments are currently ignored. Only for compatibility with generic function. |
The scoring rules are penalties that should be minimised for a better forecast, so a smaller scoring value means better sharpness.
Different competing forecast models can be ranked via these scoring rules.
They are computed as follows:
For each score and time
the value
is computed, where
is the predictive
c.d.f. and
is the observation at time
. To obtain the overall score for one model the average of the score of all observations
is calculated.
For all , let
be the density function of the predictive distribution at
and
be a quadratic sum over the whole sample space
of the predictive distribution.
and
are the mean and the standard deviation of the predictive distribution, respectively.
Then the scores are defined as follows:
Logarithmic score:
Quadratic or Brier score:
Spherical score:
Ranked probability score: (sum over the whole sample space
)
Dawid-Sebastiani score:
Normalized squared error score:
Squared error score:
For more information on scoring rules see the references listed below.
Returns a named vector of the mean scores (if argument individual=FALSE
, the default) or a data frame of the individual scores for each observation (if argument individual=TRUE
). The scoring rules are named as follows:
logarithmic |
Logarithmic score |
quadratic |
Quadratic or Brier score |
spherical |
Spherical score |
rankprob |
Ranked probability score |
dawseb |
Dawid-Sebastiani score |
normsq |
Normalized squared error score |
sqerror |
Squared error score |
Philipp Probst and Tobias Liboschik
Christou, V. and Fokianos, K. (2013) On count time series prediction. Journal of Statistical Computation and Simulation (published online), http://dx.doi.org/10.1080/00949655.2013.823612.
Czado, C., Gneiting, T. and Held, L. (2009) Predictive model assessment for count data. Biometrics 65, 1254–1261, http://dx.doi.org/10.1111/j.1541-0420.2009.01191.x.
Gneiting, T., Balabdaoui, F. and Raftery, A.E. (2007) Probabilistic forecasts, calibration and sharpness. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 69, 243–268, http://dx.doi.org/10.1111/j.1467-9868.2007.00587.x.
tsglm
for fitting a GLM for time series of counts.
pit
and marcal
for other predictive model assessment tools.
permutationTest
in package surveillance
for the Monte Carlo permutation test for paired individual scores by Paul and Held (2011, Statistics in Medicine 30, 1118–1136, http://dx.doi.org/10.1002/sim.4177).
###Campylobacter infections in Canada (see help("campy")) campyfit <- tsglm(ts=campy, model=list(past_obs=1, past_mean=c(7,13))) scoring(campyfit)
###Campylobacter infections in Canada (see help("campy")) campyfit <- tsglm(ts=campy, model=list(past_obs=1, past_mean=c(7,13))) scoring(campyfit)
Computes the standard errors for the parameters of a fitted GLM-type model for time series of counts.
## S3 method for class 'tsglm' se(object, B, parallel, level=0.95, ...)
## S3 method for class 'tsglm' se(object, B, parallel, level=0.95, ...)
object |
an object of class |
B |
positive integer value giving the number of bootstrap samples to use for estimation of the standard errors. If missing the standard errors are based on a normal approximation. |
parallel |
logical value. If |
level |
numeric value determining the desired coverage rate of confidence intervals. |
... |
additional arguments to be passed to the fitting function |
By default the standard errors and confidence intervals are based on a normal approximation of the (quasi) maximum likelihood estimator. The standard errors are the square roots of the diagonal elements of the inverse of the information matrix. Because there is no analytical approximation of the standard error for the overdispersion coefficient sigmasq
, its standard error and its confidence interval are set to NA
.
If the number of bootstrap samples B
is given, the standard errors and condidence intervals are computed by a parametric bootstrap. The standard errors are the empirical standard deviation of the parameter estimations of B
random samples drawn from the fitted model given in argument object
. The confidence intervals are the a
- and (1-a)
-quantile of this bootstrap sample with a=(1-level)/2
.
A list with the following components:
est |
a vector of the maximum likelihood estimated coefficients. |
se |
a vector of the standard errors of each estimated coefficient. |
ci |
a matrix with the columns |
level |
numerical value giving the coverage rate of the confidence intervals. |
type |
a character value |
If the standard errors are computed by a parametric bootstrap procedure, the following component is additionally returned:
B |
positive integer value giving the number of bootstrap samples used for estimation of the standard errors. |
Tobias Liboschik and Philipp Probst
Liboschik, T., Fokianos, K. and Fried, R. (2017) tscount: An R package for analysis of count time series following generalized linear models. Journal of Statistical Software 82(5), 1–51, http://dx.doi.org/10.18637/jss.v082.i05.
tsglm
for fitting a GLM for time series of counts.
###Road casualties in Great Britain (see help("Seatbelts")) timeseries <- Seatbelts[, "VanKilled"] regressors <- cbind(PetrolPrice=Seatbelts[, c("PetrolPrice")], linearTrend=seq(along=timeseries)/12) #Logarithmic link function with Poisson distribution: seatbeltsfit <- tsglm(ts=timeseries, link="log", model=list(past_obs=c(1, 12)), xreg=regressors, distr="poisson") se(seatbeltsfit) #by normal approximation ## Not run: system.time(stderror <- se(seatbeltsfit, B=100)) #by bootstrap stderror #This estimation of bootstrap standard errors takes several minutes on a single #processing unit, of course depending on its speed. #Parallel computation for shorter run time on a cluster: library(parallel) ntasks <- 3 clust <- makeCluster(ntasks) setDefaultCluster(cl=clust) system.time(stderror <- se(seatbeltsfit, B=100, parallel=TRUE)) ## End(Not run)
###Road casualties in Great Britain (see help("Seatbelts")) timeseries <- Seatbelts[, "VanKilled"] regressors <- cbind(PetrolPrice=Seatbelts[, c("PetrolPrice")], linearTrend=seq(along=timeseries)/12) #Logarithmic link function with Poisson distribution: seatbeltsfit <- tsglm(ts=timeseries, link="log", model=list(past_obs=c(1, 12)), xreg=regressors, distr="poisson") se(seatbeltsfit) #by normal approximation ## Not run: system.time(stderror <- se(seatbeltsfit, B=100)) #by bootstrap stderror #This estimation of bootstrap standard errors takes several minutes on a single #processing unit, of course depending on its speed. #Parallel computation for shorter run time on a cluster: library(parallel) ntasks <- 3 clust <- makeCluster(ntasks) setDefaultCluster(cl=clust) system.time(stderror <- se(seatbeltsfit, B=100, parallel=TRUE)) ## End(Not run)
summary
method for class "tsglm"
.
## S3 method for class 'tsglm' summary(object, B, parallel=FALSE, level=0.95, ...)
## S3 method for class 'tsglm' summary(object, B, parallel=FALSE, level=0.95, ...)
object |
an object of class |
B |
controls the computation of standard errors. Is passed to |
parallel |
controls the computation of standard errors. Is passed to |
level |
controls the computation of conficence intervals. Is passed to |
... |
further arguments are currently ignored. Only for compatibility with generic function. |
Computes and returns a list of summary statistics of the fitted model given in argument object
.
A named list with the following elements:
call |
see |
link |
see |
distr |
see |
residuals |
see |
coefficients |
data frame with estimated parameters, their standard errors and confidence intervals (based on a normal approximation or a parametric bootstrap, see |
level |
numerical value giving the coverage rate of the confidence intervals. |
number.coef |
number of coefficients. |
se.type |
type of standard errors, see |
se.bootstrapsamples |
number of bootstrap samples used for estimation of the standard errors, see |
logLik |
value of the log-likelihood function evaluated at the (quasi) maximum likelihood estimate. |
AIC |
Akaike's information criterion (AIC), see |
BIC |
Bayesian information criterion (BIC), see |
QIC |
Quasi information criterion (QIC), see |
pearson.resid |
Pearson residuals, see |
Tobias Liboschik and Philipp Probst
S3 method print
.
tsglm
for fitting a GLM for time series of counts.
###Road casualties in Great Britain (see help("Seatbelts")) timeseries <- Seatbelts[, "VanKilled"] regressors <- cbind(PetrolPrice=Seatbelts[, c("PetrolPrice")], linearTrend=seq(along=timeseries)/12) #Logarithmic link function with Poisson distribution: seatbeltsfit <- tsglm(ts=timeseries, link="log", model=list(past_obs=c(1, 12)), xreg=regressors, distr="poisson") summary(seatbeltsfit)
###Road casualties in Great Britain (see help("Seatbelts")) timeseries <- Seatbelts[, "VanKilled"] regressors <- cbind(PetrolPrice=Seatbelts[, c("PetrolPrice")], linearTrend=seq(along=timeseries)/12) #Logarithmic link function with Poisson distribution: seatbeltsfit <- tsglm(ts=timeseries, link="log", model=list(past_obs=c(1, 12)), xreg=regressors, distr="poisson") summary(seatbeltsfit)
The function tsglm
fits a generalised linear model (GLM) for time series of counts.
The specification of the linear predictor allows for regressing on past observations, past values of the linear predictor and covariates as defined in the Details section.
There is the so-called INGARCH model with the identity link (see for example Ferland et al., 2006, Fokianos et al., 2009) and another model with the logarithmic link (see for example Fokianos and Tjostheim, 2011), which also differ in the specification of the linear predictor.
The conditional distribution can be chosen to be either Poisson or negative binomial.
Estimation is done by conditional maximum likelihood for the Poisson distribution or by a conditional quasi-likelihood approach based on the Poisson likelihood function for the negative binomial distribution.
There is a vignette available which introduces the functionality of tsglm
and related functions of this package and its underlying statistical methods (vignette("tsglm", package="tscount")
).
The function tsglm.meanfit
is a lower level function to fit the mean specification of such a model assuming a Poisson distribution. It is called by tsglm
. It has additional arguments allowing for a finer control of the fitting procedure, which can be handed over from the function tsglm
by its ...
argument. Note that it is usually not necessary for a user to call this lower level functions nor to worry about the additional arguments provided by this function. The defaults of these arguments have been chosen wisely by the authors of this package and should perform well in most applications.
tsglm(ts, model = list(past_obs = NULL, past_mean = NULL, external = NULL), xreg = NULL, link = c("identity", "log"), distr = c("poisson", "nbinom"), ...) tsglm.meanfit(ts, model, xreg, link, score = TRUE, info = c("score", "none", "hessian", "sandwich"), init.method=c("marginal", "iid", "firstobs", "zero"), init.drop = FALSE, epsilon = 1e-06, slackvar = 1e-06, start.control = list(), final.control = list(), inter.control = NULL)
tsglm(ts, model = list(past_obs = NULL, past_mean = NULL, external = NULL), xreg = NULL, link = c("identity", "log"), distr = c("poisson", "nbinom"), ...) tsglm.meanfit(ts, model, xreg, link, score = TRUE, info = c("score", "none", "hessian", "sandwich"), init.method=c("marginal", "iid", "firstobs", "zero"), init.drop = FALSE, epsilon = 1e-06, slackvar = 1e-06, start.control = list(), final.control = list(), inter.control = NULL)
ts |
a univariate time series. |
model |
a named list specifying the model for the linear predictor, which can be of the following elements:
|
xreg |
matrix with covariates in the columns, i.e. its number of rows must be |
link |
character giving the link function. Default is |
distr |
character giving the conditional distribution. Default is |
... |
additional arguments to be passed to the lower level fitting function |
score |
logical value indicating whether the score vector should be computed. |
info |
character that determines if and how to compute the information matrix. Can be set to |
init.method |
character that determines how the recursion of the conditional mean (and possibly of its derivatives) is initialised. If set to |
init.drop |
logical value that determines which observations are considered for computation of the log-likelihood, the score vector and, if applicable, the information matrix. If |
epsilon |
numeric positive but small value determining how close the parameters may come to the limits of the parameter space. |
slackvar |
numeric positive but small value determining how true inequalities among the parameter restrictions are treated; a true inequality |
start.control |
named list with optional elements that determine how to make the start estimation. Possible list elements are:
|
final.control |
named list with optional elements that determine how to make the final maximum likelihood estimation. If
|
inter.control |
named list determining how to maximise the log-likelihood function in a first step. This intermediate optimisation will start from the start estimation and be followed by the final optimisation, which will in turn start from the intermediate optimisation result. This intermediate optimisation is intended to use a very quick but imprecise optimisation algorithm. Possible elements are the same as for |
The INGARCH model (argument link="identity"
) used here follows the definition
where denotes the history of the process up to time
,
and
is the Poisson respectively the negative binomial distribution with the parametrisation as specified below.
For the model with covariates having an internal effect (the default) the linear predictor of the INGARCH model (which is in that case identical to the conditional mean) is given by
The log-linear model (argument link="log"
) used here follows the definition
with and
as above.
For the model with covariates having an internal effect (the default) the linear predictor
of the log-linear model is given by
Note that because of the logarithmic link function the effect of single summands in the linear predictor on the conditional mean is multiplicative and hence the parameters play a different role than in the INGARCH model, although they are denoted by the same letters.
The Poisson distribution is parametrised by the mean lambda
according to the definition in Poisson
.
The negative binomial distribution is parametrised by the mean mu
with an additional dispersion parameter size
according to the definition in NegBinomial
. In the notation above its mean parameter mu
is and its dispersion parameter
size
is .
This function allows to include covariates in two different ways. A covariate can have a so-called internal effect as defined above, where its effect propagates via the regression on past values of the linear predictor and on past observations. Alternatively, it can have a so-called external effect, where its effect does not directly propagates via the feedback on past values of the linear predictor, but only via past observations. For external effects of the covariates, the linear predictor for the model with identity link is given by
and analoguesly for the model with logarithmic link by
This is described in more detail by Liboschik et al. (2014) for the case of deterministic covariates for modelling interventions.
It is also possible to model a combination of external and internal covariates, which can be defined straightforwardly by adding each covariate either to the linear predictor itself (for an internal effect) or to
defined above (for an external effect).
An object of class "tsglm"
, which is a list with at least the following elements:
coefficients |
a named vector of the maximum likelihood estimated coefficients, which can be extracted by the |
start |
a named vector of the start estimation for the coefficients. |
residuals |
a vector of residuals, which can be extracted by the |
fitted.values |
the fitted values, which can be extracted by the |
linear.predictors |
the linear fit on link scale. |
response |
a vector of the response values (this is usually the original time series but possibly without the first few observations used for initialization if argument |
logLik |
the log-likelihood of the fitted model, which can be extracted by the |
score |
the score vector at the maximum likelihood estimation. |
info.matrix |
the information matrix at the maximum likelihood estimation assuming a Poisson distribution. |
info.matrix_corrected |
the information matrix at the maximum likelihood estimation assuming the distribution specified in |
call |
the matched call. |
n_obs |
the number of observations. |
n_eff |
the effective number of observations used for maximum likelihood estimation (might be lower than |
ts |
the original time series. |
model |
the model specification. |
xreg |
the given covariates. |
distr |
a character giving the fitted conditional distribution. |
distrcoefs |
a named vector of the estimated additional coefficients specifying the conditional distribution. Is |
sigmasq |
the estimated overdispersion coefficient. Is zero in case of a Poisson distribution. |
The function tsglm.meanfit
has the same output except the elements distr
, distrcoefs
and sigmasq
. In addition, they return the following list elements:
inter |
some details on the intermediate estimation of the coefficients as returned by |
final |
some details on the final estimation of the coefficients as returned by |
durations |
named vector of the durations of the model fit (in seconds). |
outerscoreprod |
array of outer products of score vectors at each time point. |
Tobias Liboschik, Philipp Probst, Konstantinos Fokianos and Roland Fried
Christou, V. and Fokianos, K. (2014) Quasi-likelihood inference for negative binomial time series models. Journal of Time Series Analysis 35(1), 55–78, http://dx.doi.org/10.1002/jtsa.12050.
Christou, V. and Fokianos, K. (2015) Estimation and testing linearity for non-linear mixed poisson autoregressions. Electronic Journal of Statistics 9, 1357–1377, http://dx.doi.org/10.1214/15-EJS1044.
Ferland, R., Latour, A. and Oraichi, D. (2006) Integer-valued GARCH process. Journal of Time Series Analysis 27(6), 923–942, http://dx.doi.org/10.1111/j.1467-9892.2006.00496.x.
Fokianos, K. and Fried, R. (2010) Interventions in INGARCH processes. Journal of Time Series Analysis 31(3), 210–225, http://dx.doi.org/10.1111/j.1467-9892.2010.00657.x.
Fokianos, K., and Fried, R. (2012) Interventions in log-linear Poisson autoregression. Statistical Modelling 12(4), 299–322. http://dx.doi.org/10.1177/1471082X1201200401.
Fokianos, K., Rahbek, A. and Tjostheim, D. (2009) Poisson autoregression. Journal of the American Statistical Association 104(488), 1430–1439, http://dx.doi.org/10.1198/jasa.2009.tm08270.
Fokianos, K. and Tjostheim, D. (2011) Log-linear Poisson autoregression. Journal of Multivariate Analysis 102(3), 563–578, http://dx.doi.org/10.1016/j.jmva.2010.11.002.
Liboschik, T., Fokianos, K. and Fried, R. (2017) tscount: An R package for analysis of count time series following generalized linear models. Journal of Statistical Software 82(5), 1–51, http://dx.doi.org/10.18637/jss.v082.i05.
S3 methods print
, summary
, residuals
, plot
, fitted
, coef
, predict
, logLik
, vcov
, AIC
, BIC
and QIC
for the class "tsglm"
.
The S3 method se
computes the standard errors of the parameter estimates.
Additionally, there are the S3 methods pit
, marcal
and scoring
for predictive model assessment.
S3 methods interv_test
, interv_detect
and interv_multiple
for tests and detection procedures for intervention effects.
tsglm.sim
for simulation from GLM-type model for time series of counts. ingarch.mean
, ingarch.var
and ingarch.acf
for calculation of analytical mean, variance and autocorrelation function of an INGARCH model (i.e. with identity link) without covariates.
Example time series of counts are campy
, ecoli
, ehec
, influenza
, measles
in this package, polio
in package gamlss.data
.
###Campylobacter infections in Canada (see help("campy")) interventions <- interv_covariate(n=length(campy), tau=c(84, 100), delta=c(1, 0)) #detected by Fokianos and Fried (2010, 2012) #Linear link function with Negative Binomial distribution: campyfit <- tsglm(campy, model=list(past_obs=1, past_mean=13), xreg=interventions, distr="nbinom") campyfit plot(campyfit) ###Road casualties in Great Britain (see help("Seatbelts")) timeseries <- Seatbelts[, "VanKilled"] regressors <- cbind(PetrolPrice=Seatbelts[, c("PetrolPrice")], linearTrend=seq(along=timeseries)/12) #Logarithmic link function with Poisson distribution: seatbeltsfit <- tsglm(ts=timeseries, link="log", model=list(past_obs=c(1, 12)), xreg=regressors, distr="poisson") summary(seatbeltsfit)
###Campylobacter infections in Canada (see help("campy")) interventions <- interv_covariate(n=length(campy), tau=c(84, 100), delta=c(1, 0)) #detected by Fokianos and Fried (2010, 2012) #Linear link function with Negative Binomial distribution: campyfit <- tsglm(campy, model=list(past_obs=1, past_mean=13), xreg=interventions, distr="nbinom") campyfit plot(campyfit) ###Road casualties in Great Britain (see help("Seatbelts")) timeseries <- Seatbelts[, "VanKilled"] regressors <- cbind(PetrolPrice=Seatbelts[, c("PetrolPrice")], linearTrend=seq(along=timeseries)/12) #Logarithmic link function with Poisson distribution: seatbeltsfit <- tsglm(ts=timeseries, link="log", model=list(past_obs=c(1, 12)), xreg=regressors, distr="poisson") summary(seatbeltsfit)
Generates a simulated time series from a GLM-type model for time series of counts (see tsglm
for details).
tsglm.sim(n, param = list(intercept = 1, past_obs = NULL, past_mean = NULL, xreg = NULL), model = list(past_obs = NULL, past_mean = NULL, external = FALSE), xreg = NULL, link = c("identity", "log"), distr = c("poisson", "nbinom"), distrcoefs, fit, n_start = 50)
tsglm.sim(n, param = list(intercept = 1, past_obs = NULL, past_mean = NULL, xreg = NULL), model = list(past_obs = NULL, past_mean = NULL, external = FALSE), xreg = NULL, link = c("identity", "log"), distr = c("poisson", "nbinom"), distrcoefs, fit, n_start = 50)
n |
integer value giving the number of observations to be simulated. |
param |
a named list giving the parameters for the linear predictor of the model, which has the following elements:
|
model |
a named list specifying the model for the linear predictor, which has the elements |
xreg |
matrix with covariates in the columns (see |
link |
character giving the link function. Default is |
distr |
character giving the conditional distribution. Default is |
distrcoefs |
numeric vector of additional coefficients specifying the conditional distribution. For |
fit |
an object of class |
n_start |
number of observations used as a burn-in. |
The definition of the model used here is like in function tsglm
.
Note that during the burn-in period covariates are set to zero.
If a previous model fit is given in argument fit
and the length of the burn-in period n_start
is set to zero, then the a continuation of the original time series is simulated.
A list with the following components:
ts |
an object of class |
linear.predictors |
an object of class |
xreg.effects |
an object of class |
Tobias Liboschik and Philipp Probst
Liboschik, T., Fokianos, K. and Fried, R. (2017) tscount: An R package for analysis of count time series following generalized linear models. Journal of Statistical Software 82(5), 1–51, http://dx.doi.org/10.18637/jss.v082.i05.
tsglm
for fitting a GLM for time series of counts.
#Simulate from an INGARCH model with two interventions: interventions <- interv_covariate(n=200, tau=c(50, 150), delta=c(1, 0.8)) model <- list(past_obs=1, past_mean=c(1, 7), external=FALSE) param <- list(intercept=2, past_obs=0.3, past_mean=c(0.2, 0.1), xreg=c(3, 10)) tsglm.sim(n=200, param=param, model=model, xreg=interventions, link="identity", distr="nbinom", distrcoefs=c(size=1))
#Simulate from an INGARCH model with two interventions: interventions <- interv_covariate(n=200, tau=c(50, 150), delta=c(1, 0.8)) model <- list(past_obs=1, past_mean=c(1, 7), external=FALSE) param <- list(intercept=2, past_obs=0.3, past_mean=c(0.2, 0.1), xreg=c(3, 10)) tsglm.sim(n=200, param=param, model=model, xreg=interventions, link="identity", distr="nbinom", distrcoefs=c(size=1))