Package 'surveillance'

Title: Temporal and Spatio-Temporal Modeling and Monitoring of Epidemic Phenomena
Description: Statistical methods for the modeling and monitoring of time series of counts, proportions and categorical data, as well as for the modeling of continuous-time point processes of epidemic phenomena. The monitoring methods focus on aberration detection in count data time series from public health surveillance of communicable diseases, but applications could just as well originate from environmetrics, reliability engineering, econometrics, or social sciences. The package implements many typical outbreak detection procedures such as the (improved) Farrington algorithm, or the negative binomial GLR-CUSUM method of Hoehle and Paul (2008) <doi:10.1016/j.csda.2008.02.015>. A novel CUSUM approach combining logistic and multinomial logistic modeling is also included. The package contains several real-world data sets, the ability to simulate outbreak data, and to visualize the results of the monitoring in a temporal, spatial or spatio-temporal fashion. A recent overview of the available monitoring procedures is given by Salmon et al. (2016) <doi:10.18637/jss.v070.i10>. For the retrospective analysis of epidemic spread, the package provides three endemic-epidemic modeling frameworks with tools for visualization, likelihood inference, and simulation. hhh4() estimates models for (multivariate) count time series following Paul and Held (2011) <doi:10.1002/sim.4177> and Meyer and Held (2014) <doi:10.1214/14-AOAS743>. twinSIR() models the susceptible-infectious-recovered (SIR) event history of a fixed population, e.g., epidemics across farms or networks, as a multivariate point process as proposed by Hoehle (2009) <doi:10.1002/bimj.200900050>. twinstim() estimates self-exciting point process models for a spatio-temporal point pattern of infective events, e.g., time-stamped geo-referenced surveillance data, as proposed by Meyer et al. (2012) <doi:10.1111/j.1541-0420.2011.01684.x>. A recent overview of the implemented space-time modeling frameworks for epidemic phenomena is given by Meyer et al. (2017) <doi:10.18637/jss.v077.i11>.
Authors: Michael Hoehle [aut, ths], Sebastian Meyer [aut, cre], Michaela Paul [aut], Leonhard Held [ctb, ths], Howard Burkom [ctb], Thais Correa [ctb], Mathias Hofmann [ctb], Christian Lang [ctb], Juliane Manitz [ctb], Sophie Reichert [ctb], Andrea Riebler [ctb], Daniel Sabanes Bove [ctb], Maelle Salmon [ctb], Dirk Schumacher [ctb], Stefan Steiner [ctb], Mikko Virtanen [ctb], Wei Wei [ctb], Valentin Wimmer [ctb], R Core Team [ctb] (src/ks.c and a few code fragments of standard S3 methods)
Maintainer: Sebastian Meyer <[email protected]>
License: GPL-2
Version: 1.24.1.9000
Built: 2025-02-08 06:25:36 UTC
Source: https://github.com/r-forge/surveillance

Help Index


surveillance: Temporal and Spatio-Temporal Modeling and Monitoring of Epidemic Phenomena

Description

The R package surveillance implements statistical methods for the retrospective modeling and prospective monitoring of epidemic phenomena in temporal and spatio-temporal contexts. The focus is on (routinely collected) public health surveillance data, but the methods apply just as well to data from environmetrics, econometrics or the social sciences. As many of the monitoring methods rely on statistical process control methodology, the package is also relevant to quality control and reliability engineering.

Details

The package implements many typical outbreak detection procedures such as those of Stroup et al. (1989), Farrington et al. (1996), Rossi et al. (1999) and Rogerson and Yamada (2001), a Bayesian approach (Höhle, 2007), negative binomial CUSUM methods (Höhle and Mazick, 2009), and a detector based on generalized likelihood ratios (Höhle and Paul, 2008); see wrap.algo. CUSUMs for the prospective change-point detection in binomial, beta-binomial and multinomial time series, based on generalized linear modeling, are also covered; see categoricalCUSUM. This includes, e.g., the paired comparison Bradley-Terry modeling described in Höhle (2010) and the paired binary CUSUM (pairedbinCUSUM) described by Steiner et al. (1999). The package contains several real-world datasets, the ability to simulate outbreak data, and the ability to visualize the results of the monitoring in a temporal, spatial or spatio-temporal fashion. For time series data, the fundamental data structure of the package is the S4 class sts, wrapping observations, monitoring results and date handling for multivariate time series. A recent overview of the available monitoring procedures is given by Salmon et al. (2016).

For the retrospective analysis of epidemic spread, the package provides three endemic-epidemic modeling frameworks with tools for visualization, likelihood inference, and simulation. The function hhh4 offers inference methods for the (multivariate) count time series models of Held et al. (2005), Paul et al. (2008), Paul and Held (2011), Held and Paul (2012), and Meyer and Held (2014). See vignette("hhh4") for a general introduction and vignette("hhh4_spacetime") for a discussion and illustration of spatial hhh4 models. Self-exciting point processes are modeled through endemic-epidemic conditional intensity functions. twinSIR (Höhle, 2009) models the susceptible-infectious-recovered (SIR) event history of a fixed population, e.g., epidemics across farms or networks; see vignette("twinSIR") for an illustration. twinstim (Meyer et al., 2012) fits spatio-temporal point process models to point patterns of infective events, e.g., time-stamped geo-referenced surveillance data on infectious disease occurrence; see vignette("twinstim") for an illustration. A recent overview of the implemented space-time modeling frameworks for epidemic phenomena is given by Meyer et al. (2017).

Acknowledgements

Substantial contributions of code by: Leonhard Held, Howard Burkom, Thais Correa, Mathias Hofmann, Christian Lang, Juliane Manitz, Sophie Reichert, Andrea Riebler, Daniel Sabanes Bove, Maelle Salmon, Dirk Schumacher, Stefan Steiner, Mikko Virtanen, Wei Wei, Valentin Wimmer.

Furthermore, the authors would like to thank the following people for ideas, discussions, testing and feedback: Doris Altmann, Johannes Bracher, Caterina De Bacco, Johannes Dreesman, Johannes Elias, Marc Geilhufe, Jim Hester, Kurt Hornik, Mayeul Kauffmann, Junyi Lu, Lore Merdrignac, Tim Pollington, Marcos Prates, André Victor Ribeiro Amaral, Brian D. Ripley, François Rousseu, Barry Rowlingson, Christopher W. Ryan, Klaus Stark, Yann Le Strat, André Michael Toschke, Wei Wei, George Wood, Achim Zeileis, Bing Zhang.

Author(s)

Michael Hoehle, Sebastian Meyer, Michaela Paul

Maintainer: Sebastian Meyer [email protected]

References

citation(package="surveillance") gives the two main software references for the modeling (Meyer et al., 2017) and the monitoring (Salmon et al., 2016) functionalities:

  • Meyer S, Held L, Höhle M (2017). “Spatio-Temporal Analysis of Epidemic Phenomena Using the R Package surveillance.” Journal of Statistical Software, 77(11), 1–55. doi:10.18637/jss.v077.i11.

  • Salmon M, Schumacher D, Höhle M (2016). “Monitoring Count Time Series in R: Aberration Detection in Public Health Surveillance.” Journal of Statistical Software, 70(10), 1–35. doi:10.18637/jss.v070.i10.

Further references are listed in surveillance:::REFERENCES.

If you use the surveillance package in your own work, please do cite the corresponding publications.

See Also

https://surveillance.R-forge.R-project.org/

Examples

## Additional documentation and illustrations of the methods are
## available in the form of package vignettes and demo scripts:
vignette(package = "surveillance")
demo(package = "surveillance")

Abattoir Data

Description

A synthetic dataset from the Danish meat inspection – useful for illustrating the beta-binomial CUSUM.

Usage

data(abattoir)

Details

The object of class "sts" contains an artificial data set inspired by meat inspection data used by Danish Pig Production, Denmark. For each week the number of pigs with positive audit reports is recorded together with the total number of audits made that week.

References

Höhle, M. (2010): Online change-point detection in categorical time series. In: T. Kneib and G. Tutz (Eds.), Statistical Modelling and Regression Structures, Physica-Verlag.

See Also

categoricalCUSUM

Examples

data("abattoir")
plot(abattoir)
population(abattoir)

Formatted Time Axis for "sts" Objects

Description

Add a nicely formatted x-axis to time series plots related to the "sts" class. This utility function is, e.g., used by stsplot_time1 and plotHHH4_fitted1.

Usage

addFormattedXAxis(x, epochsAsDate = FALSE,
                  xaxis.tickFreq = list("%Q"=atChange),
                  xaxis.labelFreq = xaxis.tickFreq,
                  xaxis.labelFormat = "%G\n\n%OQ",
                  ...)

Arguments

x

an object of class "sts".

epochsAsDate

a logical indicating if the old (FALSE) or the new (TRUE) and more flexible implementation should be used. The xaxis.* arguments are only relevant for the new implementation epochsAsDate = TRUE.

xaxis.labelFormat, xaxis.tickFreq, xaxis.labelFreq

see the details below.

...

further arguments passed to axis.

Details

The setting epochsAsDate = TRUE enables very flexible formatting of the x-axis and its annotations via the xaxis.tickFreq, xaxis.labelFreq and xaxis.labelFormat arguments. The first two are named lists: each name is a single strftime conversion specification, and each element is a function which, based on this conversion, returns a subset of the rows of the sts object. The subsetting function has the header function(x, xm1), where x is a vector containing the result of applying the conversion to the epochs of the sts object and xm1 is the scalar result of applying the conversion to the natural element just before the first epoch. Note that the input to the subsetting function is converted using as.numeric before the function is called; hence, the conversion specification needs to result in a string convertible to integer.

Three predefined subsetting functions exist: atChange, at2ndChange and atMedian. The first two put a tick at each change (at every 2nd change for at2ndChange) of the converted value, while atMedian puts a tick at the median index among all elements sharing the same value:

    atChange <- function(x,xm1) which(diff(c(xm1,x)) != 0)
    at2ndChange <- function(x,xm1) which(diff(c(xm1,x) %/% 2) != 0)
    atMedian <- function(x,xm1) tapply(seq_along(x), INDEX=x, quantile, prob=0.5, type=3)
  

By defining own functions here, one can obtain an arbitrary degree of flexibility.
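As a small illustration of this contract (with hypothetical month numbers, not taken from an actual sts object), consider how atChange selects tick positions:

## x would be as.numeric(strftime(<epochs>, "%m")) for the epochs of the
## sts object; xm1 is the same conversion for the element just before them
x   <- c(1, 1, 1, 2, 2, 3, 3, 3)   # month number of consecutive weeks
xm1 <- 12                          # month of the week before the first epoch
atChange(x, xm1)                   # 1 4 6: indices where the month changes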

Finally, xaxis.labelFormat is a strftime-compatible formatting string; e.g., the default value "%G\n\n%OQ" means the ISO year and the quarter (in Roman numerals) stacked on top of each other.

Value

NULL (invisibly). The function is called for its side effects.

Author(s)

Michael Höhle with contributions by Sebastian Meyer

See Also

the examples in stsplot_time1 and plotHHH4_fitted1


Add Harmonics to an Existing Formula

Description

This function helps to construct a formula object that can be used in a call to hhh4 to model seasonal variation via a sum of sine and cosine terms.

Usage

addSeason2formula(f = ~1, S = 1, period = 52, timevar = "t")

Arguments

f

formula that the seasonal terms should be added to, defaults to an intercept ~1.

S

number of sine and cosine terms. If S is a vector, unit-specific seasonal terms are created.

period

period of the season, defaults to 52 for weekly data.

timevar

the time variable in the model. Defaults to "t".

Details

The function adds the seasonal terms

\sin(s \cdot 2\pi \cdot timevar/period), \; \cos(s \cdot 2\pi \cdot timevar/period),

for s = 1, \ldots, S to an existing formula f.

Note the following equivalence when interpreting the coefficients of the seasonal terms:

\gamma \sin(\omega t) + \delta \cos(\omega t) = A \sin(\omega t + \epsilon)

with amplitude A = \sqrt{\gamma^2 + \delta^2} and phase shift \epsilon = \arctan(\delta / \gamma). The amplitude and phase shift can be obtained from a fitted hhh4 model via coef(..., amplitudeShift = TRUE), see coef.hhh4.
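For illustration, a minimal sketch of this identity with hypothetical coefficient values (atan2 is used here to resolve the quadrant of the arctangent):

gamma <- 0.8; delta <- 0.6         # hypothetical seasonal coefficients
A   <- sqrt(gamma^2 + delta^2)     # amplitude
eps <- atan2(delta, gamma)         # phase shift
c(amplitude = A, shift = eps)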

Value

Returns a formula with the seasonal terms added and its environment set to .GlobalEnv. Note that to use the resulting formula in hhh4, a time variable named as specified by the argument timevar must be available.

Author(s)

M. Paul, with contributions by S. Meyer

See Also

hhh4, fe, ri

Examples

# add 2 sine/cosine terms to a model with intercept and linear trend
addSeason2formula(f = ~ 1 + t, S = 2)

# the same for monthly data
addSeason2formula(f = ~ 1 + t, S = 2, period = 12)

# different number of seasons for a bivariate time series
addSeason2formula(f = ~ 1, S = c(3, 1), period = 52)

Aggregate an "sts" Object Over Time or Across Units

Description

Aggregate the matrix slots of an "sts" object. Either the time series is aggregated so that a new sampling frequency of nfreq observations per year is obtained (i.e., as in aggregate.ts), or the aggregation is over all columns (units).

Usage

## S4 method for signature 'sts'
aggregate(x, by = "time", nfreq = "all", ...)

Arguments

x

an object of class "sts".

by

a string being either "time" or "unit".

nfreq

new sampling frequency for by="time". If nfreq="all" then all time points are summed.

...

unused (argument of the generic).

Value

an object of class "sts".

Warning

Aggregation over units fills the upperbound slot with NAs and the map slot is left as-is, but the object cannot be plotted by unit any longer.

The populationFrac slot is aggregated just like observed. Population fractions are recomputed if and only if x is not a multinomialTS and already contains population fractions. This might not be intended, especially for aggregation over time.

Examples

data("ha.sts")
dim(ha.sts)
dim(aggregate(ha.sts, by = "unit"))
dim(aggregate(ha.sts, nfreq = 13))

The Bayes System

Description

Evaluation of timepoints with the Bayes subsystem 1, 2, 3 or a self-defined Bayes subsystem.

Usage

algo.bayesLatestTimepoint(disProgObj, timePoint = NULL,
       control = list(b = 0, w = 6, actY = TRUE, alpha = 0.05))
algo.bayes(disProgObj, control = list(range = range,
       b = 0, w = 6, actY = TRUE, alpha = 0.05))
algo.bayes1(disProgObj, control = list(range = range))
algo.bayes2(disProgObj, control = list(range = range))
algo.bayes3(disProgObj, control = list(range = range))

Arguments

disProgObj

object of class disProg (including the observed and the state chain)

timePoint

time point which should be evaluated in algo.bayesLatestTimepoint. The default is to use the latest timepoint.

control

control object: range determines the desired timepoints which should be evaluated, b describes the number of years to go back for the reference values, w is the half window width for the reference values around the appropriate timepoint and actY is a boolean to decide if the year of timePoint also contributes w reference values. The parameter alpha is the (1-\alpha)-quantile to use in order to calculate the upper threshold. By default, b, w and actY are set for the Bayes 1 system, with alpha=0.05.

Details

Using the reference values, the (1-\alpha)\cdot 100\% quantile of the predictive posterior distribution is calculated as a threshold. An alarm is raised if the actual value is greater than or equal to this threshold. It can be shown by analytical computations that the predictive posterior in this case is the negative binomial distribution. Note: algo.rki and algo.farrington use two-sided prediction intervals; to compare with these procedures it is necessary to use an alpha which is half the one used for those procedures.

Note also that algo.bayes calls algo.bayesLatestTimepoint for the values specified in range and for the system specified in control. algo.bayes1, algo.bayes2, algo.bayes3 call algo.bayesLatestTimepoint for the values specified in range for the Bayes 1 system, Bayes 2 system or Bayes 3 system.

  • "Bayes 1" reference values from 6 weeks. Alpha is fixed a t 0.05.

  • "Bayes 2" reference values from 6 weeks ago and 13 weeks of the previous year (symmetrical around the same week as the current one in the previous year). Alpha is fixed at 0.05.

  • "Bayes 3" 18 reference values. 9 from the year ago and 9 from two years ago (also symmetrical around the comparable week). Alpha is fixed at 0.05.

The procedure is able to handle NAs in the reference values: they are simply omitted from the summation and from the count of observed reference values.
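As a sketch of the underlying computation, assuming a Jeffreys-type Gamma(1/2, 0) prior for the Poisson mean (cf. Riebler, 2004), under which the posterior predictive distribution is negative binomial; this is a simplified stand-in for the internal code:

## threshold as the (1-alpha) quantile of the negative binomial
## posterior predictive given reference values y_ref
bayes_threshold <- function(y_ref, alpha = 0.05) {
  n <- sum(!is.na(y_ref))                      # NAs are not counted
  qnbinom(1 - alpha, size = sum(y_ref, na.rm = TRUE) + 0.5,
          prob = n / (n + 1))
}
bayes_threshold(c(2, 5, 4, 3, 6, 2))           # hypothetical reference values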

Value

survRes

algo.bayesLatestTimepoint returns a list of class survRes (surveillance result), which includes the alarm value for recognizing an outbreak (1 for alarm, 0 for no alarm), the threshold value for recognizing the alarm and the input object of class disProg. algo.bayes gives a list of class survRes which includes the vector of alarm values for every timepoint in range and the vector of threshold values for every timepoint in range for the system specified by b, w and actY, the range and the input object of class disProg. algo.bayes1 returns the same for the Bayes 1 system, algo.bayes2 for the Bayes 2 system and algo.bayes3 for the Bayes 3 system.

Author(s)

M. Höhle, A. Riebler, C. Lang

Source

Riebler, A. (2004): Empirischer Vergleich von statistischen Methoden zur Ausbruchserkennung bei Surveillance Daten [Empirical comparison of statistical methods for outbreak detection in surveillance data], Bachelor's thesis.

See Also

algo.call, algo.rkiLatestTimepoint and algo.rki for the RKI system.

Examples

disProg <- sim.pointSource(p = 0.99, r = 0.5, length = 208, A = 1,
                           alpha = 1, beta = 0, phi = 0,
                           frequency = 1, state = NULL, K = 1.7)

# Test the latest timepoint with the Bayes 1 system
algo.bayesLatestTimepoint(disProg)

# Test weeks 200 to 208 for outbreaks with a self-defined Bayes system
algo.bayes(disProg, control = list(range = 200:208, b = 1,
                                   w = 5, actY = TRUE, alpha = 0.05))
# The same for Bayes 1 to Bayes 3
algo.bayes1(disProg, control = list(range = 200:208, alpha = 0.05))
algo.bayes2(disProg, control = list(range = 200:208, alpha = 0.05))
algo.bayes3(disProg, control = list(range = 200:208, alpha = 0.05))

Query Transmission to Specified Surveillance Algorithm

Description

Transmission of an object of class disProg to the specified surveillance algorithm.

Usage

algo.call(disProgObj, control = list(
                     list(funcName = "rki1", range = range),
                     list(funcName = "rki", range = range,
                          b = 2, w = 4, actY = TRUE),
                     list(funcName = "rki", range = range,
                          b = 2, w = 5, actY = TRUE)))

Arguments

disProgObj

object of class disProg, which includes the state chain and the observed time series

control

specifies which surveillance algorithms should be used, together with their parameters. The parameters funcName and range must be specified. Here, funcName is the appropriate method function (without 'algo.') and range defines the timepoints to be evaluated by the actual system.

Value

a list of survRes objects generated by the specified surveillance algorithm

See Also

algo.rki, algo.bayes, algo.farrington

Examples

# Create a test object
disProg <- sim.pointSource(p = 0.99, r = 0.5, length = 400, A = 1,
                           alpha = 1, beta = 0, phi = 0,
                           frequency = 1, state = NULL, K = 1.7)

# Test this object with several methods in range = 200:400
range <- 200:400
survRes <- algo.call(disProg,
                     control = list(
                         list(funcName = "rki1", range = range),
                         list(funcName = "rki2", range = range),
                         list(funcName = "rki3", range = range),
                         list(funcName = "rki", range = range,
                              b = 3, w = 2, actY = FALSE),
                         list(funcName = "rki", range = range,
                              b = 2, w = 9, actY = TRUE),
                         list(funcName = "bayes1", range = range),
                         list(funcName = "bayes2", range = range),
                         list(funcName = "bayes3", range = range),
                         list(funcName = "bayes",
                              range = range, b = 1, w = 5, actY = TRUE,alpha=0.05)
                     ))
# show selected survRes objects
names(survRes)
plot(survRes[["rki(6,6,0)"]])
survRes[["bayes(5,5,1)"]]

The CDC Algorithm

Description

Surveillance using the CDC Algorithm

Usage

algo.cdcLatestTimepoint(disProgObj, timePoint = NULL,
                        control = list(b = 5, m = 1, alpha=0.025))
algo.cdc(disProgObj, control = list(range = range, b = 5, m = 1,
         alpha = 0.025))

Arguments

disProgObj

object of class disProg (including the observed and the state chain).

timePoint

time point which should be evaluated in algo.cdcLatestTimepoint. The default is to use the latest timepoint.

control

control object: range determines the desired timepoints which should be evaluated, b describes the number of years to go back for the reference values, m is the half window width for the reference values around the appropriate timepoint (see details). The standard definition is b=5 and m=1.

Details

Using the reference values for calculating an upper limit, an alarm is given if the actual value is bigger than the computed threshold. algo.cdc calls algo.cdcLatestTimepoint for the values specified in range and for the system specified in control. The threshold is calculated from the predictive distribution, i.e.

mean(x) + z_{\alpha/2} \cdot sd(x) \cdot \sqrt{1+1/k},

which corresponds to Equation 8-1 in Farrington and Andrews (2003). Note that an aggregation into 4-week blocks occurs in algo.cdcLatestTimepoint, and m denotes the number of 4-week blocks (months) to use as reference values. This function currently does the same for monthly data (which is not correct!)
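A minimal sketch of this threshold computation (with hypothetical 4-week block counts; not the package's internal code):

cdc_threshold <- function(x, alpha = 0.025) {
  k <- length(x)                               # number of reference blocks
  mean(x) + qnorm(1 - alpha/2) * sd(x) * sqrt(1 + 1/k)
}
cdc_threshold(c(10, 12, 9, 14, 11))            # hypothetical block counts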

Value

algo.cdcLatestTimepoint returns a list of class survRes (surveillance result), which includes the alarm value (alarm = 1, no alarm = 0) for recognizing an outbreak, the threshold value for recognizing the alarm and the input object of class disProg.

algo.cdc gives a list of class survRes which includes the vector of alarm values for every timepoint in range, the vector of threshold values for every timepoint in range for the system specified by b and m, the range and the input object of class disProg.

Author(s)

M. Höhle

References

Stroup, D., G. Williamson, J. Herndon, and J. Karon (1989). Detection of aberrations in the occurrence of notifiable diseases surveillance data. Statistics in Medicine 8, 323–329. doi:10.1002/sim.4780080312

Farrington, C. and N. Andrews (2003). Monitoring the Health of Populations, Chapter Outbreak Detection: Application to Infectious Disease Surveillance, pp. 203-231. Oxford University Press.

See Also

algo.rkiLatestTimepoint, algo.bayesLatestTimepoint and algo.bayes for the Bayes system.

Examples

# Create a test object
disProgObj <- sim.pointSource(p = 0.99, r = 0.5, length = 500, 
                              A = 1,alpha = 1, beta = 0, phi = 0,
                              frequency = 1, state = NULL, K = 1.7)

# Test weeks 400 to 500 for outbreaks with a self-defined CDC system
algo.cdc(disProgObj, control = list(range = 400:500, alpha = 0.025))

Comparison of Specified Surveillance Systems using Quality Values

Description

Comparison of specified surveillance algorithms using quality values.

Usage

algo.compare(survResList)

Arguments

survResList

a list of survRes objects to compare via quality values.

Value

Matrix with values from algo.quality, i.e. quality values for every surveillance algorithm found in survResList.

See Also

algo.quality

Examples

# Create a test object
disProgObj <- sim.pointSource(p = 0.99, r = 0.5, length = 400,
                              A = 1, alpha = 1, beta = 0, phi = 0,
                              frequency = 1, state = NULL, K = 1.7)

# Test this object with several methods in range = 200:400
range <- 200:400
survRes <- algo.call(disProgObj,
                     control = list(
                         list(funcName = "rki1", range = range),
                         list(funcName = "rki2", range = range),
                         list(funcName = "rki3", range = range),
                         list(funcName = "rki", range = range,
                              b = 3, w = 2, actY = FALSE),
                         list(funcName = "rki", range = range,
                              b = 2, w = 9, actY = TRUE),
                         list(funcName = "bayes1", range = range),
                         list(funcName = "bayes2", range = range),
                         list(funcName = "bayes3", range = range),
                         list(funcName = "bayes",
                              range = range, b = 1, w = 5, actY = TRUE,alpha=0.05)
                     ))
algo.compare(survRes)

CUSUM method

Description

Approximate one-sided CUSUM method for a Poisson variate based on the cumulative sum of the deviation between a reference value k and the transformed observed values. An alarm is raised if the cumulative sum equals or exceeds a prespecified decision boundary h. The function can handle time-varying expectations.

Usage

algo.cusum(disProgObj, control = list(range = range, k = 1.04, h = 2.26, 
           m = NULL, trans = "standard", alpha = NULL))

Arguments

disProgObj

object of class disProg (including the observed and the state chain)

control

control object:

range

determines the desired time points which should be evaluated

k

is the reference value

h

the decision boundary

m

how to determine the expected number of cases – the following arguments are possible

numeric

a vector of values having the same length as range. If a single numeric value is specified then this value is replicated length(range) times.

NULL

A single value is estimated by taking the mean of all observations previous to the first range value.

"glm"

A GLM of the form

\log(m_t) = \alpha + \beta t + \sum_{s=1}^{S} (\gamma_s \sin(\omega_s t) + \delta_s \cos(\omega_s t)),

where \omega_s = \frac{2\pi}{52} s are the Fourier frequencies, is fitted. This model is then used to predict the values in range.

trans

one of the following transformations (warning: Anscombe and NegBin transformations are experimental)

rossi

standardized variables z3 as proposed by Rossi

standard

standardized variables z1 (based on asymptotic normality) - This is the default.

anscombe

Anscombe residuals – experimental

anscombe2nd

Anscombe residuals as in Pierce and Schafer (1986) based on a 2nd order approximation of E(X) – experimental

pearsonNegBin

compute Pearson residuals for NegBin – experimental

anscombeNegBin

Anscombe residuals for NegBin – experimental

none

no transformation

alpha

parameter of the negative binomial distribution, such that the variance is m + \alpha m^2

Value

algo.cusum gives a list of class "survRes" which includes the vector of alarm values for every timepoint in range and the vector of cumulative sums for every timepoint in range for the system specified by k and h, the range and the input object of class "disProg".

The upperbound entry shows, for each time instance, the number of diseased individuals it would have taken for the CUSUM to signal. Once the CUSUM signals, no resetting is applied, i.e. signals occur until the CUSUM statistic returns below the threshold again.

In case control$m="glm" was used, the returned control$m.glm entry contains the fitted "glm" object.
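For intuition, a minimal sketch of the one-sided CUSUM recursion on standardized values (the "standard" transformation; illustrative only, not the package's implementation):

poisson_cusum <- function(x, m, k = 1.04, h = 2.26) {
  z <- (x - m) / sqrt(m)                       # standardized variables z1
  S <- numeric(length(x))
  for (t in seq_along(x))
    S[t] <- max(0, (if (t == 1) 0 else S[t - 1]) + z[t] - k)
  list(statistic = S, alarm = S >= h)
}
set.seed(1)
poisson_cusum(rpois(30, lambda = 5), m = 5)$alarm  # in-control: no alarms expected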

Note

This implementation is experimental, but will not be developed further.

Author(s)

M. Paul and M. Höhle

References

G. Rossi, L. Lampugnani and M. Marchi (1999), An approximate CUSUM procedure for surveillance of health events, Statistics in Medicine, 18, 2111–2122

D. A. Pierce and D. W. Schafer (1986), Residuals in Generalized Linear Models, Journal of the American Statistical Association, 81, 977–986

Examples

# Xi ~ Po(5), i=1,...,500
set.seed(321)
stsObj <- sts(observed = rpois(500,lambda=5))
# there should be no alarms as mean doesn't change
res <- cusum(stsObj, control = list(range = 100:500, trans = "anscombe"))
plot(res, xaxis.labelFormat = NULL)

# simulated data
disProgObj <- sim.pointSource(p = 1, r = 1, length = 250,
                              A = 0, alpha = log(5), beta = 0, phi = 10,
                              frequency = 10, state = NULL, K = 0)
plot(disProgObj)

# Test weeks 200 to 250 for outbreaks
surv0 <- algo.cusum(disProgObj, control = list(range = 200:250))
plot(surv0, xaxis.years = FALSE)

# alternatively, using the newer "sts" interface
stsObj <- disProg2sts(disProgObj)
surv <- cusum(stsObj, control = list(range = 200:250))
plot(surv)
stopifnot(upperbound(surv) == surv0$upperbound)

Surveillance for Count Time Series Using the Classic Farrington Method

Description

Implements the procedure of Farrington et al. (1996). At each time point of the specified range, a GLM is fitted to predict the counts. This is then compared to the observed counts. If the observation is above a specific quantile of the prediction interval, then an alarm is raised.

Usage

# original interface for a single "disProg" time series
algo.farrington(disProgObj, control=list(
    range=NULL, b=5, w=3, reweight=TRUE, verbose=FALSE, plot=FALSE,
    alpha=0.05, trend=TRUE, limit54=c(5,4), powertrans="2/3",
    fitFun="algo.farrington.fitGLM.fast"))

# wrapper for "sts" data, possibly multivariate
farrington(sts, control=list(
    range=NULL, b=5, w=3, reweight=TRUE, verbose=FALSE,
    alpha=0.05), ...)

Arguments

disProgObj

an object of class "disProg" (a list including observed and state time series).

control

list of control parameters

range

Specifies the index of all timepoints which should be tested. If range is NULL the maximum number of possible weeks is used (i.e. as many weeks as possible while still having enough reference values).

b

how many years back in time to include when forming the base counts.

w

window size, i.e. the number of weeks to include before and after the current week

reweight

Boolean specifying whether to perform the reweighting step

trend

If TRUE, a trend is included and kept in case the conditions documented in Farrington et al. (1996) are met (see the results). If FALSE, no trend is fitted.

verbose

Boolean indicating whether to show extra debugging information.

plot

Boolean specifying whether to show the final GLM model fit graphically (use History|Recording to see all pictures).

powertrans

Power transformation to apply to the data. Use either "2/3" for skewness correction (Default), "1/2" for variance stabilizing transformation or "none" for no transformation.

alpha

An approximate (two-sided) (1-\alpha) prediction interval is calculated.

limit54

To avoid alarms in cases where the time series only has about 0-2 cases, the algorithm uses the following heuristic criterion (see Section 3.8 of the Farrington paper) to protect against low counts: no alarm is sounded if fewer than cases = 5 reports were received in the past period = 4 weeks. limit54 = c(cases, period) is a vector allowing the user to change these numbers. Note: As of version 0.9-7, the "last" period of weeks includes the current week; otherwise no alarm would be sounded for horribly large numbers if the four weeks before that are too low.

fitFun

String containing the name of the fit function to be used for fitting the GLM. The options are algo.farrington.fitGLM.fast (default) and algo.farrington.fitGLM or algo.farrington.fitGLM.populationOffset. See details of algo.farrington.fitGLM for more information.

sts

an object of class "sts".

...

arguments for wrap.algo, e.g., verbose=FALSE.

Details

The following steps are performed according to the Farrington et al. (1996) paper; a code sketch of the first three steps follows the list.

  1. fit of the initial model and initial estimation of mean and overdispersion.

  2. calculation of the weights omega (correction for past outbreaks)

  3. refitting of the model

  4. revised estimation of overdispersion

  5. rescaled model

  6. omission of the trend, if it is not significant

  7. repetition of the whole procedure

  8. calculation of the threshold value

  9. computation of exceedance score
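As announced above, a rough sketch of steps 1-3 for a single time point, using the package's anscombe.residuals and algo.farrington.assign.weights helpers (simplified; the overdispersion rescaling and trend test are omitted):

farrington_refit <- function(response, wtime) {
  fit <- glm(response ~ 1 + wtime, family = quasipoisson())  # step 1: initial fit
  phi <- max(summary(fit)$dispersion, 1)                     # overdispersion >= 1
  s   <- anscombe.residuals(fit, phi)                        # step 2: residuals
  w   <- algo.farrington.assign.weights(s)                   # weights omega
  glm(response ~ 1 + wtime, family = quasipoisson(),
      weights = w)                                           # step 3: refit
}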

Value

For algo.farrington, a list object of class "survRes" with elements alarm, upperbound, trend, disProgObj, and control.

For farrington, the input "sts" object with updated alarm, upperbound and control slots, and subsetted to control$range.

Author(s)

M. Höhle

References

Farrington, C.P., Andrews, N.J., Beale, A.D. and Catchpole, M.A. (1996): A statistical algorithm for the early detection of outbreaks of infectious disease. J. R. Statist. Soc. A, 159, 547-563.

See Also

algo.farrington.fitGLM, algo.farrington.threshold

An improved Farrington algorithm is available as function farringtonFlexible.

Examples

#load "disProg" data
data("salmonella.agona")

#Do surveillance for the last 42 weeks
n <- length(salmonella.agona$observed)
control <- list(b=4,w=3,range=(n-42):n,reweight=TRUE, verbose=FALSE,alpha=0.01)
res <- algo.farrington(salmonella.agona,control=control)
plot(res)

#Generate Poisson counts and create an "sts" object
set.seed(123)
x <- rpois(520,lambda=1)
stsObj <- sts(observed=x, frequency=52)

if (surveillance.options("allExamples")) {
#Compare timing of the two possible fitters for algo.farrington
  range <- 312:520
  system.time( sts1 <- farrington(stsObj, control=list(range=range,
                         fitFun="algo.farrington.fitGLM.fast"), verbose=FALSE))
  system.time( sts2 <- farrington(stsObj, control=list(range=range,
                         fitFun="algo.farrington.fitGLM"), verbose=FALSE))
  #Check if results are the same
  stopifnot(upperbound(sts1) == upperbound(sts2))
}

Assign weights to base counts

Description

Weights are assigned according to the Anscombe residuals

Usage

algo.farrington.assign.weights(s, weightsThreshold=1)

Arguments

s

Vector of standardized Anscombe residuals

weightsThreshold

A scalar indicating when observations are seen as outliers. In the original Farrington proposal the value was 1 (default value); in the improved version this value is suggested to be 2.58.

Value

Weights according to the residuals

See Also

anscombe.residuals
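For illustration, a small sketch with hypothetical residual values:

s <- c(0.2, -0.5, 1.8, 3.2)    # hypothetical standardized Anscombe residuals
algo.farrington.assign.weights(s)                            # original threshold of 1
algo.farrington.assign.weights(s, weightsThreshold = 2.58)   # improved-method threshold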


Fit Poisson GLM of the Farrington procedure for a single time point

Description

The function fits a Poisson regression model (GLM) with mean predictor

\log \mu_t = \alpha + \beta t

as specified by the Farrington procedure. If requested, Anscombe residuals are computed based on an initial fit and a 2nd fit is made using weights, where base counts suspected to be caused by earlier outbreaks are downweighted.

Usage

algo.farrington.fitGLM(response, wtime, timeTrend = TRUE,
                       reweight = TRUE, ...)
algo.farrington.fitGLM.fast(response, wtime, timeTrend = TRUE,
                            reweight = TRUE, ...)
algo.farrington.fitGLM.populationOffset(response, wtime, population,
                                        timeTrend=TRUE,reweight=TRUE, ...)

Arguments

response

The vector of observed base counts

wtime

Vector of week numbers corresponding to response

timeTrend

Boolean indicating whether to fit the \beta t term or not

reweight

Fit twice – 2nd time with Anscombe residuals

population

Population size. Possibly used as offset, i.e. in algo.farrington.fitGLM.populationOffset the value log(population) is used as offset in the linear predictor of the GLM:

\log \mu_t = \log(population) + \alpha + \beta t

This provides a way to adjust the Farrington procedure to the case of greatly varying populations. Note: This is an experimental implementation with methodology not covered by the original paper.

...

Used to catch additional arguments, currently not used.

Details

Compute weights from an initial fit and rescale using Anscombe based residuals as described in the anscombe.residuals function.

Note that algo.farrington.fitGLM uses the glm routine for fitting. A faster alternative is provided by algo.farrington.fitGLM.fast which uses the glm.fit function directly (thanks to Mikko Virtanen). This saves computational overhead and increases speed for 500 monitored time points by a factor of approximately two. However, some of the standard methods for "glm" objects might not work on the output of this function. Which function is used for algo.farrington can be controlled by the control$fitFun argument.

Value

an object of class "glm" with additional fields wtime, response and phi. If the GLM does not converge, NULL is returned.

See Also

anscombe.residuals, algo.farrington
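A small usage sketch with simulated reference data (hypothetical values; the function is normally invoked internally by algo.farrington):

set.seed(1)
wtime    <- c(1:7, 53:59, 105:111)             # three seasonal windows of weeks
response <- rpois(length(wtime), lambda = 4)   # hypothetical base counts
fit <- algo.farrington.fitGLM(response, wtime)
coef(fit)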


Compute prediction interval for a new observation

Description

Depending on the current transformation h(y) \in \{y, \sqrt{y}, y^{2/3}\},

V(h(y_0) - h(\mu_0)) = V(h(y_0)) + V(h(\mu_0))

is used to compute a prediction interval. The prediction variance consists of a component due to the variance of a single observation and a component due to the estimation of the mean.

Usage

algo.farrington.threshold(pred, phi, alpha=0.01, skewness.transform="none", y)

Arguments

pred

A GLM prediction object

phi

Current overdispersion parameter (superfluous?)

alpha

Quantile level in the Gaussian based CI, i.e. a (1-\alpha)\cdot 100\% confidence interval is computed.

skewness.transform

Skewness correction, i.e. one of "none", "1/2", or "2/3".

y

Observed number

Value

Vector of length four with the lower and upper bound of a (1-\alpha)\cdot 100\% confidence interval (first two elements), the corresponding quantile of the observation y, and the median of the predictive distribution.
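A delta-method sketch of how such limits arise on the 2/3-power scale; mu0 and tau are hypothetical values for the predicted mean and the total prediction variance (this is not the exact internal code):

pred_interval_23 <- function(mu0, tau, alpha = 0.01) {
  z  <- qnorm(1 - alpha/2)
  hv <- (4/9) * mu0^(-2/3) * tau               # Var{h(y0)-h(mu0)} for h(y)=y^(2/3)
  lo <- max(0, mu0^(2/3) - z * sqrt(hv))^(3/2) # back-transformed lower bound
  hi <- (mu0^(2/3) + z * sqrt(hv))^(3/2)       # back-transformed upper bound
  c(lower = lo, upper = hi)
}
pred_interval_23(mu0 = 5, tau = 1.5 * 5 + 0.8) # hypothetical mean and variance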


Count Data Regression Charts

Description

Count data regression charts for the monitoring of surveillance time series as proposed by Höhle and Paul (2008). The implementation is described in Salmon et al. (2016).

Usage

algo.glrnb(disProgObj, control = list(range=range, c.ARL=5,
           mu0=NULL, alpha=0, Mtilde=1, M=-1, change="intercept",
           theta=NULL, dir=c("inc","dec"),
           ret=c("cases","value"), xMax=1e4))

algo.glrpois(disProgObj, control = list(range=range, c.ARL=5,
             mu0=NULL, Mtilde=1, M=-1, change="intercept",
             theta=NULL, dir=c("inc","dec"),
             ret=c("cases","value"), xMax=1e4))

Arguments

disProgObj

object of class disProg to do surveillance for. For new sts-class data, use the glrnb wrapper, or the sts2disProg converter.

control

A list controlling the behaviour of the algorithm

range

vector of indices in the observed vector to monitor (should be consecutive)

mu0

A vector of in-control values of the mean of the Poisson / negative binomial distribution with the same length as range. If NULL the observed values in 1:(min(range)-1) are used to estimate the beta vector through a generalized linear model. To fine-tune the model one can instead specify mu0 as a list with two components:

S

integer number of harmonics to include (typically 1 or 2)

trend

A Boolean indicating whether to include a term t in the GLM model

The fitting is controlled by the estimateGLRNbHook function. The in-control mean model is re-fitted after every alarm. The fitted models can be found as a list mod in the control slot after the call.

Note: If a value for alpha is given, then the inverse of this value is used as fixed theta in a negative.binomial glm. If is.null(alpha) then the parameter is estimated as well (using glm.nb) – see the description of this parameter for details.

alpha

The (known) dispersion parameter of the negative binomial distribution, i.e. the parametrization of the negative binomial is such that the variance is mean + alpha*mean^2. Note: This parametrization is the inverse of the shape parametrization used in R – for example in dnbinom and glm.nb. Hence, if alpha=0 then the negative binomial distribution boils down to the Poisson distribution and a call of algo.glrnb is equivalent to a call to algo.glrpois. If alpha=NULL the parameter is calculated as part of the in-control estimation. However, the parameter is estimated only once from the first fit. Subsequent fittings are only for the parameters of the linear predictor with alpha fixed.

c.ARL

threshold in the GLR test, i.e. c_\gamma

Mtilde

number of observations needed before we have a full rank design; the typical setup for the "intercept" and "epi" charts is Mtilde=1

M

number of time instances back in time in the window-limited approach, i.e. the last value considered is \max\{1, n-M\}. To always look back until the first observation use M=-1.

change

a string specifying the type of the alternative. Currently the two choices are intercept and epi. See the SFB Discussion Paper 500 for details.

theta

if NULL then the GLR scheme is used. If not NULL, the prespecified value for \kappa or \lambda is used in a recursive LR scheme, which is faster.

dir

a string specifying the direction of testing in the GLR scheme. With "inc" only increases in x are considered in the GLR statistic, with "dec" decreases are regarded.

ret

a string specifying the type of upperbound statistic that is returned. With "cases" the number of cases that would have been necessary to produce an alarm is returned, with "value" the GLR statistic (see below).

xMax

Maximum value to try for x to see if this is the upperbound number of cases before sounding an alarm (Default: 1e4). This only applies for the GLR using the NegBin when ret="cases" – see details.

Details

This function implements the seasonal count data chart based on generalized likelihood ratio (GLR) as described in the Höhle and Paul (2008) paper. A moving-window generalized likelihood ratio detector is used, i.e. the detector has the form

N = \inf\left\{ n : \max_{1 \leq k \leq n} \left[ \sum_{t=k}^{n} \log \left\{ \frac{f_{\theta_1}(x_t|z_t)}{f_{\theta_0}(x_t|z_t)} \right\} \right] \geq c_\gamma \right\}

where instead of 1 \leq k \leq n the GLR statistic is computed for all k \in \{n-M, \ldots, n-\tilde{M}+1\}. To achieve the typical behaviour from 1 \leq k \leq n use Mtilde=1 and M=-1.

So N is the time point where the GLR statistic is above the threshold for the first time: an alarm is given and the surveillance is reset starting from time N+1. Note that the same c.ARL as before is used, but if mu0 is different at N+1, N+2, \ldots compared to time 1, 2, \ldots the run length properties differ. Because the c.ARL to obtain a specific ARL can only be obtained by Monte Carlo simulation, there is no good way to update c.ARL automatically at the moment. Also, FIR GLR-detectors might be worth considering.

In case is.null(theta) and alpha>0 as well as ret="cases", a brute-force search is conducted for each time point in range in order to determine the number of cases necessary before an alarm is sounded. In case no alarm was sounded so far by time t, the function increases x[t] until an alarm is sounded any time before time point t. If no alarm is sounded by xMax, a return value of 1e99 is given. Similarly, if an alarm was sounded by time t the function counts down instead. Note: This is slow experimental code!

At the moment, window-limited "intercept" charts have not been extensively tested and are currently not supported. As speed is not an issue here, this is not a major restriction. Therefore, a value of M=-1 is always used in the intercept charts.
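A simplified sketch of the window-limited GLR statistic for a Poisson intercept shift (change="intercept", dir="inc"); plain R, not the package's optimized implementation:

glr_poisson <- function(x, mu0, n, M = -1, Mtilde = 1) {
  kmin <- if (M < 0) 1 else max(1, n - M)
  stat <- sapply(kmin:(n - Mtilde + 1), function(k) {
    idx   <- k:n
    kappa <- max(0, log(sum(x[idx]) / sum(mu0[idx])))  # MLE of the shift, "inc"
    sum(kappa * x[idx] - (exp(kappa) - 1) * mu0[idx])  # log-LR at the MLE
  })
  max(stat)                  # alarm at time n if this exceeds c.ARL
}
set.seed(1)
x <- rpois(60, lambda = c(rep(5, 50), rep(8, 10)))     # shift after t = 50
glr_poisson(x, mu0 = rep(5, 60), n = 60)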

Value

algo.glrpois simply calls algo.glrnb with control$alpha set to 0.

algo.glrnb returns a list of class survRes (surveillance result), which includes the alarm value for recognizing an outbreak (1 for alarm, 0 for no alarm), the threshold value for recognizing the alarm and the input object of class disProg. The upperbound slot of the object is filled with the current GLR(n) value or with the number of cases that are necessary to produce an alarm at any time point \leq n. Both lead to the same alarm timepoints, but "cases" has an obvious interpretation.

Author(s)

M. Höhle with contributions by V. Wimmer

References

Höhle, M. and Paul, M. (2008): Count data regression charts for the monitoring of surveillance time series. Computational Statistics and Data Analysis, 52 (9), 4357-4368.

Salmon, M., Schumacher, D. and Höhle, M. (2016): Monitoring count time series in R: Aberration detection in public health surveillance. Journal of Statistical Software, 70 (10), 1-35. doi:10.18637/jss.v070.i10

Examples

##Simulate data and apply the algorithm
S <- 1 ; t <- 1:120 ; m <- length(t)
beta <- c(1.5,0.6,0.6)
omega <- 2*pi/52
#log mu_{0,t}
base <- beta[1] + beta[2] * cos(omega*t) + beta[3] * sin(omega*t)
#Generate example data with changepoint and tau=tau
tau <- 100
kappa <- 0.4
mu0 <- exp(base)
mu1 <- exp(base  + kappa)


## Poisson example
#Generate data
set.seed(42)
x <- rpois(length(t),mu0*(exp(kappa)^(t>=tau)))
s.ts <- sts(observed=x, state=(t>=tau))
#Plot the data
plot(s.ts, xaxis.labelFormat=NULL)
#Run
cntrl = list(range=t,c.ARL=5, Mtilde=1, mu0=mu0,
             change="intercept",ret="value",dir="inc")
glr.ts <- glrpois(s.ts,control=cntrl)
plot(glr.ts, xaxis.labelFormat=NULL, dx.upperbound=0.5)
lr.ts  <- glrpois(s.ts,control=c(cntrl,theta=0.4))
plot(lr.ts, xaxis.labelFormat=NULL, dx.upperbound=0.5)

#using the legacy interface for "disProg" data
lr.ts0  <- algo.glrpois(sts2disProg(s.ts), control=c(cntrl,theta=0.4))
stopifnot(upperbound(lr.ts) == lr.ts0$upperbound)


## NegBin example
#Generate data
set.seed(42)
alpha <- 0.2
x <- rnbinom(length(t),mu=mu0*(exp(kappa)^(t>=tau)),size=1/alpha)
s.ts <- sts(observed=x, state=(t>=tau))

#Plot the data
plot(s.ts, xaxis.labelFormat=NULL)

#Run GLR based detection
cntrl = list(range=t,c.ARL=5, Mtilde=1, mu0=mu0, alpha=alpha,
             change="intercept",ret="value",dir="inc")
glr.ts <- glrnb(s.ts, control=cntrl)
plot(glr.ts, xaxis.labelFormat=NULL, dx.upperbound=0.5)

#CUSUM LR detection with backcalculated number of cases
cntrl2 = list(range=t,c.ARL=5, Mtilde=1, mu0=mu0, alpha=alpha,
              change="intercept",ret="cases",dir="inc",theta=1.2)
glr.ts2 <- glrnb(s.ts, control=cntrl2)
plot(glr.ts2, xaxis.labelFormat=NULL)

Hidden Markov Model (HMM) method

Description

This function implements on-line HMM detection of outbreaks based on the retrospective procedure described in Le Strat and Carrat (1999). Using the function msm (from package msm) a specified HMM is estimated, the decoding problem, i.e. the most probable state configuration, is found by the Viterbi algorithm and the most probable state of the last observation is recorded. On-line detection is performed by sequentially repeating this procedure.

Warning: This function can be very slow - a more efficient implementation would be nice!

Usage

algo.hmm(disProgObj, control = list(range=range, Mtilde=-1, 
           noStates=2, trend=TRUE, noHarmonics=1,
           covEffectEqual=FALSE, saveHMMs = FALSE, extraMSMargs=list()))

Arguments

disProgObj

object of class disProg (including the observed and the state chain)

control

control object:

range

determines the desired time points which should be evaluated. Note that, in contrast to other surveillance methods, an initial parameter estimation occurs in the HMM, so range should be high enough to allow for enough reference values for estimating the HMM

Mtilde

number of observations back in time to use for fitting the HMM (including the current observation). Reasonable values are a multiple of disProgObj$freq; the default is Mtilde=-1, which means to use all possible values – for long series this might take a very long time!

noStates

number of hidden states in the HMM – the typical choice is 2. The initial rates are set such that the noStates-th state is the one having the highest rate; in other words, this state is considered the outbreak state.

trend

Boolean stating whether a linear time trend exists, i.e. if TRUE (default) then \beta_j \neq 0

noHarmonics

number of harmonic waves to include in the linear predictor. Default is 1.

covEffectEqual

see details

saveHMMs

Boolean, if TRUE then the fitted HMMs are saved. With this option the function can also be used to analyse data retrospectively. Default option is FALSE

extraMSMargs

A named list with additional arguments to send to the msm HMM fitting function. Note that the msm arguments formula, data, qmatrix, hmodel, hcovariates and hconstraint are automatically filled by algo.hmm, thus these should NOT be modified.

Details

For each time point t the reference values are extracted. If the number of requested values is larger than the number of possible values, the latter is used. The following then happens on these reference values:

A noStates-State Hidden Markov Model (HMM) is used based on the Poisson distribution with linear predictor on the log-link scale. I.e.

Y_t | X_t = j \sim Po(\mu_t^j),

where

\log(\mu_t^j) = \alpha_j + \beta_j t + \sum_{i=1}^{nH} \left( \gamma_j^i \cos(2i\pi/freq \cdot (t-1)) + \delta_j^i \sin(2i\pi/freq \cdot (t-1)) \right)

with nH = noHarmonics and freq = 12 or 52, depending on the sampling frequency of the surveillance data. In the above, t-1 is used because the first week is always saved as t=1, i.e. we want to ensure that the first observation corresponds to cos(0) and sin(0).

If covEffectEqual then all covariate effect parameters are equal for the states, i.e. \beta_j = \beta, \gamma_j^i = \gamma^i, \delta_j^i = \delta^i for all j = 1, ..., noStates.

In case more complicated HMM models are to be fitted it is possible to modify the msm code used in this function. Using e.g. AIC one can select between different models (see the msm package for further details).

Using the Viterbi algorithm the most probable state configuration is obtained for the reference values, and if the most probable configuration for the last reference value (i.e. time t) equals control$noStates then an alarm is given.

Note: The HMM is re-fitted from scratch every time, sequential updating schemes of the HMM would increase speed considerably! A major advantage of the approach is that outbreaks in the reference values are handled automatically.
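For concreteness, the harmonic covariates entering the linear predictor could be constructed as follows (noHarmonics = 1, weekly data; a sketch only, algo.hmm builds them internally):

t <- 1:104; freq <- 52
X <- cbind(cos = cos(2 * pi / freq * (t - 1)),
           sin = sin(2 * pi / freq * (t - 1)))
head(X, 2)   # t = 1 gives cos(0) = 1 and sin(0) = 0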

Value

algo.hmm gives a list of class survRes which includes the vector of alarm values for every timepoint in range. No upperbound can be specified; it is set to zero.

The resulting object contains a list control$hmms, which contains the "msm" objects with the fitted HMMs (if saveHMMs=TRUE).

Author(s)

M. Höhle

References

Y. Le Strat and F. Carrat, Monitoring Epidemiologic Surveillance Data using Hidden Markov Models (1999), Statistics in Medicine, 18, 3463–3478

I.L. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-valued Time Series, (1997), Chapman & Hall, Monographs on Statistics and applied Probability 70

See Also

msm

Examples

#Simulate outbreak data from HMM
set.seed(123)
counts <- sim.pointSource(p = 0.98, r = 0.8, length = 3*52,
                              A = 1, alpha = 1, beta = 0, phi = 0,
                              frequency = 1, state = NULL, K = 1.5)

## Not run: 
#Do surveillance using a two state HMM without trend component and
#the effect of the harmonics being the same in both states. A sliding
#window of two years is used to fit the HMM
surv <- algo.hmm(counts, control=list(range=(2*52):length(counts$observed),
                                   Mtilde=2*52,noStates=2,trend=FALSE,
                                   covEffectEqual=TRUE,extraMSMargs=list()))
plot(surv,legend.opts=list(x="topright"))

## End(Not run)

if (require("msm")) {
#Retrospective use of the function, i.e. monitor only the last time point
#but use option saveHMMs to store the output of the HMM fitting
surv <- algo.hmm(counts,control=list(range=length(counts$observed),Mtilde=-1,noStates=2,
                          trend=FALSE,covEffectEqual=TRUE, saveHMMs=TRUE))

#Compute most probable state using the Viterbi algorithm - 1 is "normal", 2 is "outbreak".
viterbi.msm(surv$control$hmms[[1]])$fitted

#How often correct?
tab <- cbind(truth=counts$state + 1 ,
             hmm=viterbi.msm(surv$control$hmms[[1]])$fitted)
table(tab[,1],tab[,2])
}

Semiparametric surveillance of outbreaks

Description

The Frisén and Andersson (2009) method for semiparametric surveillance of outbreaks.

Usage

algo.outbreakP(disProgObj, control = list(range = range, k=100,
               ret=c("cases","value"),maxUpperboundCases=1e5))

Arguments

disProgObj

object of class disProg (including the observed and the state chain).

control

A list controlling the behaviour of the algorithm

range

determines the desired time-points which should be monitored. Note that it is automatically assumed that ALL other values in disProgObj can be used for the estimation, i.e. for a specific value i in range all values from 1 to i are used for estimation.

k

The threshold value. Once the outbreak statistic is above this threshold k an alarm is sounded.

ret

a string specifying the type of upperbound statistic that is returned. With "cases" the number of cases that would have been necessary to produce an alarm (NNBA) is returned, with "value" the outbreakP statistic (see below).

maxUpperboundCases

Upperbound when numerically searching for NNBA. Default is 1e5.

Details

A generalized likelihood ratio test based on the Poisson distribution is implemented where the means of the in-control and out-of-control states are computed by isotonic regression.

OutbreakP(s) = \prod_{t=1}^{s} \left( \frac{\hat{\mu}^{C1}(t)}{\hat{\mu}^{D}(t)} \right)^{x(t)}

where \hat{\mu}^{C1}(t) is the estimated mean obtained by uni-modal regression under the assumption of one change-point, and \hat{\mu}^{D}(t) is the estimated result when there is no change-point (i.e. just the mean of all observations). Note that the contrasted hypotheses assume all means are equal until the change-point, i.e. this detection method is especially suited for detecting a shift from a relatively constant mean. Hence, it is less suited for detection in diseases with a strong seasonal endemic component; onset of influenza detection is an example where this method works particularly well.

In case control$ret == "cases" then a brute force numerical search for the number needed before alarm (NNBA) is performed. That is, given the past observations, what's the minimum number which would have caused an alarm? Note: Computing this might take a while because the search is done by sequentially increasing/decreasing the last observation by one for each time point in control$range and then calling the workhorse function of the algorithm again. The argument control$maxUpperboundCases controls the upper limit of this search (default is 1e5). Currently, even though the statistic has passed the threshold, the NNBA is still computed. After a few time instances what typically happens is that no matter the observed value we would have an alarm at this time point. In this case the value of NNBA is set to NA. Furthermore, the first time point is always NA, unless k<1.

Value

algo.outbreakP gives a list of class survRes which includes the vector of alarm values for every time-point in range, the vector of threshold values for every time-point in range.

Author(s)

M. Höhle – based on Java code by M. Frisen and L. Schiöler

Source

The code is an extended R port of the Java code by Marianne Frisén and Linus Schiöler from the Computer Assisted Search For Epidemics (CASE) project, formerly available from https://case.folkhalsomyndigheten.se/ under the GNU GPL License v3.

An additional feature of the R code is that it contains a search for NNBA (see details).

References

Frisén, M., Andersson, E. and Schiöler, L. (2009): Robust outbreak surveillance of epidemics in Sweden. Statistics in Medicine, 28(3):476-493.

Frisén, M. and Andersson, E., (2009) Semiparametric Surveillance of Monotonic Changes, Sequential Analysis 28(4):434-454.

Examples

#Use data from outbreakP manual (http://www.hgu.gu.se/item.aspx?id=16857)
y <- matrix(c(1,0,3,1,2,3,5,4,7,3,5,8,16,23,33,34,48),ncol=1)

#Generate sts object with these observations
mysts <- sts(y, alarm=y*0)

#Run the algorithm and present results
#Only the value of outbreakP statistic
upperbound(outbreakP(mysts, control=list(range=1:length(y),k=100,
           ret="value")))

#Graphical illustration with number-needed-before-alarm (NNBA) upperbound.
res <- outbreakP(mysts, control=list(range=1:length(y),k=100,
           ret="cases"))
plot(res,dx.upperbound=0,lwd=c(1,1,3),legend.opts=list(legend=c("Infected",
      "NNBA","Outbreak","Alarm"),horiz=TRUE))

Computation of Quality Values for a Surveillance System Result

Description

Computation of the quality values for a surveillance system output.

Usage

algo.quality(sts, penalty = 20)

Arguments

sts

object of class survRes or sts, which includes the state chain and the computed alarm chain

penalty

the maximal penalty for the lag

Details

The lag is defined as follows: in the state chain, only the beginnings of outbreak chains (outbreaks directly following each other) are considered. In the alarm chain, the range from the beginning of an outbreak until min(next outbreak beginning, penalty) timepoints is considered. The penalty timepoints were chosen to provide an upper bound on the penalty for not discovering an outbreak. The difference between the first alarm raised by the system and the defined beginning is denoted "the lag"; outbreaks found by the system are not punished further. At the end, the mean of the lags over all outbreak chains is returned as the summary lag.
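
A toy illustration of the lag (a sketch under simplified assumptions, ignoring the capping by penalty): with one outbreak chain beginning at t = 4 and the first alarm at t = 6, the lag is 2 timepoints.

state <- c(0, 0, 0, 1, 1, 0, 0, 0)   # state chain: outbreak begins at t = 4
alarm <- c(0, 0, 0, 0, 0, 1, 0, 0)   # alarm chain: first alarm at t = 6
onset <- which(diff(c(0, state)) == 1)                               # t = 4
firstAlarm <- onset - 1 + which(alarm[onset:length(alarm)] == 1)[1]  # t = 6
lag <- firstAlarm - onset                                            # 2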

Value

an object of class "algoQV", which is a list of quality values:

TP

Number of correctly identified outbreaks (true positives).

FP

Number of falsely identified outbreaks (false positives).

TN

Number of correctly identified non-outbreaks (true negatives).

FN

Number of falsely identified non-outbreaks (false negatives).

sens

True positive rate, meaning TP/(FN + TP).

spec

True negative rate, meaning TN/(TN + FP).

dist

Euclidean distance between (1-spec, sens) and (0,1).

lag

Mean lag with which outbreaks are recognized by the system.

See Also

algo.compare

Examples

# Create a test object
disProgObj <- sim.pointSource(p = 0.99, r = 0.5, length = 200, A = 1,
                              alpha = 1, beta = 0, phi = 0,
                              frequency = 1, state = NULL, K = 1.7)

# Let this object be tested from rki1
survResObj <- algo.rki1(disProgObj, control = list(range = 50:200))

# Compute the list of quality values
quality <- algo.quality(survResObj)
quality # the list is printed in matrix form


# Format as an "xtable", which is printed with LaTeX markup (by default)
library("xtable")
xtable(quality)

The system used at the RKI

Description

Evaluation of timepoints with the detection algorithms used by the RKI

Usage

algo.rkiLatestTimepoint(disProgObj, timePoint = NULL,
                        control = list(b = 2, w = 4, actY = FALSE))
algo.rki(disProgObj, control = list(range = range,
         b = 2, w = 4, actY = FALSE))
algo.rki1(disProgObj, control = list(range = range))
algo.rki2(disProgObj, control = list(range = range))
algo.rki3(disProgObj, control = list(range = range))

Arguments

disProgObj

object of class disProg (including the observed and the state chain).

timePoint

time point which should be evaluated in algo.rkiLatestTimepoint. The default is to use the latest timepoint.

control

control object: range determines the desired timepoints which should be evaluated, b describes the number of years to go back for the reference values, w is the half-window width for the reference values around the corresponding timepoint, and actY is a boolean deciding whether the year of timePoint itself should also contribute w reference values from the recent past. By default, b, w and actY are set for the RKI 3 system.

Details

Using the reference values for calculating an upper limit (threshold), alarm is given if the actual value is bigger than a computed threshold. algo.rki calls algo.rkiLatestTimepoint for the values specified in range and for the system specified in control. algo.rki1 calls algo.rkiLatestTimepoint for the values specified in range for the RKI 1 system. algo.rki2 calls algo.rkiLatestTimepoint for the values specified in range for the RKI 2 system. algo.rki3 calls algo.rkiLatestTimepoint for the values specified in range for the RKI 3 system.

  • "RKI 1" reference values from 6 weeks ago

  • "RKI 2" reference values from 6 weeks ago and 13 weeks of the year ago (symmetrical around the comparable week).

  • "RKI 3" 18 reference values. 9 from the year ago and 9 from two years ago (also symmetrical around the comparable week).

Value

algo.rkiLatestTimepoint returns a list of class survRes (surveillance result), which includes the alarm value (alarm = 1, no alarm = 0) for recognizing an outbreak, the threshold value for recognizing the alarm and the input object of class disProg.

algo.rki gives a list of class survRes which includes the vector of alarm values for every timepoint in range, the vector of threshold values for every timepoint in range for the system specified by b, w and actY, the range and the input object of class disProg. algo.rki1 returns the same for the RKI 1 system, algo.rki2 for the RKI 2 system and algo.rki3 for the RKI 3 system.

Author(s)

M. Höhle, A. Riebler, Christian Lang

See Also

algo.bayesLatestTimepoint and algo.bayes for the Bayes system.

Examples

# Create a test object
disProgObj <- sim.pointSource(p = 0.99, r = 0.5, length = 208, A = 1,
                              alpha = 1, beta = 0, phi = 0,
                              frequency = 1, state = NULL, K = 1.7)

# Test week 200 to 208 for outbreaks with a selfdefined rki
algo.rki(disProgObj, control = list(range = 200:208, b = 1,
                                    w = 5, actY = TRUE))
# The same for rki 1 to rki 3
algo.rki1(disProgObj, control = list(range = 200:208))
algo.rki2(disProgObj, control = list(range = 200:208))
algo.rki3(disProgObj, control = list(range = 200:208))

# Test for rki 1 the latest timepoint
algo.rkiLatestTimepoint(disProgObj)

Modified CUSUM method as proposed by Rogerson and Yamada (2004)

Description

Modified Poisson CUSUM method that allows for a time-varying in-control parameter \theta_{0,t} as proposed by Rogerson and Yamada (2004). The same approach can be applied to binomial data if distribution="binomial" is specified.

Usage

algo.rogerson(disProgObj, control = list(range = range,
   theta0t = NULL, ARL0 = NULL, s = NULL, hValues = NULL,
   distribution = c("poisson","binomial"), nt = NULL, FIR=FALSE, 
   limit = NULL, digits = 1))

Arguments

disProgObj

object of class disProg that includes a matrix with the observed number of counts

control

list with elements

range

vector of indices in the observed matrix of disProgObj to monitor

theta0t

matrix with in-control parameter, must be specified

ARL0

desired average run length \gamma

s

change to detect, see findH for further details

hValues

matrix with decision intervals h for a sequence of values \theta_{0,t} (in the range of theta0t)

distribution

"poisson" or "binomial"

nt

optional matrix with varying sample sizes for the binomial CUSUM

FIR

a FIR CUSUM with head start h/2 is applied to the data if TRUE, otherwise no head start is used; see details

limit

numeric that determines the procedure after an alarm is given, see details

digits

the reference value and decision interval are rounded to digits decimal places. Defaults to 1 and should correspond to the number of digits used to compute hValues

Details

The CUSUM for a sequence of Poisson or binomial variates x_t is computed as

S_t = \max\{0, S_{t-1} + c_t (x_t - k_t)\}, \quad t = 1, 2, \ldots,

where S_0 = 0 and c_t = h/h_t; k_t and h_t are time-varying reference values and decision intervals. An alarm is given at time t if S_t \geq h.

If FIR=TRUE, the CUSUM starts with a head start value S_0 = h/2 at time t = 0. After an alarm is given, the FIR CUSUM starts again at this head start value.

The procedure after the CUSUM gives an alarm can be determined by limit. Suppose that the CUSUM signals at time t, i.e. S_t \geq h. For numeric values of limit, the CUSUM is bounded above after an alarm is given, i.e. S_t is set to \min\{limit \cdot h, S_t\}. Using limit=0 corresponds to resetting S_t to zero after an alarm, as proposed in the original formulation of the CUSUM. If FIR=TRUE, S_t is reset to h/2 (i.e. limit = h/2). If limit=NULL, no resetting occurs after an alarm is given.
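
The recursion above is easy to state in code. The following is a minimal illustrative sketch (not the package implementation; all argument names are hypothetical) of the time-varying CUSUM with optional FIR head start and post-alarm resetting:

cusumTV <- function(x, kt, ht, h, FIR = FALSE, limit = 0) {
  S <- if (FIR) h/2 else 0                 # head start (FIR) or zero-start
  alarm <- logical(length(x))
  for (t in seq_along(x)) {
    S <- max(0, S + (h / ht[t]) * (x[t] - kt[t]))           # c_t = h/h_t
    alarm[t] <- S >= h
    if (alarm[t]) S <- if (FIR) h/2 else min(limit * h, S)  # limit=0: reset to zero
  }
  alarm
}

For instance, cusumTV(rpois(10, 2), kt = rep(2.5, 10), ht = rep(4, 10), h = 4) returns the alarm indicators for a constant in-control setting.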

Value

Returns an object of class survRes with elements

alarm

indicates whether the CUSUM signaled at time t or not (1 = alarm, 0 = no alarm)

upperbound

CUSUM values S_t

disProgObj

disProg object

control

list with the alarm threshold h and the specified control object

Note

algo.rogerson is a univariate CUSUM method. If the data are available in several regions (i.e. observed is a matrix), multiple univariate CUSUMs are applied to each region.

References

Rogerson, P. A. and Yamada, I. Approaches to Syndromic Surveillance When Data Consist of Small Regional Counts. Morbidity and Mortality Weekly Report, 2004, 53/Supplement, 79-85

See Also

hValues

Examples

# simulate data (seasonal Poisson)
set.seed(123)
t <- 1:300
lambda <- exp(-0.5 + 0.4 * sin(2*pi*t/52) + 0.6 * cos(2*pi*t/52))
data <- sts(observed = rpois(length(lambda), lambda))

# determine a matrix with h values
hVals <- hValues(theta0 = 10:150/100, ARL0=500, s = 1, distr = "poisson")

# convert to legacy "disProg" class and apply modified Poisson CUSUM
disProgObj <- sts2disProg(data)
res <- algo.rogerson(disProgObj, control=c(hVals, list(theta0t=lambda, range=1:300)))
plot(res, xaxis.years = FALSE)

Summary Table Generation for Several Disease Chains

Description

Summary table generation for several disease chains.

Usage

algo.summary(compMatrices)

Arguments

compMatrices

list of matrices constructed by algo.compare.

Details

The mean of all individual lags is returned as the lag. TP, FN, TN and FP values are summed up. dist, sens and spec are recomputed on the basis of the summed TP, FN, TN and FP values.
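
For example, the recomputation from the summed counts works as follows (toy numbers):

TP <- 3 + 5; FN <- 1 + 0; TN <- 180 + 190; FP <- 2 + 4  # summed over input matrices
sens <- TP / (TP + FN)                       # true positive rate
spec <- TN / (TN + FP)                       # true negative rate
dist <- sqrt((1 - spec)^2 + (1 - sens)^2)    # distance of (1-spec, sens) to (0,1)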

Value

a matrix summing up the individual input matrices

See Also

algo.compare, algo.quality

Examples

# Create a test object
disProgObj1 <- sim.pointSource(p = 0.99, r = 0.5, length = 400,
                               A = 1, alpha = 1, beta = 0, phi = 0,
                               frequency = 1, state = NULL, K = 1.7)
disProgObj2 <- sim.pointSource(p = 0.99, r = 0.5, length = 400,
                               A = 1, alpha = 1, beta = 0, phi = 0,
                               frequency = 1, state = NULL, K = 5)
disProgObj3 <- sim.pointSource(p = 0.99, r = 0.5, length = 400,
                               A = 1, alpha = 1, beta = 0, phi = 0,
                               frequency = 1, state = NULL, K = 17)

# Let this object be tested from any methods in range = 200:400
range <- 200:400
control <- list(list(funcName = "rki1", range = range),
                list(funcName = "rki2", range = range),
                list(funcName = "rki3", range = range))

compMatrix1 <- algo.compare(algo.call(disProgObj1, control=control))
compMatrix2 <- algo.compare(algo.call(disProgObj2, control=control))
compMatrix3 <- algo.compare(algo.call(disProgObj3, control=control))

algo.summary( list(a=compMatrix1, b=compMatrix2, c=compMatrix3) )

Test if Two Model Fits are (Nearly) Equal

Description

Two model fits are compared using standard all.equal-methods after discarding certain elements considered irrelevant for the equality of the fits, e.g., the runtime and the call.

Usage

## S3 method for class 'twinstim'
all.equal(target, current, ..., ignore = NULL)

## S3 method for class 'hhh4'
all.equal(target, current, ..., ignore = NULL)

Arguments

target, current

the model fits to be compared.

...

further arguments for standard all.equal-methods, e.g., the numerical tolerance.

ignore

an optional character vector of elements to ignore when comparing the two fitted objects. The following elements are always ignored: "runtime" and "call".

Value

Either TRUE or a character vector describing differences between the target and the current model fit.

Author(s)

Sebastian Meyer
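
Examples

## a minimal sketch (assuming the "measlesWeserEms" data shipped with the
## package): two identical hhh4 fits compare as equal, since "runtime"
## and "call" are ignored
data("measlesWeserEms")
fit1 <- hhh4(measlesWeserEms)
fit2 <- hhh4(measlesWeserEms)
all.equal(fit1, fit2)  # TRUE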


Generic animation of spatio-temporal objects

Description

Generic function for animation of R objects.

Usage

animate(object, ...)

Arguments

object

The object to animate.

...

Arguments to be passed to methods, such as graphical parameters or time interval options for the snapshots.

See Also

The methods animate.epidata, animate.epidataCS, and animate.sts for the animation of surveillance data.


Compute Anscombe Residuals

Description

Compute Anscombe residuals from a fitted glm; these are approximately standard normally distributed.

Usage

anscombe.residuals(m, phi)

Arguments

m

a fitted "glm"

phi

the current estimated overdispersion

Value

The standardized Anscombe residuals of m

References

McCullagh, P. and Nelder, J. A. (1989), Generalized Linear Models, 2nd edition, Chapman and Hall.
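
Examples

## a minimal usage sketch with a simple Poisson GLM (phi = 1 assumed,
## i.e. no overdispersion)
set.seed(1)
m <- glm(y ~ 1, family = poisson, data = data.frame(y = rpois(100, 5)))
r <- anscombe.residuals(m, phi = 1)
summary(r)  # approximately standard normal if the model is adequate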


Calculation of Average Run Length for discrete CUSUM schemes

Description

Calculates the average run length (ARL) for an upward CUSUM scheme for discrete distributions (i.e. Poisson and binomial) using the Markov chain approach.

Usage

arlCusum(h=10, k=3, theta=2.4, distr=c("poisson","binomial"),
         W=NULL, digits=1, ...)

Arguments

h

decision interval

k

reference value

theta

distribution parameter for the cumulative distribution function (cdf) F, i.e. rate \lambda for Poisson variates or probability p for binomial variates

distr

"poisson" or "binomial"

W

Winsorizing value W for a robust CUSUM; to get a nonrobust CUSUM, set W > k+h. If NULL, a nonrobust CUSUM is used.

digits

k and h are rounded to digits decimal places

...

further arguments for the distribution function, i.e. number of trials n for binomial cdf

Value

Returns a list with the ARL of the regular (zero-start) and the fast initial response (FIR) CUSUM scheme with reference value k and decision interval h for X \sim F(\theta), where F is the Poisson or binomial CDF.

ARL

one-sided ARL of the regular (zero-start) CUSUM scheme

FIR.ARL

one-sided ARL of the FIR CUSUM scheme with head start h/2

Source

Based on the FORTRAN code of

Hawkins, D. M. (1992). Evaluation of Average Run Lengths of Cumulative Sum Charts for an Arbitrary Data Distribution. Communications in Statistics - Simulation and Computation, 21(4), p. 1001-1020.
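
Examples

## illustrative calls (parameter values are arbitrary)
arlCusum(h = 10, k = 3, theta = 2.4, distr = "poisson")
## binomial variant; the number of trials n is passed via '...'
arlCusum(h = 10, k = 3, theta = 0.4, distr = "binomial", n = 10)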


Non-parametric back-projection of incidence cases to exposure cases using a known incubation time as in Becker et al (1991)

Description

The function is an implementation of the non-parametric back-projection of incidence cases to exposure cases described in Becker et al. (1991). The method back-projects exposure times from a univariate time series containing the number of symptom onsets per time unit. Here, the delay between exposure and symptom onset for an individual is seen as a realization of a random variable governed by a known probability mass function. The back-projection function calculates the expected number of exposures \lambda_t for each time unit under the assumption of a Poisson distribution, but without any parametric assumption on how the \lambda_t evolve in time.

Furthermore, the function contains a bootstrap based procedure, as given in Yip et al (2011), which allows an indication of uncertainty in the estimated \lambda_t. The procedure is equivalent to the suggestion in Becker and Marschner (1993). However, the present implementation in backprojNP allows only a univariate time series, i.e. simultaneous age groups as in Becker and Marschner (1993) are not possible.

The method in Becker et al. (1991) was originally developed for the back-projection of AIDS incidence, but it is equally useful for analysing the epidemic curve in outbreak situations of a disease with long incubation time, e.g. in order to qualitatively investigate the effect of intervention measures.

Usage

backprojNP(sts, incu.pmf, 
   control = list(k = 2, 
                  eps = rep(0.005,2), 
                  iter.max=rep(250,2), 
                  Tmark = nrow(sts), 
                  B = -1, 
                  alpha = 0.05, 
                  verbose = FALSE, 
                  lambda0 = NULL,
                  eq3a.method = c("R","C"),
                  hookFun = function(stsbp) {}),
     ...)

Arguments

sts

an object of class "sts" (or one that can be coerced to that class): contains the observed number of symptom onsets as a time series.

incu.pmf

Probability mass function (PMF) of the incubation time. The PMF is specified as a vector or matrix with the values of the PMF evaluated at 0, ..., d_max, i.e. note that the support includes zero. The value of d_max is automatically calculated as length(incu.pmf)-1 or nrow(incu.pmf)-1. Note that if the sts object has more than one column, then for the back-projection the incubation time is either recycled for all components or, if it is a matrix with the same number of columns as the sts object, the k'th column of incu.pmf is used for the back-projection of the k'th series.

control

A list with named arguments controlling the functionality of the non-parametric back-projection.

k

An integer representing the smoothing parameter to use in the smoothing step of the EMS algorithm. Needs to be an even number.

eps

A vector of length two representing the convergence threshold \epsilon of the EMS algorithm, see Details for further information. The first value is the threshold to use in the k=0 loop, which forms the values for the parametric bootstrap. The second value is the threshold to use in the actual fit and bootstrap fitting using the specified k. If eps is only of length one, then this number is replicated twice.

Tmark

Numeric with T' \leq T. Upper time limit on which to base convergence, i.e. only the values \lambda_1, \ldots, \lambda_{T'} are monitored for convergence. See details.

iter.max

The maximum number of EM iterations to do before stopping.

B

Number of parametric bootstrap samples to perform from an initial k=0 fit. For each sample a back projection is performed. See Becker and Marschner (1993) for details.

alpha

(1-\alpha)\cdot 100\% confidence intervals are computed based on the percentile method.

verbose

(boolean). If true show extra progress and debug information.

lambda0

Start values for lambda. Vector needs to be of the length nrow(sts).

eq3a.method

A single character, either "R" or "C", specifying whether the three nested loops of equation 3a in Becker et al. (1991) are to be executed as safe R code (which can be extremely slow; the implementation is not optimized for speed) or as C code (which can be more than 200 times faster!). However, the C implementation is experimental and can hang R if, e.g., the time series does not go far enough back.

hookFun

Hook function called for each iteration of the EM algorithm. The function should take a single argument stsbp of class "stsBP". Its lambda slot will be set to the current value of lambda. If no action is desired, just leave the function body empty (the default). Additional arguments are possible.

...

Additional arguments are sent to the hook function.

Details

Becker et al. (1991) specify a non-parametric back-projection algorithm based on the Expectation-Maximization-Smoothing (EMS) algorithm.

In the present implementation the algorithm iterates until

\frac{||\lambda^{(k+1)} - \lambda^{(k)}||}{||\lambda^{(k)}||} < \epsilon

This is a slight adaptation of the proposals in Becker et al. (1991). If T is the length of \lambda, then one can avoid instability of the algorithm near the end by considering only the \lambda's with index 1, \ldots, T'.

See the references for further information.
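
As a sketch of this stopping rule (assuming the Euclidean norm), convergence on the first Tmark values of lambda could be checked as:

converged <- function(lambda.new, lambda.old, Tmark, eps) {
  idx <- seq_len(Tmark)                    # monitor lambda_1, ..., lambda_T'
  sqrt(sum((lambda.new[idx] - lambda.old[idx])^2)) /
    sqrt(sum(lambda.old[idx]^2)) < eps
}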

Value

backprojNP returns an object of class "stsBP".

Note

The method is still experimental. A proper plot routine for stsBP objects is currently missing.

Author(s)

Michael Höhle with help by Daniel Sabanés Bové and Sebastian Meyer for eq3a.method = "C"

References

Becker NG, Watson LF and Carlin JB (1991), A method for non-parametric back-projection and its application to AIDS data, Statistics in Medicine, 10:1527-1542.

Becker NG and Marschner IC (1993), A method for estimating the age-specific relative risk of HIV infection from AIDS incidence data, Biometrika, 80(1):165-178.

Yip PSF, Lam KF, Xu Y, Chau PH, Xu J, Chang W, Peng Y, Liu Z, Xie X and Lau HY (2011), Reconstruction of the Infection Curve for SARS Epidemic in Beijing, China Using a Back-Projection Method, Communications in Statistics - Simulation and Computation, 37(2):425-433.

Werber D, King LA, Müller L, Follin P, Buchholz U, Bernard H, Rosner BM, Ethelberg S, de Valk H, Höhle M (2013), Associations of Age and Sex on Clinical Outcome and Incubation Period of Shiga toxin-producing Escherichia coli O104:H4 Infections, 2011, American Journal of Epidemiology, 178(6):984-992.

Examples

#Generate an artificial outbreak of size n starting at time t0 and of length l
n <- 1e3 ; t0 <- 23 ; l <- 10

#PMF of the incubation time is an interval censored gamma distribution
#with mean 15 truncated at 25.
dmax <- 25
inc.pmf <- c(0,(pgamma(1:dmax,15,1.4) - pgamma(0:(dmax-1),15,1.4))/pgamma(dmax,15,1.4))
#Function to sample from the incubation time
rincu <- function(n) {
  sample(0:dmax, size=n, replace=TRUE, prob=inc.pmf)
}
#Sample time of exposure and length of incubation time
set.seed(123)
exposureTimes <- t0 + sample(x=0:(l-1),size=n,replace=TRUE)
symptomTimes <- exposureTimes + rincu(n)

#Time series of exposure (truth) and symptom onset (observed)
X <- table( factor(exposureTimes,levels=1:(max(symptomTimes)+dmax)))
Y <- table( factor(symptomTimes,levels=1:(max(symptomTimes)+dmax)))
#Convert Y to an sts object
Ysts <- sts(Y)

#Plot the outbreak
plot(Ysts, xaxis.labelFormat=NULL, legend=NULL)
#Add true number of exposures to the plot
lines(1:length(Y)+0.2,X,col="red",type="h",lty=2)


#Helper function to show the EM step
plotIt <- function(cur.sts) {
  plot(cur.sts,xaxis.labelFormat=NULL, legend.opts=NULL,ylim=c(0,140))
}

#Call non-parametric back-projection function with hook function but
#without bootstrapped confidence intervals
bpnp.control <- list(k=0,eps=rep(0.005,2),iter.max=rep(250,2),B=-1,hookFun=plotIt,verbose=TRUE)

#Fast C version (use argument: eq3a.method="C")! 
sts.bp <- backprojNP(Ysts, incu.pmf=inc.pmf,
    control=modifyList(bpnp.control,list(eq3a.method="C")), ylim=c(0,max(X,Y)))

#Show result
plot(sts.bp,xaxis.labelFormat=NULL,legend=NULL,lwd=c(1,1,2),lty=c(1,1,1),main="")
lines(1:length(Y)+0.2,X,col="red",type="h",lty=2)

#Do the convolution for the expectation
mu <- matrix(0,ncol=ncol(sts.bp),nrow=nrow(sts.bp))
#Loop over all series
for (j in 1:ncol(sts.bp)) { 
  #Loop over all time points
  for (t in 1:nrow(sts.bp)) {
    #Convolution, note support of inc.pmf starts at zero (move idx by 1)
    i <- seq_len(t)
    mu[t,j] <- sum(inc.pmf[t-i+1] * upperbound(sts.bp)[i,j],na.rm=TRUE)
  }
}
#Show the fit
lines(1:nrow(sts.bp)-0.5,mu[,1],col="green",type="s",lwd=3)

#Non-parametric back-projection including bootstrap CIs
bpnp.control2 <- modifyList(bpnp.control, list(hookFun=NULL, k=2,
  B=10, # in practice, use B >= 1000 !
  eq3a.method="C"))
sts.bp2 <- backprojNP(Ysts, incu.pmf=inc.pmf, control=bpnp.control2)

######################################################################
# Plot the result. This is currently a manual routine.
# ToDo: Need to specify a plot method for stsBP objects which also
#       shows the CI.
#
# Parameters:
#  stsBP - object of class stsBP which is to be plotted.
######################################################################

plot.stsBP <- function(stsBP) {
  maxy <- max(observed(stsBP),upperbound(stsBP),stsBP@ci,na.rm=TRUE)
  plot(upperbound(stsBP),type="n",ylim=c(0,maxy), ylab="Cases",xlab="time")
  if (!all(is.na(stsBP@ci))) {
    polygon( c(1:nrow(stsBP),rev(1:nrow(stsBP))),
             c(stsBP@ci[2,,1],rev(stsBP@ci[1,,1])),col="lightgray")
  }
  lines(upperbound(stsBP),type="l",lwd=2)
  legend(x="topright",c(expression(lambda[t])),lty=c(1),col=c(1),fill=c(NA),border=c(NA),lwd=c(2))

  invisible()
}

#Plot the result of k=0 and add truth for comparison. No CIs available
plot.stsBP(sts.bp)
lines(1:length(Y),X,col=2,type="h")
#Same for k=2
plot.stsBP(sts.bp2)
lines(1:length(Y),X,col=2,type="h")

Partition of a number into two factors

Description

Given a prime number factorization x, bestCombination partitions x into two groups, such that the product of the numbers in group one is as similar as possible to the product of the numbers of group two. This is useful in magic.dim.

Usage

bestCombination(x)

Arguments

x

prime number factorization

Value

a vector c(prod(set1),prod(set2))
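
Examples

## partition the factorization 2*2*3*5 = 60 into two groups with similar
## products, e.g. 10 and 6
bestCombination(c(2, 2, 3, 5))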


Bayesian Outbreak Detection Algorithm (BODA)

Description

The function takes range values of a univariate surveillance time series sts and for each time point uses a negative binomial regression model to compute the predictive posterior distribution for the current observation. The (1-\alpha)\cdot 100\% quantile of this predictive distribution is then used as a bound: if the actual observation is above the bound, an alarm is raised. The Bayesian Outbreak Detection Algorithm (boda) is due to Manitz and Höhle (2013) and its implementation is illustrated in Salmon et al. (2016). However, boda should be considered experimental, see the Warning section below!

Usage

boda(sts, control = list(
    range=NULL, X=NULL, trend=FALSE, season=FALSE,
    prior=c('iid','rw1','rw2'), alpha=0.05, mc.munu=100, 
    mc.y=10, verbose=FALSE,
    samplingMethod=c('joint','marginals'),
    quantileMethod=c("MC","MM")
))

Arguments

sts

object of class sts (including the observed and the state time series)

control

Control object given as a list containing the following components:

range

Specifies the index of all timepoints which should be tested. If range is NULL all possible timepoints are used.

X
trend

Boolean indicating whether a linear trend term should be included in the model for the expectation on the log scale.

season

Boolean to indicate whether a cyclic spline should be included.

alpha

The threshold for declaring an observed count as an aberration is the (1-\alpha)\cdot 100\% quantile of the predictive posterior.

mc.munu
mc.y

Number of samples of y to generate for each pair of the mean and size parameter. A total of mc.munu \times mc.y samples are generated.

verbose

Argument sent to the inla call. When using ESS it might be necessary to force verbose mode for INLA to work.

samplingMethod

Should one sample from the parameters' joint posterior distribution (joint) or from their respective marginal posterior distributions (marginals)?

quantileMethod

Character, either MC or MM. Indicates how to compute the quantile based on the posterior distribution (no matter the inference method): either by sampling mc.munu values from the posterior distribution of the parameters and then, for each sampled parameter vector, sampling mc.y response values, so that an empirical quantile can be computed from the resulting vector of responses (the MC method, as explained in Manitz and Höhle 2013); or by sampling mc.munu values from the posterior distribution of the parameters and then computing the quantile of the resulting mixture distribution using bisectioning, which is faster.

Warning

This function is currently experimental! It also heavily depends on the INLA package, so changes there might affect the operational ability of this function. Since the computations for the Bayesian GAM are quite involved, do not expect this function to be particularly fast.

Results are not reproducible if INLA uses parallelization (as by default); set INLA::inla.setOption(num.threads = "1:1") to avoid that, then do set.seed as usual.

Note

This function requires the R package INLA, which is currently not available from CRAN. It can be obtained from INLA's own repository via install.packages("INLA", repos="https://inla.r-inla-download.org/R/stable").

Author(s)

J. Manitz, M. Höhle, M. Salmon

References

Manitz, J. and Höhle, M. (2013): Bayesian outbreak detection algorithm for monitoring reported cases of campylobacteriosis in Germany. Biometrical Journal, 55(4), 509-526.

Salmon, M., Schumacher, D. and Höhle, M. (2016): Monitoring count time series in R: Aberration detection in public health surveillance. Journal of Statistical Software, 70 (10), 1-35. doi:10.18637/jss.v070.i10

Examples

## Not run: 
  ## running this example takes a couple of minutes

  #Load the campylobacteriosis data for Germany
  data("campyDE")
  #Make an sts object from the data.frame
  cam.sts <-  sts(epoch=campyDE$date,
                  observed=campyDE$case, state=campyDE$state)

  #Define monitoring period
#  range <- which(epoch(cam.sts)>=as.Date("2007-01-01"))
#  range <- which(epoch(cam.sts)>=as.Date("2011-12-10"))
  range <- tail(1:nrow(cam.sts),n=2)

  control <- list(range=range, X=NULL, trend=TRUE, season=TRUE,
                  prior='iid', alpha=0.025, mc.munu=100, mc.y=10,
                  samplingMethod = "joint")

  #Apply the boda algorithm in its simplest form, i.e. the spline is
  #described by iid random effects and no extra covariates
  library("INLA")  # needs to be attached
  cam.boda1 <- boda(cam.sts, control=control)

  plot(cam.boda1, xlab='time [weeks]', ylab='No. reported', dx.upperbound=0)

## End(Not run)

Bayesian Outbreak Detection in the Presence of Reporting Delays

Description

The function takes range values of the surveillance time series sts and for each time point uses a Bayesian model of the negative binomial family with log link inspired by the work of Noufaily et al. (2012) and of Manitz and Höhle (2014). It allows delay-corrected aberration detection as explained in Salmon et al. (2015). A reportingTriangle has to be provided in the control slot.

Usage

bodaDelay(sts, control = list(
  range = NULL, b = 5, w = 3, mc.munu = 100, mc.y = 10,
  pastAberrations = TRUE, verbose = FALSE,
  alpha = 0.05, trend = TRUE, limit54 = c(5,4), 
  inferenceMethod = c("asym","INLA"), quantileMethod = c("MC","MM"),
  noPeriods = 1, pastWeeksNotIncluded = NULL, delay = FALSE))

Arguments

sts

sts-object to be analysed. Needs to have a reporting triangle.

control

list of control arguments:

b

How many years back in time to include when forming the base counts.

w

Window's half-size, i.e. number of weeks to include before and after the current week in each year.

range

Specifies the index of all timepoints which should be tested. If range is NULL all possible timepoints are used.

pastAberrations

Boolean indicating whether to include an effect for past outbreaks in a second fit of the model. This option only makes sense if inferenceMethod is INLA, as it is not supported by the other inference method.

verbose

Boolean specifying whether to show extra debugging information.

alpha

An approximate (one-sided) (1-\alpha)\cdot 100\% prediction interval is calculated, unlike the original method where it was a two-sided interval. The upper limit of this interval, i.e. the (1-\alpha)\cdot 100\% quantile, serves as an upperbound.

trend

Boolean indicating whether a trend should be included

noPeriods

Number of levels in the factor allowing the use of more baseline data. If equal to 1, no factor variable is created and the set of reference values is defined as in Farrington et al (1996).

inferenceMethod

Which inference method to use, as defined in Salmon et al. (2015). If one chooses "INLA", then inference is performed with INLA. If one chooses "asym" (the default), then an asymptotic normal approximation of the posterior is used.

pastWeeksNotIncluded

Number of past weeks to ignore in the calculation. The default (NULL) means to use the value of control$w.

delay

Boolean indicating whether to take reporting delays into account.

mc.munu

Number of samples for the parameters of the negative binomial distribution for calculating a threshold

mc.y

Number of samples for observations when performing Monte Carlo to calculate a threshold

limit54

c(cases, period) is a vector allowing the user to change these numbers: an alarm is only possible if at least cases counts were observed within the last period time points (the default c(5,4) requires at least 5 cases within the last 4 time points).

quantileMethod

Character, either "MC" (default) or "MM". Indicates how to compute the quantile based on the posterior distribution (no matter the inference method): either by sampling mc.munu values from the posterior distribution of the parameters and then for each sampled parameters vector sampling mc.y response values so that one gets a vector of response values based on which one computes an empirical quantile (MC method, as explained in Salmon et al. 2015); or by sampling mc.munu from the posterior distribution of the parameters and then compute the quantile of the mixture distribution using bisectioning, which is faster.

References

Farrington, C.P., Andrews, N.J, Beale A.D. and Catchpole, M.A. (1996): A statistical algorithm for the early detection of outbreaks of infectious disease. J. R. Statist. Soc. A, 159, 547-563.

Noufaily, A., Enki, D.G., Farrington, C.P., Garthwaite, P., Andrews, N.J., Charlett, A. (2012): An improved algorithm for outbreak detection in multiple surveillance systems. Statistics in Medicine, 32 (7), 1206-1222.

Salmon, M., Schumacher, D., Stark, K., Höhle, M. (2015): Bayesian outbreak detection in the presence of reporting delays. Biometrical Journal, 57 (6), 1051-1067.

Examples

## Not run: 
data("stsNewport")
salm.Normal <- list()
salmDelayAsym <- list()
for (week in 43:45){
  listWeeks <- as.Date(row.names(stsNewport@control$reportingTriangle$n))
  dateObs <- listWeeks[isoWeekYear(listWeeks)$ISOYear==2011 &
                       isoWeekYear(listWeeks)$ISOWeek==week]
  stsC <- sts_observation(stsNewport,
                          dateObservation=dateObs,
                          cut=TRUE)
  inWeeks <- with(isoWeekYear(epoch(stsC)),
                  ISOYear == 2011 & ISOWeek >= 40 & ISOWeek <= 48)
  
  rangeTest <- which(inWeeks)
  alpha <- 0.07

  # Control slot for Noufaily method          
  controlNoufaily <- list(range=rangeTest,noPeriods=10,
                          b=4,w=3,weightsThreshold=2.58,pastWeeksNotIncluded=26,
                          pThresholdTrend=1,thresholdMethod="nbPlugin",alpha=alpha*2,
                          limit54=c(0,50))
  
  # Control slot for the Proposed algorithm with D=0 correction
  controlNormal <- list(range = rangeTest, b = 4, w = 3,
                        reweight = TRUE, mc.munu=10000, mc.y=100,
                        verbose = FALSE,
                        alpha = alpha, trend = TRUE,
                        limit54=c(0,50), 
                        noPeriods = 10, pastWeeksNotIncluded = 26,
                        delay=FALSE)
  
  # Control slot for the Proposed algorithm with D=10 correction
  controlDelayNorm <-  list(range = rangeTest, b = 4, w = 3,
                            reweight = FALSE, mc.munu=10000, mc.y=100,
                            verbose = FALSE,
                            alpha = alpha, trend = TRUE,
                            limit54=c(0,50), 
                            noPeriods = 10, pastWeeksNotIncluded = 26,
                            delay=TRUE,inferenceMethod="asym")
  
  set.seed(1)
  salm.Normal[[week]] <- farringtonFlexible(stsC, controlNoufaily)
  salmDelayAsym[[week]] <- bodaDelay(stsC, controlDelayNorm)
}

opar <- par(mfrow=c(2,3))
lapply(salmDelayAsym[c(43,44,45)],plot, legend=NULL, main="", ylim=c(0,35))
lapply(salm.Normal[c(43,44,45)],plot, legend=NULL, main="", ylim=c(0,35))
par(opar)

## End(Not run)

Calibration Tests for Poisson or Negative Binomial Predictions

Description

The implemented calibration tests for Poisson or negative binomial predictions of count data are based on proper scoring rules and described in detail in Wei and Held (2014). The following proper scoring rules are available: Dawid-Sebastiani score ("dss"), logarithmic score ("logs"), ranked probability score ("rps").

Usage

calibrationTest(x, ...)

## Default S3 method:
calibrationTest(x, mu, size = NULL,
                which = c("dss", "logs", "rps"),
                tolerance = 1e-4, method = 2, ...)

Arguments

x

the observed counts. All involved functions are vectorized and also accept matrices or arrays.

mu

the means of the predictive distributions for the observations x.

size

either NULL (default), indicating Poisson predictions with mean mu, or dispersion parameters of negative binomial forecasts for the observations x, parametrized as in dnbinom with variance mu*(1+mu/size).

which

a character string indicating which proper scoring rule to apply.

tolerance

absolute tolerance for the null expectation and variance of "logs" and "rps". For the latter, see the note below. Unused for which = "dss" (closed form).

method

selection of the z-statistic: method = 2 refers to the alternative test statistic Z_s^* of Wei and Held (2014, Discussion), which has been recommended for low counts. method = 1 corresponds to Equation 5 in Wei and Held (2014).

...

unused (argument of the generic).

Value

an object of class "htest", which is a list with the following components:

method

a character string indicating the type of test performed (including which scoring rule).

data.name

a character string naming the supplied x argument.

statistic

the z-statistic of the test.

parameter

the number of predictions underlying the test, i.e., length(x).

p.value

the p-value for the test.

Note

If the gsl package is installed, its implementations of the Bessel and hypergeometric functions are used when calculating the null expectation and variance of the rps. These functions are faster and yield more accurate results (especially for larger mu).

Author(s)

Sebastian Meyer and Wei Wei

References

Wei, W. and Held, L. (2014): Calibration tests for count data. Test, 23, 787-805.

Examples

mu <- c(0.1, 1, 3, 6, pi, 100)
size <- 0.1
set.seed(1)
y <- rnbinom(length(mu), mu = mu, size = size)
calibrationTest(y, mu = mu, size = size) # p = 0.99
calibrationTest(y, mu = mu, size = 1) # p = 4.3e-05
calibrationTest(y, mu = 1, size = size) # p = 0.6959
calibrationTest(y, mu = 1, size = size, which = "rps") # p = 0.1286

Campylobacteriosis and Absolute Humidity in Germany 2002-2011

Description

Weekly number of reported campylobacteriosis cases in Germany, 2002-2011, together with the corresponding absolute humidity (in g/m^3) that week. The absolute humidity was computed according to the procedure by Dengler (1997) using the means of representative weather station data from the German Climate service.

Usage

data(campyDE)

Format

A data.frame containing the following columns

date

Date instance containing the Monday of the reporting week.

case

Number of reported cases that week.

state

Boolean indicating whether there is external knowledge about an outbreak that week

hum

Mean absolute humidity (in g/m^3) of that week as measured by a single representative weather station.

l1.hum-l5.hum

Lagged version (lagged by 1-5) of the hum covariate.

newyears

Boolean indicating whether the reporting week corresponds to the first two weeks of the year (TRUE) or not (FALSE). Note: the first week of a year is here defined as the first reporting week which has its corresponding Monday within the new year.

christmas

Boolean indicating whether the reporting week corresponds to the last two weeks of the year (TRUE) or not (FALSE). Note: these are the two weeks directly preceding the newyears weeks.

O104period

Boolean indicating whether the reporting week corresponds to the W21-W30 period of increased gastroenteritis awareness during the O104:H4 STEC outbreak.

Source

The data on campylobacteriosis cases have been queried from the Survstat@RKI database of the German Robert Koch Institute (https://survstat.rki.de/).

Data for the computation of absolute humidity were obtained from the German Climate Service (Deutscher Wetterdienst), Climate data of Germany, available at https://www.dwd.de.

A complete data description and an analysis of the data can be found in Manitz and Höhle (2013).

References

Manitz, J. and Höhle, M. (2013): Bayesian outbreak detection algorithm for monitoring reported cases of campylobacteriosis in Germany. Biometrical Journal, 55(4), 509-526.

Examples

# Load the data
data("campyDE")

# O104 period is W21-W30 in 2011
stopifnot(all(campyDE$O104period == (
  (campyDE$date >= as.Date("2011-05-23")) &
  (campyDE$date < as.Date("2011-07-31"))
)))

# Make an sts object from the data.frame
cam.sts <- sts(epoch=campyDE$date, observed=campyDE$case, state=campyDE$state)

# Plot the result
plot(cam.sts)

CUSUM detector for time-varying categorical time series

Description

Function to process sts object by binomial, beta-binomial or multinomial CUSUM as described by Höhle (2010). Logistic, multinomial logistic, proportional odds or Bradley-Terry regression models are used to specify in-control and out-of-control parameters. The implementation is illustrated in Salmon et al. (2016).

Usage

categoricalCUSUM(stsObj,control = list(range=NULL,h=5,pi0=NULL,
                 pi1=NULL, dfun=NULL, ret=c("cases","value")),...)

Arguments

stsObj

Object of class sts containing the number of counts in each of the k categories of the response variable. The time-varying number of counts n_t is found in slot populationFrac.

control

Control object containing several items

range

Vector of length t_{max} with indices of the observed slot to monitor.

h

Threshold to use for the monitoring. Once the CUSUM statistic is larger than or equal to h, an alarm is raised.

pi0

(k-1) \times t_{max} in-control probability vector for all categories except the reference category.

pi1

(k-1) \times t_{max} out-of-control probability vector for all categories except the reference category.

dfun

The probability mass function (PMF) or density used to compute the likelihood ratios of the CUSUM. In a negative binomial CUSUM this is dnbinom, in a binomial CUSUM dbinom and in a multinomial CUSUM dmultinom. The function must be able to handle the arguments y, size, mu and log. As a consequence, in the case of, e.g., the beta-binomial distribution, one has to write a small wrapper function.

ret

Return the necessary proportion to sound an alarm in the slot upperbound, or just the value of the CUSUM statistic; ret is thus one of the values in c("cases","value"). Note: for the binomial PMF it is possible to compute this value explicitly, which is much faster than the numeric search otherwise conducted. If dfun corresponds to dbinom, just set the attribute isBinomialPMF on the dfun object.

...

Additional arguments to send to dfun.

Details

The function allows the monitoring of categorical time series as described by regression models for binomial, beta-binomial or multinomial data. The latter includes, e.g., multinomial logistic regression models, proportional odds models or Bradley-Terry models for paired comparisons. See Höhle (2010) for further details about the methodology.

Once an alarm is found the CUSUM scheme is reset (to zero) and monitoring continues from there.
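
The underlying scheme is a CUSUM of log-likelihood ratios. A minimal sketch for the binomial special case (illustrative only, not the exported implementation) is:

llrCUSUM <- function(y, n, pi0, pi1, h) {
  S <- 0
  alarm <- logical(length(y))
  for (t in seq_along(y)) {
    llr <- dbinom(y[t], n[t], pi1[t], log = TRUE) -
           dbinom(y[t], n[t], pi0[t], log = TRUE)   # log-likelihood ratio
    S <- max(0, S + llr)
    alarm[t] <- S >= h          # alarm once the statistic reaches h
    if (alarm[t]) S <- 0        # reset after an alarm, as described above
  }
  alarm
}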

Value

An sts object with observed, alarm, etc. slots trimmed to the control$range indices.

Author(s)

M. Höhle

References

Höhle, M. (2010): Online Change-Point Detection in Categorical Time Series. In: T. Kneib and G. Tutz (Eds.), Statistical Modelling and Regression Structures, Physica-Verlag.

Salmon, M., Schumacher, D. and Höhle, M. (2016): Monitoring count time series in R: Aberration detection in public health surveillance. Journal of Statistical Software, 70 (10), 1-35. doi:10.18637/jss.v070.i10

See Also

LRCUSUM.runlength

Examples

## IGNORE_RDIFF_BEGIN
have_GAMLSS <- require("gamlss")
## IGNORE_RDIFF_END

if (have_GAMLSS) {
  ###########################################################################
  #Beta-binomial CUSUM for a small example containing the time-varying
  #number of positive test out of a time-varying number of total
  #test.
  #######################################

  #Load meat inspection data
  data("abattoir")

  #Use GAMLSS to fit beta-bin regression model
  phase1 <- 1:(2*52)
  phase2  <- (max(phase1)+1) : nrow(abattoir)

  #Fit beta-binomial model using GAMLSS
  abattoir.df <- as.data.frame(abattoir)

  #Replace the observed and epoch column names to something more convenient
  dict <- c("observed"="y", "epoch"="t", "population"="n")
  replace <- dict[colnames(abattoir.df)]
  colnames(abattoir.df)[!is.na(replace)] <- replace[!is.na(replace)]

  m.bbin <- gamlss(cbind(y, n-y) ~ 1 + t +
                     sin(2*pi/52*t) + cos(2*pi/52*t) +
                     sin(4*pi/52*t) + cos(4*pi/52*t),
                   sigma.formula = ~1,
                   family = BB(sigma.link="log"),
                   data = abattoir.df[phase1, c("n","y","t")])

  #CUSUM parameters
  R <- 2 #detect a doubling of the odds for a test being positive
  h <- 4 #threshold of the cusum

  #Compute in-control and out of control mean
  pi0 <- predict(m.bbin,newdata=abattoir.df[phase2,c("n","y","t")],type="response")
  pi1 <- plogis(qlogis(pi0)+log(R))
  #Create matrix with in control and out of control proportions.
  #Categories are D=1 and D=0, where the latter is the reference category
  pi0m <- rbind(pi0, 1-pi0)
  pi1m <- rbind(pi1, 1-pi1)


  ######################################################################
  # Use the multinomial surveillance function. To this end it is necessary
  # to create a new abattoir object containing counts and proportion for
  # each of the k=2 categories. For binomial data this appears a bit
  # redundant, but generalizes easier to k>2 categories.
  ######################################################################

  abattoir2 <- sts(epoch=1:nrow(abattoir), start=c(2006,1), freq=52,
    observed=cbind(abattoir@observed, abattoir@populationFrac-abattoir@observed),
    populationFrac=cbind(abattoir@populationFrac,abattoir@populationFrac),
    state=matrix(0,nrow=nrow(abattoir),ncol=2),
    multinomialTS=TRUE)

  ######################################################################
  #Function to use as dfun in the categoricalCUSUM
  #(just a wrapper to the dBB function). Note that from v 3.0-1 the
  #first argument of dBB changed its name from "y" to "x"!
  ######################################################################
  mydBB.cusum <- function(y, mu, sigma, size, log = FALSE) {
    return(dBB(y[1,], mu = mu[1,], sigma = sigma, bd = size, log = log))
  }


  #Create control object for multinom cusum and use the categoricalCUSUM
  #method
  control <- list(range=phase2,h=h,pi0=pi0m, pi1=pi1m, ret="cases",
		   dfun=mydBB.cusum)
  surv <- categoricalCUSUM(abattoir2, control=control,
			   sigma=exp(m.bbin$sigma.coef))

  #Show results
  plot(surv[,1],dx.upperbound=0)
  lines(pi0,col="green")
  lines(pi1,col="red")

  #Index of the alarm
  which.max(alarms(surv[,1]))
}

Check the residual process of a fitted twinSIR or twinstim

Description

Transform the residual process (cf. the residuals methods for classes "twinSIR" and "twinstim") such that the transformed residuals should be uniformly distributed if the fitted model describes the true conditional intensity function well. Graphically check this using ks.plot.unif. The transformation for the residuals tau is 1 - exp(-diff(c(0,tau))) (cf. Ogata, 1988). Another plot inspects the serial correlation between the transformed residuals (scatterplot between u_i and u_{i+1}).

Usage

checkResidualProcess(object, plot = 1:2, mfrow = c(1,length(plot)), ...)

Arguments

object

an object of class "twinSIR" or "twinstim".

plot

logical (or integer index) vector indicating if (which) plots of the transformed residuals should be produced. The plot index 1 corresponds to a ks.plot.unif to check for deviations of the transformed residuals from the uniform distribution. The plot index 2 corresponds to a scatterplot of u_i vs. u_{i+1}. By default (plot = 1:2), both plots are produced.

mfrow

see par.

...

further arguments passed to ks.plot.unif.

Value

A list (returned invisibly, if plot = TRUE) with the following components:

tau

the residual process obtained by residuals(object).

U

the transformed residuals which should be distributed as U(0,1).

ks

the result of the ks.test for the uniform distribution of U.

Author(s)

Sebastian Meyer

References

Ogata, Y. (1988) Statistical models for earthquake occurrences and residual analysis for point processes. Journal of the American Statistical Association, 83, 9-27

See Also

ks.plot.unif and the residuals-method for classes "twinSIR" and "twinstim".

Examples

data("hagelloch")
fit <- twinSIR(~ household, data = hagelloch)  # a simplistic model
## extract the "residual process", i.e., the fitted cumulative intensities
residuals(fit)
## assess goodness of fit based on these residuals
checkResidualProcess(fit)  # could be better

Conditional lapply

Description

Use lapply if the input is a list and otherwise apply the function directly to the input and wrap the result in a list. The function is implemented as

    if (is.list(X)) lapply(X, FUN, ...) else list(FUN(X, ...))

Usage

clapply(X, FUN, ...)

Arguments

X

a list or a single R object on which to apply FUN.

FUN

the function to be applied to (each element of) X.

...

optional arguments to FUN.

Value

a list (of length 1 if X is not a list).
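
Examples

clapply(list(1, 4, 9), sqrt)  # a list as input: same as lapply()
clapply(9, sqrt)              # a single object: result wrapped in a list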


List Coefficients by Model Component

Description

S3-generic function to use with models which contain several groups of coefficients in their coefficient vector. The coeflist methods are intended to list the coefficients by group. The default method simply splits the coefficient vector given the number of coefficients by group.

Usage

coeflist(x, ...)

## Default S3 method:
coeflist(x, npars, ...)

Arguments

x

a model with groups of coefficients or, for the default method, a vector of coefficients.

npars

a named vector specifying the number of coefficients per group.

...

potential further arguments (currently ignored).

Value

a list of coefficients

Author(s)

Sebastian Meyer

Examples

## the default method just 'split's the coefficient vector
coefs <- c(a = 1, b = 3, dispersion = 0.5)
npars <- c(regression = 2, variance = 1)
coeflist(coefs, npars)

Surgical Failures Data

Description

Dataset from de Leval et al. (1994) on failures during a series of arterial switch operations, used by Steiner et al. (1999) to illustrate the monitoring of paired binary surgical outcomes.

Usage

data(deleval)

Details

Steiner et al. (1999) use data from de Leval et al. (1994) to illustrate monitoring of failure rates of a surgical procedure for a bivariate outcome.

Over a period of six years, an arterial switch operation was performed on 104 newborn babies. Since the death rate from this surgery was relatively low, the idea of a surgical "near miss" was introduced. It is defined as the need to reinstitute cardiopulmonary bypass after a trial period of weaning. The object of class sts contains the recordings of near misses and deaths from the surgery for the 104 newborn babies of the study.

The data could also be handled by a multinomial CUSUM model.

References

Steiner, S. H., Cook, R. J., and Farewell, V. T. (1999), Monitoring paired binary surgical outcomes using cumulative sum charts, Statistics in Medicine, 18, pp. 69–86.

De Leval, M. R., François, K., Bull, C., Brawn, W. B. and Spiegelhalter, D. (1994), Analysis of a cluster of surgical failures, Journal of Thoracic and Cardiovascular Surgery, March, pp. 914–924.

See Also

pairedbinCUSUM

Examples

data("deleval")
plot(deleval, xaxis.labelFormat=NULL,ylab="Response",xlab="Patient number")

Polygonal Approximation of a Disc/Circle

Description

Generates a polygon representing a disc/circle (in planar coordinates) as an object of one of three possible classes: "Polygon" from package sp, "owin" from package spatstat.geom, or "gpc.poly" from gpclib (if available).

Usage

discpoly(center, radius, npoly = 64,
         class = c("Polygon", "owin", "gpc.poly"),
         hole = FALSE)

Arguments

center

numeric vector of length 2 (center coordinates of the circle).

radius

single numeric value (radius of the circle).

npoly

single integer. Number of edges of the polygonal approximation.

class

class of the resulting polygon (partial name matching applies). For "owin", this is just a wrapper around spatstat.geom's own disc function.

hole

logical. Does the resulting polygon represent a hole?

Value

A polygon of class class representing a circle/disc approximated by npoly edges.

If class="gpc.poly" and this S4 class is not yet registered in the current R session (by loading gpclib beforehand), only the pts slot of a "gpc.poly" is returned with a warning.

See Also

disc in package spatstat.geom.

Examples

## Construct circles with increasing accuracy and of different spatial classes
disc1 <- discpoly(c(0,0), 5, npoly=4, class = "owin")
disc2 <- discpoly(c(0,0), 5, npoly=16, class = "Polygon")
disc3 <- discpoly(c(0,0), 5, npoly=64, class = "gpc.poly")  # may warn

## Look at the results
print(disc1)
plot(disc1, axes=TRUE, main="", border=2)

str(disc2)
lines(disc2, col=3)

str(disc3)  # a list or a formal "gpc.poly" (if gpclib is available)
if (is(disc3, "gpc.poly")) {
  plot(disc3, add=TRUE, poly.args=list(border=4))
} else {
  lines(disc3[[1]], col=4)
}

## to only _draw_ a circle
symbols(0, 0, circles=5, inches=FALSE, add=TRUE, fg=5)

Convert disProg object to sts and vice versa

Description

A small helper function to convert a disProg object to an object of the S4 class sts and vice versa. In the future the sts class should replace the disProg class, but for now this function allows for conversion between the two formats.

Usage

disProg2sts(disProgObj, map=NULL)
sts2disProg(sts)

Arguments

disProgObj

an object of class "disProg"

map

an optional "SpatialPolygons" object

sts

an object of class "sts" to convert

Value

an object of class "sts" or "disProg", respectively.

See Also

sts-class

Examples

data(ha)
print(disProg2sts(ha))
class(sts2disProg(disProg2sts(ha)))

Surveillance for a count data time series using the EARS C1, C2 or C3 method and its extensions

Description

The function takes range values of the surveillance time series sts and for each time point computes a threshold for the number of counts based on values from the recent past. This is then compared to the observed number of counts. If the observation is above a specific quantile of the prediction interval, then an alarm is raised. This method is especially useful for data without many historic values, since it only needs counts from the recent past.

Usage

earsC(sts, control = list(range = NULL, method = "C1",
                          baseline = 7, minSigma = 0,
                          alpha = 0.001))

Arguments

sts

object of class sts (including the observed and the state time series), which is to be monitored.

control

Control object

range

Specifies the index in the sts object of all the timepoints which should be monitored. If range is NULL the maximum number of possible timepoints is used (this number depends on the method chosen):

C1

all timepoints from the observation with index baseline + 1 can be monitored,

C2

timepoints from index baseline + 3 can be monitored,

C3

timepoints starting from the index baseline + 5 can be monitored.

method

String indicating which method to use:

"C1"

for EARS C1-MILD method (Default),

"C2"

for EARS C2-MEDIUM method,

"C3"

for EARS C3-HIGH method.

See Details for further information about the methods.

baseline

how many time points to use for calculating the baseline, see details

minSigma

By default 0. If minSigma is higher than 0, for C1 and C2, the quantity zAlpha * minSigma is then the alerting threshold if the baseline is zero. Howard Burkom suggests using a value of 0.5 or 1 for sparse data.

alpha

An approximate (two-sided) (1-\alpha)\cdot 100\% prediction interval is calculated. By default, if alpha is NULL, the value 0.001 is assumed for C1 and C2, whereas 0.025 is assumed for C3. These are the choices made at the CDC.

Details

The three methods differ in the baseline used for calculating the expected value and in the method of calculating it:

  • in C1 and C2 the expected value is the moving average of counts over the sliding window of the baseline, and the prediction interval depends on the standard deviation of the observed counts in this window. They can be considered as Shewhart control charts with a small sample used for calculations.

  • in C3 the expected value is based on the sum over 3 timepoints (the assessed timepoint and the two previous ones) of the discrepancy between observations and predictions, the predictions being calculated with the C2 method. This method has similarities with a CUSUM method since it adds up discrepancies between predictions and observations over several timepoints, but it is not a CUSUM (sum over 3 timepoints, not accumulation over a whole range), even if it is sometimes presented as such.

Here is what the function does for each method, see the literature sources for further details:

  1. For C1 (illustrated by the standalone sketch after this list) the baseline consists of the baseline (default: 7) timepoints preceding the assessed timepoint t, i.e., t-baseline to t-1. The expected value is the mean of this baseline. An approximate (two-sided) (1-\alpha) \cdot 100\% prediction interval is calculated based on the assumption that the difference between the expected value and the observed value, divided by the standard deviation of counts over the sliding window, called C_1(t), follows a standard normal distribution in the absence of outbreaks:

    C_1(t) = \frac{Y(t)-\bar{Y}_1(t)}{S_1(t)},

    where

    \bar{Y}_1(t) = \frac{1}{\code{baseline}} \sum_{i=t-\code{baseline}}^{t-1} Y(i)

    and

    S^2_1(t) = \frac{1}{\code{baseline}-1} \sum_{i=t-\code{baseline}}^{t-1} [Y(i) - \bar{Y}_1(t)]^2.

    Then under the null hypothesis of no outbreak,

    C_1(t) \sim \mathcal{N}(0,1).

    An alarm is raised if

    C_1(t) \ge z_{1-\alpha},

    with z_{1-\alpha} the (1-\alpha) quantile of the standard normal distribution.

    The upperbound U_1(t) is then defined by:

    U_1(t) = \bar{Y}_1(t) + z_{1-\alpha} S_1(t).

  2. C2 is very similar to C1 apart from a 2-day lag in the baseline definition. In other words, the baseline for C2 consists of baseline (default: 7) timepoints with a 2-day lag before the monitored timepoint t, i.e., t-\code{baseline}-2 to t-3. The expected value is the mean of this baseline. An approximate (two-sided) (1-\alpha) \cdot 100\% prediction interval is calculated based on the assumption that the difference between the expected value and the observed value, divided by the standard deviation of counts over the sliding window, called C_2(t), follows a standard normal distribution in the absence of outbreaks:

    C_2(t) = \frac{Y(t)-\bar{Y}_2(t)}{S_2(t)},

    where

    \bar{Y}_2(t) = \frac{1}{\code{baseline}} \sum_{i=t-\code{baseline}-2}^{t-3} Y(i)

    and

    S^2_2(t) = \frac{1}{\code{baseline}-1} \sum_{i=t-\code{baseline}-2}^{t-3} [Y(i) - \bar{Y}_2(t)]^2.

    Then under the null hypothesis of no outbreak,

    C_2(t) \sim \mathcal{N}(0,1).

    An alarm is raised if

    C_2(t) \ge z_{1-\alpha},

    with z_{1-\alpha} the (1-\alpha) quantile of the standard normal distribution.

    The upperbound U_2(t) is then defined by:

    U_2(t) = \bar{Y}_2(t) + z_{1-\alpha} S_2(t).

  3. C3 is quite different from the other two methods, but it is based on C2: it uses C_2(t) from the assessed timepoint t and the two previous timepoints. This means the baseline consists of the timepoints t-(\code{baseline}+4) to t-3. The statistic C_3(t) is the sum of discrepancies between observations and predictions:

    C_3(t) = \sum_{i=t-2}^{t} \max(0, C_2(i)-1)

    Then under the null hypothesis of no outbreak,

    C_3(t) \sim \mathcal{N}(0,1).

    An alarm is raised if

    C_3(t) \ge z_{1-\alpha},

    with z_{1-\alpha} the (1-\alpha) quantile of the standard normal distribution.

    The upperbound U_3(t) is then defined by:

    U_3(t) = \bar{Y}_2(t) + S_2(t)\left(z_{1-\alpha} - \sum_{i=t-2}^{t-1} \max(0, C_2(i)-1)\right).
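
To make these computations concrete, here is a minimal standalone sketch (not the package's internal implementation) of the C1 statistic at a single timepoint t, assuming a plain numeric vector y of counts and the default baseline of 7; the helper name earsC1_sketch is hypothetical:

earsC1_sketch <- function(y, t, baseline = 7, alpha = 0.001) {
  window <- y[(t - baseline):(t - 1)]   # baseline window: t-baseline, ..., t-1
  ybar <- mean(window)                  # expected value \bar{Y}_1(t)
  s <- sd(window)                       # standard deviation S_1(t) over the window
  C1 <- (y[t] - ybar) / s               # standardized discrepancy C_1(t)
  list(statistic = C1,
       upperbound = ybar + qnorm(1 - alpha) * s,  # U_1(t)
       alarm = C1 >= qnorm(1 - alpha))
}
# e.g., earsC1_sketch(rpois(20, lambda = 5), t = 20)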

Value

An object of class sts with the slots upperbound and alarm filled by the chosen method.

Author(s)

M. Salmon, H. Burkom

Source

Fricker, R.D., Hegler, B.L. and Dunfee, D.A. (2008): Comparing syndromic surveillance detection methods: EARS versus a CUSUM-based methodology. Statistics in Medicine, 27, 3407-3429.

Salmon, M., Schumacher, D. and Höhle, M. (2016): Monitoring count time series in R: Aberration detection in public health surveillance. Journal of Statistical Software, 70 (10), 1-35. doi:10.18637/jss.v070.i10

Examples

# Simulate data and convert to sts object
disProgObj <- sim.pointSource(p = 0.99, r = 0.5, length = 208, A = 1,
                              alpha = 1, beta = 0, phi = 0,
                              frequency = 1, state = NULL, K = 1.7)
stsObj <- disProg2sts(disProgObj)


# Call earsC function and show result
res1 <- earsC(stsObj, control = list(range = 20:208, method="C1"))
plot(res1, legend.opts=list(horiz=TRUE, x="topright"))


# Compare C3 upperbounds depending on alpha
res3 <- earsC(stsObj, control = list(range = 20:208, method = "C3", alpha = 0.001))
plot(upperbound(res3), type = 'l')
res3 <- earsC(stsObj, control = list(range = 20:208, method = "C3"))
lines(upperbound(res3), col = 'red')
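
# The raised alarms can be inspected directly via the 'alarms' accessor
# (a quick sketch; 'res3' is the monitored object from above)
table(alarms(res3))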

Continuous-Time SIR Event History of a Fixed Population

Description

The function as.epidata is used to generate objects of class "epidata". Objects of this class are specific data frames containing the event history of an epidemic together with some additional attributes. These objects are the basis for fitting spatio-temporal epidemic intensity models with the function twinSIR. Their implementation is illustrated in Meyer et al. (2017, Section 4), see vignette("twinSIR"). Note that the spatial information itself, i.e. the positions of the individuals, is assumed to be constant over time. Besides epidemics following the SIR compartmental model, data from SI, SIRS and SIS epidemics may also be supplied.

Usage

as.epidata(data, ...)

## S3 method for class 'data.frame'
as.epidata(data, t0,
           tE.col, tI.col, tR.col, id.col, coords.cols,
           f = list(), w = list(), D = dist,
           max.time = NULL, keep.cols = TRUE, ...)
## Default S3 method:
as.epidata(data, id.col, start.col, stop.col,
           atRiskY.col, event.col, Revent.col, coords.cols,
           f = list(), w = list(), D = dist, .latent = FALSE, ...)

## S3 method for class 'epidata'
print(x, ...)
## S3 method for class 'epidata'
x[i, j, drop]
## S3 method for class 'epidata'
update(object, f = list(), w = list(), D = dist, ...)

Arguments

data

For the data.frame-method, a data frame with as many rows as there are individuals in the population and time columns indicating when each individual became exposed (optional), infectious (mandatory, but can be NA for non-affected individuals) and removed (optional). Note that this data format does not allow for re-infection (SIRS) and time-varying covariates. The data.frame-method converts the individual-indexed data frame to the long event history start/stop format and then feeds it into the default method. If calling the generic function as.epidata on a data.frame and the t0 argument is missing, the default method is called directly.
For the default method, data can be a matrix or a data.frame. It must contain the observed event history in a form similar to Surv(, type="counting") in package survival, with additional information (variables) along the process. Rows will be sorted automatically during conversion. The observation period is split up into consecutive intervals of constant state - thus constant infection intensities. The data frame consists of a block of N (number of individuals) rows for each of those time intervals (all rows in a block have the same start and stop values... therefore the name “block”), where there is one row per individual in the block. Each row describes the (fixed) state of the individual during the interval given by the start and stop columns start.col and stop.col.
Note that there may not be more than one event (infection or removal) in a single block. Thus, in a single block, only one entry in the event.col and Revent.col may be 1, all others are 0. This rule follows the point process characteristic that there are no concurrent events (infections or removals).
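
To make the block structure concrete, here is a minimal hypothetical event history in the long start/stop format expected by the default method (two individuals over two time blocks; individual "A" is initially infectious and infects "B" at time 1; the column names follow the reserved names described below, while the data themselves are invented for illustration):

evHist <- data.frame(
  id      = c("A", "B", "A", "B"),
  start   = c(0, 0, 1, 1),
  stop    = c(1, 1, 2, 2),
  atRiskY = c(0, 1, 0, 0),  # "A" is initially infectious, "B" is at risk
  event   = c(0, 1, 0, 0),  # infection of "B" at the stop time 1
  Revent  = c(0, 0, 0, 0),  # no removals observed
  x = c(0, 1, 0, 1), y = c(0, 0, 0, 0)  # constant coordinates per individual
)
## this could then be fed into the default method, e.g.:
## as.epidata(evHist, id.col = "id", start.col = "start", stop.col = "stop",
##            atRiskY.col = "atRiskY", event.col = "event", Revent.col = "Revent",
##            coords.cols = c("x", "y"))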

t0, max.time

observation period. In the resulting "epidata", the time scale will be relative to the start time t0. Individuals that have already been removed prior to t0, i.e., rows with tR <= t0, will be dropped. The end of the observation period (max.time) will by default (NULL, or if NA) coincide with the last observed event.

tE.col, tI.col, tR.col

single numeric or character indexes of the time columns in data, which specify when the individuals became exposed, infectious and removed, respectively. tE.col and tR.col can be missing, corresponding to SIR, SEI, or SI data. NA entries mean that the respective event has not (yet) occurred. Note that is.na(tE) implies is.na(tI) and is.na(tR), and is.na(tI) implies is.na(tR) (and this is checked for the provided data).
CAVE: Support for latent periods (tE.col) is experimental! twinSIR cannot handle them anyway.

id.col

single numeric or character index of the id column in data. The id column identifies the individuals in the data frame. It is converted to a factor by calling factor, i.e., unused levels are dropped if it already was a factor.

start.col

single index of the start column in data. Can be numeric (by column number) or character (by column name). The start column contains the (numeric) time points of the beginnings of the consecutive time intervals of the event history. The minimum value in this column, i.e. the start of the observation period, should be 0.

stop.col

single index of the stop column in data. Can be numeric (by column number) or character (by column name). The stop column contains the (numeric) time points of the ends of the consecutive time intervals of the event history. The stop value must always be greater than the start value of a row.

atRiskY.col

single index of the atRiskY column in data. Can be numeric (by column number) or character (by column name). The atRiskY column indicates if the individual was “at-risk” of becoming infected during the time interval (start; stop]. This variable must be logical or in 0/1-coding. Individuals with atRiskY == 0 in the first time interval (normally the rows with start == 0) are taken as initially infectious.

event.col

single index of the event column in data. Can be numeric (by column number) or character (by column name). The event column indicates if the individual became infected at the stop time of the interval. This variable must be logical or in 0/1-coding.

Revent.col

single index of the Revent column in data. Can be numeric (by column number) or character (by column name). The Revent column indicates if the individual was removed (recovered) at the stop time of the interval. This variable must be logical or in 0/1-coding.

coords.cols

indexes of the coords columns in data. Can be numeric (by column number), character (by column name), or NULL (no coordinates, e.g., if D is a pre-specified distance matrix). These columns contain the individuals' coordinates, which determine the distance matrix for the distance-based components of the force of infection (see argument f). By default, Euclidean distance is used (see argument D).
Note that the functions related to twinSIR currently assume fixed positions of the individuals during the whole epidemic. Thus, an individual has the same coordinates in every block. For simplicity, the coordinates are derived from the first time block only (normally the rows with start == 0).
The animate-method requires coordinates.

f

a named list of vectorized functions for a distance-based force of infection. The functions must operate elementwise on a (distance) matrix D so that f[[m]](D) results in a matrix. A simple example is function(u) {u <= 1}, which indicates if the Euclidean distance between the individuals is smaller than or equal to 1. The names of the functions determine the names of the epidemic variables in the resulting data frame, so they should not coincide with names of other covariates. The distance-based weights are computed as follows: let I(t) denote the set of infectious individuals just before time t. Then, for individual i at time t, the m'th covariate has the value \sum_{j \in I(t)} f_m(d_{ij}), where d_{ij} denotes entries of the distance matrix (by default this is the Euclidean distance ||s_i - s_j|| between the individuals' coordinates, but see argument D).

w

a named list of vectorized functions for extra covariate-based weights w_{ij} in the epidemic component. Each function operates on a single time-constant covariate in data, which is determined by the name of the first argument: the two function arguments should be named varname.i and varname.j, where varname is one of names(data). Similar to the components in f, length(w) epidemic covariates will be generated in the resulting "epidata", named according to names(w), so the names should not coincide with names of other covariates. For individual i at time t, the m'th such covariate has the value \sum_{j \in I(t)} w_m(z^{(m)}_i, z^{(m)}_j), where z^{(m)} denotes the variable in data associated with w[[m]].

D

either a function to calculate the distances between the individuals with locations taken from coords.cols (the default is Euclidean distance via the function dist), with the result converted to a matrix via as.matrix, or a pre-computed distance matrix with dimnames containing the individual ids (a classed "Matrix" is supported).
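
For illustration, a pre-computed distance matrix could be supplied instead of a distance function; a hypothetical sketch for three individuals (the dimnames must contain the individual ids):

Dmat <- matrix(c(0, 1, 2,
                 1, 0, 1,
                 2, 1, 0), nrow = 3,
               dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
## could then be passed as 'D = Dmat' in the as.epidata() call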

keep.cols

logical indicating if all columns in data should be retained (and not only the obligatory "epidata" columns), in particular any additional columns with time-constant individual-specific covariates. Alternatively, keep.cols can be a numeric or character vector indexing columns of data to keep.

.latent

(internal) logical indicating whether to allow for latent periods (EXPERIMENTAL). Otherwise (default), the function verifies that an event (i.e., switching to the I state) only happens when the respective individual is at risk (i.e., in the S state).

x, object

an object of class "epidata".

...

arguments passed to print.data.frame. Currently unused in the as.epidata-methods.

i, j, drop

arguments passed to [.data.frame.

Details

The print method for objects of class "epidata" simply prints the data frame with a small header containing the time range of the observed epidemic and the number of infected individuals. Usually, the data frames are quite long, so the summary method summary.epidata might be useful. Also, indexing/subsetting "epidata" works exactly as for data.frames, but there is a dedicated method, which ensures consistency of the resulting "epidata" or drops this class if necessary. The update-method can be used to add or replace distance-based (f) or covariate-based (w) epidemic variables in an existing "epidata" object.

SIS epidemics are implemented as SIRS epidemics where the length of the removal period equals 0. This means that an individual who has an R-event will be at risk again immediately afterwards, i.e. in the following time block. Therefore, data from SIS epidemics have to be provided in that form, containing “pseudo-R-events”.
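
For illustration, a hypothetical single-individual SIS history with such a pseudo-R-event could look as follows (a sketch with invented data, not taken from the package):

data.frame(
  id = "A", start = 0:2, stop = 1:3,
  atRiskY = c(1, 0, 1),  # at risk again right after the pseudo-R-event
  event   = c(1, 0, 0),  # infection at stop time 1
  Revent  = c(0, 1, 0)   # pseudo-R-event at stop time 2
)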

Value

a data.frame with the columns "BLOCK", "id", "start", "stop", "atRiskY", "event", "Revent" and the coordinate columns (with the original names from data), which are all obligatory. These columns are followed by any remaining columns of the input data. Last but not least, the newly generated columns with epidemic variables corresponding to the functions in the list f are appended, if length(f) > 0.

The data.frame is given the additional attributes

"eventTimes"

numeric vector of infection time points (sorted chronologically).

"timeRange"

numeric vector of length 2: c(min(start), max(stop)).

"coords.cols"

numeric vector containing the column indices of the coordinate columns in the resulting data frame.

"f"

this equals the argument f.

"w"

this equals the argument w.

Note

The column name "BLOCK" is a reserved name. This column will be added automatically at conversion and the resulting data frame will be sorted by this column and by id. Also the names "id", "start", "stop", "atRiskY", "event" and "Revent" are reserved for the respective columns only.

Author(s)

Sebastian Meyer

References

Meyer, S., Held, L. and Höhle, M. (2017): Spatio-temporal analysis of epidemic phenomena using the R package surveillance. Journal of Statistical Software, 77 (11), 1-55. doi:10.18637/jss.v077.i11

See Also

The hagelloch data as an example.

The plot and the summary method for class "epidata". Furthermore, the function animate.epidata for the animation of epidemics.

Function twinSIR for fitting spatio-temporal epidemic intensity models to epidemic data.

Function simEpidata for the simulation of epidemic data.

Examples

data("hagelloch")   # see help("hagelloch") for a description
head(hagelloch.df)

## convert the original data frame to an "epidata" event history
myEpi <- as.epidata(hagelloch.df, t0 = 0,
                    tI.col = "tI", tR.col = "tR", id.col = "PN",
                    coords.cols = c("x.loc", "y.loc"),
                    keep.cols = c("SEX", "AGE", "CL"))


str(myEpi)
head(as.data.frame(myEpi))  # "epidata" has event history format
summary(myEpi)              # see 'summary.epidata'
plot(myEpi)                 # see 'plot.epidata' and also 'animate.epidata'


## add distance- and covariate-based weights for the force of infection
## in a twinSIR model, see vignette("twinSIR") for a description
myEpi <- update(myEpi,
    f = list(
        household    = function(u) u == 0,
        nothousehold = function(u) u > 0
    ),
    w = list(
        c1 = function (CL.i, CL.j) CL.i == "1st class" & CL.j == CL.i,
        c2 = function (CL.i, CL.j) CL.i == "2nd class" & CL.j == CL.i
    )
)
## this is now identical to the prepared hagelloch "epidata"
stopifnot(all.equal(myEpi, hagelloch))

Spatio-Temporal Animation of an Epidemic

Description

Function for the animation of epidemic data, i.e. objects inheriting from class "epidata". This only works with 1- or 2-dimensional coordinates and is not useful if some individuals share the same coordinates (overlapping). There are two types of animation, see argument time.spacing. Besides the direct plotting in the R session, it is also possible to generate a sequence of graphics files to create animations outside R.

Usage

## S3 method for class 'summary.epidata'
animate(object, main = "An animation of the epidemic",
        pch = 19, col = c(3, 2, gray(0.6)), time.spacing = NULL,
        sleep = quote(5/.nTimes), legend.opts = list(), timer.opts = list(),
        end = NULL, generate.snapshots = NULL, ...)

## S3 method for class 'epidata'
animate(object, ...)

Arguments

object

an object inheriting from class "epidata" or "summary.epidata". In the former case, its summary is calculated and the function continues as in the latter case, passing all ... arguments to the summary.epidata method.

main

a main title for the plot, see also title.

pch, col

vectors of length 3 specifying the point symbols and colors for susceptible, infectious and removed individuals (in this order). The vectors are recycled if necessary. By default, susceptible individuals are marked as filled green circles, infectious individuals as filled red circles and removed individuals as filled gray circles. Note that the symbols are iteratively drawn (overlaid) in the same plotting region as time proceeds. For information about the possible values of pch and col, see the help pages of points and par, respectively.

time.spacing

time interval for the animation steps. If NULL (the default), the events are plotted one by one with pauses of sleep seconds. Thus, it is just the ordering of the events, which is shown. To plot the appearance of events proportionally to the exact time line, time.spacing can be set to a numeric value indicating the period of time between consecutive plots. Then, for each time point in seq(0, end, by = time.spacing) the current state of the epidemic can be seen and an additional timer indicates the current time (see timer.opts below). The argument sleep will be the artificial pause in seconds between two of those time points.

sleep

time in seconds to Sys.sleep before the next plotting event. By default, each artificial pause is of length 5/.nTimes seconds, where .nTimes is the number of events (infections and removals) of the epidemic, which is evaluated in the function body. Thus, for time.spacing = NULL the animation has a duration of approximately 5 seconds. In the other case, sleep is the duration of the artificial pause between two time points. Note that sleep is ignored on non-interactive devices (see dev.interactive).

legend.opts

either a list of arguments passed to the legend function or NULL (or NA), in which case no legend will be plotted. All necessary arguments have sensible defaults and need not be specified, i.e.

x:

"topright"

legend:

c("susceptible", "infectious", "removed")

pch:

same as argument pch of the main function

col:

same as argument col of the main function

timer.opts

either a list of arguments passed to the legend function or NULL (or NA), in which case no timer will be plotted. All necessary arguments have sensible defaults and need not be specified, i.e.

x:

"bottomright"

title:

"time"

box.lty:

0

adj:

c(0.5,0.5)

inset:

0.01

bg:

"white"

Note that the argument legend, which is the current time of the animation, cannot be modified.

end

ending time of the animation in case of time.spacing not being NULL. By default (NULL), time stops after the last event.

generate.snapshots

By default (NULL), the animation is not saved to image files but only shown on the on-screen device. In order to print to files, time.spacing must not be NULL, a screen device must be available, and there are two options:
If the framework of the animation package should be used, i.e. the animate-call is passed as the expr argument to one of the save* functions of the animation package, then set generate.snapshots = img.name, where img.name is the base name for the generated images (the same as passed to the save* function). The path and format (type, width, height) of the generated images are derived from ani.options. See the last example below.
Alternatively, generate.snapshots may be a list of arguments passed to the function dev.print, which then is executed at each time point of the grid defined by time.spacing. Essentially, this is used for saving the produced snapshots to files, e.g.

generate.snapshots = list(device=pdf, file=quote(paste("epidemic_",sprintf(form,tp),".pdf", sep="")))

will store the animation steps in pdf-files in the current working directory, where the file names each end with the time point represented by the corresponding plot. Because the variables tp and form are only defined inside the function, the file argument must be quoted. Alternatively, the file name could also make use of the internal plot index i, e.g., use file=quote(paste("epidemic",i,".pdf",sep="")).

...

further graphical parameters passed to the basic call of plot, e.g. las, cex.axis (etc.) and mgp.

Author(s)

Sebastian Meyer

See Also

summary.epidata for the data, on which the plot is based. plot.epidata for plotting the evolution of an epidemic by the numbers of susceptible, infectious and removed individuals.

The contributed R package animation.

Examples

data("hagelloch")
(s <- summary(hagelloch))

# plot the ordering of the events only
animate(s)   # or: animate(hagelloch)

# with timer (animate only up to t=10)
animate(s, time.spacing=0.1, end=10, sleep=0.01,
        legend.opts=list(x="topleft"))

# Such an animation can be saved in various ways using tools of
# the animation package, e.g., saveHTML()
if (interactive() && require("animation")) {
  oldwd <- setwd(tempdir())  # to not clutter up the current working dir
  saveHTML({
    par(bg="white")  # default "transparent" is grey in some browsers
    animate(s, time.spacing=1, sleep=0, legend.opts=list(x="topleft"),
            generate.snapshots="epiani")
  }, use.dev=FALSE, img.name="epiani", ani.width=600, interval=0.5)
  setwd(oldwd)
}

Impute Blocks for Extra Stops in "epidata" Objects

Description

This function modifies an object inheriting from class "epidata" such that it features the specified stop time points. For this purpose, the time interval in the event history into which the new stop falls will be split up into two parts, one block for the time period until the new stop – where no infection or removal occurs – and the other block for the time period from the new stop to the end of the original interval.
Its main application is to enable the use of knots in twinSIR which are not existing stop time points in the "epidata" object.

Usage

intersperse(epidata, stoptimes, verbose = FALSE)

Arguments

epidata

an object inheriting from class "epidata".

stoptimes

a numeric vector of time points inside the observation period of the epidata.

verbose

logical indicating if a txtProgressBar should be shown while inserting blocks for extra stoptimes.

Value

an object of the same class as epidata with additional time blocks for any new stoptimes.

Author(s)

Sebastian Meyer

See Also

as.epidata.epidataCS where this function is used.

Examples

data("hagelloch")
subset(hagelloch, start < 25 & stop > 25 & id %in% 9:13, select = 1:7)
# there is no "stop" time at 25, but we can add this extra stop
nrow(hagelloch)
moreStopsEpi <- intersperse(hagelloch, stoptimes = 25)
nrow(moreStopsEpi)
subset(moreStopsEpi, (stop == 25 | start == 25) & id %in% 9:13, select = 1:7)

Plotting the Evolution of an Epidemic

Description

Functions for plotting the evolution of epidemics. The plot methods for classes "epidata" and "summary.epidata" plot the numbers of susceptible, infectious and recovered (= removed) individuals by step functions along the time axis. The function stateplot shows individual state changes along the time axis.

Usage

## S3 method for class 'summary.epidata'
plot(x,
     lty = c(2, 1, 3), lwd = 2,
     col = c("#1B9E77", "#D95F02", "#7570B3"), col.hor = col, col.vert = col,
     xlab = "Time", ylab = "Number of individuals",
     xlim = NULL, ylim = NULL, legend.opts = list(), do.axis4 = NULL,
     panel.first = grid(), rug.opts = list(),
     which.rug = c("infections", "removals", "susceptibility", "all"), ...)
## S3 method for class 'epidata'
plot(x, ...)

stateplot(x, id, ...)

Arguments

x

an object inheriting from class "epidata" or "summary.epidata". In the former case, its summary is calculated and the function continues as in the latter case. The plot method for class "epidata" is a simple wrapper for plot.summary.epidata implemented as plot(summary(x, ...)).

lty, lwd

vectors of length 3 containing the line types and widths, respectively, for the numbers of susceptible, infectious and removed individuals (in this order). By default, all lines have width 2 and the line types are dashed (susceptible), solid (infectious) and dotted (removed), respectively. To omit the drawing of a specific line, just set the corresponding entry in lty to 0. The vectors are recycled if necessary. For information about the different lty and lwd codes, see the help pages of par.

col, col.hor, col.vert

vectors of length 3 containing the line colors for the numbers of susceptible, infectious and removed individuals (in this order). col.hor defines the color for the horizontal parts of the step function, whilst col.vert defines the color for its vertical parts. The argument col is just short for col.hor = col and col.vert = col. The default col vector corresponds to brewer.pal("Dark2",n=3) from the RColorBrewer package. The vectors are recycled if necessary. For information about the possible values of col, see the help pages of par.

xlab, ylab

axis labels, default to "Time" and "Number of individuals", respectively.

xlim, ylim

the x and y limits of the plot in the form c(xmin, xmax) and c(ymin, ymax), respectively. By default, these are chosen adequately to fit the time range of the epidemic and the number of individuals.

legend.opts

if this is a list (of arguments for the legend function), a legend will be plotted. The defaults are as follows:

x:

"topright"

inset:

c(0,0.02)

legend:

c("susceptible", "infectious", "removed")

lty,lwd,col:

same as the arguments lty, lwd, and col.hor of the main function

bty:

"n"

do.axis4

logical indicating if the final numbers of susceptible and removed individuals should be indicated on the right axis. The default NULL means TRUE, if x represents a SIR epidemic and FALSE otherwise, i.e. if the epidemic is SI, SIS or SIRS.

panel.first

an expression to be evaluated after the plot axes are set up but before any plotting takes place. By default, a standard grid is drawn.

rug.opts

either a list of arguments passed to the function rug or NULL (or NA), in which case no rug will be plotted. By default, the argument ticksize is set to 0.02, col is set to the color according to which.rug (black if this is "all"), and quiet is set to TRUE. Note that the argument x, which contains the locations for the rug, is fixed internally and cannot be modified. The argument which.rug (see below) determines the locations to mark.

which.rug

By default, tick marks are drawn at the time points of infections. Alternatively, one can choose to mark only "removals", "susceptibility" events (i.e. state changes from R to S) or "all" events.

id

single character string or factor of length 1 specifying the individual for which the stateplot should be established.

...

For plot.summary.epidata: further graphical parameters passed to plot, lines and axis, e.g. main, las, cex.axis (etc.) and mgp.
For plot.epidata: arguments passed to plot.summary.epidata.
For stateplot: arguments passed to plot.stepfun or plot.function (if id had no events during the observation period). By default, xlab="time", ylab="state", xlim=attr(x,"timeRange"), xaxs="i" and do.points=FALSE.

Value

plot.summary.epidata (and plot.epidata) invisibly returns the matrix used for plotting, which contains the evolution of the three counters.
stateplot invisibly returns the function which was plotted, typically of class "stepfun", but may be of class "function" if no events have been observed for the individual in question (in which case the function always returns the initial state). The vertical axis of stateplot can range from 1 to 3, where 1 corresponds to Susceptible, 2 to Infectious and 3 to Removed.

Author(s)

Sebastian Meyer

See Also

summary.epidata for the data, on which the plots are based. animate.epidata for the animation of epidemics.

Examples

data("hagelloch")
(s <- summary(hagelloch))

# rudimentary stateplot
stateplot(s, id = "187")

# evolution of the epidemic
plot(s)

Summarizing an Epidemic

Description

The summary method for class "epidata" gives an overview of the epidemic. Its print method shows the type of the epidemic, the time range, the total number of individuals, the initially and never infected individuals and the size of the epidemic. An excerpt of the returned counters data frame is also printed (see the Value section below).

Usage

## S3 method for class 'epidata'
summary(object, ...)

## S3 method for class 'summary.epidata'
print(x, ...)

Arguments

object

an object inheriting from class "epidata".

x

an object inheriting from class "summary.epidata", i.e. an object returned by the function summary.epidata.

...

unused (argument of the generic).

Value

A list with the following components:

type

character string. Compartmental type of the epidemic, i.e. one of "SIR", "SI", "SIS" or "SIRS".

size

integer. Size of the epidemic, i.e. the number of initially susceptible individuals that became infected during the course of the epidemic.

initiallyInfected

factor (with the same levels as the id column in the "epidata" object). Set of initially infected individuals.

neverInfected

factor (with the same levels as the id column in the "epidata" object). Set of never infected individuals, i.e. individuals that were neither initially infected nor infected during the course of the epidemic.

coordinates

numeric matrix of individual coordinates with as many rows as there are individuals and one column for each spatial dimension. The row names of the matrix are the ids of the individuals.

byID

data frame with time points of infection and optionally removal and re-susceptibility (depending on the type of the epidemic) ordered by id. If an event was not observed, the corresponding entry is missing.

counters

data frame containing all events (S, I and R) ordered by time. The columns are time, type (of event), corresponding id and the three counters nSusceptible, nInfectious and nRemoved. The first row additionally shows the counters at the beginning of the epidemic, where the type and id column contain missing values.

Author(s)

Sebastian Meyer

See Also

as.epidata for generating objects of class "epidata".

Examples

data("hagelloch")
s <- summary(hagelloch)
s          # uses the print method for summary.epidata
names(s)   # components of the list 's'

# positions of the individuals
plot(s$coordinates)

# events by id
head(s$byID)

Continuous Space-Time Marked Point Patterns with Grid-Based Covariates

Description

Data structure for continuous spatio-temporal event data, e.g. individual case reports of an infectious disease. Apart from the actual events, the class simultaneously holds a spatio-temporal grid of endemic covariates (similar to disease mapping) and a representation of the observation region.

The "epidataCS" class is the basis for fitting spatio-temporal endemic-epidemic intensity models with the function twinstim (Meyer et al., 2012). The implementation is described in Meyer et al. (2017, Section 3), see vignette("twinstim").

Usage

as.epidataCS(events, stgrid, W, qmatrix = diag(nTypes),
             nCircle2Poly = 32L, T = NULL,
             clipper = "polyclip", verbose = interactive())

## S3 method for class 'epidataCS'
print(x, n = 6L, digits = getOption("digits"), ...)

## S3 method for class 'epidataCS'
nobs(object, ...)
## S3 method for class 'epidataCS'
head(x, n = 6L, ...)
## S3 method for class 'epidataCS'
tail(x, n = 6L, ...)
## S3 method for class 'epidataCS'
x[i, j, ..., drop = TRUE]
## S3 method for class 'epidataCS'
subset(x, subset, select, drop = TRUE, ...)

## S3 method for class 'epidataCS'
marks(x, coords = TRUE, ...)

## S3 method for class 'epidataCS'
summary(object, ...)
## S3 method for class 'summary.epidataCS'
print(x, ...)

## S3 method for class 'epidataCS'
as.stepfun(x, ...)

getSourceDists(object, dimension = c("space", "time"))

Arguments

events

a "SpatialPointsDataFrame" of cases with the following obligatory columns (in the events@data data.frame):

time

time point of event. Will be converted to a numeric variable by as.numeric. There should be no concurrent events (but see untie for an ex post adjustment) and there cannot be events beyond stgrid (i.e., time <= T is required). Events at or before time t_0 = min(stgrid$start) are allowed and form the prehistory of the process.

tile

the spatial region (tile) where the event is located. This links to the tiles of stgrid.

type

optional type of event in a marked twinstim model. Will be converted to a factor variable dropping unused levels. If missing, all events will be attributed the single type "1".

eps.t

maximum temporal influence radius (e.g. length of infectious period, time to culling, etc.); must be positive and may be Inf.

eps.s

maximum spatial influence radius (e.g. 100 [km]); must be positive and may be Inf. A compact influence region mainly has computational advantages, but might also be plausible for specific applications.

The data.frame may contain columns with further marks of the events, e.g. sex, age of infected individuals, which may be used as epidemic covariates influencing infectiousness. Note that some auxiliary columns will be added at conversion whose names are reserved: ".obsInfLength", ".bdist", ".influenceRegion", and ".sources", as well as "start", "BLOCK", and all endemic covariates' names from stgrid.

stgrid

a data.frame describing endemic covariates on a full spatio-temporal region x interval grid (e.g., district x week), which is a decomposition of the observation region W and the observation period from t_0 to T. This means that for every combination of spatial region and time interval there must be exactly one row in this data.frame, that the union of the spatial tiles equals W, that the union of the time intervals equals the period from t_0 to T, and that regions (and intervals) are non-overlapping. There are the following obligatory columns:

tile

ID of the spatial region (e.g., district ID). It will be converted to a factor variable (dropping unused levels if it already was one).

start, stop

columns describing the consecutive temporal intervals (converted to numeric variables by as.numeric). The start time of an interval must be equal to the stop time of the previous interval. The stop column may be missing, in which case it will be auto-generated from the set of start values and T.

area

area of the spatial region (tile). Be aware that the unit of this area (e.g., square km) must be consistent with the units of W and events (as specified in their proj4strings).

The remaining columns are endemic covariates. Note that the column name "BLOCK" is reserved (a column which will be added automatically for indexing the time intervals of stgrid).

W

an object of class "SpatialPolygons" representing the observation region. It must have the same proj4string as events and all events must be within W. Prior simplification of W may considerably reduce the computational burden of likelihood evaluations in twinstim models with non-trivial spatial interaction functions (see the “Note” section below).

qmatrix

a square indicator matrix (0/1 or FALSE/TRUE) for possible transmission between the event types. The matrix will be internally converted to logical. Defaults to an independent spread of the event types, i.e. the identity matrix.

nCircle2Poly

accuracy (number of edges) of the polygonal approximation of a circle, see discpoly.

T

end of observation period (i.e. last stop time of stgrid). Must be specified if the start but not the stop times are supplied in stgrid (=> auto-generation of stop times).

clipper

polygon clipping engine to use for calculating the .influenceRegions of events (see the Value section below). Default is the polyclip package (called via intersect.owin from package spatstat.geom). In surveillance <= 1.6-0, package gpclib was used; this is no longer supported, neither is rgeos.

verbose

logical indicating if status messages should be printed during input checking and "epidataCS" generation. The default is to do so in interactive R sessions.

x

an object of class "epidataCS" or "summary.epidataCS", respectively.

n

a single integer. If positive, the first (head, print) / last (tail) n events are extracted. If negative, all but the n first/last events are extracted.

digits

minimum number of significant digits to be printed in values.

i, j, drop

arguments passed to the [-method for SpatialPointsDataFrames for subsetting the events while retaining stgrid and W.
If drop=TRUE (the default), event types that completely disappear due to i-subsetting will be dropped, which reduces qmatrix and the factor levels of the type column.
By the j index, epidemic covariates can be removed from events.

...

unused (arguments of the generics) with a few exceptions: The print method for "epidataCS" passes ... to the print.data.frame method, and the print method for "summary.epidataCS" passes additional arguments to print.table.

subset, select

arguments used to subset the events from an "epidataCS" object like in subset.data.frame.

coords

logical indicating if the data frame of event marks returned by marks(x) should have the event coordinates appended as last columns. This defaults to TRUE.

object

an object of class "epidataCS".

dimension

the distances of all events to their potential source events can be computed in either the "space" or "time" dimension.

Details

The function as.epidataCS is used to generate objects of class "epidataCS", which is the data structure required for twinstim models.

The [-method for class "epidataCS" ensures that the subsetted object will be valid, for instance, it updates the auxiliary list of potential transmission paths stored in the object. The [-method is used in subset.epidataCS, which is implemented similar to subset.data.frame.

The print method for "epidataCS" prints some metadata of the epidemic, e.g., the observation period, the dimensions of the spatio-temporal grid, the types of events, and the total number of events. By default, it also prints the first n = 6 rows of the events.

Value

An object of class "epidataCS" is a list containing the following components:

events

a "SpatialPointsDataFrame" (see the description of the argument). The input events are checked for requirements and sorted chronologically. The columns are in the following order: obligatory event columns, event marks, the columns BLOCK, start and endemic covariates copied from stgrid, and finally, hidden auxiliary columns. The added auxiliary columns are:

.obsInfLength

observed length of the infectious period (possibly truncated at T), i.e., pmin(T-time, eps.t).

.sources

a list of numeric vectors of potential sources of infection (wrt the interaction ranges eps.s and eps.t) for each event. Row numbers are used as index.

.bdist

minimal distance of the event locations to the polygonal boundary W.

.influenceRegion

a list of influence regions represented by objects of the spatstat.geom class "owin". For each event, this is the intersection of W with a (polygonal) circle of radius eps.s centered at the event's location, shifted such that the event location becomes the origin. The list has nCircle2Poly set as an attribute.

stgrid

a data.frame (see description of the argument). The spatio-temporal grid of endemic covariates is sorted by time interval (indexed by the added variable BLOCK) and region (tile). It is a full BLOCK x tile grid.

W

a "SpatialPolygons" object representing the observation region.

qmatrix

see the above description of the argument. The storage.mode of the indicator matrix is set to logical and the dimnames are set to the levels of the event types.

The nobs-method returns the number of events.

The head and tail methods subset the epidemic data using the extraction method ([), i.e. they return an object of class "epidataCS", which only contains (all but) the first/last n events.

For the "epidataCS" class, the method of the generic function marks defined by the spatstat.geom package returns a data.frame of the event marks (actually also including time and location of the events), disregarding endemic covariates and the auxiliary columns from the events component of the "epidataCS" object.

The summary method (which has again a print method) returns a list of metadata, event data, the tables of tiles and types, a step function of the number of infectious individuals over time ($counter), i.e., the result of the as.stepfun-method for "epidataCS", and the number of potential sources of transmission for each event ($nSources) which is based on the given maximum interaction ranges eps.t and eps.s.

Note

Since the observation region W defines the integration domain in the point process likelihood, the more detailed the polygons of W are the longer it will take to fit a twinstim. You are advised to sacrifice some shape details for speed by reducing the polygon complexity, for example via the mapshaper JavaScript library wrapped by the R package rmapshaper, or via simplify.owin.
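
For example, the polygon complexity could be reduced before constructing the "epidataCS" object; a sketch assuming a "SpatialPolygons" observation region W and an installed rmapshaper package (the keep proportion of 5% is an arbitrary choice for illustration):

if (requireNamespace("rmapshaper", quietly = TRUE)) {
  W_simple <- rmapshaper::ms_simplify(W, keep = 0.05)  # keep ~5% of the vertices
}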

Author(s)

Sebastian Meyer

Contributions to this documentation by Michael Höhle and Mayeul Kauffmann.

References

Meyer, S., Elias, J. and Höhle, M. (2012): A space-time conditional intensity model for invasive meningococcal disease occurrence. Biometrics, 68, 607-616. doi:10.1111/j.1541-0420.2011.01684.x

Meyer, S., Held, L. and Höhle, M. (2017): Spatio-temporal analysis of epidemic phenomena using the R package surveillance. Journal of Statistical Software, 77 (11), 1-55. doi:10.18637/jss.v077.i11

See Also

vignette("twinstim").

plot.epidataCS for plotting, and animate.epidataCS for the animation of such an epidemic. There is also an update method for the "epidataCS" class.

To re-extract the events point pattern from "epidataCS", use as(object, "SpatialPointsDataFrame").

It is possible to convert an "epidataCS" point pattern to an "epidata" object (as.epidata.epidataCS), or to aggregate the events into an "sts" object (epidataCS2sts).

Examples

## load "imdepi" example data (which is an object of class "epidataCS")
data("imdepi")

## print and summary
print(imdepi, n=5, digits=2)
print(s <- summary(imdepi))
plot(s$counter,  # same as 'as.stepfun(imdepi)'
     xlab = "Time [days]", ylab="Number of infectious individuals",
     main=paste("Time course of the number of infectious individuals",
                "assuming an infectious period of 30 days", sep="\n"))
plot(table(s$nSources), xlab="Number of \"close\" infective individuals",
     ylab="Number of events",
     main=paste("Distribution of the number of potential sources",
                "assuming an interaction range of 200 km and 30 days",
                sep="\n"))
## the summary object contains further information
str(s)

## a histogram of the spatial distances to potential source events
## (i.e., to events of the previous eps.t=30 days within eps.s=200 km)
sourceDists_space <- getSourceDists(imdepi, "space")
hist(sourceDists_space); rug(sourceDists_space)

## internal structure of an "epidataCS"-object
str(imdepi, max.level=4)
## see help("imdepi") for more info on the data set

## extraction methods subset the 'events' component
imdepi[101:200,]
head(imdepi, n=1)           # only first event
tail(imdepi, n=4)           # only last 4 events
subset(imdepi, type=="B")   # only events of type B

## see help("plot.epidataCS") for convenient plot-methods for "epidataCS"


###
### reconstruct the "imdepi" object
###

## observation region
load(system.file("shapes", "districtsD.RData", package="surveillance"),
     verbose = TRUE)

## extract point pattern of events from the "imdepi" data
## a) as a data frame with coordinate columns via marks()
eventsData <- marks(imdepi)
## b) as a Spatial object via the coerce-method
events <- as(imdepi, "SpatialPointsDataFrame")

## plot observation region with events
plot(stateD, axes=TRUE); title(xlab="x [km]", ylab="y [km]")
points(events, pch=unclass(events$type), cex=0.5, col=unclass(events$type))
legend("topright", legend=levels(events$type), title="Type", pch=1:2, col=1:2)

summary(events)

## space-time grid with endemic covariates
head(stgrid <- imdepi$stgrid[,-1])

## reconstruct the "imdepi" object from its components
myimdepi <- as.epidataCS(events = events, stgrid = stgrid,
                         W = stateD, qmatrix = diag(2), nCircle2Poly = 16)

## This reconstructed object should be equal to 'imdepi' as long as the internal
## structures of the embedded classes ("owin", "SpatialPolygons", ...), and
## the calculation of the influence regions by "polyclip" have not changed:
all.equal(imdepi, myimdepi)

Conversion (aggregation) of "epidataCS" to "epidata" or "sts"

Description

Continuous-time continuous-space epidemic data stored in an object of class "epidataCS" can be aggregated in space or in space and time yielding an object of class "epidata" or "sts" for use of twinSIR or hhh4 modelling, respectively.

Usage

## aggregation in space and time over 'stgrid' for use of 'hhh4' models
epidataCS2sts(object, freq, start, neighbourhood,
              tiles = NULL, popcol.stgrid = NULL, popdensity = TRUE)

## aggregation in space for use of 'twinSIR' models
## S3 method for class 'epidataCS'
as.epidata(data, tileCentroids, eps = 0.001, ...)

Arguments

object, data

an object of class "epidataCS".

freq, start

see the description of the "sts" class. The start specification should reflect the beginning of object$stgrid, i.e., the start of the first time interval.

neighbourhood

binary adjacency or neighbourhood-order matrix of the regions (tiles). If missing but tiles is given, a binary adjacency matrix will be auto-generated from tiles using functionality of the spdep package (see poly2adjmat). Since the "neighbourhood" slot in "sts" is actually optional, neighbourhood=NULL also works.
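
For instance, the adjacency matrix could also be pre-computed and passed explicitly; a sketch assuming the imdepi data and the districtsD polygons as loaded in the Examples below:

adjmat <- poly2adjmat(districtsD)  # binary adjacency matrix via spdep
imdsts <- epidataCS2sts(imdepi, freq = 12, start = c(2002, 1),
                        neighbourhood = adjmat, tiles = districtsD)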

tiles

object inheriting from "SpatialPolygons" representing the regions in object$stgrid (column "tile"). It will become the "map" slot of the resulting "sts" object. Its row.names must match levels(object$stgrid$tile). If neighbourhood is provided, tiles is optional (not required for hhh4, but for plots of the resulting "sts" object).

popcol.stgrid

single character or numeric value indexing the column in object$stgrid which contains the population data (counts or densities, depending on the popdensity argument). This will become the "populationFrac" slot (optional).

popdensity

logical indicating if the column referenced by popcol.stgrid contains population densities or absolute counts.

tileCentroids

a coordinate matrix of the region centroids (i.e., the result of coordinates(tiles)). Its row names must match levels(data$stgrid$tile). This will be the coordinates used for the “population” (i.e., the tiles from "epidataCS") in the discrete-space twinSIR modelling.

eps

numeric scalar for breaking tied removal and infection times between different individuals (tiles), which might occur during conversion from "epidataCS" to "epidata". Rather dumb, this is simply done by subtracting eps from each tied removal time. One should consider other ways of breaking the tied event times.

...

unused (argument of the generic).

Details

Conversion to "sts" only makes sense if the time intervals (BLOCKs) of the stgrid are regularly spaced (to give freq intervals per year). Note that events of the prehistory (not covered by stgrid) are not included in the resulting sts object.

Some comments on the conversion to "epidata": the conversion results in SIS epidemics only, i.e. the at-risk indicator is set to 1 immediately after recovery. A tile is considered infective if at least one individual within the tile is infective, otherwise it is susceptible. The lengths of the infectious periods are taken from data$events$eps.t. There will be no f columns in the resulting "epidata"; these must be generated by a subsequent call to as.epidata with the desired f.
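
For instance, a distance-based epidemic variable could be added afterwards via the update method for "epidata"; a sketch assuming the imdepidata object created in the Examples below (the 50 km cut-off is an arbitrary illustration):

imdepidata50 <- update(imdepidata,
                       f = list(local = function(u) u <= 50))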

Value

epidataCS2sts: an object of class "sts" representing the multivariate time-series of the number of cases aggregated over stgrid.

as.epidata.epidataCS: an object of class "epidata" representing an SIS epidemic in form of a multivariate point process (one for each region/tile).

Author(s)

Sebastian Meyer

See Also

epidata and twinSIR

The "sts" class and hhh4.

Examples

data("imdepi")
load(system.file("shapes", "districtsD.RData", package="surveillance"))

## convert imdepi point pattern into multivariate time series
imdsts <- epidataCS2sts(imdepi, freq = 12, start = c(2002, 1),
                        neighbourhood = NULL, # not needed here
                        tiles = districtsD)

## check the overall number of events by district
stopifnot(all.equal(colSums(observed(imdsts)),
                    c(table(imdepi$events$tile))))

## compare plots of monthly number of cases
opar <- par(mfrow = c(2, 1))
plot(imdepi, "time")
plot(imdsts, type = observed ~ time)
par(opar)

## plot number of cases by district in Bavaria (municipality keys 09xxx)
imd09 <- imdsts[, grep("^09", colnames(imdsts), value = TRUE), drop = TRUE]
plot(imd09, type = observed ~ unit)

## also test conversion to an SIS event history ("epidata") of the "tiles"
if (requireNamespace("intervals")) {
    imdepi_short <- subset(imdepi, time < 50)  # to reduce the runtime
    imdepi_short$stgrid <- subset(imdepi_short$stgrid, start < 50)
    imdepidata <- as.epidata(imdepi_short,
                             tileCentroids = coordinates(districtsD))
    summary(imdepidata)
}

Spatio-Temporal Animation of a Continuous-Time Continuous-Space Epidemic

Description

Function for the animation of continuous-time continuous-space epidemic data, i.e. objects inheriting from class "epidataCS". There are three types of animation, see argument time.spacing. Besides the on-screen plotting in the interactive R session, it is possible and recommended to redirect the animation to an off-screen graphics device using the contributed R package animation. For instance, the animation can be watched and navigated in a web browser via saveHTML (see Examples).

Usage

## S3 method for class 'epidataCS'
animate(object, interval = c(0,Inf), time.spacing = NULL,
        nmax = NULL, sleep = NULL, legend.opts = list(), timer.opts = list(),
        pch = 15:18, col.current = "red", col.I = "#C16E41",
        col.R = "#B3B3B3", col.influence = NULL,
        main = NULL, verbose = interactive(), ...)

Arguments

object

an object inheriting from class "epidataCS".

interval

time range of the animation.

time.spacing

time interval for the animation steps.
If NULL (the default), the events are plotted sequentially by producing a snapshot at every time point where an event occurred. Thus, it is just the ordering of the events, which is shown.
To plot the appearance of events proportionally to the exact time line, time.spacing can be set to a numeric value indicating the period of time between consecutive snapshots. Then, for each time point in seq(0, end, by = time.spacing) the current state of the epidemic can be seen and an additional timer indicates the current time (see timer.opts below).
If time.spacing = NA, then the time spacing is automatically determined in such a way that nmax snapshots result. In this case, nmax must be given a finite value.

nmax

maximum number of snapshots to generate. The default NULL means to take the value from ani.options("nmax") if the animation package is available, and no limitation (Inf) otherwise.

sleep

numeric scalar specifying the artificial pause in seconds between two time points (using Sys.sleep), or NULL (default), when this is taken from ani.options("interval") if the animation package is available, and set to 0.1 otherwise. Note that sleep is ignored on non-interactive devices (see dev.interactive), e.g., if generating an animation inside animation's saveHTML.

pch, col

vectors of length equal to the number of event types specifying the point symbols and colors for events to plot (in this order). The vectors are recycled if necessary.

legend.opts

either a list of arguments passed to the legend function or NULL (or NA), in which case no legend will be plotted. All necessary arguments have sensible defaults and need not be specified.

timer.opts

either a list of arguments passed to the legend function or NULL (or NA), in which case no timer will be plotted. All necessary arguments have sensible defaults and need not be specified, i.e.

x:

"bottomright"

title:

"time"

box.lty:

0

adj:

c(0.5,0.5)

inset:

0.01

bg:

"white"

Note that the argument legend, which is the current time of the animation, cannot be modified.

col.current

color of events when occurring (new).

col.I

color of events once infectious.

col.R

color of events once “recovered”. If NA, then recovered events will not be shown.

col.influence

color with which the influence region is drawn. Use NULL (default) if no influence regions should be drawn.

main

optional main title placed above the map.

verbose

logical specifying if a (textual) progress bar should be shown during snapshot generation. This is especially useful if the animation is produced within saveHTML or similar.

...

further graphical parameters passed to the plot method for "SpatialPolygons".

Author(s)

Sebastian Meyer with documentation contributions by Michael Höhle

See Also

plot.epidataCS for plotting the numbers of events by time (aggregated over space) or the locations of the events in the observation region W (aggregated over time).

The contributed R package animation.

Examples

data("imdepi")
imdepiB <- subset(imdepi, type == "B")

## Not run: 
# Animate the first year of type B with a step size of 7 days
animate(imdepiB, interval=c(0,365), time.spacing=7, nmax=Inf, sleep=0.1)

# Sequential animation of type B events during the first year
animate(imdepiB, interval=c(0,365), time.spacing=NULL, sleep=0.1)

# Animate the whole time range but with nmax=20 snapshots only
animate(imdepiB, time.spacing=NA, nmax=20, sleep=0.1)

## End(Not run)

# Such an animation can be saved in various ways using the tools of
# the animation package, e.g., saveHTML()
if (interactive() && require("animation")) {
  oldwd <- setwd(tempdir())  # to not clutter up the current working dir
  saveHTML(animate(imdepiB, interval = c(0,365), time.spacing = 7),
           nmax = Inf, interval = 0.2, loop = FALSE,
           title = "Animation of the first year of type B events")
  setwd(oldwd)
}

Randomly Permute Time Points or Locations of "epidataCS"

Description

Monte Carlo tests for space-time interaction (epitest) use the distribution of some test statistic under the null hypothesis of no space-time interaction. For this purpose, the function permute.epidataCS randomly permutes the time or space labels of the events.

Usage

permute.epidataCS(x, what = c("time", "space"), keep)

Arguments

x

an object of class "epidataCS".

what

character string determining what to permute: time points (default) or locations.

keep

optional logical expression to be evaluated in the context of x$events@data, determining for which events the time and location should be kept as is. For instance, to keep some “prehistory” before time point 30 unchanged, use keep = time <= 30.

Value

the permuted "epidataCS" object.

Author(s)

Sebastian Meyer

See Also

epitest

Examples

data("imdepi")

set.seed(3)
permepi <- permute.epidataCS(imdepi, what = "time", keep = time <= 30)

print(imdepi, n = 8)
print(permepi, n = 8)
## the first 6 events are kept (as are all row.names),
## the time labels of the remaining events are shuffled
## (and events then again sorted by time),
## the marginal temporal distribution is unchanged
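
## A hedged Monte Carlo sketch built on permute.epidataCS(): a Knox-type
## statistic counting event pairs close in both space and time. The helper
## name 'knoxStat' and the closeness thresholds (50 km, 30 days) are
## arbitrary illustrations; the package's actual test is epitest().
knoxStat <- function (x, ds = 50, dt = 30) {
    closeInSpace <- dist(coordinates(x$events)) < ds
    closeInTime <- dist(x$events$time) < dt
    sum(closeInSpace & closeInTime)  # number of close pairs
}
obsStat <- knoxStat(imdepi)
permStats <- replicate(19, knoxStat(permute.epidataCS(imdepi, what = "time")))
mean(c(obsStat, permStats) >= obsStat)  # permutation p-value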

Plotting the Events of an Epidemic over Time and Space

Description

The plot method for class "epidataCS" either plots the number of events along the time axis (epidataCSplot_time) as a hist(), or the locations of the events in the observation region W (epidataCSplot_space). The spatial plot can be enriched with tile-specific color levels to indicate attributes such as the population (using spplot).

Usage

## S3 method for class 'epidataCS'
plot(x, aggregate = c("time", "space"), subset, by = type, ...)

epidataCSplot_time(x, subset, by = type,
                   t0.Date = NULL, breaks = "stgrid", freq = TRUE,
                   col = rainbow(nTypes), cumulative = list(),
                   add = FALSE, mar = NULL, xlim = NULL, ylim = NULL,
                   xlab = "Time", ylab = NULL, main = NULL,
                   panel.first = abline(h=axTicks(2), lty=2, col="grey"),
                   legend.types = list(), ...)

epidataCSplot_space(x, subset, by = type, tiles = x$W, pop = NULL,
                    cex.fun = sqrt, points.args = list(), add = FALSE,
                    legend.types = list(), legend.counts = list(),
                    sp.layout = NULL, ...)

Arguments

x

an object of class "epidataCS".

aggregate

character, one of "time" and "space", referring to the specific plot functions epidataCSplot_time and epidataCSplot_space, respectively. For "time", the number of events over time is plotted as hist (or hist.Date). For "space", the observation region x$W (or the tiles) and the locations of the events therein are plotted.

subset

logical expression indicating a subset of events to consider for plotting: missing values are taken as false. Note that the expression is evaluated in the data frame of event marks (marks(x)), which means that column names can be referred to by name (like in subset.data.frame).

...

in the basic plot-method further arguments are passed to the aggregate-specific plot function. In epidataCSplot_time, further graphical parameters are passed to hist or hist.Date, respectively. In epidataCSplot_space, further arguments are passed to the plot-method for "SpatialPolygons", which draws tiles.

by

an expression evaluated in marks(x), defining how events should be stratified in the plot (the result is converted to a factor), or NULL to disregard event types. By default (by = type) the plot distinguishes between event types, i.e., the bars of the temporal plot are stacked by type, and the point colors in the spatial plot differ by type, respectively.
Note: to select specific event types for plotting use the subset argument, e.g., subset=(type=="B").

t0.Date

the beginning of the observation period t0 = x$stgrid$start[1] as a "Date" (or anything coercible by as.Date without further arguments), enabling a nice x-axis using hist.Date and sensible breaks of the histogram, e.g., breaks="months". The event times then equal t0.Date + as.integer(x$events$time - t0), i.e. possible fractional parts of the event times are removed (which ensures that using breaks = "months" or other automatic types always works).

breaks

a specification of the histogram break points, see hist (or hist.Date if t0.Date is used). The default value "stgrid" is special and means to use the temporal grid points with(x$stgrid, c(start[1L], unique.default(stop))) as breaks (or their "Date" equivalents).

freq

see hist, defaults to TRUE.

col

fill colour for the bars of the histogram, defaults to the vector of rainbow colours.

cumulative

if a list (of style options), lines for the cumulative number of events (per type) will be added to the plot. Possible options are axis (logical), lab (axis label), maxat (single integer affecting the axis range), lwd, col, and offset (a numeric vector of length the number of types).

add

logical (default: FALSE) indicating if the plot should be added to an existing window. Ignored if an spplot is created (if pop is non-NULL).

mar

see par. The default (NULL) is mar <- par("mar"), with mar[4] <- mar[2] if an axis is requested for the cumulative numbers.

xlim, ylim

NULL provides automatic axis limits.

xlab, ylab

axis labels (with sensible defaults).

main

main title of the plot (defaults to no title).

panel.first

expression that should be evaluated after the plotting window has been set up but before the histogram is plotted. Defaults to adding horizontal grid lines.

legend.types

if a list (of arguments for legend), a legend for the event types is added to the plot in case there is more than one type.

tiles

the observation region x$W (default) or, alternatively, a "SpatialPolygons" representation of the tiles of x$stgrid.

pop

if tiles is a "SpatialPolygonsDataFrame", pop can specify an attribute to be displayed in a levelplot behind the point pattern, see spplot. By default (NULL), the conventional graphics system is used to display the tiles and event locations, otherwise the result is a trellis.object.

cex.fun

function which takes a vector of counts of events at each unique location and returns a (vector of) cex value(s) for the sizes of the corresponding points. Defaults to the sqrt() function, which for the default circular pch=1 means that the area of each point is proportional to the number of events at its location.

points.args

a list of (type-specific) graphical parameters for points, specifically pch, lwd, and col, which are all recycled to give the length nlevels(x$events$type). In contrast, a possible cex element should be scalar (default: 0.5) and multiplies the sizes obtained from cex.fun.

legend.counts

if a list (of arguments for legend), a legend illustrating the effect of cex.fun is added to the plot. This list may contain a special element counts, which is an integer vector specifying the counts to illustrate.

sp.layout

optional list of additional layout items in case pop is non-NULL, see spplot.

Value

For aggregate="time" (i.e., epidataCSplot_time) the data of the histogram (as returned by hist), and for aggregate="space" (i.e., epidataCSplot_space) NULL, invisibly, or the trellis.object generated by spplot (if pop is non-NULL).

Author(s)

Sebastian Meyer

See Also

animate.epidataCS

Examples

data("imdepi")

## show the occurrence of events along time
plot(imdepi, "time", main = "Histogram of event time points")
plot(imdepi, "time", by = NULL, main = "Aggregated over both event types")

## show the distribution in space
plot(imdepi, "space", lwd = 2, col = "lavender")

## with the district-specific population density in the background,
## a scale bar, and customized point style
load(system.file("shapes", "districtsD.RData", package = "surveillance"))
districtsD$log10popdens <- log10(districtsD$POPULATION/districtsD$AREA)
keylabels <- (c(1,2,5) * rep(10^(1:3), each=3))[-1]
plot(imdepi, "space", tiles = districtsD, pop = "log10popdens",
     ## modify point style for better visibility on gray background
     points.args = list(pch=c(1,3), col=c("orangered","blue"), lwd=2),
     ## metric scale bar, see proj4string(imdepi$W)
     sp.layout = layout.scalebar(imdepi$W, scale=100, labels=c("0","100 km")),
     ## gray scale for the population density and white borders
     col.regions = gray.colors(100, start=0.9, end=0.1), col = "white",
     ## color key is equidistant on log10(popdens) scale
     at = seq(1.3, 3.7, by=0.05),
     colorkey = list(labels=list(at=log10(keylabels), labels=keylabels),
                     title=expression("Population density per " * km^2)))

Update method for "epidataCS"

Description

The update method for the "epidataCS" class may be used to modify the hyperparameters ϵ\epsilon (eps.t) and δ\delta (eps.s), the indicator matrix qmatrix determining possible transmission between the event types, the numerical accuracy nCircle2Poly of the polygonal approximation, and the endemic covariates from stgrid (including the time intervals). The update method will also update the auxiliary information contained in an "epidataCS" object accordingly, e.g., the vector of potential sources of each event, the influence regions, or the endemic covariates copied from the new stgrid.

Usage

## S3 method for class 'epidataCS'
update(object, eps.t, eps.s, qmatrix, nCircle2Poly, stgrid, ...)

Arguments

object

an object of class "epidataCS".

eps.t

numeric vector of length 1 or corresponding to the number of events in object$events. The event data column eps.t specifies the maximum temporal influence radius (e.g., length of infectious period, time to culling, etc.) of the events.

eps.s

numeric vector of length 1 or corresponding to the number of events in object$events. The event data column eps.s specifies the maximum spatial influence radius of the events.

qmatrix

square indicator matrix (0/1 or TRUE/FALSE) for possible transmission between the event types.

nCircle2Poly

accuracy (number of edges) of the polygonal approximation of a circle.

stgrid

a new data.frame with endemic covariates, possibly transformed from or adding to the original object$stgrid. The grid must cover the same regions as the original, i.e., levels(object$stgrid$tile) must remain identical. See epidataCS for a detailed description of the required format.

...

unused (argument of the generic).

Value

The updated "epidataCS" object.

Author(s)

Sebastian Meyer

See Also

class "epidataCS".

Examples

data("imdepi")

## assume different interaction ranges and simplify polygons
imdepi2 <- update(imdepi, eps.t = 20, eps.s = Inf, nCircle2Poly = 16)
    
(s <- summary(imdepi))
(s2 <- summary(imdepi2))
## The update reduced the number of infectives (along time)
## because the length of the infectious periods is reduced. It also 
## changed the set of potential sources of transmission for each
## event, since the interaction is shorter in time but wider in space
## (eps.s=Inf means interaction over the whole observation region).

## use a time-constant grid
imdepi3 <- update(imdepi, stgrid = subset(imdepi$stgrid, BLOCK == 1, -stop))
(s3 <- summary(imdepi3)) # "1 time block"
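
## a hedged sketch of updating the transmission structure: forbid
## transmission between different event types via a diagonal qmatrix
## (dimnames are set to the type levels, assuming they must match)
q <- diag(2)
dimnames(q) <- rep(list(levels(imdepi$events$type)), 2)
imdepi4 <- update(imdepi, qmatrix = q)
summary(imdepi4)  # fewer potential sources per event than in 'imdepi'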

Fan Plot of Forecast Distributions

Description

The fanplot() function in surveillance wraps functionality of the dedicated fanplot package, employing a different default style and optionally adding point predictions and observed values.

Usage

fanplot(quantiles, probs, means = NULL, observed = NULL, start = 1,
        fan.args = list(), means.args = list(), observed.args = list(),
        key.args = NULL, xlim = NULL, ylim = NULL, log = "",
        xlab = "Time", ylab = "No. infected", add = FALSE, ...)

Arguments

quantiles

a time x probs matrix of forecast quantiles at each time point.

probs

numeric vector of probabilities with values between 0 and 1.

means

(optional) numeric vector of point forecasts.

observed

(optional) numeric vector of observed values.

start

time index (x-coordinate) of the first prediction.

fan.args

a list of graphical parameters for the fan, e.g., to employ a different colorRampPalette as fan.col, or to enable contour lines via ln.

means.args

a list of graphical parameters for lines to modify the plotting style of the means. The default is a white line within the fan.

observed.args

a list of graphical parameters for lines to modify the plotting style of the observed values.

key.args

if a list, a color key (in fan()'s "boxfan"-style) is added to the fan chart. The list may include positioning parameters start (the x-position) and ylim (the y-range of the color key), space to modify the width of the boxfan, and rlab to modify the labels. An alternative way of labeling the quantiles is via the argument ln in fan.args.

xlim, ylim

axis ranges.

log

a character string specifying which axes are to be logarithmic, e.g., log="y" (see plot.default).

xlab, ylab

axis labels.

add

logical indicating if the fan plot should be added to an existing plot.

...

further arguments are passed to plot.default. For instance, panel.first could be used to initialize the plot with grid(nx=NA, ny=NULL) lines.

Value

NULL (invisibly), with the side effect of drawing a fan chart.

Author(s)

Sebastian Meyer

See Also

the underlying fan function in package fanplot. The function is used in plot.oneStepAhead and plot.hhh4sims.

Examples

## artificial data example to illustrate the graphical options
if (requireNamespace("fanplot")) {
    means <- c(18, 19, 20, 25, 26, 35, 34, 25, 19)
    y <- rlnorm(length(means), log(means), 0.5)
    quantiles <- sapply(1:99/100, qlnorm, log(means), seq(.5,.8,length.out=length(means)))
    
    ## default style with point predictions, color key and log-scale
    fanplot(quantiles = quantiles, probs = 1:99/100, means = means,
            observed = y, key.args = list(start = 1, space = .3), log = "y")
    
    ## with contour lines instead of a key, and different colors
    pal <- colorRampPalette(c("darkgreen", "gray93"))
    fanplot(quantiles = quantiles, probs = 1:99/100, observed = y,
            fan.args = list(fan.col = pal, ln = c(5,10,25,50,75,90,95)/100),
            observed.args = list(type = "b", pch = 19))
}

Surveillance for Univariate Count Time Series Using an Improved Farrington Method

Description

The function takes range values of the surveillance time series sts and for each time point uses a Poisson GLM with overdispersion to predict an upper bound on the number of counts according to the procedure by Farrington et al. (1996) and by Noufaily et al. (2012). This bound is then compared to the observed number of counts. If the observation is above the bound, then an alarm is raised. The implementation is illustrated in Salmon et al. (2016).

Usage

farringtonFlexible(sts, control = list(
    range = NULL, b = 5, w = 3,
    reweight = TRUE, weightsThreshold = 2.58,
    verbose = FALSE, glmWarnings = TRUE,
    alpha = 0.05, trend = TRUE, pThresholdTrend = 0.05,
    limit54 = c(5,4), powertrans = "2/3",
    fitFun = "algo.farrington.fitGLM.flexible",
    populationOffset = FALSE,
    noPeriods = 1, pastWeeksNotIncluded = NULL,
    thresholdMethod = "delta"))

Arguments

sts

object of class sts (including the observed and the state time series)

control

Control object given as a list containing the following components:

range

Specifies the index of all timepoints which should be tested. If range is NULL all possible timepoints are used.

b

How many years back in time to include when forming the base counts.

w

Window's half-size, i.e. number of weeks to include before and after the current week in each year.

reweight

Boolean specifying whether to perform reweighting step.

weightsThreshold

Defines the threshold for reweighting past outbreaks using the Anscombe residuals (1 in the original method, 2.58 advised in the improved method).

verbose

Boolean specifying whether to show extra debugging information.

glmWarnings

Boolean specifying whether to print warnings from the call to glm.

alpha

An approximate one-sided (1-\alpha)\cdot 100\% prediction interval is calculated, unlike the original method, where a two-sided interval was used. The upper limit of this interval, i.e., the (1-\alpha)\cdot 100\% quantile, serves as the upper bound.

trend

Boolean indicating whether a trend should be included and kept in case the conditions in the Farrington et al. paper are met (see the results). If FALSE, no trend is fitted.

pThresholdTrend

Threshold for deciding whether to keep the trend in the model (0.05 in the original method, 1 advised in the improved method).

limit54

Vector containing two numbers: cases and period. To avoid alarms in situations where the time series has almost no cases in the specific week, the algorithm uses the following heuristic criterion (see Section 3.8 of the Farrington paper) to protect against low counts: no alarm is sounded if fewer than cases=5 reports were received in the past period=4 weeks. limit54=c(cases,period) is a vector allowing the user to change these numbers. Note: As of version 0.9-7 of the package, the "last" period of weeks includes the current week; otherwise no alarm would be sounded for extremely large counts if the four weeks before that were too low.

powertrans

Power transformation to apply to the data if the threshold is to be computed with the method described in Farrington et al. (1996). Use either "2/3" for skewness correction (default), "1/2" for a variance-stabilizing transformation, or "none" for no transformation.

fitFun

String containing the name of the fit function to be used for fitting the GLM. The only current option is "algo.farrington.fitGLM.flexible".

populationOffset

Boolean specifying whether to include a population offset in the GLM. The slot sts@population gives the population vector.

noPeriods

Number of levels in the factor allowing the use of more baseline data. If equal to 1, no factor variable is created; the set of reference values is then defined as in Farrington et al. (1996).

pastWeeksNotIncluded

Number of past weeks to ignore in the calculation. The default (NULL) means to use the value of control$w. Setting pastWeeksNotIncluded=26 might be preferable (Noufaily et al., 2012).

thresholdMethod

Method to be used to derive the upperbound. Options are "delta" for the method described in Farrington et al. (1996), "nbPlugin" for the method described in Noufaily et al. (2012), and "muan" for the method extended from Noufaily et al. (2012).

Details

The following steps are performed according to the Farrington et al. (1996) paper.

  1. Fit of the initial model with intercept, time trend if trend is TRUE, seasonal factor variable if noPeriods is greater than 1, and population offset if populationOffset is TRUE. Initial estimation of mean and overdispersion.

  2. Calculation of the weights omega (correction for past outbreaks) if reweighting is TRUE. The threshold for reweighting is defined in control.

  3. Refitting of the model

  4. Revised estimation of overdispersion

  5. Omission of the trend, if it is not significant

  6. Repetition of the whole procedure

  7. Calculation of the threshold value using the model to compute a quantile of the predictive distribution. The method used depends on thresholdMethod, which can be one of:

    "delta"

    One assumes that the prediction error (or a transformation of the prediction error, depending on powertrans) is normally distributed. The threshold is deduced from a quantile of this normal distribution using the variance and the estimate of the expected count given by the GLM, and the delta rule. The procedure takes into account both the estimation error (variance of the estimator of the expected count in the GLM) and the prediction error (variance of the prediction error). This is the suggestion in Farrington et al. (1996).

    "nbPlugin"

    One assumes that the new count follows a negative binomial distribution parameterized by the expected count and the overdispersion estimated in the GLM. The threshold is deduced from a quantile of this discrete distribution. This approach disregards the estimation error, though. This method was used in Noufaily et al. (2012); see the numeric sketch at the end of this section.

    "muan"

    One also uses the assumption of the negative binomial sampling distribution, but does not plug in the estimate of the expected count from the GLM; instead, one uses a quantile from the asymptotic normal distribution of the expected count estimated in the GLM, in order to take into account both the estimation error and the prediction error.

  8. Computation of exceedance score

Warning: monthly data containing the last day of each month as date should be analysed with epochAsDate=FALSE in the sts object. Otherwise February makes it impossible to find some reference time points.
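
For illustration, here is a hedged numeric sketch of the "nbPlugin" threshold under assumed GLM estimates (the values of mu, phi and alpha below are made up): given an expected count mu and overdispersion phi (so that the variance is phi*mu), the upper bound is the (1-alpha) quantile of the corresponding negative binomial distribution.

mu <- 15; phi <- 2; alpha <- 0.05
qnbinom(1 - alpha, size = mu / (phi - 1), mu = mu)  # plug-in upper bound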

Value

An object of class sts with the slots upperbound and alarm filled by appropriate output of the algorithm. The control slot of the input sts is amended with the following matrix elements, all with length(range) rows:

trend

Booleans indicating whether a time trend was fitted for this time point.

trendVector

coefficient of the time trend in the GLM for this time point. If no trend was fitted it is equal to NA.

pvalue

probability of observing a value at least as large as the observation, under the null hypothesis.

expected

expectation of the predictive distribution for each timepoint. It is only reported if the conditions for raising an alarm are met (enough cases).

mu0Vector

input for the negative binomial distribution to get the upperbound as a quantile (either a plug-in from the GLM or a quantile from the asymptotic normal distribution of the estimator)

phiVector

overdispersion of the GLM at each timepoint.

Author(s)

M. Salmon, M. Höhle

References

Farrington, C.P., Andrews, N.J, Beale A.D. and Catchpole, M.A. (1996): A statistical algorithm for the early detection of outbreaks of infectious disease. J. R. Statist. Soc. A, 159, 547-563.

Noufaily, A., Enki, D.G., Farrington, C.P., Garthwaite, P., Andrews, N.J., Charlett, A. (2012): An improved algorithm for outbreak detection in multiple surveillance systems. Statistics in Medicine, 32 (7), 1206-1222.

Salmon, M., Schumacher, D. and Höhle, M. (2016): Monitoring count time series in R: Aberration detection in public health surveillance. Journal of Statistical Software, 70 (10), 1-35. doi:10.18637/jss.v070.i10

See Also

algo.farrington.fitGLM, algo.farrington.threshold

Examples

data("salmonella.agona")
# Create the corresponding sts object from the old disProg object
salm <- disProg2sts(salmonella.agona)

### RUN THE ALGORITHMS WITH TWO DIFFERENT SETS OF OPTIONS
control1 <-  list(range=282:312,
                  noPeriods=1,
                  b=4, w=3, weightsThreshold=1,
                  pastWeeksNotIncluded=3,
                  pThresholdTrend=0.05,
                  alpha=0.1)
control2 <- list(range=282:312,
                 noPeriods=10,
                 b=4, w=3, weightsThreshold=2.58,
                 pastWeeksNotIncluded=26,
                 pThresholdTrend=1,
                 alpha=0.1)
salm1 <- farringtonFlexible(salm,control=control1)
salm2 <- farringtonFlexible(salm,control=control2)

### PLOT THE RESULTS
y.max <- max(upperbound(salm1),observed(salm1),upperbound(salm2),na.rm=TRUE)
plot(salm1, ylim=c(0,y.max), main='S. Agona in Germany', legend.opts=NULL)
lines(1:(nrow(salm1)+1)-0.5,
      c(upperbound(salm1),upperbound(salm1)[nrow(salm1)]),
      type="s",col='tomato4',lwd=2)
lines(1:(nrow(salm2)+1)-0.5,
      c(upperbound(salm2),upperbound(salm2)[nrow(salm2)]),
      type="s",col="blueviolet",lwd=2)
legend("topleft",
       legend=c('Alarm','Upperbound with old options',
                'Upperbound with new options'),
       pch=c(24,NA,NA),lty=c(NA,1,1),
       bg="white",lwd=c(2,2,2),col=c('red','tomato4',"blueviolet"))

Determine the k and h values in a standard normal setting

Description

Given a specification of the average run length in the (a)cceptance and (r)ejection setting, determine the k and h values in a standard normal setting.

Usage

find.kh(ARLa = 500, ARLr = 7, sided = "one", method = "BFGS", verbose=FALSE)

Arguments

ARLa

average run length in the acceptance setting, i.e., the in-control state. Specifies the expected number of observations before a false alarm.

ARLr

average run length in the rejection setting, i.e., the out-of-control state. Specifies the expected number of observations before an increase is detected (i.e., the detection delay).

sided

one-sided CUSUM scheme

method

Which method to use in the function optim. The standard choice is BFGS, but in some situations Nelder-Mead can be advantageous.

verbose

gives extra information about the root-finding process

Details

Functions from the spc package are used in a simple univariate root finding problem.

Value

Returns a list with reference value k and decision interval h.

Examples

if (requireNamespace("spc")) {
    find.kh(ARLa=500,ARLr=7,sided="one")
    find.kh(ARLa=500,ARLr=3,sided="one")
}

Find decision interval for given in-control ARL and reference value

Description

Function to find a decision interval h^* for a given reference value k and desired ARL \gamma, so that the average run length of a Poisson or binomial CUSUM with in-control parameter \theta_0, reference value k and decision interval h^* is approximately \gamma, i.e., \Big| \frac{ARL(h^*) - \gamma}{\gamma} \Big| < \epsilon, or larger, i.e., ARL(h^*) > \gamma.

Usage

findH(ARL0, theta0, s = 1, rel.tol = 0.03, roundK = TRUE,
      distr = c("poisson", "binomial"), digits = 1, FIR = FALSE, ...)

hValues(theta0, ARL0, rel.tol=0.02, s = 1, roundK = TRUE, digits = 1,
        distr = c("poisson", "binomial"), FIR = FALSE, ...)

Arguments

ARL0

desired in-control ARL \gamma

theta0

in-control parameter \theta_0

s

change to detect, see details

distr

"poisson" or "binomial"

rel.tol

relative tolerance, i.e., the search for h^* is stopped if \Big| \frac{ARL(h^*) - \gamma}{\gamma} \Big| < rel.tol

digits

the reference value k and the decision interval h are rounded to digits decimal places

roundK

passed to findK

FIR

if TRUE, the decision interval that leads to the desired ARL for a FIR CUSUM with head start \frac{h}{2} is returned

...

further arguments for the distribution function, e.g., the number of trials n for the binomial CDF

Details

The out-of-control parameter used to determine the reference value k is specified as:

\theta_1 = \lambda_0 + s \sqrt{\lambda_0}

for a Poisson variate X \sim Po(\lambda)

\theta_1 = \frac{s \pi_0}{1 + (s-1) \pi_0}

for a binomial variate X \sim Bin(n, \pi)

Value

findH returns a vector and hValues returns a matrix with elements

theta0

in-control parameter

h

decision interval

k

reference value

ARL

ARL for a CUSUM with parameters k and h

rel.tol

corresponds to \Big| \frac{ARL(h) - \gamma}{\gamma} \Big|
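
For illustration, a hedged usage sketch (parameter values are arbitrary): find the decision interval of a Poisson CUSUM with in-control mean 5, designed to detect a one-standard-deviation increase (s = 1) at an in-control ARL of 500.

findH(ARL0 = 500, theta0 = 5, s = 1, distr = "poisson")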


Find Reference Value

Description

Calculates the reference value k for a Poisson or binomial CUSUM designed to detect a shift from \theta_0 to \theta_1.

Usage

findK(theta0, theta1, distr = c("poisson", "binomial"),
      roundK = FALSE, digits = 1, ...)

Arguments

theta0

in-control parameter

theta1

out-of-control parameter

distr

"poisson" or "binomial"

digits

the reference value k is rounded to digits decimal places

roundK

For discrete data and a rational reference value, there is only a limited set of possible values that the CUSUM can take (and therefore also only a limited set of ARLs). If roundK=TRUE, integer multiples of 0.5 are avoided when rounding the reference value k, i.e., the CUSUM can take more values.

...

further arguments for the distribution function, e.g., the number of trials n for the binomial CDF.

Value

Returns reference value k.
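
For illustration, a hedged sketch (arbitrary parameter values): the reference value of a Poisson CUSUM designed to detect a shift of the mean from 5 to 8, which for the Poisson case equals (\theta_1 - \theta_0) / (\log\theta_1 - \log\theta_0) before rounding.

findK(theta0 = 5, theta1 = 8, distr = "poisson")
(8 - 5) / (log(8) - log(5))  # unrounded value, approximately 6.38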


Influenza in Southern Germany

Description

Weekly number of influenza A & B cases in the 140 districts of the two Southern German states Bavaria and Baden-Wuerttemberg, for the years 2001 to 2008. These surveillance data have been analyzed originally by Paul and Held (2011) and more recently by Meyer and Held (2014).

Usage

data(fluBYBW)

Format

An sts object containing 416 × 140 observations starting from week 1 in 2001.

The population slot contains the population fractions of each district as of 31 December 2001, obtained from the Federal Statistical Office of Germany.

The map slot contains an object of class "SpatialPolygonsDataFrame".

Note

Prior to surveillance version 1.6-0, data(fluBYBW) contained a redundant last row (417) filled with zeroes only.

Source

Robert Koch-Institut: SurvStat: https://survstat.rki.de/; Queried on 6 March 2009.

References

Paul, M. and Held, L. (2011) Predictive assessment of a non-linear random effects model for multivariate time series of infectious disease counts. Statistics in Medicine, 30, 1118-1136.

Meyer, S. and Held, L. (2014): Power-law models for infectious disease spread. The Annals of Applied Statistics, 8 (3), 1612-1639. doi:10.1214/14-AOAS743

Examples

data("fluBYBW")

# Count time series plot
plot(fluBYBW, type = observed ~ time)

# Map of disease incidence (per 100000 inhabitants) for the year 2001
plot(fluBYBW, type = observed ~ unit, tps = 1:52, total.args = list(),
     population = fluBYBW@map$X31_12_01 / 100000)
# the overall rate for 2001 shown in the bottom right corner is
sum(observed(fluBYBW[1:52,])) / sum(fluBYBW@map$X31_12_01) * 100000

## Not run: 
# Generating an animation takes a while.
# Here we take the first 20 weeks of 2001 (runtime: ~3 minutes).
# The full animation is available in Supplement A of Meyer and Held (2014)
if (require("animation")) {
    oldwd <- setwd(tempdir())  # to not clutter up the current working dir
    saveHTML(animate(fluBYBW, tps = 1:20),
             title="Evolution of influenza in Bayern and Baden-Wuerttemberg",
             ani.width=500, ani.height=600)
    setwd(oldwd)
}

## End(Not run)

Convert Dates to Character (Including Quarter Strings)

Description

An extension of format.Date with additional formatting strings for quarters. Used by linelist2sts.

Usage

formatDate(x, format)

Arguments

x

a "Date" object.

format

a character string, see strftime for possible specifications. Further to these base formats, formatDate implements:

"%Q"

the quarter as a numeric

"%OQ"

the quarter as a roman numeral

"%q"

the day within the quarter

Value

a character vector representing the input date(s) x following the format specification.

See Also

strftime

Examples

formatDate(as.Date("2021-10-13"), "%G/%OQ/%q")

Pretty p-Value Formatting

Description

Just yapf – yet another p-value formatter...

It is a wrapper around format.pval, such that by default eps = 1e-4, scientific = FALSE, digits = if (p<10*eps) 1 else 2, and nsmall = 2.

Usage

formatPval(pv, eps = 1e-4, scientific = FALSE, ...)

Arguments

pv

a numeric vector (of p-values).

eps

a numerical tolerance, see format.pval.

scientific

see format.

...

further arguments passed to format.pval (but digits and nsmall are hard-coded internally).

Value

The character vector of formatted p-values.

Examples

formatPval(c(0.9, 0.13567, 0.0432, 0.000546, 1e-8))

Fit an Endemic-Only twinstim as a Poisson-glm

Description

An endemic-only twinstim is equivalent to a Poisson regression model for the aggregated number of events, Y_{[t][\bm{s}],k}, by time-space-type cell. The rate of the corresponding Poisson distribution is e_{[t][\bm{s}]} \cdot \lambda([t],[\bm{s}],k), where e_{[t][\bm{s}]} = |[t]| |[\bm{s}]| is a multiplicative offset. Thus, the glm function can be used to fit an endemic-only twinstim. However, wrapping in glm is usually slower.

Usage

glm_epidataCS(formula, data, ...)

Arguments

formula

an endemic model formula without response, comprising variables of data$stgrid and possibly the variable type for a type-specific model.

data

an object of class "epidataCS".

...

arguments passed to glm. Note that family and offset are fixed internally.

Value

a fitted glm object.

Author(s)

Sebastian Meyer

Examples

data("imdepi", "imdepifit")

## Fit an endemic-only twinstim() and an equivalent model wrapped in glm()
fit_twinstim <- update(imdepifit, epidemic = ~0, siaf = NULL, subset = NULL,
                       optim.args=list(control=list(trace=0)), verbose=FALSE)
fit_glm <- glm_epidataCS(formula(fit_twinstim)$endemic, data = imdepi)

## Compare the coefficients
cbind(twinstim = coef(fit_twinstim), glm = coef(fit_glm))


### also compare to an equivalent endemic-only hhh4() fit

## first need to aggregate imdepi into an "sts" object
load(system.file("shapes", "districtsD.RData", package="surveillance"))
imdsts <- epidataCS2sts(imdepi, freq = 12, start = c(2002, 1),
                        neighbourhood = NULL, tiles = districtsD,
                        popcol.stgrid = "popdensity")

## determine the correct offset to get an equivalent model
offset <- 2 * rep(with(subset(imdepi$stgrid, !duplicated(BLOCK)),
                  stop - start), ncol(imdsts)) *
          sum(districtsD$POPULATION) * population(imdsts)

## fit the model using hhh4()
fit_hhh4 <- hhh4(imdsts, control = list(
    end = list(
        f = addSeason2formula(~I(start/365-3.5), period=365, timevar="start"),
        offset = offset
    ), family = "Poisson", subset = 1:nrow(imdsts),
    data = list(start=with(subset(imdepi$stgrid, !duplicated(BLOCK)), start))))

summary(fit_hhh4)
stopifnot(all.equal(coef(fit_hhh4), coef(fit_glm), check.attributes=FALSE))

Hepatitis A in Berlin

Description

Number of Hepatitis A cases among adult (age>18) males in Berlin, 2001-2006. An increase is seen during 2006.

Usage

data("ha")
data("ha.sts")

Format

ha is a disProg object containing 290 × 12 observations starting from week 1 in 2001 to week 30 in 2006. ha.sts was generated from ha via the converter function disProg2sts and includes a map of Berlin's districts.

Source

Robert Koch-Institut: SurvStat: https://survstat.rki.de/; Queried on 25 August 2006.

Robert Koch Institut, Epidemiologisches Bulletin 33/2006, p.290.

Examples

## deprecated "disProg" object
data("ha")
ha
plot(aggregate(ha))

## new-style "sts" object
data("ha.sts")
ha.sts
plot(ha.sts, type = observed ~ time)  # = plot(aggregate(ha.sts, by = "unit"))
plot(ha.sts, type = observed ~ unit, labels = TRUE)

1861 Measles Epidemic in the City of Hagelloch, Germany

Description

Data on the 188 cases in the measles outbreak among children in the German city of Hagelloch (near Tübingen) in 1861. The data were originally collected by Dr. Albert Pfeilsticker (1863) and augmented and re-analysed by Dr. Heike Oesterle (1992). This dataset is used to illustrate the twinSIR model class in vignette("twinSIR").

Usage

data("hagelloch")

Format

Loading data("hagelloch") gives two objects: hagelloch and hagelloch.df. The latter is the original data.frame of 188 rows with individual information for each infected child. hagelloch has been generated from hagelloch.df via as.epidata (see the Examples below) to obtain an "epidata" object for use with twinSIR. It contains the entire SIR event history of the outbreak (but not all of the covariates).

The covariate information in hagelloch.df is as follows:

PN:

patient number

NAME:

patient name (as a factor)

FN:

family index

HN:

house number

AGE:

age in years

SEX:

gender of the individual (factor: male, female)

PRO:

Date of prodromes

ERU:

Date of rash

CL:

class (factor: preschool, 1st class, 2nd class)

DEAD:

Date of death (with missings)

IFTO:

number of patient who is the putative source of infection (0 = unknown)

SI:

serial interval = number of days between dates of prodromes of infection source and infected person

C:

complications (factor: no complications, bronchopneumonia, severe bronchitis, lobar pneumonia, pseudocroup, cerebral edema)

PR:

duration of prodromes in days

CA:

number of cases in family

NI:

number of initial cases

GE:

generation number of the case

TD:

day of max. fever (days after rash)

TM:

max. fever (degree Celsius)

x.loc:

x coordinate of house (in meters). Scaling in metres is obtained by multiplying the original coordinates by 2.5 (see details in Neal and Roberts (2004))

y.loc:

y coordinate of house (in meters). See also the above description of x.loc.

tPRO:

Time of prodromes (first symptoms) in days after the start of the epidemic (30 Oct 1861).

tERU:

Time upon which the rash first appears.

tDEAD:

Time of death, if available.

tR:

Time at which the infectious period of the individual is assumed to end. This unknown time is calculated as

tR_i = \min(tDEAD_i, tERU_i + d_0),

where – as in Section 3.1 of Neal and Roberts (2004) – we use d_0 = 3.

tI:

Time at which the individual is assumed to become infectious. Actually this time is unknown, but we use

tI_i = tPRO_i - d_1,

where d_1 = 1 as in Neal and Roberts (2004).

The time variables describe the transitions of the individual in a Susceptible-Infectious-Recovered (SIR) model. Note that in order to avoid ties in the event times resulting from daily interval censoring, the times have been jittered uniformly within the respective day. The time point 0.5 would correspond to noon of 30 Oct 1861.

The hagelloch "epidata" object only retains some of the above covariates to save space. Apart from the usual "epidata" event columns, hagelloch contains a number of extra variables representing distance- and covariate-based weights for the force of infection:

household:

the number of currently infectious children in the same household (including the child itself if it is currently infectious).

nothousehold:

the number of currently infectious children outside the household.

c1, c2:

the number of children infectious during the respective time block and being members of class 1 and 2, respectively; the value is 0 if the individual of the row is not a member of the respective class.

Such epidemic covariates can be computed by specifying suitable f and w arguments in as.epidata at conversion (see the code below), or at a later step via the update method for "epidata".

Source

Thanks to Peter J. Neal, University of Manchester, for providing us with these data, which he in turn obtained from Niels Becker, Australian National University. To cite the data, the main references are Pfeilsticker (1863) and Oesterle (1992).

References

Pfeilsticker, A. (1863). Beiträge zur Pathologie der Masern mit besonderer Berücksichtigung der statistischen Verhältnisse, M.D. Thesis, Eberhard-Karls-Universität Tübingen. Available as https://archive.org/details/beitrgezurpatho00pfeigoog.

Oesterle, H. (1992). Statistische Reanalyse einer Masernepidemie 1861 in Hagelloch, M.D. Thesis, Eberhard-Karls-Universität Tübingen.

Neal, P. J. and Roberts, G. O. (2004). Statistical inference and model selection for the 1861 Hagelloch measles epidemic. Biostatistics, 5 (2), 249-261.

See Also

data class: epidata

point process model: twinSIR

illustration with hagelloch: vignette("twinSIR")

Examples

data("hagelloch")
head(hagelloch.df)   # original data documented in Oesterle (1992)
head(as.data.frame(hagelloch))   # "epidata" event history format

## How the "epidata" 'hagelloch' was created from 'hagelloch.df'
stopifnot(all.equal(hagelloch,
  as.epidata(
    hagelloch.df, t0 = 0, tI.col = "tI", tR.col = "tR",
    id.col = "PN", coords.cols = c("x.loc", "y.loc"),
    f = list(
        household    = function(u) u == 0,
        nothousehold = function(u) u > 0
    ),
    w = list(
        c1 = function (CL.i, CL.j) CL.i == "1st class" & CL.j == CL.i,
        c2 = function (CL.i, CL.j) CL.i == "2nd class" & CL.j == CL.i
    ),
    keep.cols = c("SEX", "AGE", "CL"))
))


### Basic plots produced from hagelloch.df

# Show case locations as in Neal & Roberts (different scaling) using
# the data.frame (promoted to a SpatialPointsDataFrame)
coordinates(hagelloch.df) <- c("x.loc","y.loc")
plot(hagelloch.df, xlab="x [m]", ylab="y [m]", pch=15, axes=TRUE,
     cex=sqrt(multiplicity(hagelloch.df)))

# Epicurve
hist(as.numeric(hagelloch.df$tI), xlab="Time (days)", ylab="Cases", main="")


### "epidata" summary and plot methods

(s <- summary(hagelloch))
head(s$byID)
plot(s)

## Not run: 
  # Show a dynamic illustration of the spread of the infection
  animate(hagelloch, time.spacing=0.1, sleep=1/100,
          legend.opts=list(x="topleft"))

## End(Not run)

Hepatitis A in Germany

Description

Weekly number of reported hepatitis A infections in Germany 2001-2004.

Usage

data(hepatitisA)

Format

A disProg object containing 208 × 1 observations starting from week 1 in 2001 to week 52 in 2004.

Source

Robert Koch-Institut: SurvStat: https://survstat.rki.de/; Queried on 11-01-2005.

Examples

data(hepatitisA)
plot(hepatitisA)

Fitting HHH Models with Random Effects and Neighbourhood Structure

Description

Fits an autoregressive Poisson or negative binomial model to a univariate or multivariate time series of counts. The characteristic feature of hhh4 models is the additive decomposition of the conditional mean into epidemic and endemic components (Held et al, 2005). Log-linear predictors of covariates and random intercepts are allowed in all components; see the Details below. A general introduction to the hhh4 modelling approach and its implementation is given in the vignette("hhh4"). Meyer et al (2017, Section 5, available as vignette("hhh4_spacetime")) describe hhh4 models for areal time series of infectious disease counts.

Usage

hhh4(stsObj,
     control = list(
         ar = list(f = ~ -1, offset = 1, lag = 1),
         ne = list(f = ~ -1, offset = 1, lag = 1,
                   weights = neighbourhood(stsObj) == 1,
                   scale = NULL, normalize = FALSE),
         end = list(f = ~ 1, offset = 1),
         family = c("Poisson", "NegBin1", "NegBinM"),
         subset = 2:nrow(stsObj),
         optimizer = list(stop = list(tol=1e-5, niter=100),
                          regression = list(method="nlminb"),
                          variance = list(method="nlminb")),
         verbose = FALSE,
         start = list(fixed=NULL, random=NULL, sd.corr=NULL),
         data = list(t = stsObj@epoch - min(stsObj@epoch)),
         keep.terms = FALSE
     ),
     check.analyticals = FALSE)

Arguments

stsObj

object of class "sts" containing the (multivariate) count data time series.

control

a list containing the model specification and control arguments:

ar

Model for the autoregressive component given as list with the following components:

f = ~ -1

a formula specifying \log(\lambda_{it})

offset = 1

optional multiplicative offset, either 1 or a matrix of the same dimension as observed(stsObj)

lag = 1

a positive integer meaning autoregression on y_{i,t-lag}

ne

Model for the neighbour-driven component given as list with the following components:

f = ~ -1

a formula specifying \log(\phi_{it})

offset = 1

optional multiplicative offset, either 1 or a matrix of the same dimension as observed(stsObj)

lag = 1

a non-negative integer meaning dependency on y_{j,t-lag}

weights = neighbourhood(stsObj) == 1

neighbourhood weights w_{ji}. The default corresponds to the original formulation by Held et al (2005), i.e., the spatio-temporal component incorporates an unweighted sum over the lagged cases of the first-order neighbours. See Paul et al (2008) and Meyer and Held (2014) for alternative specifications, e.g., W_powerlaw. Time-varying weights are possible by specifying an array of dim() c(nUnits, nUnits, nTime), where nUnits=ncol(stsObj) and nTime=nrow(stsObj).

scale = NULL

optional matrix of the same dimensions as weights (or a vector of length ncol(stsObj)) to scale the weights to scale * weights.

normalize = FALSE

logical indicating if the (scaled) weights should be normalized such that each row sums to 1.

end

Model for the endemic component given as list with the following components

f = ~ 1

a formula specifying \log(\nu_{it})

offset = 1

optional multiplicative offset e_{it}, either 1 or a matrix of the same dimension as observed(stsObj)

family

Distributional family – either "Poisson", or the Negative Binomial distribution. For the latter, the overdispersion parameter can be assumed to be the same for all units ("NegBin1"), to vary freely over all units ("NegBinM"), or to be shared by some units (specified by a factor of length ncol(stsObj) such that its number of levels determines the number of overdispersion parameters). Note that "NegBinM" is equivalent to factor(colnames(stsObj), levels = colnames(stsObj)).

subset

Typically 2:nrow(stsObj) if the model contains autoregression.

optimizer

a list of three lists of control arguments.

The "stop" list specifies two criteria for the outer optimization of regression and variance parameters: the relative tolerance for parameter change using the criterion max(abs(x[i+1]-x[i])) / max(abs(x[i])), and the maximum number niter of outer iterations.

Control arguments for the individual optimizers are specified in the lists named "regression" and "variance". method="nlminb" is the default optimizer for both (taking advantage of the analytical Fisher information matrices), but the methods from optim may also be specified (as well as "nlm", which is not recommended here). Especially for the variance updates, Nelder-Mead optimization (method="Nelder-Mead") is an attractive alternative. All other elements of these two lists are passed as control arguments to the chosen method, e.g., if method="nlminb", adding iter.max=50 increases the maximum number of inner iterations from 20 (default) to 50. For method="Nelder-Mead", the respective argument is called maxit and defaults to 500. A sketch of a customized optimizer specification is given after this argument list.

verbose

non-negative integer (usually in the range 0:3) specifying the amount of tracing information to be output during optimization.

start

a list of initial parameter values replacing initial values set via fe and ri. Since surveillance 1.8-2, named vectors are matched against the coefficient names in the model (where unmatched start values are silently ignored), and need not be complete, e.g., start = list(fixed = c("-log(overdisp)" = 0.5)) (default: 2) for a family = "NegBin1" model. In contrast, an unnamed start vector must specify the full set of parameters as used by the model.

data

a named list of covariates that are to be included as fixed effects (see fe) in any of the 3 component formulae. By default, the time variable t is available and used for seasonal effects created by addSeason2formula. In general, covariates in this list can be either vectors of length nrow(stsObj) interpreted as time-varying but common across all units, or matrices of the same dimension as the disease counts observed(stsObj).

keep.terms

logical indicating if the terms object used in the fit is to be kept as part of the returned object. This is usually not necessary, since the terms object is reconstructed by the terms-method for class "hhh4" if necessary (based on stsObj and control, which are both part of the returned "hhh4" object).

The auxiliary function makeControl might be useful to create such a list of control parameters.

check.analyticals

logical (or a subset of c("numDeriv", "maxLik")), indicating if (how) the implemented analytical score vector and Fisher information matrix should be checked against numerical derivatives at the parameter starting values, using the packages numDeriv and/or maxLik. If activated, hhh4 will return a list containing the analytical and numerical derivatives for comparison (no ML estimation will be performed). This is mainly intended for internal use by the package developers.
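
As referenced in the optimizer description above, a hedged sketch of a customized optimizer specification within the control list (the iteration limits are arbitrary; the variance settings only take effect if the model contains random effects):

control <- list(
    end = list(f = ~1), family = "NegBin1",
    optimizer = list(stop = list(tol = 1e-5, niter = 100),
                     regression = list(method = "nlminb", iter.max = 50),
                     variance = list(method = "Nelder-Mead", maxit = 1000)))
## fit <- hhh4(stsObj, control)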

Details

An endemic-epidemic multivariate time-series model for infectious disease counts Y_{it} from units i = 1, \dots, I during periods t = 1, \dots, T was proposed by Held et al (2005) and was later extended in a series of papers (Paul et al, 2008; Paul and Held, 2011; Held and Paul, 2012; Meyer and Held, 2014). In its most general formulation, this so-called hhh4 (or HHH or H^3 or triple-H) model assumes that, conditional on past observations, Y_{it} has a Poisson or negative binomial distribution with mean

\mu_{it} = \lambda_{it} y_{i,t-1} + \phi_{it} \sum_{j \neq i} w_{ji} y_{j,t-1} + e_{it} \nu_{it}

In the case of a negative binomial model, the conditional variance is \mu_{it}(1 + \psi_i \mu_{it}) with overdispersion parameters \psi_i > 0 (possibly shared across different units, e.g., \psi_i \equiv \psi). Univariate time series of counts Y_t are supported as well, in which case hhh4 can be regarded as an extension of glm.nb to account for autoregression. See the Examples below for a comparison of an endemic-only hhh4 model with a corresponding glm.nb.

The three unknown quantities of the mean \mu_{it},

  • \lambda_{it} in the autoregressive (ar) component,

  • \phi_{it} in the neighbour-driven (ne) component, and

  • \nu_{it} in the endemic (end) component,

are log-linear predictors incorporating time-/unit-specific covariates. They may also contain unit-specific random intercepts as proposed by Paul and Held (2011). The endemic mean is usually modelled proportional to a unit-specific offset e_{it} (e.g., population numbers or fractions); it is possible to include such multiplicative offsets in the epidemic components as well. The w_{ji} are transmission weights reflecting the flow of infections from unit j to unit i. If weights vary over time (prespecified as a 3-dimensional array (w_{jit})), the ne sum in the mean uses w_{jit} y_{j,t-1}. In spatial hhh4 applications, the “units” refer to geographical regions and the weights could be derived from movement network data. Alternatively, the weights w_{ji} can be estimated parametrically as a function of adjacency order (Meyer and Held, 2014), see W_powerlaw and the sketch below.
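
As a hedged sketch of such a parametric weight specification, the ne component of the control list could look as follows (assuming neighbourhood(stsObj) contains adjacency orders, cf. nbOrder; maxlag = 5 is an arbitrary choice):

ne = list(f = ~1, weights = W_powerlaw(maxlag = 5))  # within the control list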

(Penalized) Likelihood inference for such hhh4 models has been established by Paul and Held (2011) with extensions for parametric neighbourhood weights by Meyer and Held (2014). Supplied with the analytical score function and Fisher information, the function hhh4 by default uses the quasi-Newton algorithm available through nlminb to maximize the log-likelihood. Convergence is usually fast even for a large number of parameters. If the model contains random effects, the penalized and marginal log-likelihoods are maximized alternately until convergence.

Value

hhh4 returns an object of class "hhh4", which is a list containing the following components:

coefficients

named vector with estimated (regression) parameters of the model

se

estimated standard errors (for regression parameters)

cov

covariance matrix (for regression parameters)

Sigma

estimated variance-covariance matrix of random effects

Sigma.orig

estimated variance parameters on internal scale used for optimization

Sigma.cov

inverse of marginal Fisher information (on internal scale), i.e., the asymptotic covariance matrix of Sigma.orig

call

the matched call

dim

vector with number of fixed and random effects in the model

loglikelihood

(penalized) loglikelihood evaluated at the MLE

margll

(approximate) log marginal likelihood should the model contain random effects

convergence

logical. Did optimizer converge?

fitted.values

fitted mean values \mu_{it}

control

control object of the fit

terms

the terms object used in the fit if keep.terms = TRUE and NULL otherwise

stsObj

the supplied stsObj

lags

named integer vector of length two containing the lags used for the epidemic components "ar" and "ne", respectively. The corresponding lag is NA if the component was not included in the model.

nObs

number of observations used for fitting the model

nTime

number of time points used for fitting the model

nUnit

number of units (e.g. areas) used for fitting the model

runtime

the proc.time-queried time taken to fit the model, i.e., a named numeric vector of length 5 of class "proc_time"

Author(s)

Michaela Paul, Sebastian Meyer, Leonhard Held

References

Held, L., Höhle, M. and Hofmann, M. (2005): A statistical framework for the analysis of multivariate infectious disease surveillance counts. Statistical Modelling, 5 (3), 187-199. doi:10.1191/1471082X05st098oa

Paul, M., Held, L. and Toschke, A. M. (2008): Multivariate modelling of infectious disease surveillance data. Statistics in Medicine, 27 (29), 6250-6267. doi:10.1002/sim.3440

Paul, M. and Held, L. (2011): Predictive assessment of a non-linear random effects model for multivariate time series of infectious disease counts. Statistics in Medicine, 30 (10), 1118-1136. doi:10.1002/sim.4177

Held, L. and Paul, M. (2012): Modeling seasonality in space-time infectious disease surveillance data. Biometrical Journal, 54 (6), 824-843. doi:10.1002/bimj.201200037

Meyer, S. and Held, L. (2014): Power-law models for infectious disease spread. The Annals of Applied Statistics, 8 (3), 1612-1639. doi:10.1214/14-AOAS743

Meyer, S., Held, L. and Höhle, M. (2017): Spatio-temporal analysis of epidemic phenomena using the R package surveillance. Journal of Statistical Software, 77 (11), 1-55. doi:10.18637/jss.v077.i11

See Also

See the special functions fe, ri and the examples below for how to specify unit-specific effects.

Further details on the modelling approach and illustrations of its implementation can be found in vignette("hhh4") and vignette("hhh4_spacetime").

Examples

######################
## Univariate examples
######################

### weekly counts of salmonella agona cases, UK, 1990-1995

data("salmonella.agona")
## convert old "disProg" to new "sts" data class
salmonella <- disProg2sts(salmonella.agona)
salmonella
plot(salmonella)

## generate formula for an (endemic) time trend and seasonality
f.end <- addSeason2formula(f = ~1 + t, S = 1, period = 52)
f.end
## specify a simple autoregressive negative binomial model
model1 <- list(ar = list(f = ~1), end = list(f = f.end), family = "NegBin1")
## fit this model to the data
res <- hhh4(salmonella, model1)
## summarize the model fit
summary(res, idx2Exp=1, amplitudeShift=TRUE, maxEV=TRUE)
plot(res)
plot(res, type = "season", components = "end")


### weekly counts of meningococcal infections, Germany, 2001-2006

data("influMen")
fluMen <- disProg2sts(influMen)
meningo <- fluMen[