Title: | Temporal and Spatio-Temporal Modeling and Monitoring of Epidemic Phenomena |
Description: | Statistical methods for the modeling and monitoring of time series of counts, proportions and categorical data, as well as for the modeling of continuous-time point processes of epidemic phenomena. The monitoring methods focus on aberration detection in count data time series from public health surveillance of communicable diseases, but applications could just as well originate from environmetrics, reliability engineering, econometrics, or social sciences. The package implements many typical outbreak detection procedures such as the (improved) Farrington algorithm, or the negative binomial GLR-CUSUM method of Hoehle and Paul (2008) <doi:10.1016/j.csda.2008.02.015>. A novel CUSUM approach combining logistic and multinomial logistic modeling is also included. The package contains several real-world data sets, the ability to simulate outbreak data, and to visualize the results of the monitoring in a temporal, spatial or spatio-temporal fashion. A recent overview of the available monitoring procedures is given by Salmon et al. (2016) <doi:10.18637/jss.v070.i10>. For the retrospective analysis of epidemic spread, the package provides three endemic-epidemic modeling frameworks with tools for visualization, likelihood inference, and simulation. hhh4() estimates models for (multivariate) count time series following Paul and Held (2011) <doi:10.1002/sim.4177> and Meyer and Held (2014) <doi:10.1214/14-AOAS743>. twinSIR() models the susceptible-infectious-recovered (SIR) event history of a fixed population, e.g., epidemics across farms or networks, as a multivariate point process as proposed by Hoehle (2009) <doi:10.1002/bimj.200900050>. twinstim() estimates self-exciting point process models for a spatio-temporal point pattern of infective events, e.g., time-stamped geo-referenced surveillance data, as proposed by Meyer et al. (2012) <doi:10.1111/j.1541-0420.2011.01684.x>. A recent overview of the implemented space-time modeling frameworks for epidemic phenomena is given by Meyer et al. (2017) <doi:10.18637/jss.v077.i11>. |
Authors: | Michael Hoehle [aut, ths] |
Maintainer: | Sebastian Meyer <[email protected]> |
License: | GPL-2 |
Version: | 1.24.1.9000 |
Built: | 2025-02-08 06:25:36 UTC |
Source: | https://github.com/r-forge/surveillance |
The R package surveillance implements statistical methods for the retrospective modeling and prospective monitoring of epidemic phenomena in temporal and spatio-temporal contexts. Focus is on (routinely collected) public health surveillance data, but the methods just as well apply to data from environmetrics, econometrics or the social sciences. As many of the monitoring methods rely on statistical process control methodology, the package is also relevant to quality control and reliability engineering.
The package implements many typical outbreak detection procedures such as those of Stroup et al. (1989), Farrington et al. (1996), Rossi et al. (1999) and Rogerson and Yamada (2001), a Bayesian approach (Höhle, 2007), negative binomial CUSUM methods (Höhle and Mazick, 2009), and a detector based on generalized likelihood ratios (Höhle and Paul, 2008); see wrap.algo.
CUSUMs for the prospective change-point detection in binomial, beta-binomial and multinomial time series, based on generalized linear modeling, are also covered; see categoricalCUSUM. This includes, e.g., the paired comparison Bradley-Terry modeling described in Höhle (2010) and the paired binary CUSUM (pairedbinCUSUM) described by Steiner et al. (1999).
The package contains several real-world datasets, the ability to simulate outbreak data, and functions to visualize the results of the monitoring in a temporal, spatial or spatio-temporal fashion. In dealing with time series data, the fundamental data structure of the package is the S4 class sts wrapping observations, monitoring results and date handling for multivariate time series.
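As a quick illustration of this data structure, a univariate "sts" object can be created directly from a vector of counts; the following minimal sketch uses simulated Poisson counts (the start date and frequency are illustrative assumptions):

library("surveillance")
set.seed(1)
# two years of weekly counts, here simply Poisson noise
mysts <- sts(observed = rpois(104, lambda = 5),
             frequency = 52, start = c(2020, 1))
plot(mysts)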
A recent overview of the available monitoring procedures is
given by Salmon et al. (2016).
For the retrospective analysis of epidemic spread, the package
provides three endemic-epidemic modeling frameworks with
tools for visualization, likelihood inference, and simulation.
The function hhh4 offers inference methods for the (multivariate) count time series models of Held et al. (2005), Paul et al. (2008), Paul and Held (2011), Held and Paul (2012), and Meyer and Held (2014). See vignette("hhh4") for a general introduction and vignette("hhh4_spacetime") for a discussion and illustration of spatial hhh4 models.
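A minimal sketch of such an analysis, using the measlesWeserEms data shipped with the package (the chosen model components are illustrative only, not a recommended model):

library("surveillance")
data("measlesWeserEms")   # weekly measles counts in 17 districts
fit <- hhh4(measlesWeserEms, control = list(
    end = list(f = addSeason2formula(~1, period = 52)),  # endemic component
    ar  = list(f = ~1),                                  # autoregressive component
    family = "NegBin1"))
summary(fit)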
Self-exciting point processes are modeled through endemic-epidemic
conditional intensity functions.
twinSIR (Höhle, 2009) models the susceptible-infectious-recovered (SIR) event history of a fixed population, e.g., epidemics across farms or networks; see vignette("twinSIR") for an illustration.
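For a first impression, a simple twinSIR fit to the classic Hagelloch measles data could look as follows (an illustrative sketch with a single household transmission effect; see vignette("twinSIR") for a proper analysis):

library("surveillance")
data("hagelloch")   # "epidata" of the 1861 measles epidemic in Hagelloch
fit <- twinSIR(~household, data = hagelloch)
summary(fit)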
twinstim (Meyer et al., 2012) fits spatio-temporal point process models to point patterns of infective events, e.g., time-stamped geo-referenced surveillance data on infectious disease occurrence; see vignette("twinstim") for an illustration.
A recent overview of the implemented space-time modeling frameworks
for epidemic phenomena is given by Meyer et al. (2017).
Substantial contributions of code by: Leonhard Held, Howard Burkom, Thais Correa, Mathias Hofmann, Christian Lang, Juliane Manitz, Sophie Reichert, Andrea Riebler, Daniel Sabanes Bove, Maelle Salmon, Dirk Schumacher, Stefan Steiner, Mikko Virtanen, Wei Wei, Valentin Wimmer.
Furthermore, the authors would like to thank the following people for ideas, discussions, testing and feedback: Doris Altmann, Johannes Bracher, Caterina De Bacco, Johannes Dreesman, Johannes Elias, Marc Geilhufe, Jim Hester, Kurt Hornik, Mayeul Kauffmann, Junyi Lu, Lore Merdrignac, Tim Pollington, Marcos Prates, André Victor Ribeiro Amaral, Brian D. Ripley, François Rousseu, Barry Rowlingson, Christopher W. Ryan, Klaus Stark, Yann Le Strat, André Michael Toschke, Wei Wei, George Wood, Achim Zeileis, Bing Zhang.
Michael Hoehle, Sebastian Meyer, Michaela Paul
Maintainer: Sebastian Meyer [email protected]
citation(package="surveillance")
gives the two main software
references for the modeling (Meyer et al., 2017) and the monitoring
(Salmon et al., 2016) functionalities:
Meyer S, Held L, Höhle M (2017). “Spatio-Temporal Analysis of Epidemic Phenomena Using the R Package surveillance.” Journal of Statistical Software, 77(11), 1–55. doi:10.18637/jss.v077.i11.
Salmon M, Schumacher D, Höhle M (2016). “Monitoring Count Time Series in R: Aberration Detection in Public Health Surveillance.” Journal of Statistical Software, 70(10), 1–35. doi:10.18637/jss.v070.i10.
Further references are listed in surveillance:::REFERENCES
.
If you use the surveillance package in your own work, please do cite the corresponding publications.
https://surveillance.R-forge.R-project.org/
## Additional documentation and illustrations of the methods are
## available in the form of package vignettes and demo scripts:
vignette(package = "surveillance")
demo(package = "surveillance")
A synthetic dataset from the Danish meat inspection – useful for illustrating the beta-binomial CUSUM.
data(abattoir)
The object of class "sts"
contains an artificial data set
inspired by meat inspection data used by Danish Pig Production,
Denmark. For each week the number of pigs with positive audit reports
is recorded together with the total number of audits made that week.
Höhle, M. (2010): Online change-point detection in categorical time series. In: T. Kneib and G. Tutz (Eds.), Statistical Modelling and Regression Structures, Physica-Verlag.
data("abattoir") plot(abattoir) population(abattoir)
data("abattoir") plot(abattoir) population(abattoir)
"sts"
Objects
Add a nicely formatted x-axis to time series plots related to the
"sts"
class. This utility function is, e.g., used
by stsplot_time1
and plotHHH4_fitted1
.
addFormattedXAxis(x, epochsAsDate = FALSE,
                  xaxis.tickFreq = list("%Q" = atChange),
                  xaxis.labelFreq = xaxis.tickFreq,
                  xaxis.labelFormat = "%G\n\n%OQ", ...)
x | an object of class "sts".
epochsAsDate | a logical indicating if the old (FALSE) or the new (TRUE) and more flexible implementation based on Date epochs should be used.
xaxis.labelFormat, xaxis.tickFreq, xaxis.labelFreq | see the details below.
... | further arguments passed to axis.
The setting epochsAsDate = TRUE enables very flexible formatting of the x-axis and its annotations using the xaxis.tickFreq, xaxis.labelFreq and xaxis.labelFormat arguments. The first two are named lists: each name is a single strftime conversion specification and each element is a function which, based on this conversion, returns a subset of the rows of the sts object. The subsetting function has the header function(x, xm1), where x is a vector containing the result of applying the conversion in the name to the epochs of the sts object and xm1 is the scalar result when applying the conversion to the natural element just before the first epoch. Please note that the input to the subsetting function is converted using as.numeric before calling the function; hence, the conversion specification needs to result in a string convertible to integer.
Three predefined subsetting functions exist: atChange, at2ndChange and atMedian. They are used to make a tick at each change (each 2nd change for at2ndChange) and at the median index of all epochs having the same conversion value, respectively:
atChange <- function(x,xm1) which(diff(c(xm1,x)) != 0)
at2ndChange <- function(x,xm1) which(diff(c(xm1,x) %/% 2) != 0)
atMedian <- function(x,xm1) tapply(seq_along(x), INDEX=x, quantile, prob=0.5, type=3)
By defining own functions here, one can obtain an arbitrary degree of flexibility.
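To see what such a subsetting function does, consider a hand-constructed input; the values mimic what as.numeric applied to a "%m" conversion of the epochs would yield:

# one tick wherever the month index changes; xm1 is the month
# just before the first epoch (December = 12 here)
months <- c(1, 1, 1, 2, 2, 3, 3, 3)
atChange(months, xm1 = 12)   # -> 1 4 6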
Finally, xaxis.labelFormat is a strftime-compatible formatting string; e.g. the default value is "%G\n\n%OQ", which means ISO year and quarter (in Roman letters) stacked on top of each other.
NULL
(invisibly). The function is called for its side effects.
Michael Höhle with contributions by Sebastian Meyer
the examples in stsplot_time1 and plotHHH4_fitted1
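A minimal usage sketch (assuming an "sts" object x that was created with Date-based epochs; the tick and label specifications are illustrative, and the arguments are passed through by the "sts" plot methods):

plot(x, type = observed ~ time,
     xaxis.tickFreq    = list("%Q" = atChange),  # tick at each new quarter
     xaxis.labelFreq   = list("%G" = atChange),  # label at each new ISO year
     xaxis.labelFormat = "%G")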
This function helps to construct a formula
object that
can be used in a call to hhh4
to model
seasonal variation via a sum of sine and cosine terms.
addSeason2formula(f = ~1, S = 1, period = 52, timevar = "t")
f | formula that the seasonal terms should be added to, defaults to an intercept-only model ~1.
S | number of sine and cosine terms. If S is a vector, unit-specific seasonal terms are created (for multivariate time series).
period | period of the season, defaults to 52 for weekly data.
timevar | the time variable in the model. Defaults to "t".
The function adds the seasonal terms

$$\sin(s\,\omega\,t), \quad \cos(s\,\omega\,t), \qquad s = 1, \dots, S, \quad \omega = 2\pi/\texttt{period},$$

to an existing formula f.
Note the following equivalence when interpreting the coefficients of the seasonal terms:

$$\gamma \sin(\omega t) + \delta \cos(\omega t) = A \sin(\omega t + \epsilon)$$

with amplitude $A = \sqrt{\gamma^2 + \delta^2}$ and phase shift $\tan(\epsilon) = \delta / \gamma$.
The amplitude and phase shift can be obtained from a fitted
hhh4
model via coef(..., amplitudeShift = TRUE)
,
see coef.hhh4
.
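This equivalence can be verified numerically in plain R (illustrative coefficient values; atan2 recovers the phase shift in the correct quadrant):

gamma <- 0.8; delta <- -0.3
A   <- sqrt(gamma^2 + delta^2)     # amplitude
eps <- atan2(delta, gamma)         # phase shift
omega <- 2*pi/52; t <- 0:103
stopifnot(all.equal(gamma*sin(omega*t) + delta*cos(omega*t),
                    A*sin(omega*t + eps)))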
Returns a formula
with the seasonal terms added and
its environment set to .GlobalEnv
.
Note that to use the resulting formula in hhh4
,
a time variable named as specified by the argument timevar
must
be available.
M. Paul, with contributions by S. Meyer
# add 2 sine/cosine terms to a model with intercept and linear trend
addSeason2formula(f = ~ 1 + t, S = 2)
# the same for monthly data
addSeason2formula(f = ~ 1 + t, S = 2, period = 12)
# different number of seasons for a bivariate time series
addSeason2formula(f = ~ 1, S = c(3, 1), period = 52)
"sts"
Object Over Time or Across UnitsAggregate the matrix slots of an "sts"
object.
Either the time series is aggregated so a new sampling frequency of
nfreq
observations per year is obtained (i.e., as in
aggregate.ts
), or the aggregation is over all
columns (units).
## S4 method for signature 'sts'
aggregate(x, by = "time", nfreq = "all", ...)
x | an object of class "sts".
by | a string being either "time" (default) or "unit".
nfreq | new sampling frequency for by="time" aggregation. If nfreq="all" (the default), all time points are summed.
... | unused (argument of the generic).
an object of class "sts"
.
Aggregation over units fills the upperbound slot with
NA
s and the map
slot is left as-is, but the object
cannot be plotted by unit any longer.
The populationFrac
slot is aggregated just like observed
.
Population fractions are recomputed if and only if x
is no
multinomialTS
and already contains population fractions.
This might not be intended, especially for aggregation over time.
data("ha.sts") dim(ha.sts) dim(aggregate(ha.sts, by = "unit")) dim(aggregate(ha.sts, nfreq = 13))
data("ha.sts") dim(ha.sts) dim(aggregate(ha.sts, by = "unit")) dim(aggregate(ha.sts, nfreq = 13))
Evaluation of timepoints with the Bayes subsystem 1, 2, 3 or a self-defined Bayes subsystem.
algo.bayesLatestTimepoint(disProgObj, timePoint = NULL,
    control = list(b = 0, w = 6, actY = TRUE, alpha = 0.05))
algo.bayes(disProgObj,
    control = list(range = range, b = 0, w = 6, actY = TRUE, alpha = 0.05))
algo.bayes1(disProgObj, control = list(range = range))
algo.bayes2(disProgObj, control = list(range = range))
algo.bayes3(disProgObj, control = list(range = range))
disProgObj | object of class disProg (including the observed and the state chain).
timePoint | time point which should be evaluated in algo.bayesLatestTimepoint; the default NULL means the last time point.
control | control object: range determines the desired time points to be evaluated; b, w and actY select the reference values (b years back, a window of half-size w around the comparable week, and actY indicating whether to additionally use reference values of the current year); alpha is the quantile level of the predictive posterior.
Using the reference values, the $(1-\alpha)\cdot 100\%$ quantile of the predictive posterior distribution is calculated as a threshold. An alarm is given if the actual value is greater than or equal to this threshold. It is possible to show by analytical computations that the predictive posterior in this case is a negative binomial distribution. Note: algo.rki or algo.farrington use two-sided prediction intervals – if one wants to compare with these procedures, it is necessary to use an alpha which is half the one used for these procedures.
Note also that algo.bayes calls algo.bayesLatestTimepoint for the values specified in range and for the system specified in control. algo.bayes1, algo.bayes2 and algo.bayes3 call algo.bayesLatestTimepoint for the values specified in range for the Bayes 1, Bayes 2 and Bayes 3 systems, respectively.
"Bayes 1"
reference values from 6 weeks. Alpha is fixed a
t 0.05.
"Bayes 2"
reference values from 6 weeks ago and
13 weeks of the previous year (symmetrical around the
same week as the current one in the previous year). Alpha is fixed at 0.05.
"Bayes 3"
18 reference values. 9 from the year ago
and 9 from two years ago (also symmetrical around the
comparable week). Alpha is fixed at 0.05.
The procedure is now able to handle NA
's in the reference
values. In the summation and when counting the number of observed
reference values these are simply not counted.
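The underlying Poisson-Gamma argument can be sketched as follows: with a conjugate Gamma(a, b) prior and reference values y_1, ..., y_n, the posterior is Gamma(a + sum(y), b + n), so the predictive distribution of a new count is negative binomial. The prior constants a and b below are illustrative assumptions, not necessarily those hard-coded in algo.bayes:

y <- c(3, 5, 4, 6, 2, 5)    # reference values
n <- length(y); alpha <- 0.05
a <- 1/2; b <- 0            # illustrative (Jeffreys-type) prior constants
# predictive distribution of the next count: NegBin(size, prob)
threshold <- qnbinom(1 - alpha, size = a + sum(y),
                     prob = (b + n)/(b + n + 1))
threshold                   # alarm if the observed count >= threshold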
survRes | list of class survRes (surveillance result), which includes the alarm value for recognizing an outbreak (alarm = 1, no alarm = 0), the threshold value for recognizing the alarm and the input object of class disProg.
M. Höhle, A. Riebler, C. Lang
Riebler, A. (2004), Empirischer Vergleich von statistischen Methoden zur Ausbruchserkennung bei Surveillance Daten, Bachelor's thesis.
algo.call
, algo.rkiLatestTimepoint
and algo.rki
for
the RKI system.
disProg <- sim.pointSource(p = 0.99, r = 0.5, length = 208, A = 1,
                           alpha = 1, beta = 0, phi = 0,
                           frequency = 1, state = NULL, K = 1.7)
# Test for bayes 1 the latest timepoint
algo.bayesLatestTimepoint(disProg)
# Test week 200 to 208 for outbreaks with a self-defined bayes
algo.bayes(disProg, control = list(range = 200:208, b = 1, w = 5,
                                   actY = TRUE, alpha = 0.05))
# The same for bayes 1 to bayes 3
algo.bayes1(disProg, control = list(range = 200:208, alpha = 0.05))
algo.bayes2(disProg, control = list(range = 200:208, alpha = 0.05))
algo.bayes3(disProg, control = list(range = 200:208, alpha = 0.05))
Transmission of an object of class disProg to the specified surveillance algorithms.
algo.call(disProgObj,
          control = list(list(funcName = "rki1", range = range),
                         list(funcName = "rki", range = range,
                              b = 2, w = 4, actY = TRUE),
                         list(funcName = "rki", range = range,
                              b = 2, w = 5, actY = TRUE)))
disProgObj | object of class disProg, which includes the state chain and the observed.
control | specifies which surveillance algorithms should be used with their parameters. The parameter funcName must be specified and names the algorithm (i.e. the suffix of the corresponding algo.* function, e.g. "rki" or "bayes"); the remaining list elements, such as range, are passed on as that algorithm's control arguments.
a list of survRes objects generated by the specified surveillance algorithm
algo.rki
, algo.bayes
, algo.farrington
# Create a test object
disProg <- sim.pointSource(p = 0.99, r = 0.5, length = 400, A = 1,
                           alpha = 1, beta = 0, phi = 0,
                           frequency = 1, state = NULL, K = 1.7)
# Let this object be tested from any methods in range = 200:400
range <- 200:400
survRes <- algo.call(disProg,
    control = list(list(funcName = "rki1", range = range),
                   list(funcName = "rki2", range = range),
                   list(funcName = "rki3", range = range),
                   list(funcName = "rki", range = range, b = 3, w = 2, actY = FALSE),
                   list(funcName = "rki", range = range, b = 2, w = 9, actY = TRUE),
                   list(funcName = "bayes1", range = range),
                   list(funcName = "bayes2", range = range),
                   list(funcName = "bayes3", range = range),
                   list(funcName = "bayes", range = range, b = 1, w = 5,
                        actY = TRUE, alpha = 0.05)))
# show selected survRes objects
names(survRes)
plot(survRes[["rki(6,6,0)"]])
survRes[["bayes(5,5,1)"]]
Surveillance using the CDC Algorithm
algo.cdcLatestTimepoint(disProgObj, timePoint = NULL,
                        control = list(b = 5, m = 1, alpha = 0.025))
algo.cdc(disProgObj, control = list(range = range, b = 5, m = 1,
                                    alpha = 0.025))
disProgObj | object of class disProg (including the observed and the state chain).
timePoint | time point which should be evaluated in algo.cdcLatestTimepoint; the default NULL means the last time point.
control | control object: range determines the desired time points to be evaluated; b is the number of years to go back for the reference values; m is the number of 4-week blocks around the comparable block to use in each year; alpha is the quantile level of the one-sided prediction interval.
Using the reference values for calculating an upper limit, an alarm is given if the actual value is bigger than the computed threshold. algo.cdc calls algo.cdcLatestTimepoint for the values specified in range and for the system specified in control. The threshold is calculated from the predictive distribution of the aggregated reference values, corresponding to Equation 8-1 in Farrington and Andrews (2003). Note that an aggregation into 4-week blocks occurs in algo.cdcLatestTimepoint, and m denotes the number of 4-week blocks (months) to use as reference values. This function currently does the same for monthly data (not correct!).
algo.cdcLatestTimepoint returns a list of class survRes (surveillance result), which includes the alarm value (alarm = 1, no alarm = 0) for recognizing an outbreak, the threshold value for recognizing the alarm and the input object of class disProg.
algo.cdc gives a list of class survRes which includes the vector of alarm values for every timepoint in range, the vector of threshold values for every timepoint in range for the system specified by b and m, the range, and the input object of class disProg.
M. Höhle
Stroup, D., G. Williamson, J. Herndon, and J. Karon (1989). Detection of aberrations in the occurrence of notifiable diseases surveillance data. Statistics in Medicine 8, 323–329. doi:10.1002/sim.4780080312
Farrington, C. and N. Andrews (2003). Monitoring the Health of Populations, Chapter Outbreak Detection: Application to Infectious Disease Surveillance, pp. 203-231. Oxford University Press.
algo.rkiLatestTimepoint, algo.bayesLatestTimepoint and algo.bayes for the Bayes system.
# Create a test object
disProgObj <- sim.pointSource(p = 0.99, r = 0.5, length = 500, A = 1,
                              alpha = 1, beta = 0, phi = 0,
                              frequency = 1, state = NULL, K = 1.7)
# Test weeks 400 to 500 for outbreaks with a self-defined cdc
algo.cdc(disProgObj, control = list(range = 400:500, alpha = 0.025))
Comparison of specified surveillance algorithms using quality values.
algo.compare(survResList)
survResList |
a list of survRes objects to compare via quality values. |
Matrix with values from algo.quality, i.e. quality values for every surveillance algorithm found in survResList.
# Create a test object
disProgObj <- sim.pointSource(p = 0.99, r = 0.5, length = 400, A = 1,
                              alpha = 1, beta = 0, phi = 0,
                              frequency = 1, state = NULL, K = 1.7)
# Let this object be tested from any methods in range = 200:400
range <- 200:400
survRes <- algo.call(disProgObj,
    control = list(list(funcName = "rki1", range = range),
                   list(funcName = "rki2", range = range),
                   list(funcName = "rki3", range = range),
                   list(funcName = "rki", range = range, b = 3, w = 2, actY = FALSE),
                   list(funcName = "rki", range = range, b = 2, w = 9, actY = TRUE),
                   list(funcName = "bayes1", range = range),
                   list(funcName = "bayes2", range = range),
                   list(funcName = "bayes3", range = range),
                   list(funcName = "bayes", range = range, b = 1, w = 5,
                        actY = TRUE, alpha = 0.05)))
algo.compare(survRes)
Approximate one-sided CUSUM method for a Poisson variate based on the cumulative sum of the deviation between a reference value k and the transformed observed values. An alarm is raised if the cumulative sum equals or exceeds a prespecified decision boundary h. The function can handle time-varying expectations.
algo.cusum(disProgObj, control = list(range = range, k = 1.04, h = 2.26,
                                      m = NULL, trans = "standard",
                                      alpha = NULL))
disProgObj | object of class disProg (including the observed and the state chain).
control | control object: range determines the desired time points to be evaluated; k is the reference value and h the decision boundary of the CUSUM; m specifies the (possibly time-varying) in-control expectation, either NULL (use the mean of the observations), a vector of means, or "glm" (estimate the means by a Poisson GLM); trans selects the transformation of the observations (e.g. "standard", "rossi" or "anscombe"); alpha is an optional overdispersion parameter.
algo.cusum
gives a list of class "survRes"
which includes the
vector of alarm values for every timepoint in range
and the vector
of cumulative sums for every timepoint in range
for the system
specified by k
and h
, the range and the input object of
class "disProg"
.
The upperbound entry shows, for each time instance, the number of diseased individuals it would have taken for the CUSUM to signal. Once the CUSUM signals, no resetting is applied, i.e. signals occur until the CUSUM statistic again returns below the threshold.
In case control$m="glm"
was used, the returned
control$m.glm
entry contains the fitted "glm"
object.
This implementation is experimental, but will not be developed further.
M. Paul and M. Höhle
G. Rossi, L. Lampugnani and M. Marchi (1999), An approximate CUSUM procedure for surveillance of health events, Statistics in Medicine, 18, 2111–2122
D. A. Pierce and D. W. Schafer (1986), Residuals in Generalized Linear Models, Journal of the American Statistical Association, 81, 977–986
# Xi ~ Po(5), i=1,...,500
set.seed(321)
stsObj <- sts(observed = rpois(500, lambda = 5))
# there should be no alarms as mean doesn't change
res <- cusum(stsObj, control = list(range = 100:500, trans = "anscombe"))
plot(res, xaxis.labelFormat = NULL)

# simulated data
disProgObj <- sim.pointSource(p = 1, r = 1, length = 250, A = 0,
                              alpha = log(5), beta = 0, phi = 10,
                              frequency = 10, state = NULL, K = 0)
plot(disProgObj)
# Test weeks 200 to 250 for outbreaks
surv0 <- algo.cusum(disProgObj, control = list(range = 200:250))
plot(surv0, xaxis.years = FALSE)
# alternatively, using the newer "sts" interface
stsObj <- disProg2sts(disProgObj)
surv <- cusum(stsObj, control = list(range = 200:250))
plot(surv)
stopifnot(upperbound(surv) == surv0$upperbound)
Implements the procedure of Farrington et al. (1996).
At each time point of the specified range
, a GLM is fitted to
predict the counts. This is then compared to the observed
counts. If the observation is above a specific quantile of
the prediction interval, then an alarm is raised.
# original interface for a single "disProg" time series
algo.farrington(disProgObj, control = list(
    range = NULL, b = 5, w = 3, reweight = TRUE, verbose = FALSE,
    plot = FALSE, alpha = 0.05, trend = TRUE, limit54 = c(5,4),
    powertrans = "2/3", fitFun = "algo.farrington.fitGLM.fast"))

# wrapper for "sts" data, possibly multivariate
farrington(sts, control = list(
    range = NULL, b = 5, w = 3, reweight = TRUE, verbose = FALSE,
    alpha = 0.05), ...)
disProgObj | an object of class disProg.
control | list of control parameters: range (time points to monitor); b (number of years to go back for the reference values); w (window half-size, i.e. number of weeks to include before and after the current week in each year); reweight (Boolean specifying whether to perform the reweighting step for past outbreaks); verbose; plot; alpha (quantile level of the one-sided prediction interval); trend (whether a time trend is considered and kept if significant); limit54 (vector c(cases, periods): an alarm requires at least cases reports in the last periods time points); powertrans (power transformation for the data: "2/3" for a skewness correction, "1/2" for a variance-stabilizing transformation, or "none"); fitFun (name of the GLM fitting function to use).
sts | an object of class "sts".
... | arguments for wrap.algo, e.g. verbose.
The following steps are performed according to the Farrington et al. (1996) paper.
fit of the initial model and initial estimation of mean and overdispersion.
calculation of the weights omega (correction for past outbreaks)
refitting of the model
revised estimation of overdispersion
rescaled model
omission of the trend, if it is not significant
repetition of the whole procedure
calculation of the threshold value
computation of exceedance score
For algo.farrington
, a list object of class "survRes"
with elements alarm
, upperbound
, trend
,
disProgObj
, and control
.
For farrington
, the input "sts"
object with updated
alarm
, upperbound
and control
slots, and subsetted
to control$range
.
M. Höhle
Farrington, C.P., Andrews, N.J., Beale, A.D. and Catchpole, M.A. (1996): A statistical algorithm for the early detection of outbreaks of infectious disease. J. R. Statist. Soc. A, 159, 547-563.
algo.farrington.fitGLM, algo.farrington.threshold.
An improved Farrington algorithm is available as function farringtonFlexible.
#load "disProg" data data("salmonella.agona") #Do surveillance for the last 42 weeks n <- length(salmonella.agona$observed) control <- list(b=4,w=3,range=(n-42):n,reweight=TRUE, verbose=FALSE,alpha=0.01) res <- algo.farrington(salmonella.agona,control=control) plot(res) #Generate Poisson counts and create an "sts" object set.seed(123) x <- rpois(520,lambda=1) stsObj <- sts(observed=x, frequency=52) if (surveillance.options("allExamples")) { #Compare timing of the two possible fitters for algo.farrington range <- 312:520 system.time( sts1 <- farrington(stsObj, control=list(range=range, fitFun="algo.farrington.fitGLM.fast"), verbose=FALSE)) system.time( sts2 <- farrington(stsObj, control=list(range=range, fitFun="algo.farrington.fitGLM"), verbose=FALSE)) #Check if results are the same stopifnot(upperbound(sts1) == upperbound(sts2)) }
#load "disProg" data data("salmonella.agona") #Do surveillance for the last 42 weeks n <- length(salmonella.agona$observed) control <- list(b=4,w=3,range=(n-42):n,reweight=TRUE, verbose=FALSE,alpha=0.01) res <- algo.farrington(salmonella.agona,control=control) plot(res) #Generate Poisson counts and create an "sts" object set.seed(123) x <- rpois(520,lambda=1) stsObj <- sts(observed=x, frequency=52) if (surveillance.options("allExamples")) { #Compare timing of the two possible fitters for algo.farrington range <- 312:520 system.time( sts1 <- farrington(stsObj, control=list(range=range, fitFun="algo.farrington.fitGLM.fast"), verbose=FALSE)) system.time( sts2 <- farrington(stsObj, control=list(range=range, fitFun="algo.farrington.fitGLM"), verbose=FALSE)) #Check if results are the same stopifnot(upperbound(sts1) == upperbound(sts2)) }
Weights are assigned according to the Anscombe residuals
algo.farrington.assign.weights(s, weightsThreshold = 1)
s |
Vector of standardized Anscombe residuals |
weightsThreshold |
A scalar indicating when observations are seen as outlier. In the original Farrington proposal the value was 1 (default value), in the improved version this value is suggested to be 2.58. |
Weights according to the residuals
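The weighting rule itself is simple and can be sketched in a few lines (an illustrative re-implementation of the rule described by Farrington et al. (1996), not the package's internal code): residuals up to the threshold keep weight 1, larger ones are downweighted proportionally to 1/s^2, and the weights are rescaled to sum to the number of observations.

assign_weights_sketch <- function(s, weightsThreshold = 1) {
  w <- ifelse(s > weightsThreshold, 1 / s^2, 1)  # downweight outlying residuals
  w * length(w) / sum(w)                         # normalize: sum(w) == n
}
assign_weights_sketch(c(0.2, 0.5, 3.1))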
The function fits a Poisson regression model (GLM) with mean predictor $\log \mu_t = \alpha + \beta t$ (a log-linear time trend) as specified by the Farrington procedure. If requested, Anscombe residuals are computed based on an initial fit and a second fit is made using weights, where base counts suspected to be caused by earlier outbreaks are downweighted.
algo.farrington.fitGLM(response, wtime, timeTrend = TRUE,
                       reweight = TRUE, ...)
algo.farrington.fitGLM.fast(response, wtime, timeTrend = TRUE,
                            reweight = TRUE, ...)
algo.farrington.fitGLM.populationOffset(response, wtime, population,
                                        timeTrend = TRUE, reweight = TRUE, ...)
response | The vector of observed base counts.
wtime | Vector of week numbers corresponding to response.
timeTrend | Boolean whether to fit the linear time trend or not.
reweight | Fit twice – 2nd time with weights based on Anscombe residuals.
population | Population size. Possibly used as offset, i.e. log(population) is added to the linear predictor. This provides a way to adjust the Farrington procedure to the case of greatly varying populations. Note: This is an experimental implementation with methodology not covered by the original paper.
... | Used to catch additional arguments, currently not used.
Compute weights from an initial fit and rescale using
Anscombe based residuals as described in the
anscombe.residuals
function.
Note that algo.farrington.fitGLM
uses the glm
routine
for fitting. A faster alternative is provided by
algo.farrington.fitGLM.fast
which uses the glm.fit
function directly (thanks to Mikko Virtanen). This saves
computational overhead and increases speed for 500 monitored time
points by a factor of approximately two. However, some of the
routine glm
functions might not work on the output of this
function. Which function is used for algo.farrington
can be
controlled by the control$fitFun
argument.
an object of class "glm" with additional fields wtime, response and phi. If the glm does not converge, NULL is returned.
anscombe.residuals, algo.farrington
Depending on the current transformation $h(y) \in \{y, \sqrt{y}, y^{2/3}\}$, the quantity $V[h(y_0) - h(\mu_0)] = V[h(y_0)] + V[h(\mu_0)]$ is used to compute a prediction interval. The prediction variance consists of a component due to the variance of having a single observation and a prediction variance.
algo.farrington.threshold(pred, phi, alpha = 0.01, skewness.transform = "none", y)
pred | A GLM prediction object.
phi | Current overdispersion parameter (superfluous?).
alpha | Quantile level in the Gaussian based CI, i.e. an approximate $(1-\alpha)\cdot 100\%$ prediction interval is computed.
skewness.transform | Skewness correction, i.e. one of "none", "1/2" or "2/3".
y | Observed number.
Vector of length four with the lower and upper bounds of the $(1-\alpha)\cdot 100\%$ prediction interval (first two elements), the corresponding quantile of the observation y, and the median of the predictive distribution.
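To illustrate the idea, here is a stand-alone sketch of a skewness-corrected Gaussian prediction interval on the 2/3-power scale. For brevity it only uses the single-observation variance component (phi*mu0) via the delta method and ignores the estimation variance of the mean, which the full procedure also accounts for; all numeric values are illustrative:

mu0 <- 8; phi <- 1.5; alpha <- 0.01
z <- qnorm(1 - alpha/2)
sd.h <- sqrt((2/3 * mu0^(-1/3))^2 * phi * mu0)   # delta-method sd of y^(2/3)
(bounds <- pmax(0, mu0^(2/3) + c(-1, 1) * z * sd.h)^(3/2))  # back-transformed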
Count data regression charts for the monitoring of surveillance time series as proposed by Höhle and Paul (2008). The implementation is described in Salmon et al. (2016).
algo.glrnb(disProgObj, control = list(range = range, c.ARL = 5,
    mu0 = NULL, alpha = 0, Mtilde = 1, M = -1, change = "intercept",
    theta = NULL, dir = c("inc", "dec"), ret = c("cases", "value"),
    xMax = 1e4))
algo.glrpois(disProgObj, control = list(range = range, c.ARL = 5,
    mu0 = NULL, Mtilde = 1, M = -1, change = "intercept",
    theta = NULL, dir = c("inc", "dec"), ret = c("cases", "value"),
    xMax = 1e4))
disProgObj | object of class disProg (including the observed and the state chain).
control | A list controlling the behaviour of the algorithm: range (time points to monitor); c.ARL (threshold of the GLR test, which determines the in-control average run length); mu0 (vector of in-control means, or NULL, in which case they are estimated); alpha (dispersion parameter of the negative binomial distribution; alpha=0 corresponds to the Poisson case); Mtilde (minimum number of observations needed before a test is performed); M (window size of the moving window, -1 for an unlimited window); change (type of change to detect, "intercept" or "epi"); theta (if a value is given, a likelihood-ratio detector with known alternative is used instead of the GLR detector); dir (direction of the change, "inc" or "dec"); ret (whether the upperbound contains the value of the statistic, "value", or the number of cases needed to produce an alarm, "cases"); xMax (upper limit of the brute-force search for the number of cases).
This function implements the seasonal count data chart based on generalized likelihood ratio (GLR) as described in the Höhle and Paul (2008) paper. A moving-window generalized likelihood ratio detector is used, i.e. the detector has the form

$$N = \inf\left\{ n : \max_{1 \leq k \leq n} \left[ \sum_{t=k}^{n} \log \left\{ \frac{f_{\theta_1}(x_t \mid z_t)}{f_{\theta_0}(x_t \mid z_t)} \right\} \right] \geq c_\gamma \right\},$$

where instead of $1 \leq k \leq n$ the GLR statistic is computed for all $k \in \{n - M, \dots, n - \tilde{M} + 1\}$. To achieve the typical behaviour from $1 \leq k \leq n$ use Mtilde=1 and M=-1.
So $N$ is the time point where the GLR statistic is above the threshold the first time: an alarm is given and the surveillance is reset starting from time $N+1$. Note that the same c.ARL as before is used, but if mu0 is different at $N+1$ compared to time $1$, the run length properties differ. Because the c.ARL needed to obtain a specific ARL can only be determined by Monte Carlo simulation, there is no good way to update c.ARL automatically at the moment. Also, FIR GLR-detectors might be worth considering.
In case is.null(theta) and alpha>0 as well as ret="cases", a brute-force search is conducted for each time point in range in order to determine the number of cases necessary before an alarm is sounded. In case no alarm was sounded so far by time $t$, the function increases $x[t]$ until an alarm is sounded any time before time point $t$. If no alarm is sounded by xMax, a return value of 1e99 is given. Similarly, if an alarm was sounded by time $t$, the function counts down instead. Note: This is slow experimental code!
At the moment, window-limited “intercept” charts have not been extensively tested and are not supported. As speed is not an issue for these charts, this is no big restriction; a value of M=-1 is therefore always used in the intercept charts.
algo.glrpois simply calls algo.glrnb with control$alpha set to 0.
algo.glrnb returns a list of class survRes (surveillance result), which includes the alarm value for recognizing an outbreak (1 for alarm, 0 for no alarm), the threshold value for recognizing the alarm, and the input object of class disProg. The upperbound slot of the object is filled with the current value of the GLR statistic (if ret="value") or with the number of cases that are necessary to produce an alarm at the time point (if ret="cases"). Both lead to the same alarm time points, but "cases" has an obvious interpretation.
M. Höhle with contributions by V. Wimmer
Höhle, M. and Paul, M. (2008): Count data regression charts for the monitoring of surveillance time series. Computational Statistics and Data Analysis, 52 (9), 4357-4368.
Salmon, M., Schumacher, D. and Höhle, M. (2016): Monitoring count time series in R: Aberration detection in public health surveillance. Journal of Statistical Software, 70 (10), 1-35. doi:10.18637/jss.v070.i10
##Simulate data and apply the algorithm
S <- 1 ; t <- 1:120 ; m <- length(t)
beta <- c(1.5,0.6,0.6)
omega <- 2*pi/52
#log mu_{0,t}
base <- beta[1] + beta[2] * cos(omega*t) + beta[3] * sin(omega*t)
#Generate example data with changepoint and tau=tau
tau <- 100
kappa <- 0.4
mu0 <- exp(base)
mu1 <- exp(base + kappa)

## Poisson example
#Generate data
set.seed(42)
x <- rpois(length(t), mu0*(exp(kappa)^(t>=tau)))
s.ts <- sts(observed = x, state = (t>=tau))
#Plot the data
plot(s.ts, xaxis.labelFormat = NULL)
#Run
cntrl = list(range = t, c.ARL = 5, Mtilde = 1, mu0 = mu0,
             change = "intercept", ret = "value", dir = "inc")
glr.ts <- glrpois(s.ts, control = cntrl)
plot(glr.ts, xaxis.labelFormat = NULL, dx.upperbound = 0.5)
lr.ts <- glrpois(s.ts, control = c(cntrl, theta = 0.4))
plot(lr.ts, xaxis.labelFormat = NULL, dx.upperbound = 0.5)

#using the legacy interface for "disProg" data
lr.ts0 <- algo.glrpois(sts2disProg(s.ts), control = c(cntrl, theta = 0.4))
stopifnot(upperbound(lr.ts) == lr.ts0$upperbound)

## NegBin example
#Generate data
set.seed(42)
alpha <- 0.2
x <- rnbinom(length(t), mu = mu0*(exp(kappa)^(t>=tau)), size = 1/alpha)
s.ts <- sts(observed = x, state = (t>=tau))
#Plot the data
plot(s.ts, xaxis.labelFormat = NULL)
#Run GLR based detection
cntrl = list(range = t, c.ARL = 5, Mtilde = 1, mu0 = mu0, alpha = alpha,
             change = "intercept", ret = "value", dir = "inc")
glr.ts <- glrnb(s.ts, control = cntrl)
plot(glr.ts, xaxis.labelFormat = NULL, dx.upperbound = 0.5)
#CUSUM LR detection with backcalculated number of cases
cntrl2 = list(range = t, c.ARL = 5, Mtilde = 1, mu0 = mu0, alpha = alpha,
              change = "intercept", ret = "cases", dir = "inc", theta = 1.2)
glr.ts2 <- glrnb(s.ts, control = cntrl2)
plot(glr.ts2, xaxis.labelFormat = NULL)
This function implements on-line HMM detection of outbreaks based on the retrospective procedure described in Le Strat and Carrat (1999).
Using the function msm
(from package msm) a specified HMM
is estimated, the decoding problem, i.e. the most probable state
configuration, is found by the Viterbi algorithm and the most
probable state of the last observation is recorded. On-line
detection is performed by sequentially repeating this procedure.
Warning: This function can be very slow - a more efficient implementation would be nice!
algo.hmm(disProgObj, control = list(range = range, Mtilde = -1,
    noStates = 2, trend = TRUE, noHarmonics = 1, covEffectEqual = FALSE,
    saveHMMs = FALSE, extraMSMargs = list()))
disProgObj | object of class disProg (including the observed and the state chain).
control | control object: range (time points to monitor); Mtilde (number of previous observations to use for fitting the HMM, -1 meaning all available values); noStates (number of hidden states of the HMM, typically 2); trend (Boolean whether to include a linear time trend); noHarmonics (number of harmonic waves in the linear predictor); covEffectEqual (Boolean whether the covariate effects are constrained to be equal across states); saveHMMs (Boolean whether to save the fitted HMMs); extraMSMargs (list of additional arguments passed to msm).
For each time point t the reference values are extracted. If the number of requested values is larger than the number of possible values, the latter is used. The following then happens on these reference values:
A noStates-state hidden Markov model (HMM) is used based on the Poisson distribution with a linear predictor on the log-link scale, i.e. given hidden state $X_t = j$ the counts follow $\operatorname{Po}(\mu_t^{(j)})$ with

$$\log \mu_t^{(j)} = \alpha_j + \beta_j \, t + \sum_{i=1}^{nH} \left( \gamma_{ij} \cos(2\pi i (t-1)/f) + \delta_{ij} \sin(2\pi i (t-1)/f) \right),$$

where $nH$ = noHarmonics and $f$ is the sampling frequency of the surveillance data. In the above, $t-1$ is used, because the first week is always saved as t=1, i.e. we want to ensure that the first observation corresponds to cos(0) and sin(0).
If covEffectEqual, then all covariate effect parameters are equal for the states, i.e. $\beta_j = \beta$, $\gamma_{ij} = \gamma_i$ and $\delta_{ij} = \delta_i$ for all $j = 1, \dots, \texttt{noStates}$.
In case more complicated HMM models are to be fitted it is possible to
modify the msm
code used in this function. Using
e.g. AIC
one can select between different models (see the
msm package for further details).
Using the Viterbi algorithm, the most probable state configuration is obtained for the reference values, and if the most probable state for the last reference value (i.e. time t) equals control$noStates, then an alarm is given.
Note: The HMM is re-fitted from scratch every time, sequential updating schemes of the HMM would increase speed considerably! A major advantage of the approach is that outbreaks in the reference values are handled automatically.
algo.hmm
gives a list of class survRes
which includes the
vector of alarm values for every timepoint in range
. No
upperbound
can be specified and is put equal to zero.
The resulting object contains a list control$hmms
, which
contains the "msm"
objects with the fitted HMMs
(if saveHMMs=TRUE
).
M. Höhle
Y. Le Strat and F. Carrat, Monitoring Epidemiologic Surveillance Data using Hidden Markov Models (1999), Statistics in Medicine, 18, 3463–3478
I.L. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-valued Time Series, (1997), Chapman & Hall, Monographs on Statistics and applied Probability 70
#Simulate outbreak data from HMM
set.seed(123)
counts <- sim.pointSource(p = 0.98, r = 0.8, length = 3*52,
                          A = 1, alpha = 1, beta = 0, phi = 0,
                          frequency = 1, state = NULL, K = 1.5)

## Not run:
#Do surveillance using a two state HMM without trend component and
#the effect of the harmonics being the same in both states. A sliding
#window of two years is used to fit the HMM
surv <- algo.hmm(counts,
                 control = list(range = (2*52):length(counts$observed),
                                Mtilde = 2*52, noStates = 2, trend = FALSE,
                                covEffectEqual = TRUE, extraMSMargs = list()))
plot(surv, legend.opts = list(x = "topright"))
## End(Not run)

if (require("msm")) {
  #Retrospective use of the function, i.e. monitor only the last time point
  #but use option saveHMMs to store the output of the HMM fitting
  surv <- algo.hmm(counts,
                   control = list(range = length(counts$observed), Mtilde = -1,
                                  noStates = 2, trend = FALSE,
                                  covEffectEqual = TRUE, saveHMMs = TRUE))
  #Compute most probable state using the Viterbi algorithm:
  #1 is "normal", 2 is "outbreak".
  viterbi.msm(surv$control$hmms[[1]])$fitted
  #How often correct?
  tab <- cbind(truth = counts$state + 1,
               hmm = viterbi.msm(surv$control$hmms[[1]])$fitted)
  table(tab[,1], tab[,2])
}
Frisen and Andersson (2009) method for semiparametric surveillance of outbreaks
algo.outbreakP(disProgObj, control = list(range = range, k = 100,
    ret = c("cases", "value"), maxUpperboundCases = 1e5))
disProgObj | object of class disProg (including the observed and the state chain).
control | A list controlling the behaviour of the algorithm: range (time points to monitor); k (threshold of the test statistic for sounding an alarm); ret (whether the upperbound contains the value of the statistic, "value", or the number needed before alarm, "cases"); maxUpperboundCases (upper limit of the brute-force search for the NNBA, default 1e5).
A generalized likelihood ratio test based on the Poisson distribution is implemented, where the means of the in-control and out-of-control states are computed by isotonic regression. The statistic is

$$\mathrm{OutbreakP}(s) = \prod_{t=1}^{s} \left( \frac{\hat{\mu}^{C1}(t)}{\hat{\mu}^{D}(t)} \right)^{x(t)},$$

where $\hat{\mu}^{C1}(t)$ is the estimated mean obtained by uni-modal regression under the assumption of one change-point, and $\hat{\mu}^{D}(t)$ is the estimated result when there is no change-point (i.e. this is just the mean of all observations). Note that the contrasted hypotheses assume all means are equal until the change-point; this detection method is hence especially suited for detecting a shift from a relatively constant mean and less suited for detection in diseases with a strong seasonal endemic component. Onset of influenza detection is an example where this method works particularly well.
In case control$ret == "cases"
then a brute force numerical
search for the number needed before alarm (NNBA) is performed. That
is, given the past observations, what's the minimum number which would
have caused an alarm? Note: Computing this might take a while because
the search is done by sequentially increasing/decreasing the last
observation by one for each time point in control$range
and
then calling the workhorse function of the algorithm again. The argument
control$maxUpperboundCases
controls the upper limit of this
search (default is 1e5).
Currently, the NNBA is computed even when the statistic has already passed the threshold. After a few time instances, it typically happens that an alarm would be raised no matter the observed value; in this case the value of the NNBA is set to NA. Furthermore, the first time point is always NA, unless k<1.
algo.outbreakP
gives a list of class survRes
which
includes the vector of alarm values for every time-point in
range
, the vector of threshold values for every time-point
in range
.
M. Höhle – based on Java code by M. Frisen and L. Schiöler
The code is an extended R port of the Java code by Marianne
Frisén and Linus Schiöler from the
Computer Assisted Search For Epidemics (CASE) project,
formerly available from https://case.folkhalsomyndigheten.se/
under the GNU GPL License v3.
An additional feature of the R code is that it contains a search for NNBA (see details).
Frisén, M., Andersson, E. and Schiöler, L. (2009), Robust outbreak surveillance of epidemics in Sweden, Statistics in Medicine, 28(3):476-493.
Frisén, M. and Andersson, E., (2009) Semiparametric Surveillance of Monotonic Changes, Sequential Analysis 28(4):434-454.
#Use data from outbreakP manual (http://www.hgu.gu.se/item.aspx?id=16857)
y <- matrix(c(1,0,3,1,2,3,5,4,7,3,5,8,16,23,33,34,48), ncol = 1)
#Generate sts object with these observations
mysts <- sts(y, alarm = y*0)
#Run the algorithm and present results
#Only the value of outbreakP statistic
upperbound(outbreakP(mysts, control = list(range = 1:length(y), k = 100,
                                           ret = "value")))
#Graphical illustration with number-needed-before-alarm (NNBA) upperbound.
res <- outbreakP(mysts, control = list(range = 1:length(y), k = 100,
                                       ret = "cases"))
plot(res, dx.upperbound = 0, lwd = c(1,1,3),
     legend.opts = list(legend = c("Infected", "NNBA", "Outbreak", "Alarm"),
                        horiz = TRUE))
Computation of the quality values for a surveillance system output.
algo.quality(sts, penalty = 20)
sts |
object of class |
penalty |
the maximal penalty for the lag |
The lag is defined as follows: in the state chain, just the beginnings of an outbreak chain (outbreaks directly following each other) are considered. In the alarm chain, the range from the beginning of an outbreak until min(next outbreak beginning, penalty) time points is considered. The penalty time points were chosen to provide an upper bound on the penalty for not discovering an outbreak. The difference between the first alarm given by the system and the defined beginning is denoted “the lag”; additionally, outbreaks found by the system are not punished. At the end, the mean of the lags for every outbreak chain is returned as the summary lag.
an object of class "algoQV"
, which is
a list of quality values:
TP | Number of correctly identified outbreaks.
FP | Number of falsely identified outbreaks.
TN | Number of correctly identified non-outbreaks.
FN | Number of falsely identified non-outbreaks.
sens | True positive rate, i.e. TP/(FN + TP).
spec | True negative rate, i.e. TN/(TN + FP).
dist | Euclidean distance between (1-spec, sens) and (0,1).
lag | Lag of the outbreak recognition by the system.
# Create a test object
disProgObj <- sim.pointSource(p = 0.99, r = 0.5, length = 200, A = 1,
                              alpha = 1, beta = 0, phi = 0,
                              frequency = 1, state = NULL, K = 1.7)
# Let this object be tested from rki1
survResObj <- algo.rki1(disProgObj, control = list(range = 50:200))
# Compute the list of quality values
quality <- algo.quality(survResObj)
quality  # the list is printed in matrix form
# Format as an "xtable", which is printed with LaTeX markup (by default)
library("xtable")
xtable(quality)
Evaluation of timepoints with the detection algorithms used by the RKI
algo.rkiLatestTimepoint(disProgObj, timePoint = NULL,
                        control = list(b = 2, w = 4, actY = FALSE))
algo.rki(disProgObj, control = list(range = range, b = 2, w = 4,
                                    actY = FALSE))
algo.rki1(disProgObj, control = list(range = range))
algo.rki2(disProgObj, control = list(range = range))
algo.rki3(disProgObj, control = list(range = range))
disProgObj | object of class disProg (including the observed and the state chain).
timePoint | time point which should be evaluated in algo.rkiLatestTimepoint; the default NULL means the last time point.
control | control object: range determines the desired time points to be evaluated; b, w and actY select the reference values (b years back, a window of half-size w around the comparable week, and actY indicating whether to additionally use w reference values of the current year).
Using the reference values for calculating an upper limit (threshold),
alarm is given if the actual value is bigger than a computed threshold.
algo.rki calls algo.rkiLatestTimepoint for the values specified in range and for the system specified in control. algo.rki1, algo.rki2 and algo.rki3 call algo.rkiLatestTimepoint for the values specified in range for the RKI 1, RKI 2 and RKI 3 systems, respectively.
"RKI 1"
reference values from 6 weeks ago
"RKI 2"
reference values from 6 weeks ago and
13 weeks of the year ago (symmetrical around the
comparable week).
"RKI 3"
18 reference values. 9 from the year ago
and 9 from two years ago (also symmetrical around the
comparable week).
algo.rkiLatestTimepoint
returns a list of class survRes
(surveillance result), which
includes the alarm value (alarm = 1, no alarm = 0) for recognizing an
outbreak, the threshold value for recognizing the alarm and
the input object of class disProg.
algo.rki
gives a list of class survRes
which includes the vector
of alarm values for every timepoint in range
, the vector of threshold values
for every timepoint in range
for the system specified by b
, w
and
actY
, the range and the input object of class disProg.
algo.rki1
returns the same for the RKI 1 system, algo.rki2
for the RKI 2 system and algo.rki3
for the RKI 3 system.
M. Höhle, A. Riebler, Christian Lang
algo.bayesLatestTimepoint
and algo.bayes
for
the Bayes system.
# Create a test object
disProgObj <- sim.pointSource(p = 0.99, r = 0.5, length = 208, A = 1,
                              alpha = 1, beta = 0, phi = 0,
                              frequency = 1, state = NULL, K = 1.7)
# Test week 200 to 208 for outbreaks with a self-defined rki
algo.rki(disProgObj, control = list(range = 200:208, b = 1, w = 5, actY = TRUE))
# The same for rki 1 to rki 3
algo.rki1(disProgObj, control = list(range = 200:208))
algo.rki2(disProgObj, control = list(range = 200:208))
algo.rki3(disProgObj, control = list(range = 200:208))
# Test for rki 1 the latest timepoint
algo.rkiLatestTimepoint(disProgObj)
Modified Poisson CUSUM method that allows for a time-varying in-control parameter $\theta_{0,t}$ as proposed by Rogerson and Yamada (2004). The same approach can be applied to binomial data if distribution="binomial" is specified.
algo.rogerson(disProgObj, control = list(range = range,
    theta0t = NULL, ARL0 = NULL, s = NULL, hValues = NULL,
    distribution = c("poisson", "binomial"), nt = NULL, FIR = FALSE,
    limit = NULL, digits = 1))
disProgObj |
object of class disProg (including the observed and the state chain). |
control |
list with the elements shown in the usage: range, theta0t, ARL0, s, hValues, distribution, nt, FIR, limit and digits. |
The CUSUM for a sequence of Poisson or binomial variates x_t is computed as

S_t = max{0, S_{t-1} + c_t * (x_t - k_t)},   t = 1, 2, ...,

where S_0 = 0 and c_t = h / h_t; k_t and h_t are time-varying reference values and decision intervals. An alarm is given at time t if S_t >= h.
If FIR=TRUE, the CUSUM starts with a head start value S_0 = h/2 at time t = 0. After an alarm is given, the FIR CUSUM starts again at this head start value.

The procedure after the CUSUM gives an alarm can be determined by limit. Suppose that the CUSUM signals at time t, i.e. S_t >= h. For numeric values of limit, the CUSUM is bounded above after an alarm is given, i.e. S_t is set to min{limit * h, S_t}. Using limit = 0 corresponds to resetting S_t to zero after an alarm as proposed in the original formulation of the CUSUM. If FIR=TRUE, S_t is reset to h/2 (i.e. limit = 1/2). If limit=NULL, no resetting occurs after an alarm is given.
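To make the recursion above concrete, the following minimal sketch (an illustration only, not the package's implementation) computes the time-varying CUSUM with a numeric limit after alarms; x, kt, ht, h and limit are hypothetical inputs:

## Sketch of the time-varying CUSUM recursion with bounding after alarms
cusumSketch <- function(x, kt, ht, h, limit = 0) {
  S <- numeric(length(x))
  ct <- h / ht                 # scale each step to the common threshold h
  s <- 0
  for (t in seq_along(x)) {
    s <- max(0, s + ct[t] * (x[t] - kt[t]))
    S[t] <- s
    if (s >= h)                # alarm: bound the CUSUM from above
      s <- min(limit * h, s)
  }
  list(S = S, alarm = S >= h)
}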
Returns an object of class survRes
with elements
alarm |
indicates whether the CUSUM signaled at time t (1 = alarm, 0 = no alarm). |
upperbound |
CUSUM values S_t. |
disProgObj |
the input object of class disProg. |
control |
list with the alarm threshold h and the specified control object. |
algo.rogerson is a univariate CUSUM method. If the data are available in several regions (i.e. observed is a matrix), multiple univariate CUSUMs are applied to each region.
Rogerson, P. A. and Yamada, I. Approaches to Syndromic Surveillance When Data Consist of Small Regional Counts. Morbidity and Mortality Weekly Report, 2004, 53/Supplement, 79-85
# simulate data (seasonal Poisson)
set.seed(123)
t <- 1:300
lambda <- exp(-0.5 + 0.4 * sin(2*pi*t/52) + 0.6 * cos(2*pi*t/52))
data <- sts(observed = rpois(length(lambda), lambda))

# determine a matrix with h values
hVals <- hValues(theta0 = 10:150/100, ARL0 = 500, s = 1, distr = "poisson")

# convert to legacy "disProg" class and apply modified Poisson CUSUM
disProgObj <- sts2disProg(data)
res <- algo.rogerson(disProgObj,
                     control = c(hVals, list(theta0t = lambda, range = 1:300)))
plot(res, xaxis.years = FALSE)
Summary table generation for several disease chains.
algo.summary(compMatrices)
compMatrices |
list of matrices constructed by algo.compare. |
As lag, the mean of all single lags is returned. TP, FN, TN and FP values are summed up. dist, sens and spec are recomputed on the basis of the new TP, FN, TN and FP values.

a matrix summing up the singular input matrices
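As a small illustration of this pooling (with made-up confusion counts from three comparison matrices), sensitivity and specificity are recomputed from the summed counts rather than averaged:

## Hypothetical TP/FN/TN/FP values from three comparison matrices
TP <- sum(c(10, 12, 9));    FN <- sum(c(2, 1, 3))
TN <- sum(c(180, 178, 181)); FP <- sum(c(8, 9, 7))
c(sens = TP / (TP + FN),  # sensitivity from pooled counts
  spec = TN / (TN + FP))  # specificity from pooled counts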
# Create test objects
disProgObj1 <- sim.pointSource(p = 0.99, r = 0.5, length = 400, A = 1,
                               alpha = 1, beta = 0, phi = 0,
                               frequency = 1, state = NULL, K = 1.7)
disProgObj2 <- sim.pointSource(p = 0.99, r = 0.5, length = 400, A = 1,
                               alpha = 1, beta = 0, phi = 0,
                               frequency = 1, state = NULL, K = 5)
disProgObj3 <- sim.pointSource(p = 0.99, r = 0.5, length = 400, A = 1,
                               alpha = 1, beta = 0, phi = 0,
                               frequency = 1, state = NULL, K = 17)

# Test these objects with the RKI methods in range = 200:400
range <- 200:400
control <- list(list(funcName = "rki1", range = range),
                list(funcName = "rki2", range = range),
                list(funcName = "rki3", range = range))
compMatrix1 <- algo.compare(algo.call(disProgObj1, control = control))
compMatrix2 <- algo.compare(algo.call(disProgObj2, control = control))
compMatrix3 <- algo.compare(algo.call(disProgObj3, control = control))
algo.summary(list(a = compMatrix1, b = compMatrix2, c = compMatrix3))
Two model fits are compared using standard all.equal-methods after discarding certain elements considered irrelevant for the equality of the fits, e.g., the runtime and the call.
## S3 method for class 'twinstim'
all.equal(target, current, ..., ignore = NULL)

## S3 method for class 'hhh4'
all.equal(target, current, ..., ignore = NULL)
target , current
|
the model fits to be compared. |
... |
further arguments for standard all.equal. |
ignore |
an optional character vector of elements to ignore when comparing the two fitted objects. The following elements are always ignored: the runtime and the call. |
Either TRUE or a character vector describing differences between the target and the current model fit.
Sebastian Meyer
Generic function for animation of R objects.
animate(object, ...)
object |
The object to animate. |
... |
Arguments to be passed to methods, such as graphical parameters or time interval options for the snapshots. |
The methods animate.epidata, animate.epidataCS, and animate.sts for the animation of surveillance data.
Compute Anscombe residuals from a fitted glm, which makes them approximately standard normally distributed.
anscombe.residuals(m, phi)
m |
a fitted glm. |
phi |
the current estimated overdispersion |
The standardized Anscombe residuals of m
McCullagh & Nelder, Generalized Linear Models, 1989
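For the Poisson family, the underlying transformation can be sketched with the textbook formula (an illustration of the idea, not necessarily the exact internal code); dividing by sqrt(phi) standardizes for overdispersion:

## Sketch: Anscombe residuals for a Poisson GLM (cf. McCullagh & Nelder, 1989)
anscombePois <- function(y, mu, phi = 1) {
  1.5 * (y^(2/3) - mu^(2/3)) / (mu^(1/6) * sqrt(phi))
}
## toy Poisson GLM (data as in example(glm))
d <- data.frame(counts = c(18,17,15,20,10,20,25,13,12),
                outcome = gl(3,1,9), treatment = gl(3,3))
m <- glm(counts ~ outcome + treatment, family = poisson(), data = d)
anscombePois(m$y, fitted(m))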
Calculates the average run length (ARL) for an upward CUSUM scheme for discrete distributions (i.e. Poisson and binomial) using the Markov chain approach.
arlCusum(h=10, k=3, theta=2.4, distr=c("poisson","binomial"), W=NULL, digits=1, ...)
h |
decision interval |
k |
reference value |
theta |
distribution parameter for the cumulative distribution function (cdf) F, i.e. the mean lambda for Poisson variates or the probability of success p for binomial variates. |
distr |
"poisson" or "binomial". |
W |
Winsorizing value |
digits |
k and h are rounded to digits decimal places. |
... |
further arguments for the distribution function, i.e. number of trials |
Returns a list with the ARL of the regular (zero-start) and the fast initial response (FIR) CUSUM scheme with reference value k and decision interval h for X ~ F(theta), where F is the Poisson or binomial CDF.
ARL |
one-sided ARL of the regular (zero-start) CUSUM scheme |
FIR.ARL |
one-sided ARL of the FIR CUSUM scheme with head start h/2 |
Based on the FORTRAN code of
Hawkins, D. M. (1992). Evaluation of Average Run Lengths of Cumulative Sum Charts for an Arbitrary Data Distribution. Communications in Statistics - Simulation and Computation, 21(4), p. 1001-1020.
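A brief usage sketch based on the arguments above (exact values depend on the rounding controlled by digits):

## ARL of a Poisson CUSUM with reference value k = 3 and decision
## interval h = 10 at in-control mean theta = 2.4
arlCusum(h = 10, k = 3, theta = 2.4, distr = "poisson")
## for binomial variates, the number of trials is passed via '...'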
The function is an implementation of the non-parametric
back-projection of incidence cases to exposure cases described in
Becker et al. (1991). The method back-projects exposure times
from a univariate time series containing the number of symptom onsets per time
unit. Here, the delay between exposure and symptom onset for an
individual is seen as a realization of a random variable governed by a
known probability mass function.
The back-projection function calculates the expected number of exposures lambda_1, ..., lambda_T for each time unit under the assumption of a Poisson distribution, but without any parametric assumption on how the lambda_t evolve in time.

Furthermore, the function contains a bootstrap based procedure, as given in Yip et al (2011), which allows an indication of uncertainty in the estimated lambda_t. The procedure is equivalent to the suggestion in Becker and Marschner (1993). However, the present implementation in backprojNP allows only a univariate time series, i.e. simultaneous age groups as in Becker and Marschner (1993) are not possible.
The method in Becker et al. (1991) was originally developed for the back-projection of AIDS incidence, but it is equally useful for analysing the epidemic curve in outbreak situations of a disease with long incubation time, e.g. in order to qualitatively investigate the effect of intervention measures.
backprojNP(sts, incu.pmf,
    control = list(k = 2, eps = rep(0.005, 2), iter.max = rep(250, 2),
                   Tmark = nrow(sts), B = -1, alpha = 0.05, verbose = FALSE,
                   lambda0 = NULL, eq3a.method = c("R", "C"),
                   hookFun = function(stsbp) {}),
    ...)
sts |
an object of class sts containing the observed number of symptom onsets per time unit. |
incu.pmf |
Probability mass function (PMF) of the incubation
time. The PMF is specified as a vector or matrix with the value of
the PMF evaluated at 0, ..., d_max, i.e. note that the support includes zero. |
control |
A list with named arguments controlling the functionality of the non-parametric back-projection.
|
... |
Additional arguments are sent to the hook function. |
Becker et al. (1991) specify a non-parametric back-projection algorithm based on the Expectation-Maximization-Smoothing (EMS) algorithm.
In the present implementation the algorithm iterates until

sqrt(sum((lambda^(k+1) - lambda^(k))^2) / sum((lambda^(k))^2)) < eps.

This is a slight adaptation of the proposals in Becker et al. (1991). If T is the length of lambda, then one can avoid instability of the algorithm near the end by considering only the lambda_t's with index 1, ..., T' (cf. the Tmark control argument).
See the references for further information.
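The stopping rule can be sketched in a few lines; lambda.new and lambda.old stand for successive EMS iterates (hypothetical objects):

## Sketch of the relative-change stopping rule of the EMS iterations
emsConverged <- function(lambda.new, lambda.old, eps = 0.005) {
  sqrt(sum((lambda.new - lambda.old)^2) / sum(lambda.old^2)) < eps
}
emsConverged(c(1, 2.1, 3), c(1, 2, 3))  # FALSE: relative change ~ 0.027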
backprojNP returns an object of class "stsBP".
The method is still experimental. A proper plot routine for stsBP objects is currently missing.
Michael Höhle with help by Daniel Sabanés Bové and Sebastian Meyer for eq3a.method = "C"
Becker NG, Watson LF and Carlin JB (1991), A method for non-parametric back-projection and its application to AIDS data, Statistics in Medicine, 10:1527-1542.
Becker NG and Marschner IC (1993), A method for estimating the age-specific relative risk of HIV infection from AIDS incidence data, Biometrika, 80(1):165-178.
Yip PSF, Lam KF, Xu Y, Chau PH, Xu J, Chang W, Peng Y, Liu Z, Xie X and Lau HY (2011), Reconstruction of the Infection Curve for SARS Epidemic in Beijing, China Using a Back-Projection Method, Communications in Statistics - Simulation and Computation, 37(2):425-433.
Werber D, King LA, Müller L, Follin P, Buchholz U, Bernard H, Rosner BM, Ethelberg S, de Valk H and Höhle M (2013), Associations of Age and Sex on Clinical Outcome and Incubation Period of Shiga toxin-producing Escherichia coli O104:H4 Infections, 2011, American Journal of Epidemiology, 178(6):984-992.
# Generate an artificial outbreak of size n starting at time t0 and
# being of length l
n <- 1e3 ; t0 <- 23 ; l <- 10

# PMF of the incubation time is an interval-censored gamma distribution
# with mean 15 truncated at 25
dmax <- 25
inc.pmf <- c(0, (pgamma(1:dmax,15,1.4) - pgamma(0:(dmax-1),15,1.4)) /
                pgamma(dmax,15,1.4))

# Function to sample from the incubation time
rincu <- function(n) {
  sample(0:dmax, size=n, replace=TRUE, prob=inc.pmf)
}

# Sample time of exposure and length of incubation time
set.seed(123)
exposureTimes <- t0 + sample(x=0:(l-1), size=n, replace=TRUE)
symptomTimes <- exposureTimes + rincu(n)

# Time series of exposure (truth) and symptom onset (observed)
X <- table(factor(exposureTimes, levels=1:(max(symptomTimes)+dmax)))
Y <- table(factor(symptomTimes, levels=1:(max(symptomTimes)+dmax)))
# Convert Y to an sts object
Ysts <- sts(Y)

# Plot the outbreak
plot(Ysts, xaxis.labelFormat=NULL, legend=NULL)
# Add true number of exposures to the plot
lines(1:length(Y)+0.2, X, col="red", type="h", lty=2)

# Helper function to show the EM step
plotIt <- function(cur.sts) {
  plot(cur.sts, xaxis.labelFormat=NULL, legend.opts=NULL, ylim=c(0,140))
}

# Call non-parametric back-projection function with hook function but
# without bootstrapped confidence intervals
bpnp.control <- list(k=0, eps=rep(0.005,2), iter.max=rep(250,2), B=-1,
                     hookFun=plotIt, verbose=TRUE)

# Fast C version (use argument: eq3a.method="C")!
sts.bp <- backprojNP(Ysts, incu.pmf=inc.pmf,
    control=modifyList(bpnp.control, list(eq3a.method="C")),
    ylim=c(0,max(X,Y)))

# Show result
plot(sts.bp, xaxis.labelFormat=NULL, legend=NULL,
     lwd=c(1,1,2), lty=c(1,1,1), main="")
lines(1:length(Y)+0.2, X, col="red", type="h", lty=2)

# Do the convolution for the expectation
mu <- matrix(0, ncol=ncol(sts.bp), nrow=nrow(sts.bp))
# Loop over all series
for (j in 1:ncol(sts.bp)) {
  # Loop over all time points
  for (t in 1:nrow(sts.bp)) {
    # Convolution, note support of inc.pmf starts at zero (move idx by 1)
    i <- seq_len(t)
    mu[t,j] <- sum(inc.pmf[t-i+1] * upperbound(sts.bp)[i,j], na.rm=TRUE)
  }
}
# Show the fit
lines(1:nrow(sts.bp)-0.5, mu[,1], col="green", type="s", lwd=3)

# Non-parametric back-projection including bootstrap CIs
bpnp.control2 <- modifyList(bpnp.control,
    list(hookFun=NULL, k=2,
         B=10, # in practice, use B >= 1000 !
         eq3a.method="C"))
sts.bp2 <- backprojNP(Ysts, incu.pmf=inc.pmf, control=bpnp.control2)

######################################################################
# Plot the result. This is currently a manual routine.
# ToDo: Need to specify a plot method for stsBP objects which also
# shows the CI.
#
# Parameters:
#  stsBP - object of class stsBP which is to be plotted.
######################################################################
plot.stsBP <- function(stsBP) {
  maxy <- max(observed(stsBP), upperbound(stsBP), stsBP@ci, na.rm=TRUE)
  plot(upperbound(stsBP), type="n", ylim=c(0,maxy), ylab="Cases", xlab="time")
  if (!all(is.na(stsBP@ci))) {
    polygon(c(1:nrow(stsBP), rev(1:nrow(stsBP))),
            c(stsBP@ci[2,,1], rev(stsBP@ci[1,,1])), col="lightgray")
  }
  lines(upperbound(stsBP), type="l", lwd=2)
  legend(x="topright", c(expression(lambda[t])), lty=c(1), col=c(1),
         fill=c(NA), border=c(NA), lwd=c(2))
  invisible()
}

# Plot the result of k=0 and add truth for comparison. No CIs available
plot.stsBP(sts.bp)
lines(1:length(Y), X, col=2, type="h")
# Same for k=2
plot.stsBP(sts.bp2)
lines(1:length(Y), X, col=2, type="h")
Given a prime number factorization x, bestCombination partitions x into two groups, such that the product of the numbers in group one is as similar as possible to the product of the numbers of group two. This is useful in magic.dim.
bestCombination(x)
x |
prime number factorization |
a vector c(prod(set1),prod(set2))
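A short usage sketch: for 36 = 2*2*3*3, a balanced partition of the factors multiplies out to 6 and 6 (the order of the two products may vary):

bestCombination(c(2, 2, 3, 3))  # expect a near-square split, i.e. c(6, 6)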
The function takes range values of a univariate surveillance time series sts and for each time point uses a negative binomial regression model to compute the predictive posterior distribution for the current observation. The (1 - alpha) quantile of this predictive distribution is then used as bound: if the actual observation is above the bound, an alarm is raised.
The Bayesian Outbreak Detection Algorithm (boda) is due to Manitz and Höhle (2013) and its implementation is illustrated in Salmon et al. (2016). However, boda should be considered as an experiment, see the Warning section below!
boda(sts, control = list(range = NULL, X = NULL, trend = FALSE,
    season = FALSE, prior = c('iid', 'rw1', 'rw2'), alpha = 0.05,
    mc.munu = 100, mc.y = 10, verbose = FALSE,
    samplingMethod = c('joint', 'marginals'),
    quantileMethod = c("MC", "MM")))
sts |
object of class sts (including the observed and the state time series). |
control |
Control object given as a list with the elements shown in the usage: range, X, trend, season, prior, alpha, mc.munu, mc.y, verbose, samplingMethod and quantileMethod. |
This function is currently experimental! It also heavily depends on the INLA package, so changes there might affect the operational ability of this function. Since the computations for the Bayesian GAM are quite involved, do not expect this function to be particularly fast.
Results are not reproducible if INLA uses parallelization (as by default); set INLA::inla.setOption(num.threads = "1:1") to avoid that, then use set.seed as usual.
This function requires the R package INLA, which is currently not available from CRAN. It can be obtained from INLA's own repository via install.packages("INLA", repos="https://inla.r-inla-download.org/R/stable").
J. Manitz, M. Höhle, M. Salmon
Manitz, J. and Höhle, M. (2013): Bayesian outbreak detection algorithm for monitoring reported cases of campylobacteriosis in Germany. Biometrical Journal, 55(4), 509-526.
Salmon, M., Schumacher, D. and Höhle, M. (2016): Monitoring count time series in R: Aberration detection in public health surveillance. Journal of Statistical Software, 70 (10), 1-35. doi:10.18637/jss.v070.i10
## Not run:
## running this example takes a couple of minutes

# Load the campylobacteriosis data for Germany
data("campyDE")
# Make an sts object from the data.frame
cam.sts <- sts(epoch=campyDE$date, observed=campyDE$case,
               state=campyDE$state)

# Define monitoring period
# range <- which(epoch(cam.sts) >= as.Date("2007-01-01"))
# range <- which(epoch(cam.sts) >= as.Date("2011-12-10"))
range <- tail(1:nrow(cam.sts), n=2)

control <- list(range=range, X=NULL, trend=TRUE, season=TRUE,
                prior='iid', alpha=0.025, mc.munu=100, mc.y=10,
                samplingMethod = "joint")

# Apply the boda algorithm in its simplest form, i.e. spline is
# described by iid random effects and no extra covariates
library("INLA")  # needs to be attached
cam.boda1 <- boda(cam.sts, control=control)
plot(cam.boda1, xlab='time [weeks]', ylab='No. reported', dx.upperbound=0)

## End(Not run)
The function takes range values of the surveillance time series sts and for each time point uses a Bayesian model of the negative binomial family with log link, inspired by the work of Noufaily et al. (2012) and of Manitz and Höhle (2014). It allows delay-corrected aberration detection as explained in Salmon et al. (2015). A reportingTriangle has to be provided in the control slot.
bodaDelay(sts, control = list(range = NULL, b = 5, w = 3,
    mc.munu = 100, mc.y = 10, pastAberrations = TRUE, verbose = FALSE,
    alpha = 0.05, trend = TRUE, limit54 = c(5, 4),
    inferenceMethod = c("asym", "INLA"), quantileMethod = c("MC", "MM"),
    noPeriods = 1, pastWeeksNotIncluded = NULL, delay = FALSE))
sts |
sts-object to be analysed. Needs to have a reporting triangle. |
control |
list of control arguments as shown in the usage: range, b, w, mc.munu, mc.y, pastAberrations, verbose, alpha, trend, limit54, inferenceMethod, quantileMethod, noPeriods, pastWeeksNotIncluded and delay. |
Farrington, C.P., Andrews, N.J, Beale A.D. and Catchpole, M.A. (1996): A statistical algorithm for the early detection of outbreaks of infectious disease. J. R. Statist. Soc. A, 159, 547-563.
Noufaily, A., Enki, D.G., Farrington, C.P., Garthwaite, P., Andrews, N.J., Charlett, A. (2012): An improved algorithm for outbreak detection in multiple surveillance systems. Statistics in Medicine, 32 (7), 1206-1222.
Salmon, M., Schumacher, D., Stark, K., Höhle, M. (2015): Bayesian outbreak detection in the presence of reporting delays. Biometrical Journal, 57 (6), 1051-1067.
## Not run: data("stsNewport") salm.Normal <- list() salmDelayAsym <- list() for (week in 43:45){ listWeeks <- as.Date(row.names(stsNewport@control$reportingTriangle$n)) dateObs <- listWeeks[isoWeekYear(listWeeks)$ISOYear==2011 & isoWeekYear(listWeeks)$ISOWeek==week] stsC <- sts_observation(stsNewport, dateObservation=dateObs, cut=TRUE) inWeeks <- with(isoWeekYear(epoch(stsC)), ISOYear == 2011 & ISOWeek >= 40 & ISOWeek <= 48) rangeTest <- which(inWeeks) alpha <- 0.07 # Control slot for Noufaily method controlNoufaily <- list(range=rangeTest,noPeriods=10, b=4,w=3,weightsThreshold=2.58,pastWeeksNotIncluded=26, pThresholdTrend=1,thresholdMethod="nbPlugin",alpha=alpha*2, limit54=c(0,50)) # Control slot for the Proposed algorithm with D=0 correction controlNormal <- list(range = rangeTest, b = 4, w = 3, reweight = TRUE, mc.munu=10000, mc.y=100, verbose = FALSE, alpha = alpha, trend = TRUE, limit54=c(0,50), noPeriods = 10, pastWeeksNotIncluded = 26, delay=FALSE) # Control slot for the Proposed algorithm with D=10 correction controlDelayNorm <- list(range = rangeTest, b = 4, w = 3, reweight = FALSE, mc.munu=10000, mc.y=100, verbose = FALSE, alpha = alpha, trend = TRUE, limit54=c(0,50), noPeriods = 10, pastWeeksNotIncluded = 26, delay=TRUE,inferenceMethod="asym") set.seed(1) salm.Normal[[week]] <- farringtonFlexible(stsC, controlNoufaily) salmDelayAsym[[week]] <- bodaDelay(stsC, controlDelayNorm) } opar <- par(mfrow=c(2,3)) lapply(salmDelayAsym[c(43,44,45)],plot, legend=NULL, main="", ylim=c(0,35)) lapply(salm.Normal[c(43,44,45)],plot, legend=NULL, main="", ylim=c(0,35)) par(opar) ## End(Not run)
## Not run: data("stsNewport") salm.Normal <- list() salmDelayAsym <- list() for (week in 43:45){ listWeeks <- as.Date(row.names(stsNewport@control$reportingTriangle$n)) dateObs <- listWeeks[isoWeekYear(listWeeks)$ISOYear==2011 & isoWeekYear(listWeeks)$ISOWeek==week] stsC <- sts_observation(stsNewport, dateObservation=dateObs, cut=TRUE) inWeeks <- with(isoWeekYear(epoch(stsC)), ISOYear == 2011 & ISOWeek >= 40 & ISOWeek <= 48) rangeTest <- which(inWeeks) alpha <- 0.07 # Control slot for Noufaily method controlNoufaily <- list(range=rangeTest,noPeriods=10, b=4,w=3,weightsThreshold=2.58,pastWeeksNotIncluded=26, pThresholdTrend=1,thresholdMethod="nbPlugin",alpha=alpha*2, limit54=c(0,50)) # Control slot for the Proposed algorithm with D=0 correction controlNormal <- list(range = rangeTest, b = 4, w = 3, reweight = TRUE, mc.munu=10000, mc.y=100, verbose = FALSE, alpha = alpha, trend = TRUE, limit54=c(0,50), noPeriods = 10, pastWeeksNotIncluded = 26, delay=FALSE) # Control slot for the Proposed algorithm with D=10 correction controlDelayNorm <- list(range = rangeTest, b = 4, w = 3, reweight = FALSE, mc.munu=10000, mc.y=100, verbose = FALSE, alpha = alpha, trend = TRUE, limit54=c(0,50), noPeriods = 10, pastWeeksNotIncluded = 26, delay=TRUE,inferenceMethod="asym") set.seed(1) salm.Normal[[week]] <- farringtonFlexible(stsC, controlNoufaily) salmDelayAsym[[week]] <- bodaDelay(stsC, controlDelayNorm) } opar <- par(mfrow=c(2,3)) lapply(salmDelayAsym[c(43,44,45)],plot, legend=NULL, main="", ylim=c(0,35)) lapply(salm.Normal[c(43,44,45)],plot, legend=NULL, main="", ylim=c(0,35)) par(opar) ## End(Not run)
The implemented calibration tests for Poisson or negative binomial predictions of count data are based on proper scoring rules and described in detail in Wei and Held (2014). The following proper scoring rules are available: Dawid-Sebastiani score ("dss"), logarithmic score ("logs"), ranked probability score ("rps").
calibrationTest(x, ...)

## Default S3 method:
calibrationTest(x, mu, size = NULL, which = c("dss", "logs", "rps"),
                tolerance = 1e-4, method = 2, ...)
x |
the observed counts. All involved functions are vectorized and also accept matrices or arrays. |
mu |
the means of the predictive distributions for the observations x. |
size |
either NULL (default), indicating Poisson predictions with mean mu, or dispersion parameters of negative binomial forecasts for the observations x, parametrized as in dnbinom with variance mu*(1+mu/size). |
which |
a character string indicating which proper scoring rule to apply. |
tolerance |
absolute tolerance for the null expectation and variance of "logs" and "rps". |
method |
selection of the z-statistic variant (see Wei and Held, 2014, for details). |
... |
unused (argument of the generic). |
an object of class "htest"
,
which is a list with the following components:
method |
a character string indicating the type of test performed (including which). |
data.name |
a character string naming the supplied x. |
statistic |
the z-statistic of the test. |
parameter |
the number of predictions underlying the test, i.e., length(x). |
p.value |
the p-value for the test. |
If the gsl package is installed, its implementations of the Bessel and hypergeometric functions are used when calculating the null expectation and variance of the rps. These functions are faster and yield more accurate results (especially for larger mu).
Sebastian Meyer and Wei Wei
Wei, W. and Held, L. (2014): Calibration tests for count data. Test, 23, 787-805.
mu <- c(0.1, 1, 3, 6, pi, 100)
size <- 0.1
set.seed(1)
y <- rnbinom(length(mu), mu = mu, size = size)
calibrationTest(y, mu = mu, size = size)  # p = 0.99
calibrationTest(y, mu = mu, size = 1)     # p = 4.3e-05
calibrationTest(y, mu = 1, size = size)   # p = 0.6959
calibrationTest(y, mu = 1, size = size, which = "rps")  # p = 0.1286
Weekly number of reported campylobacteriosis cases in Germany, 2002-2011, together with the corresponding absolute humidity (in g/m^3) that week. The absolute humidity was computed according to the procedure by Dengler (1997) using the means of representative weather station data from the German Climate Service.
data(campyDE)
A data.frame containing the following columns

date
Date instance containing the Monday of the reporting week.

case
Number of reported cases that week.

state
Boolean indicating whether there is external knowledge about an outbreak that week.

hum
Mean absolute humidity (in g/m^3) of that week as measured by a single representative weather station.

l1.hum-l5.hum
Lagged versions (lagged by 1-5 weeks) of the hum covariate.

newyears
Boolean indicating whether the reporting week corresponds to the first two weeks of the year (TRUE) or not (FALSE). Note: The first week of a year is here defined as the first reporting week which has its corresponding Monday within the new year.

christmas
Boolean indicating whether the reporting week corresponds to the last two weeks of the year (TRUE) or not (FALSE). Note: These are the two weeks immediately preceding the newyears weeks.

O104period
Boolean indicating whether the reporting week corresponds to the W21-W30 period of increased gastroenteritis awareness during the O104:H4 STEC outbreak.
The data on campylobacteriosis cases have been queried from the Survstat@RKI database of the German Robert Koch Institute (https://survstat.rki.de/).
Data for the computation of absolute humidity were obtained from the German Climate Service (Deutscher Wetterdienst), Climate data of Germany, available at https://www.dwd.de.
A complete data description and an analysis of the data can be found in Manitz and Höhle (2013).
Manitz, J. and Höhle, M. (2013): Bayesian outbreak detection algorithm for monitoring reported cases of campylobacteriosis in Germany. Biometrical Journal, 55(4), 509-526.
# Load the data data("campyDE") # O104 period is W21-W30 in 2011 stopifnot(all(campyDE$O104period == ( (campyDE$date >= as.Date("2011-05-23")) & (campyDE$date < as.Date("2011-07-31")) ))) # Make an sts object from the data.frame cam.sts <- sts(epoch=campyDE$date, observed=campyDE$case, state=campyDE$state) # Plot the result plot(cam.sts)
# Load the data data("campyDE") # O104 period is W21-W30 in 2011 stopifnot(all(campyDE$O104period == ( (campyDE$date >= as.Date("2011-05-23")) & (campyDE$date < as.Date("2011-07-31")) ))) # Make an sts object from the data.frame cam.sts <- sts(epoch=campyDE$date, observed=campyDE$case, state=campyDE$state) # Plot the result plot(cam.sts)
Function to process an sts object by binomial, beta-binomial or multinomial CUSUM as described by Höhle (2010). Logistic, multinomial logistic, proportional odds or Bradley-Terry regression models are used to specify in-control and out-of-control parameters. The implementation is illustrated in Salmon et al. (2016).
categoricalCUSUM(stsObj, control = list(range = NULL, h = 5,
    pi0 = NULL, pi1 = NULL, dfun = NULL, ret = c("cases", "value")),
    ...)
stsObj |
Object of class sts. |
control |
Control object containing several items as shown in the usage: range, h, pi0, pi1, dfun and ret. |
... |
Additional arguments to send to dfun. |
The function allows the monitoring of categorical time series as described by regression models for binomial, beta-binomial or multinomial data. The latter includes, e.g., multinomial logistic regression models, proportional odds models or Bradley-Terry models for paired comparisons. See the Höhle (2010) reference for further details about the methodology.
Once an alarm is found the CUSUM scheme is reset (to zero) and monitoring continues from there.
An sts object with the observed, alarm, etc. slots trimmed to the control$range indices.
M. Höhle
Höhle, M. (2010): Online Change-Point Detection in Categorical Time Series. In: T. Kneib and G. Tutz (Eds.), Statistical Modelling and Regression Structures, Physica-Verlag.
Salmon, M., Schumacher, D. and Höhle, M. (2016): Monitoring count time series in R: Aberration detection in public health surveillance. Journal of Statistical Software, 70 (10), 1-35. doi:10.18637/jss.v070.i10
## IGNORE_RDIFF_BEGIN
have_GAMLSS <- require("gamlss")
## IGNORE_RDIFF_END

if (have_GAMLSS) {
  ###########################################################################
  # Beta-binomial CUSUM for a small example containing the time-varying
  # number of positive tests out of a time-varying number of total tests
  ###########################################################################

  # Load meat inspection data
  data("abattoir")

  # Use GAMLSS to fit a beta-binomial regression model
  phase1 <- 1:(2*52)
  phase2 <- (max(phase1)+1):nrow(abattoir)

  # Fit beta-binomial model using GAMLSS
  abattoir.df <- as.data.frame(abattoir)

  # Replace the observed and epoch column names to something more convenient
  dict <- c("observed"="y", "epoch"="t", "population"="n")
  replace <- dict[colnames(abattoir.df)]
  colnames(abattoir.df)[!is.na(replace)] <- replace[!is.na(replace)]

  m.bbin <- gamlss(cbind(y, n-y) ~ 1 + t +
                     sin(2*pi/52*t) + cos(2*pi/52*t) +
                     sin(4*pi/52*t) + cos(4*pi/52*t),
                   sigma.formula = ~1,
                   family = BB(sigma.link="log"),
                   data = abattoir.df[phase1, c("n","y","t")])

  # CUSUM parameters
  R <- 2  # detect a doubling of the odds for a test being positive
  h <- 4  # threshold of the cusum

  # Compute in-control and out-of-control mean
  pi0 <- predict(m.bbin, newdata=abattoir.df[phase2, c("n","y","t")],
                 type="response")
  pi1 <- plogis(qlogis(pi0) + log(R))

  # Create matrix with in-control and out-of-control proportions.
  # Categories are D=1 and D=0, where the latter is the reference category
  pi0m <- rbind(pi0, 1-pi0)
  pi1m <- rbind(pi1, 1-pi1)

  ######################################################################
  # Use the multinomial surveillance function. To this end it is necessary
  # to create a new abattoir object containing counts and proportion for
  # each of the k=2 categories. For binomial data this appears a bit
  # redundant, but generalizes easier to k>2 categories.
  ######################################################################
  abattoir2 <- sts(epoch=1:nrow(abattoir), start=c(2006,1), freq=52,
    observed=cbind(abattoir@observed,
                   abattoir@populationFrac - abattoir@observed),
    populationFrac=cbind(abattoir@populationFrac, abattoir@populationFrac),
    state=matrix(0, nrow=nrow(abattoir), ncol=2),
    multinomialTS=TRUE)

  ######################################################################
  # Function to use as dfun in the categoricalCUSUM
  # (just a wrapper to the dBB function). Note that from v 3.0-1 the
  # first argument of dBB changed its name from "y" to "x"!
  ######################################################################
  mydBB.cusum <- function(y, mu, sigma, size, log = FALSE) {
    return(dBB(y[1,], mu = mu[1,], sigma = sigma, bd = size, log = log))
  }

  # Create control object for multinomial cusum and use the
  # categoricalCUSUM method
  control <- list(range=phase2, h=h, pi0=pi0m, pi1=pi1m, ret="cases",
                  dfun=mydBB.cusum)
  surv <- categoricalCUSUM(abattoir2, control=control,
                           sigma=exp(m.bbin$sigma.coef))

  # Show results
  plot(surv[,1], dx.upperbound=0)
  lines(pi0, col="green")
  lines(pi1, col="red")

  # Index of the alarm
  which.max(alarms(surv[,1]))
}
Check the residual process of a fitted twinSIR or twinstim model: transform the residual process (cf. the residuals methods for classes "twinSIR" and "twinstim") such that the transformed residuals should be uniformly distributed if the fitted model describes the true conditional intensity function well. Graphically check this using ks.plot.unif. The transformation for the residuals tau is 1 - exp(-diff(c(0,tau))) (cf. Ogata, 1988). Another plot inspects the serial correlation between the transformed residuals (scatterplot between u_i and u_{i+1}).
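The transformation itself is a one-liner. As a sketch (independent of any fitted model), applying it to the event times of a unit-rate Poisson process yields approximately U(0,1) variates:

set.seed(1)
tau <- cumsum(rexp(100))        # hypothetical residual process
u <- 1 - exp(-diff(c(0, tau)))  # the transformation described above
ks.test(u, "punif")             # graphically: ks.plot.unif(u)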
checkResidualProcess(object, plot = 1:2, mfrow = c(1,length(plot)), ...)
object |
an object of class "twinSIR" or "twinstim". |
plot |
logical (or integer index) vector indicating if (which) plots of the
transformed residuals should be produced. The |
mfrow |
see par. |
... |
further arguments passed to ks.plot.unif. |
A list (returned invisibly, if plot = TRUE) with the following components:
tau
the residual process obtained by residuals(object).

U
the transformed residuals which should be distributed as U(0,1).

ks
the result of the ks.test for the uniform distribution of U.
Sebastian Meyer
Ogata, Y. (1988) Statistical models for earthquake occurrences and residual analysis for point processes. Journal of the American Statistical Association, 83, 9-27
ks.plot.unif and the residuals-method for classes "twinSIR" and "twinstim".
data("hagelloch") fit <- twinSIR(~ household, data = hagelloch) # a simplistic model ## extract the "residual process", i.e., the fitted cumulative intensities residuals(fit) ## assess goodness of fit based on these residuals checkResidualProcess(fit) # could be better
data("hagelloch") fit <- twinSIR(~ household, data = hagelloch) # a simplistic model ## extract the "residual process", i.e., the fitted cumulative intensities residuals(fit) ## assess goodness of fit based on these residuals checkResidualProcess(fit) # could be better
Conditional lapply: use lapply if the input is a list and otherwise apply the function directly to the input and wrap the result in a list. The function is implemented as

if (is.list(X)) lapply(X, FUN, ...) else list(FUN(X, ...))
clapply(X, FUN, ...)
X |
a list or a single R object. |
FUN |
the function to be applied to (each element of) X. |
... |
optional arguments to FUN. |
a list (of length 1 if X is not a list).
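A brief usage sketch of the two branches:

clapply(list(1:2, 3:4), sum)  # input is a list: behaves like lapply -> list(3, 7)
clapply(1:2, sum)             # not a list: result is wrapped -> list(3)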
S3-generic function to use with models which contain several groups of coefficients in their coefficient vector. The coeflist methods are intended to list the coefficients by group. The default method simply splits the coefficient vector given the number of coefficients by group.
coeflist(x, ...)

## Default S3 method:
coeflist(x, npars, ...)
x |
a model with groups of coefficients or, for the default method, a vector of coefficients. |
npars |
a named vector specifying the number of coefficients per group. |
... |
potential further arguments (currently ignored). |
a list of coefficients
Sebastian Meyer
## the default method just 'split's the coefficient vector
coefs <- c(a = 1, b = 3, dispersion = 0.5)
npars <- c(regression = 2, variance = 1)
coeflist(coefs, npars)
The dataset from Steiner et al. (1999) on the monitoring of paired binary surgical outcomes, useful for illustrating CUSUM monitoring of categorical time series.
data(deleval)
Steiner et al. (1999) use data from de Leval et al. (1994) to illustrate monitoring of failure rates of a surgical procedure for a bivariate outcome.
Over a period of six years an arterial switch operation was performed
on 104 newborn babies. Since the death rate from this surgery was
relatively low the idea of surgical "near miss" was introduced. It is
defined as the need to reinstitute cardiopulmonary bypass after a trial
period of weaning. The object of class sts
contains the
recordings of near misses and deaths from the surgery for the 104
newborn babies of the study.
The data could also be handled by a multinomial CUSUM model.
Steiner, S. H., Cook, R. J., and Farewell, V. T. (1999), Monitoring paired binary surgical outcomes using cumulative sum charts, Statistics in Medicine, 18, pp. 69–86.
De Leval, Marc R., Franiois, K., Bull, C., Brawn, W. B. and Spiegelhalter, D. (1994), Analysis of a cluster of surgical failures, Journal of Thoracic and Cardiovascular Surgery, March, pp. 914–924.
data("deleval") plot(deleval, xaxis.labelFormat=NULL,ylab="Response",xlab="Patient number")
data("deleval") plot(deleval, xaxis.labelFormat=NULL,ylab="Response",xlab="Patient number")
Generates a polygon representing a disc/circle (in planar coordinates) as an object of one of three possible classes: "Polygon" from package sp, "owin" from package spatstat.geom, or "gpc.poly" from gpclib (if available).
discpoly(center, radius, npoly = 64, class = c("Polygon", "owin", "gpc.poly"), hole = FALSE)
center |
numeric vector of length 2 (center coordinates of the circle). |
radius |
single numeric value (radius of the circle). |
npoly |
single integer. Number of edges of the polygonal approximation. |
class |
class of the resulting polygon (partial name matching applies). For "gpc.poly", the gpclib package needs to be available. |
hole |
logical. Does the resulting polygon represent a hole? |
A polygon of class class representing a circle/disc approximated by npoly edges.

If class="gpc.poly" and this S4 class is not yet registered in the current R session (by loading gpclib beforehand), only the pts slot of a "gpc.poly" is returned with a warning.
disc in package spatstat.geom.
## Construct circles with increasing accuracy and of different spatial classes
disc1 <- discpoly(c(0,0), 5, npoly=4, class = "owin")
disc2 <- discpoly(c(0,0), 5, npoly=16, class = "Polygon")
disc3 <- discpoly(c(0,0), 5, npoly=64, class = "gpc.poly")  # may warn

## Look at the results
print(disc1)
plot(disc1, axes=TRUE, main="", border=2)

str(disc2)
lines(disc2, col=3)

str(disc3)  # a list or a formal "gpc.poly" (if gpclib is available)
if (is(disc3, "gpc.poly")) {
  plot(disc3, add=TRUE, poly.args=list(border=4))
} else {
  lines(disc3[[1]], col=4)
}

## to only _draw_ a circle
symbols(0, 0, circles=5, inches=FALSE, add=TRUE, fg=5)
A small helper function to convert a disProg object to an object of the S4 class sts and vice versa. In the future, the sts class should replace the disProg class, but for now this function allows for conversion between the two formats.
disProg2sts(disProgObj, map=NULL)
sts2disProg(sts)
disProgObj |
an object of class "disProg". |
map |
an optional "SpatialPolygons" object for the map slot of the resulting "sts" object. |
sts |
an object of class "sts". |
an object of class "sts"
or "disProg"
, respectively.
data(ha)
print(disProg2sts(ha))
class(sts2disProg(disProg2sts(ha)))
The function takes range values of the surveillance time series sts and for each time point computes a threshold for the number of counts based on values from the recent past. This is then compared to the observed number of counts. If the observation is above a specific quantile of the prediction interval, then an alarm is raised. This method is especially useful for data without many historic values, since it only needs counts from the recent past.
earsC(sts, control = list(range = NULL, method = "C1", baseline = 7, minSigma = 0, alpha = 0.001))
sts |
object of class sts (including the observed and the state time series). |
control |
Control object with the elements shown in the usage: range, method ("C1", "C2" or "C3"), baseline, minSigma and alpha. |
The three methods differ in terms of the baseline used for calculation of the expected value and in terms of the method for calculating the expected value:

in C1 and C2 the expected value is the moving average of counts over the sliding window of the baseline and the prediction interval depends on the standard deviation of the observed counts in this window. They can be considered as Shewhart control charts with a small sample used for calculations.

in C3 the expected value is based on the sum over 3 timepoints (the assessed timepoint and the two previous timepoints) of the discrepancy between observations and predictions, predictions being calculated with the C2 method. This method has similarities with a CUSUM method due to its adding up discrepancies between predictions and observations over several timepoints, but is not a CUSUM (sum over 3 timepoints, not accumulation over a whole range), even if it is sometimes presented as such.

Here is what the function does for each method; see the literature sources for further details:
For C1 the baseline are the baseline (default 7) timepoints before the assessed timepoint t, i.e. t-baseline to t-1. The expected value is the mean of the baseline. An approximate (two-sided) prediction interval is calculated based on the assumption that the difference between the expected value and the observed value, divided by the standard deviation of counts over the sliding window and called C_1(t), follows a standard normal distribution in the absence of outbreaks:

C_1(t) = (Y(t) - Ybar_1(t)) / S_1(t),

where

Ybar_1(t) = (1/baseline) * sum_{i = t-baseline}^{t-1} Y(i)

and

S_1(t)^2 = (1/(baseline-1)) * sum_{i = t-baseline}^{t-1} (Y(i) - Ybar_1(t))^2.

Then under the null hypothesis of no outbreak, C_1(t) ~ N(0,1). An alarm is raised if C_1(t) >= z_{1-alpha}, with z_{1-alpha} the (1-alpha) quantile of the standard normal distribution. The upperbound U_1(t) is then defined by:

U_1(t) = Ybar_1(t) + z_{1-alpha} * S_1(t).
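A minimal sketch of the C1 computation at a single timepoint t (an illustration of the formulas above, assuming y holds the observed counts and the default baseline of 7):

## Sketch of the C1 statistic and upperbound at timepoint t
set.seed(1)
y <- rpois(30, lambda = 5)           # hypothetical counts
t <- 30; baseline <- 7; alpha <- 0.001
window <- y[(t - baseline):(t - 1)]  # the 7 preceding timepoints
ybar <- mean(window); s <- sd(window)
## (the minSigma control argument guards against s being near zero)
C1 <- (y[t] - ybar) / s
alarm <- C1 >= qnorm(1 - alpha)      # compare with z_{1-alpha}
U1 <- ybar + qnorm(1 - alpha) * s    # upperbound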
C2 is very similar to C1 apart from a 2-day lag in the baseline definition. In other words the baseline for C2 is baseline (default 7) timepoints with a 2-day lag before the monitored timepoint t, i.e. t-baseline-2 to t-3. The expected value is the mean of the baseline. An approximate (two-sided) prediction interval is calculated based on the same assumption as for C1, with

C_2(t) = (Y(t) - Ybar_2(t)) / S_2(t),

where

Ybar_2(t) = (1/baseline) * sum_{i = t-baseline-2}^{t-3} Y(i)

and

S_2(t)^2 = (1/(baseline-1)) * sum_{i = t-baseline-2}^{t-3} (Y(i) - Ybar_2(t))^2.

Then under the null hypothesis of no outbreak, C_2(t) ~ N(0,1). An alarm is raised if C_2(t) >= z_{1-alpha}, with z_{1-alpha} the (1-alpha) quantile of the standard normal distribution. The upperbound U_2(t) is then defined by:

U_2(t) = Ybar_2(t) + z_{1-alpha} * S_2(t).
C3 is quite different from the two other methods, but it is based on C2. Indeed it uses C_2(t) from timepoint t and the two previous timepoints. This means the baseline consists of the timepoints t-(baseline+4) to t-1. The statistic C_3(t) is the sum of discrepancies between observations and predictions:

C_3(t) = sum_{i = t-2}^{t} max(0, C_2(i) - 1).

Then under the null hypothesis of no outbreak, C_3(t) is compared with the normal quantile as above. An alarm is raised if C_3(t) >= z_{1-alpha}, with z_{1-alpha} the (1-alpha) quantile of the standard normal distribution. The upperbound U_3(t) is then defined by inverting this alarm rule:

U_3(t) = Ybar_2(t) + S_2(t) * (z_{1-alpha} + 1 - sum_{i = t-2}^{t-1} max(0, C_2(i) - 1)).
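Continuing the sketch above, C3 accumulates the truncated C2 discrepancies of the current and the two previous timepoints (c2 is a hypothetical vector of precomputed C2 statistics):

## Sketch of the C3 statistic from precomputed C2 values
c2 <- c(0.4, 1.8, 2.5)          # hypothetical C2(t-2), C2(t-1), C2(t)
C3 <- sum(pmax(0, c2 - 1))      # = 0 + 0.8 + 1.5 = 2.3
C3 >= qnorm(1 - 0.001)          # alarm rule as for C1/C2 (here: FALSE)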
An object of class sts with the slots upperbound and alarm filled by the chosen method.
M. Salmon, H. Burkom
Fricker, R.D., Hegler, B.L, and Dunfee, D.A. (2008). Comparing syndromic surveillance detection methods: EARS versus a CUSUM-based methodology, 27:3407-3429, Statistics in medicine.
Salmon, M., Schumacher, D. and Höhle, M. (2016): Monitoring count time series in R: Aberration detection in public health surveillance. Journal of Statistical Software, 70 (10), 1-35. doi:10.18637/jss.v070.i10
# Sim data and convert to sts object
disProgObj <- sim.pointSource(p = 0.99, r = 0.5, length = 208, A = 1,
                              alpha = 1, beta = 0, phi = 0,
                              frequency = 1, state = NULL, K = 1.7)
stsObj <- disProg2sts(disProgObj)

# Call earsC function and show result
res1 <- earsC(stsObj, control = list(range = 20:208, method = "C1"))
plot(res1, legend.opts = list(horiz = TRUE, x = "topright"))

# Compare C3 upperbounds depending on alpha
res3 <- earsC(stsObj, control = list(range = 20:208, method = "C3",
                                     alpha = 0.001))
plot(upperbound(res3), type = 'l')
res3 <- earsC(stsObj, control = list(range = 20:208, method = "C3"))
lines(upperbound(res3), col = 'red')
The function as.epidata is used to generate objects of class "epidata". Objects of this class are specific data frames containing the event history of an epidemic together with some additional attributes. These objects are the basis for fitting spatio-temporal epidemic intensity models with the function twinSIR. Their implementation is illustrated in Meyer et al. (2017, Section 4), see vignette("twinSIR").

Note that the spatial information itself, i.e. the positions of the individuals, is assumed to be constant over time. Besides epidemics following the SIR compartmental model, also data from SI, SIRS and SIS epidemics may be supplied.
as.epidata(data, ...)

## S3 method for class 'data.frame'
as.epidata(data, t0, tE.col, tI.col, tR.col, id.col, coords.cols,
           f = list(), w = list(), D = dist,
           max.time = NULL, keep.cols = TRUE, ...)

## Default S3 method:
as.epidata(data, id.col, start.col, stop.col, atRiskY.col, event.col,
           Revent.col, coords.cols, f = list(), w = list(), D = dist,
           .latent = FALSE, ...)

## S3 method for class 'epidata'
print(x, ...)

## S3 method for class 'epidata'
x[i, j, drop]

## S3 method for class 'epidata'
update(object, f = list(), w = list(), D = dist, ...)
data |
For the |
t0 , max.time
|
observation period. In the resulting |
tE.col , tI.col , tR.col
|
single numeric or character indexes of the time columns in
|
id.col |
single numeric or character index of the |
start.col |
single index of the |
stop.col |
single index of the |
atRiskY.col |
single index of the |
event.col |
single index of the |
Revent.col |
single index of the |
coords.cols |
indexes of the |
f |
a named list of vectorized functions for a
distance-based force of infection.
The functions must interact elementwise on a (distance) matrix |
w |
a named list of vectorized functions for extra
covariate-based weights |
D |
either a function to calculate the distances between the individuals
with locations taken from |
keep.cols |
logical indicating if all columns in |
.latent |
(internal) logical indicating whether to allow for latent periods (EXPERIMENTAL). Otherwise (default), the function verifies that an event (i.e., switching to the I state) only happens when the respective individual is at risk (i.e., in the S state). |
x , object
|
an object of class "epidata". |
... |
arguments passed to |
i , j , drop
|
arguments passed to [.data.frame. |
The print method for objects of class "epidata" simply prints the data frame with a small header containing the time range of the observed epidemic and the number of infected individuals. Usually, the data frames are quite long, so the summary method summary.epidata might be useful. Also, indexing/subsetting "epidata" works exactly as for data.frames, but there is a dedicated method, which assures consistency of the resulting "epidata" or drops this class, if necessary.

The update-method can be used to add or replace distance-based (f) or covariate-based (w) epidemic variables in an existing "epidata" object.
SIS epidemics are implemented as SIRS epidemics where the length of the removal period equals 0. This means that an individual, which has an R-event will be at risk immediately afterwards, i.e. in the following time block. Therefore, data of SIS epidemics have to be provided in that form containing “pseudo-R-events”.
a data.frame with the columns "BLOCK", "id", "start", "stop", "atRiskY", "event", "Revent" and the coordinate columns (with the original names from data), which are all obligatory. These columns are followed by any remaining columns of the input data. Last but not least, the newly generated columns with epidemic variables corresponding to the functions in the list f are appended, if length(f) > 0.

The data.frame is given the additional attributes
"eventTimes" |
numeric vector of infection time points (sorted chronologically). |
"timeRange" |
numeric vector of length 2: |
"coords.cols" |
numeric vector containing the column indices of the coordinate columns in the resulting data frame. |
"f" |
this equals the argument |
"w" |
this equals the argument |
The column name "BLOCK"
is a reserved name. This column will be
added automatically at conversion and the resulting data frame will be
sorted by this column and by id. Also the names "id"
, "start"
,
"stop"
, "atRiskY"
, "event"
and "Revent"
are
reserved for the respective columns only.
Sebastian Meyer
Meyer, S., Held, L. and Höhle, M. (2017): Spatio-temporal analysis of epidemic phenomena using the R package surveillance. Journal of Statistical Software, 77 (11), 1-55. doi:10.18637/jss.v077.i11
The hagelloch
data as an example.
The plot
and the
summary
method for class "epidata"
.
Furthermore, the function animate.epidata
for the animation of
epidemics.
Function twinSIR
for fitting spatio-temporal epidemic intensity
models to epidemic data.
Function simEpidata
for the simulation of epidemic data.
data("hagelloch") # see help("hagelloch") for a description head(hagelloch.df) ## convert the original data frame to an "epidata" event history myEpi <- as.epidata(hagelloch.df, t0 = 0, tI.col = "tI", tR.col = "tR", id.col = "PN", coords.cols = c("x.loc", "y.loc"), keep.cols = c("SEX", "AGE", "CL")) str(myEpi) head(as.data.frame(myEpi)) # "epidata" has event history format summary(myEpi) # see 'summary.epidata' plot(myEpi) # see 'plot.epidata' and also 'animate.epidata' ## add distance- and covariate-based weights for the force of infection ## in a twinSIR model, see vignette("twinSIR") for a description myEpi <- update(myEpi, f = list( household = function(u) u == 0, nothousehold = function(u) u > 0 ), w = list( c1 = function (CL.i, CL.j) CL.i == "1st class" & CL.j == CL.i, c2 = function (CL.i, CL.j) CL.i == "2nd class" & CL.j == CL.i ) ) ## this is now identical to the prepared hagelloch "epidata" stopifnot(all.equal(myEpi, hagelloch))
data("hagelloch") # see help("hagelloch") for a description head(hagelloch.df) ## convert the original data frame to an "epidata" event history myEpi <- as.epidata(hagelloch.df, t0 = 0, tI.col = "tI", tR.col = "tR", id.col = "PN", coords.cols = c("x.loc", "y.loc"), keep.cols = c("SEX", "AGE", "CL")) str(myEpi) head(as.data.frame(myEpi)) # "epidata" has event history format summary(myEpi) # see 'summary.epidata' plot(myEpi) # see 'plot.epidata' and also 'animate.epidata' ## add distance- and covariate-based weights for the force of infection ## in a twinSIR model, see vignette("twinSIR") for a description myEpi <- update(myEpi, f = list( household = function(u) u == 0, nothousehold = function(u) u > 0 ), w = list( c1 = function (CL.i, CL.j) CL.i == "1st class" & CL.j == CL.i, c2 = function (CL.i, CL.j) CL.i == "2nd class" & CL.j == CL.i ) ) ## this is now identical to the prepared hagelloch "epidata" stopifnot(all.equal(myEpi, hagelloch))
Function for the animation of epidemic data, i.e. objects inheriting from
class "epidata"
. This only works with 1- or 2-dimensional coordinates
and is not useful if some individuals share the same coordinates
(overlapping). There are two types of animation, see argument
time.spacing
. Besides the direct plotting in the R session, it is
also possible to generate a sequence of graphics files to create animations
outside R.
## S3 method for class 'summary.epidata'
animate(object, main = "An animation of the epidemic",
        pch = 19, col = c(3, 2, gray(0.6)), time.spacing = NULL,
        sleep = quote(5/.nTimes), legend.opts = list(), timer.opts = list(),
        end = NULL, generate.snapshots = NULL, ...)

## S3 method for class 'epidata'
animate(object, ...)
object |
an object inheriting from class |
main |
a main title for the plot, see also |
pch , col
|
vectors of length 3 specifying the point symbols and colors for
susceptible, infectious and removed individuals (in this order).
The vectors are recycled if necessary.
By default, susceptible individuals are marked as filled green circles,
infectious individuals as filled red circles and removed individuals as
filled gray circles. Note that the symbols are iteratively drawn
(overlaid) in the same plotting region as time proceeds.
For information about the possible values of |
time.spacing |
time interval for the animation steps. If |
sleep |
time in seconds to |
legend.opts |
either a list of arguments passed to the
|
timer.opts |
either a list of arguments passed to the
Note that the argument |
end |
ending time of the animation in case of |
generate.snapshots |
By default (
will store the animation steps in pdf-files in the current
working directory, where the file names each end with the time point
represented by the corresponding plot. Because the variables |
... |
further graphical parameters passed to the basic call of |
Sebastian Meyer
summary.epidata
for the data, on which the plot is based.
plot.epidata
for plotting the evolution of an epidemic by
the numbers of susceptible, infectious and removed individuals.
The contributed R package animation.
data("hagelloch") (s <- summary(hagelloch)) # plot the ordering of the events only animate(s) # or: animate(hagelloch) # with timer (animate only up to t=10) animate(s, time.spacing=0.1, end=10, sleep=0.01, legend.opts=list(x="topleft")) # Such an animation can be saved in various ways using tools of # the animation package, e.g., saveHTML() if (interactive() && require("animation")) { oldwd <- setwd(tempdir()) # to not clutter up the current working dir saveHTML({ par(bg="white") # default "transparent" is grey in some browsers animate(s, time.spacing=1, sleep=0, legend.opts=list(x="topleft"), generate.snapshots="epiani") }, use.dev=FALSE, img.name="epiani", ani.width=600, interval=0.5) setwd(oldwd) }
data("hagelloch") (s <- summary(hagelloch)) # plot the ordering of the events only animate(s) # or: animate(hagelloch) # with timer (animate only up to t=10) animate(s, time.spacing=0.1, end=10, sleep=0.01, legend.opts=list(x="topleft")) # Such an animation can be saved in various ways using tools of # the animation package, e.g., saveHTML() if (interactive() && require("animation")) { oldwd <- setwd(tempdir()) # to not clutter up the current working dir saveHTML({ par(bg="white") # default "transparent" is grey in some browsers animate(s, time.spacing=1, sleep=0, legend.opts=list(x="topleft"), generate.snapshots="epiani") }, use.dev=FALSE, img.name="epiani", ani.width=600, interval=0.5) setwd(oldwd) }
"epidata"
Objects
This function modifies an object inheriting from class "epidata"
such
that it features the specified stop time points. For this purpose, the time
interval in the event history into which the new stop falls will be split
up into two parts, one block for the time period until the new stop – where
no infection or removal occurs – and the other block for the time period
from the new stop to the end of the original interval.
Its main application is to enable the use of knots in twinSIR that are not existing stop time points in the "epidata" object.
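A minimal sketch of this use case (the model formula is illustrative only and assumes the household variable of the hagelloch data):

data("hagelloch")
epi25 <- intersperse(hagelloch, stoptimes = 25)
## baseline hazard with a change point at the new stop time t = 25
fit <- twinSIR(~ household, data = epi25, knots = 25)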
intersperse(epidata, stoptimes, verbose = FALSE)
epidata |
an object inheriting from class |
stoptimes |
a numeric vector of time points inside the observation period of the
|
verbose |
logical indicating if a |
an object of the same class as epidata
with additional time blocks
for any new stoptimes
.
Sebastian Meyer
as.epidata.epidataCS
where this function is used.
data("hagelloch") subset(hagelloch, start < 25 & stop > 25 & id %in% 9:13, select = 1:7) # there is no "stop" time at 25, but we can add this extra stop nrow(hagelloch) moreStopsEpi <- intersperse(hagelloch, stoptimes = 25) nrow(moreStopsEpi) subset(moreStopsEpi, (stop == 25 | start == 25) & id %in% 9:13, select = 1:7)
data("hagelloch") subset(hagelloch, start < 25 & stop > 25 & id %in% 9:13, select = 1:7) # there is no "stop" time at 25, but we can add this extra stop nrow(hagelloch) moreStopsEpi <- intersperse(hagelloch, stoptimes = 25) nrow(moreStopsEpi) subset(moreStopsEpi, (stop == 25 | start == 25) & id %in% 9:13, select = 1:7)
Functions for plotting the evolution of epidemics. The plot
methods for classes "epidata" and
and
"summary.epidata"
plot the numbers of susceptible, infectious and
recovered (= removed) individuals by step functions along the time axis. The
function stateplot
shows individual state changes along the time axis.
## S3 method for class 'summary.epidata'
plot(x, lty = c(2, 1, 3), lwd = 2,
     col = c("#1B9E77", "#D95F02", "#7570B3"), col.hor = col, col.vert = col,
     xlab = "Time", ylab = "Number of individuals",
     xlim = NULL, ylim = NULL, legend.opts = list(), do.axis4 = NULL,
     panel.first = grid(), rug.opts = list(),
     which.rug = c("infections", "removals", "susceptibility", "all"), ...)

## S3 method for class 'epidata'
plot(x, ...)

stateplot(x, id, ...)
x |
an object inheriting from class |
lty , lwd
|
vectors of length 3 containing the line types and widths, respectively, for
the numbers of susceptible, infectious and removed individuals (in this
order). By default, all lines have width 1 and the line types are dashed
(susceptible), solid (infectious) and dotted (removed), respectively. To
omit the drawing of a specific line, just set the corresponding entry in
|
col , col.hor , col.vert
|
vectors of length 3 containing the line colors for the numbers of
susceptible, infectious and removed individuals (in this order).
|
xlab , ylab
|
axis labels, default to "Time" and "Number of individuals", respectively. |
xlim , ylim
|
the x and y limits of the plot in the form |
legend.opts |
if this is a list (of arguments for the
|
do.axis4 |
logical indicating if the final numbers of susceptible and removed
individuals should be indicated on the right axis. The default |
panel.first |
an expression to be evaluated after the plot axes are set up but before any plotting takes place. By default, a standard grid is drawn. |
rug.opts |
either a list of arguments passed to the function |
which.rug |
By default, tick marks are drawn at the time points of infections.
Alternatively, one can choose to mark only |
id |
single character string or factor of length 1 specifying the individual for
which the |
... |
For |
plot.summary.epidata
(and plot.epidata
) invisibly returns the
matrix used for plotting, which contains the evolution of the three
counters.

stateplot
invisibly returns the function, which was plotted,
typically of class "stepfun"
, but maybe of class "function"
,
if no events have been observed for the individual in question (then the
function always returns the initial state). The vertical axis of
stateplot
can range from 1 to 3, where 1 corresponds to
Susceptible, 2 to Infectious and 3 to Removed.
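As a small illustration of this return value (a sketch using the hagelloch data), the state function can be captured and then evaluated at arbitrary time points:

data("hagelloch")
## capture the (invisibly returned) step function of individual "187"
f187 <- stateplot(summary(hagelloch), id = "187")
f187(c(0, 30, 60))  # state codes (1=S, 2=I, 3=R) at selected times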
Sebastian Meyer
summary.epidata
for the data, on which the plots are based.
animate.epidata
for the animation of epidemics.
data("hagelloch") (s <- summary(hagelloch)) # rudimentary stateplot stateplot(s, id = "187") # evolution of the epidemic plot(s)
data("hagelloch") (s <- summary(hagelloch)) # rudimentary stateplot stateplot(s, id = "187") # evolution of the epidemic plot(s)
The summary
method for class
"epidata"
gives an overview of the epidemic. Its
print
method shows the type of the epidemic, the time range, the
total number of individuals, the initially and never infected individuals and
the size of the epidemic. An excerpt of the returned counters
data
frame is also printed (see the Value section below).
## S3 method for class 'epidata'
summary(object, ...)

## S3 method for class 'summary.epidata'
print(x, ...)
object |
an object inheriting from class |
x |
an object inheriting from class |
... |
unused (argument of the generic). |
A list with the following components:
type |
character string. Compartmental type of the epidemic, i.e. one of "SIR", "SI", "SIS" or "SIRS". |
size |
integer. Size of the epidemic, i.e. the number of initially susceptible individuals, which became infected during the course of the epidemic. |
initiallyInfected |
factor (with the same levels as the id column) of the initially infected individuals. |
neverInfected |
factor (with the same levels as the id column) of the individuals that never became infected. |
coordinates |
numeric matrix of individual coordinates with as many rows as there are
individuals and one column for each spatial dimension. The row names of
the matrix are the ids of the individuals. |
byID |
data frame with time points of infection and optionally removal and
re-susceptibility (depending on the type of the epidemic), ordered by id. |
counters |
data frame containing all events (S, I and R) ordered by time. The columns
are time, type (of event), id (of the affected individual), and the counters nSusceptible, nInfectious and nRemoved. |
Sebastian Meyer
as.epidata
for generating objects of class "epidata"
.
data("hagelloch") s <- summary(hagelloch) s # uses the print method for summary.epidata names(s) # components of the list 's' # positions of the individuals plot(s$coordinates) # events by id head(s$byID)
data("hagelloch") s <- summary(hagelloch) s # uses the print method for summary.epidata names(s) # components of the list 's' # positions of the individuals plot(s$coordinates) # events by id head(s$byID)
Data structure for continuous spatio-temporal event
data, e.g. individual case reports of an infectious disease.
Apart from the actual events
, the class simultaneously
holds a spatio-temporal grid of endemic covariates (similar to
disease mapping) and a representation of the observation region.
The "epidataCS"
class is the basis for fitting
spatio-temporal endemic-epidemic intensity models with the function
twinstim
(Meyer et al., 2012).
The implementation is described in Meyer et al. (2017, Section 3),
see vignette("twinstim")
.
as.epidataCS(events, stgrid, W, qmatrix = diag(nTypes),
             nCircle2Poly = 32L, T = NULL,
             clipper = "polyclip", verbose = interactive())

## S3 method for class 'epidataCS'
print(x, n = 6L, digits = getOption("digits"), ...)

## S3 method for class 'epidataCS'
nobs(object, ...)

## S3 method for class 'epidataCS'
head(x, n = 6L, ...)

## S3 method for class 'epidataCS'
tail(x, n = 6L, ...)

## S3 method for class 'epidataCS'
x[i, j, ..., drop = TRUE]

## S3 method for class 'epidataCS'
subset(x, subset, select, drop = TRUE, ...)

## S3 method for class 'epidataCS'
marks(x, coords = TRUE, ...)

## S3 method for class 'epidataCS'
summary(object, ...)

## S3 method for class 'summary.epidataCS'
print(x, ...)

## S3 method for class 'epidataCS'
as.stepfun(x, ...)

getSourceDists(object, dimension = c("space", "time"))
events |
a
The |
stgrid |
a
The remaining columns are endemic covariates.
Note that the column name |
W |
an object of class |
qmatrix |
a square indicator matrix (0/1 or TRUE/FALSE) for possible transmission between the event types. |
nCircle2Poly |
accuracy (number of edges) of the polygonal approximation of a circle,
see |
T |
end of observation period (i.e. last |
clipper |
polygon clipping engine to use for calculating the
|
verbose |
logical indicating if status messages should be printed
during input checking and |
x |
an object of class |
n |
a single integer. If positive, the first ( |
digits |
minimum number of significant digits to be printed in values. |
i , j , drop
|
arguments passed to the
|
... |
unused (arguments of the generics) with a few exceptions:
The |
subset , select
|
arguments used to subset the |
coords |
logical indicating if the data frame of event marks
returned by |
object |
an object of class |
dimension |
the distances of all events to their potential source
events can be computed in either the |
The function as.epidataCS
is used to generate objects of class
"epidataCS"
, which is the data structure required for
twinstim
models.
The [
-method for class "epidataCS"
ensures that the subsetted object will be valid, for instance, it
updates the auxiliary list of potential transmission paths stored
in the object. The [
-method is used in
subset.epidataCS
, which is implemented similar to
subset.data.frame
.
The print
method for "epidataCS"
prints some metadata
of the epidemic, e.g., the observation period, the dimensions of the
spatio-temporal grid, the types of events, and the total number of
events. By default, it also prints the first n = 6
rows of the
events
.
An object of class "epidataCS"
is a list containing the
following components:
events |
a
|
stgrid |
a |
W |
a |
qmatrix |
see the above description of the argument. The
|
The nobs
-method returns the number of events.
The head
and tail
methods subset the epidemic data using
the extraction method ([
), i.e. they return an object of class
"epidataCS"
, which only contains (all but) the first/last
n
events.
For the "epidataCS"
class, the method of the generic function
marks
defined by the spatstat.geom package
returns a data.frame
of the event marks (actually also
including time and location of the events), disregarding endemic
covariates and the auxiliary columns from the events
component
of the "epidataCS"
object.
The summary
method (which has again a print
method)
returns a list of metadata, event data, the tables of tiles and types,
a step function of the number of infectious individuals over time
($counter
), i.e., the result of the
as.stepfun
-method for "epidataCS"
, and the number
of potential sources of transmission for each event ($nSources
)
which is based on the given maximum interaction ranges eps.t
and eps.s
.
Since the observation region W
defines the integration domain
in the point process likelihood,
the more detailed the polygons of W
are, the longer it will
take to fit a twinstim
. You are advised to
sacrifice some shape details for speed by reducing the polygon
complexity, for example via the mapshaper
JavaScript library
wrapped by the R package rmapshaper, or via
simplify.owin
.
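A hedged sketch of such a simplification via rmapshaper (keeping 5% of the vertices is an arbitrary illustrative choice):

## simplify a "SpatialPolygons" observation region before as.epidataCS();
## 'stateD' is the region used in the Examples below
if (requireNamespace("rmapshaper")) {
  load(system.file("shapes", "districtsD.RData", package = "surveillance"))
  stateD_simple <- rmapshaper::ms_simplify(stateD, keep = 0.05)
}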
Sebastian Meyer
Contributions to this documentation by Michael Höhle and Mayeul Kauffmann.
Meyer, S., Elias, J. and Höhle, M. (2012): A space-time conditional intensity model for invasive meningococcal disease occurrence. Biometrics, 68, 607-616. doi:10.1111/j.1541-0420.2011.01684.x
Meyer, S., Held, L. and Höhle, M. (2017): Spatio-temporal analysis of epidemic phenomena using the R package surveillance. Journal of Statistical Software, 77 (11), 1-55. doi:10.18637/jss.v077.i11
vignette("twinstim")
.
plot.epidataCS
for plotting, and
animate.epidataCS
for the animation of such an epidemic.
There is also an update
method for the
"epidataCS"
class.
To re-extract the events
point pattern from "epidataCS"
,
use as(object, "SpatialPointsDataFrame")
.
It is possible to convert an "epidataCS"
point pattern to
an "epidata"
object (as.epidata.epidataCS
),
or to aggregate the events into an "sts"
object
(epidataCS2sts
).
## load "imdepi" example data (which is an object of class "epidataCS") data("imdepi") ## print and summary print(imdepi, n=5, digits=2) print(s <- summary(imdepi)) plot(s$counter, # same as 'as.stepfun(imdepi)' xlab = "Time [days]", ylab="Number of infectious individuals", main=paste("Time course of the number of infectious individuals", "assuming an infectious period of 30 days", sep="\n")) plot(table(s$nSources), xlab="Number of \"close\" infective individuals", ylab="Number of events", main=paste("Distribution of the number of potential sources", "assuming an interaction range of 200 km and 30 days", sep="\n")) ## the summary object contains further information str(s) ## a histogram of the spatial distances to potential source events ## (i.e., to events of the previous eps.t=30 days within eps.s=200 km) sourceDists_space <- getSourceDists(imdepi, "space") hist(sourceDists_space); rug(sourceDists_space) ## internal structure of an "epidataCS"-object str(imdepi, max.level=4) ## see help("imdepi") for more info on the data set ## extraction methods subset the 'events' component imdepi[101:200,] head(imdepi, n=1) # only first event tail(imdepi, n=4) # only last 4 events subset(imdepi, type=="B") # only events of type B ## see help("plot.epidataCS") for convenient plot-methods for "epidataCS" ### ### reconstruct the "imdepi" object ### ## observation region load(system.file("shapes", "districtsD.RData", package="surveillance"), verbose = TRUE) ## extract point pattern of events from the "imdepi" data ## a) as a data frame with coordinate columns via marks() eventsData <- marks(imdepi) ## b) as a Spatial object via the coerce-method events <- as(imdepi, "SpatialPointsDataFrame") ## plot observation region with events plot(stateD, axes=TRUE); title(xlab="x [km]", ylab="y [km]") points(events, pch=unclass(events$type), cex=0.5, col=unclass(events$type)) legend("topright", legend=levels(events$type), title="Type", pch=1:2, col=1:2) summary(events) ## space-time grid with endemic covariates head(stgrid <- imdepi$stgrid[,-1]) ## reconstruct the "imdepi" object from its components myimdepi <- as.epidataCS(events = events, stgrid = stgrid, W = stateD, qmatrix = diag(2), nCircle2Poly = 16) ## This reconstructed object should be equal to 'imdepi' as long as the internal ## structures of the embedded classes ("owin", "SpatialPolygons", ...), and ## the calculation of the influence regions by "polyclip" have not changed: all.equal(imdepi, myimdepi)
## load "imdepi" example data (which is an object of class "epidataCS") data("imdepi") ## print and summary print(imdepi, n=5, digits=2) print(s <- summary(imdepi)) plot(s$counter, # same as 'as.stepfun(imdepi)' xlab = "Time [days]", ylab="Number of infectious individuals", main=paste("Time course of the number of infectious individuals", "assuming an infectious period of 30 days", sep="\n")) plot(table(s$nSources), xlab="Number of \"close\" infective individuals", ylab="Number of events", main=paste("Distribution of the number of potential sources", "assuming an interaction range of 200 km and 30 days", sep="\n")) ## the summary object contains further information str(s) ## a histogram of the spatial distances to potential source events ## (i.e., to events of the previous eps.t=30 days within eps.s=200 km) sourceDists_space <- getSourceDists(imdepi, "space") hist(sourceDists_space); rug(sourceDists_space) ## internal structure of an "epidataCS"-object str(imdepi, max.level=4) ## see help("imdepi") for more info on the data set ## extraction methods subset the 'events' component imdepi[101:200,] head(imdepi, n=1) # only first event tail(imdepi, n=4) # only last 4 events subset(imdepi, type=="B") # only events of type B ## see help("plot.epidataCS") for convenient plot-methods for "epidataCS" ### ### reconstruct the "imdepi" object ### ## observation region load(system.file("shapes", "districtsD.RData", package="surveillance"), verbose = TRUE) ## extract point pattern of events from the "imdepi" data ## a) as a data frame with coordinate columns via marks() eventsData <- marks(imdepi) ## b) as a Spatial object via the coerce-method events <- as(imdepi, "SpatialPointsDataFrame") ## plot observation region with events plot(stateD, axes=TRUE); title(xlab="x [km]", ylab="y [km]") points(events, pch=unclass(events$type), cex=0.5, col=unclass(events$type)) legend("topright", legend=levels(events$type), title="Type", pch=1:2, col=1:2) summary(events) ## space-time grid with endemic covariates head(stgrid <- imdepi$stgrid[,-1]) ## reconstruct the "imdepi" object from its components myimdepi <- as.epidataCS(events = events, stgrid = stgrid, W = stateD, qmatrix = diag(2), nCircle2Poly = 16) ## This reconstructed object should be equal to 'imdepi' as long as the internal ## structures of the embedded classes ("owin", "SpatialPolygons", ...), and ## the calculation of the influence regions by "polyclip" have not changed: all.equal(imdepi, myimdepi)
"epidataCS"
to "epidata"
or "sts"
Continuous-time continuous-space epidemic data stored in an object of
class "epidataCS"
can be aggregated in space or in space
and time yielding an object of class "epidata"
or
"sts"
for use of twinSIR
or
hhh4
modelling, respectively.
## aggregation in space and time over 'stgrid' for use of 'hhh4' models
epidataCS2sts(object, freq, start, neighbourhood, tiles = NULL,
              popcol.stgrid = NULL, popdensity = TRUE)

## aggregation in space for use of 'twinSIR' models
## S3 method for class 'epidataCS'
as.epidata(data, tileCentroids, eps = 0.001, ...)
object , data
|
an object of class |
freq , start
|
see the description of the |
neighbourhood |
binary adjacency or neighbourhood-order matrix of the regions
( |
tiles |
object inheriting from |
popcol.stgrid |
single character or numeric value indexing the
column in |
popdensity |
logical indicating if the column referenced by
|
tileCentroids |
a coordinate matrix of the region centroids (i.e., the result of
|
eps |
numeric scalar for breaking tied removal and infection times between different
individuals (tiles), which might occur during conversion from
|
... |
unused (argument of the generic). |
Conversion to "sts"
only makes sense if the time
intervals (BLOCK
s) of the stgrid
are regularly spaced
(to give freq
intervals per year). Note that events of the
prehistory (not covered by stgrid
) are not included in the
resulting sts
object.
Some comments on the conversion to "epidata"
: the conversion results in SIS epidemics only,
i.e. the at-risk indicator is set to 1 immediately after
recovery. A tile is considered infective if at least one individual
within the tile is infective, otherwise it is susceptible.
The lengths of the infectious periods are taken from data$events$eps.t. There will be no f columns in the resulting "epidata"; these must be generated by a subsequent call to as.epidata with the desired f.
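A hedged sketch of adding such a distance-based epidemic variable afterwards, here via the update method for "epidata" (the 100 km cutoff is purely illustrative, and imdepidata refers to the converted object from the Examples below):

## add a distance-based epidemic variable to a converted "epidata" object
imdepidata <- update(imdepidata, f = list(local = function(u) u < 100))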
epidataCS2sts
: an object of class "sts"
representing the multivariate time-series of the number of
cases aggregated over stgrid
.
as.epidata.epidataCS
: an object of class
"epidata"
representing an SIS epidemic in form of a
multivariate point process (one for each region/tile
).
Sebastian Meyer
The "sts" class
and hhh4
.
data("imdepi") load(system.file("shapes", "districtsD.RData", package="surveillance")) ## convert imdepi point pattern into multivariate time series imdsts <- epidataCS2sts(imdepi, freq = 12, start = c(2002, 1), neighbourhood = NULL, # not needed here tiles = districtsD) ## check the overall number of events by district stopifnot(all.equal(colSums(observed(imdsts)), c(table(imdepi$events$tile)))) ## compare plots of monthly number of cases opar <- par(mfrow = c(2, 1)) plot(imdepi, "time") plot(imdsts, type = observed ~ time) par(opar) ## plot number of cases by district in Bavaria (municipality keys 09xxx) imd09 <- imdsts[, grep("^09", colnames(imdsts), value = TRUE), drop = TRUE] plot(imd09, type = observed ~ unit) ## also test conversion to an SIS event history ("epidata") of the "tiles" if (requireNamespace("intervals")) { imdepi_short <- subset(imdepi, time < 50) # to reduce the runtime imdepi_short$stgrid <- subset(imdepi_short$stgrid, start < 50) imdepidata <- as.epidata(imdepi_short, tileCentroids = coordinates(districtsD)) summary(imdepidata) }
data("imdepi") load(system.file("shapes", "districtsD.RData", package="surveillance")) ## convert imdepi point pattern into multivariate time series imdsts <- epidataCS2sts(imdepi, freq = 12, start = c(2002, 1), neighbourhood = NULL, # not needed here tiles = districtsD) ## check the overall number of events by district stopifnot(all.equal(colSums(observed(imdsts)), c(table(imdepi$events$tile)))) ## compare plots of monthly number of cases opar <- par(mfrow = c(2, 1)) plot(imdepi, "time") plot(imdsts, type = observed ~ time) par(opar) ## plot number of cases by district in Bavaria (municipality keys 09xxx) imd09 <- imdsts[, grep("^09", colnames(imdsts), value = TRUE), drop = TRUE] plot(imd09, type = observed ~ unit) ## also test conversion to an SIS event history ("epidata") of the "tiles" if (requireNamespace("intervals")) { imdepi_short <- subset(imdepi, time < 50) # to reduce the runtime imdepi_short$stgrid <- subset(imdepi_short$stgrid, start < 50) imdepidata <- as.epidata(imdepi_short, tileCentroids = coordinates(districtsD)) summary(imdepidata) }
Function for the animation of continuous-time continuous-space
epidemic data, i.e. objects inheriting from class "epidataCS"
.
There are three types of animation, see argument time.spacing
.
Besides the on-screen plotting in the interactive R session, it is possible
and recommended to redirect the animation to an off-screen graphics
device using the contributed R package animation. For instance,
the animation can be watched and navigated in a web browser via
saveHTML
(see Examples).
## S3 method for class 'epidataCS'
animate(object, interval = c(0,Inf), time.spacing = NULL,
        nmax = NULL, sleep = NULL, legend.opts = list(), timer.opts = list(),
        pch = 15:18, col.current = "red", col.I = "#C16E41",
        col.R = "#B3B3B3", col.influence = NULL,
        main = NULL, verbose = interactive(), ...)
object |
an object inheriting from class |
interval |
time range of the animation. |
time.spacing |
time interval for the animation steps. |
nmax |
maximum number of snapshots to generate. The default |
sleep |
numeric scalar specifying the artificial pause in seconds between two
time points (using |
pch , col
|
vectors of length equal to the number of event types specifying the point symbols and colors for events to plot (in this order). The vectors are recycled if necessary. |
legend.opts |
either a list of arguments passed to the |
timer.opts |
either a list of arguments passed to the
Note that the argument |
col.current |
color of events when occurring (new). |
col.I |
color once infectious. |
col.R |
color event has once “recovered”. If |
col.influence |
color with which the influence region is drawn. Use
|
main |
optional main title placed above the map. |
verbose |
logical specifying if a (textual) progress bar should
be shown during snapshot generation. This is especially useful if
the animation is produced within |
... |
further graphical parameters passed to the |
Sebastian Meyer with documentation contributions by Michael Höhle
plot.epidataCS
for plotting the numbers of events by time
(aggregated over space) or the locations of the events in the
observation region W
(aggregated over time).
The contributed R package animation.
data("imdepi") imdepiB <- subset(imdepi, type == "B") ## Not run: # Animate the first year of type B with a step size of 7 days animate(imdepiB, interval=c(0,365), time.spacing=7, nmax=Inf, sleep=0.1) # Sequential animation of type B events during the first year animate(imdepiB, interval=c(0,365), time.spacing=NULL, sleep=0.1) # Animate the whole time range but with nmax=20 snapshots only animate(imdepiB, time.spacing=NA, nmax=20, sleep=0.1) ## End(Not run) # Such an animation can be saved in various ways using the tools of # the animation package, e.g., saveHTML() if (interactive() && require("animation")) { oldwd <- setwd(tempdir()) # to not clutter up the current working dir saveHTML(animate(imdepiB, interval = c(0,365), time.spacing = 7), nmax = Inf, interval = 0.2, loop = FALSE, title = "Animation of the first year of type B events") setwd(oldwd) }
data("imdepi") imdepiB <- subset(imdepi, type == "B") ## Not run: # Animate the first year of type B with a step size of 7 days animate(imdepiB, interval=c(0,365), time.spacing=7, nmax=Inf, sleep=0.1) # Sequential animation of type B events during the first year animate(imdepiB, interval=c(0,365), time.spacing=NULL, sleep=0.1) # Animate the whole time range but with nmax=20 snapshots only animate(imdepiB, time.spacing=NA, nmax=20, sleep=0.1) ## End(Not run) # Such an animation can be saved in various ways using the tools of # the animation package, e.g., saveHTML() if (interactive() && require("animation")) { oldwd <- setwd(tempdir()) # to not clutter up the current working dir saveHTML(animate(imdepiB, interval = c(0,365), time.spacing = 7), nmax = Inf, interval = 0.2, loop = FALSE, title = "Animation of the first year of type B events") setwd(oldwd) }
"epidataCS"
Monte Carlo tests for space-time interaction (epitest
)
use the distribution of some test statistic under the null hypothesis
of no space-time interaction. For this purpose, the function
permute.epidataCS
randomly permutes the time or space labels of
the events.
permute.epidataCS(x, what = c("time", "space"), keep)
x |
an object of class "epidataCS". |
what |
character string determining what to permute: time points (default) or locations. |
keep |
optional logical expression to be evaluated in the context of the event data (x$events@data), determining for which events the time and location should be kept as is, e.g., keep = time <= 30 to leave a pre-history unchanged. |
the permuted "epidataCS"
object.
Sebastian Meyer
data("imdepi") set.seed(3) permepi <- permute.epidataCS(imdepi, what = "time", keep = time <= 30) print(imdepi, n = 8) print(permepi, n = 8) ## the first 6 events are kept (as are all row.names), ## the time labels of the remaining events are shuffled ## (and events then again sorted by time), ## the marginal temporal distribution is unchanged
data("imdepi") set.seed(3) permepi <- permute.epidataCS(imdepi, what = "time", keep = time <= 30) print(imdepi, n = 8) print(permepi, n = 8) ## the first 6 events are kept (as are all row.names), ## the time labels of the remaining events are shuffled ## (and events then again sorted by time), ## the marginal temporal distribution is unchanged
The plot
method for class "epidataCS"
either plots the
number of events along the time axis (epidataCSplot_time
) as a
hist()
, or the locations of the events in the observation region
W
(epidataCSplot_space
).
The spatial plot can be enriched with tile-specific color levels to
indicate attributes such as the population (using spplot
).
## S3 method for class 'epidataCS'
plot(x, aggregate = c("time", "space"), subset, by = type, ...)

epidataCSplot_time(x, subset, by = type,
                   t0.Date = NULL, breaks = "stgrid", freq = TRUE,
                   col = rainbow(nTypes), cumulative = list(),
                   add = FALSE, mar = NULL, xlim = NULL, ylim = NULL,
                   xlab = "Time", ylab = NULL, main = NULL,
                   panel.first = abline(h=axTicks(2), lty=2, col="grey"),
                   legend.types = list(), ...)

epidataCSplot_space(x, subset, by = type, tiles = x$W, pop = NULL,
                    cex.fun = sqrt, points.args = list(), add = FALSE,
                    legend.types = list(), legend.counts = list(),
                    sp.layout = NULL, ...)
x |
an object of class |
aggregate |
character, one of |
subset |
logical expression indicating a subset of events to consider for
plotting: missing values are taken as false. Note that the
expression is evaluated in the data frame of event marks
( |
... |
in the basic |
by |
an expression evaluated in |
t0.Date |
the beginning of the observation period
|
breaks |
a specification of the histogram break points, see
|
freq |
see |
col |
fill colour for the bars of the histogram, defaults to
the vector of |
cumulative |
if a list (of style options),
lines for the cumulative number of events (per type) will be
added to the plot. Possible options are |
add |
logical (default: |
mar |
see |
xlim , ylim
|
|
xlab , ylab
|
axis labels (with sensible defaults). |
main |
main title of the plot (defaults to no title). |
panel.first |
expression that should be evaluated after the plotting window has been set up but before the histogram is plotted. Defaults to adding horizontal grid lines. |
legend.types |
if a list (of arguments for |
tiles |
the observation region |
pop |
if |
cex.fun |
function which takes a vector of counts of events
at each unique location and returns a (vector of) |
points.args |
a list of (type-specific) graphical parameters
for |
legend.counts |
if a list (of arguments for
|
sp.layout |
optional list of additional layout items in case
|
For aggregate="time"
(i.e., epidataCSplot_time
) the data
of the histogram (as returned by hist
),
and for aggregate="space"
(i.e., epidataCSplot_space
)
NULL
, invisibly, or the trellis.object
generated by
spplot
(if pop
is non-NULL
).
Sebastian Meyer
data("imdepi") ## show the occurrence of events along time plot(imdepi, "time", main = "Histogram of event time points") plot(imdepi, "time", by = NULL, main = "Aggregated over both event types") ## show the distribution in space plot(imdepi, "space", lwd = 2, col = "lavender") ## with the district-specific population density in the background, ## a scale bar, and customized point style load(system.file("shapes", "districtsD.RData", package = "surveillance")) districtsD$log10popdens <- log10(districtsD$POPULATION/districtsD$AREA) keylabels <- (c(1,2,5) * rep(10^(1:3), each=3))[-1] plot(imdepi, "space", tiles = districtsD, pop = "log10popdens", ## modify point style for better visibility on gray background points.args = list(pch=c(1,3), col=c("orangered","blue"), lwd=2), ## metric scale bar, see proj4string(imdepi$W) sp.layout = layout.scalebar(imdepi$W, scale=100, labels=c("0","100 km")), ## gray scale for the population density and white borders col.regions = gray.colors(100, start=0.9, end=0.1), col = "white", ## color key is equidistant on log10(popdens) scale at = seq(1.3, 3.7, by=0.05), colorkey = list(labels=list(at=log10(keylabels), labels=keylabels), title=expression("Population density per " * km^2)))
data("imdepi") ## show the occurrence of events along time plot(imdepi, "time", main = "Histogram of event time points") plot(imdepi, "time", by = NULL, main = "Aggregated over both event types") ## show the distribution in space plot(imdepi, "space", lwd = 2, col = "lavender") ## with the district-specific population density in the background, ## a scale bar, and customized point style load(system.file("shapes", "districtsD.RData", package = "surveillance")) districtsD$log10popdens <- log10(districtsD$POPULATION/districtsD$AREA) keylabels <- (c(1,2,5) * rep(10^(1:3), each=3))[-1] plot(imdepi, "space", tiles = districtsD, pop = "log10popdens", ## modify point style for better visibility on gray background points.args = list(pch=c(1,3), col=c("orangered","blue"), lwd=2), ## metric scale bar, see proj4string(imdepi$W) sp.layout = layout.scalebar(imdepi$W, scale=100, labels=c("0","100 km")), ## gray scale for the population density and white borders col.regions = gray.colors(100, start=0.9, end=0.1), col = "white", ## color key is equidistant on log10(popdens) scale at = seq(1.3, 3.7, by=0.05), colorkey = list(labels=list(at=log10(keylabels), labels=keylabels), title=expression("Population density per " * km^2)))
"epidataCS"
The update
method for the "epidataCS"
class
may be used to modify the hyperparameters eps.t and eps.s, the indicator matrix qmatrix
determining
possible transmission between the event types, the numerical
accuracy nCircle2Poly
of the polygonal approximation, and
the endemic covariates from stgrid
(including the time intervals).
The update method will also update the auxiliary information contained
in an "epidataCS"
object accordingly, e.g., the vector of potential
sources of each event, the influence regions, or the endemic covariates
copied from the new stgrid
.
## S3 method for class 'epidataCS'
update(object, eps.t, eps.s, qmatrix, nCircle2Poly, stgrid, ...)
object |
an object of class |
eps.t |
numeric vector of length 1 or corresponding to the number of events in
|
eps.s |
numeric vector of length 1 or corresponding to the number of events in
|
qmatrix |
square indicator matrix (0/1 or TRUE/FALSE) for possible transmission between the event types. |
nCircle2Poly |
accuracy (number of edges) of the polygonal approximation of a circle. |
stgrid |
a new |
... |
unused (argument of the generic). |
The updated "epidataCS"
object.
Sebastian Meyer
class "epidataCS"
.
data("imdepi") ## assume different interaction ranges and simplify polygons imdepi2 <- update(imdepi, eps.t = 20, eps.s = Inf, nCircle2Poly = 16) (s <- summary(imdepi)) (s2 <- summary(imdepi2)) ## The update reduced the number of infectives (along time) ## because the length of the infectious periods is reduced. It also ## changed the set of potential sources of transmission for each ## event, since the interaction is shorter in time but wider in space ## (eps.s=Inf means interaction over the whole observation region). ## use a time-constant grid imdepi3 <- update(imdepi, stgrid = subset(imdepi$stgrid, BLOCK == 1, -stop)) (s3 <- summary(imdepi3)) # "1 time block"
data("imdepi") ## assume different interaction ranges and simplify polygons imdepi2 <- update(imdepi, eps.t = 20, eps.s = Inf, nCircle2Poly = 16) (s <- summary(imdepi)) (s2 <- summary(imdepi2)) ## The update reduced the number of infectives (along time) ## because the length of the infectious periods is reduced. It also ## changed the set of potential sources of transmission for each ## event, since the interaction is shorter in time but wider in space ## (eps.s=Inf means interaction over the whole observation region). ## use a time-constant grid imdepi3 <- update(imdepi, stgrid = subset(imdepi$stgrid, BLOCK == 1, -stop)) (s3 <- summary(imdepi3)) # "1 time block"
The fanplot()
function in surveillance wraps functionality of
the dedicated fanplot package, employing a different default style
and optionally adding point predictions and observed values.
fanplot(quantiles, probs, means = NULL, observed = NULL, start = 1,
        fan.args = list(), means.args = list(), observed.args = list(),
        key.args = NULL, xlim = NULL, ylim = NULL, log = "",
        xlab = "Time", ylab = "No. infected", add = FALSE, ...)
quantiles |
a time x probs matrix of forecast quantiles, with one row per time point and one column per probability level in probs. |
probs |
numeric vector of probabilities with values between 0 and 1. |
means |
(optional) numeric vector of point forecasts. |
observed |
(optional) numeric vector of observed values. |
start |
time index (x-coordinate) of the first prediction. |
fan.args |
a list of graphical parameters for the |
means.args |
a list of graphical parameters for |
observed.args |
a list of graphical parameters for |
key.args |
if a list, a color key (in |
xlim , ylim
|
axis ranges. |
log |
a character string specifying which axes are to be logarithmic,
e.g., |
xlab , ylab
|
axis labels. |
add |
logical indicating if the fan plot should be added to an existing plot. |
... |
further arguments are passed to |
NULL
(invisibly), with the side effect of drawing a fan chart.
Sebastian Meyer
the underlying fan
function in package
fanplot.
The function is used in plot.oneStepAhead
and
plot.hhh4sims
.
## artificial data example to illustrate the graphical options
if (requireNamespace("fanplot")) {
  means <- c(18, 19, 20, 25, 26, 35, 34, 25, 19)
  y <- rlnorm(length(means), log(means), 0.5)
  quantiles <- sapply(1:99/100, qlnorm, log(means),
                      seq(.5, .8, length.out = length(means)))

  ## default style with point predictions, color key and log-scale
  fanplot(quantiles = quantiles, probs = 1:99/100, means = means,
          observed = y, key.args = list(start = 1, space = .3), log = "y")

  ## with contour lines instead of a key, and different colors
  pal <- colorRampPalette(c("darkgreen", "gray93"))
  fanplot(quantiles = quantiles, probs = 1:99/100, observed = y,
          fan.args = list(fan.col = pal, ln = c(5,10,25,50,75,90,95)/100),
          observed.args = list(type = "b", pch = 19))
}
The function takes range
values of the surveillance time
series sts
and for each time point uses a Poisson GLM with overdispersion to
predict an upper bound on the number of counts according to the procedure by
Farrington et al. (1996) and by Noufaily et al. (2012). This bound is then compared to the observed
number of counts. If the observation is above the bound, then an alarm is raised.
The implementation is illustrated in Salmon et al. (2016).
farringtonFlexible(sts, control = list(
    range = NULL, b = 5, w = 3,
    reweight = TRUE, weightsThreshold = 2.58,
    verbose = FALSE, glmWarnings = TRUE,
    alpha = 0.05, trend = TRUE, pThresholdTrend = 0.05,
    limit54 = c(5,4), powertrans = "2/3",
    fitFun = "algo.farrington.fitGLM.flexible",
    populationOffset = FALSE,
    noPeriods = 1, pastWeeksNotIncluded = NULL,
    thresholdMethod = "delta"))
sts |
object of class sts (including the observed and the state time series). |
control |
Control object given as a list with the elements shown in the Usage section above. |
The following steps are performed according to the Farrington et al. (1996) paper.

1. Fit of the initial model with intercept, a time trend if trend is TRUE, a seasonal factor variable if noPeriods is bigger than 1, and a population offset if populationOffset is TRUE; initial estimation of mean and overdispersion.
2. Calculation of the weights omega (correction for past outbreaks) if reweight is TRUE; the threshold for reweighting is defined in control.
3. Refitting of the model.
4. Revised estimation of overdispersion.
5. Omission of the trend, if it is not significant.
6. Repetition of the whole procedure.
7. Calculation of the threshold value using the model to compute a quantile of the predictive distribution. The method used depends on thresholdMethod, which can be one of:
"delta" |
One assumes that the prediction error (or a transformation of the prediction error, depending on powertrans) is normally distributed. The threshold is deduced from a quantile of this normal distribution using the variance and estimate of the expected count given by the GLM, and the delta rule. The procedure takes into account both the estimation error (variance of the estimator of the expected count in the GLM) and the prediction error (variance of the prediction error). This is the suggestion in Farrington et al. (1996). |
"nbin" |
One assumes that the new count follows a negative binomial distribution parameterized by the expected count and the overdispersion estimated in the GLM. The threshold is deduced from a quantile of this discrete distribution. This process disregards the estimation error, though. This method was used in Noufaily et al. (2012). |
"muan" |
One also uses the assumption of the negative binomial sampling distribution but does not plug in the estimate of the expected count from the GLM; instead, one uses a quantile from the asymptotic normal distribution of the expected count estimated in the GLM, in order to take into account both the estimation error and the prediction error. |
8. Computation of the exceedance score.
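For illustration, a sketch of selecting the negative binomial quantile method of Noufaily et al. (2012), assuming "nbin" is its option name (range and alpha are illustrative choices):

data("salmonella.agona")
salm <- disProg2sts(salmonella.agona)
## threshold from a negative binomial quantile instead of the delta rule
salmNB <- farringtonFlexible(salm,
    control = list(range = 282:312, noPeriods = 10,
                   thresholdMethod = "nbin", alpha = 0.1))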
Warning: monthly data containing the last day of each month as date should be analysed with epochAsDate=FALSE
in the sts
object. Otherwise February makes it impossible to find some reference time points.
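A minimal sketch of the recommended setup for monthly counts (toy data; with the default integer-valued epoch, epochAsDate is FALSE):

## ten years of artificial monthly counts, indexed by integers
## rather than by Date objects
monthly <- sts(observed = matrix(rpois(120, lambda = 10)),
               start = c(2015, 1), frequency = 12)
stopifnot(!monthly@epochAsDate)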
An object of class sts
with the slots upperbound
and alarm
filled by appropriate output of the algorithm.
The control
slot of the input sts
is amended with the
following matrix elements, all with length(range)
rows:
trend |
Booleans indicating whether a time trend was fitted for this time point. |
trendVector |
coefficient of the time trend in the GLM for this time point; NA if no trend was fitted. |
pvalue |
probability of observing a value at least as large as the observation under the null hypothesis. |
expected |
expectation of the predictive distribution for each time point; only reported if the conditions for raising an alarm are met (enough cases). |
mu0Vector |
input for the negative binomial distribution used to derive the upperbound as a quantile (either a plug-in estimate from the GLM or a quantile from the asymptotic normal distribution of the estimator). |
phiVector |
overdispersion of the GLM at each time point. |
M. Salmon, M. Höhle
Farrington, C.P., Andrews, N.J, Beale A.D. and Catchpole, M.A. (1996): A statistical algorithm for the early detection of outbreaks of infectious disease. J. R. Statist. Soc. A, 159, 547-563.
Noufaily, A., Enki, D.G., Farrington, C.P., Garthwaite, P., Andrews, N.J., Charlett, A. (2012): An improved algorithm for outbreak detection in multiple surveillance systems. Statistics in Medicine, 32 (7), 1206-1222.
Salmon, M., Schumacher, D. and Höhle, M. (2016): Monitoring count time series in R: Aberration detection in public health surveillance. Journal of Statistical Software, 70 (10), 1-35. doi:10.18637/jss.v070.i10
algo.farrington.fitGLM, algo.farrington.threshold
data("salmonella.agona") # Create the corresponding sts object from the old disProg object salm <- disProg2sts(salmonella.agona) ### RUN THE ALGORITHMS WITH TWO DIFFERENT SETS OF OPTIONS control1 <- list(range=282:312, noPeriods=1, b=4, w=3, weightsThreshold=1, pastWeeksNotIncluded=3, pThresholdTrend=0.05, alpha=0.1) control2 <- list(range=282:312, noPeriods=10, b=4, w=3, weightsThreshold=2.58, pastWeeksNotIncluded=26, pThresholdTrend=1, alpha=0.1) salm1 <- farringtonFlexible(salm,control=control1) salm2 <- farringtonFlexible(salm,control=control2) ### PLOT THE RESULTS y.max <- max(upperbound(salm1),observed(salm1),upperbound(salm2),na.rm=TRUE) plot(salm1, ylim=c(0,y.max), main='S. Newport in Germany', legend.opts=NULL) lines(1:(nrow(salm1)+1)-0.5, c(upperbound(salm1),upperbound(salm1)[nrow(salm1)]), type="s",col='tomato4',lwd=2) lines(1:(nrow(salm2)+1)-0.5, c(upperbound(salm2),upperbound(salm2)[nrow(salm2)]), type="s",col="blueviolet",lwd=2) legend("topleft", legend=c('Alarm','Upperbound with old options', 'Upperbound with new options'), pch=c(24,NA,NA),lty=c(NA,1,1), bg="white",lwd=c(2,2,2),col=c('red','tomato4',"blueviolet"))
data("salmonella.agona") # Create the corresponding sts object from the old disProg object salm <- disProg2sts(salmonella.agona) ### RUN THE ALGORITHMS WITH TWO DIFFERENT SETS OF OPTIONS control1 <- list(range=282:312, noPeriods=1, b=4, w=3, weightsThreshold=1, pastWeeksNotIncluded=3, pThresholdTrend=0.05, alpha=0.1) control2 <- list(range=282:312, noPeriods=10, b=4, w=3, weightsThreshold=2.58, pastWeeksNotIncluded=26, pThresholdTrend=1, alpha=0.1) salm1 <- farringtonFlexible(salm,control=control1) salm2 <- farringtonFlexible(salm,control=control2) ### PLOT THE RESULTS y.max <- max(upperbound(salm1),observed(salm1),upperbound(salm2),na.rm=TRUE) plot(salm1, ylim=c(0,y.max), main='S. Newport in Germany', legend.opts=NULL) lines(1:(nrow(salm1)+1)-0.5, c(upperbound(salm1),upperbound(salm1)[nrow(salm1)]), type="s",col='tomato4',lwd=2) lines(1:(nrow(salm2)+1)-0.5, c(upperbound(salm2),upperbound(salm2)[nrow(salm2)]), type="s",col="blueviolet",lwd=2) legend("topleft", legend=c('Alarm','Upperbound with old options', 'Upperbound with new options'), pch=c(24,NA,NA),lty=c(NA,1,1), bg="white",lwd=c(2,2,2),col=c('red','tomato4',"blueviolet"))
Given a specification of the average run length in the (a)cceptance and (r)ejection setting, determine the k and h values in a standard normal setting.
find.kh(ARLa = 500, ARLr = 7, sided = "one", method = "BFGS", verbose=FALSE)
ARLa |
average run length in the acceptance setting, aka in-control state; specifies the number of observations before a false alarm. |
ARLr |
average run length in the rejection setting, aka out-of-control state; specifies the number of observations before an increase is detected (i.e., the detection delay). |
sided |
one-sided CUSUM scheme. |
verbose |
gives extra information about the root finding process. |
method |
which method to use in the function optim. |
Functions from the spc package are used in a simple univariate root finding problem.
Returns a list with reference value k and decision interval h.
if (requireNamespace("spc")) { find.kh(ARLa=500,ARLr=7,sided="one") find.kh(ARLa=500,ARLr=3,sided="one") }
if (requireNamespace("spc")) { find.kh(ARLa=500,ARLr=7,sided="one") find.kh(ARLa=500,ARLr=3,sided="one") }
Function to find a decision interval h* for a given reference value k and desired ARL gamma, so that the average run length of a Poisson or binomial CUSUM with in-control parameter theta0, reference value k and decision interval h is approximately gamma, i.e., |ARL(h*) - gamma| / gamma < rel.tol, or larger, i.e., ARL(h*) > gamma.
findH(ARL0, theta0, s = 1, rel.tol = 0.03, roundK = TRUE,
      distr = c("poisson", "binomial"), digits = 1, FIR = FALSE, ...)

hValues(theta0, ARL0, rel.tol = 0.02, s = 1, roundK = TRUE, digits = 1,
        distr = c("poisson", "binomial"), FIR = FALSE, ...)
ARL0 |
desired in-control ARL. |
theta0 |
in-control parameter. |
s |
change to detect, see Details. |
distr |
"poisson" or "binomial". |
rel.tol |
relative tolerance, i.e., the search for h is stopped as soon as |ARL(h*) - ARL0| / ARL0 < rel.tol. |
digits |
the reference value k is rounded to digits decimal places. |
roundK |
passed to findK. |
FIR |
if TRUE, the decision interval that leads to the desired ARL for a FIR CUSUM with head start h/2 is returned. |
... |
further arguments for the distribution function, i.e., the number of trials n for the binomial CDF. |
The out-of-control parameter used to determine the reference value k
is specified as:

theta1 = theta0 + s * sqrt(theta0) for a Poisson variate,

theta1 = (s * theta0) / (1 + (s - 1) * theta0) for a binomial variate (so s acts as the odds ratio of the shift).
findH
returns a vector and hValues
returns a matrix with elements
theta0 |
in-control parameter |
h |
decision interval |
k |
reference value |
ARL |
ARL for a CUSUM with parameters k and h. |
rel.tol |
the achieved relative tolerance, i.e., |ARL - ARL0| / ARL0. |
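A small usage sketch with illustrative parameter values:

## decision interval for a Poisson CUSUM with in-control mean 5
findH(ARL0 = 500, theta0 = 5, distr = "poisson")
## decision intervals for a range of in-control means
hValues(theta0 = 5:10, ARL0 = 500, distr = "poisson")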
Calculates the reference value k for a Poisson or binomial CUSUM designed to detect a shift from theta0 to theta1.
findK(theta0, theta1, distr = c("poisson", "binomial"), roundK = FALSE, digits = 1, ...)
theta0 |
in-control parameter. |
theta1 |
out-of-control parameter. |
distr |
"poisson" or "binomial". |
digits |
the reference value k is rounded to digits decimal places. |
roundK |
For discrete data and a rational reference value there is only
a limited set of possible values that the CUSUM can take (and
therefore there is also only a limited set of ARLs).
If roundK = TRUE, the reference value k is rounded accordingly (see digits). |
... |
further arguments for the distribution function, i.e., the number of
trials n for the binomial CDF. |
Returns the reference value k.
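For the Poisson case, the reference value is the classical $k = (\theta_1 - \theta_0) / \log(\theta_1 / \theta_0)$, which the following sketch (with arbitrary parameter values) reproduces:

## reference value for a Poisson CUSUM designed to detect a shift from 1 to 2
findK(theta0 = 1, theta1 = 2, distr = "poisson")
## equals (2 - 1) / log(2/1) = 1/log(2), approximately 1.44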
Weekly number of influenza A & B cases in the 140 districts of the two Southern German states Bavaria and Baden-Wuerttemberg, for the years 2001 to 2008. These surveillance data have been analyzed originally by Paul and Held (2011) and more recently by Meyer and Held (2014).
data(fluBYBW)
An sts object containing $416 \times 140$ observations starting from week 1 in 2001. The population slot contains the population fractions of each district on 31 December 2001, obtained from the Federal Statistical Office of Germany. The map slot contains an object of class "SpatialPolygonsDataFrame".
Prior to surveillance version 1.6-0, data(fluBYBW)
contained a redundant last row (417) filled with zeroes only.
Robert Koch-Institut: SurvStat: https://survstat.rki.de/; Queried on 6 March 2009.
Paul, M. and Held, L. (2011): Predictive assessment of a non-linear random effects model for multivariate time series of infectious disease counts. Statistics in Medicine, 30 (10), 1118-1136. doi:10.1002/sim.4177
Meyer, S. and Held, L. (2014): Power-law models for infectious disease spread. The Annals of Applied Statistics, 8 (3), 1612-1639. doi:10.1214/14-AOAS743
data("fluBYBW") # Count time series plot plot(fluBYBW, type = observed ~ time) # Map of disease incidence (per 100000 inhabitants) for the year 2001 plot(fluBYBW, type = observed ~ unit, tps = 1:52, total.args = list(), population = fluBYBW@map$X31_12_01 / 100000) # the overall rate for 2001 shown in the bottom right corner is sum(observed(fluBYBW[1:52,])) / sum(fluBYBW@map$X31_12_01) * 100000 ## Not run: # Generating an animation takes a while. # Here we take the first 20 weeks of 2001 (runtime: ~3 minutes). # The full animation is available in Supplement A of Meyer and Held (2014) if (require("animation")) { oldwd <- setwd(tempdir()) # to not clutter up the current working dir saveHTML(animate(fluBYBW, tps = 1:20), title="Evolution of influenza in Bayern and Baden-Wuerttemberg", ani.width=500, ani.height=600) setwd(oldwd) } ## End(Not run)
data("fluBYBW") # Count time series plot plot(fluBYBW, type = observed ~ time) # Map of disease incidence (per 100000 inhabitants) for the year 2001 plot(fluBYBW, type = observed ~ unit, tps = 1:52, total.args = list(), population = fluBYBW@map$X31_12_01 / 100000) # the overall rate for 2001 shown in the bottom right corner is sum(observed(fluBYBW[1:52,])) / sum(fluBYBW@map$X31_12_01) * 100000 ## Not run: # Generating an animation takes a while. # Here we take the first 20 weeks of 2001 (runtime: ~3 minutes). # The full animation is available in Supplement A of Meyer and Held (2014) if (require("animation")) { oldwd <- setwd(tempdir()) # to not clutter up the current working dir saveHTML(animate(fluBYBW, tps = 1:20), title="Evolution of influenza in Bayern and Baden-Wuerttemberg", ani.width=500, ani.height=600) setwd(oldwd) } ## End(Not run)
An extension of format.Date with additional formatting strings for quarters. Used by linelist2sts.
formatDate(x, format)
x | a "Date" object.
format | a character string, see strftime for possible specifications; in addition, quarter-related strings such as "%Q" (quarter), "%OQ" (quarter in Roman numerals) and "%q" (day within quarter) are supported (cf. the example below).
A character vector representing the input date(s) x following the format specification.
formatDate(as.Date("2021-10-13"), "%G/%OQ/%q")
formatDate(as.Date("2021-10-13"), "%G/%OQ/%q")
Just yapf – yet another p-value formatter... It is a wrapper around format.pval, such that by default eps = 1e-4, scientific = FALSE, digits = if (p < 10*eps) 1 else 2, and nsmall = 2.
formatPval(pv, eps = 1e-4, scientific = FALSE, ...)
pv | a numeric vector (of p-values).
eps | a numerical tolerance, see format.pval.
scientific | see format.
... | further arguments passed to format.pval (but digits and nsmall are hard-coded internally).
The character vector of formatted p-values.
formatPval(c(0.9, 0.13567, 0.0432, 0.000546, 1e-8))
Fit an endemic-only twinstim as a Poisson glm
An endemic-only twinstim is equivalent to a Poisson regression model for the aggregated number of events, $Y_{[t][s],k}$, by time-space-type cell. The rate of the corresponding Poisson distribution is $e_{[t][s]} \cdot \lambda([t],[s],k)$, where $e_{[t][s]} = |[t]| |[s]|$ is a multiplicative offset. Thus, the glm function can be used to fit an endemic-only twinstim. However, wrapping in glm is usually slower.
glm_epidataCS(formula, data, ...)
formula | an endemic model formula without response, comprising variables of data$stgrid and possibly the event types.
data | an object of class "epidataCS".
... | arguments passed to glm. Note that family and offset are fixed internally.
a fitted glm object.
Sebastian Meyer
data("imdepi", "imdepifit") ## Fit an endemic-only twinstim() and an equivalent model wrapped in glm() fit_twinstim <- update(imdepifit, epidemic = ~0, siaf = NULL, subset = NULL, optim.args=list(control=list(trace=0)), verbose=FALSE) fit_glm <- glm_epidataCS(formula(fit_twinstim)$endemic, data = imdepi) ## Compare the coefficients cbind(twinstim = coef(fit_twinstim), glm = coef(fit_glm)) ### also compare to an equivalent endemic-only hhh4() fit ## first need to aggregate imdepi into an "sts" object load(system.file("shapes", "districtsD.RData", package="surveillance")) imdsts <- epidataCS2sts(imdepi, freq = 12, start = c(2002, 1), neighbourhood = NULL, tiles = districtsD, popcol.stgrid = "popdensity") ## determine the correct offset to get an equivalent model offset <- 2 * rep(with(subset(imdepi$stgrid, !duplicated(BLOCK)), stop - start), ncol(imdsts)) * sum(districtsD$POPULATION) * population(imdsts) ## fit the model using hhh4() fit_hhh4 <- hhh4(imdsts, control = list( end = list( f = addSeason2formula(~I(start/365-3.5), period=365, timevar="start"), offset = offset ), family = "Poisson", subset = 1:nrow(imdsts), data = list(start=with(subset(imdepi$stgrid, !duplicated(BLOCK)), start)))) summary(fit_hhh4) stopifnot(all.equal(coef(fit_hhh4), coef(fit_glm), check.attributes=FALSE))
data("imdepi", "imdepifit") ## Fit an endemic-only twinstim() and an equivalent model wrapped in glm() fit_twinstim <- update(imdepifit, epidemic = ~0, siaf = NULL, subset = NULL, optim.args=list(control=list(trace=0)), verbose=FALSE) fit_glm <- glm_epidataCS(formula(fit_twinstim)$endemic, data = imdepi) ## Compare the coefficients cbind(twinstim = coef(fit_twinstim), glm = coef(fit_glm)) ### also compare to an equivalent endemic-only hhh4() fit ## first need to aggregate imdepi into an "sts" object load(system.file("shapes", "districtsD.RData", package="surveillance")) imdsts <- epidataCS2sts(imdepi, freq = 12, start = c(2002, 1), neighbourhood = NULL, tiles = districtsD, popcol.stgrid = "popdensity") ## determine the correct offset to get an equivalent model offset <- 2 * rep(with(subset(imdepi$stgrid, !duplicated(BLOCK)), stop - start), ncol(imdsts)) * sum(districtsD$POPULATION) * population(imdsts) ## fit the model using hhh4() fit_hhh4 <- hhh4(imdsts, control = list( end = list( f = addSeason2formula(~I(start/365-3.5), period=365, timevar="start"), offset = offset ), family = "Poisson", subset = 1:nrow(imdsts), data = list(start=with(subset(imdepi$stgrid, !duplicated(BLOCK)), start)))) summary(fit_hhh4) stopifnot(all.equal(coef(fit_hhh4), coef(fit_glm), check.attributes=FALSE))
Number of Hepatitis A cases among adult (age>18) males in Berlin, 2001-2006. An increase is seen during 2006.
data("ha") data("ha.sts")
data("ha") data("ha.sts")
ha is a disProg object containing 290 weekly observations for the 12 city districts of Berlin, from week 1 in 2001 to week 30 in 2006. ha.sts was generated from ha via the converter function disProg2sts and includes a map of Berlin's districts.
Robert Koch-Institut: SurvStat: https://survstat.rki.de/; Queried on 25 August 2006.
Robert Koch Institut, Epidemiologisches Bulletin 33/2006, p.290.
## deprecated "disProg" object data("ha") ha plot(aggregate(ha)) ## new-style "sts" object data("ha.sts") ha.sts plot(ha.sts, type = observed ~ time) # = plot(aggregate(ha.sts, by = "unit")) plot(ha.sts, type = observed ~ unit, labels = TRUE)
## deprecated "disProg" object data("ha") ha plot(aggregate(ha)) ## new-style "sts" object data("ha.sts") ha.sts plot(ha.sts, type = observed ~ time) # = plot(aggregate(ha.sts, by = "unit")) plot(ha.sts, type = observed ~ unit, labels = TRUE)
Data on the 188 cases in the measles outbreak among children in the
German city of Hagelloch (near Tübingen) 1861. The data were
originally collected by Dr. Albert Pfeilsticker (1863) and augmented and
re-analysed by Dr. Heike Oesterle (1992).
This dataset is used to illustrate the twinSIR
model class in
vignette("twinSIR")
.
data("hagelloch")
data("hagelloch")
Loading data("hagelloch")
gives two objects:
hagelloch
and hagelloch.df
.
The latter is the original data.frame
of 188 rows
with individual information for each infected child.
hagelloch
has been generated from hagelloch.df
via as.epidata
(see the Examples below) to obtain an
"epidata"
object for use with twinSIR
.
It contains the entire SIR event history of the outbreak
(but not all of the covariates).
The covariate information in hagelloch.df
is as follows:
patient number
patient name (as a factor)
family index
house number
age in years
gender of the individual (factor: male, female)
Date
of prodromes
Date
of rash
class (factor: preschool, 1st class, 2nd class)
Date
of death (with missings)
number of patient who is the putative source of infection (0 = unknown)
serial interval = number of days between dates of prodromes of infection source and infected person
complications (factor: no complications, bronchopneumonia, severe bronchitis, lobar pneumonia, pseudocroup, cerebral edema)
duration of prodromes in days
number of cases in family
number of initial cases
generation number of the case
day of max. fever (days after rash)
max. fever (degree Celsius)
x coordinate of house (in metres). Scaling in metres is obtained by multiplying the original coordinates by 2.5 (see details in Neal and Roberts, 2004)
y coordinate of house (in metres). See also the above description of x.loc.
Time of prodromes (first symptoms) in days after the start of the epidemic (30 Oct 1861).
Time upon which the rash first appears.
Time of death, if available.
Time at which the infectious period of the individual is assumed to end. This unknown time is calculated as $tR_i = \min(tDEAD_i, tERU_i + d_0)$, where, as in Section 3.1 of Neal and Roberts (2004), we use $d_0 = 3$.
Time at which the individual is assumed to become infectious. Actually this time is unknown, but we use $tI_i = tPRO_i - d_1$, where $d_1 = 1$ as in Neal and Roberts (2004).
The time variables describe the transitions of the individual in a Susceptible-Infectious-Recovered (SIR) model. Note that, in order to avoid ties in the event times resulting from daily interval censoring, the times have been jittered uniformly within the respective day. The time point 0.5 would correspond to noon of 30 Oct 1861.
The hagelloch
"epidata"
object only retains some of
the above covariates to save space. Apart from the usual
"epidata"
event columns, hagelloch
contains a number of
extra variables representing distance- and covariate-based weights for
the force of infection:
the number of currently infectious children in the same household (including the child itself if it is currently infectious).
the number of currently infectious children outside the household.
the number of children infectious during the respective time block who are members of the 1st and 2nd class, respectively; the value is 0 if the individual in that row is not herself a member of the respective class.
Such epidemic covariates can be computed by specifying suitable f and w arguments in as.epidata at conversion (see the code below), or at a later step via the update-method for "epidata", as sketched below.
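A minimal sketch of the update route (the covariate name "nearby" and the 100 m cutoff are hypothetical, purely for illustration; u denotes the Euclidean distance between the houses of two individuals):

data("hagelloch")
## add a further distance-based epidemic covariate after conversion
hagelloch2 <- update(hagelloch,
                     f = list(nearby = function(u) u > 0 & u < 100))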
Thanks to Peter J. Neal, University of Manchester, for providing us with these data, which he in turn obtained from Niels Becker, Australian National University. To cite the data, the main references are Pfeilsticker (1863) and Oesterle (1992).
Pfeilsticker, A. (1863). Beiträge zur Pathologie der Masern mit besonderer Berücksichtigung der statistischen Verhältnisse, M.D. Thesis, Eberhard-Karls-Universität Tübingen. Available as https://archive.org/details/beitrgezurpatho00pfeigoog.
Oesterle, H. (1992). Statistische Reanalyse einer Masernepidemie 1861 in Hagelloch, M.D. Thesis, Eberhard-Karls-Universität Tübingen.
Neal, P. J. and Roberts, G. O. (2004): Statistical inference and model selection for the 1861 Hagelloch measles epidemic. Biostatistics, 5 (2), 249-261.
data class: epidata
point process model: twinSIR
illustration with hagelloch
: vignette("twinSIR")
data("hagelloch") head(hagelloch.df) # original data documented in Oesterle (1992) head(as.data.frame(hagelloch)) # "epidata" event history format ## How the "epidata" 'hagelloch' was created from 'hagelloch.df' stopifnot(all.equal(hagelloch, as.epidata( hagelloch.df, t0 = 0, tI.col = "tI", tR.col = "tR", id.col = "PN", coords.cols = c("x.loc", "y.loc"), f = list( household = function(u) u == 0, nothousehold = function(u) u > 0 ), w = list( c1 = function (CL.i, CL.j) CL.i == "1st class" & CL.j == CL.i, c2 = function (CL.i, CL.j) CL.i == "2nd class" & CL.j == CL.i ), keep.cols = c("SEX", "AGE", "CL")) )) ### Basic plots produced from hagelloch.df # Show case locations as in Neal & Roberts (different scaling) using # the data.frame (promoted to a SpatialPointsDataFrame) coordinates(hagelloch.df) <- c("x.loc","y.loc") plot(hagelloch.df, xlab="x [m]", ylab="x [m]", pch=15, axes=TRUE, cex=sqrt(multiplicity(hagelloch.df))) # Epicurve hist(as.numeric(hagelloch.df$tI), xlab="Time (days)", ylab="Cases", main="") ### "epidata" summary and plot methods (s <- summary(hagelloch)) head(s$byID) plot(s) ## Not run: # Show a dynamic illustration of the spread of the infection animate(hagelloch, time.spacing=0.1, sleep=1/100, legend.opts=list(x="topleft")) ## End(Not run)
data("hagelloch") head(hagelloch.df) # original data documented in Oesterle (1992) head(as.data.frame(hagelloch)) # "epidata" event history format ## How the "epidata" 'hagelloch' was created from 'hagelloch.df' stopifnot(all.equal(hagelloch, as.epidata( hagelloch.df, t0 = 0, tI.col = "tI", tR.col = "tR", id.col = "PN", coords.cols = c("x.loc", "y.loc"), f = list( household = function(u) u == 0, nothousehold = function(u) u > 0 ), w = list( c1 = function (CL.i, CL.j) CL.i == "1st class" & CL.j == CL.i, c2 = function (CL.i, CL.j) CL.i == "2nd class" & CL.j == CL.i ), keep.cols = c("SEX", "AGE", "CL")) )) ### Basic plots produced from hagelloch.df # Show case locations as in Neal & Roberts (different scaling) using # the data.frame (promoted to a SpatialPointsDataFrame) coordinates(hagelloch.df) <- c("x.loc","y.loc") plot(hagelloch.df, xlab="x [m]", ylab="x [m]", pch=15, axes=TRUE, cex=sqrt(multiplicity(hagelloch.df))) # Epicurve hist(as.numeric(hagelloch.df$tI), xlab="Time (days)", ylab="Cases", main="") ### "epidata" summary and plot methods (s <- summary(hagelloch)) head(s$byID) plot(s) ## Not run: # Show a dynamic illustration of the spread of the infection animate(hagelloch, time.spacing=0.1, sleep=1/100, legend.opts=list(x="topleft")) ## End(Not run)
Weekly number of reported hepatitis A infections in Germany 2001-2004.
data(hepatitisA)
A disProg object containing 208 weekly observations, from week 1 in 2001 to week 52 in 2004.
Robert Koch-Institut: SurvStat: https://survstat.rki.de/; Queried on 11-01-2005.
data(hepatitisA)
plot(hepatitisA)
Fits an autoregressive Poisson or negative binomial model
to a univariate or multivariate time series of counts.
The characteristic feature of hhh4
models is the additive
decomposition of the conditional mean into epidemic and
endemic components (Held et al, 2005).
Log-linear predictors of covariates and random intercepts are allowed
in all components; see the Details below.
A general introduction to the hhh4
modelling approach and its
implementation is given in the vignette("hhh4")
. Meyer et al
(2017, Section 5, available as vignette("hhh4_spacetime")
)
describe hhh4
models for areal time series of infectious
disease counts.
hhh4(stsObj, control = list(
    ar = list(f = ~ -1, offset = 1, lag = 1),
    ne = list(f = ~ -1, offset = 1, lag = 1,
              weights = neighbourhood(stsObj) == 1,
              scale = NULL, normalize = FALSE),
    end = list(f = ~ 1, offset = 1),
    family = c("Poisson", "NegBin1", "NegBinM"),
    subset = 2:nrow(stsObj),
    optimizer = list(stop = list(tol = 1e-5, niter = 100),
                     regression = list(method = "nlminb"),
                     variance = list(method = "nlminb")),
    verbose = FALSE,
    start = list(fixed = NULL, random = NULL, sd.corr = NULL),
    data = list(t = stsObj@epoch - min(stsObj@epoch)),
    keep.terms = FALSE
), check.analyticals = FALSE)
stsObj | object of class "sts" containing the (multivariate) count data time series.
control | a list containing the model specification and control arguments (see the default argument list above and the Details). The auxiliary function makeControl might be useful for creating such a list.
check.analyticals | logical (or a subset of c("numDeriv", "maxLik")), indicating if (how) the implemented analytical score vector and Fisher information matrix should be checked against numerical derivatives at the parameter starting values, using the packages numDeriv and maxLik. If activated, hhh4 will return a list containing the analytical and numerical derivatives for comparison (no ML estimation will be performed).
An endemic-epidemic multivariate time-series model for infectious disease counts $Y_{it}$ from units $i = 1, \dots, I$ during periods $t = 1, \dots, T$ was proposed by Held et al (2005) and was later extended in a series of papers (Paul et al, 2008; Paul and Held, 2011; Held and Paul, 2012; Meyer and Held, 2014). In its most general formulation, this so-called hhh4 (or HHH or $H^3$ or triple-H) model assumes that, conditional on past observations, $Y_{it}$ has a Poisson or negative binomial distribution with mean

$$\mu_{it} = \lambda_{it} y_{i,t-1} + \phi_{it} \sum_{j \neq i} w_{ji} y_{j,t-1} + e_{it} \nu_{it}.$$

In the case of a negative binomial model, the conditional variance is $\mu_{it}(1 + \psi_i \mu_{it})$ with overdispersion parameters $\psi_i > 0$ (possibly shared across different units, e.g., $\psi_i \equiv \psi$).
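As a quick numeric illustration of this variance formula (with arbitrary values):

mu  <- 5    # conditional mean
psi <- 0.2  # overdispersion parameter
mu * (1 + psi * mu)  # 10, i.e., twice the Poisson variance of 5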
Univariate time series of counts $Y_t$ are supported as well, in which case hhh4 can be regarded as an extension of glm.nb to account for autoregression. See the Examples below for a comparison of an endemic-only hhh4 model with a corresponding glm.nb.
The three unknown quantities of the mean $\mu_{it}$, namely $\lambda_{it}$ in the autoregressive (ar) component, $\phi_{it}$ in the neighbour-driven (ne) component, and $\nu_{it}$ in the endemic (end) component, are log-linear predictors incorporating time-/unit-specific covariates. They may also contain unit-specific random intercepts as proposed by Paul and Held (2011). The endemic mean is usually modelled proportional to a unit-specific offset $e_{it}$ (e.g., population numbers or fractions); it is possible to include such multiplicative offsets in the epidemic components as well. The $w_{ji}$ are transmission weights reflecting the flow of infections from unit $j$ to unit $i$. If weights vary over time (prespecified as a 3-dimensional array $(w_{jit})$), the ne sum in the mean uses $w_{jit} y_{j,t-1}$. In spatial hhh4 applications, the "units" refer to geographical regions and the weights could be derived from movement network data. Alternatively, the weights $w_{ji}$ can be estimated parametrically as a function of adjacency order (Meyer and Held, 2014), see W_powerlaw.
(Penalized) Likelihood inference for such hhh4
models has been
established by Paul and Held (2011) with extensions for parametric
neighbourhood weights by Meyer and Held (2014).
Supplied with the analytical score function and Fisher information,
the function hhh4
by default uses the quasi-Newton algorithm
available through nlminb
to maximize the log-likelihood.
Convergence is usually fast even for a large number of parameters.
If the model contains random effects, the penalized and marginal
log-likelihoods are maximized alternately until convergence.
hhh4
returns an object of class "hhh4"
,
which is a list containing the following components:
coefficients | named vector with estimated (regression) parameters of the model
se | estimated standard errors (for regression parameters)
cov | covariance matrix (for regression parameters)
Sigma | estimated variance-covariance matrix of random effects
Sigma.orig | estimated variance parameters on the internal scale used for optimization
Sigma.cov | inverse of the marginal Fisher information (on the internal scale), i.e., the asymptotic covariance matrix of Sigma.orig
call | the matched call
dim | vector with the number of fixed and random effects in the model
loglikelihood | (penalized) loglikelihood evaluated at the MLE
margll | (approximate) log marginal likelihood should the model contain random effects
convergence | logical. Did the optimizer converge?
fitted.values | fitted mean values $\mu_{it}$
control | control object of the fit
terms | the terms object used in the fit if keep.terms = TRUE, and NULL otherwise
stsObj | the supplied stsObj
lags | named integer vector of length two containing the lags used for the epidemic components "ar" and "ne", respectively
nObs | number of observations used for fitting the model
nTime | number of time points used for fitting the model
nUnit | number of units (e.g., areas) used for fitting the model
runtime | the proc.time-queried time taken to fit the model
Michaela Paul, Sebastian Meyer, Leonhard Held
Held, L., Höhle, M. and Hofmann, M. (2005): A statistical framework for the analysis of multivariate infectious disease surveillance counts. Statistical Modelling, 5 (3), 187-199. doi:10.1191/1471082X05st098oa
Paul, M., Held, L. and Toschke, A. M. (2008): Multivariate modelling of infectious disease surveillance data. Statistics in Medicine, 27 (29), 6250-6267. doi:10.1002/sim.3440
Paul, M. and Held, L. (2011): Predictive assessment of a non-linear random effects model for multivariate time series of infectious disease counts. Statistics in Medicine, 30 (10), 1118-1136. doi:10.1002/sim.4177
Held, L. and Paul, M. (2012): Modeling seasonality in space-time infectious disease surveillance data. Biometrical Journal, 54 (6), 824-843. doi:10.1002/bimj.201200037
Meyer, S. and Held, L. (2014): Power-law models for infectious disease spread. The Annals of Applied Statistics, 8 (3), 1612-1639. doi:10.1214/14-AOAS743
Meyer, S., Held, L. and Höhle, M. (2017): Spatio-temporal analysis of epidemic phenomena using the R package surveillance. Journal of Statistical Software, 77 (11), 1-55. doi:10.18637/jss.v077.i11
See the special functions fe
, ri
and the
examples below for how to specify unit-specific effects.
Further details on the modelling approach and illustrations of its
implementation can be found in vignette("hhh4")
and
vignette("hhh4_spacetime")
.
######################
## Univariate examples
######################

### weekly counts of salmonella agona cases, UK, 1990-1995

data("salmonella.agona")
## convert old "disProg" to new "sts" data class
salmonella <- disProg2sts(salmonella.agona)
salmonella
plot(salmonella)

## generate formula for an (endemic) time trend and seasonality
f.end <- addSeason2formula(f = ~1 + t, S = 1, period = 52)
f.end
## specify a simple autoregressive negative binomial model
model1 <- list(ar = list(f = ~1), end = list(f = f.end), family = "NegBin1")
## fit this model to the data
res <- hhh4(salmonella, model1)
## summarize the model fit
summary(res, idx2Exp=1, amplitudeShift=TRUE, maxEV=TRUE)
plot(res)
plot(res, type = "season", components = "end")


### weekly counts of meningococcal infections, Germany, 2001-2006

data("influMen")
fluMen <- disProg2sts(influMen)
meningo <- fluMen[, "meningococcus"]
meningo
plot(meningo)

## again a simple autoregressive NegBin model with endemic seasonality
meningoFit <- hhh4(stsObj = meningo, control = list(
    ar = list(f = ~1),
    end = list(f = addSeason2formula(f = ~1, S = 1, period = 52)),
    family = "NegBin1"
))

summary(meningoFit, idx2Exp=TRUE, amplitudeShift=TRUE, maxEV=TRUE)
plot(meningoFit)
plot(meningoFit, type = "season", components = "end")


########################
## Multivariate examples
########################

### bivariate analysis of influenza and meningococcal infections
### (see Paul et al, 2008)

plot(fluMen, same.scale = FALSE)

## Fit a negative binomial model with
## - autoregressive component: disease-specific intercepts
## - neighbour-driven component: only transmission from flu to men
## - endemic component: S=3 and S=1 sine/cosine pairs for flu and men, respectively
## - disease-specific overdispersion

WfluMen <- neighbourhood(fluMen)
WfluMen["meningococcus","influenza"] <- 0
WfluMen
f.end_fluMen <- addSeason2formula(f = ~ -1 + fe(1, which = c(TRUE, TRUE)),
                                  S = c(3, 1), period = 52)
f.end_fluMen
fluMenFit <- hhh4(fluMen, control = list(
    ar = list(f = ~ -1 + fe(1, unitSpecific = TRUE)),
    ne = list(f = ~ 1, weights = WfluMen),
    end = list(f = f.end_fluMen),
    family = "NegBinM"))
summary(fluMenFit, idx2Exp=1:3)
plot(fluMenFit, type = "season", components = "end", unit = 1)
plot(fluMenFit, type = "season", components = "end", unit = 2)


### weekly counts of measles, Weser-Ems region of Lower Saxony, Germany

data("measlesWeserEms")
measlesWeserEms
plot(measlesWeserEms)  # note the two districts with zero cases

## we could fit the same simple model as for the salmonella cases above
model1 <- list(
    ar = list(f = ~1),
    end = list(f = addSeason2formula(~1 + t, period = 52)),
    family = "NegBin1"
)
measlesFit <- hhh4(measlesWeserEms, model1)
summary(measlesFit, idx2Exp=TRUE, amplitudeShift=TRUE, maxEV=TRUE)

## but we should probably at least use a population offset in the endemic
## component to reflect heterogeneous incidence levels of the districts,
## and account for spatial dependence (here just using first-order adjacency)
measlesFit2 <- update(measlesFit,
    end = list(offset = population(measlesWeserEms)),
    ne = list(f = ~1, weights = neighbourhood(measlesWeserEms) == 1))
summary(measlesFit2, idx2Exp=TRUE, amplitudeShift=TRUE, maxEV=TRUE)
plot(measlesFit2, units = NULL, hide0s = TRUE)

## 'measlesFit2' corresponds to the 'measlesFit_basic' model in
## vignette("hhh4_spacetime"). See there for further analyses,
## including vaccination coverage as a covariate,
## spatial power-law weights, and random intercepts.
## Not run: 
### last but not least, a more sophisticated (and time-consuming)
### analysis of weekly counts of influenza from 140 districts in
### Southern Germany (originally analysed by Paul and Held, 2011,
### and revisited by Held and Paul, 2012, and Meyer and Held, 2014)

data("fluBYBW")
plot(fluBYBW, type = observed ~ time)
plot(fluBYBW, type = observed ~ unit,
     ## mean yearly incidence per 100,000 inhabitants (8 years)
     population = fluBYBW@map$X31_12_01 / 100000 * 8)

## For the full set of models for data("fluBYBW") as analysed by
## Paul and Held (2011), including predictive model assessment
## using proper scoring rules, see the (computer-intensive)
## demo("fluBYBW") script:
demoscript <- system.file("demo", "fluBYBW.R", package = "surveillance")
demoscript
#file.show(demoscript)

## Here we fit the improved power-law model of Meyer and Held (2014)
## - autoregressive component: random intercepts + S = 1 sine/cosine pair
## - neighbour-driven component: random intercepts + S = 1 sine/cosine pair
##   + population gravity with normalized power-law weights
## - endemic component: random intercepts + trend + S = 3 sine/cosine pairs
## - random intercepts are iid but correlated between components
f.S1 <- addSeason2formula(
    ~-1 + ri(type="iid", corr="all"),
    S = 1, period = 52)
f.end.S3 <- addSeason2formula(
    ~-1 + ri(type="iid", corr="all") + I((t-208)/100),
    S = 3, period = 52)

## for power-law weights, we need adjacency orders, which can be
## computed from the binary adjacency indicator matrix
nbOrder1 <- neighbourhood(fluBYBW)
neighbourhood(fluBYBW) <- nbOrder(nbOrder1)

## full model specification
fluModel <- list(
    ar = list(f = f.S1),
    ne = list(f = update.formula(f.S1, ~ . + log(pop)),
              weights = W_powerlaw(maxlag=max(neighbourhood(fluBYBW)),
                                   normalize = TRUE, log = TRUE)),
    end = list(f = f.end.S3, offset = population(fluBYBW)),
    family = "NegBin1", data = list(pop = population(fluBYBW)),
    optimizer = list(variance = list(method = "Nelder-Mead")),
    verbose = TRUE)

## CAVE: random effects considerably increase the runtime of model estimation
## (It is usually advantageous to first fit a model with simple intercepts
## to obtain reasonable start values for the other parameters.)
set.seed(1)  # because random intercepts are initialized randomly
fluFit <- hhh4(fluBYBW, fluModel)

summary(fluFit, idx2Exp = TRUE, amplitudeShift = TRUE)

plot(fluFit, type = "fitted", total = TRUE)

plot(fluFit, type = "season")
range(plot(fluFit, type = "maxEV"))

plot(fluFit, type = "maps", prop = TRUE)

gridExtra::grid.arrange(
    grobs = lapply(c("ar", "ne", "end"), function (comp)
        plot(fluFit, type = "ri", component = comp,
             main = comp, exp = TRUE, sub = "multiplicative effect")),
    nrow = 1, ncol = 3)

plot(fluFit, type = "neweights", xlab = "adjacency order")

## End(Not run)


########################################################################
## An endemic-only "hhh4" model can also be estimated using MASS::glm.nb
########################################################################

## weekly counts of measles, Weser-Ems region of Lower Saxony, Germany
data("measlesWeserEms")

## fit an endemic-only "hhh4" model
## with time covariates and a district-specific offset
hhh4fit <- hhh4(measlesWeserEms, control = list(
    end = list(f = addSeason2formula(~1 + t, period = frequency(measlesWeserEms)),
               offset = population(measlesWeserEms)),
    ar = list(f = ~-1), ne = list(f = ~-1), family = "NegBin1",
    subset = 1:nrow(measlesWeserEms)
))
summary(hhh4fit)

## fit the same model using MASS::glm.nb
measlesWeserEmsData <- as.data.frame(measlesWeserEms, tidy = TRUE)
measlesWeserEmsData$t <- c(hhh4fit$control$data$t)
glmnbfit <- MASS::glm.nb(
    update(formula(hhh4fit)$end, observed ~ . + offset(log(population))),
    data = measlesWeserEmsData
)
summary(glmnbfit)

## Note that the overdispersion parameter is parametrized inversely.
## The likelihood and point estimates are all the same.
## However, the variance estimates are different: in glm.nb, the parameters
## are estimated conditional on the overdispersion theta.