Title: | Temporal and Spatio-Temporal Modeling and Monitoring of Epidemic Phenomena |
---|---|
Description: | Statistical methods for the modeling and monitoring of time series of counts, proportions and categorical data, as well as for the modeling of continuous-time point processes of epidemic phenomena. The monitoring methods focus on aberration detection in count data time series from public health surveillance of communicable diseases, but applications could just as well originate from environmetrics, reliability engineering, econometrics, or social sciences. The package implements many typical outbreak detection procedures such as the (improved) Farrington algorithm, or the negative binomial GLR-CUSUM method of Hoehle and Paul (2008) <doi:10.1016/j.csda.2008.02.015>. A novel CUSUM approach combining logistic and multinomial logistic modeling is also included. The package contains several real-world data sets, the ability to simulate outbreak data, and to visualize the results of the monitoring in a temporal, spatial or spatio-temporal fashion. A recent overview of the available monitoring procedures is given by Salmon et al. (2016) <doi:10.18637/jss.v070.i10>. For the retrospective analysis of epidemic spread, the package provides three endemic-epidemic modeling frameworks with tools for visualization, likelihood inference, and simulation. hhh4() estimates models for (multivariate) count time series following Paul and Held (2011) <doi:10.1002/sim.4177> and Meyer and Held (2014) <doi:10.1214/14-AOAS743>. twinSIR() models the susceptible-infectious-recovered (SIR) event history of a fixed population, e.g, epidemics across farms or networks, as a multivariate point process as proposed by Hoehle (2009) <doi:10.1002/bimj.200900050>. twinstim() estimates self-exciting point process models for a spatio-temporal point pattern of infective events, e.g., time-stamped geo-referenced surveillance data, as proposed by Meyer et al. (2012) <doi:10.1111/j.1541-0420.2011.01684.x>. A recent overview of the implemented space-time modeling frameworks for epidemic phenomena is given by Meyer et al. (2017) <doi:10.18637/jss.v077.i11>. |
Authors: | Michael Hoehle [aut, ths] , Sebastian Meyer [aut, cre] , Michaela Paul [aut], Leonhard Held [ctb, ths] , Howard Burkom [ctb], Thais Correa [ctb], Mathias Hofmann [ctb], Christian Lang [ctb], Juliane Manitz [ctb], Sophie Reichert [ctb], Andrea Riebler [ctb], Daniel Sabanes Bove [ctb], Maelle Salmon [ctb], Dirk Schumacher [ctb], Stefan Steiner [ctb], Mikko Virtanen [ctb], Wei Wei [ctb], Valentin Wimmer [ctb], R Core Team [ctb] (02zz1nj61, src/ks.c and a few code fragments of standard S3 methods) |
Maintainer: | Sebastian Meyer <[email protected]> |
License: | GPL-2 |
Version: | 1.24.1.9000 |
Built: | 2025-01-09 19:19:52 UTC |
Source: | https://github.com/r-forge/surveillance |
The R package surveillance implements statistical methods for the retrospective modeling and prospective monitoring of epidemic phenomena in temporal and spatio-temporal contexts. Focus is on (routinely collected) public health surveillance data, but the methods just as well apply to data from environmetrics, econometrics or the social sciences. As many of the monitoring methods rely on statistical process control methodology, the package is also relevant to quality control and reliability engineering.
The package implements many typical outbreak detection procedures such
as Stroup et al. (1989), Farrington et al. (1996), Rossi et al. (1999),
Rogerson and Yamada (2001), a Bayesian approach (Höhle, 2007),
negative binomial CUSUM methods (Höhle and Mazick, 2009), and a
detector based on generalized likelihood ratios (Höhle
and Paul, 2008), see wrap.algo
.
Also CUSUMs for the prospective change-point detection in binomial,
beta-binomial and multinomial time series are covered based on
generalized linear modeling, see categoricalCUSUM
.
This includes, e.g., paired comparison Bradley-Terry modeling described
in Höhle (2010), or paired binary CUSUM
(pairedbinCUSUM
) described by Steiner et al. (1999).
The package contains several real-world datasets, the ability
to simulate outbreak data, visualize the results of the monitoring in
temporal, spatial or spatio-temporal fashion. In dealing with time
series data, the fundamental data structure of the package is the S4
class sts
wrapping observations, monitoring results and
date handling for multivariate time series.
A recent overview of the available monitoring procedures is
given by Salmon et al. (2016).
For the retrospective analysis of epidemic spread, the package
provides three endemic-epidemic modeling frameworks with
tools for visualization, likelihood inference, and simulation.
The function hhh4
offers inference methods for the
(multivariate) count time series models of Held et al. (2005), Paul et
al. (2008), Paul and Held (2011), Held and Paul (2012), and Meyer and
Held (2014). See vignette("hhh4")
for a general introduction
and vignette("hhh4_spacetime")
for a discussion and
illustration of spatial hhh4
models.
Self-exciting point processes are modeled through endemic-epidemic
conditional intensity functions.
twinSIR
(Höhle, 2009) models the
susceptible-infectious-recovered (SIR) event history of a
fixed population, e.g, epidemics across farms or networks;
see vignette("twinSIR")
for an illustration.
twinstim
(Meyer et al., 2012) fits spatio-temporal point
process models to point patterns of infective events, e.g.,
time-stamped geo-referenced surveillance data on infectious disease
occurrence; see vignette("twinstim")
for an illustration.
A recent overview of the implemented space-time modeling frameworks
for epidemic phenomena is given by Meyer et al. (2017).
Substantial contributions of code by: Leonhard Held, Howard Burkom, Thais Correa, Mathias Hofmann, Christian Lang, Juliane Manitz, Sophie Reichert, Andrea Riebler, Daniel Sabanes Bove, Maelle Salmon, Dirk Schumacher, Stefan Steiner, Mikko Virtanen, Wei Wei, Valentin Wimmer.
Furthermore, the authors would like to thank the following people for ideas, discussions, testing and feedback: Doris Altmann, Johannes Bracher, Caterina De Bacco, Johannes Dreesman, Johannes Elias, Marc Geilhufe, Jim Hester, Kurt Hornik, Mayeul Kauffmann, Junyi Lu, Lore Merdrignac, Tim Pollington, Marcos Prates, André Victor Ribeiro Amaral, Brian D. Ripley, François Rousseu, Barry Rowlingson, Christopher W. Ryan, Klaus Stark, Yann Le Strat, André Michael Toschke, Wei Wei, George Wood, Achim Zeileis, Bing Zhang.
Michael Hoehle, Sebastian Meyer, Michaela Paul
Maintainer: Sebastian Meyer [email protected]
citation(package="surveillance")
gives the two main software
references for the modeling (Meyer et al., 2017) and the monitoring
(Salmon et al., 2016) functionalities:
Meyer S, Held L, Höhle M (2017). “Spatio-Temporal Analysis of Epidemic Phenomena Using the R Package surveillance.” Journal of Statistical Software, 77(11), 1–55. doi:10.18637/jss.v077.i11.
Salmon M, Schumacher D, Höhle M (2016). “Monitoring Count Time Series in R: Aberration Detection in Public Health Surveillance.” Journal of Statistical Software, 70(10), 1–35. doi:10.18637/jss.v070.i10.
Further references are listed in surveillance:::REFERENCES
.
If you use the surveillance package in your own work, please do cite the corresponding publications.
https://surveillance.R-forge.R-project.org/
## Additional documentation and illustrations of the methods are ## available in the form of package vignettes and demo scripts: vignette(package = "surveillance") demo(package = "surveillance")
## Additional documentation and illustrations of the methods are ## available in the form of package vignettes and demo scripts: vignette(package = "surveillance") demo(package = "surveillance")
A synthetic dataset from the Danish meat inspection – useful for illustrating the beta-binomial CUSUM.
data(abattoir)
data(abattoir)
The object of class "sts"
contains an artificial data set
inspired by meat inspection data used by Danish Pig Production,
Denmark. For each week the number of pigs with positive audit reports
is recorded together with the total number of audits made that week.
Höhle, M. (2010): Online change-point detection in categorical time series. In: T. Kneib and G. Tutz (Eds.), Statistical Modelling and Regression Structures, Physica-Verlag.
data("abattoir") plot(abattoir) population(abattoir)
data("abattoir") plot(abattoir) population(abattoir)
"sts"
Objects
Add a nicely formatted x-axis to time series plots related to the
"sts"
class. This utility function is, e.g., used
by stsplot_time1
and plotHHH4_fitted1
.
addFormattedXAxis(x, epochsAsDate = FALSE, xaxis.tickFreq = list("%Q"=atChange), xaxis.labelFreq = xaxis.tickFreq, xaxis.labelFormat = "%G\n\n%OQ", ...)
addFormattedXAxis(x, epochsAsDate = FALSE, xaxis.tickFreq = list("%Q"=atChange), xaxis.labelFreq = xaxis.tickFreq, xaxis.labelFormat = "%G\n\n%OQ", ...)
x |
an object of class |
epochsAsDate |
a logical indicating if the old ( |
xaxis.labelFormat , xaxis.tickFreq , xaxis.labelFreq
|
see the details below. |
... |
further arguments passed to |
The setting epochsAsDate = TRUE
enables very flexible formatting of the x-axis and its
annotations using the xaxis.tickFreq
, xaxis.labelFreq
and xaxis.labelFormat
arguments. The first two are named lists containing
pairs with the name being a strftime
single
conversion specification and the second part is a function which based
on this conversion returns a subset of the rows in the sts
objects. The subsetting function has the following header:
function(x,xm1)
, where x
is a vector containing
the result of applying the conversion in name
to the epochs of
the sts
object and xm1
is the scalar result when
applying the conversion to the natural element just before the first
epoch. Please note that the input to the subsetting function is converted
using as.numeric
before calling the function. Hence, the
conversion specification needs to result in a string convertible to integer.
Three predefined subsetting functions exist:
atChange
, at2ndChange
and atMedian
, which
are used to make a tick at each (each 2nd for at2ndChange
)
change and at the median index computed on all having the same value,
respectively:
atChange <- function(x,xm1) which(diff(c(xm1,x)) != 0) at2ndChange <- function(x,xm1) which(diff(c(xm1,x) %/% 2) != 0) atMedian <- function(x,xm1) tapply(seq_along(x), INDEX=x, quantile, prob=0.5, type=3)
By defining own functions here, one can obtain an arbitrary degree of flexibility.
Finally, xaxis.labelFormat
is a strftime
compatible formatting string., e.g. the default value is
"%G\n\n%OQ"
, which means ISO year and quarter (in roman
letters) stacked on top of each other.
NULL
(invisibly). The function is called for its side effects.
Michael Höhle with contributions by Sebastian Meyer
the examples in stsplot_time1
and plotHHH4_fitted1
This function helps to construct a formula
object that
can be used in a call to hhh4
to model
seasonal variation via a sum of sine and cosine terms.
addSeason2formula(f = ~1, S = 1, period = 52, timevar = "t")
addSeason2formula(f = ~1, S = 1, period = 52, timevar = "t")
f |
formula that the seasonal terms should be added to,
defaults to an intercept |
S |
number of sine and cosine terms. If |
period |
period of the season, defaults to 52 for weekly data. |
timevar |
the time variable in the model. Defaults to |
The function adds the seasonal terms
for to an existing formula
f
.
Note the following equivalence when interpreting the coefficients of the seasonal terms:
with amplitude
and phase shift
.
The amplitude and phase shift can be obtained from a fitted
hhh4
model via coef(..., amplitudeShift = TRUE)
,
see coef.hhh4
.
Returns a formula
with the seasonal terms added and
its environment set to .GlobalEnv
.
Note that to use the resulting formula in hhh4
,
a time variable named as specified by the argument timevar
must
be available.
M. Paul, with contributions by S. Meyer
# add 2 sine/cosine terms to a model with intercept and linear trend addSeason2formula(f = ~ 1 + t, S = 2) # the same for monthly data addSeason2formula(f = ~ 1 + t, S = 2, period = 12) # different number of seasons for a bivariate time series addSeason2formula(f = ~ 1, S = c(3, 1), period = 52)
# add 2 sine/cosine terms to a model with intercept and linear trend addSeason2formula(f = ~ 1 + t, S = 2) # the same for monthly data addSeason2formula(f = ~ 1 + t, S = 2, period = 12) # different number of seasons for a bivariate time series addSeason2formula(f = ~ 1, S = c(3, 1), period = 52)
"sts"
Object Over Time or Across UnitsAggregate the matrix slots of an "sts"
object.
Either the time series is aggregated so a new sampling frequency of
nfreq
observations per year is obtained (i.e., as in
aggregate.ts
), or the aggregation is over all
columns (units).
## S4 method for signature 'sts' aggregate(x, by = "time", nfreq = "all", ...)
## S4 method for signature 'sts' aggregate(x, by = "time", nfreq = "all", ...)
x |
an object of class |
by |
a string being either |
nfreq |
new sampling frequency for |
... |
unused (argument of the generic). |
an object of class "sts"
.
Aggregation over units fills the upperbound slot with
NA
s and the map
slot is left as-is, but the object
cannot be plotted by unit any longer.
The populationFrac
slot is aggregated just like observed
.
Population fractions are recomputed if and only if x
is no
multinomialTS
and already contains population fractions.
This might not be intended, especially for aggregation over time.
data("ha.sts") dim(ha.sts) dim(aggregate(ha.sts, by = "unit")) dim(aggregate(ha.sts, nfreq = 13))
data("ha.sts") dim(ha.sts) dim(aggregate(ha.sts, by = "unit")) dim(aggregate(ha.sts, nfreq = 13))
Evaluation of timepoints with the Bayes subsystem 1, 2, 3 or a self defined Bayes subsystem.
algo.bayesLatestTimepoint(disProgObj, timePoint = NULL, control = list(b = 0, w = 6, actY = TRUE,alpha=0.05)) algo.bayes(disProgObj, control = list(range = range, b = 0, w = 6, actY = TRUE,alpha=0.05)) algo.bayes1(disProgObj, control = list(range = range)) algo.bayes2(disProgObj, control = list(range = range)) algo.bayes3(disProgObj, control = list(range = range))
algo.bayesLatestTimepoint(disProgObj, timePoint = NULL, control = list(b = 0, w = 6, actY = TRUE,alpha=0.05)) algo.bayes(disProgObj, control = list(range = range, b = 0, w = 6, actY = TRUE,alpha=0.05)) algo.bayes1(disProgObj, control = list(range = range)) algo.bayes2(disProgObj, control = list(range = range)) algo.bayes3(disProgObj, control = list(range = range))
disProgObj |
object of class disProg (including the observed and the state chain) |
timePoint |
time point which should be evaluated in
|
control |
control object: |
Using the reference values the quantile of the
predictive posterior distribution is calculated as a threshold.
An alarm is given if the actual value is bigger or equal than this threshold.
It is possible to show using analytical computations that the predictive
posterior in this case is the negative
binomial distribution. Note:
algo.rki
or algo.farrington
use two-sided prediction intervals – if one wants to compare with
these procedures it is necessary to use an alpha, which is half the
one used for these procedures.
Note also that algo.bayes
calls
algo.bayesLatestTimepoint
for the values specified in
range
and for the system specified in control
.
algo.bayes1
, algo.bayes2
, algo.bayes3
call
algo.bayesLatestTimepoint
for the values specified in
range
for the Bayes 1 system, Bayes 2 system or Bayes 3 system.
"Bayes 1"
reference values from 6 weeks. Alpha is fixed a
t 0.05.
"Bayes 2"
reference values from 6 weeks ago and
13 weeks of the previous year (symmetrical around the
same week as the current one in the previous year). Alpha is fixed at 0.05.
"Bayes 3"
18 reference values. 9 from the year ago
and 9 from two years ago (also symmetrical around the
comparable week). Alpha is fixed at 0.05.
The procedure is now able to handle NA
's in the reference
values. In the summation and when counting the number of observed
reference values these are simply not counted.
survRes |
|
M. Höhle, A. Riebler, C. Lang
Riebler, A. (2004), Empirischer Vergleich von statistischen Methoden zur Ausbruchserkennung bei Surveillance Daten, Bachelor's thesis.
algo.call
, algo.rkiLatestTimepoint
and algo.rki
for
the RKI system.
disProg <- sim.pointSource(p = 0.99, r = 0.5, length = 208, A = 1, alpha = 1, beta = 0, phi = 0, frequency = 1, state = NULL, K = 1.7) # Test for bayes 1 the latest timepoint algo.bayesLatestTimepoint(disProg) # Test week 200 to 208 for outbreaks with a selfdefined bayes algo.bayes(disProg, control = list(range = 200:208, b = 1, w = 5, actY = TRUE,alpha=0.05)) # The same for bayes 1 to bayes 3 algo.bayes1(disProg, control = list(range = 200:208,alpha=0.05)) algo.bayes2(disProg, control = list(range = 200:208,alpha=0.05)) algo.bayes3(disProg, control = list(range = 200:208,alpha=0.05))
disProg <- sim.pointSource(p = 0.99, r = 0.5, length = 208, A = 1, alpha = 1, beta = 0, phi = 0, frequency = 1, state = NULL, K = 1.7) # Test for bayes 1 the latest timepoint algo.bayesLatestTimepoint(disProg) # Test week 200 to 208 for outbreaks with a selfdefined bayes algo.bayes(disProg, control = list(range = 200:208, b = 1, w = 5, actY = TRUE,alpha=0.05)) # The same for bayes 1 to bayes 3 algo.bayes1(disProg, control = list(range = 200:208,alpha=0.05)) algo.bayes2(disProg, control = list(range = 200:208,alpha=0.05)) algo.bayes3(disProg, control = list(range = 200:208,alpha=0.05))
Transmission of a object of class disProg to the specified surveillance algorithm.
algo.call(disProgObj, control = list( list(funcName = "rki1", range = range), list(funcName = "rki", range = range, b = 2, w = 4, actY = TRUE), list(funcName = "rki", range = range, b = 2, w = 5, actY = TRUE)))
algo.call(disProgObj, control = list( list(funcName = "rki1", range = range), list(funcName = "rki", range = range, b = 2, w = 4, actY = TRUE), list(funcName = "rki", range = range, b = 2, w = 5, actY = TRUE)))
disProgObj |
object of class disProg, which includes the state chain and the observed |
control |
specifies which surveillance algorithm should be used with their parameters.
The parameter |
a list of survRes objects generated by the specified surveillance algorithm
algo.rki
, algo.bayes
, algo.farrington
# Create a test object disProg <- sim.pointSource(p = 0.99, r = 0.5, length = 400, A = 1, alpha = 1, beta = 0, phi = 0, frequency = 1, state = NULL, K = 1.7) # Let this object be tested from any methods in range = 200:400 range <- 200:400 survRes <- algo.call(disProg, control = list( list(funcName = "rki1", range = range), list(funcName = "rki2", range = range), list(funcName = "rki3", range = range), list(funcName = "rki", range = range, b = 3, w = 2, actY = FALSE), list(funcName = "rki", range = range, b = 2, w = 9, actY = TRUE), list(funcName = "bayes1", range = range), list(funcName = "bayes2", range = range), list(funcName = "bayes3", range = range), list(funcName = "bayes", range = range, b = 1, w = 5, actY = TRUE,alpha=0.05) )) # show selected survRes objects names(survRes) plot(survRes[["rki(6,6,0)"]]) survRes[["bayes(5,5,1)"]]
# Create a test object disProg <- sim.pointSource(p = 0.99, r = 0.5, length = 400, A = 1, alpha = 1, beta = 0, phi = 0, frequency = 1, state = NULL, K = 1.7) # Let this object be tested from any methods in range = 200:400 range <- 200:400 survRes <- algo.call(disProg, control = list( list(funcName = "rki1", range = range), list(funcName = "rki2", range = range), list(funcName = "rki3", range = range), list(funcName = "rki", range = range, b = 3, w = 2, actY = FALSE), list(funcName = "rki", range = range, b = 2, w = 9, actY = TRUE), list(funcName = "bayes1", range = range), list(funcName = "bayes2", range = range), list(funcName = "bayes3", range = range), list(funcName = "bayes", range = range, b = 1, w = 5, actY = TRUE,alpha=0.05) )) # show selected survRes objects names(survRes) plot(survRes[["rki(6,6,0)"]]) survRes[["bayes(5,5,1)"]]
Surveillance using the CDC Algorithm
algo.cdcLatestTimepoint(disProgObj, timePoint = NULL, control = list(b = 5, m = 1, alpha=0.025)) algo.cdc(disProgObj, control = list(range = range, b= 5, m=1, alpha = 0.025))
algo.cdcLatestTimepoint(disProgObj, timePoint = NULL, control = list(b = 5, m = 1, alpha=0.025)) algo.cdc(disProgObj, control = list(range = range, b= 5, m=1, alpha = 0.025))
disProgObj |
object of class disProg (including the observed and the state chain). |
timePoint |
time point which should be evaluated in |
control |
control object: |
Using the reference values for calculating an upper limit, alarm is
given if the actual value is bigger than a computed threshold.
algo.cdc
calls algo.cdcLatestTimepoint
for the values
specified in range
and for the system specified in
control
. The threshold is calculated from the predictive
distribution, i.e.
which corresponds to Equation 8-1 in Farrington and Andrews (2003).
Note that an aggregation into 4-week blocks occurs in
algo.cdcLatestTimepoint
and m
denotes number of 4-week
blocks (months) to use as reference values. This function currently
does the same for monthly data (not correct!)
algo.cdcLatestTimepoint
returns a list of class survRes
(surveillance result), which
includes the alarm value (alarm = 1, no alarm = 0) for recognizing an
outbreak, the threshold value for recognizing the alarm and
the input object of class disProg.
algo.cdc
gives a list of class survRes
which
includes the vector of alarm values for every timepoint in
range
, the vector of threshold values for every timepoint
in range
for the system specified by b
, w
,
the range and the input object of class disProg.
M. Höhle
Stroup, D., G. Williamson, J. Herndon, and J. Karon (1989). Detection of aberrations in the occurrence of notifiable diseases surveillance data. Statistics in Medicine 8, 323–329. doi:10.1002/sim.4780080312
Farrington, C. and N. Andrews (2003). Monitoring the Health of Populations, Chapter Outbreak Detection: Application to Infectious Disease Surveillance, pp. 203-231. Oxford University Press.
algo.rkiLatestTimepoint
,algo.bayesLatestTimepoint
and algo.bayes
for the Bayes system.
# Create a test object disProgObj <- sim.pointSource(p = 0.99, r = 0.5, length = 500, A = 1,alpha = 1, beta = 0, phi = 0, frequency = 1, state = NULL, K = 1.7) # Test week 200 to 208 for outbreaks with a selfdefined cdc algo.cdc(disProgObj, control = list(range = 400:500,alpha=0.025))
# Create a test object disProgObj <- sim.pointSource(p = 0.99, r = 0.5, length = 500, A = 1,alpha = 1, beta = 0, phi = 0, frequency = 1, state = NULL, K = 1.7) # Test week 200 to 208 for outbreaks with a selfdefined cdc algo.cdc(disProgObj, control = list(range = 400:500,alpha=0.025))
Comparison of specified surveillance algorithms using quality values.
algo.compare(survResList)
algo.compare(survResList)
survResList |
a list of survRes objects to compare via quality values. |
Matrix with values from algo.quality
, i.e. quality
values for every surveillance algorithm found in survResults
.
# Create a test object disProgObj <- sim.pointSource(p = 0.99, r = 0.5, length = 400, A = 1, alpha = 1, beta = 0, phi = 0, frequency = 1, state = NULL, K = 1.7) # Let this object be tested from any methods in range = 200:400 range <- 200:400 survRes <- algo.call(disProgObj, control = list( list(funcName = "rki1", range = range), list(funcName = "rki2", range = range), list(funcName = "rki3", range = range), list(funcName = "rki", range = range, b = 3, w = 2, actY = FALSE), list(funcName = "rki", range = range, b = 2, w = 9, actY = TRUE), list(funcName = "bayes1", range = range), list(funcName = "bayes2", range = range), list(funcName = "bayes3", range = range), list(funcName = "bayes", range = range, b = 1, w = 5, actY = TRUE,alpha=0.05) )) algo.compare(survRes)
# Create a test object disProgObj <- sim.pointSource(p = 0.99, r = 0.5, length = 400, A = 1, alpha = 1, beta = 0, phi = 0, frequency = 1, state = NULL, K = 1.7) # Let this object be tested from any methods in range = 200:400 range <- 200:400 survRes <- algo.call(disProgObj, control = list( list(funcName = "rki1", range = range), list(funcName = "rki2", range = range), list(funcName = "rki3", range = range), list(funcName = "rki", range = range, b = 3, w = 2, actY = FALSE), list(funcName = "rki", range = range, b = 2, w = 9, actY = TRUE), list(funcName = "bayes1", range = range), list(funcName = "bayes2", range = range), list(funcName = "bayes3", range = range), list(funcName = "bayes", range = range, b = 1, w = 5, actY = TRUE,alpha=0.05) )) algo.compare(survRes)
Approximate one-side CUSUM method for a Poisson variate based on the cumulative sum of the deviation between a reference value k and the transformed observed values. An alarm is raised if the cumulative sum equals or exceeds a prespecified decision boundary h. The function can handle time varying expectations.
algo.cusum(disProgObj, control = list(range = range, k = 1.04, h = 2.26, m = NULL, trans = "standard", alpha = NULL))
algo.cusum(disProgObj, control = list(range = range, k = 1.04, h = 2.26, m = NULL, trans = "standard", alpha = NULL))
disProgObj |
object of class disProg (including the observed and the state chain) |
control |
control object:
|
algo.cusum
gives a list of class "survRes"
which includes the
vector of alarm values for every timepoint in range
and the vector
of cumulative sums for every timepoint in range
for the system
specified by k
and h
, the range and the input object of
class "disProg"
.
The upperbound
entry shows for each time instance the number of diseased individuals
it would have taken the cusum to signal. Once the CUSUM signals no resetting is applied, i.e.
signals occurs until the CUSUM statistic again returns below the threshold.
In case control$m="glm"
was used, the returned
control$m.glm
entry contains the fitted "glm"
object.
This implementation is experimental, but will not be developed further.
M. Paul and M. Höhle
G. Rossi, L. Lampugnani and M. Marchi (1999), An approximate CUSUM procedure for surveillance of health events, Statistics in Medicine, 18, 2111–2122
D. A. Pierce and D. W. Schafer (1986), Residuals in Generalized Linear Models, Journal of the American Statistical Association, 81, 977–986
# Xi ~ Po(5), i=1,...,500 set.seed(321) stsObj <- sts(observed = rpois(500,lambda=5)) # there should be no alarms as mean doesn't change res <- cusum(stsObj, control = list(range = 100:500, trans = "anscombe")) plot(res, xaxis.labelFormat = NULL) # simulated data disProgObj <- sim.pointSource(p = 1, r = 1, length = 250, A = 0, alpha = log(5), beta = 0, phi = 10, frequency = 10, state = NULL, K = 0) plot(disProgObj) # Test weeks 200 to 250 for outbreaks surv0 <- algo.cusum(disProgObj, control = list(range = 200:250)) plot(surv0, xaxis.years = FALSE) # alternatively, using the newer "sts" interface stsObj <- disProg2sts(disProgObj) surv <- cusum(stsObj, control = list(range = 200:250)) plot(surv) stopifnot(upperbound(surv) == surv0$upperbound)
# Xi ~ Po(5), i=1,...,500 set.seed(321) stsObj <- sts(observed = rpois(500,lambda=5)) # there should be no alarms as mean doesn't change res <- cusum(stsObj, control = list(range = 100:500, trans = "anscombe")) plot(res, xaxis.labelFormat = NULL) # simulated data disProgObj <- sim.pointSource(p = 1, r = 1, length = 250, A = 0, alpha = log(5), beta = 0, phi = 10, frequency = 10, state = NULL, K = 0) plot(disProgObj) # Test weeks 200 to 250 for outbreaks surv0 <- algo.cusum(disProgObj, control = list(range = 200:250)) plot(surv0, xaxis.years = FALSE) # alternatively, using the newer "sts" interface stsObj <- disProg2sts(disProgObj) surv <- cusum(stsObj, control = list(range = 200:250)) plot(surv) stopifnot(upperbound(surv) == surv0$upperbound)
Implements the procedure of Farrington et al. (1996).
At each time point of the specified range
, a GLM is fitted to
predict the counts. This is then compared to the observed
counts. If the observation is above a specific quantile of
the prediction interval, then an alarm is raised.
# original interface for a single "disProg" time series algo.farrington(disProgObj, control=list( range=NULL, b=5, w=3, reweight=TRUE, verbose=FALSE, plot=FALSE, alpha=0.05, trend=TRUE, limit54=c(5,4), powertrans="2/3", fitFun="algo.farrington.fitGLM.fast")) # wrapper for "sts" data, possibly multivariate farrington(sts, control=list( range=NULL, b=5, w=3, reweight=TRUE, verbose=FALSE, alpha=0.05), ...)
# original interface for a single "disProg" time series algo.farrington(disProgObj, control=list( range=NULL, b=5, w=3, reweight=TRUE, verbose=FALSE, plot=FALSE, alpha=0.05, trend=TRUE, limit54=c(5,4), powertrans="2/3", fitFun="algo.farrington.fitGLM.fast")) # wrapper for "sts" data, possibly multivariate farrington(sts, control=list( range=NULL, b=5, w=3, reweight=TRUE, verbose=FALSE, alpha=0.05), ...)
disProgObj |
an object of class |
control |
list of control parameters
|
sts |
an object of class |
... |
arguments for |
The following steps are performed according to the Farrington et al. (1996) paper.
fit of the initial model and initial estimation of mean and overdispersion.
calculation of the weights omega (correction for past outbreaks)
refitting of the model
revised estimation of overdispersion
rescaled model
omission of the trend, if it is not significant
repetition of the whole procedure
calculation of the threshold value
computation of exceedance score
For algo.farrington
, a list object of class "survRes"
with elements alarm
, upperbound
, trend
,
disProgObj
, and control
.
For farrington
, the input "sts"
object with updated
alarm
, upperbound
and control
slots, and subsetted
to control$range
.
M. Höhle
A statistical algorithm for the early detection of outbreaks of infectious disease, Farrington, C.P., Andrews, N.J, Beale A.D. and Catchpole, M.A. (1996), J. R. Statist. Soc. A, 159, 547-563.
algo.farrington.fitGLM
,
algo.farrington.threshold
An improved Farrington algorithm is available as function
farringtonFlexible
.
#load "disProg" data data("salmonella.agona") #Do surveillance for the last 42 weeks n <- length(salmonella.agona$observed) control <- list(b=4,w=3,range=(n-42):n,reweight=TRUE, verbose=FALSE,alpha=0.01) res <- algo.farrington(salmonella.agona,control=control) plot(res) #Generate Poisson counts and create an "sts" object set.seed(123) x <- rpois(520,lambda=1) stsObj <- sts(observed=x, frequency=52) if (surveillance.options("allExamples")) { #Compare timing of the two possible fitters for algo.farrington range <- 312:520 system.time( sts1 <- farrington(stsObj, control=list(range=range, fitFun="algo.farrington.fitGLM.fast"), verbose=FALSE)) system.time( sts2 <- farrington(stsObj, control=list(range=range, fitFun="algo.farrington.fitGLM"), verbose=FALSE)) #Check if results are the same stopifnot(upperbound(sts1) == upperbound(sts2)) }
#load "disProg" data data("salmonella.agona") #Do surveillance for the last 42 weeks n <- length(salmonella.agona$observed) control <- list(b=4,w=3,range=(n-42):n,reweight=TRUE, verbose=FALSE,alpha=0.01) res <- algo.farrington(salmonella.agona,control=control) plot(res) #Generate Poisson counts and create an "sts" object set.seed(123) x <- rpois(520,lambda=1) stsObj <- sts(observed=x, frequency=52) if (surveillance.options("allExamples")) { #Compare timing of the two possible fitters for algo.farrington range <- 312:520 system.time( sts1 <- farrington(stsObj, control=list(range=range, fitFun="algo.farrington.fitGLM.fast"), verbose=FALSE)) system.time( sts2 <- farrington(stsObj, control=list(range=range, fitFun="algo.farrington.fitGLM"), verbose=FALSE)) #Check if results are the same stopifnot(upperbound(sts1) == upperbound(sts2)) }
Weights are assigned according to the Anscombe residuals
algo.farrington.assign.weights(s, weightsThreshold=1)
algo.farrington.assign.weights(s, weightsThreshold=1)
s |
Vector of standardized Anscombe residuals |
weightsThreshold |
A scalar indicating when observations are seen as outlier. In the original Farrington proposal the value was 1 (default value), in the improved version this value is suggested to be 2.58. |
Weights according to the residuals
The function fits a Poisson regression model (GLM) with mean predictor
as specified by the Farrington procedure. If requested, Anscombe residuals are computed based on an initial fit and a 2nd fit is made using weights, where base counts suspected to be caused by earlier outbreaks are downweighted.
algo.farrington.fitGLM(response, wtime, timeTrend = TRUE, reweight = TRUE, ...) algo.farrington.fitGLM.fast(response, wtime, timeTrend = TRUE, reweight = TRUE, ...) algo.farrington.fitGLM.populationOffset(response, wtime, population, timeTrend=TRUE,reweight=TRUE, ...)
algo.farrington.fitGLM(response, wtime, timeTrend = TRUE, reweight = TRUE, ...) algo.farrington.fitGLM.fast(response, wtime, timeTrend = TRUE, reweight = TRUE, ...) algo.farrington.fitGLM.populationOffset(response, wtime, population, timeTrend=TRUE,reweight=TRUE, ...)
response |
The vector of observed base counts |
wtime |
Vector of week numbers corresponding to |
timeTrend |
Boolean whether to fit the |
reweight |
Fit twice – 2nd time with Anscombe residuals |
population |
Population size. Possibly used as offset, i.e. in
This provides a way to adjust the Farrington procedure to the case of greatly varying populations. Note: This is an experimental implementation with methodology not covered by the original paper. |
... |
Used to catch additional arguments, currently not used. |
Compute weights from an initial fit and rescale using
Anscombe based residuals as described in the
anscombe.residuals
function.
Note that algo.farrington.fitGLM
uses the glm
routine
for fitting. A faster alternative is provided by
algo.farrington.fitGLM.fast
which uses the glm.fit
function directly (thanks to Mikko Virtanen). This saves
computational overhead and increases speed for 500 monitored time
points by a factor of approximately two. However, some of the
routine glm
functions might not work on the output of this
function. Which function is used for algo.farrington
can be
controlled by the control$fitFun
argument.
an object of class GLM with additional fields wtime
,
response
and phi
. If the glm
returns without
convergence NULL
is returned.
anscombe.residuals
,algo.farrington
Depending on the current transformation ,
is used to compute a prediction interval. The prediction variance consists of a component due to the variance of having a single observation and a prediction variance.
algo.farrington.threshold(pred,phi,alpha=0.01,skewness.transform="none",y)
algo.farrington.threshold(pred,phi,alpha=0.01,skewness.transform="none",y)
pred |
A GLM prediction object |
phi |
Current overdispersion parameter (superfluous?) |
alpha |
Quantile level in Gaussian based CI, i.e. an |
skewness.transform |
Skewness correction, i.e. one of
|
y |
Observed number |
Vector of length four with lower and upper bounds of an
confidence interval (first two
arguments) and corresponding quantile of observation
y
together with the median of the predictive distribution.
Count data regression charts for the monitoring of surveillance time series as proposed by Höhle and Paul (2008). The implementation is described in Salmon et al. (2016).
algo.glrnb(disProgObj, control = list(range=range, c.ARL=5, mu0=NULL, alpha=0, Mtilde=1, M=-1, change="intercept", theta=NULL, dir=c("inc","dec"), ret=c("cases","value"), xMax=1e4)) algo.glrpois(disProgObj, control = list(range=range, c.ARL=5, mu0=NULL, Mtilde=1, M=-1, change="intercept", theta=NULL, dir=c("inc","dec"), ret=c("cases","value"), xMax=1e4))
algo.glrnb(disProgObj, control = list(range=range, c.ARL=5, mu0=NULL, alpha=0, Mtilde=1, M=-1, change="intercept", theta=NULL, dir=c("inc","dec"), ret=c("cases","value"), xMax=1e4)) algo.glrpois(disProgObj, control = list(range=range, c.ARL=5, mu0=NULL, Mtilde=1, M=-1, change="intercept", theta=NULL, dir=c("inc","dec"), ret=c("cases","value"), xMax=1e4))
disProgObj |
object of class |
control |
A list controlling the behaviour of the algorithm
|
This function implements the seasonal count data chart based on generalized likelihood ratio (GLR) as described in the Höhle and Paul (2008) paper. A moving-window generalized likelihood ratio detector is used, i.e. the detector has the form
where instead of the GLR statistic is
computed for all
. To
achieve the typical behaviour from
use
Mtilde=1
and M=-1
.
So is the time point where the GLR statistic is above the
threshold the first time: An alarm is given and the surveillance is
reset starting from time
. Note that the same
c.ARL
as before is used, but if mu0
is different at
compared to time
the run length
properties differ. Because
c.ARL
to obtain a specific ARL can
only be obtained my Monte Carlo simulation there is no good way to
update c.ARL
automatically at the moment. Also, FIR GLR-detectors
might be worth considering.
In case is.null(theta)
and alpha>0
as well as
ret="cases"
then a brute-force search is conducted for each time
point in range in order to determine the number of cases necessary
before an alarm is sounded. In case no alarm was sounded so far by time
, the function increases
until an alarm is sounded any
time before time point
. If no alarm is sounded by
xMax
, a return value
of 1e99 is given. Similarly, if an alarm was sounded by time the
function counts down instead. Note: This is slow experimental code!
At the moment, window limited “intercept
” charts have not been
extensively tested and are at the moment not supported. As speed is
not an issue here this doesn't bother too much. Therefore, a value of
M=-1
is always used in the intercept charts.
algo.glrpois
simply calls algo.glrnb
with
control$alpha
set to 0.
algo.glrnb
returns a list of class
survRes
(surveillance result), which includes the alarm
value for recognizing an outbreak (1 for alarm, 0 for no alarm),
the threshold value for recognizing the alarm and the input object
of class disProg. The upperbound
slot of the object are
filled with the current value or with the number of
cases that are necessary to produce an alarm at any time point
. Both lead to the same alarm timepoints, but
"cases"
has an obvious interpretation.
M. Höhle with contributions by V. Wimmer
Höhle, M. and Paul, M. (2008): Count data regression charts for the monitoring of surveillance time series. Computational Statistics and Data Analysis, 52 (9), 4357-4368.
Salmon, M., Schumacher, D. and Höhle, M. (2016): Monitoring count time series in R: Aberration detection in public health surveillance. Journal of Statistical Software, 70 (10), 1-35. doi:10.18637/jss.v070.i10
##Simulate data and apply the algorithm S <- 1 ; t <- 1:120 ; m <- length(t) beta <- c(1.5,0.6,0.6) omega <- 2*pi/52 #log mu_{0,t} base <- beta[1] + beta[2] * cos(omega*t) + beta[3] * sin(omega*t) #Generate example data with changepoint and tau=tau tau <- 100 kappa <- 0.4 mu0 <- exp(base) mu1 <- exp(base + kappa) ## Poisson example #Generate data set.seed(42) x <- rpois(length(t),mu0*(exp(kappa)^(t>=tau))) s.ts <- sts(observed=x, state=(t>=tau)) #Plot the data plot(s.ts, xaxis.labelFormat=NULL) #Run cntrl = list(range=t,c.ARL=5, Mtilde=1, mu0=mu0, change="intercept",ret="value",dir="inc") glr.ts <- glrpois(s.ts,control=cntrl) plot(glr.ts, xaxis.labelFormat=NULL, dx.upperbound=0.5) lr.ts <- glrpois(s.ts,control=c(cntrl,theta=0.4)) plot(lr.ts, xaxis.labelFormat=NULL, dx.upperbound=0.5) #using the legacy interface for "disProg" data lr.ts0 <- algo.glrpois(sts2disProg(s.ts), control=c(cntrl,theta=0.4)) stopifnot(upperbound(lr.ts) == lr.ts0$upperbound) ## NegBin example #Generate data set.seed(42) alpha <- 0.2 x <- rnbinom(length(t),mu=mu0*(exp(kappa)^(t>=tau)),size=1/alpha) s.ts <- sts(observed=x, state=(t>=tau)) #Plot the data plot(s.ts, xaxis.labelFormat=NULL) #Run GLR based detection cntrl = list(range=t,c.ARL=5, Mtilde=1, mu0=mu0, alpha=alpha, change="intercept",ret="value",dir="inc") glr.ts <- glrnb(s.ts, control=cntrl) plot(glr.ts, xaxis.labelFormat=NULL, dx.upperbound=0.5) #CUSUM LR detection with backcalculated number of cases cntrl2 = list(range=t,c.ARL=5, Mtilde=1, mu0=mu0, alpha=alpha, change="intercept",ret="cases",dir="inc",theta=1.2) glr.ts2 <- glrnb(s.ts, control=cntrl2) plot(glr.ts2, xaxis.labelFormat=NULL)
##Simulate data and apply the algorithm S <- 1 ; t <- 1:120 ; m <- length(t) beta <- c(1.5,0.6,0.6) omega <- 2*pi/52 #log mu_{0,t} base <- beta[1] + beta[2] * cos(omega*t) + beta[3] * sin(omega*t) #Generate example data with changepoint and tau=tau tau <- 100 kappa <- 0.4 mu0 <- exp(base) mu1 <- exp(base + kappa) ## Poisson example #Generate data set.seed(42) x <- rpois(length(t),mu0*(exp(kappa)^(t>=tau))) s.ts <- sts(observed=x, state=(t>=tau)) #Plot the data plot(s.ts, xaxis.labelFormat=NULL) #Run cntrl = list(range=t,c.ARL=5, Mtilde=1, mu0=mu0, change="intercept",ret="value",dir="inc") glr.ts <- glrpois(s.ts,control=cntrl) plot(glr.ts, xaxis.labelFormat=NULL, dx.upperbound=0.5) lr.ts <- glrpois(s.ts,control=c(cntrl,theta=0.4)) plot(lr.ts, xaxis.labelFormat=NULL, dx.upperbound=0.5) #using the legacy interface for "disProg" data lr.ts0 <- algo.glrpois(sts2disProg(s.ts), control=c(cntrl,theta=0.4)) stopifnot(upperbound(lr.ts) == lr.ts0$upperbound) ## NegBin example #Generate data set.seed(42) alpha <- 0.2 x <- rnbinom(length(t),mu=mu0*(exp(kappa)^(t>=tau)),size=1/alpha) s.ts <- sts(observed=x, state=(t>=tau)) #Plot the data plot(s.ts, xaxis.labelFormat=NULL) #Run GLR based detection cntrl = list(range=t,c.ARL=5, Mtilde=1, mu0=mu0, alpha=alpha, change="intercept",ret="value",dir="inc") glr.ts <- glrnb(s.ts, control=cntrl) plot(glr.ts, xaxis.labelFormat=NULL, dx.upperbound=0.5) #CUSUM LR detection with backcalculated number of cases cntrl2 = list(range=t,c.ARL=5, Mtilde=1, mu0=mu0, alpha=alpha, change="intercept",ret="cases",dir="inc",theta=1.2) glr.ts2 <- glrnb(s.ts, control=cntrl2) plot(glr.ts2, xaxis.labelFormat=NULL)
This function implements on-line HMM detection of outbreaks based on
the retrospective procedure described in Le Strat and Carret (1999).
Using the function msm
(from package msm) a specified HMM
is estimated, the decoding problem, i.e. the most probable state
configuration, is found by the Viterbi algorithm and the most
probable state of the last observation is recorded. On-line
detection is performed by sequentially repeating this procedure.
Warning: This function can be very slow - a more efficient implementation would be nice!
algo.hmm(disProgObj, control = list(range=range, Mtilde=-1, noStates=2, trend=TRUE, noHarmonics=1, covEffectEqual=FALSE, saveHMMs = FALSE, extraMSMargs=list()))
algo.hmm(disProgObj, control = list(range=range, Mtilde=-1, noStates=2, trend=TRUE, noHarmonics=1, covEffectEqual=FALSE, saveHMMs = FALSE, extraMSMargs=list()))
disProgObj |
object of class disProg (including the observed and the state chain) |
control |
control object:
|
For each time point t the reference values values are extracted. If the number of requested values is larger than the number of possible values the latter is used. Now the following happens on these reference values:
A noStates
-State Hidden Markov Model (HMM) is used based on
the Poisson distribution with linear predictor on the log-link
scale. I.e.
where
and noHarmonics
and depending on the
sampling frequency of the surveillance data. In the above
is
used, because the first week is always saved as
t=1
, i.e. we
want to ensure that the first observation corresponds to cos(0) and
sin(0).
If covEffectEqual
then all covariate effects parameters are
equal for the states, i.e. for all
.
In case more complicated HMM models are to be fitted it is possible to
modify the msm
code used in this function. Using
e.g. AIC
one can select between different models (see the
msm package for further details).
Using the Viterbi algorithms the most probable state configuration
is obtained for the reference values and if the most probable
configuration for the last reference value (i.e. time t) equals
control$noOfStates
then an alarm is given.
Note: The HMM is re-fitted from scratch every time, sequential updating schemes of the HMM would increase speed considerably! A major advantage of the approach is that outbreaks in the reference values are handled automatically.
algo.hmm
gives a list of class survRes
which includes the
vector of alarm values for every timepoint in range
. No
upperbound
can be specified and is put equal to zero.
The resulting object contains a list control$hmms
, which
contains the "msm"
objects with the fitted HMMs
(if saveHMMs=TRUE
).
M. Höhle
Y. Le Strat and F. Carrat, Monitoring Epidemiologic Surveillance Data using Hidden Markov Models (1999), Statistics in Medicine, 18, 3463–3478
I.L. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-valued Time Series, (1997), Chapman & Hall, Monographs on Statistics and applied Probability 70
#Simulate outbreak data from HMM set.seed(123) counts <- sim.pointSource(p = 0.98, r = 0.8, length = 3*52, A = 1, alpha = 1, beta = 0, phi = 0, frequency = 1, state = NULL, K = 1.5) ## Not run: #Do surveillance using a two state HMM without trend component and #the effect of the harmonics being the same in both states. A sliding #window of two years is used to fit the HMM surv <- algo.hmm(counts, control=list(range=(2*52):length(counts$observed), Mtilde=2*52,noStates=2,trend=FALSE, covEffectsEqual=TRUE,extraMSMargs=list())) plot(surv,legend.opts=list(x="topright")) ## End(Not run) if (require("msm")) { #Retrospective use of the function, i.e. monitor only the last time point #but use option saveHMMs to store the output of the HMM fitting surv <- algo.hmm(counts,control=list(range=length(counts$observed),Mtilde=-1,noStates=2, trend=FALSE,covEffectsEqual=TRUE, saveHMMs=TRUE)) #Compute most probable state using the viterbi algorithm - 1 is "normal", 2 is "outbreak". viterbi.msm(surv$control$hmms[[1]])$fitted #How often correct? tab <- cbind(truth=counts$state + 1 , hmm=viterbi.msm(surv$control$hmm[[1]])$fitted) table(tab[,1],tab[,2]) }
#Simulate outbreak data from HMM set.seed(123) counts <- sim.pointSource(p = 0.98, r = 0.8, length = 3*52, A = 1, alpha = 1, beta = 0, phi = 0, frequency = 1, state = NULL, K = 1.5) ## Not run: #Do surveillance using a two state HMM without trend component and #the effect of the harmonics being the same in both states. A sliding #window of two years is used to fit the HMM surv <- algo.hmm(counts, control=list(range=(2*52):length(counts$observed), Mtilde=2*52,noStates=2,trend=FALSE, covEffectsEqual=TRUE,extraMSMargs=list())) plot(surv,legend.opts=list(x="topright")) ## End(Not run) if (require("msm")) { #Retrospective use of the function, i.e. monitor only the last time point #but use option saveHMMs to store the output of the HMM fitting surv <- algo.hmm(counts,control=list(range=length(counts$observed),Mtilde=-1,noStates=2, trend=FALSE,covEffectsEqual=TRUE, saveHMMs=TRUE)) #Compute most probable state using the viterbi algorithm - 1 is "normal", 2 is "outbreak". viterbi.msm(surv$control$hmms[[1]])$fitted #How often correct? tab <- cbind(truth=counts$state + 1 , hmm=viterbi.msm(surv$control$hmm[[1]])$fitted) table(tab[,1],tab[,2]) }
Frisen and Andersson (2009) method for semiparametric surveillance of outbreaks
algo.outbreakP(disProgObj, control = list(range = range, k=100, ret=c("cases","value"),maxUpperboundCases=1e5))
algo.outbreakP(disProgObj, control = list(range = range, k=100, ret=c("cases","value"),maxUpperboundCases=1e5))
disProgObj |
object of class disProg (including the observed and the state chain). |
control |
A list controlling the behaviour of the algorithm
|
A generalized likelihood ratio test based on the Poisson distribution is implemented where the means of the in-control and out-of-control states are computed by isotonic regression.
where is the estimated mean obtained by
uni-modal regression under the assumption of one change-point and
is the estimated result when there is no
change-point (i.e. this is just the mean of all observations). Note
that the contrasted hypothesis assume all means are equal until the
change-point, i.e. this detection method is especially suited for
detecting a shift from a relative constant mean. Hence, this is less
suited for detection in diseases with strong seasonal endemic
component. Onset of influenza detection is an example where this
method works particular well.
In case control$ret == "cases"
then a brute force numerical
search for the number needed before alarm (NNBA) is performed. That
is, given the past observations, what's the minimum number which would
have caused an alarm? Note: Computing this might take a while because
the search is done by sequentially increasing/decreasing the last
observation by one for each time point in control$range
and
then calling the workhorse function of the algorithm again. The argument
control$maxUpperboundCases
controls the upper limit of this
search (default is 1e5).
Currently, even though the statistic has passed the threshold, the NNBA
is still computed. After a few time instances what typically happens is
that no matter the observed value we would have an alarm at this time point. In this case the value of NNBA is set to NA
. Furthermore, the first time
point is always NA
, unless k<1
.
algo.outbreakP
gives a list of class survRes
which
includes the vector of alarm values for every time-point in
range
, the vector of threshold values for every time-point
in range
.
M. Höhle – based on Java code by M. Frisen and L. Schiöler
The code is an extended R port of the Java code by Marianne
Frisén and Linus Schiöler from the
Computer Assisted Search For Epidemics (CASE) project,
formerly available from https://case.folkhalsomyndigheten.se/
under the GNU GPL License v3.
An additional feature of the R code is that it contains a search for NNBA (see details).
Frisén, M., Andersson and Schiöler, L., (2009), Robust outbreak surveillance of epidemics in Sweden, Statistics in Medicine, 28(3):476-493.
Frisén, M. and Andersson, E., (2009) Semiparametric Surveillance of Monotonic Changes, Sequential Analysis 28(4):434-454.
#Use data from outbreakP manual (http://www.hgu.gu.se/item.aspx?id=16857) y <- matrix(c(1,0,3,1,2,3,5,4,7,3,5,8,16,23,33,34,48),ncol=1) #Generate sts object with these observations mysts <- sts(y, alarm=y*0) #Run the algorithm and present results #Only the value of outbreakP statistic upperbound(outbreakP(mysts, control=list(range=1:length(y),k=100, ret="value"))) #Graphical illustration with number-needed-before-alarm (NNBA) upperbound. res <- outbreakP(mysts, control=list(range=1:length(y),k=100, ret="cases")) plot(res,dx.upperbound=0,lwd=c(1,1,3),legend.opts=list(legend=c("Infected", "NNBA","Outbreak","Alarm"),horiz=TRUE))
#Use data from outbreakP manual (http://www.hgu.gu.se/item.aspx?id=16857) y <- matrix(c(1,0,3,1,2,3,5,4,7,3,5,8,16,23,33,34,48),ncol=1) #Generate sts object with these observations mysts <- sts(y, alarm=y*0) #Run the algorithm and present results #Only the value of outbreakP statistic upperbound(outbreakP(mysts, control=list(range=1:length(y),k=100, ret="value"))) #Graphical illustration with number-needed-before-alarm (NNBA) upperbound. res <- outbreakP(mysts, control=list(range=1:length(y),k=100, ret="cases")) plot(res,dx.upperbound=0,lwd=c(1,1,3),legend.opts=list(legend=c("Infected", "NNBA","Outbreak","Alarm"),horiz=TRUE))
Computation of the quality values for a surveillance system output.
algo.quality(sts, penalty = 20)
algo.quality(sts, penalty = 20)
sts |
object of class |
penalty |
the maximal penalty for the lag |
The lag is defined as follows:
In the state chain just the beginnings of an outbreak chain (outbreaks directly
following each other) are considered. In the alarm chain, the range from the beginning
of an outbreak until min(next outbreak beginning, penalty)
timepoints is considered. The penalty
timepoints were
chosen, to provide an upper bound on the penalty for not discovering
an outbreak. Now the difference between the first alarm by the system
and the defined beginning is denoted “the lag”.
Additionally outbreaks found by the system are not
punished. At the end, the mean of the lags for every outbreak chain is returned
as summary lag.
an object of class "algoQV"
, which is
a list of quality values:
TP |
Number of correct found outbreaks. |
FP |
Number of false found outbreaks. |
TN |
Number of correct found non outbreaks. |
FN |
Number of false found non outbreaks. |
sens |
True positive rate, meaning TP/(FN + TP). |
spec |
True negative rate, meaning TN/(TN + FP). |
dist |
Euclidean distance between (1-spec, sens) to (0,1). |
lag |
Lag of the outbreak recognizing by the system. |
# Create a test object disProgObj <- sim.pointSource(p = 0.99, r = 0.5, length = 200, A = 1, alpha = 1, beta = 0, phi = 0, frequency = 1, state = NULL, K = 1.7) # Let this object be tested from rki1 survResObj <- algo.rki1(disProgObj, control = list(range = 50:200)) # Compute the list of quality values quality <- algo.quality(survResObj) quality # the list is printed in matrix form # Format as an "xtable", which is printed with LaTeX markup (by default) library("xtable") xtable(quality)
# Create a test object disProgObj <- sim.pointSource(p = 0.99, r = 0.5, length = 200, A = 1, alpha = 1, beta = 0, phi = 0, frequency = 1, state = NULL, K = 1.7) # Let this object be tested from rki1 survResObj <- algo.rki1(disProgObj, control = list(range = 50:200)) # Compute the list of quality values quality <- algo.quality(survResObj) quality # the list is printed in matrix form # Format as an "xtable", which is printed with LaTeX markup (by default) library("xtable") xtable(quality)
Evaluation of timepoints with the detection algorithms used by the RKI
algo.rkiLatestTimepoint(disProgObj, timePoint = NULL, control = list(b = 2, w = 4, actY = FALSE)) algo.rki(disProgObj, control = list(range = range, b = 2, w = 4, actY = FALSE)) algo.rki1(disProgObj, control = list(range = range)) algo.rki2(disProgObj, control = list(range = range)) algo.rki3(disProgObj, control = list(range = range))
algo.rkiLatestTimepoint(disProgObj, timePoint = NULL, control = list(b = 2, w = 4, actY = FALSE)) algo.rki(disProgObj, control = list(range = range, b = 2, w = 4, actY = FALSE)) algo.rki1(disProgObj, control = list(range = range)) algo.rki2(disProgObj, control = list(range = range)) algo.rki3(disProgObj, control = list(range = range))
disProgObj |
object of class disProg (including the observed and the state chain). |
timePoint |
time point which should be evaluated in |
control |
control object: |
Using the reference values for calculating an upper limit (threshold),
alarm is given if the actual value is bigger than a computed threshold.
algo.rki
calls algo.rkiLatestTimepoint
for the values specified
in range
and for the system specified in control
.
algo.rki1
calls algo.rkiLatestTimepoint
for the values specified
in range
for the RKI 1 system.
algo.rki2
calls algo.rkiLatestTimepoint
for the values specified
in range
for the RKI 2 system.
algo.rki3
calls algo.rkiLatestTimepoint
for the values specified
in range
for the RKI 3 system.
"RKI 1"
reference values from 6 weeks ago
"RKI 2"
reference values from 6 weeks ago and
13 weeks of the year ago (symmetrical around the
comparable week).
"RKI 3"
18 reference values. 9 from the year ago
and 9 from two years ago (also symmetrical around the
comparable week).
algo.rkiLatestTimepoint
returns a list of class survRes
(surveillance result), which
includes the alarm value (alarm = 1, no alarm = 0) for recognizing an
outbreak, the threshold value for recognizing the alarm and
the input object of class disProg.
algo.rki
gives a list of class survRes
which includes the vector
of alarm values for every timepoint in range
, the vector of threshold values
for every timepoint in range
for the system specified by b
, w
and
actY
, the range and the input object of class disProg.
algo.rki1
returns the same for the RKI 1 system, algo.rki2
for the RKI 2 system and algo.rki3
for the RKI 3 system.
M. Höhle, A. Riebler, Christian Lang
algo.bayesLatestTimepoint
and algo.bayes
for
the Bayes system.
# Create a test object disProgObj <- sim.pointSource(p = 0.99, r = 0.5, length = 208, A = 1, alpha = 1, beta = 0, phi = 0, frequency = 1, state = NULL, K = 1.7) # Test week 200 to 208 for outbreaks with a selfdefined rki algo.rki(disProgObj, control = list(range = 200:208, b = 1, w = 5, actY = TRUE)) # The same for rki 1 to rki 3 algo.rki1(disProgObj, control = list(range = 200:208)) algo.rki2(disProgObj, control = list(range = 200:208)) algo.rki3(disProgObj, control = list(range = 200:208)) # Test for rki 1 the latest timepoint algo.rkiLatestTimepoint(disProgObj)
# Create a test object disProgObj <- sim.pointSource(p = 0.99, r = 0.5, length = 208, A = 1, alpha = 1, beta = 0, phi = 0, frequency = 1, state = NULL, K = 1.7) # Test week 200 to 208 for outbreaks with a selfdefined rki algo.rki(disProgObj, control = list(range = 200:208, b = 1, w = 5, actY = TRUE)) # The same for rki 1 to rki 3 algo.rki1(disProgObj, control = list(range = 200:208)) algo.rki2(disProgObj, control = list(range = 200:208)) algo.rki3(disProgObj, control = list(range = 200:208)) # Test for rki 1 the latest timepoint algo.rkiLatestTimepoint(disProgObj)
Modified Poisson CUSUM method that allows for a time-varying in-control parameter
as proposed by Rogerson and Yamada (2004). The
same approach can be applied to binomial data if
distribution="binomial"
is specified.
algo.rogerson(disProgObj, control = list(range = range, theta0t = NULL, ARL0 = NULL, s = NULL, hValues = NULL, distribution = c("poisson","binomial"), nt = NULL, FIR=FALSE, limit = NULL, digits = 1))
algo.rogerson(disProgObj, control = list(range = range, theta0t = NULL, ARL0 = NULL, s = NULL, hValues = NULL, distribution = c("poisson","binomial"), nt = NULL, FIR=FALSE, limit = NULL, digits = 1))
disProgObj |
object of class |
control |
list with elements
|
The CUSUM for a sequence of Poisson or binomial
variates is computed as
where and
;
and
are time-varying
reference values and decision intervals.
An alarm is given at time
if
.
If FIR=TRUE
, the CUSUM starts
with a head start value at time
.
After an alarm is given, the FIR CUSUM starts again at this head start value.
The procedure after the CUSUM gives an alarm can be determined by limit
.
Suppose that the CUSUM signals at time , i.e.
.
For numeric values of
limit
, the CUSUM is bounded
above after an alarm is given,
i.e. is set to
.
Using
limit
=0 corresponds to
resetting to zero after an alarm as proposed in the original
formulation of the CUSUM. If
FIR=TRUE
,
is reset to
(i.e.
limit
= ).
If
limit=NULL
, no resetting occurs after an alarm is given.
Returns an object of class survRes
with elements
alarm |
indicates whether the CUSUM signaled at time |
upperbound |
CUSUM values |
disProgObj |
|
control |
list with the alarm threshold |
algo.rogerson
is a univariate CUSUM method. If the data are
available in several regions (i.e. observed
is a matrix),
multiple univariate CUSUMs are applied to each region.
Rogerson, P. A. and Yamada, I. Approaches to Syndromic Surveillance When Data Consist of Small Regional Counts. Morbidity and Mortality Weekly Report, 2004, 53/Supplement, 79-85
# simulate data (seasonal Poisson) set.seed(123) t <- 1:300 lambda <- exp(-0.5 + 0.4 * sin(2*pi*t/52) + 0.6 * cos(2*pi*t/52)) data <- sts(observed = rpois(length(lambda), lambda)) # determine a matrix with h values hVals <- hValues(theta0 = 10:150/100, ARL0=500, s = 1, distr = "poisson") # convert to legacy "disProg" class and apply modified Poisson CUSUM disProgObj <- sts2disProg(data) res <- algo.rogerson(disProgObj, control=c(hVals, list(theta0t=lambda, range=1:300))) plot(res, xaxis.years = FALSE)
# simulate data (seasonal Poisson) set.seed(123) t <- 1:300 lambda <- exp(-0.5 + 0.4 * sin(2*pi*t/52) + 0.6 * cos(2*pi*t/52)) data <- sts(observed = rpois(length(lambda), lambda)) # determine a matrix with h values hVals <- hValues(theta0 = 10:150/100, ARL0=500, s = 1, distr = "poisson") # convert to legacy "disProg" class and apply modified Poisson CUSUM disProgObj <- sts2disProg(data) res <- algo.rogerson(disProgObj, control=c(hVals, list(theta0t=lambda, range=1:300))) plot(res, xaxis.years = FALSE)
Summary table generation for several disease chains.
algo.summary(compMatrices)
algo.summary(compMatrices)
compMatrices |
list of matrices constructed by algo.compare. |
As lag the mean of all single lags is returned. TP values, FN values,
TN values and FP values are summed up. dist
, sens
and
spec
are new computed on the basis of the new TP value, FN value,
TN value and FP value.
a matrix summing up the singular input matrices
# Create a test object disProgObj1 <- sim.pointSource(p = 0.99, r = 0.5, length = 400, A = 1, alpha = 1, beta = 0, phi = 0, frequency = 1, state = NULL, K = 1.7) disProgObj2 <- sim.pointSource(p = 0.99, r = 0.5, length = 400, A = 1, alpha = 1, beta = 0, phi = 0, frequency = 1, state = NULL, K = 5) disProgObj3 <- sim.pointSource(p = 0.99, r = 0.5, length = 400, A = 1, alpha = 1, beta = 0, phi = 0, frequency = 1, state = NULL, K = 17) # Let this object be tested from any methods in range = 200:400 range <- 200:400 control <- list(list(funcName = "rki1", range = range), list(funcName = "rki2", range = range), list(funcName = "rki3", range = range)) compMatrix1 <- algo.compare(algo.call(disProgObj1, control=control)) compMatrix2 <- algo.compare(algo.call(disProgObj2, control=control)) compMatrix3 <- algo.compare(algo.call(disProgObj3, control=control)) algo.summary( list(a=compMatrix1, b=compMatrix2, c=compMatrix3) )
# Create a test object disProgObj1 <- sim.pointSource(p = 0.99, r = 0.5, length = 400, A = 1, alpha = 1, beta = 0, phi = 0, frequency = 1, state = NULL, K = 1.7) disProgObj2 <- sim.pointSource(p = 0.99, r = 0.5, length = 400, A = 1, alpha = 1, beta = 0, phi = 0, frequency = 1, state = NULL, K = 5) disProgObj3 <- sim.pointSource(p = 0.99, r = 0.5, length = 400, A = 1, alpha = 1, beta = 0, phi = 0, frequency = 1, state = NULL, K = 17) # Let this object be tested from any methods in range = 200:400 range <- 200:400 control <- list(list(funcName = "rki1", range = range), list(funcName = "rki2", range = range), list(funcName = "rki3", range = range)) compMatrix1 <- algo.compare(algo.call(disProgObj1, control=control)) compMatrix2 <- algo.compare(algo.call(disProgObj2, control=control)) compMatrix3 <- algo.compare(algo.call(disProgObj3, control=control)) algo.summary( list(a=compMatrix1, b=compMatrix2, c=compMatrix3) )
Two model fits are compared using standard all.equal
-methods
after discarding certain elements considered irrelevant for the equality
of the fits, e.g., the runtime and the call.
## S3 method for class 'twinstim' all.equal(target, current, ..., ignore = NULL) ## S3 method for class 'hhh4' all.equal(target, current, ..., ignore = NULL)
## S3 method for class 'twinstim' all.equal(target, current, ..., ignore = NULL) ## S3 method for class 'hhh4' all.equal(target, current, ..., ignore = NULL)
target , current
|
the model fits to be compared. |
... |
further arguments for standard
|
ignore |
an optional character vector of elements to ignore when
comparing the two fitted objects. The following elements are always
ignored: |
Either TRUE
or a character vector describing differences between
the target
and the current
model fit.
Sebastian Meyer
Generic function for animation of R objects.
animate(object, ...)
animate(object, ...)
object |
The object to animate. |
... |
Arguments to be passed to methods, such as graphical parameters or time interval options for the snapshots. |
The methods animate.epidata
, animate.epidataCS
,
and animate.sts
for the animation of surveillance data.
Compute Anscombe residuals from a fitted glm
,
which makes them approximately standard normal distributed.
anscombe.residuals(m, phi)
anscombe.residuals(m, phi)
m |
a fitted |
phi |
the current estimated overdispersion |
The standardized Anscombe residuals of m
McCullagh & Nelder, Generalized Linear Models, 1989
Calculates the average run length (ARL) for an upward CUSUM scheme for discrete distributions (i.e. Poisson and binomial) using the Markov chain approach.
arlCusum(h=10, k=3, theta=2.4, distr=c("poisson","binomial"), W=NULL, digits=1, ...)
arlCusum(h=10, k=3, theta=2.4, distr=c("poisson","binomial"), W=NULL, digits=1, ...)
h |
decision interval |
k |
reference value |
theta |
distribution parameter for the cumulative distribution function
(cdf) |
distr |
|
W |
Winsorizing value |
digits |
|
... |
further arguments for the distribution function, i.e. number of trials |
Returns a list with the ARL of the regular (zero-start)
and the fast initial response (FIR)
CUSUM scheme with reference value k
, decision interval h
for
, where F is the Poisson or binomial CDF.
ARL |
one-sided ARL of the regular (zero-start) CUSUM scheme |
FIR.ARL |
one-sided ARL of the FIR CUSUM scheme with head start
|
Based on the FORTRAN code of
Hawkins, D. M. (1992). Evaluation of Average Run Lengths of Cumulative Sum Charts for an Arbitrary Data Distribution. Communications in Statistics - Simulation and Computation, 21(4), p. 1001-1020.
The function is an implementation of the non-parametric
back-projection of incidence cases to exposure cases described in
Becker et al. (1991). The method back-projects exposure times
from a univariate time series containing the number of symptom onsets per time
unit. Here, the delay between exposure and symptom onset for an
individual is seen as a realization of a random variable governed by a
known probability mass function.
The back-projection function calculates the expected number of exposures
for each time unit under the assumption of a
Poisson distribution, but without any parametric assumption on how the
evolve in time.
Furthermore, the function contains a bootstrap based procedure, as
given in Yip et al (2011), which allows an indication of uncertainty
in the estimated . The procedure is
equivalent to the suggestion in Becker and Marschner (1993). However,
the present implementation in
backprojNP
allows only a
univariate time series, i.e. simultaneous age groups as in Becker and
Marschner (1993) are not possible.
The method in Becker et al. (1991) was originally developed for the back-projection of AIDS incidence, but it is equally useful for analysing the epidemic curve in outbreak situations of a disease with long incubation time, e.g. in order to qualitatively investigate the effect of intervention measures.
backprojNP(sts, incu.pmf, control = list(k = 2, eps = rep(0.005,2), iter.max=rep(250,2), Tmark = nrow(sts), B = -1, alpha = 0.05, verbose = FALSE, lambda0 = NULL, eq3a.method = c("R","C"), hookFun = function(stsbp) {}), ...)
backprojNP(sts, incu.pmf, control = list(k = 2, eps = rep(0.005,2), iter.max=rep(250,2), Tmark = nrow(sts), B = -1, alpha = 0.05, verbose = FALSE, lambda0 = NULL, eq3a.method = c("R","C"), hookFun = function(stsbp) {}), ...)
sts |
an object of class |
incu.pmf |
Probability mass function (PMF) of the incubation
time. The PMF is specified as a vector or matrix with the value of
the PMF evaluated at |
control |
A list with named arguments controlling the functionality of the non-parametric back-projection.
|
... |
Additional arguments are sent to the hook function. |
Becker et al. (1991) specify a non-parametric back-projection algorithm based on the Expectation-Maximization-Smoothing (EMS) algorithm.
In the present implementation the algorithm iterates until
This is a slight adaptation of the proposals in Becker et
al. (1991). If is the length of
then one can
avoid instability of the algorithm near the end by considering only
the
's with index
.
See the references for further information.
backprojNP
returns an object of "stsBP"
.
The method is still experimental. A proper plot routine for
stsBP
objects is currently missing.
Michael Höhle with help by
Daniel Sabanés Bové
and Sebastian Meyer for eq3a.method = "C"
Becker NG, Watson LF and Carlin JB (1991), A method for non-parametric back-projection and its application to AIDS data, Statistics in Medicine, 10:1527-1542.
Becker NG and Marschner IC (1993), A method for estimating the age-specific relative risk of HIV infection from AIDS incidence data, Biometrika, 80(1):165-178.
Yip PSF, Lam KF, Xu Y, Chau PH, Xu J, Chang W, Peng Y, Liu Z, Xie X and Lau HY (2011), Reconstruction of the Infection Curve for SARS Epidemic in Beijing, China Using a Back-Projection Method, Communications in Statistics - Simulation and Computation, 37(2):425-433.
Associations of Age and Sex on Clinical Outcome and Incubation Period of Shiga toxin-producing Escherichia coli O104:H4 Infections, 2011 (2013), Werber D, King LA, Müller L, Follin P, Buchholz U, Bernard H, Rosner BM, Ethelberg S, de Valk H, Höhle M, American Journal of Epidemiology, 178(6):984-992.
#Generate an artificial outbreak of size n starting at time t0 and being of length n <- 1e3 ; t0 <- 23 ; l <- 10 #PMF of the incubation time is an interval censored gamma distribution #with mean 15 truncated at 25. dmax <- 25 inc.pmf <- c(0,(pgamma(1:dmax,15,1.4) - pgamma(0:(dmax-1),15,1.4))/pgamma(dmax,15,1.4)) #Function to sample from the incubation time rincu <- function(n) { sample(0:dmax, size=n, replace=TRUE, prob=inc.pmf) } #Sample time of exposure and length of incubation time set.seed(123) exposureTimes <- t0 + sample(x=0:(l-1),size=n,replace=TRUE) symptomTimes <- exposureTimes + rincu(n) #Time series of exposure (truth) and symptom onset (observed) X <- table( factor(exposureTimes,levels=1:(max(symptomTimes)+dmax))) Y <- table( factor(symptomTimes,levels=1:(max(symptomTimes)+dmax))) #Convert Y to an sts object Ysts <- sts(Y) #Plot the outbreak plot(Ysts, xaxis.labelFormat=NULL, legend=NULL) #Add true number of exposures to the plot lines(1:length(Y)+0.2,X,col="red",type="h",lty=2) #Helper function to show the EM step plotIt <- function(cur.sts) { plot(cur.sts,xaxis.labelFormat=NULL, legend.opts=NULL,ylim=c(0,140)) } #Call non-parametric back-projection function with hook function but #without bootstrapped confidence intervals bpnp.control <- list(k=0,eps=rep(0.005,2),iter.max=rep(250,2),B=-1,hookFun=plotIt,verbose=TRUE) #Fast C version (use argument: eq3a.method="C")! sts.bp <- backprojNP(Ysts, incu.pmf=inc.pmf, control=modifyList(bpnp.control,list(eq3a.method="C")), ylim=c(0,max(X,Y))) #Show result plot(sts.bp,xaxis.labelFormat=NULL,legend=NULL,lwd=c(1,1,2),lty=c(1,1,1),main="") lines(1:length(Y)+0.2,X,col="red",type="h",lty=2) #Do the convolution for the expectation mu <- matrix(0,ncol=ncol(sts.bp),nrow=nrow(sts.bp)) #Loop over all series for (j in 1:ncol(sts.bp)) { #Loop over all time points for (t in 1:nrow(sts.bp)) { #Convolution, note support of inc.pmf starts at zero (move idx by 1) i <- seq_len(t) mu[t,j] <- sum(inc.pmf[t-i+1] * upperbound(sts.bp)[i,j],na.rm=TRUE) } } #Show the fit lines(1:nrow(sts.bp)-0.5,mu[,1],col="green",type="s",lwd=3) #Non-parametric back-projection including bootstrap CIs bpnp.control2 <- modifyList(bpnp.control, list(hookFun=NULL, k=2, B=10, # in practice, use B >= 1000 ! eq3a.method="C")) sts.bp2 <- backprojNP(Ysts, incu.pmf=inc.pmf, control=bpnp.control2) ###################################################################### # Plot the result. This is currently a manual routine. # ToDo: Need to specify a plot method for stsBP objects which also # shows the CI. # # Parameters: # stsBP - object of class stsBP which is to be plotted. ###################################################################### plot.stsBP <- function(stsBP) { maxy <- max(observed(stsBP),upperbound(stsBP),stsBP@ci,na.rm=TRUE) plot(upperbound(stsBP),type="n",ylim=c(0,maxy), ylab="Cases",xlab="time") if (!all(is.na(stsBP@ci))) { polygon( c(1:nrow(stsBP),rev(1:nrow(stsBP))), c(stsBP@ci[2,,1],rev(stsBP@ci[1,,1])),col="lightgray") } lines(upperbound(stsBP),type="l",lwd=2) legend(x="topright",c(expression(lambda[t])),lty=c(1),col=c(1),fill=c(NA),border=c(NA),lwd=c(2)) invisible() } #Plot the result of k=0 and add truth for comparison. No CIs available plot.stsBP(sts.bp) lines(1:length(Y),X,col=2,type="h") #Same for k=2 plot.stsBP(sts.bp2) lines(1:length(Y),X,col=2,type="h")
#Generate an artificial outbreak of size n starting at time t0 and being of length n <- 1e3 ; t0 <- 23 ; l <- 10 #PMF of the incubation time is an interval censored gamma distribution #with mean 15 truncated at 25. dmax <- 25 inc.pmf <- c(0,(pgamma(1:dmax,15,1.4) - pgamma(0:(dmax-1),15,1.4))/pgamma(dmax,15,1.4)) #Function to sample from the incubation time rincu <- function(n) { sample(0:dmax, size=n, replace=TRUE, prob=inc.pmf) } #Sample time of exposure and length of incubation time set.seed(123) exposureTimes <- t0 + sample(x=0:(l-1),size=n,replace=TRUE) symptomTimes <- exposureTimes + rincu(n) #Time series of exposure (truth) and symptom onset (observed) X <- table( factor(exposureTimes,levels=1:(max(symptomTimes)+dmax))) Y <- table( factor(symptomTimes,levels=1:(max(symptomTimes)+dmax))) #Convert Y to an sts object Ysts <- sts(Y) #Plot the outbreak plot(Ysts, xaxis.labelFormat=NULL, legend=NULL) #Add true number of exposures to the plot lines(1:length(Y)+0.2,X,col="red",type="h",lty=2) #Helper function to show the EM step plotIt <- function(cur.sts) { plot(cur.sts,xaxis.labelFormat=NULL, legend.opts=NULL,ylim=c(0,140)) } #Call non-parametric back-projection function with hook function but #without bootstrapped confidence intervals bpnp.control <- list(k=0,eps=rep(0.005,2),iter.max=rep(250,2),B=-1,hookFun=plotIt,verbose=TRUE) #Fast C version (use argument: eq3a.method="C")! sts.bp <- backprojNP(Ysts, incu.pmf=inc.pmf, control=modifyList(bpnp.control,list(eq3a.method="C")), ylim=c(0,max(X,Y))) #Show result plot(sts.bp,xaxis.labelFormat=NULL,legend=NULL,lwd=c(1,1,2),lty=c(1,1,1),main="") lines(1:length(Y)+0.2,X,col="red",type="h",lty=2) #Do the convolution for the expectation mu <- matrix(0,ncol=ncol(sts.bp),nrow=nrow(sts.bp)) #Loop over all series for (j in 1:ncol(sts.bp)) { #Loop over all time points for (t in 1:nrow(sts.bp)) { #Convolution, note support of inc.pmf starts at zero (move idx by 1) i <- seq_len(t) mu[t,j] <- sum(inc.pmf[t-i+1] * upperbound(sts.bp)[i,j],na.rm=TRUE) } } #Show the fit lines(1:nrow(sts.bp)-0.5,mu[,1],col="green",type="s",lwd=3) #Non-parametric back-projection including bootstrap CIs bpnp.control2 <- modifyList(bpnp.control, list(hookFun=NULL, k=2, B=10, # in practice, use B >= 1000 ! eq3a.method="C")) sts.bp2 <- backprojNP(Ysts, incu.pmf=inc.pmf, control=bpnp.control2) ###################################################################### # Plot the result. This is currently a manual routine. # ToDo: Need to specify a plot method for stsBP objects which also # shows the CI. # # Parameters: # stsBP - object of class stsBP which is to be plotted. ###################################################################### plot.stsBP <- function(stsBP) { maxy <- max(observed(stsBP),upperbound(stsBP),stsBP@ci,na.rm=TRUE) plot(upperbound(stsBP),type="n",ylim=c(0,maxy), ylab="Cases",xlab="time") if (!all(is.na(stsBP@ci))) { polygon( c(1:nrow(stsBP),rev(1:nrow(stsBP))), c(stsBP@ci[2,,1],rev(stsBP@ci[1,,1])),col="lightgray") } lines(upperbound(stsBP),type="l",lwd=2) legend(x="topright",c(expression(lambda[t])),lty=c(1),col=c(1),fill=c(NA),border=c(NA),lwd=c(2)) invisible() } #Plot the result of k=0 and add truth for comparison. No CIs available plot.stsBP(sts.bp) lines(1:length(Y),X,col=2,type="h") #Same for k=2 plot.stsBP(sts.bp2) lines(1:length(Y),X,col=2,type="h")
Given a prime number factorization x
, bestCombination
partitions x
into two groups, such that the product of the numbers
in group one is as similar as possible to the product
of the numbers of group two. This is useful in magic.dim
.
bestCombination(x)
bestCombination(x)
x |
prime number factorization |
a vector c(prod(set1),prod(set2))
The function takes range
values of a univariate surveillance time
series sts
and for each time point uses a negative binomial
regression model to compute the predictive posterior distribution for
the current observation. The
quantile of this predictive distribution is then
used as bound: If the actual observation is above the bound an alarm
is raised.
The Bayesian Outbreak Detection Algorithm (
boda
) is due to
Manitz and Höhle (2013) and its implementation is
illustrated in Salmon et al. (2016).
However, boda
should be considered as an experiment, see the
Warning section below!
boda(sts, control = list( range=NULL, X=NULL, trend=FALSE, season=FALSE, prior=c('iid','rw1','rw2'), alpha=0.05, mc.munu=100, mc.y=10, verbose=FALSE, samplingMethod=c('joint','marginals'), quantileMethod=c("MC","MM") ))
boda(sts, control = list( range=NULL, X=NULL, trend=FALSE, season=FALSE, prior=c('iid','rw1','rw2'), alpha=0.05, mc.munu=100, mc.y=10, verbose=FALSE, samplingMethod=c('joint','marginals'), quantileMethod=c("MC","MM") ))
sts |
object of class sts (including the |
control |
Control object given as a
|
This function is currently experimental!! It also heavily depends on the INLA package so changes there might affect the operational ability of this function. Since the computations for the Bayesian GAM are quite involved do not expect this function to be particularly fast.
Results are not reproducible if INLA uses parallelization (as by default);
set INLA::inla.setOption(num.threads = "1:1")
to avoid that,
then do set.seed
as usual.
This function requires the R package INLA, which is currently
not available from CRAN. It can be obtained from INLA's own
repository via
install.packages("INLA", repos="https://inla.r-inla-download.org/R/stable")
.
J. Manitz, M. Höhle, M. Salmon
Manitz, J. and Höhle, M. (2013): Bayesian outbreak detection algorithm for monitoring reported cases of campylobacteriosis in Germany. Biometrical Journal, 55(4), 509-526.
Salmon, M., Schumacher, D. and Höhle, M. (2016): Monitoring count time series in R: Aberration detection in public health surveillance. Journal of Statistical Software, 70 (10), 1-35. doi:10.18637/jss.v070.i10
## Not run: ## running this example takes a couple of minutes #Load the campylobacteriosis data for Germany data("campyDE") #Make an sts object from the data.frame cam.sts <- sts(epoch=campyDE$date, observed=campyDE$case, state=campyDE$state) #Define monitoring period # range <- which(epoch(cam.sts)>=as.Date("2007-01-01")) # range <- which(epoch(cam.sts)>=as.Date("2011-12-10")) range <- tail(1:nrow(cam.sts),n=2) control <- list(range=range, X=NULL, trend=TRUE, season=TRUE, prior='iid', alpha=0.025, mc.munu=100, mc.y=10, samplingMethod = "joint") #Apply the boda algorithm in its simples form, i.e. spline is #described by iid random effects and no extra covariates library("INLA") # needs to be attached cam.boda1 <- boda(cam.sts, control=control) plot(cam.boda1, xlab='time [weeks]', ylab='No. reported', dx.upperbound=0) ## End(Not run)
## Not run: ## running this example takes a couple of minutes #Load the campylobacteriosis data for Germany data("campyDE") #Make an sts object from the data.frame cam.sts <- sts(epoch=campyDE$date, observed=campyDE$case, state=campyDE$state) #Define monitoring period # range <- which(epoch(cam.sts)>=as.Date("2007-01-01")) # range <- which(epoch(cam.sts)>=as.Date("2011-12-10")) range <- tail(1:nrow(cam.sts),n=2) control <- list(range=range, X=NULL, trend=TRUE, season=TRUE, prior='iid', alpha=0.025, mc.munu=100, mc.y=10, samplingMethod = "joint") #Apply the boda algorithm in its simples form, i.e. spline is #described by iid random effects and no extra covariates library("INLA") # needs to be attached cam.boda1 <- boda(cam.sts, control=control) plot(cam.boda1, xlab='time [weeks]', ylab='No. reported', dx.upperbound=0) ## End(Not run)
The function takes range
values of the surveillance time
series sts
and for each time point uses a Bayesian model of the negative binomial family with
log link inspired by the work of Noufaily et al. (2012) and of Manitz and Höhle (2014). It allows delay-corrected aberration detection as explained in Salmon et al. (2015). A reportingTriangle
has to be provided in the control
slot.
bodaDelay(sts, control = list( range = NULL, b = 5, w = 3, mc.munu = 100, mc.y = 10, pastAberrations = TRUE, verbose = FALSE, alpha = 0.05, trend = TRUE, limit54 = c(5,4), inferenceMethod = c("asym","INLA"), quantileMethod = c("MC","MM"), noPeriods = 1, pastWeeksNotIncluded = NULL, delay = FALSE))
bodaDelay(sts, control = list( range = NULL, b = 5, w = 3, mc.munu = 100, mc.y = 10, pastAberrations = TRUE, verbose = FALSE, alpha = 0.05, trend = TRUE, limit54 = c(5,4), inferenceMethod = c("asym","INLA"), quantileMethod = c("MC","MM"), noPeriods = 1, pastWeeksNotIncluded = NULL, delay = FALSE))
sts |
sts-object to be analysed. Needs to have a reporting triangle. |
control |
list of control arguments:
|
Farrington, C.P., Andrews, N.J, Beale A.D. and Catchpole, M.A. (1996): A statistical algorithm for the early detection of outbreaks of infectious disease. J. R. Statist. Soc. A, 159, 547-563.
Noufaily, A., Enki, D.G., Farrington, C.P., Garthwaite, P., Andrews, N.J., Charlett, A. (2012): An improved algorithm for outbreak detection in multiple surveillance systems. Statistics in Medicine, 32 (7), 1206-1222.
Salmon, M., Schumacher, D., Stark, K., Höhle, M. (2015): Bayesian outbreak detection in the presence of reporting delays. Biometrical Journal, 57 (6), 1051-1067.
## Not run: data("stsNewport") salm.Normal <- list() salmDelayAsym <- list() for (week in 43:45){ listWeeks <- as.Date(row.names(stsNewport@control$reportingTriangle$n)) dateObs <- listWeeks[isoWeekYear(listWeeks)$ISOYear==2011 & isoWeekYear(listWeeks)$ISOWeek==week] stsC <- sts_observation(stsNewport, dateObservation=dateObs, cut=TRUE) inWeeks <- with(isoWeekYear(epoch(stsC)), ISOYear == 2011 & ISOWeek >= 40 & ISOWeek <= 48) rangeTest <- which(inWeeks) alpha <- 0.07 # Control slot for Noufaily method controlNoufaily <- list(range=rangeTest,noPeriods=10, b=4,w=3,weightsThreshold=2.58,pastWeeksNotIncluded=26, pThresholdTrend=1,thresholdMethod="nbPlugin",alpha=alpha*2, limit54=c(0,50)) # Control slot for the Proposed algorithm with D=0 correction controlNormal <- list(range = rangeTest, b = 4, w = 3, reweight = TRUE, mc.munu=10000, mc.y=100, verbose = FALSE, alpha = alpha, trend = TRUE, limit54=c(0,50), noPeriods = 10, pastWeeksNotIncluded = 26, delay=FALSE) # Control slot for the Proposed algorithm with D=10 correction controlDelayNorm <- list(range = rangeTest, b = 4, w = 3, reweight = FALSE, mc.munu=10000, mc.y=100, verbose = FALSE, alpha = alpha, trend = TRUE, limit54=c(0,50), noPeriods = 10, pastWeeksNotIncluded = 26, delay=TRUE,inferenceMethod="asym") set.seed(1) salm.Normal[[week]] <- farringtonFlexible(stsC, controlNoufaily) salmDelayAsym[[week]] <- bodaDelay(stsC, controlDelayNorm) } opar <- par(mfrow=c(2,3)) lapply(salmDelayAsym[c(43,44,45)],plot, legend=NULL, main="", ylim=c(0,35)) lapply(salm.Normal[c(43,44,45)],plot, legend=NULL, main="", ylim=c(0,35)) par(opar) ## End(Not run)
## Not run: data("stsNewport") salm.Normal <- list() salmDelayAsym <- list() for (week in 43:45){ listWeeks <- as.Date(row.names(stsNewport@control$reportingTriangle$n)) dateObs <- listWeeks[isoWeekYear(listWeeks)$ISOYear==2011 & isoWeekYear(listWeeks)$ISOWeek==week] stsC <- sts_observation(stsNewport, dateObservation=dateObs, cut=TRUE) inWeeks <- with(isoWeekYear(epoch(stsC)), ISOYear == 2011 & ISOWeek >= 40 & ISOWeek <= 48) rangeTest <- which(inWeeks) alpha <- 0.07 # Control slot for Noufaily method controlNoufaily <- list(range=rangeTest,noPeriods=10, b=4,w=3,weightsThreshold=2.58,pastWeeksNotIncluded=26, pThresholdTrend=1,thresholdMethod="nbPlugin",alpha=alpha*2, limit54=c(0,50)) # Control slot for the Proposed algorithm with D=0 correction controlNormal <- list(range = rangeTest, b = 4, w = 3, reweight = TRUE, mc.munu=10000, mc.y=100, verbose = FALSE, alpha = alpha, trend = TRUE, limit54=c(0,50), noPeriods = 10, pastWeeksNotIncluded = 26, delay=FALSE) # Control slot for the Proposed algorithm with D=10 correction controlDelayNorm <- list(range = rangeTest, b = 4, w = 3, reweight = FALSE, mc.munu=10000, mc.y=100, verbose = FALSE, alpha = alpha, trend = TRUE, limit54=c(0,50), noPeriods = 10, pastWeeksNotIncluded = 26, delay=TRUE,inferenceMethod="asym") set.seed(1) salm.Normal[[week]] <- farringtonFlexible(stsC, controlNoufaily) salmDelayAsym[[week]] <- bodaDelay(stsC, controlDelayNorm) } opar <- par(mfrow=c(2,3)) lapply(salmDelayAsym[c(43,44,45)],plot, legend=NULL, main="", ylim=c(0,35)) lapply(salm.Normal[c(43,44,45)],plot, legend=NULL, main="", ylim=c(0,35)) par(opar) ## End(Not run)
The implemented calibration tests for Poisson or negative binomial
predictions of count data are based on proper scoring rules and
described in detail in Wei and Held (2014).
The following proper scoring rules are available:
Dawid-Sebastiani score ("dss"
),
logarithmic score ("logs"
),
ranked probability score ("rps"
).
calibrationTest(x, ...) ## Default S3 method: calibrationTest(x, mu, size = NULL, which = c("dss", "logs", "rps"), tolerance = 1e-4, method = 2, ...)
calibrationTest(x, ...) ## Default S3 method: calibrationTest(x, mu, size = NULL, which = c("dss", "logs", "rps"), tolerance = 1e-4, method = 2, ...)
x |
the observed counts. All involved functions are vectorized and also accept matrices or arrays. |
mu |
the means of the predictive distributions for the
observations |
size |
either |
which |
a character string indicating which proper scoring rule to apply. |
tolerance |
absolute tolerance for the null expectation and variance of
|
method |
selection of the |
... |
unused (argument of the generic). |
an object of class "htest"
,
which is a list with the following components:
method |
a character string indicating the type of test
performed (including |
data.name |
a character string naming the supplied |
statistic |
the |
parameter |
the number of predictions underlying the test, i.e., |
p.value |
the p-value for the test. |
If the gsl package is installed, its implementations of the
Bessel and hypergeometric functions are used when calculating the null
expectation and variance of the rps
.
These functions are faster and yield more accurate results (especially
for larger mu
).
Sebastian Meyer and Wei Wei
Wei, W. and Held, L. (2014): Calibration tests for count data. Test, 23, 787-805.
mu <- c(0.1, 1, 3, 6, pi, 100) size <- 0.1 set.seed(1) y <- rnbinom(length(mu), mu = mu, size = size) calibrationTest(y, mu = mu, size = size) # p = 0.99 calibrationTest(y, mu = mu, size = 1) # p = 4.3e-05 calibrationTest(y, mu = 1, size = size) # p = 0.6959 calibrationTest(y, mu = 1, size = size, which = "rps") # p = 0.1286
mu <- c(0.1, 1, 3, 6, pi, 100) size <- 0.1 set.seed(1) y <- rnbinom(length(mu), mu = mu, size = size) calibrationTest(y, mu = mu, size = size) # p = 0.99 calibrationTest(y, mu = mu, size = 1) # p = 4.3e-05 calibrationTest(y, mu = 1, size = size) # p = 0.6959 calibrationTest(y, mu = 1, size = size, which = "rps") # p = 0.1286
Weekly number of reported campylobacteriosis cases in Germany, 2002-2011, together with the corresponding absolute humidity (in g/m^3) that week. The absolute humidity was computed according to the procedure by Dengler (1997) using the means of representative weather station data from the German Climate service.
data(campyDE)
data(campyDE)
A data.frame
containing the following columns
date
Date
instance containing the Monday of the
reporting week.
case
Number of reported cases that week.
state
Boolean indicating whether there is external knowledge about an outbreak that week
hum
Mean absolute humidity (in g/m^3) of that week as measured by a single representative weather station.
l1.hum
-l5.hum
Lagged version (lagged by 1-5) of
the hum
covariate.
Boolean indicating whether the reporting week corresponds to the first two weeks of the year (TRUE) or not (FALSE). Note: The first week of a year is here defined as the first reporting week, which has its corresponding Monday within new year.
Boolean indicating whether the reporting week
corresponds to the last two weeks of the year (TRUE) or not
(FALSE). Note: This are the first two weeks before the
newyears
weeks.
Boolean indicating whether the reporting week corresponds to the W21-W30 period of increased gastroenteritis awareness during the O104:H4 STEC outbreak.
The data on campylobacteriosis cases have been queried from the Survstat@RKI database of the German Robert Koch Institute (https://survstat.rki.de/).
Data for the computation of absolute humidity were obtained from the German Climate Service (Deutscher Wetterdienst), Climate data of Germany, available at https://www.dwd.de.
A complete data description and an analysis of the data can be found in Manitz and Höhle (2013).
Manitz, J. and Höhle, M. (2013): Bayesian outbreak detection algorithm for monitoring reported cases of campylobacteriosis in Germany. Biometrical Journal, 55(4), 509-526.
# Load the data data("campyDE") # O104 period is W21-W30 in 2011 stopifnot(all(campyDE$O104period == ( (campyDE$date >= as.Date("2011-05-23")) & (campyDE$date < as.Date("2011-07-31")) ))) # Make an sts object from the data.frame cam.sts <- sts(epoch=campyDE$date, observed=campyDE$case, state=campyDE$state) # Plot the result plot(cam.sts)
# Load the data data("campyDE") # O104 period is W21-W30 in 2011 stopifnot(all(campyDE$O104period == ( (campyDE$date >= as.Date("2011-05-23")) & (campyDE$date < as.Date("2011-07-31")) ))) # Make an sts object from the data.frame cam.sts <- sts(epoch=campyDE$date, observed=campyDE$case, state=campyDE$state) # Plot the result plot(cam.sts)
Function to process sts
object by binomial, beta-binomial
or multinomial CUSUM as described by Höhle (2010).
Logistic, multinomial logistic, proportional
odds or Bradley-Terry regression models are used to specify in-control
and out-of-control parameters.
The implementation is illustrated in Salmon et al. (2016).
categoricalCUSUM(stsObj,control = list(range=NULL,h=5,pi0=NULL, pi1=NULL, dfun=NULL, ret=c("cases","value")),...)
categoricalCUSUM(stsObj,control = list(range=NULL,h=5,pi0=NULL, pi1=NULL, dfun=NULL, ret=c("cases","value")),...)
stsObj |
Object of class |
control |
Control object containing several items
|
... |
Additional arguments to send to |
The function allows the monitoring of categorical time series as described by regression models for binomial, beta-binomial or multinomial data. The later includes e.g. multinomial logistic regression models, proportional odds models or Bradley-Terry models for paired comparisons. See the Höhle (2010) reference for further details about the methodology.
Once an alarm is found the CUSUM scheme is reset (to zero) and monitoring continues from there.
An sts
object with observed
, alarm
,
etc. slots trimmed to the control$range
indices.
M. Höhle
Höhle, M. (2010): Online Change-Point Detection in Categorical Time Series. In: T. Kneib and G. Tutz (Eds.), Statistical Modelling and Regression Structures, Physica-Verlag.
Salmon, M., Schumacher, D. and Höhle, M. (2016): Monitoring count time series in R: Aberration detection in public health surveillance. Journal of Statistical Software, 70 (10), 1-35. doi:10.18637/jss.v070.i10
## IGNORE_RDIFF_BEGIN have_GAMLSS <- require("gamlss") ## IGNORE_RDIFF_END if (have_GAMLSS) { ########################################################################### #Beta-binomial CUSUM for a small example containing the time-varying #number of positive test out of a time-varying number of total #test. ####################################### #Load meat inspection data data("abattoir") #Use GAMLSS to fit beta-bin regression model phase1 <- 1:(2*52) phase2 <- (max(phase1)+1) : nrow(abattoir) #Fit beta-binomial model using GAMLSS abattoir.df <- as.data.frame(abattoir) #Replace the observed and epoch column names to something more convenient dict <- c("observed"="y", "epoch"="t", "population"="n") replace <- dict[colnames(abattoir.df)] colnames(abattoir.df)[!is.na(replace)] <- replace[!is.na(replace)] m.bbin <- gamlss( cbind(y,n-y) ~ 1 + t + + sin(2*pi/52*t) + cos(2*pi/52*t) + + sin(4*pi/52*t) + cos(4*pi/52*t), sigma.formula=~1, family=BB(sigma.link="log"), data=abattoir.df[phase1,c("n","y","t")]) #CUSUM parameters R <- 2 #detect a doubling of the odds for a test being positive h <- 4 #threshold of the cusum #Compute in-control and out of control mean pi0 <- predict(m.bbin,newdata=abattoir.df[phase2,c("n","y","t")],type="response") pi1 <- plogis(qlogis(pi0)+log(R)) #Create matrix with in control and out of control proportions. #Categories are D=1 and D=0, where the latter is the reference category pi0m <- rbind(pi0, 1-pi0) pi1m <- rbind(pi1, 1-pi1) ###################################################################### # Use the multinomial surveillance function. To this end it is necessary # to create a new abattoir object containing counts and proportion for # each of the k=2 categories. For binomial data this appears a bit # redundant, but generalizes easier to k>2 categories. ###################################################################### abattoir2 <- sts(epoch=1:nrow(abattoir), start=c(2006,1), freq=52, observed=cbind(abattoir@observed, abattoir@populationFrac-abattoir@observed), populationFrac=cbind(abattoir@populationFrac,abattoir@populationFrac), state=matrix(0,nrow=nrow(abattoir),ncol=2), multinomialTS=TRUE) ###################################################################### #Function to use as dfun in the categoricalCUSUM #(just a wrapper to the dBB function). Note that from v 3.0-1 the #first argument of dBB changed its name from "y" to "x"! ###################################################################### mydBB.cusum <- function(y, mu, sigma, size, log = FALSE) { return(dBB(y[1,], mu = mu[1,], sigma = sigma, bd = size, log = log)) } #Create control object for multinom cusum and use the categoricalCUSUM #method control <- list(range=phase2,h=h,pi0=pi0m, pi1=pi1m, ret="cases", dfun=mydBB.cusum) surv <- categoricalCUSUM(abattoir2, control=control, sigma=exp(m.bbin$sigma.coef)) #Show results plot(surv[,1],dx.upperbound=0) lines(pi0,col="green") lines(pi1,col="red") #Index of the alarm which.max(alarms(surv[,1])) }
## IGNORE_RDIFF_BEGIN have_GAMLSS <- require("gamlss") ## IGNORE_RDIFF_END if (have_GAMLSS) { ########################################################################### #Beta-binomial CUSUM for a small example containing the time-varying #number of positive test out of a time-varying number of total #test. ####################################### #Load meat inspection data data("abattoir") #Use GAMLSS to fit beta-bin regression model phase1 <- 1:(2*52) phase2 <- (max(phase1)+1) : nrow(abattoir) #Fit beta-binomial model using GAMLSS abattoir.df <- as.data.frame(abattoir) #Replace the observed and epoch column names to something more convenient dict <- c("observed"="y", "epoch"="t", "population"="n") replace <- dict[colnames(abattoir.df)] colnames(abattoir.df)[!is.na(replace)] <- replace[!is.na(replace)] m.bbin <- gamlss( cbind(y,n-y) ~ 1 + t + + sin(2*pi/52*t) + cos(2*pi/52*t) + + sin(4*pi/52*t) + cos(4*pi/52*t), sigma.formula=~1, family=BB(sigma.link="log"), data=abattoir.df[phase1,c("n","y","t")]) #CUSUM parameters R <- 2 #detect a doubling of the odds for a test being positive h <- 4 #threshold of the cusum #Compute in-control and out of control mean pi0 <- predict(m.bbin,newdata=abattoir.df[phase2,c("n","y","t")],type="response") pi1 <- plogis(qlogis(pi0)+log(R)) #Create matrix with in control and out of control proportions. #Categories are D=1 and D=0, where the latter is the reference category pi0m <- rbind(pi0, 1-pi0) pi1m <- rbind(pi1, 1-pi1) ###################################################################### # Use the multinomial surveillance function. To this end it is necessary # to create a new abattoir object containing counts and proportion for # each of the k=2 categories. For binomial data this appears a bit # redundant, but generalizes easier to k>2 categories. ###################################################################### abattoir2 <- sts(epoch=1:nrow(abattoir), start=c(2006,1), freq=52, observed=cbind(abattoir@observed, abattoir@populationFrac-abattoir@observed), populationFrac=cbind(abattoir@populationFrac,abattoir@populationFrac), state=matrix(0,nrow=nrow(abattoir),ncol=2), multinomialTS=TRUE) ###################################################################### #Function to use as dfun in the categoricalCUSUM #(just a wrapper to the dBB function). Note that from v 3.0-1 the #first argument of dBB changed its name from "y" to "x"! ###################################################################### mydBB.cusum <- function(y, mu, sigma, size, log = FALSE) { return(dBB(y[1,], mu = mu[1,], sigma = sigma, bd = size, log = log)) } #Create control object for multinom cusum and use the categoricalCUSUM #method control <- list(range=phase2,h=h,pi0=pi0m, pi1=pi1m, ret="cases", dfun=mydBB.cusum) surv <- categoricalCUSUM(abattoir2, control=control, sigma=exp(m.bbin$sigma.coef)) #Show results plot(surv[,1],dx.upperbound=0) lines(pi0,col="green") lines(pi1,col="red") #Index of the alarm which.max(alarms(surv[,1])) }
twinSIR
or twinstim
Transform the residual process (cf. the
residuals
methods for classes
"twinSIR"
and "twinstim"
) such that the transformed
residuals should be uniformly distributed if the fitted model
well describes the true conditional intensity function. Graphically
check this using ks.plot.unif
.
The transformation for the residuals tau
is
1 - exp(-diff(c(0,tau)))
(cf. Ogata, 1988).
Another plot inspects the serial correlation between the transformed
residuals (scatterplot between and
).
checkResidualProcess(object, plot = 1:2, mfrow = c(1,length(plot)), ...)
checkResidualProcess(object, plot = 1:2, mfrow = c(1,length(plot)), ...)
object |
|
plot |
logical (or integer index) vector indicating if (which) plots of the
transformed residuals should be produced. The |
mfrow |
see |
... |
further arguments passed to |
A list (returned invisibly, if plot = TRUE
) with the following
components:
the residual process obtained by
residuals(object)
.
the transformed residuals which should be distributed as U(0,1).
the result of the ks.test
for the uniform
distribution of U
.
Sebastian Meyer
Ogata, Y. (1988) Statistical models for earthquake occurrences and residual analysis for point processes. Journal of the American Statistical Association, 83, 9-27
ks.plot.unif
and the
residuals
-method for classes
"twinSIR"
and "twinstim"
.
data("hagelloch") fit <- twinSIR(~ household, data = hagelloch) # a simplistic model ## extract the "residual process", i.e., the fitted cumulative intensities residuals(fit) ## assess goodness of fit based on these residuals checkResidualProcess(fit) # could be better
data("hagelloch") fit <- twinSIR(~ household, data = hagelloch) # a simplistic model ## extract the "residual process", i.e., the fitted cumulative intensities residuals(fit) ## assess goodness of fit based on these residuals checkResidualProcess(fit) # could be better
lapply
Use lapply
if the input is a list and otherwise apply the
function directly to the input and wrap the result in a list.
The function is implemented as
if (is.list(X)) lapply(X, FUN, ...) else list(FUN(X, ...))
clapply(X, FUN, ...)
clapply(X, FUN, ...)
X |
a list or a single |
FUN |
the function to be applied to (each element of) |
... |
optional arguments to |
a list (of length 1 if X
is not a list).
S3-generic function to use with models which contain several groups of
coefficients in their coefficient vector. The coeflist
methods
are intended to list the coefficients by group. The default method
simply split
s the coefficient vector given the number of
coefficients by group.
coeflist(x, ...) ## Default S3 method: coeflist(x, npars, ...)
coeflist(x, ...) ## Default S3 method: coeflist(x, npars, ...)
x |
a model with groups of coefficients or, for the default method, a vector of coefficients. |
npars |
a named vector specifying the number of coefficients per group. |
... |
potential further arguments (currently ignored). |
a list of coefficients
Sebastian Meyer
## the default method just 'split's the coefficient vector coefs <- c(a = 1, b = 3, dispersion = 0.5) npars <- c(regression = 2, variance = 1) coeflist(coefs, npars)
## the default method just 'split's the coefficient vector coefs <- c(a = 1, b = 3, dispersion = 0.5) npars <- c(regression = 2, variance = 1) coeflist(coefs, npars)
The dataset from Steiner et al. (1999) on A synthetic dataset from the Danish meat inspection – useful for illustrating the beta-binomial CUSUM.
data(deleval)
data(deleval)
Steiner et al. (1999) use data from de Leval et al. (1994) to illustrate monitoring of failure rates of a surgical procedure for a bivariate outcome.
Over a period of six years an arterial switch operation was performed
on 104 newborn babies. Since the death rate from this surgery was
relatively low the idea of surgical "near miss" was introduced. It is
defined as the need to reinstitute cardiopulmonary bypass after a trial
period of weaning. The object of class sts
contains the
recordings of near misses and deaths from the surgery for the 104
newborn babies of the study.
The data could also be handled by a multinomial CUSUM model.
Steiner, S. H., Cook, R. J., and Farewell, V. T. (1999), Monitoring paired binary surgical outcomes using cumulative sum charts, Statistics in Medicine, 18, pp. 69–86.
De Leval, Marc R., Franiois, K., Bull, C., Brawn, W. B. and Spiegelhalter, D. (1994), Analysis of a cluster of surgical failures, Journal of Thoracic and Cardiovascular Surgery, March, pp. 914–924.
data("deleval") plot(deleval, xaxis.labelFormat=NULL,ylab="Response",xlab="Patient number")
data("deleval") plot(deleval, xaxis.labelFormat=NULL,ylab="Response",xlab="Patient number")
Generates a polygon representing a disc/circle (in planar
coordinates) as an object of one of three possible
classes: "Polygon"
from package sp,
"owin"
from package spatstat.geom, or
"gpc.poly"
from gpclib (if available).
discpoly(center, radius, npoly = 64, class = c("Polygon", "owin", "gpc.poly"), hole = FALSE)
discpoly(center, radius, npoly = 64, class = c("Polygon", "owin", "gpc.poly"), hole = FALSE)
center |
numeric vector of length 2 (center coordinates of the circle). |
radius |
single numeric value (radius of the circle). |
npoly |
single integer. Number of edges of the polygonal approximation. |
class |
class of the resulting polygon (partial name
matching applies). For |
hole |
logical. Does the resulting polygon represent a hole? |
A polygon of class class
representing a
circle/disc with npoly
edges accuracy.
If class="gpc.poly"
and this S4 class is not yet registered
in the current R session (by loading gpclib beforehand), only the
pts
slot of a "gpc.poly"
is returned with a warning.
disc
in package spatstat.geom.
## Construct circles with increasing accuracy and of different spatial classes disc1 <- discpoly(c(0,0), 5, npoly=4, class = "owin") disc2 <- discpoly(c(0,0), 5, npoly=16, class = "Polygon") disc3 <- discpoly(c(0,0), 5, npoly=64, class = "gpc.poly") # may warn ## Look at the results print(disc1) plot(disc1, axes=TRUE, main="", border=2) str(disc2) lines(disc2, col=3) str(disc3) # a list or a formal "gpc.poly" (if gpclib is available) if (is(disc3, "gpc.poly")) { plot(disc3, add=TRUE, poly.args=list(border=4)) } else { lines(disc3[[1]], col=4) } ## to only _draw_ a circle symbols(0, 0, circles=5, inches=FALSE, add=TRUE, fg=5)
## Construct circles with increasing accuracy and of different spatial classes disc1 <- discpoly(c(0,0), 5, npoly=4, class = "owin") disc2 <- discpoly(c(0,0), 5, npoly=16, class = "Polygon") disc3 <- discpoly(c(0,0), 5, npoly=64, class = "gpc.poly") # may warn ## Look at the results print(disc1) plot(disc1, axes=TRUE, main="", border=2) str(disc2) lines(disc2, col=3) str(disc3) # a list or a formal "gpc.poly" (if gpclib is available) if (is(disc3, "gpc.poly")) { plot(disc3, add=TRUE, poly.args=list(border=4)) } else { lines(disc3[[1]], col=4) } ## to only _draw_ a circle symbols(0, 0, circles=5, inches=FALSE, add=TRUE, fg=5)
A small helper function to convert a disProg
object to become
an object of the S4 class sts
and vice versa. In the future the
sts
should replace the disProg
class, but for now this
function allows for conversion between the two formats.
disProg2sts(disProgObj, map=NULL) sts2disProg(sts)
disProg2sts(disProgObj, map=NULL) sts2disProg(sts)
disProgObj |
an object of class |
map |
an optional |
sts |
an object of class |
an object of class "sts"
or "disProg"
, respectively.
data(ha) print(disProg2sts(ha)) class(sts2disProg(disProg2sts(ha)))
data(ha) print(disProg2sts(ha)) class(sts2disProg(disProg2sts(ha)))
The function takes range
values of the surveillance time
series sts
and for each time point computes a threshold for the number of counts
based on values from the recent past.
This is then compared to the observed
number of counts. If the observation is above a specific quantile of
the prediction interval, then an alarm is raised. This method is especially useful
for data without many historic values, since it only needs counts from the recent past.
earsC(sts, control = list(range = NULL, method = "C1", baseline = 7, minSigma = 0, alpha = 0.001))
earsC(sts, control = list(range = NULL, method = "C1", baseline = 7, minSigma = 0, alpha = 0.001))
sts |
object of class sts (including the |
control |
Control object
|
The three methods are different in terms of baseline used for calculation of the expected value and in terms of method for calculating the expected value:
in C1 and C2 the expected value is the moving average of counts over the sliding window of the baseline and the prediction interval depends on the standard derivation of the observed counts in this window. They can be considered as Shewhart control charts with a small sample used for calculations.
in C3 the expected value is based on the sum over 3 timepoints (assessed timepoints and the two previous timepoints) of the discrepancy between observations and predictions, predictions being calculated with the C2 method. This method has similarities with a CUSUM method due to it adding discrepancies between predictions and observations over several timepoints, but is not a CUSUM (sum over 3 timepoints, not accumulation over a whole range), even if it sometimes is presented as such.
Here is what the function does for each method, see the literature sources for further details:
For C1 the baseline are the baseline
(default 7) timepoints before the assessed timepoint t,
t-baseline
to t-1. The expected value is the mean of the baseline. An approximate
(two-sided) prediction interval is calculated based on the
assumption that the difference between the expected value and the observed
value divided by the standard derivation of counts over the sliding window,
called
, follows a standard normal distribution in the absence
of outbreaks:
where
and
Then under the null hypothesis of no outbreak,
An alarm is raised if
with the
quantile of the standard normal distribution.
The upperbound is then defined by:
C2 is very similar to C1 apart from a 2-day lag in the baseline definition.
In other words the baseline for C2 is baseline
(Default: 7) timepoints with a 2-day lag before the monitored
timepoint t, i.e. to
. The expected value is the mean of the baseline. An approximate
(two-sided)
prediction interval is calculated based on the
assumption that the difference between the expected value and the observed
value divided by the standard derivation of counts over the sliding window,
called
, follows a standard normal distribution in the absence
of outbreaks:
where
and
Then under the null hypothesis of no outbreak,
An alarm is raised if
with the
quantile of the standard normal distribution.
The upperbound is then defined by:
C3 is quite different from the two other methods, but it is based on C2.
Indeed it uses from timepoint t and the two previous timepoints.
This means the baseline consists of the timepoints
to
.
The statistic
is the sum of discrepancies between observations and
predictions.
Then under the null hypothesis of no outbreak,
An alarm is raised if
with the
quantile of the standard normal distribution.
The upperbound is then defined by:
An object of class sts
with the slots upperbound
and alarm
filled
by the chosen method.
M. Salmon, H. Burkom
Fricker, R.D., Hegler, B.L, and Dunfee, D.A. (2008). Comparing syndromic surveillance detection methods: EARS versus a CUSUM-based methodology, 27:3407-3429, Statistics in medicine.
Salmon, M., Schumacher, D. and Höhle, M. (2016): Monitoring count time series in R: Aberration detection in public health surveillance. Journal of Statistical Software, 70 (10), 1-35. doi:10.18637/jss.v070.i10
#Sim data and convert to sts object disProgObj <- sim.pointSource(p = 0.99, r = 0.5, length = 208, A = 1, alpha = 1, beta = 0, phi = 0, frequency = 1, state = NULL, K = 1.7) stsObj <- disProg2sts( disProgObj) # Call earsC function and show result res1 <- earsC(stsObj, control = list(range = 20:208, method="C1")) plot(res1, legend.opts=list(horiz=TRUE, x="topright")) # Compare C3 upperbounds depending on alpha res3 <- earsC(stsObj, control = list(range = 20:208,method="C3",alpha = 0.001)) plot(upperbound(res3), type='l') res3 <- earsC(stsObj, control = list(range = 20:208,method="C3")) lines(upperbound(res3), col='red')
#Sim data and convert to sts object disProgObj <- sim.pointSource(p = 0.99, r = 0.5, length = 208, A = 1, alpha = 1, beta = 0, phi = 0, frequency = 1, state = NULL, K = 1.7) stsObj <- disProg2sts( disProgObj) # Call earsC function and show result res1 <- earsC(stsObj, control = list(range = 20:208, method="C1")) plot(res1, legend.opts=list(horiz=TRUE, x="topright")) # Compare C3 upperbounds depending on alpha res3 <- earsC(stsObj, control = list(range = 20:208,method="C3",alpha = 0.001)) plot(upperbound(res3), type='l') res3 <- earsC(stsObj, control = list(range = 20:208,method="C3")) lines(upperbound(res3), col='red')
The function as.epidata
is used to generate objects
of class "epidata"
. Objects of this class are
specific data frames containing the event history of an epidemic together
with some additional attributes. These objects are the basis for fitting
spatio-temporal epidemic intensity models with the function
twinSIR
. Their implementation is illustrated in Meyer
et al. (2017, Section 4), see vignette("twinSIR")
.
Note that the spatial information itself, i.e.
the positions of the individuals, is assumed to be constant over time.
Besides epidemics following the SIR compartmental model, also data from SI,
SIRS and SIS epidemics may be supplied.
as.epidata(data, ...) ## S3 method for class 'data.frame' as.epidata(data, t0, tE.col, tI.col, tR.col, id.col, coords.cols, f = list(), w = list(), D = dist, max.time = NULL, keep.cols = TRUE, ...) ## Default S3 method: as.epidata(data, id.col, start.col, stop.col, atRiskY.col, event.col, Revent.col, coords.cols, f = list(), w = list(), D = dist, .latent = FALSE, ...) ## S3 method for class 'epidata' print(x, ...) ## S3 method for class 'epidata' x[i, j, drop] ## S3 method for class 'epidata' update(object, f = list(), w = list(), D = dist, ...)
as.epidata(data, ...) ## S3 method for class 'data.frame' as.epidata(data, t0, tE.col, tI.col, tR.col, id.col, coords.cols, f = list(), w = list(), D = dist, max.time = NULL, keep.cols = TRUE, ...) ## Default S3 method: as.epidata(data, id.col, start.col, stop.col, atRiskY.col, event.col, Revent.col, coords.cols, f = list(), w = list(), D = dist, .latent = FALSE, ...) ## S3 method for class 'epidata' print(x, ...) ## S3 method for class 'epidata' x[i, j, drop] ## S3 method for class 'epidata' update(object, f = list(), w = list(), D = dist, ...)
data |
For the |
t0 , max.time
|
observation period. In the resulting |
tE.col , tI.col , tR.col
|
single numeric or character indexes of the time columns in
|
id.col |
single numeric or character index of the |
start.col |
single index of the |
stop.col |
single index of the |
atRiskY.col |
single index of the |
event.col |
single index of the |
Revent.col |
single index of the |
coords.cols |
indexes of the |
f |
a named list of vectorized functions for a
distance-based force of infection.
The functions must interact elementwise on a (distance) matrix |
w |
a named list of vectorized functions for extra
covariate-based weights |
D |
either a function to calculate the distances between the individuals
with locations taken from |
keep.cols |
logical indicating if all columns in |
.latent |
(internal) logical indicating whether to allow for latent periods (EXPERIMENTAL). Otherwise (default), the function verifies that an event (i.e., switching to the I state) only happens when the respective individual is at risk (i.e., in the S state). |
x , object
|
an object of class |
... |
arguments passed to |
i , j , drop
|
arguments passed to |
The print
method for objects of class "epidata"
simply prints
the data frame with a small header containing the time range of the observed
epidemic and the number of infected individuals. Usually, the data frames
are quite long, so the summary method summary.epidata
might be
useful. Also, indexing/subsetting "epidata"
works exactly as for
data.frame
s, but there is an own method, which
assures consistency of the resulting "epidata"
or drops this class, if
necessary.
The update
-method can be used to add or replace distance-based
(f
) or covariate-based (w
) epidemic variables in an
existing "epidata"
object.
SIS epidemics are implemented as SIRS epidemics where the length of the removal period equals 0. This means that an individual, which has an R-event will be at risk immediately afterwards, i.e. in the following time block. Therefore, data of SIS epidemics have to be provided in that form containing “pseudo-R-events”.
a data.frame
with the columns "BLOCK"
, "id"
,
"start"
, "stop"
, "atRiskY"
, "event"
,
"Revent"
and the coordinate columns (with the original names from
data
), which are all obligatory. These columns are followed by any
remaining columns of the input data
. Last but not least, the newly
generated columns with epidemic variables corresponding to the functions
in the list f
are appended, if length(f)
> 0.
The data.frame
is given the additional attributes
"eventTimes" |
numeric vector of infection time points (sorted chronologically). |
"timeRange" |
numeric vector of length 2: |
"coords.cols" |
numeric vector containing the column indices of the coordinate columns in the resulting data frame. |
"f" |
this equals the argument |
"w" |
this equals the argument |
The column name "BLOCK"
is a reserved name. This column will be
added automatically at conversion and the resulting data frame will be
sorted by this column and by id. Also the names "id"
, "start"
,
"stop"
, "atRiskY"
, "event"
and "Revent"
are
reserved for the respective columns only.
Sebastian Meyer
Meyer, S., Held, L. and Höhle, M. (2017): Spatio-temporal analysis of epidemic phenomena using the R package surveillance. Journal of Statistical Software, 77 (11), 1-55. doi:10.18637/jss.v077.i11
The hagelloch
data as an example.
The plot
and the
summary
method for class "epidata"
.
Furthermore, the function animate.epidata
for the animation of
epidemics.
Function twinSIR
for fitting spatio-temporal epidemic intensity
models to epidemic data.
Function simEpidata
for the simulation of epidemic data.
data("hagelloch") # see help("hagelloch") for a description head(hagelloch.df) ## convert the original data frame to an "epidata" event history myEpi <- as.epidata(hagelloch.df, t0 = 0, tI.col = "tI", tR.col = "tR", id.col = "PN", coords.cols = c("x.loc", "y.loc"), keep.cols = c("SEX", "AGE", "CL")) str(myEpi) head(as.data.frame(myEpi)) # "epidata" has event history format summary(myEpi) # see 'summary.epidata' plot(myEpi) # see 'plot.epidata' and also 'animate.epidata' ## add distance- and covariate-based weights for the force of infection ## in a twinSIR model, see vignette("twinSIR") for a description myEpi <- update(myEpi, f = list( household = function(u) u == 0, nothousehold = function(u) u > 0 ), w = list( c1 = function (CL.i, CL.j) CL.i == "1st class" & CL.j == CL.i, c2 = function (CL.i, CL.j) CL.i == "2nd class" & CL.j == CL.i ) ) ## this is now identical to the prepared hagelloch "epidata" stopifnot(all.equal(myEpi, hagelloch))
data("hagelloch") # see help("hagelloch") for a description head(hagelloch.df) ## convert the original data frame to an "epidata" event history myEpi <- as.epidata(hagelloch.df, t0 = 0, tI.col = "tI", tR.col = "tR", id.col = "PN", coords.cols = c("x.loc", "y.loc"), keep.cols = c("SEX", "AGE", "CL")) str(myEpi) head(as.data.frame(myEpi)) # "epidata" has event history format summary(myEpi) # see 'summary.epidata' plot(myEpi) # see 'plot.epidata' and also 'animate.epidata' ## add distance- and covariate-based weights for the force of infection ## in a twinSIR model, see vignette("twinSIR") for a description myEpi <- update(myEpi, f = list( household = function(u) u == 0, nothousehold = function(u) u > 0 ), w = list( c1 = function (CL.i, CL.j) CL.i == "1st class" & CL.j == CL.i, c2 = function (CL.i, CL.j) CL.i == "2nd class" & CL.j == CL.i ) ) ## this is now identical to the prepared hagelloch "epidata" stopifnot(all.equal(myEpi, hagelloch))
Function for the animation of epidemic data, i.e. objects inheriting from
class "epidata"
. This only works with 1- or 2-dimensional coordinates
and is not useful if some individuals share the same coordinates
(overlapping). There are two types of animation, see argument
time.spacing
. Besides the direct plotting in the R session, it is
also possible to generate a sequence of graphics files to create animations
outside R.
## S3 method for class 'summary.epidata' animate(object, main = "An animation of the epidemic", pch = 19, col = c(3, 2, gray(0.6)), time.spacing = NULL, sleep = quote(5/.nTimes), legend.opts = list(), timer.opts = list(), end = NULL, generate.snapshots = NULL, ...) ## S3 method for class 'epidata' animate(object, ...)
## S3 method for class 'summary.epidata' animate(object, main = "An animation of the epidemic", pch = 19, col = c(3, 2, gray(0.6)), time.spacing = NULL, sleep = quote(5/.nTimes), legend.opts = list(), timer.opts = list(), end = NULL, generate.snapshots = NULL, ...) ## S3 method for class 'epidata' animate(object, ...)
object |
an object inheriting from class |
main |
a main title for the plot, see also |
pch , col
|
vectors of length 3 specifying the point symbols and colors for
susceptible, infectious and removed individuals (in this order).
The vectors are recycled if necessary.
By default, susceptible individuals are marked as filled green circles,
infectious individuals as filled red circles and removed individuals as
filled gray circles. Note that the symbols are iteratively drawn
(overlaid) in the same plotting region as time proceeds.
For information about the possible values of |
time.spacing |
time interval for the animation steps. If |
sleep |
time in seconds to |
legend.opts |
either a list of arguments passed to the
|
timer.opts |
either a list of arguments passed to the
Note that the argument |
end |
ending time of the animation in case of |
generate.snapshots |
By default (
will store the animation steps in pdf-files in the current
working directory, where the file names each end with the time point
represented by the corresponding plot. Because the variables |
... |
further graphical parameters passed to the basic call of |
Sebastian Meyer
summary.epidata
for the data, on which the plot is based.
plot.epidata
for plotting the evolution of an epidemic by
the numbers of susceptible, infectious and removed individuals.
The contributed R package animation.
data("hagelloch") (s <- summary(hagelloch)) # plot the ordering of the events only animate(s) # or: animate(hagelloch) # with timer (animate only up to t=10) animate(s, time.spacing=0.1, end=10, sleep=0.01, legend.opts=list(x="topleft")) # Such an animation can be saved in various ways using tools of # the animation package, e.g., saveHTML() if (interactive() && require("animation")) { oldwd <- setwd(tempdir()) # to not clutter up the current working dir saveHTML({ par(bg="white") # default "transparent" is grey in some browsers animate(s, time.spacing=1, sleep=0, legend.opts=list(x="topleft"), generate.snapshots="epiani") }, use.dev=FALSE, img.name="epiani", ani.width=600, interval=0.5) setwd(oldwd) }
data("hagelloch") (s <- summary(hagelloch)) # plot the ordering of the events only animate(s) # or: animate(hagelloch) # with timer (animate only up to t=10) animate(s, time.spacing=0.1, end=10, sleep=0.01, legend.opts=list(x="topleft")) # Such an animation can be saved in various ways using tools of # the animation package, e.g., saveHTML() if (interactive() && require("animation")) { oldwd <- setwd(tempdir()) # to not clutter up the current working dir saveHTML({ par(bg="white") # default "transparent" is grey in some browsers animate(s, time.spacing=1, sleep=0, legend.opts=list(x="topleft"), generate.snapshots="epiani") }, use.dev=FALSE, img.name="epiani", ani.width=600, interval=0.5) setwd(oldwd) }
"epidata"
Objects
This function modifies an object inheriting from class "epidata"
such
that it features the specified stop time points. For this purpose, the time
interval in the event history into which the new stop falls will be split
up into two parts, one block for the time period until the new stop – where
no infection or removal occurs – and the other block for the time period
from the new stop to the end of the original interval.
Main application is to enable the use of knots
in twinSIR
, which
are not existing stop time points in the "epidata"
object.
intersperse(epidata, stoptimes, verbose = FALSE)
intersperse(epidata, stoptimes, verbose = FALSE)
epidata |
an object inheriting from class |
stoptimes |
a numeric vector of time points inside the observation period of the
|
verbose |
logical indicating if a |
an object of the same class as epidata
with additional time blocks
for any new stoptimes
.
Sebastian Meyer
as.epidata.epidataCS
where this function is used.
data("hagelloch") subset(hagelloch, start < 25 & stop > 25 & id %in% 9:13, select = 1:7) # there is no "stop" time at 25, but we can add this extra stop nrow(hagelloch) moreStopsEpi <- intersperse(hagelloch, stoptimes = 25) nrow(moreStopsEpi) subset(moreStopsEpi, (stop == 25 | start == 25) & id %in% 9:13, select = 1:7)
data("hagelloch") subset(hagelloch, start < 25 & stop > 25 & id %in% 9:13, select = 1:7) # there is no "stop" time at 25, but we can add this extra stop nrow(hagelloch) moreStopsEpi <- intersperse(hagelloch, stoptimes = 25) nrow(moreStopsEpi) subset(moreStopsEpi, (stop == 25 | start == 25) & id %in% 9:13, select = 1:7)
Functions for plotting the evolution of epidemics. The plot
methods for class
es "epidata"
and
"summary.epidata"
plots the numbers of susceptible, infectious and
recovered (= removed) individuals by step functions along the time axis. The
function stateplot
shows individual state changes along the time axis.
## S3 method for class 'summary.epidata' plot(x, lty = c(2, 1, 3), lwd = 2, col = c("#1B9E77", "#D95F02", "#7570B3"), col.hor = col, col.vert = col, xlab = "Time", ylab = "Number of individuals", xlim = NULL, ylim = NULL, legend.opts = list(), do.axis4 = NULL, panel.first = grid(), rug.opts = list(), which.rug = c("infections", "removals", "susceptibility", "all"), ...) ## S3 method for class 'epidata' plot(x, ...) stateplot(x, id, ...)
## S3 method for class 'summary.epidata' plot(x, lty = c(2, 1, 3), lwd = 2, col = c("#1B9E77", "#D95F02", "#7570B3"), col.hor = col, col.vert = col, xlab = "Time", ylab = "Number of individuals", xlim = NULL, ylim = NULL, legend.opts = list(), do.axis4 = NULL, panel.first = grid(), rug.opts = list(), which.rug = c("infections", "removals", "susceptibility", "all"), ...) ## S3 method for class 'epidata' plot(x, ...) stateplot(x, id, ...)
x |
an object inheriting from class |
lty , lwd
|
vectors of length 3 containing the line types and widths, respectively, for
the numbers of susceptible, infectious and removed individuals (in this
order). By default, all lines have width 1 and the line types are dashed
(susceptible), solid (infectious) and dotted (removed), respectively. To
omit the drawing of a specific line, just set the corresponding entry in
|
col , col.hor , col.vert
|
vectors of length 3 containing the line colors for the numbers of
susceptible, infectious and removed individuals (in this order).
|
xlab , ylab
|
axis labels, default to "Time" and "Number of individuals", respectively. |
xlim , ylim
|
the x and y limits of the plot in the form |
legend.opts |
if this is a list (of arguments for the
|
do.axis4 |
logical indicating if the final numbers of susceptible and removed
individuals should be indicated on the right axis. The default |
panel.first |
an expression to be evaluated after the plot axes are set up but before any plotting takes place. By default, a standard grid is drawn. |
rug.opts |
either a list of arguments passed to the function |
which.rug |
By default, tick marks are drawn at the time points of infections.
Alternatively, one can choose to mark only |
id |
single character string or factor of length 1 specifying the individual for
which the |
... |
For |
plot.summary.epidata
(and plot.epidata
) invisibly returns the
matrix used for plotting, which contains the evolution of the three
counters.stateplot
invisibly returns the function, which was plotted,
typically of class "stepfun"
, but maybe of class "function"
,
if no events have been observed for the individual in question (then the
function always returns the initial state). The vertical axis of
stateplot
can range from 1 to 3, where 1 corresponds to
Susceptible, 2 to Infectious and 3 to Removed.
Sebastian Meyer
summary.epidata
for the data, on which the plots are based.
animate.epidata
for the animation of epidemics.
data("hagelloch") (s <- summary(hagelloch)) # rudimentary stateplot stateplot(s, id = "187") # evolution of the epidemic plot(s)
data("hagelloch") (s <- summary(hagelloch)) # rudimentary stateplot stateplot(s, id = "187") # evolution of the epidemic plot(s)
The summary
method for class
"epidata"
gives an overview of the epidemic. Its
print
method shows the type of the epidemic, the time range, the
total number of individuals, the initially and never infected individuals and
the size of the epidemic. An excerpt of the returned counters
data
frame is also printed (see the Value section below).
## S3 method for class 'epidata' summary(object, ...) ## S3 method for class 'summary.epidata' print(x, ...)
## S3 method for class 'epidata' summary(object, ...) ## S3 method for class 'summary.epidata' print(x, ...)
object |
an object inheriting from class |
x |
an object inheriting from class |
... |
unused (argument of the generic). |
A list with the following components:
type |
character string. Compartmental type of the epidemic, i.e. one of "SIR", "SI", "SIS" or "SIRS". |
size |
integer. Size of the epidemic, i.e. the number of initially susceptible individuals, which became infected during the course of the epidemic. |
initiallyInfected |
factor (with the same levels as the |
neverInfected |
factor (with the same levels as the |
coordinates |
numeric matrix of individual coordinates with as many rows as there are
individuals and one column for each spatial dimension. The row names of
the matrix are the |
byID |
data frame with time points of infection and optionally removal and
re-susceptibility (depending on the |
counters |
data frame containing all events (S, I and R) ordered by time. The columns
are |
Sebastian Meyer
as.epidata
for generating objects of class "epidata"
.
data("hagelloch") s <- summary(hagelloch) s # uses the print method for summary.epidata names(s) # components of the list 's' # positions of the individuals plot(s$coordinates) # events by id head(s$byID)
data("hagelloch") s <- summary(hagelloch) s # uses the print method for summary.epidata names(s) # components of the list 's' # positions of the individuals plot(s$coordinates) # events by id head(s$byID)
Data structure for continuous spatio-temporal event
data, e.g. individual case reports of an infectious disease.
Apart from the actual events
, the class simultaneously
holds a spatio-temporal grid of endemic covariates (similar to
disease mapping) and a representation of the observation region.
The "epidataCS"
class is the basis for fitting
spatio-temporal endemic-epidemic intensity models with the function
twinstim
(Meyer et al., 2012).
The implementation is described in Meyer et al. (2017, Section 3),
see vignette("twinstim")
.
as.epidataCS(events, stgrid, W, qmatrix = diag(nTypes), nCircle2Poly = 32L, T = NULL, clipper = "polyclip", verbose = interactive()) ## S3 method for class 'epidataCS' print(x, n = 6L, digits = getOption("digits"), ...) ## S3 method for class 'epidataCS' nobs(object, ...) ## S3 method for class 'epidataCS' head(x, n = 6L, ...) ## S3 method for class 'epidataCS' tail(x, n = 6L, ...) ## S3 method for class 'epidataCS' x[i, j, ..., drop = TRUE] ## S3 method for class 'epidataCS' subset(x, subset, select, drop = TRUE, ...) ## S3 method for class 'epidataCS' marks(x, coords = TRUE, ...) ## S3 method for class 'epidataCS' summary(object, ...) ## S3 method for class 'summary.epidataCS' print(x, ...) ## S3 method for class 'epidataCS' as.stepfun(x, ...) getSourceDists(object, dimension = c("space", "time"))
as.epidataCS(events, stgrid, W, qmatrix = diag(nTypes), nCircle2Poly = 32L, T = NULL, clipper = "polyclip", verbose = interactive()) ## S3 method for class 'epidataCS' print(x, n = 6L, digits = getOption("digits"), ...) ## S3 method for class 'epidataCS' nobs(object, ...) ## S3 method for class 'epidataCS' head(x, n = 6L, ...) ## S3 method for class 'epidataCS' tail(x, n = 6L, ...) ## S3 method for class 'epidataCS' x[i, j, ..., drop = TRUE] ## S3 method for class 'epidataCS' subset(x, subset, select, drop = TRUE, ...) ## S3 method for class 'epidataCS' marks(x, coords = TRUE, ...) ## S3 method for class 'epidataCS' summary(object, ...) ## S3 method for class 'summary.epidataCS' print(x, ...) ## S3 method for class 'epidataCS' as.stepfun(x, ...) getSourceDists(object, dimension = c("space", "time"))
events |
a
The |
stgrid |
a
The remaining columns are endemic covariates.
Note that the column name |
W |
an object of class |
qmatrix |
a square indicator matrix (0/1 or |
nCircle2Poly |
accuracy (number of edges) of the polygonal approximation of a circle,
see |
T |
end of observation period (i.e. last |
clipper |
polygon clipping engine to use for calculating the
|
verbose |
logical indicating if status messages should be printed
during input checking and |
x |
an object of class |
n |
a single integer. If positive, the first ( |
digits |
minimum number of significant digits to be printed in values. |
i , j , drop
|
arguments passed to the
|
... |
unused (arguments of the generics) with a few exceptions:
The |
subset , select
|
arguments used to subset the |
coords |
logical indicating if the data frame of event marks
returned by |
object |
an object of class |
dimension |
the distances of all events to their potential source
events can be computed in either the |
The function as.epidataCS
is used to generate objects of class
"epidataCS"
, which is the data structure required for
twinstim
models.
The [
-method for class "epidataCS"
ensures that the subsetted object will be valid, for instance, it
updates the auxiliary list of potential transmission paths stored
in the object. The [
-method is used in
subset.epidataCS
, which is implemented similar to
subset.data.frame
.
The print
method for "epidataCS"
prints some metadata
of the epidemic, e.g., the observation period, the dimensions of the
spatio-temporal grid, the types of events, and the total number of
events. By default, it also prints the first n = 6
rows of the
events
.
An object of class "epidataCS"
is a list containing the
following components:
events |
a
|
stgrid |
a |
W |
a |
qmatrix |
see the above description of the argument. The
|
The nobs
-method returns the number of events.
The head
and tail
methods subset the epidemic data using
the extraction method ([
), i.e. they return an object of class
"epidataCS"
, which only contains (all but) the first/last
n
events.
For the "epidataCS"
class, the method of the generic function
marks
defined by the spatstat.geom package
returns a data.frame
of the event marks (actually also
including time and location of the events), disregarding endemic
covariates and the auxiliary columns from the events
component
of the "epidataCS"
object.
The summary
method (which has again a print
method)
returns a list of metadata, event data, the tables of tiles and types,
a step function of the number of infectious individuals over time
($counter
), i.e., the result of the
as.stepfun
-method for "epidataCS"
, and the number
of potential sources of transmission for each event ($nSources
)
which is based on the given maximum interaction ranges eps.t
and eps.s
.
Since the observation region W
defines the integration domain
in the point process likelihood,
the more detailed the polygons of W
are the longer it will
take to fit a twinstim
. You are advised to
sacrifice some shape details for speed by reducing the polygon
complexity, for example via the mapshaper
JavaScript library
wrapped by the R package rmapshaper, or via
simplify.owin
.
Sebastian Meyer
Contributions to this documentation by Michael Höhle and Mayeul Kauffmann.
Meyer, S., Elias, J. and Höhle, M. (2012): A space-time conditional intensity model for invasive meningococcal disease occurrence. Biometrics, 68, 607-616. doi:10.1111/j.1541-0420.2011.01684.x
Meyer, S., Held, L. and Höhle, M. (2017): Spatio-temporal analysis of epidemic phenomena using the R package surveillance. Journal of Statistical Software, 77 (11), 1-55. doi:10.18637/jss.v077.i11
vignette("twinstim")
.
plot.epidataCS
for plotting, and
animate.epidataCS
for the animation of such an epidemic.
There is also an update
method for the
"epidataCS"
class.
To re-extract the events
point pattern from "epidataCS"
,
use as(object, "SpatialPointsDataFrame")
.
It is possible to convert an "epidataCS"
point pattern to
an "epidata"
object (as.epidata.epidataCS
),
or to aggregate the events into an "sts"
object
(epidataCS2sts
).
## load "imdepi" example data (which is an object of class "epidataCS") data("imdepi") ## print and summary print(imdepi, n=5, digits=2) print(s <- summary(imdepi)) plot(s$counter, # same as 'as.stepfun(imdepi)' xlab = "Time [days]", ylab="Number of infectious individuals", main=paste("Time course of the number of infectious individuals", "assuming an infectious period of 30 days", sep="\n")) plot(table(s$nSources), xlab="Number of \"close\" infective individuals", ylab="Number of events", main=paste("Distribution of the number of potential sources", "assuming an interaction range of 200 km and 30 days", sep="\n")) ## the summary object contains further information str(s) ## a histogram of the spatial distances to potential source events ## (i.e., to events of the previous eps.t=30 days within eps.s=200 km) sourceDists_space <- getSourceDists(imdepi, "space") hist(sourceDists_space); rug(sourceDists_space) ## internal structure of an "epidataCS"-object str(imdepi, max.level=4) ## see help("imdepi") for more info on the data set ## extraction methods subset the 'events' component imdepi[101:200,] head(imdepi, n=1) # only first event tail(imdepi, n=4) # only last 4 events subset(imdepi, type=="B") # only events of type B ## see help("plot.epidataCS") for convenient plot-methods for "epidataCS" ### ### reconstruct the "imdepi" object ### ## observation region load(system.file("shapes", "districtsD.RData", package="surveillance"), verbose = TRUE) ## extract point pattern of events from the "imdepi" data ## a) as a data frame with coordinate columns via marks() eventsData <- marks(imdepi) ## b) as a Spatial object via the coerce-method events <- as(imdepi, "SpatialPointsDataFrame") ## plot observation region with events plot(stateD, axes=TRUE); title(xlab="x [km]", ylab="y [km]") points(events, pch=unclass(events$type), cex=0.5, col=unclass(events$type)) legend("topright", legend=levels(events$type), title="Type", pch=1:2, col=1:2) summary(events) ## space-time grid with endemic covariates head(stgrid <- imdepi$stgrid[,-1]) ## reconstruct the "imdepi" object from its components myimdepi <- as.epidataCS(events = events, stgrid = stgrid, W = stateD, qmatrix = diag(2), nCircle2Poly = 16) ## This reconstructed object should be equal to 'imdepi' as long as the internal ## structures of the embedded classes ("owin", "SpatialPolygons", ...), and ## the calculation of the influence regions by "polyclip" have not changed: all.equal(imdepi, myimdepi)
## load "imdepi" example data (which is an object of class "epidataCS") data("imdepi") ## print and summary print(imdepi, n=5, digits=2) print(s <- summary(imdepi)) plot(s$counter, # same as 'as.stepfun(imdepi)' xlab = "Time [days]", ylab="Number of infectious individuals", main=paste("Time course of the number of infectious individuals", "assuming an infectious period of 30 days", sep="\n")) plot(table(s$nSources), xlab="Number of \"close\" infective individuals", ylab="Number of events", main=paste("Distribution of the number of potential sources", "assuming an interaction range of 200 km and 30 days", sep="\n")) ## the summary object contains further information str(s) ## a histogram of the spatial distances to potential source events ## (i.e., to events of the previous eps.t=30 days within eps.s=200 km) sourceDists_space <- getSourceDists(imdepi, "space") hist(sourceDists_space); rug(sourceDists_space) ## internal structure of an "epidataCS"-object str(imdepi, max.level=4) ## see help("imdepi") for more info on the data set ## extraction methods subset the 'events' component imdepi[101:200,] head(imdepi, n=1) # only first event tail(imdepi, n=4) # only last 4 events subset(imdepi, type=="B") # only events of type B ## see help("plot.epidataCS") for convenient plot-methods for "epidataCS" ### ### reconstruct the "imdepi" object ### ## observation region load(system.file("shapes", "districtsD.RData", package="surveillance"), verbose = TRUE) ## extract point pattern of events from the "imdepi" data ## a) as a data frame with coordinate columns via marks() eventsData <- marks(imdepi) ## b) as a Spatial object via the coerce-method events <- as(imdepi, "SpatialPointsDataFrame") ## plot observation region with events plot(stateD, axes=TRUE); title(xlab="x [km]", ylab="y [km]") points(events, pch=unclass(events$type), cex=0.5, col=unclass(events$type)) legend("topright", legend=levels(events$type), title="Type", pch=1:2, col=1:2) summary(events) ## space-time grid with endemic covariates head(stgrid <- imdepi$stgrid[,-1]) ## reconstruct the "imdepi" object from its components myimdepi <- as.epidataCS(events = events, stgrid = stgrid, W = stateD, qmatrix = diag(2), nCircle2Poly = 16) ## This reconstructed object should be equal to 'imdepi' as long as the internal ## structures of the embedded classes ("owin", "SpatialPolygons", ...), and ## the calculation of the influence regions by "polyclip" have not changed: all.equal(imdepi, myimdepi)
"epidataCS"
to "epidata"
or "sts"
Continuous-time continuous-space epidemic data stored in an object of
class "epidataCS"
can be aggregated in space or in space
and time yielding an object of class "epidata"
or
"sts"
for use of twinSIR
or
hhh4
modelling, respectively.
## aggregation in space and time over 'stgrid' for use of 'hhh4' models epidataCS2sts(object, freq, start, neighbourhood, tiles = NULL, popcol.stgrid = NULL, popdensity = TRUE) ## aggregation in space for use of 'twinSIR' models ## S3 method for class 'epidataCS' as.epidata(data, tileCentroids, eps = 0.001, ...)
## aggregation in space and time over 'stgrid' for use of 'hhh4' models epidataCS2sts(object, freq, start, neighbourhood, tiles = NULL, popcol.stgrid = NULL, popdensity = TRUE) ## aggregation in space for use of 'twinSIR' models ## S3 method for class 'epidataCS' as.epidata(data, tileCentroids, eps = 0.001, ...)
object , data
|
an object of class |
freq , start
|
see the description of the |
neighbourhood |
binary adjacency or neighbourhood-order matrix of the regions
( |
tiles |
object inheriting from |
popcol.stgrid |
single character or numeric value indexing the
column in |
popdensity |
logical indicating if the column referenced by
|
tileCentroids |
a coordinate matrix of the region centroids (i.e., the result of
|
eps |
numeric scalar for breaking tied removal and infection times between different
individuals (tiles), which might occur during conversion from
|
... |
unused (argument of the generic). |
Conversion to "sts"
only makes sense if the time
intervals (BLOCK
s) of the stgrid
are regularly spaced
(to give freq
intervals per year). Note that events of the
prehistory (not covered by stgrid
) are not included in the
resulting sts
object.
Some comments on the conversion to "epidata"
:
the conversion results into SIS epidemics only,
i.e. the at-risk indicator is set to 1 immediately after
recovery. A tile is considered infective if at least one individual
within the tile is infective, otherwise it is susceptible.
The lengths of the infectious periods are taken from
data$events$eps.t
. There will be no f
columns in the resulting
"epidata"
. These must be generated by a subsequent call to
as.epidata
with desired f
.
epidataCS2sts
: an object of class "sts"
representing the multivariate time-series of the number of
cases aggregated over stgrid
.
as.epidata.epidataCS
: an object of class
"epidata"
representing an SIS epidemic in form of a
multivariate point process (one for each region/tile
).
Sebastian Meyer
linkS4class{sts}
and hhh4
.
data("imdepi") load(system.file("shapes", "districtsD.RData", package="surveillance")) ## convert imdepi point pattern into multivariate time series imdsts <- epidataCS2sts(imdepi, freq = 12, start = c(2002, 1), neighbourhood = NULL, # not needed here tiles = districtsD) ## check the overall number of events by district stopifnot(all.equal(colSums(observed(imdsts)), c(table(imdepi$events$tile)))) ## compare plots of monthly number of cases opar <- par(mfrow = c(2, 1)) plot(imdepi, "time") plot(imdsts, type = observed ~ time) par(opar) ## plot number of cases by district in Bavaria (municipality keys 09xxx) imd09 <- imdsts[, grep("^09", colnames(imdsts), value = TRUE), drop = TRUE] plot(imd09, type = observed ~ unit) ## also test conversion to an SIS event history ("epidata") of the "tiles" if (requireNamespace("intervals")) { imdepi_short <- subset(imdepi, time < 50) # to reduce the runtime imdepi_short$stgrid <- subset(imdepi_short$stgrid, start < 50) imdepidata <- as.epidata(imdepi_short, tileCentroids = coordinates(districtsD)) summary(imdepidata) }
data("imdepi") load(system.file("shapes", "districtsD.RData", package="surveillance")) ## convert imdepi point pattern into multivariate time series imdsts <- epidataCS2sts(imdepi, freq = 12, start = c(2002, 1), neighbourhood = NULL, # not needed here tiles = districtsD) ## check the overall number of events by district stopifnot(all.equal(colSums(observed(imdsts)), c(table(imdepi$events$tile)))) ## compare plots of monthly number of cases opar <- par(mfrow = c(2, 1)) plot(imdepi, "time") plot(imdsts, type = observed ~ time) par(opar) ## plot number of cases by district in Bavaria (municipality keys 09xxx) imd09 <- imdsts[, grep("^09", colnames(imdsts), value = TRUE), drop = TRUE] plot(imd09, type = observed ~ unit) ## also test conversion to an SIS event history ("epidata") of the "tiles" if (requireNamespace("intervals")) { imdepi_short <- subset(imdepi, time < 50) # to reduce the runtime imdepi_short$stgrid <- subset(imdepi_short$stgrid, start < 50) imdepidata <- as.epidata(imdepi_short, tileCentroids = coordinates(districtsD)) summary(imdepidata) }
Function for the animation of continuous-time continuous-space
epidemic data, i.e. objects inheriting from class "epidataCS"
.
There are three types of animation, see argument time.spacing
.
Besides the on-screen plotting in the interactive R session, it is possible
and recommended to redirect the animation to an off-screen graphics
device using the contributed R package animation. For instance,
the animation can be watched and navigated in a web browser via
saveHTML
(see Examples).
## S3 method for class 'epidataCS' animate(object, interval = c(0,Inf), time.spacing = NULL, nmax = NULL, sleep = NULL, legend.opts = list(), timer.opts = list(), pch = 15:18, col.current = "red", col.I = "#C16E41", col.R = "#B3B3B3", col.influence = NULL, main = NULL, verbose = interactive(), ...)
## S3 method for class 'epidataCS' animate(object, interval = c(0,Inf), time.spacing = NULL, nmax = NULL, sleep = NULL, legend.opts = list(), timer.opts = list(), pch = 15:18, col.current = "red", col.I = "#C16E41", col.R = "#B3B3B3", col.influence = NULL, main = NULL, verbose = interactive(), ...)
object |
an object inheriting from class |
interval |
time range of the animation. |
time.spacing |
time interval for the animation steps. |
nmax |
maximum number of snapshots to generate. The default |
sleep |
numeric scalar specifying the artificial pause in seconds between two
time points (using |
pch , col
|
vectors of length equal to the number of event types specifying the point symbols and colors for events to plot (in this order). The vectors are recycled if necessary. |
legend.opts |
either a list of arguments passed to the |
timer.opts |
either a list of arguments passed to the
Note that the argument |
col.current |
color of events when occurring (new). |
col.I |
color once infectious. |
col.R |
color event has once “recovered”. If |
col.influence |
color with which the influence region is drawn. Use
|
main |
optional main title placed above the map. |
verbose |
logical specifying if a (textual) progress bar should
be shown during snapshot generation. This is especially useful if
the animation is produced within |
... |
further graphical parameters passed to the |
Sebastian Meyer with documentation contributions by Michael Höhle
plot.epidataCS
for plotting the numbers of events by time
(aggregated over space) or the locations of the events in the
observation region W
(aggregated over time).
The contributed R package animation.
data("imdepi") imdepiB <- subset(imdepi, type == "B") ## Not run: # Animate the first year of type B with a step size of 7 days animate(imdepiB, interval=c(0,365), time.spacing=7, nmax=Inf, sleep=0.1) # Sequential animation of type B events during the first year animate(imdepiB, interval=c(0,365), time.spacing=NULL, sleep=0.1) # Animate the whole time range but with nmax=20 snapshots only animate(imdepiB, time.spacing=NA, nmax=20, sleep=0.1) ## End(Not run) # Such an animation can be saved in various ways using the tools of # the animation package, e.g., saveHTML() if (interactive() && require("animation")) { oldwd <- setwd(tempdir()) # to not clutter up the current working dir saveHTML(animate(imdepiB, interval = c(0,365), time.spacing = 7), nmax = Inf, interval = 0.2, loop = FALSE, title = "Animation of the first year of type B events") setwd(oldwd) }
data("imdepi") imdepiB <- subset(imdepi, type == "B") ## Not run: # Animate the first year of type B with a step size of 7 days animate(imdepiB, interval=c(0,365), time.spacing=7, nmax=Inf, sleep=0.1) # Sequential animation of type B events during the first year animate(imdepiB, interval=c(0,365), time.spacing=NULL, sleep=0.1) # Animate the whole time range but with nmax=20 snapshots only animate(imdepiB, time.spacing=NA, nmax=20, sleep=0.1) ## End(Not run) # Such an animation can be saved in various ways using the tools of # the animation package, e.g., saveHTML() if (interactive() && require("animation")) { oldwd <- setwd(tempdir()) # to not clutter up the current working dir saveHTML(animate(imdepiB, interval = c(0,365), time.spacing = 7), nmax = Inf, interval = 0.2, loop = FALSE, title = "Animation of the first year of type B events") setwd(oldwd) }
"epidataCS"
Monte Carlo tests for space-time interaction (epitest
)
use the distribution of some test statistic under the null hypothesis
of no space-time interaction. For this purpose, the function
permute.epidataCS
randomly permutes the time or space labels of
the events.
permute.epidataCS(x, what = c("time", "space"), keep)
permute.epidataCS(x, what = c("time", "space"), keep)
x |
an object of class |
what |
character string determining what to permute: time points (default) or locations. |
keep |
optional logical expression to be evaluated in the context
of |
the permuted "epidataCS"
object.
Sebastian Meyer
data("imdepi") set.seed(3) permepi <- permute.epidataCS(imdepi, what = "time", keep = time <= 30) print(imdepi, n = 8) print(permepi, n = 8) ## the first 6 events are kept (as are all row.names), ## the time labels of the remaining events are shuffled ## (and events then again sorted by time), ## the marginal temporal distribution is unchanged
data("imdepi") set.seed(3) permepi <- permute.epidataCS(imdepi, what = "time", keep = time <= 30) print(imdepi, n = 8) print(permepi, n = 8) ## the first 6 events are kept (as are all row.names), ## the time labels of the remaining events are shuffled ## (and events then again sorted by time), ## the marginal temporal distribution is unchanged
The plot
method for class "epidataCS"
either plots the
number of events along the time axis (epidataCSplot_time
) as a
hist()
, or the locations of the events in the observation region
W
(epidataCSplot_space
).
The spatial plot can be enriched with tile-specific color levels to
indicate attributes such as the population (using spplot
).
## S3 method for class 'epidataCS' plot(x, aggregate = c("time", "space"), subset, by = type, ...) epidataCSplot_time(x, subset, by = type, t0.Date = NULL, breaks = "stgrid", freq = TRUE, col = rainbow(nTypes), cumulative = list(), add = FALSE, mar = NULL, xlim = NULL, ylim = NULL, xlab = "Time", ylab = NULL, main = NULL, panel.first = abline(h=axTicks(2), lty=2, col="grey"), legend.types = list(), ...) epidataCSplot_space(x, subset, by = type, tiles = x$W, pop = NULL, cex.fun = sqrt, points.args = list(), add = FALSE, legend.types = list(), legend.counts = list(), sp.layout = NULL, ...)
## S3 method for class 'epidataCS' plot(x, aggregate = c("time", "space"), subset, by = type, ...) epidataCSplot_time(x, subset, by = type, t0.Date = NULL, breaks = "stgrid", freq = TRUE, col = rainbow(nTypes), cumulative = list(), add = FALSE, mar = NULL, xlim = NULL, ylim = NULL, xlab = "Time", ylab = NULL, main = NULL, panel.first = abline(h=axTicks(2), lty=2, col="grey"), legend.types = list(), ...) epidataCSplot_space(x, subset, by = type, tiles = x$W, pop = NULL, cex.fun = sqrt, points.args = list(), add = FALSE, legend.types = list(), legend.counts = list(), sp.layout = NULL, ...)
x |
an object of class |
aggregate |
character, one of |
subset |
logical expression indicating a subset of events to consider for
plotting: missing values are taken as false. Note that the
expression is evaluated in the data frame of event marks
( |
... |
in the basic |
by |
an expression evaluated in |
t0.Date |
the beginning of the observation period
|
breaks |
a specification of the histogram break points, see
|
freq |
see |
col |
fill colour for the bars of the histogram, defaults to
the vector of |
cumulative |
if a list (of style options),
lines for the cumulative number of events (per type) will be
added to the plot. Possible options are |
add |
logical (default: |
mar |
see |
xlim , ylim
|
|
xlab , ylab
|
axis labels (with sensible defaults). |
main |
main title of the plot (defaults to no title). |
panel.first |
expression that should be evaluated after the plotting window has been set up but before the histogram is plotted. Defaults to adding horizontal grid lines. |
legend.types |
if a list (of arguments for |
tiles |
the observation region |
pop |
if |
cex.fun |
function which takes a vector of counts of events
at each unique location and returns a (vector of) |
points.args |
a list of (type-specific) graphical parameters
for |
legend.counts |
if a list (of arguments for
|
sp.layout |
optional list of additional layout items in case
|
For aggregate="time"
(i.e., epidataCSplot_time
) the data
of the histogram (as returned by hist
),
and for aggregate="space"
(i.e., epidataCSplot_space
)
NULL
, invisibly, or the trellis.object
generated by
spplot
(if pop
is non-NULL
).
Sebastian Meyer
data("imdepi") ## show the occurrence of events along time plot(imdepi, "time", main = "Histogram of event time points") plot(imdepi, "time", by = NULL, main = "Aggregated over both event types") ## show the distribution in space plot(imdepi, "space", lwd = 2, col = "lavender") ## with the district-specific population density in the background, ## a scale bar, and customized point style load(system.file("shapes", "districtsD.RData", package = "surveillance")) districtsD$log10popdens <- log10(districtsD$POPULATION/districtsD$AREA) keylabels <- (c(1,2,5) * rep(10^(1:3), each=3))[-1] plot(imdepi, "space", tiles = districtsD, pop = "log10popdens", ## modify point style for better visibility on gray background points.args = list(pch=c(1,3), col=c("orangered","blue"), lwd=2), ## metric scale bar, see proj4string(imdepi$W) sp.layout = layout.scalebar(imdepi$W, scale=100, labels=c("0","100 km")), ## gray scale for the population density and white borders col.regions = gray.colors(100, start=0.9, end=0.1), col = "white", ## color key is equidistant on log10(popdens) scale at = seq(1.3, 3.7, by=0.05), colorkey = list(labels=list(at=log10(keylabels), labels=keylabels), title=expression("Population density per " * km^2)))
data("imdepi") ## show the occurrence of events along time plot(imdepi, "time", main = "Histogram of event time points") plot(imdepi, "time", by = NULL, main = "Aggregated over both event types") ## show the distribution in space plot(imdepi, "space", lwd = 2, col = "lavender") ## with the district-specific population density in the background, ## a scale bar, and customized point style load(system.file("shapes", "districtsD.RData", package = "surveillance")) districtsD$log10popdens <- log10(districtsD$POPULATION/districtsD$AREA) keylabels <- (c(1,2,5) * rep(10^(1:3), each=3))[-1] plot(imdepi, "space", tiles = districtsD, pop = "log10popdens", ## modify point style for better visibility on gray background points.args = list(pch=c(1,3), col=c("orangered","blue"), lwd=2), ## metric scale bar, see proj4string(imdepi$W) sp.layout = layout.scalebar(imdepi$W, scale=100, labels=c("0","100 km")), ## gray scale for the population density and white borders col.regions = gray.colors(100, start=0.9, end=0.1), col = "white", ## color key is equidistant on log10(popdens) scale at = seq(1.3, 3.7, by=0.05), colorkey = list(labels=list(at=log10(keylabels), labels=keylabels), title=expression("Population density per " * km^2)))
"epidataCS"
The update
method for the "epidataCS"
class
may be used to modify the hyperparameters (
eps.t
)
and (
eps.s
), the indicator matrix qmatrix
determining
possible transmission between the event types, the numerical
accuracy nCircle2Poly
of the polygonal approximation, and
the endemic covariates from stgrid
(including the time intervals).
The update method will also update the auxiliary information contained
in an "epidataCS"
object accordingly, e.g., the vector of potential
sources of each event, the influence regions, or the endemic covariates
copied from the new stgrid
.
## S3 method for class 'epidataCS' update(object, eps.t, eps.s, qmatrix, nCircle2Poly, stgrid, ...)
## S3 method for class 'epidataCS' update(object, eps.t, eps.s, qmatrix, nCircle2Poly, stgrid, ...)
object |
an object of class |
eps.t |
numeric vector of length 1 or corresponding to the number of events in
|
eps.s |
numeric vector of length 1 or corresponding to the number of events in
|
qmatrix |
square indicator matrix (0/1 or TRUE/FALSE) for possible transmission between the event types. |
nCircle2Poly |
accuracy (number of edges) of the polygonal approximation of a circle. |
stgrid |
a new |
... |
unused (argument of the generic). |
The updated "epidataCS"
object.
Sebastian Meyer
class "epidataCS"
.
data("imdepi") ## assume different interaction ranges and simplify polygons imdepi2 <- update(imdepi, eps.t = 20, eps.s = Inf, nCircle2Poly = 16) (s <- summary(imdepi)) (s2 <- summary(imdepi2)) ## The update reduced the number of infectives (along time) ## because the length of the infectious periods is reduced. It also ## changed the set of potential sources of transmission for each ## event, since the interaction is shorter in time but wider in space ## (eps.s=Inf means interaction over the whole observation region). ## use a time-constant grid imdepi3 <- update(imdepi, stgrid = subset(imdepi$stgrid, BLOCK == 1, -stop)) (s3 <- summary(imdepi3)) # "1 time block"
data("imdepi") ## assume different interaction ranges and simplify polygons imdepi2 <- update(imdepi, eps.t = 20, eps.s = Inf, nCircle2Poly = 16) (s <- summary(imdepi)) (s2 <- summary(imdepi2)) ## The update reduced the number of infectives (along time) ## because the length of the infectious periods is reduced. It also ## changed the set of potential sources of transmission for each ## event, since the interaction is shorter in time but wider in space ## (eps.s=Inf means interaction over the whole observation region). ## use a time-constant grid imdepi3 <- update(imdepi, stgrid = subset(imdepi$stgrid, BLOCK == 1, -stop)) (s3 <- summary(imdepi3)) # "1 time block"
The fanplot()
function in surveillance wraps functionality of
the dedicated fanplot package, employing a different default style
and optionally adding point predictions and observed values.
fanplot(quantiles, probs, means = NULL, observed = NULL, start = 1, fan.args = list(), means.args = list(), observed.args = list(), key.args = NULL, xlim = NULL, ylim = NULL, log = "", xlab = "Time", ylab = "No. infected", add = FALSE, ...)
fanplot(quantiles, probs, means = NULL, observed = NULL, start = 1, fan.args = list(), means.args = list(), observed.args = list(), key.args = NULL, xlim = NULL, ylim = NULL, log = "", xlab = "Time", ylab = "No. infected", add = FALSE, ...)
quantiles |
a time x |
probs |
numeric vector of probabilities with values between 0 and 1. |
means |
(optional) numeric vector of point forecasts. |
observed |
(optional) numeric vector of observed values. |
start |
time index (x-coordinate) of the first prediction. |
fan.args |
a list of graphical parameters for the |
means.args |
a list of graphical parameters for |
observed.args |
a list of graphical parameters for |
key.args |
if a list, a color key (in |
xlim , ylim
|
axis ranges. |
log |
a character string specifying which axes are to be logarithmic,
e.g., |
xlab , ylab
|
axis labels. |
add |
logical indicating if the fan plot should be added to an existing plot. |
... |
further arguments are passed to |
NULL
(invisibly), with the side effect of drawing a fan chart.
Sebastian Meyer
the underlying fan
function in package
fanplot.
The function is used in plot.oneStepAhead
and
plot.hhh4sims
.
## artificial data example to illustrate the graphical options if (requireNamespace("fanplot")) { means <- c(18, 19, 20, 25, 26, 35, 34, 25, 19) y <- rlnorm(length(means), log(means), 0.5) quantiles <- sapply(1:99/100, qlnorm, log(means), seq(.5,.8,length.out=length(means))) ## default style with point predictions, color key and log-scale fanplot(quantiles = quantiles, probs = 1:99/100, means = means, observed = y, key.args = list(start = 1, space = .3), log = "y") ## with contour lines instead of a key, and different colors pal <- colorRampPalette(c("darkgreen", "gray93")) fanplot(quantiles = quantiles, probs = 1:99/100, observed = y, fan.args = list(fan.col = pal, ln = c(5,10,25,50,75,90,95)/100), observed.args = list(type = "b", pch = 19)) }
## artificial data example to illustrate the graphical options if (requireNamespace("fanplot")) { means <- c(18, 19, 20, 25, 26, 35, 34, 25, 19) y <- rlnorm(length(means), log(means), 0.5) quantiles <- sapply(1:99/100, qlnorm, log(means), seq(.5,.8,length.out=length(means))) ## default style with point predictions, color key and log-scale fanplot(quantiles = quantiles, probs = 1:99/100, means = means, observed = y, key.args = list(start = 1, space = .3), log = "y") ## with contour lines instead of a key, and different colors pal <- colorRampPalette(c("darkgreen", "gray93")) fanplot(quantiles = quantiles, probs = 1:99/100, observed = y, fan.args = list(fan.col = pal, ln = c(5,10,25,50,75,90,95)/100), observed.args = list(type = "b", pch = 19)) }
The function takes range
values of the surveillance time
series sts
and for each time point uses a Poisson GLM with overdispersion to
predict an upper bound on the number of counts according to the procedure by
Farrington et al. (1996) and by Noufaily et al. (2012). This bound is then compared to the observed
number of counts. If the observation is above the bound, then an alarm is raised.
The implementation is illustrated in Salmon et al. (2016).
farringtonFlexible(sts, control = list( range = NULL, b = 5, w = 3, reweight = TRUE, weightsThreshold = 2.58, verbose = FALSE, glmWarnings = TRUE, alpha = 0.05, trend = TRUE, pThresholdTrend = 0.05, limit54 = c(5,4), powertrans = "2/3", fitFun = "algo.farrington.fitGLM.flexible", populationOffset = FALSE, noPeriods = 1, pastWeeksNotIncluded = NULL, thresholdMethod = "delta"))
farringtonFlexible(sts, control = list( range = NULL, b = 5, w = 3, reweight = TRUE, weightsThreshold = 2.58, verbose = FALSE, glmWarnings = TRUE, alpha = 0.05, trend = TRUE, pThresholdTrend = 0.05, limit54 = c(5,4), powertrans = "2/3", fitFun = "algo.farrington.fitGLM.flexible", populationOffset = FALSE, noPeriods = 1, pastWeeksNotIncluded = NULL, thresholdMethod = "delta"))
sts |
object of class |
control |
Control object given as a
|
The following steps are performed according to the Farrington et al. (1996) paper.
Fit of the initial model with intercept, time trend if trend
is TRUE
,
seasonal factor variable if noPeriod
is bigger than 1, and population offset if
populationOffset
is TRUE
. Initial estimation of mean and
overdispersion.
Calculation of the weights omega (correction for past outbreaks) if reweighting
is TRUE
.
The threshold for reweighting is defined in control
.
Refitting of the model
Revised estimation of overdispersion
Omission of the trend, if it is not significant
Repetition of the whole procedure
Calculation of the threshold value using the model to compute a quantile of the predictive distribution.
The method used depends on thresholdMethod
, this can either be:
One assumes that the prediction error (or a transformation of the prediction
error, depending on powertrans
), is normally distributed. The threshold is deduced from a quantile of
this normal distribution using the variance and estimate of the expected
count given by GLM, and the delta rule. The procedure takes into account both the estimation error (variance of the estimator
of the expected count in the GLM) and the prediction error (variance of the prediction error). This is the suggestion
in Farrington et al. (1996).
One assumes that the new count follows a negative binomial distribution parameterized by the expected count and the overdispersion estimated in the GLM. The threshold is deduced from a quantile of this discrete distribution. This process disregards the estimation error, though. This method was used in Noufaily, et al. (2012).
One also uses the assumption of the negative binomial sampling distribution but does not plug in the estimate of the expected count from the GLM, instead one uses a quantile from the asymptotic normal distribution of the expected count estimated in the GLM; in order to take into account both the estimation error and the prediction error.
Computation of exceedance score
Warning: monthly data containing the last day of each month as date should be analysed with epochAsDate=FALSE
in the sts
object. Otherwise February makes it impossible to find some reference time points.
An object of class sts
with the slots upperbound
and alarm
filled by appropriate output of the algorithm.
The control
slot of the input sts
is amended with the
following matrix elements, all with length(range)
rows:
Booleans indicating whether a time trend was fitted for this time point.
coefficient of the time trend in the GLM for this time point. If no trend was fitted it is equal to NA.
probability of observing a value at least equal to the observation under the null hypothesis .
expectation of the predictive distribution for each timepoint. It is only reported if the conditions for raising an alarm are met (enough cases).
input for the negative binomial distribution to get the upperbound as a quantile (either a plug-in from the GLM or a quantile from the asymptotic normal distribution of the estimator)
overdispersion of the GLM at each timepoint.
M. Salmon, M. Höhle
Farrington, C.P., Andrews, N.J, Beale A.D. and Catchpole, M.A. (1996): A statistical algorithm for the early detection of outbreaks of infectious disease. J. R. Statist. Soc. A, 159, 547-563.
Noufaily, A., Enki, D.G., Farrington, C.P., Garthwaite, P., Andrews, N.J., Charlett, A. (2012): An improved algorithm for outbreak detection in multiple surveillance systems. Statistics in Medicine, 32 (7), 1206-1222.
Salmon, M., Schumacher, D. and Höhle, M. (2016): Monitoring count time series in R: Aberration detection in public health surveillance. Journal of Statistical Software, 70 (10), 1-35. doi:10.18637/jss.v070.i10
algo.farrington.fitGLM
,algo.farrington.threshold
data("salmonella.agona") # Create the corresponding sts object from the old disProg object salm <- disProg2sts(salmonella.agona) ### RUN THE ALGORITHMS WITH TWO DIFFERENT SETS OF OPTIONS control1 <- list(range=282:312, noPeriods=1, b=4, w=3, weightsThreshold=1, pastWeeksNotIncluded=3, pThresholdTrend=0.05, alpha=0.1) control2 <- list(range=282:312, noPeriods=10, b=4, w=3, weightsThreshold=2.58, pastWeeksNotIncluded=26, pThresholdTrend=1, alpha=0.1) salm1 <- farringtonFlexible(salm,control=control1) salm2 <- farringtonFlexible(salm,control=control2) ### PLOT THE RESULTS y.max <- max(upperbound(salm1),observed(salm1),upperbound(salm2),na.rm=TRUE) plot(salm1, ylim=c(0,y.max), main='S. Newport in Germany', legend.opts=NULL) lines(1:(nrow(salm1)+1)-0.5, c(upperbound(salm1),upperbound(salm1)[nrow(salm1)]), type="s",col='tomato4',lwd=2) lines(1:(nrow(salm2)+1)-0.5, c(upperbound(salm2),upperbound(salm2)[nrow(salm2)]), type="s",col="blueviolet",lwd=2) legend("topleft", legend=c('Alarm','Upperbound with old options', 'Upperbound with new options'), pch=c(24,NA,NA),lty=c(NA,1,1), bg="white",lwd=c(2,2,2),col=c('red','tomato4',"blueviolet"))
data("salmonella.agona") # Create the corresponding sts object from the old disProg object salm <- disProg2sts(salmonella.agona) ### RUN THE ALGORITHMS WITH TWO DIFFERENT SETS OF OPTIONS control1 <- list(range=282:312, noPeriods=1, b=4, w=3, weightsThreshold=1, pastWeeksNotIncluded=3, pThresholdTrend=0.05, alpha=0.1) control2 <- list(range=282:312, noPeriods=10, b=4, w=3, weightsThreshold=2.58, pastWeeksNotIncluded=26, pThresholdTrend=1, alpha=0.1) salm1 <- farringtonFlexible(salm,control=control1) salm2 <- farringtonFlexible(salm,control=control2) ### PLOT THE RESULTS y.max <- max(upperbound(salm1),observed(salm1),upperbound(salm2),na.rm=TRUE) plot(salm1, ylim=c(0,y.max), main='S. Newport in Germany', legend.opts=NULL) lines(1:(nrow(salm1)+1)-0.5, c(upperbound(salm1),upperbound(salm1)[nrow(salm1)]), type="s",col='tomato4',lwd=2) lines(1:(nrow(salm2)+1)-0.5, c(upperbound(salm2),upperbound(salm2)[nrow(salm2)]), type="s",col="blueviolet",lwd=2) legend("topleft", legend=c('Alarm','Upperbound with old options', 'Upperbound with new options'), pch=c(24,NA,NA),lty=c(NA,1,1), bg="white",lwd=c(2,2,2),col=c('red','tomato4',"blueviolet"))
Given a specification of the average run length in the (a)cceptance and (r)ejected setting determine the k and h values in a standard normal setting.
find.kh(ARLa = 500, ARLr = 7, sided = "one", method = "BFGS", verbose=FALSE)
find.kh(ARLa = 500, ARLr = 7, sided = "one", method = "BFGS", verbose=FALSE)
ARLa |
average run length in acceptance setting, aka. in control state. Specifies the number of observations before false alarm. |
ARLr |
average run length in rejection state, aka. out of control state. Specifies the number of observations before an increase is detected (i.e. detection delay) |
sided |
one-sided cusum scheme |
method |
Which method to use in the function |
verbose |
gives extra information about the root finding process |
Functions from the spc package are used in a simple univariate root finding problem.
Returns a list with reference value k and decision interval h.
if (requireNamespace("spc")) { find.kh(ARLa=500,ARLr=7,sided="one") find.kh(ARLa=500,ARLr=3,sided="one") }
if (requireNamespace("spc")) { find.kh(ARLa=500,ARLr=7,sided="one") find.kh(ARLa=500,ARLr=3,sided="one") }
Function to find a decision interval h
* for given reference value k
and desired ARL so that the
average run length for a Poisson or Binomial CUSUM with in-control
parameter
, reference value
k
and is approximately ,
i.e.
,
or larger, i.e.
.
findH(ARL0, theta0, s = 1, rel.tol = 0.03, roundK = TRUE, distr = c("poisson", "binomial"), digits = 1, FIR = FALSE, ...) hValues(theta0, ARL0, rel.tol=0.02, s = 1, roundK = TRUE, digits = 1, distr = c("poisson", "binomial"), FIR = FALSE, ...)
findH(ARL0, theta0, s = 1, rel.tol = 0.03, roundK = TRUE, distr = c("poisson", "binomial"), digits = 1, FIR = FALSE, ...) hValues(theta0, ARL0, rel.tol=0.02, s = 1, roundK = TRUE, digits = 1, distr = c("poisson", "binomial"), FIR = FALSE, ...)
ARL0 |
desired in-control ARL |
theta0 |
in-control parameter |
s |
change to detect, see details |
distr |
|
rel.tol |
relative tolerance, i.e. the search for |
digits |
the reference value |
roundK |
passed to |
FIR |
if |
... |
further arguments for the distribution function, i.e. number
of trials |
The out-of-control parameter used to determine the reference value k
is specified as:
for a Poisson variate
for a Binomial variate
findH
returns a vector and hValues
returns a matrix with elements
theta0 |
in-control parameter |
h |
decision interval |
k |
reference value |
ARL |
ARL for a CUSUM with parameters |
rel.tol |
corresponds to |
Calculates the reference value k
for a Poisson or binomial CUSUM
designed to detect a shift from to
findK(theta0, theta1, distr = c("poisson", "binomial"), roundK = FALSE, digits = 1, ...)
findK(theta0, theta1, distr = c("poisson", "binomial"), roundK = FALSE, digits = 1, ...)
theta0 |
in-control parameter |
theta1 |
out-of-control parameter |
distr |
|
digits |
the reference value |
roundK |
For discrete data and rational reference value there is only
a limited set of possible values that the CUSUM can take (and
therefore there is also only a limited set of ARLs).
If |
... |
further arguments for the distribution function, i.e. number of
trials |
Returns reference value k
.
Weekly number of influenza A & B cases in the 140 districts of the two Southern German states Bavaria and Baden-Wuerttemberg, for the years 2001 to 2008. These surveillance data have been analyzed originally by Paul and Held (2011) and more recently by Meyer and Held (2014).
data(fluBYBW)
data(fluBYBW)
An sts
object containing
observations starting from week 1 in 2001.
The population
slot contains the population fractions
of each district at 31.12.2001, obtained from the Federal Statistical
Office of Germany.
The map
slot contains an object of class
"SpatialPolygonsDataFrame"
.
Prior to surveillance version 1.6-0, data(fluBYBW)
contained a redundant last row (417) filled with zeroes only.
Robert Koch-Institut: SurvStat: https://survstat.rki.de/; Queried on 6 March 2009.
Paul, M. and Held, L. (2011) Predictive assessment of a non-linear random effects model for multivariate time series of infectious disease counts. Statistics in Medicine, 30, 1118-1136.
Meyer, S. and Held, L. (2014): Power-law models for infectious disease spread. The Annals of Applied Statistics, 8 (3), 1612-1639. doi:10.1214/14-AOAS743
data("fluBYBW") # Count time series plot plot(fluBYBW, type = observed ~ time) # Map of disease incidence (per 100000 inhabitants) for the year 2001 plot(fluBYBW, type = observed ~ unit, tps = 1:52, total.args = list(), population = fluBYBW@map$X31_12_01 / 100000) # the overall rate for 2001 shown in the bottom right corner is sum(observed(fluBYBW[1:52,])) / sum(fluBYBW@map$X31_12_01) * 100000 ## Not run: # Generating an animation takes a while. # Here we take the first 20 weeks of 2001 (runtime: ~3 minutes). # The full animation is available in Supplement A of Meyer and Held (2014) if (require("animation")) { oldwd <- setwd(tempdir()) # to not clutter up the current working dir saveHTML(animate(fluBYBW, tps = 1:20), title="Evolution of influenza in Bayern and Baden-Wuerttemberg", ani.width=500, ani.height=600) setwd(oldwd) } ## End(Not run)
data("fluBYBW") # Count time series plot plot(fluBYBW, type = observed ~ time) # Map of disease incidence (per 100000 inhabitants) for the year 2001 plot(fluBYBW, type = observed ~ unit, tps = 1:52, total.args = list(), population = fluBYBW@map$X31_12_01 / 100000) # the overall rate for 2001 shown in the bottom right corner is sum(observed(fluBYBW[1:52,])) / sum(fluBYBW@map$X31_12_01) * 100000 ## Not run: # Generating an animation takes a while. # Here we take the first 20 weeks of 2001 (runtime: ~3 minutes). # The full animation is available in Supplement A of Meyer and Held (2014) if (require("animation")) { oldwd <- setwd(tempdir()) # to not clutter up the current working dir saveHTML(animate(fluBYBW, tps = 1:20), title="Evolution of influenza in Bayern and Baden-Wuerttemberg", ani.width=500, ani.height=600) setwd(oldwd) } ## End(Not run)
An extension of format.Date
with additional formatting
strings for quarters. Used by linelist2sts
.
formatDate(x, format)
formatDate(x, format)
x |
a |
format |
a character string, see
|
a character vector representing the input date(s) x
following the format
specification.
formatDate(as.Date("2021-10-13"), "%G/%OQ/%q")
formatDate(as.Date("2021-10-13"), "%G/%OQ/%q")
Just yapf – yet another p-value formatter...
It is a wrapper around format.pval
,
such that by default eps = 1e-4
, scientific = FALSE
,
digits = if (p<10*eps) 1 else 2
, and nsmall = 2
.
formatPval(pv, eps = 1e-4, scientific = FALSE, ...)
formatPval(pv, eps = 1e-4, scientific = FALSE, ...)
pv |
a numeric vector (of p-values). |
eps |
a numerical tolerance, see |
scientific |
see |
... |
further arguments passed to |
The character vector of formatted p-values.
formatPval(c(0.9, 0.13567, 0.0432, 0.000546, 1e-8))
formatPval(c(0.9, 0.13567, 0.0432, 0.000546, 1e-8))
twinstim
as a Poisson-glm
An endemic-only twinstim
is equivalent to a Poisson
regression model for the aggregated number of events,
, by time-space-type cell. The rate of the
corresponding Poisson distribution is
,
where
is a multiplicative
offset. Thus, the
glm
function can be used to fit
an endemic-only twinstim
. However, wrapping in glm
is
usually slower.
glm_epidataCS(formula, data, ...)
glm_epidataCS(formula, data, ...)
formula |
an endemic model formula without response, comprising variables of
|
data |
an object of class |
... |
arguments passed to |
a glm
Sebastian Meyer
data("imdepi", "imdepifit") ## Fit an endemic-only twinstim() and an equivalent model wrapped in glm() fit_twinstim <- update(imdepifit, epidemic = ~0, siaf = NULL, subset = NULL, optim.args=list(control=list(trace=0)), verbose=FALSE) fit_glm <- glm_epidataCS(formula(fit_twinstim)$endemic, data = imdepi) ## Compare the coefficients cbind(twinstim = coef(fit_twinstim), glm = coef(fit_glm)) ### also compare to an equivalent endemic-only hhh4() fit ## first need to aggregate imdepi into an "sts" object load(system.file("shapes", "districtsD.RData", package="surveillance")) imdsts <- epidataCS2sts(imdepi, freq = 12, start = c(2002, 1), neighbourhood = NULL, tiles = districtsD, popcol.stgrid = "popdensity") ## determine the correct offset to get an equivalent model offset <- 2 * rep(with(subset(imdepi$stgrid, !duplicated(BLOCK)), stop - start), ncol(imdsts)) * sum(districtsD$POPULATION) * population(imdsts) ## fit the model using hhh4() fit_hhh4 <- hhh4(imdsts, control = list( end = list( f = addSeason2formula(~I(start/365-3.5), period=365, timevar="start"), offset = offset ), family = "Poisson", subset = 1:nrow(imdsts), data = list(start=with(subset(imdepi$stgrid, !duplicated(BLOCK)), start)))) summary(fit_hhh4) stopifnot(all.equal(coef(fit_hhh4), coef(fit_glm), check.attributes=FALSE))
data("imdepi", "imdepifit") ## Fit an endemic-only twinstim() and an equivalent model wrapped in glm() fit_twinstim <- update(imdepifit, epidemic = ~0, siaf = NULL, subset = NULL, optim.args=list(control=list(trace=0)), verbose=FALSE) fit_glm <- glm_epidataCS(formula(fit_twinstim)$endemic, data = imdepi) ## Compare the coefficients cbind(twinstim = coef(fit_twinstim), glm = coef(fit_glm)) ### also compare to an equivalent endemic-only hhh4() fit ## first need to aggregate imdepi into an "sts" object load(system.file("shapes", "districtsD.RData", package="surveillance")) imdsts <- epidataCS2sts(imdepi, freq = 12, start = c(2002, 1), neighbourhood = NULL, tiles = districtsD, popcol.stgrid = "popdensity") ## determine the correct offset to get an equivalent model offset <- 2 * rep(with(subset(imdepi$stgrid, !duplicated(BLOCK)), stop - start), ncol(imdsts)) * sum(districtsD$POPULATION) * population(imdsts) ## fit the model using hhh4() fit_hhh4 <- hhh4(imdsts, control = list( end = list( f = addSeason2formula(~I(start/365-3.5), period=365, timevar="start"), offset = offset ), family = "Poisson", subset = 1:nrow(imdsts), data = list(start=with(subset(imdepi$stgrid, !duplicated(BLOCK)), start)))) summary(fit_hhh4) stopifnot(all.equal(coef(fit_hhh4), coef(fit_glm), check.attributes=FALSE))
Number of Hepatitis A cases among adult (age>18) males in Berlin, 2001-2006. An increase is seen during 2006.
data("ha") data("ha.sts")
data("ha") data("ha.sts")
ha
is a disProg
object containing
observations starting from week 1 in 2001 to week 30 in 2006.
ha.sts
was generated from ha
via the converter function
disProg2sts
and includes a map of Berlin's districts.
Robert Koch-Institut: SurvStat: https://survstat.rki.de/; Queried on 25 August 2006.
Robert Koch Institut, Epidemiologisches Bulletin 33/2006, p.290.
## deprecated "disProg" object data("ha") ha plot(aggregate(ha)) ## new-style "sts" object data("ha.sts") ha.sts plot(ha.sts, type = observed ~ time) # = plot(aggregate(ha.sts, by = "unit")) plot(ha.sts, type = observed ~ unit, labels = TRUE)
## deprecated "disProg" object data("ha") ha plot(aggregate(ha)) ## new-style "sts" object data("ha.sts") ha.sts plot(ha.sts, type = observed ~ time) # = plot(aggregate(ha.sts, by = "unit")) plot(ha.sts, type = observed ~ unit, labels = TRUE)
Data on the 188 cases in the measles outbreak among children in the
German city of Hagelloch (near Tübingen) 1861. The data were
originally collected by Dr. Albert Pfeilsticker (1863) and augmented and
re-analysed by Dr. Heike Oesterle (1992).
This dataset is used to illustrate the twinSIR
model class in
vignette("twinSIR")
.
data("hagelloch")
data("hagelloch")
Loading data("hagelloch")
gives two objects:
hagelloch
and hagelloch.df
.
The latter is the original data.frame
of 188 rows
with individual information for each infected child.
hagelloch
has been generated from hagelloch.df
via as.epidata
(see the Examples below) to obtain an
"epidata"
object for use with twinSIR
.
It contains the entire SIR event history of the outbreak
(but not all of the covariates).
The covariate information in hagelloch.df
is as follows:
patient number
patient name (as a factor)
family index
house number
age in years
gender of the individual (factor: male, female)
Date
of prodromes
Date
of rash
class (factor: preschool, 1st class, 2nd class)
Date
of death (with missings)
number of patient who is the putative source of infection (0 = unknown)
serial interval = number of days between dates of prodromes of infection source and infected person
complications (factor: no complications, bronchopneumonia, severe bronchitis, lobar pneumonia, pseudocroup, cerebral edema)
duration of prodromes in days
number of cases in family
number of initial cases
generation number of the case
day of max. fever (days after rush)
max. fever (degree Celsius)
x coordinate of house (in meters). Scaling in metres is obtained by multiplying the original coordinates by 2.5 (see details in Neal and Roberts (2004))
y coordinate of house (in meters). See also the above
description of x.loc
.
Time of prodromes (first symptoms) in days after the start of the epidemic (30 Oct 1861).
Time upon which the rash first appears.
Time of death, if available.
Time at which the infectious period of the individual is assumed to end. This unknown time is calculated as
where – as in Section 3.1 of Neal and Roberts (2004) – we use
.
Time at which the individual is assumed to become infectious. Actually this time is unknown, but we use
where as in Neal and Roberts (2004).
The time variables describe the transitions of the individual in an Susceptible-Infectious-Recovered (SIR) model. Note that in order to avoid ties in the event times resulting from daily interval censoring, the times have been jittered uniformly within the respective day. The time point 0.5 would correspond to noon of 30 Oct 1861.
The hagelloch
"epidata"
object only retains some of
the above covariates to save space. Apart from the usual
"epidata"
event columns, hagelloch
contains a number of
extra variables representing distance- and covariate-based weights for
the force of infection:
the number of currently infectious children in the same household (including the child itself if it is currently infectious).
the number of currently infectious children outside the household.
the number of children infectious during the respective time block and being members of class 1 and 2, respectively; but the value is 0 if the individual of the row is not herself a member of the respective class.
Such epidemic covariates can been computed by specifying suitable
f
and w
arguments in as.epidata
at
conversion (see the code below), or at a later step via the
update
-method for "epidata"
.
Thanks to Peter J. Neal, University of Manchester, for providing us with these data, which he again became from Niels Becker, Australian National University. To cite the data, the main references are Pfeilsticker (1863) and Oesterle (1992).
Pfeilsticker, A. (1863). Beiträge zur Pathologie der Masern mit besonderer Berücksichtigung der statistischen Verhältnisse, M.D. Thesis, Eberhard-Karls-Universität Tübingen. Available as https://archive.org/details/beitrgezurpatho00pfeigoog.
Oesterle, H. (1992). Statistische Reanalyse einer Masernepidemie 1861 in Hagelloch, M.D. Thesis, Eberhard-Karls-Universitäat Tübingen.
Neal, P. J. and Roberts, G. O (2004). Statistical inference and model selection for the 1861 Hagelloch measles epidemic, Biostatistics 5(2):249-261
data class: epidata
point process model: twinSIR
illustration with hagelloch
: vignette("twinSIR")
data("hagelloch") head(hagelloch.df) # original data documented in Oesterle (1992) head(as.data.frame(hagelloch)) # "epidata" event history format ## How the "epidata" 'hagelloch' was created from 'hagelloch.df' stopifnot(all.equal(hagelloch, as.epidata( hagelloch.df, t0 = 0, tI.col = "tI", tR.col = "tR", id.col = "PN", coords.cols = c("x.loc", "y.loc"), f = list( household = function(u) u == 0, nothousehold = function(u) u > 0 ), w = list( c1 = function (CL.i, CL.j) CL.i == "1st class" & CL.j == CL.i, c2 = function (CL.i, CL.j) CL.i == "2nd class" & CL.j == CL.i ), keep.cols = c("SEX", "AGE", "CL")) )) ### Basic plots produced from hagelloch.df # Show case locations as in Neal & Roberts (different scaling) using # the data.frame (promoted to a SpatialPointsDataFrame) coordinates(hagelloch.df) <- c("x.loc","y.loc") plot(hagelloch.df, xlab="x [m]", ylab="x [m]", pch=15, axes=TRUE, cex=sqrt(multiplicity(hagelloch.df))) # Epicurve hist(as.numeric(hagelloch.df$tI), xlab="Time (days)", ylab="Cases", main="") ### "epidata" summary and plot methods (s <- summary(hagelloch)) head(s$byID) plot(s) ## Not run: # Show a dynamic illustration of the spread of the infection animate(hagelloch, time.spacing=0.1, sleep=1/100, legend.opts=list(x="topleft")) ## End(Not run)
data("hagelloch") head(hagelloch.df) # original data documented in Oesterle (1992) head(as.data.frame(hagelloch)) # "epidata" event history format ## How the "epidata" 'hagelloch' was created from 'hagelloch.df' stopifnot(all.equal(hagelloch, as.epidata( hagelloch.df, t0 = 0, tI.col = "tI", tR.col = "tR", id.col = "PN", coords.cols = c("x.loc", "y.loc"), f = list( household = function(u) u == 0, nothousehold = function(u) u > 0 ), w = list( c1 = function (CL.i, CL.j) CL.i == "1st class" & CL.j == CL.i, c2 = function (CL.i, CL.j) CL.i == "2nd class" & CL.j == CL.i ), keep.cols = c("SEX", "AGE", "CL")) )) ### Basic plots produced from hagelloch.df # Show case locations as in Neal & Roberts (different scaling) using # the data.frame (promoted to a SpatialPointsDataFrame) coordinates(hagelloch.df) <- c("x.loc","y.loc") plot(hagelloch.df, xlab="x [m]", ylab="x [m]", pch=15, axes=TRUE, cex=sqrt(multiplicity(hagelloch.df))) # Epicurve hist(as.numeric(hagelloch.df$tI), xlab="Time (days)", ylab="Cases", main="") ### "epidata" summary and plot methods (s <- summary(hagelloch)) head(s$byID) plot(s) ## Not run: # Show a dynamic illustration of the spread of the infection animate(hagelloch, time.spacing=0.1, sleep=1/100, legend.opts=list(x="topleft")) ## End(Not run)
Weekly number of reported hepatitis A infections in Germany 2001-2004.
data(hepatitisA)
data(hepatitisA)
A disProg
object containing
observations starting from week 1 in 2001 to week 52 in 2004.
Robert Koch-Institut: SurvStat: https://survstat.rki.de/; Queried on 11-01-2005.
data(hepatitisA) plot(hepatitisA)
data(hepatitisA) plot(hepatitisA)
Fits an autoregressive Poisson or negative binomial model
to a univariate or multivariate time series of counts.
The characteristic feature of hhh4
models is the additive
decomposition of the conditional mean into epidemic and
endemic components (Held et al, 2005).
Log-linear predictors of covariates and random intercepts are allowed
in all components; see the Details below.
A general introduction to the hhh4
modelling approach and its
implementation is given in the vignette("hhh4")
. Meyer et al
(2017, Section 5, available as vignette("hhh4_spacetime")
)
describe hhh4
models for areal time series of infectious
disease counts.
hhh4(stsObj, control = list( ar = list(f = ~ -1, offset = 1, lag = 1), ne = list(f = ~ -1, offset = 1, lag = 1, weights = neighbourhood(stsObj) == 1, scale = NULL, normalize = FALSE), end = list(f = ~ 1, offset = 1), family = c("Poisson", "NegBin1", "NegBinM"), subset = 2:nrow(stsObj), optimizer = list(stop = list(tol=1e-5, niter=100), regression = list(method="nlminb"), variance = list(method="nlminb")), verbose = FALSE, start = list(fixed=NULL, random=NULL, sd.corr=NULL), data = list(t = stsObj@epoch - min(stsObj@epoch)), keep.terms = FALSE ), check.analyticals = FALSE)
hhh4(stsObj, control = list( ar = list(f = ~ -1, offset = 1, lag = 1), ne = list(f = ~ -1, offset = 1, lag = 1, weights = neighbourhood(stsObj) == 1, scale = NULL, normalize = FALSE), end = list(f = ~ 1, offset = 1), family = c("Poisson", "NegBin1", "NegBinM"), subset = 2:nrow(stsObj), optimizer = list(stop = list(tol=1e-5, niter=100), regression = list(method="nlminb"), variance = list(method="nlminb")), verbose = FALSE, start = list(fixed=NULL, random=NULL, sd.corr=NULL), data = list(t = stsObj@epoch - min(stsObj@epoch)), keep.terms = FALSE ), check.analyticals = FALSE)
stsObj |
object of class |
control |
a list containing the model specification and control arguments:
The auxiliary function |
check.analyticals |
logical (or a subset of
|
An endemic-epidemic multivariate time-series model for infectious
disease counts from units
during
periods
was proposed by Held et al (2005) and was
later extended in a series of papers (Paul et al, 2008; Paul and Held,
2011; Held and Paul, 2012; Meyer and Held, 2014).
In its most general formulation, this so-called
hhh4
(or HHH or
or triple-H) model assumes that, conditional on past
observations,
has a Poisson or negative binomial
distribution with mean
In the case of a negative binomial model, the conditional
variance is
with overdispersion parameters
(possibly shared
across different units, e.g.,
).
Univariate time series of counts
are supported as well, in
which case
hhh4
can be regarded as an extension of
glm.nb
to account for autoregression.
See the Examples below for a comparison of an endemic-only
hhh4
model with a corresponding glm.nb
.
The three unknown quantities of the mean ,
in the autoregressive (
ar
) component,
in the neighbour-driven (
ne
) component, and
in the endemic (
end
) component,
are log-linear predictors incorporating time-/unit-specific
covariates. They may also contain unit-specific random intercepts
as proposed by Paul and Held (2011). The endemic mean is usually
modelled proportional to a unit-specific offset
(e.g., population numbers or fractions); it is possible to include
such multiplicative offsets in the epidemic components as well.
The
are transmission weights reflecting the flow of
infections from unit
to unit
. If weights vary over time
(prespecified as a 3-dimensional array
), the
ne
sum in the mean uses .
In spatial
hhh4
applications, the “units” refer to
geographical regions and the weights could be derived from movement
network data. Alternatively, the weights can be
estimated parametrically as a function of adjacency order (Meyer and
Held, 2014), see
W_powerlaw
.
(Penalized) Likelihood inference for such hhh4
models has been
established by Paul and Held (2011) with extensions for parametric
neighbourhood weights by Meyer and Held (2014).
Supplied with the analytical score function and Fisher information,
the function hhh4
by default uses the quasi-Newton algorithm
available through nlminb
to maximize the log-likelihood.
Convergence is usually fast even for a large number of parameters.
If the model contains random effects, the penalized and marginal
log-likelihoods are maximized alternately until convergence.
hhh4
returns an object of class "hhh4"
,
which is a list containing the following components:
coefficients |
named vector with estimated (regression) parameters of the model |
se |
estimated standard errors (for regression parameters) |
cov |
covariance matrix (for regression parameters) |
Sigma |
estimated variance-covariance matrix of random effects |
Sigma.orig |
estimated variance parameters on internal scale used for optimization |
Sigma.cov |
inverse of marginal Fisher information (on internal
scale), i.e., the asymptotic covariance matrix of |
call |
the matched call |
dim |
vector with number of fixed and random effects in the model |
loglikelihood |
(penalized) loglikelihood evaluated at the MLE |
margll |
(approximate) log marginal likelihood should the model contain random effects |
convergence |
logical. Did optimizer converge? |
fitted.values |
fitted mean values |
control |
control object of the fit |
terms |
the terms object used in the fit if |
stsObj |
the supplied |
lags |
named integer vector of length two containing the lags
used for the epidemic components |
nObs |
number of observations used for fitting the model |
nTime |
number of time points used for fitting the model |
nUnit |
number of units (e.g. areas) used for fitting the model |
runtime |
the |
Michaela Paul, Sebastian Meyer, Leonhard Held
Held, L., Höhle, M. and Hofmann, M. (2005): A statistical framework for the analysis of multivariate infectious disease surveillance counts. Statistical Modelling, 5 (3), 187-199. doi:10.1191/1471082X05st098oa
Paul, M., Held, L. and Toschke, A. M. (2008): Multivariate modelling of infectious disease surveillance data. Statistics in Medicine, 27 (29), 6250-6267. doi:10.1002/sim.4177
Paul, M. and Held, L. (2011): Predictive assessment of a non-linear random effects model for multivariate time series of infectious disease counts. Statistics in Medicine, 30 (10), 1118-1136. doi:10.1002/sim.4177
Held, L. and Paul, M. (2012): Modeling seasonality in space-time infectious disease surveillance data. Biometrical Journal, 54 (6), 824-843. doi:10.1002/bimj.201200037
Meyer, S. and Held, L. (2014): Power-law models for infectious disease spread. The Annals of Applied Statistics, 8 (3), 1612-1639. doi:10.1214/14-AOAS743
Meyer, S., Held, L. and Höhle, M. (2017): Spatio-temporal analysis of epidemic phenomena using the R package surveillance. Journal of Statistical Software, 77 (11), 1-55. doi:10.18637/jss.v077.i11
See the special functions fe
, ri
and the
examples below for how to specify unit-specific effects.
Further details on the modelling approach and illustrations of its
implementation can be found in vignette("hhh4")
and
vignette("hhh4_spacetime")
.
###################### ## Univariate examples ###################### ### weekly counts of salmonella agona cases, UK, 1990-1995 data("salmonella.agona") ## convert old "disProg" to new "sts" data class salmonella <- disProg2sts(salmonella.agona) salmonella plot(salmonella) ## generate formula for an (endemic) time trend and seasonality f.end <- addSeason2formula(f = ~1 + t, S = 1, period = 52) f.end ## specify a simple autoregressive negative binomial model model1 <- list(ar = list(f = ~1), end = list(f = f.end), family = "NegBin1") ## fit this model to the data res <- hhh4(salmonella, model1) ## summarize the model fit summary(res, idx2Exp=1, amplitudeShift=TRUE, maxEV=TRUE) plot(res) plot(res, type = "season", components = "end") ### weekly counts of meningococcal infections, Germany, 2001-2006 data("influMen") fluMen <- disProg2sts(influMen) meningo <- fluMen[, "meningococcus"] meningo plot(meningo) ## again a simple autoregressive NegBin model with endemic seasonality meningoFit <- hhh4(stsObj = meningo, control = list( ar = list(f = ~1), end = list(f = addSeason2formula(f = ~1, S = 1, period = 52)), family = "NegBin1" )) summary(meningoFit, idx2Exp=TRUE, amplitudeShift=TRUE, maxEV=TRUE) plot(meningoFit) plot(meningoFit, type = "season", components = "end") ######################## ## Multivariate examples ######################## ### bivariate analysis of influenza and meningococcal infections ### (see Paul et al, 2008) plot(fluMen, same.scale = FALSE) ## Fit a negative binomial model with ## - autoregressive component: disease-specific intercepts ## - neighbour-driven component: only transmission from flu to men ## - endemic component: S=3 and S=1 sine/cosine pairs for flu and men, respectively ## - disease-specific overdispersion WfluMen <- neighbourhood(fluMen) WfluMen["meningococcus","influenza"] <- 0 WfluMen f.end_fluMen <- addSeason2formula(f = ~ -1 + fe(1, which = c(TRUE, TRUE)), S = c(3, 1), period = 52) f.end_fluMen fluMenFit <- hhh4(fluMen, control = list( ar = list(f = ~ -1 + fe(1, unitSpecific = TRUE)), ne = list(f = ~ 1, weights = WfluMen), end = list(f = f.end_fluMen), family = "NegBinM")) summary(fluMenFit, idx2Exp=1:3) plot(fluMenFit, type = "season", components = "end", unit = 1) plot(fluMenFit, type = "season", components = "end", unit = 2) ### weekly counts of measles, Weser-Ems region of Lower Saxony, Germany data("measlesWeserEms") measlesWeserEms plot(measlesWeserEms) # note the two districts with zero cases ## we could fit the same simple model as for the salmonella cases above model1 <- list( ar = list(f = ~1), end = list(f = addSeason2formula(~1 + t, period = 52)), family = "NegBin1" ) measlesFit <- hhh4(measlesWeserEms, model1) summary(measlesFit, idx2Exp=TRUE, amplitudeShift=TRUE, maxEV=TRUE) ## but we should probably at least use a population offset in the endemic ## component to reflect heterogeneous incidence levels of the districts, ## and account for spatial dependence (here just using first-order adjacency) measlesFit2 <- update(measlesFit, end = list(offset = population(measlesWeserEms)), ne = list(f = ~1, weights = neighbourhood(measlesWeserEms) == 1)) summary(measlesFit2, idx2Exp=TRUE, amplitudeShift=TRUE, maxEV=TRUE) plot(measlesFit2, units = NULL, hide0s = TRUE) ## 'measlesFit2' corresponds to the 'measlesFit_basic' model in ## vignette("hhh4_spacetime"). See there for further analyses, ## including vaccination coverage as a covariate, ## spatial power-law weights, and random intercepts. ## Not run: ### last but not least, a more sophisticated (and time-consuming) ### analysis of weekly counts of influenza from 140 districts in ### Southern Germany (originally analysed by Paul and Held, 2011, ### and revisited by Held and Paul, 2012, and Meyer and Held, 2014) data("fluBYBW") plot(fluBYBW, type = observed ~ time) plot(fluBYBW, type = observed ~ unit, ## mean yearly incidence per 100.000 inhabitants (8 years) population = fluBYBW@map$X31_12_01 / 100000 * 8) ## For the full set of models for data("fluBYBW") as analysed by ## Paul and Held (2011), including predictive model assessement ## using proper scoring rules, see the (computer-intensive) ## demo("fluBYBW") script: demoscript <- system.file("demo", "fluBYBW.R", package = "surveillance") demoscript #file.show(demoscript) ## Here we fit the improved power-law model of Meyer and Held (2014) ## - autoregressive component: random intercepts + S = 1 sine/cosine pair ## - neighbour-driven component: random intercepts + S = 1 sine/cosine pair ## + population gravity with normalized power-law weights ## - endemic component: random intercepts + trend + S = 3 sine/cosine pairs ## - random intercepts are iid but correlated between components f.S1 <- addSeason2formula( ~-1 + ri(type="iid", corr="all"), S = 1, period = 52) f.end.S3 <- addSeason2formula( ~-1 + ri(type="iid", corr="all") + I((t-208)/100), S = 3, period = 52) ## for power-law weights, we need adjaceny orders, which can be ## computed from the binary adjacency indicator matrix nbOrder1 <- neighbourhood(fluBYBW) neighbourhood(fluBYBW) <- nbOrder(nbOrder1) ## full model specification fluModel <- list( ar = list(f = f.S1), ne = list(f = update.formula(f.S1, ~ . + log(pop)), weights = W_powerlaw(maxlag=max(neighbourhood(fluBYBW)), normalize = TRUE, log = TRUE)), end = list(f = f.end.S3, offset = population(fluBYBW)), family = "NegBin1", data = list(pop = population(fluBYBW)), optimizer = list(variance = list(method = "Nelder-Mead")), verbose = TRUE) ## CAVE: random effects considerably increase the runtime of model estimation ## (It is usually advantageous to first fit a model with simple intercepts ## to obtain reasonable start values for the other parameters.) set.seed(1) # because random intercepts are initialized randomly fluFit <- hhh4(fluBYBW, fluModel) summary(fluFit, idx2Exp = TRUE, amplitudeShift = TRUE) plot(fluFit, type = "fitted", total = TRUE) plot(fluFit, type = "season") range(plot(fluFit, type = "maxEV")) plot(fluFit, type = "maps", prop = TRUE) gridExtra::grid.arrange( grobs = lapply(c("ar", "ne", "end"), function (comp) plot(fluFit, type = "ri", component = comp, main = comp, exp = TRUE, sub = "multiplicative effect")), nrow = 1, ncol = 3) plot(fluFit, type = "neweights", xlab = "adjacency order") ## End(Not run) ######################################################################## ## An endemic-only "hhh4" model can also be estimated using MASS::glm.nb ######################################################################## ## weekly counts of measles, Weser-Ems region of Lower Saxony, Germany data("measlesWeserEms") ## fit an endemic-only "hhh4" model ## with time covariates and a district-specific offset hhh4fit <- hhh4(measlesWeserEms, control = list( end = list(f = addSeason2formula(~1 + t, period = frequency(measlesWeserEms)), offset = population(measlesWeserEms)), ar = list(f = ~-1), ne = list(f = ~-1), family = "NegBin1", subset = 1:nrow(measlesWeserEms) )) summary(hhh4fit) ## fit the same model using MASS::glm.nb measlesWeserEmsData <- as.data.frame(measlesWeserEms, tidy = TRUE) measlesWeserEmsData$t <- c(hhh4fit$control$data$t) glmnbfit <- MASS::glm.nb( update(formula(hhh4fit)$end, observed ~ . + offset(log(population))), data = measlesWeserEmsData ) summary(glmnbfit) ## Note that the overdispersion parameter is parametrized inversely. ## The likelihood and point estimates are all the same. ## However, the variance estimates are different: in glm.nb, the parameters ## are estimated conditional on the overdispersion theta.
###################### ## Univariate examples ###################### ### weekly counts of salmonella agona cases, UK, 1990-1995 data("salmonella.agona") ## convert old "disProg" to new "sts" data class salmonella <- disProg2sts(salmonella.agona) salmonella plot(salmonella) ## generate formula for an (endemic) time trend and seasonality f.end <- addSeason2formula(f = ~1 + t, S = 1, period = 52) f.end ## specify a simple autoregressive negative binomial model model1 <- list(ar = list(f = ~1), end = list(f = f.end), family = "NegBin1") ## fit this model to the data res <- hhh4(salmonella, model1) ## summarize the model fit summary(res, idx2Exp=1, amplitudeShift=TRUE, maxEV=TRUE) plot(res) plot(res, type = "season", components = "end") ### weekly counts of meningococcal infections, Germany, 2001-2006 data("influMen") fluMen <- disProg2sts(influMen) meningo <- fluMen[, "meningococcus"] meningo plot(meningo) ## again a simple autoregressive NegBin model with endemic seasonality meningoFit <- hhh4(stsObj = meningo, control = list( ar = list(f = ~1), end = list(f = addSeason2formula(f = ~1, S = 1, period = 52)), family = "NegBin1" )) summary(meningoFit, idx2Exp=TRUE, amplitudeShift=TRUE, maxEV=TRUE) plot(meningoFit) plot(meningoFit, type = "season", components = "end") ######################## ## Multivariate examples ######################## ### bivariate analysis of influenza and meningococcal infections ### (see Paul et al, 2008) plot(fluMen, same.scale = FALSE) ## Fit a negative binomial model with ## - autoregressive component: disease-specific intercepts ## - neighbour-driven component: only transmission from flu to men ## - endemic component: S=3 and S=1 sine/cosine pairs for flu and men, respectively ## - disease-specific overdispersion WfluMen <- neighbourhood(fluMen) WfluMen["meningococcus","influenza"] <- 0 WfluMen f.end_fluMen <- addSeason2formula(f = ~ -1 + fe(1, which = c(TRUE, TRUE)), S = c(3, 1), period = 52) f.end_fluMen fluMenFit <- hhh4(fluMen, control = list( ar = list(f = ~ -1 + fe(1, unitSpecific = TRUE)), ne = list(f = ~ 1, weights = WfluMen), end = list(f = f.end_fluMen), family = "NegBinM")) summary(fluMenFit, idx2Exp=1:3) plot(fluMenFit, type = "season", components = "end", unit = 1) plot(fluMenFit, type = "season", components = "end", unit = 2) ### weekly counts of measles, Weser-Ems region of Lower Saxony, Germany data("measlesWeserEms") measlesWeserEms plot(measlesWeserEms) # note the two districts with zero cases ## we could fit the same simple model as for the salmonella cases above model1 <- list( ar = list(f = ~1), end = list(f = addSeason2formula(~1 + t, period = 52)), family = "NegBin1" ) measlesFit <- hhh4(measlesWeserEms, model1) summary(measlesFit, idx2Exp=TRUE, amplitudeShift=TRUE, maxEV=TRUE) ## but we should probably at least use a population offset in the endemic ## component to reflect heterogeneous incidence levels of the districts, ## and account for spatial dependence (here just using first-order adjacency) measlesFit2 <- update(measlesFit, end = list(offset = population(measlesWeserEms)), ne = list(f = ~1, weights = neighbourhood(measlesWeserEms) == 1)) summary(measlesFit2, idx2Exp=TRUE, amplitudeShift=TRUE, maxEV=TRUE) plot(measlesFit2, units = NULL, hide0s = TRUE) ## 'measlesFit2' corresponds to the 'measlesFit_basic' model in ## vignette("hhh4_spacetime"). See there for further analyses, ## including vaccination coverage as a covariate, ## spatial power-law weights, and random intercepts. ## Not run: ### last but not least, a more sophisticated (and time-consuming) ### analysis of weekly counts of influenza from 140 districts in ### Southern Germany (originally analysed by Paul and Held, 2011, ### and revisited by Held and Paul, 2012, and Meyer and Held, 2014) data("fluBYBW") plot(fluBYBW, type = observed ~ time) plot(fluBYBW, type = observed ~ unit, ## mean yearly incidence per 100.000 inhabitants (8 years) population = fluBYBW@map$X31_12_01 / 100000 * 8) ## For the full set of models for data("fluBYBW") as analysed by ## Paul and Held (2011), including predictive model assessement ## using proper scoring rules, see the (computer-intensive) ## demo("fluBYBW") script: demoscript <- system.file("demo", "fluBYBW.R", package = "surveillance") demoscript #file.show(demoscript) ## Here we fit the improved power-law model of Meyer and Held (2014) ## - autoregressive component: random intercepts + S = 1 sine/cosine pair ## - neighbour-driven component: random intercepts + S = 1 sine/cosine pair ## + population gravity with normalized power-law weights ## - endemic component: random intercepts + trend + S = 3 sine/cosine pairs ## - random intercepts are iid but correlated between components f.S1 <- addSeason2formula( ~-1 + ri(type="iid", corr="all"), S = 1, period = 52) f.end.S3 <- addSeason2formula( ~-1 + ri(type="iid", corr="all") + I((t-208)/100), S = 3, period = 52) ## for power-law weights, we need adjaceny orders, which can be ## computed from the binary adjacency indicator matrix nbOrder1 <- neighbourhood(fluBYBW) neighbourhood(fluBYBW) <- nbOrder(nbOrder1) ## full model specification fluModel <- list( ar = list(f = f.S1), ne = list(f = update.formula(f.S1, ~ . + log(pop)), weights = W_powerlaw(maxlag=max(neighbourhood(fluBYBW)), normalize = TRUE, log = TRUE)), end = list(f = f.end.S3, offset = population(fluBYBW)), family = "NegBin1", data = list(pop = population(fluBYBW)), optimizer = list(variance = list(method = "Nelder-Mead")), verbose = TRUE) ## CAVE: random effects considerably increase the runtime of model estimation ## (It is usually advantageous to first fit a model with simple intercepts ## to obtain reasonable start values for the other parameters.) set.seed(1) # because random intercepts are initialized randomly fluFit <- hhh4(fluBYBW, fluModel) summary(fluFit, idx2Exp = TRUE, amplitudeShift = TRUE) plot(fluFit, type = "fitted", total = TRUE) plot(fluFit, type = "season") range(plot(fluFit, type = "maxEV")) plot(fluFit, type = "maps", prop = TRUE) gridExtra::grid.arrange( grobs = lapply(c("ar", "ne", "end"), function (comp) plot(fluFit, type = "ri", component = comp, main = comp, exp = TRUE, sub = "multiplicative effect")), nrow = 1, ncol = 3) plot(fluFit, type = "neweights", xlab = "adjacency order") ## End(Not run) ######################################################################## ## An endemic-only "hhh4" model can also be estimated using MASS::glm.nb ######################################################################## ## weekly counts of measles, Weser-Ems region of Lower Saxony, Germany data("measlesWeserEms") ## fit an endemic-only "hhh4" model ## with time covariates and a district-specific offset hhh4fit <- hhh4(measlesWeserEms, control = list( end = list(f = addSeason2formula(~1 + t, period = frequency(measlesWeserEms)), offset = population(measlesWeserEms)), ar = list(f = ~-1), ne = list(f = ~-1), family = "NegBin1", subset = 1:nrow(measlesWeserEms) )) summary(hhh4fit) ## fit the same model using MASS::glm.nb measlesWeserEmsData <- as.data.frame(measlesWeserEms, tidy = TRUE) measlesWeserEmsData$t <- c(hhh4fit$control$data$t) glmnbfit <- MASS::glm.nb( update(formula(hhh4fit)$end, observed ~ . + offset(log(population))), data = measlesWeserEmsData ) summary(glmnbfit) ## Note that the overdispersion parameter is parametrized inversely. ## The likelihood and point estimates are all the same. ## However, the variance estimates are different: in glm.nb, the parameters ## are estimated conditional on the overdispersion theta.
The special functions fe
and ri
are used to specify
unit-specific effects of covariates and random intercept terms,
respectively, in the component formulae of hhh4
.
fe(x, unitSpecific = FALSE, which = NULL, initial = NULL) ri(type = c("iid","car"), corr = c("none", "all"), initial.fe = 0, initial.var = -.5, initial.re = NULL)
fe(x, unitSpecific = FALSE, which = NULL, initial = NULL) ri(type = c("iid","car"), corr = c("none", "all"), initial.fe = 0, initial.var = -.5, initial.re = NULL)
x |
an expression like |
unitSpecific |
logical indicating if the effect of |
which |
vector of logicals indicating which unit(s)
should get an unit-specific parameter.
For units with a |
initial |
initial values (on internal scale!)
for the fixed effects used for optimization. The default
( |
type |
random intercepts either follow an IID or a CAR model. |
corr |
whether random effects
in different components (such as |
initial.fe |
initial value for the random intercept mean. |
initial.var |
initial values (on internal scale!) for the variance components used for optimization. |
initial.re |
initial values (on internal scale!) for the random effects
used for optimization. The default |
These special functions are intended for use in component formulae of
hhh4
models and are not exported from the package namespace.
If unit-specific fixed or random intercepts are specified, an overall
intercept must be excluded (by -1
) in the component formula.
hhh4
model specifications in vignette("hhh4")
,
vignette("hhh4_spacetime")
or on the help page of
hhh4
.
"hhh4"
Objects
Besides print
and summary
methods there are also some standard
extraction methods defined for objects of class "hhh4"
resulting
from a call to hhh4
.
The implementation is illustrated in Meyer et al. (2017, Section 5),
see vignette("hhh4_spacetime")
.
## S3 method for class 'hhh4' print(x, digits = max(3, getOption("digits") - 3), ...) ## S3 method for class 'hhh4' summary(object, maxEV = FALSE, ...) ## S3 method for class 'hhh4' coef(object, se = FALSE, reparamPsi = TRUE, idx2Exp = NULL, amplitudeShift = FALSE, ...) ## S3 method for class 'hhh4' fixef(object, ...) ## S3 method for class 'hhh4' ranef(object, tomatrix = FALSE, intercept = FALSE, ...) ## S3 method for class 'hhh4' coeflist(x, ...) ## S3 method for class 'hhh4' formula(x, ...) ## S3 method for class 'hhh4' nobs(object, ...) ## S3 method for class 'hhh4' logLik(object, ...) ## S3 method for class 'hhh4' vcov(object, reparamPsi = TRUE, idx2Exp = NULL, amplitudeShift = FALSE, ...) ## S3 method for class 'hhh4' confint(object, parm, level = 0.95, reparamPsi = TRUE, idx2Exp = NULL, amplitudeShift = FALSE, ...) ## S3 method for class 'hhh4' residuals(object, type = c("deviance", "pearson", "response"), ...)
## S3 method for class 'hhh4' print(x, digits = max(3, getOption("digits") - 3), ...) ## S3 method for class 'hhh4' summary(object, maxEV = FALSE, ...) ## S3 method for class 'hhh4' coef(object, se = FALSE, reparamPsi = TRUE, idx2Exp = NULL, amplitudeShift = FALSE, ...) ## S3 method for class 'hhh4' fixef(object, ...) ## S3 method for class 'hhh4' ranef(object, tomatrix = FALSE, intercept = FALSE, ...) ## S3 method for class 'hhh4' coeflist(x, ...) ## S3 method for class 'hhh4' formula(x, ...) ## S3 method for class 'hhh4' nobs(object, ...) ## S3 method for class 'hhh4' logLik(object, ...) ## S3 method for class 'hhh4' vcov(object, reparamPsi = TRUE, idx2Exp = NULL, amplitudeShift = FALSE, ...) ## S3 method for class 'hhh4' confint(object, parm, level = 0.95, reparamPsi = TRUE, idx2Exp = NULL, amplitudeShift = FALSE, ...) ## S3 method for class 'hhh4' residuals(object, type = c("deviance", "pearson", "response"), ...)
x , object
|
an object of class |
digits |
the number of significant digits to use when printing parameter estimates. |
maxEV |
logical indicating if the summary should contain the
(range of the) dominant eigenvalue as a measure of the importance of
the epidemic components. By default, the value is not calculated as
this may take some seconds depending on the number of time points
and units in |
... |
For the |
reparamPsi |
logical. If |
se |
logical switch indicating if standard errors are required |
idx2Exp |
integer vector selecting the parameters
which should be returned on exp-scale.
Alternatively, |
amplitudeShift |
logical switch indicating whether the parameters
for sine/cosine terms modelling seasonal patterns
(see |
tomatrix |
logical. If |
intercept |
logical. If |
parm |
a vector of numbers or names, specifying which parameters are to be given confidence intervals. If missing, all parameters are considered. |
level |
the confidence level required. |
type |
the type of residuals which should be returned. The
alternatives are |
The coef
-method returns all estimated (regression)
parameters from a hhh4
model.
If the model includes random effects, those can be extracted with
ranef
, whereas fixef
returns the fixed parameters.
The coeflist
-method extracts the model coefficients in a list
(by parameter group).
The formula
-method returns the formulae used for the
three log-linear predictors in a list with elements "ar"
,
"ne"
, and "end"
.
The nobs
-method returns the number of observations used
for model fitting.
The logLik
-method returns an object of class
"logLik"
with "df"
and "nobs"
attributes.
For a random effects model, the value of the penalized
log-likelihood at the MLE is returned, but degrees of freedom are
not available (NA_real_
).
As a consequence, AIC
and BIC
are only
well defined for models without random effects;
otherwise these functions return NA_real_
.
The vcov
-method returns the estimated
variance-covariance matrix of the regression parameters.
The estimated variance-covariance matrix of random effects is
available as object$Sigma
.
The confint
-method returns Wald-type confidence
intervals (assuming asymptotic normality).
The residuals
-method extracts raw ("response"
) or
"deviance"
or standardized ("pearson"
)
residuals from the model fit similar to
residuals.glm
for Poisson or NegBin GLM's.
Michaela Paul and Sebastian Meyer
Meyer, S., Held, L. and Höhle, M. (2017): Spatio-temporal analysis of epidemic phenomena using the R package surveillance. Journal of Statistical Software, 77 (11), 1-55. doi:10.18637/jss.v077.i11
the plot
and update
methods
for fitted "hhh4"
models.
hhh4
-modelsThere are six type
s of plots for fitted hhh4
models:
Plot the "fitted"
component means (of selected units)
along time along with the observed counts.
Plot the estimated "season"
ality of the three components.
Plot the time-course of the dominant eigenvalue "maxEV"
.
If the units of the corresponding multivariate
"sts"
object represent different regions,
maps of the fitted mean components averaged over time ("maps"
),
or a map of estimated region-specific intercepts ("ri"
) of a
selected model component can be produced.
Plot the (estimated) neighbourhood weights
("neweights"
) as a function of neighbourhood order
(shortest-path distance between regions), i.e., w_ji ~ o_ji
.
Spatio-temporal "hhh4"
models and these plots are illustrated in
Meyer et al. (2017, Section 5), see vignette("hhh4_spacetime")
.
## S3 method for class 'hhh4' plot(x, type=c("fitted", "season", "maxEV", "maps", "ri", "neweights"), ...) plotHHH4_fitted(x, units = 1, names = NULL, col = c("grey85", "blue", "orange"), pch = 19, pt.cex = 0.6, pt.col = 1, par.settings = list(), legend = TRUE, legend.args = list(), legend.observed = FALSE, decompose = NULL, total = FALSE, meanHHH = NULL, ...) plotHHH4_fitted1(x, unit = 1, main = NULL, col = c("grey85", "blue", "orange"), pch = 19, pt.cex = 0.6, pt.col = 1, border = col, start = x$stsObj@start, end = NULL, xaxis = NULL, xlim = NULL, ylim = NULL, xlab = "", ylab = "No. infected", hide0s = FALSE, decompose = NULL, total = FALSE, meanHHH = NULL) plotHHH4_season(..., components = NULL, intercept = FALSE, xlim = NULL, ylim = NULL, xlab = NULL, ylab = "", main = NULL, par.settings = list(), matplot.args = list(), legend = NULL, legend.args = list(), refline.args = list(), unit = 1, period = NULL) getMaxEV_season(x, period = frequency(x$stsObj)) plotHHH4_maxEV(..., matplot.args = list(), refline.args = list(), legend.args = list()) getMaxEV(x) plotHHH4_maps(x, which = c("mean", "endemic", "epi.own", "epi.neighbours"), prop = FALSE, main = which, zmax = NULL, col.regions = NULL, labels = FALSE, sp.layout = NULL, ..., map = x$stsObj@map, meanHHH = NULL) plotHHH4_ri(x, component, exp = FALSE, at = list(n = 10), col.regions = cm.colors(100), colorkey = TRUE, labels = FALSE, sp.layout = NULL, gpar.missing = list(col = "darkgrey", lty = 2, lwd = 2), ...) plotHHH4_neweights(x, plotter = boxplot, ..., exclude = 0, maxlag = Inf)
## S3 method for class 'hhh4' plot(x, type=c("fitted", "season", "maxEV", "maps", "ri", "neweights"), ...) plotHHH4_fitted(x, units = 1, names = NULL, col = c("grey85", "blue", "orange"), pch = 19, pt.cex = 0.6, pt.col = 1, par.settings = list(), legend = TRUE, legend.args = list(), legend.observed = FALSE, decompose = NULL, total = FALSE, meanHHH = NULL, ...) plotHHH4_fitted1(x, unit = 1, main = NULL, col = c("grey85", "blue", "orange"), pch = 19, pt.cex = 0.6, pt.col = 1, border = col, start = x$stsObj@start, end = NULL, xaxis = NULL, xlim = NULL, ylim = NULL, xlab = "", ylab = "No. infected", hide0s = FALSE, decompose = NULL, total = FALSE, meanHHH = NULL) plotHHH4_season(..., components = NULL, intercept = FALSE, xlim = NULL, ylim = NULL, xlab = NULL, ylab = "", main = NULL, par.settings = list(), matplot.args = list(), legend = NULL, legend.args = list(), refline.args = list(), unit = 1, period = NULL) getMaxEV_season(x, period = frequency(x$stsObj)) plotHHH4_maxEV(..., matplot.args = list(), refline.args = list(), legend.args = list()) getMaxEV(x) plotHHH4_maps(x, which = c("mean", "endemic", "epi.own", "epi.neighbours"), prop = FALSE, main = which, zmax = NULL, col.regions = NULL, labels = FALSE, sp.layout = NULL, ..., map = x$stsObj@map, meanHHH = NULL) plotHHH4_ri(x, component, exp = FALSE, at = list(n = 10), col.regions = cm.colors(100), colorkey = TRUE, labels = FALSE, sp.layout = NULL, gpar.missing = list(col = "darkgrey", lty = 2, lwd = 2), ...) plotHHH4_neweights(x, plotter = boxplot, ..., exclude = 0, maxlag = Inf)
x |
a fitted |
type |
type of plot: either |
... |
For |
units , unit
|
integer or character vector specifying a single
|
names , main
|
main title(s) for the selected
|
col , border
|
length 3 vectors specifying the fill and border colors for the endemic, autoregressive, and spatio-temporal component polygons (in this order). |
pch , pt.cex , pt.col
|
style specifications for the dots drawn to represent
the observed counts. |
par.settings |
list of graphical parameters for
|
legend |
Integer vector specifying in which of the
|
legend.args |
list of arguments for |
legend.observed |
logical indicating if the legend should contain a line for the dots corresponding to observed counts. |
decompose |
if |
total |
logical indicating if the fitted components should be
summed over all units to be compared with the total observed
counts at each time point. If |
start , end
|
time range to plot specified by vectors of length two
in the form |
xaxis |
if this is a list (of arguments for
|
xlim |
numeric vector of length 2 specifying the x-axis range.
The default ( |
ylim |
y-axis range.
For |
xlab , ylab
|
axis labels. For |
hide0s |
logical indicating if dots for zero observed counts should be omitted. Especially useful if there are too many. |
meanHHH |
(internal) use different component means than those
estimated and available from |
components |
character vector of component names, i.e., a subset
of |
intercept |
logical indicating whether to include the global
intercept. For |
exp |
logical indicating whether to |
at |
a numeric vector of breaks for the color levels (see
|
matplot.args |
list of line style specifications passed to
|
refline.args |
list of line style specifications (e.g.,
|
period |
a numeric value giving the (longest) period of the
harmonic terms in the model. This usually coincides with the
|
which |
a character vector specifying the components of the mean for which to produce maps. By default, the overall mean and all three components are shown. |
prop |
a logical indicating whether the component maps should display proportions of the total mean instead of absolute numbers. |
zmax |
a numeric vector of length |
col.regions |
a vector of colors used to encode the fitted
component means (see |
colorkey |
a Boolean indicating whether to draw the color key.
Alternatively, a list specifying how to draw it, see
|
map |
an object inheriting from |
component |
component for which to plot the estimated
region-specific random intercepts. Must partially match one of
|
labels |
determines if and how regions are labeled, see
|
sp.layout |
optional list of additional layout items, see
|
gpar.missing |
list of graphical parameters for
|
plotter |
the (name of a) function used to produce the plot of
weights (a numeric vector) as a function of neighbourhood order (a
factor variable). It is called as
|
exclude |
vector of neighbourhood orders to be excluded from
plotting (passed to |
maxlag |
maximum order of neighbourhood to be assumed when
computing the |
plotHHH4_fitted1
invisibly returns a matrix of the fitted
component means for the selected unit
, and plotHHH4_fitted
returns these in a list for all units
.plotHHH4_season
invisibly returns the plotted y-values, i.e. the
multiplicative seasonality effect within each of components
.
Note that this will include the intercept, i.e. the point estimate of
is plotted and returned.
getMaxEV_season
returns a list with elements
"maxEV.season"
(as plotted by
plotHHH4_season(..., components="maxEV")
,
"maxEV.const"
and "Lambda.const"
(the Lambda matrix and
its dominant eigenvalue if time effects are ignored).plotHHH4_maxEV
(invisibly) and getMaxEV
return the
dominant eigenvalue of the matrix for all time points
of
x$stsObj
.plotHHH4_maps
returns a trellis.object
if
length(which) == 1
(a single spplot
), and
otherwise uses grid.arrange
from the
gridExtra package to arrange all length(which)
spplot
s on a single page.
plotHHH4_ri
returns the generated spplot
, i.e.,
a trellis.object
.plotHHH4_neweights
eventually calls plotter
and
thus returns whatever is returned by that function.
Sebastian Meyer
Held, L. and Paul, M. (2012): Modeling seasonality in space-time infectious disease surveillance data. Biometrical Journal, 54, 824-843. doi:10.1002/bimj.201200037
Meyer, S., Held, L. and Höhle, M. (2017): Spatio-temporal analysis of epidemic phenomena using the R package surveillance. Journal of Statistical Software, 77 (11), 1-55. doi:10.18637/jss.v077.i11
other methods for hhh4
fits, e.g., summary.hhh4
.
data("measlesWeserEms") ## fit a simple hhh4 model measlesModel <- list( ar = list(f = ~ 1), end = list(f = addSeason2formula(~0 + ri(type="iid"), S=1, period=52), offset = population(measlesWeserEms)), family = "NegBin1" ) measlesFit <- hhh4(measlesWeserEms, measlesModel) ## fitted values for a single unit plot(measlesFit, units=2) ## sum fitted components over all units plot(measlesFit, total=TRUE) ## 'xaxis' option for a nicely formatted time axis ## default tick locations and labels: plot(measlesFit, total=TRUE, xaxis=list(epochsAsDate=TRUE, line=1)) ## an alternative with monthly ticks: oopts <- surveillance.options(stsTickFactors = c("%m"=0.75, "%Y" = 1.5)) plot(measlesFit, total=TRUE, xaxis=list(epochsAsDate=TRUE, xaxis.tickFreq=list("%m"=atChange, "%Y"=atChange), xaxis.labelFreq=list("%Y"=atMedian), xaxis.labelFormat="%Y")) surveillance.options(oopts) ## plot the multiplicative effect of seasonality plot(measlesFit, type="season") ## alternative fit with biennial pattern, plotted jointly with original fit measlesFit2 <- update(measlesFit, end = list(f = addSeason2formula(~0 + ri(type="iid"), S=2, period=104))) plotHHH4_season(measlesFit, measlesFit2, components="end", period=104) ## dominant eigenvalue of the Lambda matrix (cf. Held and Paul, 2012) getMaxEV(measlesFit) # here simply constant and equal to exp(ar.1) plot(measlesFit, type="maxEV") # not very exciting ## fitted mean components/proportions by district, averaged over time if (requireNamespace("gridExtra")) { plot(measlesFit, type="maps", labels=list(cex=0.6), which=c("endemic", "epi.own"), prop=TRUE, zmax=NA, main=c("endemic proportion", "autoregressive proportion")) } ## estimated random intercepts of the endemic component round(nu0 <- fixef(measlesFit)["end.ri(iid)"], 4) # global intercept round(ranefs <- ranef(measlesFit, tomatrix = TRUE), 4) # zero-mean deviations stopifnot(all.equal( nu0 + ranefs, ranef(measlesFit, intercept = TRUE) # local intercepts (log-scale) )) plot(measlesFit, type="ri", component="end", main="deviations around the endemic intercept (log-scale)") exp(ranef(measlesFit)) # multiplicative effects, plotted below plot(measlesFit, type="ri", component="end", exp=TRUE, main="multiplicative effects", labels=list(font=3, labels="GEN")) ## neighbourhood weights as a function of neighbourhood order plot(measlesFit, type="neweights") # boring, model has no "ne" component ## fitted values for the 6 regions with most cases and some customization bigunits <- tail(names(sort(colSums(observed(measlesWeserEms)))), 6) plot(measlesFit, units=bigunits, names=measlesWeserEms@map@data[bigunits,"GEN"], legend=5, legend.args=list(x="top"), xlab="Time (weekly)", hide0s=TRUE, ylim=c(0,max(observed(measlesWeserEms)[,bigunits])), start=c(2002,1), end=c(2002,26), par.settings=list(xaxs="i"))
data("measlesWeserEms") ## fit a simple hhh4 model measlesModel <- list( ar = list(f = ~ 1), end = list(f = addSeason2formula(~0 + ri(type="iid"), S=1, period=52), offset = population(measlesWeserEms)), family = "NegBin1" ) measlesFit <- hhh4(measlesWeserEms, measlesModel) ## fitted values for a single unit plot(measlesFit, units=2) ## sum fitted components over all units plot(measlesFit, total=TRUE) ## 'xaxis' option for a nicely formatted time axis ## default tick locations and labels: plot(measlesFit, total=TRUE, xaxis=list(epochsAsDate=TRUE, line=1)) ## an alternative with monthly ticks: oopts <- surveillance.options(stsTickFactors = c("%m"=0.75, "%Y" = 1.5)) plot(measlesFit, total=TRUE, xaxis=list(epochsAsDate=TRUE, xaxis.tickFreq=list("%m"=atChange, "%Y"=atChange), xaxis.labelFreq=list("%Y"=atMedian), xaxis.labelFormat="%Y")) surveillance.options(oopts) ## plot the multiplicative effect of seasonality plot(measlesFit, type="season") ## alternative fit with biennial pattern, plotted jointly with original fit measlesFit2 <- update(measlesFit, end = list(f = addSeason2formula(~0 + ri(type="iid"), S=2, period=104))) plotHHH4_season(measlesFit, measlesFit2, components="end", period=104) ## dominant eigenvalue of the Lambda matrix (cf. Held and Paul, 2012) getMaxEV(measlesFit) # here simply constant and equal to exp(ar.1) plot(measlesFit, type="maxEV") # not very exciting ## fitted mean components/proportions by district, averaged over time if (requireNamespace("gridExtra")) { plot(measlesFit, type="maps", labels=list(cex=0.6), which=c("endemic", "epi.own"), prop=TRUE, zmax=NA, main=c("endemic proportion", "autoregressive proportion")) } ## estimated random intercepts of the endemic component round(nu0 <- fixef(measlesFit)["end.ri(iid)"], 4) # global intercept round(ranefs <- ranef(measlesFit, tomatrix = TRUE), 4) # zero-mean deviations stopifnot(all.equal( nu0 + ranefs, ranef(measlesFit, intercept = TRUE) # local intercepts (log-scale) )) plot(measlesFit, type="ri", component="end", main="deviations around the endemic intercept (log-scale)") exp(ranef(measlesFit)) # multiplicative effects, plotted below plot(measlesFit, type="ri", component="end", exp=TRUE, main="multiplicative effects", labels=list(font=3, labels="GEN")) ## neighbourhood weights as a function of neighbourhood order plot(measlesFit, type="neweights") # boring, model has no "ne" component ## fitted values for the 6 regions with most cases and some customization bigunits <- tail(names(sort(colSums(observed(measlesWeserEms)))), 6) plot(measlesFit, units=bigunits, names=measlesWeserEms@map@data[bigunits,"GEN"], legend=5, legend.args=list(x="top"), xlab="Time (weekly)", hide0s=TRUE, ylim=c(0,max(observed(measlesWeserEms)[,bigunits])), start=c(2002,1), end=c(2002,26), par.settings=list(xaxs="i"))
hhh4
ModelGet fitted (component) means from a hhh4
model.
## S3 method for class 'hhh4' predict(object, newSubset=object$control$subset, type="response", ...)
## S3 method for class 'hhh4' predict(object, newSubset=object$control$subset, type="response", ...)
object |
fitted |
newSubset |
subset of time points for which to return the
predictions. Defaults to the subset used for fitting the model, and
must be a subset of |
type |
the type of prediction required. The default
( |
... |
unused (argument of the generic). |
matrix of fitted means for each time point (of newSubset
) and region.
Predictions for “newdata”, i.e., with modified covariates or
fixed weights, can be computed manually by adjusting the control list
(in a copy of the original fit), dropping the old terms
, and using
the internal function meanHHH
directly, see the Example.
Michaela Paul and Sebastian Meyer
## simulate simple seasonal noise with reduced baseline for t >= 60 t <- 0:100 y <- rpois(length(t), exp(3 + sin(2*pi*t/52) - 2*(t >= 60))) obj <- sts(y) plot(obj) ## fit true model fit <- hhh4(obj, list(end = list(f = addSeason2formula(~lock)), data = list(lock = as.integer(t >= 60)), family = "Poisson")) coef(fit, amplitudeShift = TRUE, se = TRUE) ## compute predictions for a subset of the time points stopifnot(identical(predict(fit), fitted(fit))) plot(obj) lines(40:80, predict(fit, newSubset = 40:80), lwd = 2) ## advanced: compute predictions for "newdata" (here, a modified covariate) mod <- fit mod$terms <- NULL # to be sure mod$control$data$lock[t >= 60] <- 0.5 pred <- meanHHH(mod$coefficients, terms(mod))$mean plot(fit, xaxis = NA) lines(mod$control$subset, pred, lty = 2)
## simulate simple seasonal noise with reduced baseline for t >= 60 t <- 0:100 y <- rpois(length(t), exp(3 + sin(2*pi*t/52) - 2*(t >= 60))) obj <- sts(y) plot(obj) ## fit true model fit <- hhh4(obj, list(end = list(f = addSeason2formula(~lock)), data = list(lock = as.integer(t >= 60)), family = "Poisson")) coef(fit, amplitudeShift = TRUE, se = TRUE) ## compute predictions for a subset of the time points stopifnot(identical(predict(fit), fitted(fit))) plot(obj) lines(40:80, predict(fit, newSubset = 40:80), lwd = 2) ## advanced: compute predictions for "newdata" (here, a modified covariate) mod <- fit mod$terms <- NULL # to be sure mod$control$data$lock[t >= 60] <- 0.5 pred <- meanHHH(mod$coefficients, terms(mod))$mean plot(fit, xaxis = NA) lines(mod$control$subset, pred, lty = 2)
"hhh4"
Count Time SeriesSimulates a multivariate time series of counts based on the Poisson/Negative Binomial model as described in Paul and Held (2011).
## S3 method for class 'hhh4' simulate(object, nsim = 1, seed = NULL, y.start = NULL, subset = 1:nrow(object$stsObj), coefs = coef(object), components = c("ar","ne","end"), simplify = nsim>1, ...)
## S3 method for class 'hhh4' simulate(object, nsim = 1, seed = NULL, y.start = NULL, subset = 1:nrow(object$stsObj), coefs = coef(object), components = c("ar","ne","end"), simplify = nsim>1, ...)
object |
an object of class |
nsim |
number of time series to simulate. Defaults to |
seed |
an object specifying how the random number generator should be
initialized for simulation (via |
y.start |
vector or matrix (with |
subset |
time period in which to simulate data. Defaults to (and cannot
exceed) the whole period defined by the underlying |
coefs |
coefficients used for simulation from the model in |
components |
character vector indicating which components of the fitted model
|
simplify |
logical indicating if only the simulated counts ( |
... |
unused (argument of the generic). |
Simulates data from a Poisson or a Negative Binomial model with mean
where
,
, and
are
parameters which are modelled parametrically.
The function uses the model and parameter estimates of the fitted
object
to simulate the time series.
With the argument coefs
it is possible to simulate from
the model as specified in object
, but with different
parameter values.
If simplify=FALSE
: an object of class
"sts"
(nsim = 1
) or a list of those
(nsim > 1
).
If simplify=TRUE
: an object of class
"hhh4sims"
, which is an array of dimension
c(length(subset), ncol(object$stsObj), nsim)
.
The originally observed counts during the simulation period,
object$stsObj[subset,]
, are attached for reference
(used by the plot
-methods) as an attribute "stsObserved"
,
and the initial condition y.start
as attribute "initial"
.
The [
-method for "hhh4sims"
takes care of subsetting
these attributes appropriately.
Michaela Paul and Sebastian Meyer
Paul, M. and Held, L. (2011) Predictive assessment of a non-linear random effects model for multivariate time series of infectious disease counts. Statistics in Medicine, 30, 1118–1136
plot.hhh4sims
and scores.hhh4sims
and the examples therein for nsim > 1
.
data(influMen) # convert to sts class and extract meningococcal disease time series meningo <- disProg2sts(influMen)[,2] # fit model fit <- hhh4(meningo, control = list( ar = list(f = ~ 1), end = list(f = addSeason2formula(~1, period = 52)), family = "NegBin1")) plot(fit) # simulate from model (generates an "sts" object) simData <- simulate(fit, seed=1234) # plot simulated data plot(simData, main = "simulated data", xaxis.labelFormat=NULL) # use simplify=TRUE to return an array of simulated counts simCounts <- simulate(fit, seed=1234, simplify=TRUE) dim(simCounts) # nTime x nUnit x nsim # plot the first year of simulated counts (+ initial + observed) plot(simCounts[1:52,,], type = "time", xaxis.labelFormat = NULL) # see help(plot.hhh4sims) for other plots, mainly useful for nsim > 1 # simulate from a Poisson instead of a NegBin model # keeping all other parameters fixed at their original estimates coefs <- replace(coef(fit), "overdisp", 0) simData2 <- simulate(fit, seed=123, coefs = coefs) plot(simData2, main = "simulated data: Poisson model", xaxis.labelFormat = NULL) # simulate from a model with higher autoregressive parameter coefs <- replace(coef(fit), "ar.1", log(0.9)) simData3 <- simulate(fit, seed=321, coefs = coefs) plot(simData3, main = "simulated data: lambda = 0.5", xaxis.labelFormat = NULL) ## more sophisticated: simulate beyond initially observed time range # extend data range by one year (non-observed domain), filling with NA values nextend <- 52 timeslots <- c("observed", "state", "alarm", "upperbound", "populationFrac") addrows <- function (mat, n) mat[c(seq_len(nrow(mat)), rep(NA, n)),,drop=FALSE] extended <- Map(function (x) addrows(slot(meningo, x), n = nextend), x = timeslots) # create new sts object with extended matrices meningo2 <- do.call("sts", c(list(start = meningo@start, frequency = meningo@freq, map = meningo@map), extended)) # fit to the observed time range only, via the 'subset' argument fit2 <- hhh4(meningo2, control = list( ar = list(f = ~ 1), end = list(f = addSeason2formula(~1, period = 52)), family = "NegBin1", subset = 2:(nrow(meningo2) - nextend))) # the result is the same as before stopifnot(all.equal(fit, fit2, ignore = c("stsObj", "control"))) # long-term probabilistic forecast via simulation for non-observed time points meningoSim <- simulate(fit2, nsim = 100, seed = 1, subset = seq(nrow(meningo)+1, nrow(meningo2)), y.start = tail(observed(meningo), 1)) apply(meningoSim, 1:2, function (ysim) quantile(ysim, c(0.1, 0.5, 0.9))) # three plot types are available for "hhh4sims", see also ?plot.hhh4sims plot(meningoSim, type = "time", average = median) plot(meningoSim, type = "size", observed = FALSE) if (requireNamespace("fanplot")) plot(meningoSim, type = "fan", means.args = list(), fan.args = list(ln = c(.1,.9), ln.col = 8))
data(influMen) # convert to sts class and extract meningococcal disease time series meningo <- disProg2sts(influMen)[,2] # fit model fit <- hhh4(meningo, control = list( ar = list(f = ~ 1), end = list(f = addSeason2formula(~1, period = 52)), family = "NegBin1")) plot(fit) # simulate from model (generates an "sts" object) simData <- simulate(fit, seed=1234) # plot simulated data plot(simData, main = "simulated data", xaxis.labelFormat=NULL) # use simplify=TRUE to return an array of simulated counts simCounts <- simulate(fit, seed=1234, simplify=TRUE) dim(simCounts) # nTime x nUnit x nsim # plot the first year of simulated counts (+ initial + observed) plot(simCounts[1:52,,], type = "time", xaxis.labelFormat = NULL) # see help(plot.hhh4sims) for other plots, mainly useful for nsim > 1 # simulate from a Poisson instead of a NegBin model # keeping all other parameters fixed at their original estimates coefs <- replace(coef(fit), "overdisp", 0) simData2 <- simulate(fit, seed=123, coefs = coefs) plot(simData2, main = "simulated data: Poisson model", xaxis.labelFormat = NULL) # simulate from a model with higher autoregressive parameter coefs <- replace(coef(fit), "ar.1", log(0.9)) simData3 <- simulate(fit, seed=321, coefs = coefs) plot(simData3, main = "simulated data: lambda = 0.5", xaxis.labelFormat = NULL) ## more sophisticated: simulate beyond initially observed time range # extend data range by one year (non-observed domain), filling with NA values nextend <- 52 timeslots <- c("observed", "state", "alarm", "upperbound", "populationFrac") addrows <- function (mat, n) mat[c(seq_len(nrow(mat)), rep(NA, n)),,drop=FALSE] extended <- Map(function (x) addrows(slot(meningo, x), n = nextend), x = timeslots) # create new sts object with extended matrices meningo2 <- do.call("sts", c(list(start = meningo@start, frequency = meningo@freq, map = meningo@map), extended)) # fit to the observed time range only, via the 'subset' argument fit2 <- hhh4(meningo2, control = list( ar = list(f = ~ 1), end = list(f = addSeason2formula(~1, period = 52)), family = "NegBin1", subset = 2:(nrow(meningo2) - nextend))) # the result is the same as before stopifnot(all.equal(fit, fit2, ignore = c("stsObj", "control"))) # long-term probabilistic forecast via simulation for non-observed time points meningoSim <- simulate(fit2, nsim = 100, seed = 1, subset = seq(nrow(meningo)+1, nrow(meningo2)), y.start = tail(observed(meningo), 1)) apply(meningoSim, 1:2, function (ysim) quantile(ysim, c(0.1, 0.5, 0.9))) # three plot types are available for "hhh4sims", see also ?plot.hhh4sims plot(meningoSim, type = "time", average = median) plot(meningoSim, type = "size", observed = FALSE) if (requireNamespace("fanplot")) plot(meningoSim, type = "fan", means.args = list(), fan.args = list(ln = c(.1,.9), ln.col = 8))
"hhh4"
Models
Arrays of simulated counts from simulate.hhh4
can be
visualized as final size boxplots, individual or average time series,
or fan charts (using the fanplot package).
An aggregate
-method is also available.
## S3 method for class 'hhh4sims' plot(x, ...) ## S3 method for class 'hhh4sims' aggregate(x, units = TRUE, time = FALSE, ..., drop = FALSE) as.hhh4simslist(x, ...) ## S3 method for class 'hhh4simslist' plot(x, type = c("size", "time", "fan"), ..., groups = NULL, par.settings = list()) ## S3 method for class 'hhh4simslist' aggregate(x, units = TRUE, time = FALSE, ..., drop = FALSE) plotHHH4sims_size(x, horizontal = TRUE, trafo = NULL, observed = TRUE, names = base::names(x), ...) plotHHH4sims_time(x, average = mean, individual = length(x) == 1, conf.level = if (individual) 0.95 else NULL, matplot.args = list(), initial.args = list(), legend = length(x) > 1, xlim = NULL, ylim = NULL, add = FALSE, ...) plotHHH4sims_fan(x, which = 1, fan.args = list(), observed.args = list(), initial.args = list(), means.args = NULL, key.args = NULL, xlim = NULL, ylim = NULL, add = FALSE, xaxis = list(), ...)
## S3 method for class 'hhh4sims' plot(x, ...) ## S3 method for class 'hhh4sims' aggregate(x, units = TRUE, time = FALSE, ..., drop = FALSE) as.hhh4simslist(x, ...) ## S3 method for class 'hhh4simslist' plot(x, type = c("size", "time", "fan"), ..., groups = NULL, par.settings = list()) ## S3 method for class 'hhh4simslist' aggregate(x, units = TRUE, time = FALSE, ..., drop = FALSE) plotHHH4sims_size(x, horizontal = TRUE, trafo = NULL, observed = TRUE, names = base::names(x), ...) plotHHH4sims_time(x, average = mean, individual = length(x) == 1, conf.level = if (individual) 0.95 else NULL, matplot.args = list(), initial.args = list(), legend = length(x) > 1, xlim = NULL, ylim = NULL, add = FALSE, ...) plotHHH4sims_fan(x, which = 1, fan.args = list(), observed.args = list(), initial.args = list(), means.args = NULL, key.args = NULL, xlim = NULL, ylim = NULL, add = FALSE, xaxis = list(), ...)
x |
an object of class |
type |
a character string indicating the summary plot to produce. |
... |
further arguments passed to methods. |
groups |
an optional factor to produce stratified plots by groups of units.
The special setting |
par.settings |
a list of graphical parameters for |
horizontal |
a logical indicating if the boxplots of the final size distributions should be horizontal (the default). |
trafo |
an optional transformation function from the scales package, e.g.,
|
observed |
a logical indicating if a line and axis value for the observed size of the epidemic should be added to the plot. Alternatively, a list with graphical parameters can be specified to modify the default values. |
names |
a character vector of names for |
average |
scalar-valued function to apply to the simulated counts at each time point. |
individual |
a logical indicating if the individual simulations should be shown as well. |
conf.level |
a scalar in (0,1), which determines the level of the pointwise
quantiles obtained from the simulated counts at each time point.
A value of |
matplot.args |
a list of graphical parameters for |
initial.args |
if a list (of graphical parameters for |
legend |
a logical, a character vector (providing names for |
xlim , ylim
|
vectors of length 2 determining the axis limits. |
add |
a logical indicating if the (mean) simulated time series or the fan chart, respectively, should be added to an existing plot. |
which |
a single integer or a character string selecting the model in
|
fan.args |
a list of graphical parameters for the |
observed.args |
if a list (of graphical parameters for |
means.args |
if a list (of graphical parameters for |
key.args |
if a list, a color key (in |
xaxis |
if a list of arguments for |
units |
a logical indicating aggregation over units. Can also be a factor
(or something convertible to a factor using |
time |
a logical indicating if the counts should be summed over the whole simulation period. |
drop |
a logical indicating if the unit dimension and the |
Sebastian Meyer
### univariate example data("salmAllOnset") ## fit a hhh4 model to the first 13 years salmModel <- list(end = list(f = addSeason2formula(~1 + t)), ar = list(f = ~1), family = "NegBin1", subset = 2:678) salmFit <- hhh4(salmAllOnset, salmModel) ## simulate the next 20 weeks ahead salmSims <- simulate(salmFit, nsim = 300, seed = 3, subset = 678 + seq_len(20), y.start = observed(salmAllOnset)[678,]) ## compare final size distribution to observed value summary(aggregate(salmSims, time = TRUE)) # summary of simulated values plot(salmSims, type = "size") ## individual and average simulated time series with a confidence interval plot(salmSims, type = "time", main = "20-weeks-ahead simulation") ## fan chart based on the quantiles of the simulated counts at each time point ## point forecasts are represented by a white line within the fan if (requireNamespace("fanplot")) { plot(salmSims, type = "fan", main = "20-weeks-ahead simulation", fan.args = list(ln = 1:9/10), means.args = list()) } ### multivariate example data("measlesWeserEms") ## fit a hhh4 model to the first year measlesModel <- list( end = list(f = addSeason2formula(~1), offset = population(measlesWeserEms)), ar = list(f = ~1), ne = list(f = ~1 + log(pop), weights = W_powerlaw(maxlag = 5, normalize = TRUE)), family = "NegBin1", subset = 2:52, data = list(pop = population(measlesWeserEms))) measlesFit1 <- hhh4(measlesWeserEms, control = measlesModel) ## use a Poisson distribution instead (just for comparison) measlesFit2 <- update(measlesFit1, family = "Poisson") ## simulate realizations from these models during the second year measlesSims <- lapply(X = list(NegBin = measlesFit1, Poisson = measlesFit2), FUN = simulate, nsim = 50, seed = 1, subset = 53:104, y.start = observed(measlesWeserEms)[52,]) ## final size of the first model plot(measlesSims[[1]]) ## stratified by groups of districts mygroups <- factor(substr(colnames(measlesWeserEms), 4, 4)) apply(aggregate(measlesSims[[1]], time = TRUE, units = mygroups), 1, summary) plot(measlesSims[[1]], groups = mygroups) ## a class and plot-method for a list of simulations from different models measlesSims <- as.hhh4simslist(measlesSims) plot(measlesSims) ## simulated time series plot(measlesSims, type = "time", individual = TRUE, ylim = c(0, 80)) ## fan charts if (requireNamespace("fanplot")) { opar <- par(mfrow = c(2,1)) plot(measlesSims, type = "fan", which = 1, ylim = c(0, 80), main = "NegBin", key.args = list()) plot(measlesSims, type = "fan", which = 2, ylim = c(0, 80), main = "Poisson") par(opar) }
### univariate example data("salmAllOnset") ## fit a hhh4 model to the first 13 years salmModel <- list(end = list(f = addSeason2formula(~1 + t)), ar = list(f = ~1), family = "NegBin1", subset = 2:678) salmFit <- hhh4(salmAllOnset, salmModel) ## simulate the next 20 weeks ahead salmSims <- simulate(salmFit, nsim = 300, seed = 3, subset = 678 + seq_len(20), y.start = observed(salmAllOnset)[678,]) ## compare final size distribution to observed value summary(aggregate(salmSims, time = TRUE)) # summary of simulated values plot(salmSims, type = "size") ## individual and average simulated time series with a confidence interval plot(salmSims, type = "time", main = "20-weeks-ahead simulation") ## fan chart based on the quantiles of the simulated counts at each time point ## point forecasts are represented by a white line within the fan if (requireNamespace("fanplot")) { plot(salmSims, type = "fan", main = "20-weeks-ahead simulation", fan.args = list(ln = 1:9/10), means.args = list()) } ### multivariate example data("measlesWeserEms") ## fit a hhh4 model to the first year measlesModel <- list( end = list(f = addSeason2formula(~1), offset = population(measlesWeserEms)), ar = list(f = ~1), ne = list(f = ~1 + log(pop), weights = W_powerlaw(maxlag = 5, normalize = TRUE)), family = "NegBin1", subset = 2:52, data = list(pop = population(measlesWeserEms))) measlesFit1 <- hhh4(measlesWeserEms, control = measlesModel) ## use a Poisson distribution instead (just for comparison) measlesFit2 <- update(measlesFit1, family = "Poisson") ## simulate realizations from these models during the second year measlesSims <- lapply(X = list(NegBin = measlesFit1, Poisson = measlesFit2), FUN = simulate, nsim = 50, seed = 1, subset = 53:104, y.start = observed(measlesWeserEms)[52,]) ## final size of the first model plot(measlesSims[[1]]) ## stratified by groups of districts mygroups <- factor(substr(colnames(measlesWeserEms), 4, 4)) apply(aggregate(measlesSims[[1]], time = TRUE, units = mygroups), 1, summary) plot(measlesSims[[1]], groups = mygroups) ## a class and plot-method for a list of simulations from different models measlesSims <- as.hhh4simslist(measlesSims) plot(measlesSims) ## simulated time series plot(measlesSims, type = "time", individual = TRUE, ylim = c(0, 80)) ## fan charts if (requireNamespace("fanplot")) { opar <- par(mfrow = c(2,1)) plot(measlesSims, type = "fan", which = 1, ylim = c(0, 80), main = "NegBin", key.args = list()) plot(measlesSims, type = "fan", which = 2, ylim = c(0, 80), main = "Poisson") par(opar) }
hhh4
Models
Calculate proper scoring rules based on simulated predictive distributions.
## S3 method for class 'hhh4sims' scores(x, which = "rps", units = NULL, ..., drop = TRUE) ## S3 method for class 'hhh4simslist' scores(x, ...)
## S3 method for class 'hhh4sims' scores(x, which = "rps", units = NULL, ..., drop = TRUE) ## S3 method for class 'hhh4simslist' scores(x, ...)
x |
an object of class |
which |
a character vector indicating which proper scoring rules to compute.
By default, only the ranked probability score ( |
units |
if non- |
drop |
a logical indicating if univariate dimensions should be dropped (the default). |
... |
unused (argument of the generic). |
This implementation can only compute univariate scores, i.e., independently for each time point.
The logarithmic score is badly estimated if the domain is large and there are not enough samples to cover the underlying distribution in enough detail (the score becomes infinite when an observed value does not occur in the samples). An alternative is to use kernel density estimation as implemented in the R package scoringRules.
Sebastian Meyer
data("salmAllOnset") ## fit a hhh4 model to the first 13 years salmModel <- list(end = list(f = addSeason2formula(~1 + t)), ar = list(f = ~1), family = "NegBin1", subset = 2:678) salmFit <- hhh4(salmAllOnset, salmModel) ## simulate the next 20 weeks ahead (with very small 'nsim' for speed) salmSims <- simulate(salmFit, nsim = 500, seed = 3, subset = 678 + seq_len(20), y.start = observed(salmAllOnset)[678,]) if (requireNamespace("fanplot")) plot(salmSims, "fan") ### calculate scores at each time point ## using empirical distribution of simulated counts as forecast distribution scores(salmSims, which = c("rps", "logs", "dss")) ## observed count sometimes not covered by simulations -> infinite log-score ## => for a more detailed forecast, either considerably increase 'nsim', or: ## 1. use continuous density() of simulated counts as forecast distribution fi <- apply(salmSims, 1, function (x) approxfun(density(x))) logs_kde <- mapply(function (f, y) -log(f(y)), f = fi, y = observed(attr(salmSims,"stsObserved"))) cbind("empirical" = scores(salmSims, "logs"), "density" = logs_kde) ## a similar KDE approach is implemented in scoringRules::logs_sample() ## 2. average conditional predictive NegBin's of simulated trajectories, ## currently only implemented in HIDDA.forecasting::dhhh4sims() ### produce a PIT histogram ## using empirical distribution of simulated counts as forecast distribition pit(x = observed(attr(salmSims, "stsObserved")), pdistr = apply(salmSims, 1:2, ecdf)) ## long-term forecast is badly calibrated (lower tail is unused, see fan above) ## we also get a warning for the same reason as infinite log-scores
data("salmAllOnset") ## fit a hhh4 model to the first 13 years salmModel <- list(end = list(f = addSeason2formula(~1 + t)), ar = list(f = ~1), family = "NegBin1", subset = 2:678) salmFit <- hhh4(salmAllOnset, salmModel) ## simulate the next 20 weeks ahead (with very small 'nsim' for speed) salmSims <- simulate(salmFit, nsim = 500, seed = 3, subset = 678 + seq_len(20), y.start = observed(salmAllOnset)[678,]) if (requireNamespace("fanplot")) plot(salmSims, "fan") ### calculate scores at each time point ## using empirical distribution of simulated counts as forecast distribution scores(salmSims, which = c("rps", "logs", "dss")) ## observed count sometimes not covered by simulations -> infinite log-score ## => for a more detailed forecast, either considerably increase 'nsim', or: ## 1. use continuous density() of simulated counts as forecast distribution fi <- apply(salmSims, 1, function (x) approxfun(density(x))) logs_kde <- mapply(function (f, y) -log(f(y)), f = fi, y = observed(attr(salmSims,"stsObserved"))) cbind("empirical" = scores(salmSims, "logs"), "density" = logs_kde) ## a similar KDE approach is implemented in scoringRules::logs_sample() ## 2. average conditional predictive NegBin's of simulated trajectories, ## currently only implemented in HIDDA.forecasting::dhhh4sims() ### produce a PIT histogram ## using empirical distribution of simulated counts as forecast distribition pit(x = observed(attr(salmSims, "stsObserved")), pdistr = apply(salmSims, 1:2, ecdf)) ## long-term forecast is badly calibrated (lower tail is unused, see fan above) ## we also get a warning for the same reason as infinite log-scores
update
a fitted "hhh4"
model
Re-fit a "hhh4"
model with a modified control
list.
## S3 method for class 'hhh4' update(object, ..., S = NULL, subset.upper = NULL, use.estimates = object$convergence, evaluate = TRUE)
## S3 method for class 'hhh4' update(object, ..., S = NULL, subset.upper = NULL, use.estimates = object$convergence, evaluate = TRUE)
object |
a fitted |
... |
components modifying the original control list for
|
S |
a named list of numeric vectors serving as argument for
|
subset.upper |
if a scalar value, refit the model to the data up to the time index
given by |
use.estimates |
logical specifying if |
evaluate |
logical indicating if the updated model should be fitted directly
(defaults to |
If evaluate = TRUE
the re-fitted object, otherwise the updated
control
list for hhh4
.
Sebastian Meyer
data("salmonella.agona") ## convert to sts class salmonella <- disProg2sts(salmonella.agona) ## fit a basic model fit0 <- hhh4(salmonella, list(ar = list(f = ~1), end = list(f = addSeason2formula(~1)))) ## the same, updating the minimal endemic-only model via 'S' (with a warning): fit0.2 <- update(hhh4(salmonella), # has no AR component S = list(ar = 0, end = 1)) local({ fit0$control$start <- fit0.2$control$start <- NULL # obviously different stopifnot(all.equal(fit0, fit0.2)) }) ## multiple updates: Poisson -> NegBin1, more harmonics fit1 <- update(fit0, family = "NegBin1", S = list(end=2, ar=2)) ## compare fits AIC(fit0, fit1) opar <- par(mfrow=c(2,2)) plot(fit0, type="fitted", names="fit0", par.settings=NULL) plot(fit1, type="fitted", names="fit1", par.settings=NULL) plot(fit0, fit1, type="season", components=c("end", "ar"), par.settings=NULL) par(opar)
data("salmonella.agona") ## convert to sts class salmonella <- disProg2sts(salmonella.agona) ## fit a basic model fit0 <- hhh4(salmonella, list(ar = list(f = ~1), end = list(f = addSeason2formula(~1)))) ## the same, updating the minimal endemic-only model via 'S' (with a warning): fit0.2 <- update(hhh4(salmonella), # has no AR component S = list(ar = 0, end = 1)) local({ fit0$control$start <- fit0.2$control$start <- NULL # obviously different stopifnot(all.equal(fit0, fit0.2)) }) ## multiple updates: Poisson -> NegBin1, more harmonics fit1 <- update(fit0, family = "NegBin1", S = list(end=2, ar=2)) ## compare fits AIC(fit0, fit1) opar <- par(mfrow=c(2,2)) plot(fit0, type="fitted", names="fit0", par.settings=NULL) plot(fit1, type="fitted", names="fit1", par.settings=NULL) plot(fit0, fit1, type="season", components=c("end", "ar"), par.settings=NULL) par(opar)
hhh4
ModelsThe function oneStepAhead
computes successive one-step-ahead
predictions for a (random effects) HHH model fitted by hhh4
.
These can be inspected using the quantile
, confint
or
plot
methods.
The associated scores
-method computes a number of (strictly) proper
scoring rules based on such one-step-ahead predictions;
see Paul and Held (2011) for details.
There are also calibrationTest
and pit
methods for oneStepAhead
predictions.
Scores, calibration tests and PIT histograms can also be
computed for the fitted values of an hhh4
model
(i.e., in-sample/training data evaluation).
oneStepAhead(result, tp, type = c("rolling", "first", "final"), which.start = c("current", "final"), keep.estimates = FALSE, verbose = type != "final", cores = 1) ## S3 method for class 'oneStepAhead' quantile(x, probs = c(2.5, 10, 50, 90, 97.5)/100, ...) ## S3 method for class 'oneStepAhead' confint(object, parm, level = 0.95, ...) ## S3 method for class 'oneStepAhead' plot(x, unit = 1, probs = 1:99/100, start = NULL, means.args = NULL, ...) ## assessment of "oneStepAhead" predictions ## S3 method for class 'oneStepAhead' scores(x, which = c("logs", "rps", "dss", "ses"), units = NULL, sign = FALSE, individual = FALSE, reverse = FALSE, ...) ## S3 method for class 'oneStepAhead' calibrationTest(x, units = NULL, ...) ## S3 method for class 'oneStepAhead' pit(x, units = NULL, ...) ## assessment of the "hhh4" model fit (in-sample predictions) ## S3 method for class 'hhh4' scores(x, which = c("logs", "rps", "dss", "ses"), subset = x$control$subset, units = seq_len(x$nUnit), sign = FALSE, ...) ## S3 method for class 'hhh4' calibrationTest(x, subset = x$control$subset, units = seq_len(x$nUnit), ...) ## S3 method for class 'hhh4' pit(x, subset = x$control$subset, units = seq_len(x$nUnit), ...)
oneStepAhead(result, tp, type = c("rolling", "first", "final"), which.start = c("current", "final"), keep.estimates = FALSE, verbose = type != "final", cores = 1) ## S3 method for class 'oneStepAhead' quantile(x, probs = c(2.5, 10, 50, 90, 97.5)/100, ...) ## S3 method for class 'oneStepAhead' confint(object, parm, level = 0.95, ...) ## S3 method for class 'oneStepAhead' plot(x, unit = 1, probs = 1:99/100, start = NULL, means.args = NULL, ...) ## assessment of "oneStepAhead" predictions ## S3 method for class 'oneStepAhead' scores(x, which = c("logs", "rps", "dss", "ses"), units = NULL, sign = FALSE, individual = FALSE, reverse = FALSE, ...) ## S3 method for class 'oneStepAhead' calibrationTest(x, units = NULL, ...) ## S3 method for class 'oneStepAhead' pit(x, units = NULL, ...) ## assessment of the "hhh4" model fit (in-sample predictions) ## S3 method for class 'hhh4' scores(x, which = c("logs", "rps", "dss", "ses"), subset = x$control$subset, units = seq_len(x$nUnit), sign = FALSE, ...) ## S3 method for class 'hhh4' calibrationTest(x, subset = x$control$subset, units = seq_len(x$nUnit), ...) ## S3 method for class 'hhh4' pit(x, subset = x$control$subset, units = seq_len(x$nUnit), ...)
result |
fitted |
tp |
numeric vector of length 2 specifying the time range in
which to compute one-step-ahead predictions (for the time points
|
type |
The default |
which.start |
Which initial parameter values should be used when successively
refitting the model to subsets of the data (up to time point
|
keep.estimates |
logical indicating if parameter estimates and log-likelihoods from the successive fits should be returned. |
verbose |
non-negative integer (usually in the range |
cores |
the number of cores to use when computing
the predictions for the set of time points |
object |
an object of class |
parm |
unused (argument of the generic). |
level |
required confidence level of the prediction interval. |
probs |
numeric vector of probabilities with values in [0,1]. |
unit |
single integer or character selecting a unit for which to produce the plot. |
start |
x-coordinate of the first prediction. If |
means.args |
if a list (of graphical parameters for |
x |
an object of class |
which |
character vector determining which scores to compute.
The package surveillance implements the following proper
scoring rules: logarithmic score ( |
subset |
subset of time points for which to calculate the scores (or test calibration, or produce the PIT histogram, respectively). Defaults to the subset used for fitting the model. |
units |
integer or character vector indexing the units for which to compute the scores (or the calibration test or the PIT histogram, respectively). By default, all units are considered. |
sign |
logical indicating if the function should also return
|
individual |
logical indicating if the individual scores of the
|
reverse |
logical indicating if the rows (time points) should be
reversed in the result. The long-standing but awkward default was to
do so for the |
... |
Unused by the |
oneStepAhead
returns a list (of class "oneStepAhead"
)
with the following components:
pred |
one-step-ahead predictions in a matrix, where each row
corresponds to one of the time points requested via the argument
|
observed |
matrix with observed counts at the predicted time
points. It has the same dimensions and names as |
psi |
in case of a negative-binomial model, a matrix of the
estimated overdispersion parameter(s) at each time point on
the internal -log-scale (1 column if |
allConverged |
logical indicating if all successive fits converged. |
If keep.estimates=TRUE
, there are the following additional elements:
coefficients |
matrix of estimated regression parameters from the successive fits. |
Sigma.orig |
matrix of estimated variance parameters from the successive fits. |
logliks |
matrix with columns |
The quantile
-method computes quantiles of the one-step-ahead
forecasts. If there is only one unit, it returns a tp x prob matrix,
otherwise a tp x unit x prob array.
The confint
-method is a convenient wrapper with probs
set
according to the required confidence level.
The function scores
computes the scoring rules specified in the
argument which
.
If multiple units
are selected and individual=TRUE
, the
result is an array of dimensions
c(nrow(pred),length(units),5+sign)
(up to surveillance
1.8-0, the first two dimensions were collapsed to give a matrix).
Otherwise, the result is a matrix with nrow(pred)
rows and
5+sign
columns. If there is only one predicted time point, the
first dimension is dropped in both cases.
The calibrationTest
- and pit
-methods are
just convenient wrappers around the respective default methods.
Sebastian Meyer and Michaela Paul
Czado, C., Gneiting, T. and Held, L. (2009): Predictive model assessment for count data. Biometrics, 65 (4), 1254-1261. doi:10.1111/j.1541-0420.2009.01191.x
Paul, M. and Held, L. (2011): Predictive assessment of a non-linear random effects model for multivariate time series of infectious disease counts. Statistics in Medicine, 30 (10), 1118-1136. doi:10.1002/sim.4177
vignette("hhh4")
and vignette("hhh4_spacetime")
### univariate salmonella agona count time series data("salmonella.agona") ## convert from old "disProg" to new "sts" class salmonella <- disProg2sts(salmonella.agona) ## generate formula for temporal and seasonal trends f.end <- addSeason2formula(~1 + t, S=1, period=52) model <- list(ar = list(f = ~1), end = list(f = f.end), family = "NegBin1") ## fit the model result <- hhh4(salmonella, model) ## do sequential one-step-ahead predictions for the last 5 weeks pred <- oneStepAhead(result, nrow(salmonella)-5, type="rolling", which.start="final", verbose=FALSE) pred quantile(pred) confint(pred) ## simple plot of the 80% one-week-ahead prediction interval ## and point forecasts if (requireNamespace("fanplot")) plot(pred, probs = c(.1,.9), means.args = list()) ## note: oneStepAhead(..., type="final") just means fitted values stopifnot(identical( unname(oneStepAhead(result, nrow(salmonella)-5, type="final")$pred), unname(tail(fitted(result), 5)))) ## compute scores of the one-step-ahead predictions (sc <- scores(pred)) ## the above uses the scores-method for "oneStepAhead" predictions, ## which is a simple wrapper around the default method: scores(x = pred$observed, mu = pred$pred, size = exp(pred$psi)) ## scores with respect to the fitted values are similar (scFitted <- scores(result, subset = nrow(salmonella)-(4:0))) ## test if the one-step-ahead predictions are calibrated calibrationTest(pred) # p = 0.8746 ## the above uses the calibrationTest-method for "oneStepAhead" predictions, ## which is a simple wrapper around the default method: calibrationTest(x = pred$observed, mu = pred$pred, size = exp(pred$psi)) ## we can also test calibration of the fitted values ## using the calibrationTest-method for "hhh4" fits calibrationTest(result, subset = nrow(salmonella)-(4:0)) ## plot a (non-randomized) PIT histogram for the predictions pit(pred) ## the above uses the pit-method for "oneStepAhead" predictions, ## which is a simple wrapper around the default method: pit(x = pred$observed, pdistr = "pnbinom", mu = pred$pred, size = exp(pred$psi)) ### multivariate measles count time series ## (omitting oneStepAhead forecasts here to keep runtime low) data("measlesWeserEms") ## simple hhh4 model with random effects in the endemic component measlesModel <- list( end = list(f = addSeason2formula(~0 + ri(type="iid"))), ar = list(f = ~1), family = "NegBin1") measlesFit <- hhh4(measlesWeserEms, control = measlesModel) ## assess overall (in-sample) calibration of the model, i.e., ## if the observed counts are from the fitted NegBin distribution calibrationTest(measlesFit) # default is DSS (not suitable for low counts) calibrationTest(measlesFit, which = "logs") # p = 0.7238 ## to assess calibration in the second year for a specific district calibrationTest(measlesFit, subset = 53:104, units = "03452", which = "rps") pit(measlesFit, subset = 53:104, units = "03452") ### For a more sophisticated multivariate analysis of ### areal time series of influenza counts - data("fluBYBW") - ### see the (computer-intensive) demo("fluBYBW") script: demoscript <- system.file("demo", "fluBYBW.R", package = "surveillance") #file.show(demoscript)
### univariate salmonella agona count time series data("salmonella.agona") ## convert from old "disProg" to new "sts" class salmonella <- disProg2sts(salmonella.agona) ## generate formula for temporal and seasonal trends f.end <- addSeason2formula(~1 + t, S=1, period=52) model <- list(ar = list(f = ~1), end = list(f = f.end), family = "NegBin1") ## fit the model result <- hhh4(salmonella, model) ## do sequential one-step-ahead predictions for the last 5 weeks pred <- oneStepAhead(result, nrow(salmonella)-5, type="rolling", which.start="final", verbose=FALSE) pred quantile(pred) confint(pred) ## simple plot of the 80% one-week-ahead prediction interval ## and point forecasts if (requireNamespace("fanplot")) plot(pred, probs = c(.1,.9), means.args = list()) ## note: oneStepAhead(..., type="final") just means fitted values stopifnot(identical( unname(oneStepAhead(result, nrow(salmonella)-5, type="final")$pred), unname(tail(fitted(result), 5)))) ## compute scores of the one-step-ahead predictions (sc <- scores(pred)) ## the above uses the scores-method for "oneStepAhead" predictions, ## which is a simple wrapper around the default method: scores(x = pred$observed, mu = pred$pred, size = exp(pred$psi)) ## scores with respect to the fitted values are similar (scFitted <- scores(result, subset = nrow(salmonella)-(4:0))) ## test if the one-step-ahead predictions are calibrated calibrationTest(pred) # p = 0.8746 ## the above uses the calibrationTest-method for "oneStepAhead" predictions, ## which is a simple wrapper around the default method: calibrationTest(x = pred$observed, mu = pred$pred, size = exp(pred$psi)) ## we can also test calibration of the fitted values ## using the calibrationTest-method for "hhh4" fits calibrationTest(result, subset = nrow(salmonella)-(4:0)) ## plot a (non-randomized) PIT histogram for the predictions pit(pred) ## the above uses the pit-method for "oneStepAhead" predictions, ## which is a simple wrapper around the default method: pit(x = pred$observed, pdistr = "pnbinom", mu = pred$pred, size = exp(pred$psi)) ### multivariate measles count time series ## (omitting oneStepAhead forecasts here to keep runtime low) data("measlesWeserEms") ## simple hhh4 model with random effects in the endemic component measlesModel <- list( end = list(f = addSeason2formula(~0 + ri(type="iid"))), ar = list(f = ~1), family = "NegBin1") measlesFit <- hhh4(measlesWeserEms, control = measlesModel) ## assess overall (in-sample) calibration of the model, i.e., ## if the observed counts are from the fitted NegBin distribution calibrationTest(measlesFit) # default is DSS (not suitable for low counts) calibrationTest(measlesFit, which = "logs") # p = 0.7238 ## to assess calibration in the second year for a specific district calibrationTest(measlesFit, subset = 53:104, units = "03452", which = "rps") pit(measlesFit, subset = 53:104, units = "03452") ### For a more sophisticated multivariate analysis of ### areal time series of influenza counts - data("fluBYBW") - ### see the (computer-intensive) demo("fluBYBW") script: demoscript <- system.file("demo", "fluBYBW.R", package = "surveillance") #file.show(demoscript)
hhh4
-Models
Set up power-law or nonparametric weights for the neighbourhood
component of hhh4
-models as proposed by Meyer and Held (2014).
Without normalization, power-law weights are
(if
, otherwise
),
where
(
) is the adjacency order
between regions
and
,
and the decay parameter
is to be estimated.
In the nonparametric formulation, unconstrained log-weights will be
estimated for each of the adjacency orders
2:maxlag
(the
first-order weight is fixed to 1 for identifiability).
Both weight functions can be modified to include a 0-distance weight,
which enables hhh4
models without a separate autoregressive component.
W_powerlaw(maxlag, normalize = TRUE, log = FALSE, initial = if (log) 0 else 1, from0 = FALSE) W_np(maxlag, truncate = TRUE, normalize = TRUE, initial = log(zetaweights(2:(maxlag+from0))), from0 = FALSE, to0 = truncate)
W_powerlaw(maxlag, normalize = TRUE, log = FALSE, initial = if (log) 0 else 1, from0 = FALSE) W_np(maxlag, truncate = TRUE, normalize = TRUE, initial = log(zetaweights(2:(maxlag+from0))), from0 = FALSE, to0 = truncate)
maxlag |
a single integer specifying a limiting order of
adjacency. If spatial dependence is not to be truncated at some
high order, |
truncate , to0
|
|
normalize |
logical indicating if the weights should be normalized such that the rows of the weight matrix sum to 1 (default). Note that normalization does not work with islands, i.e., regions without neighbours. |
log |
logical indicating if the decay parameter |
initial |
initial value of the parameter vector. |
from0 |
logical indicating if these parametric weights should
include the 0-distance (autoregressive) case. In the default setting
( |
hhh4
will take adjacency orders from the neighbourhood
slot of the "sts"
object, so these must be prepared before
fitting a model with parametric neighbourhood weights. The function
nbOrder
can be used to derive adjacency orders from a
binary adjacency matrix.
a list which can be passed as a specification of parametric
neighbourhood weights in the control$ne$weights
argument of
hhh4
.
Sebastian Meyer
Meyer, S. and Held, L. (2014): Power-law models for infectious disease spread. The Annals of Applied Statistics, 8 (3), 1612-1639. doi:10.1214/14-AOAS743
Meyer, S. and Held, L. (2017): Incorporating social contact data in spatio-temporal models for infectious disease spread. Biostatistics, 18 (2), 338-351. doi:10.1093/biostatistics/kxw051
nbOrder
to determine adjacency orders from a binary
adjacency matrix.
getNEweights
and coefW
to extract the
estimated neighbourhood weight matrix and coefficients from an
hhh4
model.
data("measlesWeserEms") ## data contains adjaceny orders as required for parametric weights plot(measlesWeserEms, type = observed ~ unit, labels = TRUE) neighbourhood(measlesWeserEms)[1:6,1:6] max(neighbourhood(measlesWeserEms)) # max order is 5 ## fit a power-law decay of spatial interaction ## in a hhh4 model with seasonality and random intercepts in the endemic part measlesModel <- list( ar = list(f = ~ 1), ne = list(f = ~ 1, weights = W_powerlaw(maxlag=5)), end = list(f = addSeason2formula(~-1 + ri(), S=1, period=52)), family = "NegBin1") ## fit the model set.seed(1) # random intercepts are initialized randomly measlesFit <- hhh4(measlesWeserEms, measlesModel) summary(measlesFit) # "neweights.d" is the decay parameter d coefW(measlesFit) ## plot the spatio-temporal weights o_ji^-d / sum_k o_jk^-d ## as a function of adjacency order plot(measlesFit, type = "neweights", xlab = "adjacency order") ## normalization => same distance does not necessarily mean same weight. ## to extract the whole weight matrix W: getNEweights(measlesFit) ## visualize contributions of the three model components ## to the overall number of infections (aggregated over all districts) plot(measlesFit, total = TRUE) ## little contribution from neighbouring districts if (surveillance.options("allExamples")) { ## simpler model with autoregressive effects captured by the ne component measlesModel2 <- list( ne = list(f = ~ 1, weights = W_powerlaw(maxlag=5, from0=TRUE)), end = list(f = addSeason2formula(~-1 + ri(), S=1, period=52)), family = "NegBin1") measlesFit2 <- hhh4(measlesWeserEms, measlesModel2) ## omitting the separate AR component simplifies model extensions/selection ## and interpretation of covariate effects (only two predictors left) plot(measlesFit2, type = "neweights", exclude = NULL, xlab = "adjacency order") ## strong decay, again mostly within-district transmission ## (one could also try a purely autoregressive model) plot(measlesFit2, total = TRUE, legend.args = list(legend = c("epidemic", "endemic"))) ## almost the same RMSE as with separate AR and NE effects c(rmse1 = sqrt(mean(residuals(measlesFit, "response")^2)), rmse2 = sqrt(mean(residuals(measlesFit2, "response")^2))) }
data("measlesWeserEms") ## data contains adjaceny orders as required for parametric weights plot(measlesWeserEms, type = observed ~ unit, labels = TRUE) neighbourhood(measlesWeserEms)[1:6,1:6] max(neighbourhood(measlesWeserEms)) # max order is 5 ## fit a power-law decay of spatial interaction ## in a hhh4 model with seasonality and random intercepts in the endemic part measlesModel <- list( ar = list(f = ~ 1), ne = list(f = ~ 1, weights = W_powerlaw(maxlag=5)), end = list(f = addSeason2formula(~-1 + ri(), S=1, period=52)), family = "NegBin1") ## fit the model set.seed(1) # random intercepts are initialized randomly measlesFit <- hhh4(measlesWeserEms, measlesModel) summary(measlesFit) # "neweights.d" is the decay parameter d coefW(measlesFit) ## plot the spatio-temporal weights o_ji^-d / sum_k o_jk^-d ## as a function of adjacency order plot(measlesFit, type = "neweights", xlab = "adjacency order") ## normalization => same distance does not necessarily mean same weight. ## to extract the whole weight matrix W: getNEweights(measlesFit) ## visualize contributions of the three model components ## to the overall number of infections (aggregated over all districts) plot(measlesFit, total = TRUE) ## little contribution from neighbouring districts if (surveillance.options("allExamples")) { ## simpler model with autoregressive effects captured by the ne component measlesModel2 <- list( ne = list(f = ~ 1, weights = W_powerlaw(maxlag=5, from0=TRUE)), end = list(f = addSeason2formula(~-1 + ri(), S=1, period=52)), family = "NegBin1") measlesFit2 <- hhh4(measlesWeserEms, measlesModel2) ## omitting the separate AR component simplifies model extensions/selection ## and interpretation of covariate effects (only two predictors left) plot(measlesFit2, type = "neweights", exclude = NULL, xlab = "adjacency order") ## strong decay, again mostly within-district transmission ## (one could also try a purely autoregressive model) plot(measlesFit2, total = TRUE, legend.args = list(legend = c("epidemic", "endemic"))) ## almost the same RMSE as with separate AR and NE effects c(rmse1 = sqrt(mean(residuals(measlesFit, "response")^2)), rmse2 = sqrt(mean(residuals(measlesFit2, "response")^2))) }
hhh4
Model
The getNEweights
function extracts the (fitted) weight
matrix/array from a "hhh4"
object, after scaling and
normalization.
The coefW
function extracts the coefficients of parametric
neighbourhood weights from a hhh4
fit (or directly from a
corresponding coefficient vector), i.e., coefficients whose names
begin with “neweights”.
getNEweights(object, pars = coefW(object), scale = ne$scale, normalize = ne$normalize) coefW(object)
getNEweights(object, pars = coefW(object), scale = ne$scale, normalize = ne$normalize) coefW(object)
object |
an object of class |
pars |
coefficients for parametric neighbourhood weights,
such as for models using |
scale , normalize
|
parameters of the |
Sebastian Meyer
Data contain the date of hospitalization for 630 hemolytic-uremic syndrome (HUS) cases during the large STEC outbreak in Germany, 2011. Note: Only HUS cases which ultimately had a hospitalization date available/reported are included in the data set. The total number of HUS cases during the outbreak was 855 – see Höhle and an der Heiden (2014) as well as Frank et al. (2011) for details.
For each HUS case the attribute dHosp
contains the date of
hospitalization and the attribute dReport
contains the date of
first arrival of this hospitalization date at the Robert Koch
Institute (RKI). As described in Höhle and an der Heiden
(2014) the mechanisms of the delay were complicated and should be
interpreted with care. For example, the case report could have arrived
earlier, but without information about the hospitalization date.
The resulting reporting triangle corresponds to Fig. 1 of the Web appendix of Höhle and an der Heiden (2014). This means that the reports which arrived with a delay longer than 15 days are set to have have arrived after 15 days. Altogether, this gives small discrepancies when compared with the results of the paper. However, as mentioned in the paper, longer delays were not very relevant for the nowcasting.
data(husO104Hosp)
data(husO104Hosp)
A data.frame
object.
Data were collected during the outbreak as part of the mandatory reporting of notifiable diseases in Germany (Faensen et al., 2006). Here, reports are transmitted from the local health authorities via the state health authorities to the Robert Koch Institute, Berlin. The resulting reporting triangle corresponds to Fig. 1 of the Web appendix of Höhle and an der Heiden (2014).
Höhle M and an der Heiden, M (2014). Bayesian Nowcasting during the STEC O104:H4 Outbreak in Germany, 2011, In revision for Biometrics.
Frank C, Werber D, Cramer JP, Askar M, Faber M, an der Heiden M, Bernard H, Fruth A, Prager R, Spode A, Wadl M, Zoufaly A, Jordan S, Kemper MJ, Follin P, Müller L, King LA, Rosner B, Buchholz U, Stark K, Krause G; HUS Investigation Team (2011). Epidemic Profile of Shiga-Toxin Producing Escherichia coli O104:H4 Outbreak in Germany, N Engl J Med. 2011 Nov 10;365(19):1771-80.
Faensen D, Claus H, Benzler J, Ammon A, Pfoch T, Breuer T, Krause G (2014). SurvNet@RKI - a multistate electronic reporting system for communicable diseases, Euro Surveillance, 2006;11(4):100-103.
imdepi
contains data on the spatio-temporal location of 636
cases of invasive meningococcal disease (IMD) caused by the two most
common meningococcal finetypes in Germany, ‘B:P1.7-2,4:F1-5’ (of
serogroup B) and ‘C:P1.5,2:F3-3’ (of serogroup C).
data("imdepi")
data("imdepi")
imdepi
is an object of class
"epidataCS"
(a list with components events
,
stgrid
, W
and qmatrix
).
The imdepi
data is a simplified version of what has been
analyzed by Meyer et al. (2012). Simplification is with respect to the
temporal resolution of the stgrid
(see below) to be used in
twinstim
's endemic model component.
In what follows, we describe the elements events
,
stgrid
, W
, and qmatrix
of imdepi
in
greater detail.
imdepi$events
is a "SpatialPointsDataFrame"
object (ETRS89 projection, i.e. EPSG code 3035, with unit ‘km’)
containing 636 events, each with the following entries:
Time of the case occurrence measured in number of days
since origin. Note that a U(0,1)-distributed random number has
been subtracted from each of the original event times (days) to
break ties (using
untie(imdepi_tied, amount=list(t=1))
).
Tile ID in the spatio-temporal grid (stgrid
) of
endemic covariates, where the event is contained in.
This corresponds to one of the 413 districts of Germany.
Event type, a factor with levels "B"
and "C"
.
Maximum temporal interaction range for the event. Here set to 30 days.
Maximum spatial interaction range for the event. Here set to 200 km.
Sex of the case, i.e. a factor with levels "female"
and "male"
. Note: for some cases this
information is not available (NA
).
Factor giving the age group of the case,
i.e. 0-2, 3-18 or >=19. Note: for one case this
information is not available (NA
).
Block ID and start time (in days since origin) of the cell in the spatio-temporal endemic covariate grid, which the event belongs to.
Population density (per square km) at the location of the event (corresponds to population density of the district where the event is located).
There are further auxiliary columns attached to the events' data
the names of which begin with a . (dot): These are created during
conversion to the "epidataCS"
class and are necessary for
fitting the data with twinstim
, see the description of the
"epidataCS"
-class.
With coordinates(imdepi$events)
one obtains the (x,y) locations
of the events.
The district identifier in tile
is indexed according to
the German official municipality key (
“Amtlicher Gemeindeschlüssel”). See
https://de.wikipedia.org/wiki/Amtlicher_Gemeindeschl%C3%BCssel
for details.
The data component stgrid
contains the spatio-temporal grid of
endemic covariate information. In addition to the usual bookkeeping
variables this includes:
Area of the district tile
in square kilometers.
Population density (inhabitants per square kilometer) computed from DESTATIS (Federal Statistical Office) information (Date: 31.12.2008) on communities level (LAU2) aggregated to district level (NUTS3).
We have actually not included any time-dependent covariates here, we just established this grid with a (reduced -> fast) temporal resolution of monthly intervals so that we can model endemic time trends and seasonality (in this discretized time).
The entry W
contains the observation window as a
"SpatialPolygons"
object, in this case the
boundaries of Germany (stateD
). It was obtained as the
“UnaryUnion” of Germany's districts (districtsD
) as at
2009-01-01, simplified by the “modified Visvalingam” algorithm
(level 6.6%) available at https://MapShaper.org (v. 0.1.17).
The objects districtsD
and stateD
are contained in
system.file("shapes", "districtsD.RData", package="surveillance")
.
The entry qmatrix
is a identity matrix
indicating that no transmission between the two finetypes can occur.
IMD case reports: German Reference Centre for Meningococci at the Department of Hygiene and Microbiology, Julius-Maximilians-Universität Würzburg, Germany (https://www.hygiene.uni-wuerzburg.de/meningococcus/). Thanks to Dr. Johannes Elias and Prof. Dr. Ulrich Vogel for providing the data.
Shapefile of Germany's districts as at 2009-01-01: German Federal Agency for Cartography and Geodesy, Frankfurt am Main, Germany, https://gdz.bkg.bund.de/.
Meyer, S., Elias, J. and Höhle, M. (2012): A space-time conditional intensity model for invasive meningococcal disease occurrence. Biometrics, 68, 607-616. doi:10.1111/j.1541-0420.2011.01684.x
the data class "epidataCS"
, and function
twinstim
for model fitting.
data("imdepi") # Basic information print(imdepi, n=5, digits=2) # What is an epidataCS-object? str(imdepi, max.level=4) names(imdepi$events@data) # => events data.frame has hidden columns sapply(imdepi$events@data, class) # marks and print methods ignore these auxiliary columns # look at the B type only imdepiB <- subset(imdepi, type == "B") #<- subsetting applies to the 'events' component imdepiB # select only the last 10 events tail(imdepi, n=10) # there is also a corresponding 'head' method # Access event marks str(marks(imdepi)) # there is an update-method which assures that the object remains valid # when changing parameters like eps.s, eps.t or qmatrix update(imdepi, eps.t = 20) # Summary s <- summary(imdepi) s str(s) # Step function of number of infectives plot(s$counter, xlab = "Time [days]", ylab = "Number of infectious individuals", main = "Time series of IMD assuming 30 days infectious period") # distribution of number of potential sources of infection opar <- par(mfrow=c(1,2), las=1) for (type in c("B","C")) { plot(100*prop.table(table(s$nSources[s$eventTypes==type])), xlim=range(s$nSources), xlab = "Number of potential epidemic sources", ylab = "Proportion of events [%]") } par(opar) # a histogram of the number of events along time (using the # plot-method for the epidataCS-class, see ?plot.epidataCS) opar <- par(mfrow = c(2,1)) plot(imdepi, "time", subset = type == "B", main = "Finetype B") plot(imdepi, "time", subset = type == "C", main = "Finetype C") par(opar) # Plot the spatial distribution of the events in W plot(imdepi, "space", points.args = list(col=c("indianred", "darkblue"))) # or manually (no legends, no account for tied locations) plot(imdepi$W, lwd=2, asp=1) plot(imdepi$events, pch=c(3,4)[imdepi$events$type], cex=0.8, col=c("indianred", "darkblue")[imdepi$events$type], add=TRUE) ## Not run: # Show a dynamic illustration of the spatio-temporal dynamics of the # spread during the first year of type B with a step size of 7 days animate(imdepiB, interval=c(0,365), time.spacing=7, sleep=0.1) ## End(Not run)
data("imdepi") # Basic information print(imdepi, n=5, digits=2) # What is an epidataCS-object? str(imdepi, max.level=4) names(imdepi$events@data) # => events data.frame has hidden columns sapply(imdepi$events@data, class) # marks and print methods ignore these auxiliary columns # look at the B type only imdepiB <- subset(imdepi, type == "B") #<- subsetting applies to the 'events' component imdepiB # select only the last 10 events tail(imdepi, n=10) # there is also a corresponding 'head' method # Access event marks str(marks(imdepi)) # there is an update-method which assures that the object remains valid # when changing parameters like eps.s, eps.t or qmatrix update(imdepi, eps.t = 20) # Summary s <- summary(imdepi) s str(s) # Step function of number of infectives plot(s$counter, xlab = "Time [days]", ylab = "Number of infectious individuals", main = "Time series of IMD assuming 30 days infectious period") # distribution of number of potential sources of infection opar <- par(mfrow=c(1,2), las=1) for (type in c("B","C")) { plot(100*prop.table(table(s$nSources[s$eventTypes==type])), xlim=range(s$nSources), xlab = "Number of potential epidemic sources", ylab = "Proportion of events [%]") } par(opar) # a histogram of the number of events along time (using the # plot-method for the epidataCS-class, see ?plot.epidataCS) opar <- par(mfrow = c(2,1)) plot(imdepi, "time", subset = type == "B", main = "Finetype B") plot(imdepi, "time", subset = type == "C", main = "Finetype C") par(opar) # Plot the spatial distribution of the events in W plot(imdepi, "space", points.args = list(col=c("indianred", "darkblue"))) # or manually (no legends, no account for tied locations) plot(imdepi$W, lwd=2, asp=1) plot(imdepi$events, pch=c(3,4)[imdepi$events$type], cex=0.8, col=c("indianred", "darkblue")[imdepi$events$type], add=TRUE) ## Not run: # Show a dynamic illustration of the spatio-temporal dynamics of the # spread during the first year of type B with a step size of 7 days animate(imdepiB, interval=c(0,365), time.spacing=7, sleep=0.1) ## End(Not run)
twinstim
Fit for the imdepi
Data
data("imdepifit")
is a twinstim
model
fitted to the imdepi
data.
data("imdepifit")
data("imdepifit")
an object of class "twinstim"
obtained from the following call using data(imdepi)
:
twinstim(endemic = addSeason2formula(~offset(log(popdensity)) +
I(start/365 - 3.5), S = 1,
period = 365, timevar = "start"),
epidemic = ~type + agegrp,
siaf = siaf.gaussian(),
data = imdepi, subset = !is.na(agegrp),
optim.args = list(control = list(reltol = sqrt(.Machine$double.eps))),
model = FALSE, cumCIF = FALSE)
common methods for "twinstim"
fits,
exemplified using imdepifit
, e.g.,
summary.twinstim
, plot.twinstim
,
and simulate.twinstim
data("imdepi", "imdepifit") ## how this fit was obtained imdepifit$call
data("imdepi", "imdepifit") ## how this fit was obtained imdepifit$call
Weekly counts of new influenza and meningococcal infections in Germany 2001-2006.
data(influMen)
data(influMen)
A disProg
object containing
observations starting from week 1 in 2001 to week 52 in 2006.
Robert Koch-Institut: SurvStat: https://survstat.rki.de/. Queried on 25 July 2007.
data(influMen) plot(influMen, as.one=FALSE, same.scale=FALSE)
data(influMen) plot(influMen, as.one=FALSE, same.scale=FALSE)
Generic function for plotting paths of point process intensities.
Methods currently defined in package surveillance are for
classes "twinSIR"
and "simEpidata"
(temporal), as well as
"twinstim"
and "simEpidataCS"
(spatio-temporal).
intensityplot(x, ...)
intensityplot(x, ...)
x |
An object for which an |
... |
Arguments passed to the corresponding method. |
The methods intensityplot.twinSIR
and
intensityplot.twinstim
.
This is a unifying wrapper around functionality of various packages dealing with spatial data. It computes the intersection of a circular domain and a polygonal domain (whose class defines the specific method).
Currently the only supported class is "owin"
from package spatstat.geom.
intersectPolyCircle(object, center, radius, ...) ## S3 method for class 'owin' intersectPolyCircle(object, center, radius, npoly = 32, ...)
intersectPolyCircle(object, center, radius, ...) ## S3 method for class 'owin' intersectPolyCircle(object, center, radius, npoly = 32, ...)
object |
a polygonal domain of one of the supported classes. |
center , radius , npoly
|
see |
... |
potential further arguments (from the generic). |
a polygonal domain of the same class as the input object
.
Sebastian Meyer
discpoly
to generate a polygonal approximation to a disc
letterR <- surveillance:::LETTERR # an "owin" (internally used for checks) plot(letterR, axes = TRUE) plot(intersectPolyCircle(letterR, center = c(-1, 2), radius = 2), add = TRUE, col = 4, lwd = 3)
letterR <- surveillance:::LETTERR # an "owin" (internally used for checks) plot(letterR, axes = TRUE) plot(intersectPolyCircle(letterR, center = c(-1, 2), radius = 2), add = TRUE, col = 4, lwd = 3)
The function isoWeekYear
extracts the year and week of a
Date
according to the ISO 8601 specification.
isoWeekYear(Y, M, D)
isoWeekYear(Y, M, D)
Y |
year(s) or a Date/POSIXt object. Can be a vector. |
M |
month(s), only used if |
D |
day(s), only used if |
A list with entries ISOYear
and ISOWeek
containing the
corresponding results.
As from surveillance 1.17.0, this function simply
calls strftime
with format strings "%G"
and "%V"
, respectively, as this is nowadays (R >= 3.1.0)
also supported on Windows.
dates <- as.Date(c("2002-12-31","2003-01-01","2003-01-06")) isoWeekYear(dates) ## the same using numeric inputs: isoWeekYear(Y = c(2002, 2003, 2003), M = c(12, 1, 1), D = c(31, 1, 6))
dates <- as.Date(c("2002-12-31","2003-01-01","2003-01-06")) isoWeekYear(dates) ## the same using numeric inputs: isoWeekYear(Y = c(2002, 2003, 2003), M = c(12, 1, 1), D = c(31, 1, 6))
Given temporal and spatial distances as well as corresponding critical
thresholds defining what “close” means, the function
knox
performs Knox (1963, 1964) test for space-time interaction.
The corresponding p-value can be calculated either by the Poisson
approximation or by a Monte Carlo permutation approach (Mantel, 1967)
with support for parallel computation via plapply
.
There is a simple plot
-method showing a truehist
of
the simulated null distribution together with the expected and observed
values.
This implementation of the Knox test is due to Meyer et al. (2016).
knox(dt, ds, eps.t, eps.s, simulate.p.value = TRUE, B = 999, ...) ## S3 method for class 'knox' plot(x, ...)
knox(dt, ds, eps.t, eps.s, simulate.p.value = TRUE, B = 999, ...) ## S3 method for class 'knox' plot(x, ...)
dt , ds
|
numeric vectors containing temporal and spatial distances, respectively.
Logical vectors indicating temporal/spatial closeness may also be
supplied, in which case |
eps.t , eps.s
|
Critical distances defining closeness in time and space, respectively. Distances lower than or equal to the critical distance are considered “"close"”. |
simulate.p.value |
logical indicating if a Monte Carlo permutation test should be
performed (as per default). Do not forget to set the
|
B |
number of permutations for the Monte Carlo approach. |
... |
arguments configuring |
x |
an object of class |
an object of class "knox"
(inheriting from "htest"
),
which is a list with the following components:
method |
a character string indicating the type of test performed, and whether the Poisson approximation or Monte Carlo simulation was used. |
data.name |
a character string giving the supplied |
statistic |
the number of close pairs. |
parameter |
if |
p.value |
the p-value for the test. In case
|
alternative |
the character string |
null.value |
the expected number of close pairs in the absence of space-time interaction. |
table |
the contingency table of |
The plot
-method invisibly returns NULL
.
A toLatex
-method exists, which generates LaTeX code for the
contingency table associated with the Knox test.
The Poisson approximation works well if the proportions of close pairs in both time and space are small (Kulldorff and Hjalmars, 1999), otherwise the Monte Carlo permutation approach is recommended.
Sebastian Meyer
Knox, G. (1963): Detection of low intensity epidemicity: application to cleft lip and palate. British Journal of Preventive & Social Medicine, 17, 121-127.
Knox, E. G. (1964): The detection of space-time interactions. Journal of the Royal Statistical Society. Series C (Applied Statistics), 13, 25-30.
Kulldorff, M. and Hjalmars, U. (1999): The Knox method and other tests for space-time interaction. Biometrics, 55, 544-552.
Mantel, N. (1967): The detection of disease clustering and a generalized regression approach. Cancer Research, 27, 209-220.
Meyer, S., Warnke, I., Rössler, W. and Held, L. (2016): Model-based testing for space-time interaction using point processes: An application to psychiatric hospital admissions in an urban area. Spatial and Spatio-temporal Epidemiology, 17, 15-25. doi:10.1016/j.sste.2016.03.002. Eprint: https://arxiv.org/abs/1512.09052.
The function mantel.randtest
in package ade4
implements Mantel's (1967) space-time interaction test, i.e., using
the Pearson correlation between the spatial and temporal distances of
all event pairs as the test statistic, and assessing statistical
significance using a Monte Carlo permutation approach as with
simulate.p.value
here in the knox
function.
To combine information from different scales eps.t
and
eps.s
while also handling edge effects, the space-time
K-function test available via stKtest
can be used.
Function epitest
tests epidemicity in a
"twinstim"
point process model.
data("imdepi") imdepiB <- subset(imdepi, type == "B") ## Perform the Knox test using the Poisson approximation knoxtest <- knox( dt = dist(imdepiB$events$time), eps.t = 30, ds = dist(coordinates(imdepiB$events)), eps.s = 50, simulate.p.value = FALSE ) knoxtest ## The Poisson approximation works well for these data since ## the proportion of close pairs is rather small (204/56280). ## contingency table in LaTeX toLatex(knoxtest) ## Obtain the p-value via a Monte Carlo permutation test, ## where the permutations can be computed in parallel ## (using forking on Unix-alikes and a cluster on Windows, see ?plapply) knoxtestMC <- knox( dt = dist(imdepiB$events$time), eps.t = 30, ds = dist(coordinates(imdepiB$events)), eps.s = 50, simulate.p.value = TRUE, B = 99, # limited here for speed .parallel = 2, .seed = 1, .verbose = FALSE ) knoxtestMC plot(knoxtestMC)
data("imdepi") imdepiB <- subset(imdepi, type == "B") ## Perform the Knox test using the Poisson approximation knoxtest <- knox( dt = dist(imdepiB$events$time), eps.t = 30, ds = dist(coordinates(imdepiB$events)), eps.s = 50, simulate.p.value = FALSE ) knoxtest ## The Poisson approximation works well for these data since ## the proportion of close pairs is rather small (204/56280). ## contingency table in LaTeX toLatex(knoxtest) ## Obtain the p-value via a Monte Carlo permutation test, ## where the permutations can be computed in parallel ## (using forking on Unix-alikes and a cluster on Windows, see ?plapply) knoxtestMC <- knox( dt = dist(imdepiB$events$time), eps.t = 30, ds = dist(coordinates(imdepiB$events)), eps.s = 50, simulate.p.value = TRUE, B = 99, # limited here for speed .parallel = 2, .seed = 1, .verbose = FALSE ) knoxtestMC plot(knoxtestMC)
This plot function takes a univariate sample that should be tested for
a U(0,1) distribution, plots its empirical cumulative distribution
function (ecdf
), and adds a confidence band by inverting
the corresponding Kolmogorov-Smirnov test (ks.test
). The
uniform distribution is rejected if the ECDF is not completely inside
the confidence band.
ks.plot.unif(U, conf.level = 0.95, exact = NULL, col.conf = "gray", col.ref = "gray", xlab = expression(u[(i)]), ylab = "Cumulative distribution")
ks.plot.unif(U, conf.level = 0.95, exact = NULL, col.conf = "gray", col.ref = "gray", xlab = expression(u[(i)]), ylab = "Cumulative distribution")
U |
numeric vector containing the sample. Missing values are (silently) ignored. |
conf.level |
confidence level for the K-S-test (defaults to 0.95), can also be a vector of multiple levels. |
exact |
see |
col.conf |
colour of the confidence lines. |
col.ref |
colour of the diagonal reference line. |
xlab , ylab
|
axis labels. |
NULL
(invisibly).
Michael Höhle and Sebastian Meyer.
The code re-uses fragments from the ks.test source file https://svn.R-project.org/R/trunk/src/library/stats/R/ks.test.R, with Copyright (C) 1995-2022 The R Core Team, available under GPL-2 (or later), and C functionality from the source file https://svn.R-project.org/R/trunk/src/library/stats/src/ks.c, partially based on code published in Marsaglia et al. (2003), with Copyright (C) 1999-2022 The R Core Team, also available under GPL-2 (or later).
George Marsaglia and Wai Wan Tsang and Jingbo Wang (2003): Evaluating Kolmogorov's distribution. Journal of Statistical Software, 8 (18). doi:10.18637/jss.v008.i18
ks.test
for the Kolmogorov-Smirnov test, as well as
checkResidualProcess
, which makes use of this plot
function.
samp <- runif(99) ks.plot.unif(samp, conf.level=c(0.95, 0.99), exact=TRUE) ks.plot.unif(samp, conf.level=c(0.95, 0.99), exact=FALSE)
samp <- runif(99) ks.plot.unif(samp, conf.level=c(0.95, 0.99), exact=TRUE) ks.plot.unif(samp, conf.level=c(0.95, 0.99), exact=FALSE)
spplot
Generate sp.layout
items for use by spplot
or plot these items directly in the traditional graphics system.
Function layout.labels
draws labels at the coordinates of the
spatial object, and layout.scalebar
returns a labeled scale bar.
layout.labels(obj, labels = TRUE, plot = FALSE) layout.scalebar(obj, corner = c(0.05, 0.95), scale = 1, labels = c(0, scale), height = 0.05, pos = 3, ..., plot = FALSE)
layout.labels(obj, labels = TRUE, plot = FALSE) layout.scalebar(obj, corner = c(0.05, 0.95), scale = 1, labels = c(0, scale), height = 0.05, pos = 3, ..., plot = FALSE)
obj |
an object inheriting from a |
labels |
specification of the labels. For
For |
corner |
the location of the scale bar in the unit square, where
|
scale |
the width of the scale bar in the units of |
height |
the height of the scale bar, see |
pos |
a position specifier for the labels (see |
... |
further arguments for |
plot |
logical indicating if the layout item should be plotted using the
traditional graphics system. By default ( |
For layout.labels
, a single sp.layout
item, which is
a list with first element "panel.text"
and subsequent elements
being arguments to that function based on the labels
specification.
For layout.scalebar
, a list of sp.layout
items
comprising the polygonal scale bar and the labels.
If these layout functions are called with plot = TRUE
,
the item is plotted directly using traditional graphics functions
and NULL
is returned.
Sebastian Meyer
## districts in the Regierungsbezirk Weser-Ems (longlat coordinates) data("measlesWeserEms") mapWE <- measlesWeserEms@map li1 <- layout.labels(mapWE, labels = list(font=2, labels="GEN")) li2 <- layout.scalebar(mapWE, corner = c(0.05, 0.05), scale = 20, labels = c("0", "20 km")) spplot(mapWE, zcol = "AREA", sp.layout = c(list(li1), li2), col.regions = rev(heat.colors(100)), scales = list(draw = TRUE)) ## districts in Bavaria (projected coordinates) load(system.file("shapes", "districtsD.RData", package = "surveillance")) bavaria <- districtsD[substr(row.names(districtsD), 1, 2) == "09", ] sb <- layout.scalebar(bavaria, corner = c(0.75,0.9), scale = 50, labels = c("0", "50 km"), cex = 0.8) spplot(bavaria, zcol = "POPULATION", sp.layout = sb, xlab = "x [km]", ylab = "y [km]", scales = list(draw = TRUE), col.regions = rev(heat.colors(100))) ## these layout functions also work in the traditional graphics system par(mar = c(0,0,0,0)) plot(bavaria, col = "lavender") layout.scalebar(bavaria, corner = c(0.75, 0.9), scale = 50, labels = c("0", "50 km"), plot = TRUE) layout.labels(bavaria, labels = list(cex = 0.8, labels = substr(bavaria$GEN, 1, 3)), plot = TRUE)
## districts in the Regierungsbezirk Weser-Ems (longlat coordinates) data("measlesWeserEms") mapWE <- measlesWeserEms@map li1 <- layout.labels(mapWE, labels = list(font=2, labels="GEN")) li2 <- layout.scalebar(mapWE, corner = c(0.05, 0.05), scale = 20, labels = c("0", "20 km")) spplot(mapWE, zcol = "AREA", sp.layout = c(list(li1), li2), col.regions = rev(heat.colors(100)), scales = list(draw = TRUE)) ## districts in Bavaria (projected coordinates) load(system.file("shapes", "districtsD.RData", package = "surveillance")) bavaria <- districtsD[substr(row.names(districtsD), 1, 2) == "09", ] sb <- layout.scalebar(bavaria, corner = c(0.75,0.9), scale = 50, labels = c("0", "50 km"), cex = 0.8) spplot(bavaria, zcol = "POPULATION", sp.layout = sb, xlab = "x [km]", ylab = "y [km]", scales = list(draw = TRUE), col.regions = rev(heat.colors(100))) ## these layout functions also work in the traditional graphics system par(mar = c(0,0,0,0)) plot(bavaria, col = "lavender") layout.scalebar(bavaria, corner = c(0.75, 0.9), scale = 50, labels = c("0", "50 km"), plot = TRUE) layout.labels(bavaria, labels = list(cex = 0.8, labels = substr(bavaria$GEN, 1, 3)), plot = TRUE)
The function is used to convert an individual line list of cases to an aggregated time series of counts based on event date information of the cases.
linelist2sts(linelist,dateCol, aggregate.by=c("1 day", "1 week", "7 day", "1 week", "1 month", "3 month", "1 year"), dRange=NULL, epochInPeriodStr=switch(aggregate.by, "1 day"="1", "1 week"="%u", "1 month"="%d","3 month"="%q","1 year"="%j"), startYearFormat=switch(aggregate.by,"1 day"="%Y", "7 day"="%G", "1 week"="%G","1 month"="%Y","3 month"="%Y","1 year"="%Y"), startEpochFormat=switch(aggregate.by,"1 day"="%j", "7 day"="%V", "1 week"="%V", "1 month"="%m", "3 month"="%Q", "1 year"="1") )
linelist2sts(linelist,dateCol, aggregate.by=c("1 day", "1 week", "7 day", "1 week", "1 month", "3 month", "1 year"), dRange=NULL, epochInPeriodStr=switch(aggregate.by, "1 day"="1", "1 week"="%u", "1 month"="%d","3 month"="%q","1 year"="%j"), startYearFormat=switch(aggregate.by,"1 day"="%Y", "7 day"="%G", "1 week"="%G","1 month"="%Y","3 month"="%Y","1 year"="%Y"), startEpochFormat=switch(aggregate.by,"1 day"="%j", "7 day"="%V", "1 week"="%V", "1 month"="%m", "3 month"="%Q", "1 year"="1") )
linelist |
A |
dateCol |
A character string stating the column name in
|
aggregate.by |
Temporal aggregation level given as a string, see
the |
dRange |
A vector containing the minimum and maximum date
for doing the aggregation. If not specified these dates are
extracted automatically by taking |
epochInPeriodStr |
|
startYearFormat |
|
startEpochFormat |
|
The date range is automatically extended such that the starting and
ending dates are always the first epoch within the period, i.e. for
aggregation by week it is moved to Mondays. This is controlled by the
epochInPeriodStr
parameter.
Please note that the formatting strings are implemented by the
formatDate
function, which uses strptime
formatting strings as well as
formatting of quarters via "%Q", "%OQ" and "%q".
The function returns an object of class "sts"
.
The freq
slot might not be appropriate.
Michael Höhle
seq.Date
, strptime
, formatDate
#Load O104 outbreak data data("husO104Hosp") #Convert line list to an sts object sts <- linelist2sts(husO104Hosp, dateCol="dHosp", aggregate.by="1 day") #Check that the number of cases is correct all.equal(sum(observed(sts)),nrow(husO104Hosp)) #Plot the result plot(sts,xaxis.tickFreq=list("%d"=atChange,"%m"=atChange), xaxis.labelFreq=list("%d"=at2ndChange), xaxis.labelFormat="%d %b", xlab="",las=2,cex.axis=0.8)
#Load O104 outbreak data data("husO104Hosp") #Convert line list to an sts object sts <- linelist2sts(husO104Hosp, dateCol="dHosp", aggregate.by="1 day") #Check that the number of cases is correct all.equal(sum(observed(sts)),nrow(husO104Hosp)) #Plot the result plot(sts,xaxis.tickFreq=list("%d"=atChange,"%m"=atChange), xaxis.labelFreq=list("%d"=at2ndChange), xaxis.labelFormat="%d %b", xlab="",las=2,cex.axis=0.8)
Compute run length for a count data or categorical CUSUM. The computations are based on a Markov representation of the likelihood ratio based CUSUM.
LRCUSUM.runlength(mu, mu0, mu1, h, dfun, n, g=5, outcomeFun=NULL, ...)
LRCUSUM.runlength(mu, mu0, mu1, h, dfun, n, g=5, outcomeFun=NULL, ...)
mu |
|
mu0 |
|
mu1 |
|
h |
The threshold h which is used for the CUSUM. |
dfun |
The probability mass function or density used to compute
the likelihood ratios of the CUSUM. In a negative binomial CUSUM
this is |
n |
Vector of length |
g |
The number of levels to cut the state space into when
performing the Markov chain approximation. Sometimes also denoted
|
outcomeFun |
A hook |
... |
Additional arguments to send to |
Brook and Evans (1972) formulated an approximate approach based
on Markov chains to determine the PMF of the run length of a
time-constant CUSUM detector. They describe the dynamics of the CUSUM
statistic by a Markov chain with a discretized state space of
size . This is adopted to the time varying case in
Höhle (2010) and implemented in R using the ... notation
such that it works for a very large class of distributions.
A list with five components
P |
An array of |
pmf |
Probability mass function (up to length |
cdf |
Cumulative density function (up to length |
arl |
If the model is time homogeneous (i.e. if |
M. Höhle
Höhle, M. (2010): Online change-point detection in categorical time series. In: T. Kneib and G. Tutz (Eds.), Statistical Modelling and Regression Structures - Festschrift in Honour of Ludwig Fahrmeir, Physica-Verlag, pp. 377-397. Preprint available as https://staff.math.su.se/hoehle/pubs/hoehle2010-preprint.pdf
Höhle, M. and Mazick, A. (2010): Aberration detection in R illustrated by Danish mortality monitoring. In: T. Kass-Hout and X. Zhang (Eds.), Biosurveillance: A Health Protection Priority, CRCPress. Preprint available as https://staff.math.su.se/hoehle/pubs/hoehle_mazick2009-preprint.pdf
Brook, D. and Evans, D. A. (1972): An approach to the probability distribution of cusum run length. Biometrika 59(3):539-549.
###################################################### #Run length of a time constant negative binomial CUSUM ###################################################### #In-control and out of control parameters mu0 <- 10 alpha <- 1/2 kappa <- 2 #Density for comparison in the negative binomial distribution dY <- function(y,mu,log=FALSE, alpha, ...) { dnbinom(y, mu=mu, size=1/alpha, log=log) } #In this case "n" is the maximum value to investigate the LLR for #It is assumed that beyond n the LLR is too unlikely to be worth #computing. LRCUSUM.runlength( mu=t(mu0), mu0=t(mu0), mu1=kappa*t(mu0), h=5, dfun = dY, n=rep(100,length(mu0)), alpha=alpha) h.grid <- seq(3,6,by=0.3) arls <- sapply(h.grid, function(h) { LRCUSUM.runlength( mu=t(mu0), mu0=t(mu0), mu1=kappa*t(mu0), h=h, dfun = dY, n=rep(100,length(mu0)), alpha=alpha,g=20)$arl }) plot(h.grid, arls,type="l",xlab="threshold h",ylab=expression(ARL[0])) ###################################################### #Run length of a time varying negative binomial CUSUM ###################################################### mu0 <- matrix(5*sin(2*pi/52 * 1:150) + 10,ncol=1) rl <- LRCUSUM.runlength( mu=t(mu0), mu0=t(mu0), mu1=kappa*t(mu0), h=2, dfun = dY, n=rep(100,length(mu0)), alpha=alpha,g=20) plot(1:length(mu0),rl$pmf,type="l",xlab="t",ylab="PMF") plot(1:length(mu0),rl$cdf,type="l",xlab="t",ylab="CDF") ######################################################## # Further examples contain the binomial, beta-binomial # and multinomial CUSUMs. Hopefully, these will be added # in the future. ######################################################## #dfun function for the multinomial distribution (Note: Only k-1 categories are specified). dmult <- function(y, size,mu, log = FALSE) { return(dmultinom(c(y,size-sum(y)), size = size, prob=c(mu,1-sum(mu)), log = log)) } #Example for the time-constant multinomial distribution #with size 100 and in-control and out-of-control parameters as below. n <- 100 pi0 <- as.matrix(c(0.5,0.3,0.2)) pi1 <- as.matrix(c(0.38,0.46,0.16)) #ARL_0 LRCUSUM.runlength(mu=pi0[1:2,,drop=FALSE],mu0=pi0[1:2,,drop=FALSE],mu1=pi1[1:2,,drop=FALSE], h=5,dfun=dmult, n=n, g=15)$arl #ARL_1 LRCUSUM.runlength(mu=pi1[1:2,,drop=FALSE],mu0=pi0[1:2,,drop=FALSE],mu1=pi1[1:2,,drop=FALSE], h=5,dfun=dmult, n=n, g=15)$arl
###################################################### #Run length of a time constant negative binomial CUSUM ###################################################### #In-control and out of control parameters mu0 <- 10 alpha <- 1/2 kappa <- 2 #Density for comparison in the negative binomial distribution dY <- function(y,mu,log=FALSE, alpha, ...) { dnbinom(y, mu=mu, size=1/alpha, log=log) } #In this case "n" is the maximum value to investigate the LLR for #It is assumed that beyond n the LLR is too unlikely to be worth #computing. LRCUSUM.runlength( mu=t(mu0), mu0=t(mu0), mu1=kappa*t(mu0), h=5, dfun = dY, n=rep(100,length(mu0)), alpha=alpha) h.grid <- seq(3,6,by=0.3) arls <- sapply(h.grid, function(h) { LRCUSUM.runlength( mu=t(mu0), mu0=t(mu0), mu1=kappa*t(mu0), h=h, dfun = dY, n=rep(100,length(mu0)), alpha=alpha,g=20)$arl }) plot(h.grid, arls,type="l",xlab="threshold h",ylab=expression(ARL[0])) ###################################################### #Run length of a time varying negative binomial CUSUM ###################################################### mu0 <- matrix(5*sin(2*pi/52 * 1:150) + 10,ncol=1) rl <- LRCUSUM.runlength( mu=t(mu0), mu0=t(mu0), mu1=kappa*t(mu0), h=2, dfun = dY, n=rep(100,length(mu0)), alpha=alpha,g=20) plot(1:length(mu0),rl$pmf,type="l",xlab="t",ylab="PMF") plot(1:length(mu0),rl$cdf,type="l",xlab="t",ylab="CDF") ######################################################## # Further examples contain the binomial, beta-binomial # and multinomial CUSUMs. Hopefully, these will be added # in the future. ######################################################## #dfun function for the multinomial distribution (Note: Only k-1 categories are specified). dmult <- function(y, size,mu, log = FALSE) { return(dmultinom(c(y,size-sum(y)), size = size, prob=c(mu,1-sum(mu)), log = log)) } #Example for the time-constant multinomial distribution #with size 100 and in-control and out-of-control parameters as below. n <- 100 pi0 <- as.matrix(c(0.5,0.3,0.2)) pi1 <- as.matrix(c(0.38,0.46,0.16)) #ARL_0 LRCUSUM.runlength(mu=pi0[1:2,,drop=FALSE],mu0=pi0[1:2,,drop=FALSE],mu1=pi1[1:2,,drop=FALSE], h=5,dfun=dmult, n=n, g=15)$arl #ARL_1 LRCUSUM.runlength(mu=pi1[1:2,,drop=FALSE],mu0=pi0[1:2,,drop=FALSE],mu1=pi1[1:2,,drop=FALSE], h=5,dfun=dmult, n=n, g=15)$arl
14 datasets for different diseases beginning in 2001 to the 3rd Quarter of 2004 including their defined outbreaks.
m1
'Masern' in the 'Landkreis Nordfriesland' (Germany, Schleswig-Holstein)
m2
'Masern' in the 'Stadt- und Landkreis Coburg' (Germany, Bayern)
m3
'Masern' in the 'Kreis Leer' (Germany, Niedersachsen)
m4
'Masern' in the 'Stadt- und Landkreis Aachen' (Germany, Nordrhein-Westfalen)
m5
'Masern' in the 'Stadt Verden' (Germany, Niedersachsen)
q1_nrwh
'Q-Fieber' in the 'Hochsauerlandkreis' (Germany, Westfalen)
and in the 'Landkreis Waldeck-Frankenberg' (Germany, Hessen)
q2
'Q-Fieber' in 'München' (Germany, Bayern)
s1
'Salmonella Oranienburg' in Germany
s2
'Salmonella Agona' in 12 'Bundesländern' of Germany
s3
'Salmonella Anatum' in Germany
k1
'Kryptosporidiose' in Germany, 'Baden-Württemberg'
n1
'Norovirus' in 'Stadtkreis Berlin Mitte' (Germany, Berlin)
n2
'Norovirus' in 'Torgau-Oschatz' (Germany, Sachsen)
h1_nrwrp
'Hepatitis A' in 'Oberbergischer Kreis, Olpe, Rhein-Sieg-kreis'
(Germany, Nordrhein-Westfalen) and 'Siegenwittgenstein Altenkirchen' (Germany, Rheinland-Pfalz)
data(m1)
data(m1)
disProg
objects each containing 209 observations (weekly on 52 weeks)
Number of counts in the corresponding week
Boolean whether there was an outbreak.
Robert Koch-Institut: SurvStat: https://survstat.rki.de/; m1 and m3 were queried on 10 November 2004. The rest during September 2004.
data(k1) survResObj <- algo.rki1(k1, control=list(range=27:192)) plot(survResObj, "RKI 1", "k1")
data(k1) survResObj <- algo.rki1(k1, control=list(range=27:192)) plot(survResObj, "RKI 1", "k1")
For a given number k
, magic.dim
provides a vector
containing two elements, the number of rows (k1) and columns (k2),
respectively, which can be used to set the
dimension of a single graphic device so that k1*k2 plots can be
drawn by row (or by column) on the device.
magic.dim(k)
magic.dim(k)
k |
an integer |
numeric vector with two elements
primeFactors
and bestCombination
which are
internally used to complete the task.
n2mfrow
is a similar function from package grDevices.
control
Settings for an hhh4
ModelGenerate control
Settings for an hhh4
Model
makeControl(f = list(~1), S = list(0, 0, 1), period = 52, offset = 1, ...)
makeControl(f = list(~1), S = list(0, 0, 1), period = 52, offset = 1, ...)
f , S , period
|
arguments for |
offset |
multiplicative component offsets in the order ( |
... |
further elements for the |
a list for use as the control
argument in hhh4
.
makeControl() ## a simplistic model for the fluBYBW data ## (first-order transmission only, no district-specific intercepts) data("fluBYBW") mycontrol <- makeControl( f = list(~1, ~1, ~t), S = c(1, 1, 3), offset = list(population(fluBYBW)), # recycled -> in all components ne = list(normalize = TRUE), verbose = TRUE) str(mycontrol) if (surveillance.options("allExamples")) ## fit this model fit <- hhh4(fluBYBW, mycontrol)
makeControl() ## a simplistic model for the fluBYBW data ## (first-order transmission only, no district-specific intercepts) data("fluBYBW") mycontrol <- makeControl( f = list(~1, ~1, ~t), S = c(1, 1, 3), offset = list(population(fluBYBW)), # recycled -> in all components ne = list(normalize = TRUE), verbose = TRUE) str(mycontrol) if (surveillance.options("allExamples")) ## fit this model fit <- hhh4(fluBYBW, mycontrol)
The generic function marks
is imported from package spatstat.geom.
See spatstat.geom::marks
for spatstat.geom's
own methods, and marks.epidataCS
for the
"epidataCS"
-specific method.
Weekly counts of new measles cases for the 17 administrative
districts (NUTS-3 level) of the “Weser-Ems” region of Lower
Saxony, Germany, during 2001 and 2002, as reported to the Robert Koch
institute according to the Infection Protection Act
(“Infektionsschutzgesetz”, IfSG).data("measlesWeserEms")
is a corrected version of
data("measles.weser")
(see Format section below).
These data are illustrated and analyzed in Meyer et al. (2017, Section 5),
see vignette("hhh4_spacetime")
.
data("measles.weser") data("measlesWeserEms")
data("measles.weser") data("measlesWeserEms")
data("measles.weser")
is an object of the old "disProg"
class, whereas data("measlesWeserEms")
is of the new class
"sts"
.
Furthermore, the following updates have been applied for
data("measlesWeserEms")
:
it includes the two districts “SK Delmenhorst” (03401) and
“SK Wilhemshaven” (03405) with zero counts, which are ignored in
data("measles.weser")
.
it corrects the time lag error for year 2002 caused by a
redundant pseudo-week “0” with 0 counts only
(the row measles.weser$observed[53,]
is nonsense).
it has one more case attributed to “LK Oldenburg”
(03458) during 2001/W17, i.e., 2 cases instead of 1. This reflects
the official data as of “Jahrbuch 2005”, whereas
data("measles.weser")
is as of “Jahrbuch 2004”.
it contains a map of the region (as a
"SpatialPolygonsDataFrame"
) with the following
variables:
GEN
district label.
AREA
district area in m^2.
POPULATION
number of inhabitants (as of 31/12/2003).
vaccdoc.2004
proportion with a vaccination card among screened abecedarians (2004).
vacc1.2004
proportion with at least one vaccination against measles among abecedarians presenting a vaccination card (2004).
vacc2.2004
proportion of doubly vaccinated abecedarians among the ones presenting their vaccination card at school entry in the year 2004.
it uses the correct format for the official district keys, i.e., 5 digits (initial 0).
its attached neighbourhood matrix is more general: a distance matrix
(neighbourhood orders) instead of just an adjacency indicator matrix
(special case nbOrder == 1
).
population fractions represent data as of 31/12/2003 (LSN,
2004, document “A I 2 - hj 2 / 2003”). There are only
minor differences to the ones used for data("measles.weser")
.
Measles counts were obtained from the public SurvStat database of the Robert Koch institute: https://survstat.rki.de/.
A shapefile of Germany's districts as of 01/01/2009 was obtained from
the German Federal Agency for Cartography and Geodesy
(https://gdz.bkg.bund.de/).
The map of the 17 districts of the “Weser-Ems” region
(measlesWeserEms@map
) is a simplified subset of this
shapefile using a 30% reduction via the Douglas-Peucker reduction method
as implemented at https://MapShaper.org.
Population numbers were obtained from the Federal Statistical Office of Lower Saxony (LSN).
Vaccination coverage was obtained from the public health department of Lower Saxony (NLGA, “Impfreport”).
Meyer, S., Held, L. and Höhle, M. (2017): Spatio-temporal analysis of epidemic phenomena using the R package surveillance. Journal of Statistical Software, 77 (11), 1-55. doi:10.18637/jss.v077.i11
## old "disProg" object data("measles.weser") measles.weser plot(measles.weser, as.one=FALSE) ## new "sts" object (with corrections) data("measlesWeserEms") measlesWeserEms plot(measlesWeserEms)
## old "disProg" object data("measles.weser") measles.weser plot(measles.weser, as.one=FALSE) ## new "sts" object (with corrections) data("measlesWeserEms") measlesWeserEms plot(measlesWeserEms)
Weekly number of measles cases in the 16 states (Bundeslaender) of Germany for years 2005 to 2007.
data(measlesDE)
data(measlesDE)
An "sts"
object containing
observations starting from week 1 in 2005.
The population
slot contains the population fractions
of each state at 31.12.2006, obtained from the Federal Statistical
Office of Germany.
Robert Koch-Institut: SurvStat: https://survstat.rki.de/; Queried on 14 October 2009.
Herzog, S. A., Paul, M. and Held, L. (2011): Heterogeneity in vaccination coverage explains the size and occurrence of measles epidemics in German surveillance data. Epidemiology and Infection, 139, 505-515. doi:10.1017/S0950268810001664
data(measlesDE) plot(measlesDE) ## aggregate to bi-weekly intervals measles2w <- aggregate(measlesDE, nfreq = 26) plot(measles2w, type = observed ~ time) ## use a date index for nicer x-axis plotting epoch(measles2w) <- seq(as.Date("2005-01-03"), by = "2 weeks", length.out = nrow(measles2w)) plot(measles2w, type = observed ~ time)
data(measlesDE) plot(measlesDE) ## aggregate to bi-weekly intervals measles2w <- aggregate(measlesDE, nfreq = 26) plot(measles2w, type = observed ~ time) ## use a date index for nicer x-axis plotting epoch(measles2w) <- seq(as.Date("2005-01-03"), by = "2 weeks", length.out = nrow(measles2w)) plot(measles2w, type = observed ~ time)
Monthly counts of meningococcal infections in France 1985-1997. Here, the data is split into 4 age groups (<1, 1-5, 5-20, >20).
data(meningo.age)
data(meningo.age)
An object of class disProg
with 156 observations in each of 4 age groups.
Month index
Matrix with number of counts in the corresponding month and age group
Boolean whether there was an outbreak – dummy not implemented
Neighbourhood matrix, all age groups are adjacent
Population fractions
??
data(meningo.age) plot(meningo.age, title="Meningococcal infections in France 1985-97") plot(meningo.age, as.one=FALSE)
data(meningo.age) plot(meningo.age, title="Meningococcal infections in France 1985-97") plot(meningo.age, as.one=FALSE)
Coverage levels at school entry for the first and second dose of the combined measles-mumps-rubella (MMR) vaccine in 2006, estimated from children presenting vaccination documents at school entry examinations.
data(MMRcoverageDE)
data(MMRcoverageDE)
A data.frame
containing 19 rows and 5 columns with variables
Names of states: the 16 federal states are followed by the total of Germany, as well as the total of West and East Germany.
Number of children examined.
Percentage of children who presented vaccination documents.
Percentage of children with vaccination documents, who received at least 1 dose of MMR vaccine.
Percentage of children with vaccination documents, who received at least 2 doses of MMR vaccine.
Coverage levels were derived from vaccination documents presented at medical examinations, which are conducted by local health authorities at school entry each year. Records include information about the receipt of 1st and 2nd doses of MMR, but no information about dates. Note that information from children who did not present a vaccination document on the day of the medical examination, is not included in the estimated coverage.
Robert Koch-Institut (2008) Zu den Impfquoten bei den Schuleingangsuntersuchungen in Deutschland 2006. Epidemiologisches Bulletin, 7, 55-57
Herzog, S.A., Paul, M. and Held, L. (2011) Heterogeneity in vaccination coverage explains the size and occurrence of measles epidemics in German surveillance data. Epidemiology and Infection, 139, 505–515.
Weekly number of all cause mortality from 1994-2008 in each of the eight age groups <1, 1-4, 5-14, 15-44, 45-64, 65-74, 75-84 and 85+ years, see Höhle and Mazick (2010).
data(momo)
data(momo)
An object of class "sts"
containing the weekly
number of all-cause deaths in Denmark, 1994-2008 (782 weeks), for each
of the eight age groups <1, 1-4, 5-14, 15-44, 45-64, 65-74,
75-84 and 85+ years. A special feature of the EuroMOMO data is that
weeks follow the ISO 8601 standard, which can be
handled by the "sts"
class.
The population
slot of the momo
object contains the
population size in each of the eight age groups.
These are yearly data obtained from the StatBank Denmark.
European monitoring of excess mortality for public health action (EuroMOMO) project. https://www.euromomo.eu/.
Department of Epidemiology, Statens Serum Institute, Copenhagen, Denmark StatBank Denmark, Statistics Denmark, https://www.statistikbanken.dk/
Höhle, M. and Mazick, A. (2010). Aberration detection in R
illustrated by Danish mortality monitoring. In T. Kass-Hout and X.
Zhang (eds.), Biosurveillance: A Health Protection Priority,
chapter 12. Chapman & Hall/CRC.
Preprint available at
https://staff.math.su.se/hoehle/pubs/hoehle_mazick2009-preprint.pdf
data("momo") momo ## show the period 2000-2008 with customized x-axis annotation ## (this is Figure 1 in Hoehle and Mazick, 2010) oopts <- surveillance.options("stsTickFactors" = c("%G" = 1.5, "%Q"=.75)) plot(momo[year(momo) >= 2000,], ylab = "", xlab = "Time (weeks)", par.list = list(las = 1), col = c(gray(0.5), NA, NA), xaxis.tickFreq = list("%G"=atChange, "%Q"=atChange), xaxis.labelFreq = list("%G"=atChange), xaxis.labelFormat = "%G") surveillance.options(oopts) if (surveillance.options("allExamples")) { ## stratified monitoring from 2007-W40 using the Farrington algorithm phase2 <- which(epoch(momo) >= "2007-10-01") momo2 <- farrington(momo, control = list(range=phase2, alpha=0.01, b=5, w=4)) print(colSums(alarms(momo2))) plot(momo2, col = c(8, NA, 4), same.scale = FALSE) ## stripchart of alarms (Figure 5 in Hoehle and Mazick, 2010) plot(momo2, type = alarm ~ time, xlab = "Time (weeks)", main = "", alarm.symbol = list(pch=3, col=1, cex=1.5)) }
data("momo") momo ## show the period 2000-2008 with customized x-axis annotation ## (this is Figure 1 in Hoehle and Mazick, 2010) oopts <- surveillance.options("stsTickFactors" = c("%G" = 1.5, "%Q"=.75)) plot(momo[year(momo) >= 2000,], ylab = "", xlab = "Time (weeks)", par.list = list(las = 1), col = c(gray(0.5), NA, NA), xaxis.tickFreq = list("%G"=atChange, "%Q"=atChange), xaxis.labelFreq = list("%G"=atChange), xaxis.labelFormat = "%G") surveillance.options(oopts) if (surveillance.options("allExamples")) { ## stratified monitoring from 2007-W40 using the Farrington algorithm phase2 <- which(epoch(momo) >= "2007-10-01") momo2 <- farrington(momo, control = list(range=phase2, alpha=0.01, b=5, w=4)) print(colSums(alarms(momo2))) plot(momo2, col = c(8, NA, 4), same.scale = FALSE) ## stripchart of alarms (Figure 5 in Hoehle and Mazick, 2010) plot(momo2, type = alarm ~ time, xlab = "Time (weeks)", main = "", alarm.symbol = list(pch=3, col=1, cex=1.5)) }
The generic function multiplicity
is imported from package spatstat.geom.
See spatstat.geom::multiplicity
for
spatstat.geom's own methods, and multiplicity.Spatial
for the
added method for "Spatial"
objects.
The generic function multiplicity
defined in spatstat.geom is
intended to count the number of duplicates of each element of an object.
spatstat.geom already offers methods for point patterns, matrices and
data frames, and here we add a method for Spatial
objects from
the sp package. It is a wrapper for the default method, which
effectively computes the distance matrix of the points,
and then just counts the number of zeroes in each row.
## S3 method for class 'Spatial' multiplicity(x)
## S3 method for class 'Spatial' multiplicity(x)
x |
a |
an integer vector containing the number of instances of each point of the object.
multiplicity
in package spatstat.geom.
See the Examples of the hagelloch
data for a specific
use of multiplicity
.
foo <- SpatialPoints(matrix(c(1,2, 2,3, 1,2, 4,5), 4, 2, byrow=TRUE)) multiplicity(foo) # the following function determines the multiplicities in a matrix # or data frame and returns unique rows with appended multiplicity countunique <- function(x) unique(cbind(x, count=multiplicity(x))) countunique(coordinates(foo))
foo <- SpatialPoints(matrix(c(1,2, 2,3, 1,2, 4,5), 4, 2, byrow=TRUE)) multiplicity(foo) # the following function determines the multiplicities in a matrix # or data frame and returns unique rows with appended multiplicity countunique <- function(x) unique(cbind(x, count=multiplicity(x))) countunique(coordinates(foo))
Given a square binary adjacency matrix, the function
nbOrder
determines the integer matrix of neighbourhood orders
(shortest-path distance).
nbOrder(neighbourhood, maxlag = Inf)
nbOrder(neighbourhood, maxlag = Inf)
neighbourhood |
a square, numeric or logical, and usually symmetric matrix with
finite entries (and usually zeros on the diagonal) which indicates
vertex adjacencies, i.e., first-order neighbourhood (interpreted as
|
maxlag |
positive scalar integer specifying an upper bound for the
neighbourhood order. The default ( |
An integer matrix of neighbourhood orders, i.e., the shortest-path
distance matrix of the vertices.
The dimnames
of the input neighbourhood
matrix are preserved.
Sebastian Meyer
nblag
from the spdep package
## generate adjacency matrix set.seed(1) n <- 6 adjmat <- matrix(0, n, n) adjmat[lower.tri(adjmat)] <- sample(0:1, n*(n-1)/2, replace=TRUE) adjmat <- adjmat + t(adjmat) adjmat ## determine neighbourhood order matrix nblags <- nbOrder(adjmat) nblags
## generate adjacency matrix set.seed(1) n <- 6 adjmat <- matrix(0, n, n) adjmat[lower.tri(adjmat)] <- sample(0:1, n*(n-1)/2, replace=TRUE) adjmat <- adjmat + t(adjmat) adjmat ## determine neighbourhood order matrix nblags <- nbOrder(adjmat) nblags
Nowcasting can help to obtain up-to-date information on trends during a situation where reports about events arrive with delay. For example in public health reporting, reports about important indicators (such as occurrence of cases) are prone to be delayed due to for example manual quality checking and reporting system hierarchies. Altogether, the delays are subject to a delay distribution, which may, or may not, vary over time.
nowcast(now, when, data, dEventCol="dHospital", dReportCol="dReport", method=c("bayes.notrunc", "bayes.notrunc.bnb", "lawless", "bayes.trunc", "unif", "bayes.trunc.ddcp"), aggregate.by="1 day", D=15, m=NULL, m.interpretation=c("hoehle_anderheiden2014", "lawless1994"), control=list( dRange=NULL, alpha=0.05, nSamples=1e3, N.tInf.prior=c("poisgamma","pois","unif"), N.tInf.max=300, gd.prior.kappa=0.1, ddcp=list(ddChangepoint=NULL, cp_order=c("zero","one"), Wextra=NULL, logLambda=c("iidLogGa","tps","rw1","rw2"), responseDistr=c("poisson", "negbin"), mcmc=c(burnin=2500, sample=10000, thin=1, adapt=1000, store.samples=FALSE)), score=FALSE, predPMF=FALSE))
nowcast(now, when, data, dEventCol="dHospital", dReportCol="dReport", method=c("bayes.notrunc", "bayes.notrunc.bnb", "lawless", "bayes.trunc", "unif", "bayes.trunc.ddcp"), aggregate.by="1 day", D=15, m=NULL, m.interpretation=c("hoehle_anderheiden2014", "lawless1994"), control=list( dRange=NULL, alpha=0.05, nSamples=1e3, N.tInf.prior=c("poisgamma","pois","unif"), N.tInf.max=300, gd.prior.kappa=0.1, ddcp=list(ddChangepoint=NULL, cp_order=c("zero","one"), Wextra=NULL, logLambda=c("iidLogGa","tps","rw1","rw2"), responseDistr=c("poisson", "negbin"), mcmc=c(burnin=2500, sample=10000, thin=1, adapt=1000, store.samples=FALSE)), score=FALSE, predPMF=FALSE))
now |
an object of class |
when |
a vector of |
data |
A data frame with one row per case – for each case on needs information on the day of the event (e.g. hospitalization) and the day of report of this event. |
dEventCol |
The name of the column in |
dReportCol |
Name of the column in |
method |
A vector of strings denoting the different methods for doing the nowcasting. Note that results of the first name in this list are officially returned by the function. However, it is possible to specify several methods here, e.g., in order to compare score evaluations. Details of the methods are described in Höhle and an der Heiden (2014).
|
aggregate.by |
Time scale used for the temporal aggregation of
the records in the data |
D |
Maximum possible or maximum relevant delay (unit:
|
m |
Size of the moving window for the estimation of the delay
distribution. Default: |
m.interpretation |
This parameter controls the interpretation of
the sliding window used to estimate the delay distribution. If
Alternatively, when |
control |
A list with named arguments controlling the functionality of the nowcasting.
|
The methodological details of the nowcasting procedures are described in Höhle M and an der Heiden M (2014).
nowcast
returns an object of "stsNC"
. The
upperbound
slot contains the median of the method specified at
the first position the argument method
. The slot pi
(for
prediction interval)
contains the equal tailed (1-)*100% prediction
intervals, which are calculated based on the predictive distributions
in slot
predPMF
.
Furthermore, slot truth
contains an sts
object
containing the true number of cases (if possible to compute it is based on
the data in data
). Finally, slot SR
contains the results
for the proper scoring rules (requires truth to be calculable).
Note: The bayes.trunc.ddcp
uses the JAGS software together with
the R package runjags to handle the parallelization of
the MCMC using the "rjparallel"
method of
run.jags
, which additionally requires the
rjags package. You need to manually install
JAGS on your computer for the package to work – see
https://mcmc-jags.sourceforge.io/
and the documentation of runjags for details.
Note: The function is still under development and might change in the future. Unfortunately, little emphasis has so far been put on making the function easy to understand and use.
Michael Höhle
Höhle, M. and an der Heiden, M. (2014): Bayesian nowcasting
during the STEC O104:H4 outbreak in Germany, 2011. Biometrics
70(4):993-1002. doi:10.1111/biom.12194.
A preprint is available as
https://staff.math.su.se/hoehle/pubs/hoehle_anderheiden2014-preprint.pdf.
Günther, F. and Bender, A. and Katz, K. and
Küchenhoff, H. and Höhle, M. (2020):
Nowcasting the COVID-19 pandemic in Bavaria.
Biometrical Journal. doi:10.1002/bimj.202000112
Preprint available at doi:10.1101/2020.06.26.20140210.
data("husO104Hosp") #Extract the reporting triangle at a specific day t.repTriangle <- as.Date("2011-07-04") #Use 'void' nowcasting procedure (we just want the reporting triangle) nc <- nowcast(now=t.repTriangle,when=t.repTriangle, dEventCol="dHosp",dReportCol="dReport",data=husO104Hosp, D=15,method="unif") #Show reporting triangle reportingTriangle(nc) #Perform Bayesian nowcasting assuming the delay distribution is stable over time nc.control <- list(N.tInf.prior=structure("poisgamma", mean.lambda=50,var.lambda=3000), nSamples=1e2) t.repTriangle <- as.Date("2011-06-10") when <- seq(t.repTriangle-3,length.out=10,by="-1 day") nc <- nowcast(now=t.repTriangle,when=when, dEventCol="dHosp",dReportCol="dReport",data=husO104Hosp, D=15,method="bayes.trunc",control=nc.control) #Show time series and posterior median forecast/nowcast plot(nc,xaxis.tickFreq=list("%d"=atChange,"%m"=atChange), xaxis.labelFreq=list("%d"=at2ndChange),xaxis.labelFormat="%d-%b", xlab="Time (days)",lty=c(1,1,1,1),lwd=c(1,1,2)) ## Not run: ### Using runjags to do a Bayesian model with changepoint(s) ### -- this might take a while nc.control.ddcp <- modifyList(nc.control, list(gd.prior.kappa=0.1, ddcp=list(ddChangepoint=as.Date(c("2011-05-23")), logLambda="tps", tau.gamma=1, mcmc=c(burnin=1000,sample=1000,thin=1, adapt=1000,store.samples=FALSE)))) nc.ddcp <- nowcast(now=t.repTriangle,when=when, dEventCol="dHosp",dReportCol="dReport", data=husO104Hosp, aggregate.by="1 day", method="bayes.trunc.ddcp", D=15, control=nc.control.ddcp) plot(nc.ddcp,legend.opts=NULL, xaxis.tickFreq=list("%d"=atChange,"%m"=atChange), xaxis.labelFreq=list("%d"=at2ndChange),xaxis.labelFormat="%d-%b", xlab="Time (days)",lty=c(1,1,1,1),lwd=c(1,1,2)) lambda <- attr(delayCDF(nc.ddcp)[["bayes.trunc.ddcp"]],"model")$lambda showIdx <- seq(which( max(when) == epoch(nc.ddcp))) #seq(ncol(lambda)) matlines( showIdx,t(lambda)[showIdx,],col="gray",lwd=c(1,2,1),lty=c(2,1,2)) legend(x="topright",c(expression(lambda(t)),"95% CI"),col="gray",lwd=c(2,1),lty=c(1,2)) ## End(Not run)
data("husO104Hosp") #Extract the reporting triangle at a specific day t.repTriangle <- as.Date("2011-07-04") #Use 'void' nowcasting procedure (we just want the reporting triangle) nc <- nowcast(now=t.repTriangle,when=t.repTriangle, dEventCol="dHosp",dReportCol="dReport",data=husO104Hosp, D=15,method="unif") #Show reporting triangle reportingTriangle(nc) #Perform Bayesian nowcasting assuming the delay distribution is stable over time nc.control <- list(N.tInf.prior=structure("poisgamma", mean.lambda=50,var.lambda=3000), nSamples=1e2) t.repTriangle <- as.Date("2011-06-10") when <- seq(t.repTriangle-3,length.out=10,by="-1 day") nc <- nowcast(now=t.repTriangle,when=when, dEventCol="dHosp",dReportCol="dReport",data=husO104Hosp, D=15,method="bayes.trunc",control=nc.control) #Show time series and posterior median forecast/nowcast plot(nc,xaxis.tickFreq=list("%d"=atChange,"%m"=atChange), xaxis.labelFreq=list("%d"=at2ndChange),xaxis.labelFormat="%d-%b", xlab="Time (days)",lty=c(1,1,1,1),lwd=c(1,1,2)) ## Not run: ### Using runjags to do a Bayesian model with changepoint(s) ### -- this might take a while nc.control.ddcp <- modifyList(nc.control, list(gd.prior.kappa=0.1, ddcp=list(ddChangepoint=as.Date(c("2011-05-23")), logLambda="tps", tau.gamma=1, mcmc=c(burnin=1000,sample=1000,thin=1, adapt=1000,store.samples=FALSE)))) nc.ddcp <- nowcast(now=t.repTriangle,when=when, dEventCol="dHosp",dReportCol="dReport", data=husO104Hosp, aggregate.by="1 day", method="bayes.trunc.ddcp", D=15, control=nc.control.ddcp) plot(nc.ddcp,legend.opts=NULL, xaxis.tickFreq=list("%d"=atChange,"%m"=atChange), xaxis.labelFreq=list("%d"=at2ndChange),xaxis.labelFormat="%d-%b", xlab="Time (days)",lty=c(1,1,1,1),lwd=c(1,1,2)) lambda <- attr(delayCDF(nc.ddcp)[["bayes.trunc.ddcp"]],"model")$lambda showIdx <- seq(which( max(when) == epoch(nc.ddcp))) #seq(ncol(lambda)) matlines( showIdx,t(lambda)[showIdx,],col="gray",lwd=c(1,2,1),lty=c(2,1,2)) legend(x="topright",c(expression(lambda(t)),"95% CI"),col="gray",lwd=c(2,1),lty=c(1,2)) ## End(Not run)
CUSUM for paired binary data as described in Steiner et al. (1999).
pairedbinCUSUM(stsObj, control = list(range=NULL,theta0,theta1, h1,h2,h11,h22)) pairedbinCUSUM.runlength(p,w1,w2,h1,h2,h11,h22, sparse=FALSE)
pairedbinCUSUM(stsObj, control = list(range=NULL,theta0,theta1, h1,h2,h11,h22)) pairedbinCUSUM.runlength(p,w1,w2,h1,h2,h11,h22, sparse=FALSE)
stsObj |
Object of class |
control |
Control object as a list containing several parameters.
|
p |
Vector giving the probability of the four different possible states, i.e. c((death=0,near-miss=0),(death=1,near-miss=0), (death=0,near-miss=1),(death=1,near-miss=1)). |
w1 |
The parameters |
w2 |
As for |
h1 |
decision barrier for 1st individual cusums |
h2 |
decision barrier for 2nd cusums |
h11 |
together with |
h22 |
together with |
sparse |
Boolean indicating whether to use sparse matrix
computations from the |
For details about the method see the Steiner et al. (1999) reference listed below. Basically, two individual CUSUMs are run each based on a logistic regression model. The combined CUSUM not only signals if one of its two individual CUSUMs signals, but also if the two CUSUMs simultaneously cross the secondary limits.
An sts
object with observed
, alarm
,
etc. slots trimmed to the control$range
indices.
S. Steiner and M. Höhle
Steiner, S. H., Cook, R. J., and Farewell, V. T. (1999), Monitoring paired binary surgical outcomes using cumulative sum charts, Statistics in Medicine, 18, pp. 69–86.
#Set in-control and out-of-control parameters as in paper theta0 <- c(-2.3,-4.5,2.5) theta1 <- c(-1.7,-2.9,2.5) #Small helper function to compute the paired-binary likelihood #of the length two vector yz when the true parameters are theta dPBin <- function(yz,theta) { exp(dbinom(yz[1],size=1,prob=plogis(theta[1]),log=TRUE) + dbinom(yz[2],size=1,prob=plogis(theta[2]+theta[3]*yz[1]),log=TRUE)) } #Likelihood ratio for all four possible configurations p <- c(dPBin(c(0,0), theta=theta0), dPBin(c(0,1), theta=theta0), dPBin(c(1,0), theta=theta0), dPBin(c(1,1), theta=theta0)) if (surveillance.options("allExamples")) #Compute ARL using slow, non-sparse matrix operations pairedbinCUSUM.runlength(p,w1=c(-1,37,-9,29),w2=c(-1,7),h1=70,h2=32, h11=38,h22=17) #Sparse computations can be considerably (!) faster pairedbinCUSUM.runlength(p,w1=c(-1,37,-9,29),w2=c(-1,7),h1=70,h2=32, h11=38,h22=17,sparse=TRUE) #Use paired binary CUSUM on the De Leval et al. (1994) arterial switch #operation data on 104 newborn babies data("deleval") #Switch between death and near misses observed(deleval) <- observed(deleval)[,c(2,1)] #Run paired-binary CUSUM without generating alarms. pb.surv <- pairedbinCUSUM(deleval,control=list(theta0=theta0, theta1=theta1,h1=Inf,h2=Inf,h11=Inf,h22=Inf)) plot(pb.surv, xaxis.labelFormat=NULL, ylab="CUSUM Statistic") ###################################################################### #Scale the plots so they become comparable to the plots in Steiner et #al. (1999). To this end a small helper function is defined. ###################################################################### ###################################################################### #Log LR for conditional specification of the paired model ###################################################################### LLR.pairedbin <- function(yz,theta0, theta1) { #In control alphay0 <- theta0[1] ; alphaz0 <- theta0[2] ; beta0 <- theta0[3] #Out of control alphay1 <- theta1[1] ; alphaz1 <- theta1[2] ; beta1 <- theta1[3] #Likelihood ratios llry <- (alphay1-alphay0)*yz[1]+log(1+exp(alphay0))-log(1+exp(alphay1)) llrz <- (alphaz1-alphaz0)*yz[2]+log(1+exp(alphaz0+beta0*yz[1]))- log(1+exp(alphaz1+beta1*yz[1])) return(c(llry=llry,llrz=llrz)) } val <- expand.grid(0:1,0:1) table <- t(apply(val,1, LLR.pairedbin, theta0=theta0, theta1=theta1)) w1 <- min(abs(table[,1])) w2 <- min(abs(table[,2])) S <- upperbound(pb.surv) / cbind(rep(w1,nrow(observed(pb.surv))),w2) #Show results opar <- par(mfcol=c(2,1)) plot(1:nrow(deleval),S[,1],type="l",main="Near Miss",xlab="Patient No.", ylab="CUSUM Statistic") lines(c(0,1e99), c(32,32),lty=2,col=2) lines(c(0,1e99), c(17,17),lty=2,col=3) plot(1:nrow(deleval),S[,2],type="l",main="Death",xlab="Patient No.", ylab="CUSUM Statistic") lines(c(0,1e99), c(70,70),lty=2,col=2) lines(c(0,1e99), c(38,38),lty=2,col=3) par(opar) ###################################################################### # Run the CUSUM with thresholds as in Steiner et al. (1999). # After each alarm the CUSUM statistic is set to zero and # monitoring continues from this point. Triangles indicate alarm # in the respective CUSUM (nearmiss or death). If in both # simultaneously then an alarm is caused by the secondary limits. ###################################################################### pb.surv2 <- pairedbinCUSUM(deleval,control=list(theta0=theta0, theta1=theta1,h1=70*w1,h2=32*w2,h11=38*w1,h22=17*w2)) plot(pb.surv2, xaxis.labelFormat=NULL)
#Set in-control and out-of-control parameters as in paper theta0 <- c(-2.3,-4.5,2.5) theta1 <- c(-1.7,-2.9,2.5) #Small helper function to compute the paired-binary likelihood #of the length two vector yz when the true parameters are theta dPBin <- function(yz,theta) { exp(dbinom(yz[1],size=1,prob=plogis(theta[1]),log=TRUE) + dbinom(yz[2],size=1,prob=plogis(theta[2]+theta[3]*yz[1]),log=TRUE)) } #Likelihood ratio for all four possible configurations p <- c(dPBin(c(0,0), theta=theta0), dPBin(c(0,1), theta=theta0), dPBin(c(1,0), theta=theta0), dPBin(c(1,1), theta=theta0)) if (surveillance.options("allExamples")) #Compute ARL using slow, non-sparse matrix operations pairedbinCUSUM.runlength(p,w1=c(-1,37,-9,29),w2=c(-1,7),h1=70,h2=32, h11=38,h22=17) #Sparse computations can be considerably (!) faster pairedbinCUSUM.runlength(p,w1=c(-1,37,-9,29),w2=c(-1,7),h1=70,h2=32, h11=38,h22=17,sparse=TRUE) #Use paired binary CUSUM on the De Leval et al. (1994) arterial switch #operation data on 104 newborn babies data("deleval") #Switch between death and near misses observed(deleval) <- observed(deleval)[,c(2,1)] #Run paired-binary CUSUM without generating alarms. pb.surv <- pairedbinCUSUM(deleval,control=list(theta0=theta0, theta1=theta1,h1=Inf,h2=Inf,h11=Inf,h22=Inf)) plot(pb.surv, xaxis.labelFormat=NULL, ylab="CUSUM Statistic") ###################################################################### #Scale the plots so they become comparable to the plots in Steiner et #al. (1999). To this end a small helper function is defined. ###################################################################### ###################################################################### #Log LR for conditional specification of the paired model ###################################################################### LLR.pairedbin <- function(yz,theta0, theta1) { #In control alphay0 <- theta0[1] ; alphaz0 <- theta0[2] ; beta0 <- theta0[3] #Out of control alphay1 <- theta1[1] ; alphaz1 <- theta1[2] ; beta1 <- theta1[3] #Likelihood ratios llry <- (alphay1-alphay0)*yz[1]+log(1+exp(alphay0))-log(1+exp(alphay1)) llrz <- (alphaz1-alphaz0)*yz[2]+log(1+exp(alphaz0+beta0*yz[1]))- log(1+exp(alphaz1+beta1*yz[1])) return(c(llry=llry,llrz=llrz)) } val <- expand.grid(0:1,0:1) table <- t(apply(val,1, LLR.pairedbin, theta0=theta0, theta1=theta1)) w1 <- min(abs(table[,1])) w2 <- min(abs(table[,2])) S <- upperbound(pb.surv) / cbind(rep(w1,nrow(observed(pb.surv))),w2) #Show results opar <- par(mfcol=c(2,1)) plot(1:nrow(deleval),S[,1],type="l",main="Near Miss",xlab="Patient No.", ylab="CUSUM Statistic") lines(c(0,1e99), c(32,32),lty=2,col=2) lines(c(0,1e99), c(17,17),lty=2,col=3) plot(1:nrow(deleval),S[,2],type="l",main="Death",xlab="Patient No.", ylab="CUSUM Statistic") lines(c(0,1e99), c(70,70),lty=2,col=2) lines(c(0,1e99), c(38,38),lty=2,col=3) par(opar) ###################################################################### # Run the CUSUM with thresholds as in Steiner et al. (1999). # After each alarm the CUSUM statistic is set to zero and # monitoring continues from this point. Triangles indicate alarm # in the respective CUSUM (nearmiss or death). If in both # simultaneously then an alarm is caused by the secondary limits. ###################################################################### pb.surv2 <- pairedbinCUSUM(deleval,control=list(theta0=theta0, theta1=theta1,h1=70*w1,h2=32*w2,h11=38*w1,h22=17*w2)) plot(pb.surv2, xaxis.labelFormat=NULL)
The difference between mean scores
from model 1 and
mean scores
from model 2 is used as the test statistic.
Under the null hypothesis of no difference, the actually observed
difference between mean scores should not be notably different from
the distribution of the test statistic under permutation.
As the computation of all possible permutations is only feasible for
small datasets, a random sample of permutations is used to obtain the
null distribution. The resulting p-value thus depends on the
.Random.seed
.
permutationTest(score1, score2, nPermutation = 9999, plot = FALSE, verbose = FALSE)
permutationTest(score1, score2, nPermutation = 9999, plot = FALSE, verbose = FALSE)
score1 , score2
|
numeric vectors of scores from models 1 and 2, respectively. |
nPermutation |
number of Monte Carlo replicates. |
plot |
logical indicating if a |
verbose |
logical indicating if the results should be printed in one line. |
For each permutation, we first randomly assign the membership of the n
individual scores to either model 1 or 2 with probability 0.5. We then
compute the respective difference in mean for model 1 and 2 in this
permuted set of scores. The Monte Carlo p-value is then given by
(1 + #{permuted differences larger than observed difference (in
absolute value)}) / (1 + nPermutation
).
a list of the following elements:
diffObs |
observed difference in mean scores, i.e.,
|
pVal.permut |
p-value of the permutation test |
pVal.t |
p-value of the corresponding
|
Michaela Paul with contributions by Sebastian Meyer
Paul, M. and Held, L. (2011): Predictive assessment of a non-linear random effects model for multivariate time series of infectious disease counts. Statistics in Medicine, 30 (10), 1118-1136. doi:10.1002/sim.4177
Package coin for a comprehensive permutation test framework.
permutationTest(rnorm(50, 1.5), rnorm(50, 1), plot = TRUE)
permutationTest(rnorm(50, 1.5), rnorm(50, 1), plot = TRUE)
See Czado et al. (2009).
pit(x, ...) ## Default S3 method: pit(x, pdistr, J = 10, relative = TRUE, ..., plot = list())
pit(x, ...) ## Default S3 method: pit(x, pdistr, J = 10, relative = TRUE, ..., plot = list())
x |
numeric vector representing the observed counts. |
pdistr |
either a list of predictive cumulative distribution functions for
the observations |
J |
the number of bins of the histogram. |
relative |
logical indicating if relative frequency or the density should be plotted.
Due to a historical bug, |
... |
ignored if |
plot |
a list of arguments for |
an object of class "pit"
, which inherits from class
"histogram"
(see hist
).
It is returned invisibly if a plot is produced.
Michaela Paul and Sebastian Meyer
Czado, C., Gneiting, T. and Held, L. (2009): Predictive model assessment for count data. Biometrics, 65 (4), 1254-1261. doi:10.1111/j.1541-0420.2009.01191.x
## Simulation example of Czado et al. (2009, Section 2.4) set.seed(100) x <- rnbinom(200, mu = 5, size = 2) pdistrs <- list("NB(5,0)" = function (x) ppois(x, lambda=5), "NB(5,1/2)" = function (x) pnbinom(x, mu=5, size=2), "NB(5,1)" = function (x) pnbinom(x, mu=5, size=1)) ## Reproduce Figure 1 op <- par(mfrow = c(1,3)) for (i in seq_along(pdistrs)) { pit(x, pdistr = pdistrs[[i]], J = 10, plot = list(ylim = c(0,2.75), main = names(pdistrs)[i])) box() } par(op) ## Alternative call using ... arguments for pdistr (less efficient) stopifnot(identical(pit(x, "pnbinom", mu = 5, size = 2, plot = FALSE), pit(x, pdistrs[[2]], plot = FALSE)))
## Simulation example of Czado et al. (2009, Section 2.4) set.seed(100) x <- rnbinom(200, mu = 5, size = 2) pdistrs <- list("NB(5,0)" = function (x) ppois(x, lambda=5), "NB(5,1/2)" = function (x) pnbinom(x, mu=5, size=2), "NB(5,1)" = function (x) pnbinom(x, mu=5, size=1)) ## Reproduce Figure 1 op <- par(mfrow = c(1,3)) for (i in seq_along(pdistrs)) { pit(x, pdistr = pdistrs[[i]], J = 10, plot = list(ylim = c(0,2.75), main = names(pdistrs)[i])) box() } par(op) ## Alternative call using ... arguments for pdistr (less efficient) stopifnot(identical(pit(x, "pnbinom", mu = 5, size = 2, plot = FALSE), pit(x, pdistrs[[2]], plot = FALSE)))
lapply
Verbose and parallelized version of lapply
wrapping around
mclapply
and parLapply
in the base package parallel.
This wrapper can take care of the .Random.seed
and
print progress information (not for cluster-based parallelization).
With the default arguments it equals lapply
enriched by a progress bar.
plapply(X, FUN, ..., .parallel = 1, .seed = NULL, .verbose = TRUE)
plapply(X, FUN, ..., .parallel = 1, .seed = NULL, .verbose = TRUE)
X , FUN , ...
|
see |
.parallel |
the number of processes to use in parallel operation, or a
|
.seed |
If set (non- |
.verbose |
if and how progress information should be displayed, i.e., what to
do on each exit of |
a list of the results of calling FUN
on each value of X
.
Sebastian Meyer
## example inspired by help("lapply") x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE,FALSE,FALSE,TRUE)) ## if neither parallel nor verbose then this simply equals lapply() plapply(x, quantile, probs = 1:3/4, .verbose = FALSE) ## verbose lapply() -- not really useful for such fast computations res <- plapply(x, quantile, probs = 1:3/4, .verbose = TRUE) res <- plapply(x, quantile, probs = 1:3/4, .verbose = "|") res <- plapply(x, quantile, probs = 1:3/4, .verbose = quote(cat("length(x) =", length(x), "\n"))) ## setting the seed for reproducibility of results involving the RNG samp <- plapply(as.list(1:3), runif, .seed = 1) ## parallel lapply() res <- plapply(x, quantile, probs = 1:3/4, .parallel = 2, .verbose = FALSE) ## using a predefined cluster library("parallel") cl <- makeCluster(getOption("cl.cores", 2)) res <- plapply(x, quantile, probs = 1:3/4, .parallel = cl) stopCluster(cl)
## example inspired by help("lapply") x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE,FALSE,FALSE,TRUE)) ## if neither parallel nor verbose then this simply equals lapply() plapply(x, quantile, probs = 1:3/4, .verbose = FALSE) ## verbose lapply() -- not really useful for such fast computations res <- plapply(x, quantile, probs = 1:3/4, .verbose = TRUE) res <- plapply(x, quantile, probs = 1:3/4, .verbose = "|") res <- plapply(x, quantile, probs = 1:3/4, .verbose = quote(cat("length(x) =", length(x), "\n"))) ## setting the seed for reproducibility of results involving the RNG samp <- plapply(as.list(1:3), runif, .seed = 1) ## parallel lapply() res <- plapply(x, quantile, probs = 1:3/4, .parallel = 2, .verbose = FALSE) ## using a predefined cluster library("parallel") cl <- makeCluster(getOption("cl.cores", 2)) res <- plapply(x, quantile, probs = 1:3/4, .parallel = cl) stopCluster(cl)
"SpatialPolygons"
Wrapping around functionality of the spdep package, this function
computes the symmetric, binary (0/1), adjacency matrix from a
"SpatialPolygons"
object.
It essentially applies
nb2mat(poly2nb(SpP, ...), style="B",
zero.policy=zero.policy)
.
poly2adjmat(SpP, ..., zero.policy = TRUE)
poly2adjmat(SpP, ..., zero.policy = TRUE)
SpP |
an object inheriting from |
... |
arguments passed to |
zero.policy |
logical indicating if islands are allowed, see
|
a symmetric numeric indicator matrix of size length(SpP)
^2
representing polygon adjacencies.
(of this wrapper) Sebastian Meyer
poly2nb
in package spdep
if (requireNamespace("spdep")) { ## generate adjacency matrix for districts of Bayern and Baden-Wuerttemberg data("fluBYBW") adjmat <- poly2adjmat(fluBYBW@map) ## same as already stored in the neighbourhood slot (in different order) stopifnot(all.equal(adjmat, neighbourhood(fluBYBW)[rownames(adjmat),colnames(adjmat)])) ## a visual check of the district-specific number of neighbours plot(fluBYBW@map) text(coordinates(fluBYBW@map), labels=rowSums(adjmat==1), font=2, col=2) }
if (requireNamespace("spdep")) { ## generate adjacency matrix for districts of Bayern and Baden-Wuerttemberg data("fluBYBW") adjmat <- poly2adjmat(fluBYBW@map) ## same as already stored in the neighbourhood slot (in different order) stopifnot(all.equal(adjmat, neighbourhood(fluBYBW)[rownames(adjmat),colnames(adjmat)])) ## a visual check of the district-specific number of neighbours plot(fluBYBW@map) text(coordinates(fluBYBW@map), labels=rowSums(adjmat==1), font=2, col=2) }
Determines which polygons of a "SpatialPolygons"
object are at the border, i.e. have coordinates in common with the
spatial union of all polygons (constructed using
unionSpatialPolygons
).
polyAtBorder(SpP, snap = sqrt(.Machine$double.eps), method = "sf", ...)
polyAtBorder(SpP, snap = sqrt(.Machine$double.eps), method = "sf", ...)
SpP |
an object of class |
snap |
tolerance used to consider coordinates as identical. |
method |
method to use for |
... |
further arguments passed to the chosen |
logical vector of the same length as SpP
also inheriting its
row.names
.
Sebastian Meyer
## Load districts of Germany load(system.file("shapes", "districtsD.RData", package = "surveillance")) ## Determine districts at the border and check the result on the map if (requireNamespace("sf")) { atBorder <- polyAtBorder(districtsD, method = "sf") if (interactive()) plot(districtsD, col = atBorder) table(atBorder) } ## For method = "polyclip", a higher snapping tolerance is required ## to obtain the correct result if (requireNamespace("polyclip")) { atBorder <- polyAtBorder(districtsD, snap = 1e-6, method = "polyclip") if (interactive()) plot(districtsD, col = atBorder) table(atBorder) }
## Load districts of Germany load(system.file("shapes", "districtsD.RData", package = "surveillance")) ## Determine districts at the border and check the result on the map if (requireNamespace("sf")) { atBorder <- polyAtBorder(districtsD, method = "sf") if (interactive()) plot(districtsD, col = atBorder) table(atBorder) } ## For method = "polyclip", a higher snapping tolerance is required ## to obtain the correct result if (requireNamespace("polyclip")) { atBorder <- polyAtBorder(districtsD, snap = 1e-6, method = "polyclip") if (interactive()) plot(districtsD, col = atBorder) table(atBorder) }
Computes the prime number factorization of an integer.
primeFactors(x)
primeFactors(x)
x |
an integer |
vector with prime number factorization of x
Print a single quality value object in a nicely formatted way
## S3 method for class 'algoQV' print(x,...)
## S3 method for class 'algoQV' print(x,...)
x |
Quality Values object generated with |
... |
Further arguments (not really used) |
# Create a test object disProgObj <- sim.pointSource(p = 0.99, r = 0.5, length = 200, A = 1, alpha = 1, beta = 0, phi = 0, frequency = 1, state = NULL, K = 1.7) # Let this object be tested from rki1 survResObj <- algo.rki1(disProgObj, control = list(range = 50:200)) # Compute the quality values in a nice formatted way algo.quality(survResObj)
# Create a test object disProgObj <- sim.pointSource(p = 0.99, r = 0.5, length = 200, A = 1, alpha = 1, beta = 0, phi = 0, frequency = 1, state = NULL, K = 1.7) # Let this object be tested from rki1 survResObj <- algo.rki1(disProgObj, control = list(range = 50:200)) # Compute the quality values in a nice formatted way algo.quality(survResObj)
The S3 generic function R0
defined in package surveillance is intended to
compute reproduction numbers from fitted epidemic models.
The package currently defines a method for the "twinstim"
class, which
computes expected numbers of infections caused by infected individuals depending on the event type
and marks attached to the individual, which contribute to the infection pressure
in the epidemic predictor of that class.
There is also a method for simulated "epidataCS"
(just a wrapper for the "twinstim"
-method).
R0(object, ...) ## S3 method for class 'twinstim' R0(object, newevents, trimmed = TRUE, newcoef = NULL, ...) ## S3 method for class 'simEpidataCS' R0(object, trimmed = TRUE, ...) simpleR0(object, eta = coef(object)[["e.(Intercept)"]], eps.s = NULL, eps.t = NULL, newcoef = NULL)
R0(object, ...) ## S3 method for class 'twinstim' R0(object, newevents, trimmed = TRUE, newcoef = NULL, ...) ## S3 method for class 'simEpidataCS' R0(object, trimmed = TRUE, ...) simpleR0(object, eta = coef(object)[["e.(Intercept)"]], eps.s = NULL, eps.t = NULL, newcoef = NULL)
object |
A fitted epidemic model object for which an |
newevents |
an optional For the |
trimmed |
logical indicating if the individual reproduction numbers should be
calculated by integrating the epidemic intensities over the
observation period and region only ( |
newcoef |
the model parameters to use when calculating reproduction numbers.
The default ( |
... |
additional arguments passed to methods.
Currently unused for the |
eta |
a value for the epidemic linear predictor, see details. |
eps.s , eps.t
|
the spatial/temporal radius of interaction.
If |
For the "twinstim"
class, the individual-specific expected
number of infections caused by individual (event)
inside its theoretical (untrimmed) spatio-temporal range of interaction
given by its
eps.t
() and
eps.s
() values is defined as follows (cf. Meyer et al, 2012):
Here, denotes the disc centred at (0,0)' with
radius
,
is the epidemic linear predictor,
is the temporal interaction function, and
is the spatial interaction function. For a type-specific
twinstim
, there is an additional factor for the number of event
types which can be infected by the type of event and the
interaction functions may be type-specific as well.
Alternatively to the equation above,
the trimmed
(observed) reproduction numbers
are obtain by integrating over the observed infectious domains of the
individuals, i.e. integrate over the intersection of the
influence region with the observation region
W
(i.e. over )
and
over the intersection of the observed infectious period with
the observation period
(i.e. over
).
The function simpleR0
computes
where defaults to
disregarding any epidemic
effects of types and marks. It is thus only
suitable for simple epidemic
twinstim
models with
epidemic = ~1
, a diagonal (or secondary diagonal) qmatrix
,
and type-invariant interaction functions.
simpleR0
mainly exists for use by epitest
.
(Numerical) Integration is performed exactly as during the fitting of
object
, for instance object$control.siaf
is queried if
necessary.
For the R0
methods,
a numeric vector of estimated reproduction numbers from the fitted
model object
corresponding to the rows of newevents
(if
supplied) or the original fitted events including events of the prehistory.
For simpleR0
, a single number (see details).
Sebastian Meyer
Meyer, S., Elias, J. and Höhle, M. (2012): A space-time conditional intensity model for invasive meningococcal disease occurrence. Biometrics, 68, 607-616. doi:10.1111/j.1541-0420.2011.01684.x
## load the 'imdepi' data and a model fit data("imdepi", "imdepifit") ## calculate individual and type-specific reproduction numbers R0s <- R0(imdepifit) tapply(R0s, imdepi$events@data[names(R0s), "type"], summary) ## untrimmed R0 for specific event settings refevent <- data.frame(agegrp = "[0,3)", type = "B", eps.s = Inf, eps.t = 30) setting2 <- data.frame(agegrp = "[3,19)", type = "C", eps.s = Inf, eps.t = 14) newevents <- rbind("ref" = refevent, "event2" = setting2) (R0_examples <- R0(imdepifit, newevents = newevents, trimmed = FALSE)) stopifnot(all.equal(R0_examples[["ref"]], simpleR0(imdepifit))) ### compute a Monte Carlo confidence interval ## use a simpler model with constant 'siaf' for speed simplefit <- update(imdepifit, epidemic=~type, siaf=NULL, subset=NULL) ## we'd like to compute the mean R0's by event type meanR0ByType <- function (newcoef) { R0events <- R0(simplefit, newcoef=newcoef) tapply(R0events, imdepi$events@data[names(R0events),"type"], mean) } (meansMLE <- meanR0ByType(newcoef=NULL)) ## sample B times from asymptotic multivariate normal of the MLE B <- 5 # CAVE: toy example! In practice this has to be much larger set.seed(123) parsamples <- MASS::mvrnorm(B, mu=coef(simplefit), Sigma=vcov(simplefit)) ## for each sample compute the 'meanR0ByType' meansMC <- apply(parsamples, 1, meanR0ByType) ## get the quantiles and print the result cisMC <- apply(cbind(meansMLE, meansMC), 1, quantile, probs=c(0.025,0.975)) print(rbind(MLE=meansMLE, cisMC)) ### R0 for a simple epidemic model ### without epidemic covariates, i.e., all individuals are equally infectious mepi1 <- update(simplefit, epidemic = ~1, subset = type == "B", model = TRUE, verbose = FALSE) ## using the default spatial and temporal ranges of interaction (R0B <- simpleR0(mepi1)) # eps.s=200, eps.t=30 stopifnot(identical(R0B, R0(mepi1, trimmed = FALSE)[[1]])) ## assuming smaller interaction ranges (but same infection intensity) simpleR0(mepi1, eps.s = 50, eps.t = 15)
## load the 'imdepi' data and a model fit data("imdepi", "imdepifit") ## calculate individual and type-specific reproduction numbers R0s <- R0(imdepifit) tapply(R0s, imdepi$events@data[names(R0s), "type"], summary) ## untrimmed R0 for specific event settings refevent <- data.frame(agegrp = "[0,3)", type = "B", eps.s = Inf, eps.t = 30) setting2 <- data.frame(agegrp = "[3,19)", type = "C", eps.s = Inf, eps.t = 14) newevents <- rbind("ref" = refevent, "event2" = setting2) (R0_examples <- R0(imdepifit, newevents = newevents, trimmed = FALSE)) stopifnot(all.equal(R0_examples[["ref"]], simpleR0(imdepifit))) ### compute a Monte Carlo confidence interval ## use a simpler model with constant 'siaf' for speed simplefit <- update(imdepifit, epidemic=~type, siaf=NULL, subset=NULL) ## we'd like to compute the mean R0's by event type meanR0ByType <- function (newcoef) { R0events <- R0(simplefit, newcoef=newcoef) tapply(R0events, imdepi$events@data[names(R0events),"type"], mean) } (meansMLE <- meanR0ByType(newcoef=NULL)) ## sample B times from asymptotic multivariate normal of the MLE B <- 5 # CAVE: toy example! In practice this has to be much larger set.seed(123) parsamples <- MASS::mvrnorm(B, mu=coef(simplefit), Sigma=vcov(simplefit)) ## for each sample compute the 'meanR0ByType' meansMC <- apply(parsamples, 1, meanR0ByType) ## get the quantiles and print the result cisMC <- apply(cbind(meansMLE, meansMC), 1, quantile, probs=c(0.025,0.975)) print(rbind(MLE=meansMLE, cisMC)) ### R0 for a simple epidemic model ### without epidemic covariates, i.e., all individuals are equally infectious mepi1 <- update(simplefit, epidemic = ~1, subset = type == "B", model = TRUE, verbose = FALSE) ## using the default spatial and temporal ranges of interaction (R0B <- simpleR0(mepi1)) # eps.s=200, eps.t=30 stopifnot(identical(R0B, R0(mepi1, trimmed = FALSE)[[1]])) ## assuming smaller interaction ranges (but same infection intensity) simpleR0(mepi1, eps.s = 50, eps.t = 15)
The generic functions ranef
and fixef
are imported from package nlme.
See nlme::ranef
for nlme's own
description, and ranef.hhh4
or fixef.hhh4
for the added methods for "hhh4"
models.
The reference values are formed based on computations
of seq
for Date class arguments.
refvalIdxByDate(t0, b, w, epochStr, epochs)
refvalIdxByDate(t0, b, w, epochStr, epochs)
t0 |
A Date object describing the time point |
b |
Number of years to go back in time |
w |
Half width of window to include reference values for |
epochStr |
One of |
epochs |
Vector containing the epoch value of the sts/disProg object |
Using the Date class the reference values are formed as follows:
Starting from t0
go i, i= 1,...,b
years back in time.
For each year, go w
epochs back and include from here to
w
epochs after t0
.
In case of weeks we always go back to the closest Monday of this date. In case of months we also go back in time to closest 1st of month.
a vector of indices in epochs which match
Extract the “residual process” (cf. Ogata, 1988) of a fitted
point process model specified through the conditional intensity
function, for instance a model of class "twinSIR"
or
"twinstim"
(and also "simEpidataCS"
).
The residuals are defined as the fitted cumulative intensities at the
event times, and are generalized residuals similar to those discussed in
Cox and Snell (1968).
## S3 method for class 'twinSIR' residuals(object, ...) ## S3 method for class 'twinstim' residuals(object, ...) ## S3 method for class 'simEpidataCS' residuals(object, ...)
## S3 method for class 'twinSIR' residuals(object, ...) ## S3 method for class 'twinstim' residuals(object, ...) ## S3 method for class 'simEpidataCS' residuals(object, ...)
object |
an object of one of the aforementioned model classes. |
... |
unused (argument of the generic). |
For objects of class twinstim
, the residuals may already be
stored in the object as component object$tau
if the model was
fitted with cumCIF = TRUE
(and they always are for
"simEpidataCS"
). In this case, the residuals
method just extracts these values. Otherwise, the residuals have to
be calculated, which is only possible with access to the model
environment, i.e. object
must have been fitted with
model = TRUE
. The calculated residuals are then also appended
to object
for future use. However, if cumCIF
and
model
were both set to true in the object
fit, then it
is not possible to calculate the residuals and the method returns an
error.
Numeric vector of length the number of events of the corresponding point
process fitted by object
. This is the observed residual process.
Sebastian Meyer
Ogata, Y. (1988) Statistical models for earthquake occurrences and residual analysis for point processes. Journal of the American Statistical Association, 83, 9-27
Cox, D. R. & Snell, E. J. (1968) A general definition of residuals. Journal of the Royal Statistical Society. Series B (Methodological), 30, 248-275
checkResidualProcess
to graphically check the
goodness-of-fit of the underlying model.
Monthly reported number of rotavirus infections in the federal state of Brandenburg stratified by five age categories (00-04, 05-09, 10-14, 15-69, 70+) during 2002-2013.
data(rotaBB)
data(rotaBB)
A sts
object.
The data were queried on 19 Feb 2014 from the Survstat@RKI database of the German Robert Koch Institute (https://survstat.rki.de/).
A dataset containing the reported number of cases of Salmonella in
Germany 2001-2014 aggregated by data of disease onset. The slot
control
contains a matrix reportingTriangle$n
with the
reporting triangle as described in Salmon et al. (2015).
data(salmAllOnset)
data(salmAllOnset)
A sts-object
Salmon, M., Schumacher, D., Stark, K., Höhle, M. (2015): Bayesian outbreak detection in the presence of reporting delays. Biometrical Journal, 57 (6), 1051-1067.
Reported number of cases of Salmonella in Germany 2004-2014 (early 2014) that were hospitalized. The corresponding
total number of cases is indicated in the slot populationFrac
and multinomialTS
is TRUE
.
data(salmHospitalized)
data(salmHospitalized)
An "sts"
object.
The data are queried from the Survstat@RKI database of the German Robert Koch Institute (https://survstat.rki.de/).
Reported number of cases of the Salmonella Newport serovar in the 16 German federal states 2004-2013.
data(salmNewport)
data(salmNewport)
A sts
object.
The data were queried from the SurvStat@RKI database of the German Robert Koch Institute (https://survstat.rki.de/). A detailed description of the 2011 outbreak can be found in the publication
Bayer, C., Bernard, H., Prager, R., Rabsch, W., Hiller, P., Malorny, B., Pfefferkorn, B., Frank, C., de Jong, A., Friesema, I., Start, K., Rosner, B.M. (2014), An outbreak of Salmonella Newport associated with mung bean sprouts in Germany and the Netherlands, October to November 2011, Eurosurveillance 19(1):pii=20665.
Reported number of cases of the Salmonella Agona serovar in the UK 1990-1995. Note however that the counts do not correspond exactly to the ones used by Farrington et. al (1996).
data(salmonella.agona)
data(salmonella.agona)
A disProg
object with 312 observations starting from week 1 in 1990.
A statistical algorithm for the early detection of outbreaks of infectious disease, Farrington, C.P., Andrews, N.J, Beale A.D. and Catchpole, M.A. (1996). , J. R. Statist. Soc. A, 159, 547-563.
Proper scoring rules for Poisson or negative binomial predictions
of count data are described in Czado et al. (2009).
The following scores are implemented:
logarithmic score (logs
),
ranked probability score (rps
),
Dawid-Sebastiani score (dss
),
squared error score (ses
).
scores(x, ...) ## Default S3 method: scores(x, mu, size = NULL, which = c("logs", "rps", "dss", "ses"), sign = FALSE, ...) logs(x, mu, size = NULL) rps(x, mu, size = NULL, k = 40, tolerance = sqrt(.Machine$double.eps)) dss(x, mu, size = NULL) ses(x, mu, size = NULL)
scores(x, ...) ## Default S3 method: scores(x, mu, size = NULL, which = c("logs", "rps", "dss", "ses"), sign = FALSE, ...) logs(x, mu, size = NULL) rps(x, mu, size = NULL, k = 40, tolerance = sqrt(.Machine$double.eps)) dss(x, mu, size = NULL) ses(x, mu, size = NULL)
x |
the observed counts. All functions are vectorized and also accept matrices or arrays. Dimensions are preserved. |
mu |
the means of the predictive distributions for the
observations |
size |
either |
which |
a character vector specifying which scoring rules to apply.
By default, all four proper scores are calculated.
The normalized squared error score ( |
sign |
a logical indicating if the function should also return
|
... |
unused (argument of the generic). |
k |
scalar argument controlling the finite sum approximation for the
|
tolerance |
absolute tolerance for the finite sum approximation employed in the
|
The scoring functions return the individual scores for the predictions
of the observations in x
(maintaining their dimension attributes).
The default scores
-method applies the selected (which
)
scoring functions (and calculates sign(x-mu)
) and returns the
results in an array (via simplify2array
), where the last
dimension corresponds to the different scores.
Sebastian Meyer and Michaela Paul
Czado, C., Gneiting, T. and Held, L. (2009): Predictive model assessment for count data. Biometrics, 65 (4), 1254-1261. doi:10.1111/j.1541-0420.2009.01191.x
The R package scoringRules implements the logarithmic score and the (continuous) ranked probability score for many distributions.
mu <- c(0.1, 1, 3, 6, 3*pi, 100) size <- 0.5 set.seed(1) y <- rnbinom(length(mu), mu = mu, size = size) scores(y, mu = mu, size = size) scores(y, mu = mu, size = 1) # ses ignores the variance scores(y, mu = 1, size = size) ## apply a specific scoring rule scores(y, mu = mu, size = size, which = "rps") rps(y, mu = mu, size = size) ## rps() gives NA (with a warning) if the NegBin is too wide rps(1e5, mu = 1e5, size = 1e-5)
mu <- c(0.1, 1, 3, 6, 3*pi, 100) size <- 0.5 set.seed(1) y <- rnbinom(length(mu), mu = mu, size = size) scores(y, mu = mu, size = size) scores(y, mu = mu, size = 1) # ses ignores the variance scores(y, mu = 1, size = size) ## apply a specific scoring rule scores(y, mu = mu, size = size, which = "rps") rps(y, mu = mu, size = size) ## rps() gives NA (with a warning) if the NegBin is too wide rps(1e5, mu = 1e5, size = 1e-5)
Number of salmonella hadar cases in Germany 2001-2006. An increase is seen during 2006.
data(shadar)
data(shadar)
A disProg
object containing
observations starting from week 1 in 2001 to week 35 in 2006.
Robert Koch-Institut: SurvStat: https://survstat.rki.de/; Queried on September 2006.
Robert Koch Institut, Epidemiologisches Bulletin 31/2006.
data(shadar) plot(shadar)
data(shadar) plot(shadar)
Simulation of epidemics which were introduced by point sources.
The basis of this programme is a combination of a Hidden Markov Model
(to get random timepoints for outbreaks) and a simple model
(compare sim.seasonalNoise
) to simulate the baseline.
sim.pointSource(p = 0.99, r = 0.01, length = 400, A = 1, alpha = 1, beta = 0, phi = 0, frequency = 1, state = NULL, K)
sim.pointSource(p = 0.99, r = 0.01, length = 400, A = 1, alpha = 1, beta = 0, phi = 0, frequency = 1, state = NULL, K)
p |
probability to get a new outbreak at time i if there was one at time i-1, default 0.99. |
r |
probability to get no new outbreak at time i if there was none at time i-1, default 0.01. |
length |
number of weeks to model, default 400. |
A |
amplitude (range of sinus), default = 1. |
alpha |
parameter to move along the y-axis (negative values not allowed) with alpha > = A, default = 1. |
beta |
regression coefficient, default = 0. |
phi |
factor to create seasonal moves (moves the curve along the x-axis), default = 0. |
frequency |
factor to determine the oscillation-frequency, default = 1. |
state |
use a state chain to define the status at this timepoint (outbreak or not). If not given a Markov chain is generated by the programme, default NULL. |
K |
additional weight for an outbreak which influences the distribution parameter mu, default = 0. |
a disProg
(disease progress) object including a list of the
observed, the state chain and nearly all input parameters.
M. Höhle, A. Riebler, C. Lang
set.seed(123) disProgObj <- sim.pointSource(p = 0.99, r = 0.5, length = 208, A = 1, alpha = 1, beta = 0, phi = 0, frequency = 1, state = NULL, K = 2) plot(disProgObj) ## with predefined state chain state <- rep(c(0,0,0,0,0,0,0,0,1,1), 20) disProgObj <- sim.pointSource(state = state, K = 1.2) plot(disProgObj) ## simulate epidemic, send to RKI 1 system, plot, and compute quality values testSim <- function (..., K = 0, range = 200:400) { disProgObj <- sim.pointSource(..., K = K) survResults <- algo.call(disProgObj, control = list(list(funcName = "rki1", range = range))) plot(survResults[[1]], "RKI 1", "Simulation") algo.compare(survResults) } testSim(K = 2) testSim(r = 0.5, K = 5) # larger and more frequent outbreaks
set.seed(123) disProgObj <- sim.pointSource(p = 0.99, r = 0.5, length = 208, A = 1, alpha = 1, beta = 0, phi = 0, frequency = 1, state = NULL, K = 2) plot(disProgObj) ## with predefined state chain state <- rep(c(0,0,0,0,0,0,0,0,1,1), 20) disProgObj <- sim.pointSource(state = state, K = 1.2) plot(disProgObj) ## simulate epidemic, send to RKI 1 system, plot, and compute quality values testSim <- function (..., K = 0, range = 200:400) { disProgObj <- sim.pointSource(..., K = K) survResults <- algo.call(disProgObj, control = list(list(funcName = "rki1", range = range))) plot(survResults[[1]], "RKI 1", "Simulation") algo.compare(survResults) } testSim(K = 2) testSim(r = 0.5, K = 5) # larger and more frequent outbreaks
Generation of a cyclic model of a Poisson distribution as background data for a simulated timevector.
The mean of the Poisson distribution is modelled as:
sim.seasonalNoise(A = 1, alpha = 1, beta = 0, phi = 0, length, frequency = 1, state = NULL, K = 0)
sim.seasonalNoise(A = 1, alpha = 1, beta = 0, phi = 0, length, frequency = 1, state = NULL, K = 0)
A |
amplitude (range of sinus), default = 1. |
alpha |
parameter to move along the y-axis (negative values not allowed) with alpha > = A, default = 1. |
beta |
regression coefficient, default = 0. |
phi |
factor to create seasonal moves (moves the curve along the x-axis), default = 0. |
length |
number of weeks to model. |
frequency |
factor to determine the oscillation-frequency, default = 1. |
state |
if a state chain is entered the outbreaks will be additional weighted by K. |
K |
additional weight for an outbreak which influences the distribution parameter mu, default = 0. |
an object of class seasonNoise
which includes the modelled
timevector, the parameter mu
and all input parameters.
M. Höhle, A. Riebler, C. Lang
season <- sim.seasonalNoise(length = 300) plot(season$seasonalBackground,type = "l") # use a negative timetrend beta season <- sim.seasonalNoise(beta = -0.003, length = 300) plot(season$seasonalBackground,type = "l")
season <- sim.seasonalNoise(length = 300) plot(season$seasonalBackground,type = "l") # use a negative timetrend beta season <- sim.seasonalNoise(beta = -0.003, length = 300) plot(season$seasonalBackground,type = "l")
Shiryaev-Roberts based prospective spatio-temporal cluster detection as in Assuncao & Correa (2009).
stcd(x, y,t,radius,epsilon,areaA, areaAcapBk, threshold, cusum=FALSE)
stcd(x, y,t,radius,epsilon,areaA, areaAcapBk, threshold, cusum=FALSE)
x |
Vector containing spatial x coordinate of the events. |
y |
Vector containing spatial y coordinate of the events. |
t |
Vector containing the time points of the events. It is assumed that the vector is sorted (early->last). |
radius |
Radius of the cluster to detect. |
epsilon |
Relative change of event-intensity within the cluster to detect. See reference paper for an explicit definition. |
areaA |
Area of the observation region A (single number) – This argument is currently ignored! |
areaAcapBk |
Area of A \ B(s_k,rho) for all k=1,...,n (vector). This argument is currently ignored! |
threshold |
Threshold limit for the alarm and should be equal to the desired Average-Run-Length (ARL) of the detector. |
cusum |
(logical) If
. Note that this has implications on what threshold will sound the alarm (CUSUM threshold needs to be smaller). |
Shiryaev-Roberts based spatio-temporal cluster detection based on the work in Assuncao and Correa (2009). The implementation is based on C++ code originally written by Marcos Oliveira Prates, UFMG, Brazil and provided by Thais Correa, UFMG, Brazil during her research stay in Munich. This stay was financially supported by the Munich Center of Health Sciences.
Note that the vectors x
, y
and t
need to be of the
same length. Furthermore, the vector t
needs to be sorted (to
improve speed, the latter is not verified within the function).
The current implementation uses a call to a C++ function to perform the actual computations of the test statistic. The function is currently experimental – data type and results may be subject to changes.
A list with three components
R |
A vector of the same length as the input containing the value of the test statistic for each observation. |
idxFA |
Index in the x,y,t vector causing a possible alarm. If no
cluster was detected, then a value of |
idxCC |
index in the x,y,t vector of the event containing the
cluster. If no cluster was detected, then a value of |
M. O. Prates, T. Correa and M. Höhle
Assuncao, R. and Correa, T. (2009), Surveillance to detect emerging space-time clusters, Computational Statistics & Data Analysis, 53(8):2817-2830.
if (require("splancs")) { # load the data from package "splancs" data(burkitt, package="splancs") # order the times burkitt <- burkitt[order(burkitt$t), ] #Parameters for the SR detection epsilon <- 0.5 # relative change within the cluster radius <- 20 # radius threshold <- 161 # threshold limit res <- stcd(x=burkitt$x, y=burkitt$y, t=burkitt$t, radius=radius, epsilon=epsilon, areaA=1, areaAcapBk=1, threshold=threshold) #Index of the event which.max(res$R >= threshold) }
if (require("splancs")) { # load the data from package "splancs" data(burkitt, package="splancs") # order the times burkitt <- burkitt[order(burkitt$t), ] #Parameters for the SR detection epsilon <- 0.5 # relative change within the cluster radius <- 20 # radius threshold <- 161 # threshold limit res <- stcd(x=burkitt$x, y=burkitt$y, t=burkitt$t, radius=radius, epsilon=epsilon, areaA=1, areaAcapBk=1, threshold=threshold) #Index of the event which.max(res$R >= threshold) }
The function stKtest
wraps functions in package splancs to
perform the K-function based Monte Carlo permutation test for space-time
clustering (Diggle et al, 1995) for "epidataCS"
.
The implementation is due to Meyer et al. (2016).
stKtest(object, eps.s = NULL, eps.t = NULL, B = 199, cores = 1, seed = NULL, poly = object$W) ## S3 method for class 'stKtest' plot(x, which = c("D", "R", "MC"), args.D = list(), args.D0 = args.D, args.R = list(), args.MC = list(), mfrow = sort(n2mfrow(length(which))), ...)
stKtest(object, eps.s = NULL, eps.t = NULL, B = 199, cores = 1, seed = NULL, poly = object$W) ## S3 method for class 'stKtest' plot(x, which = c("D", "R", "MC"), args.D = list(), args.D0 = args.D, args.R = list(), args.MC = list(), mfrow = sort(n2mfrow(length(which))), ...)
object |
an object of class |
eps.s , eps.t
|
numeric vectors defining the spatial and temporal
grids of critical distances over which to evaluate the test.
The default ( |
B |
the number of permutations. |
cores |
the number of parallel processes over which to distribute the requested number of permutations. |
seed |
argument for |
poly |
the polygonal observation region of the events (as an object handled
by |
x |
an |
which |
a character vector indicating which diagnostic plots to produce.
The full set is |
args.D , args.D0 , args.R , args.MC
|
argument lists for the plot functions |
mfrow |
|
... |
ignored (argument of the generic). |
an object of class "stKtest"
(inheriting from "htest"
),
which is a list with the following components:
method |
a character string indicating the type of test performed. |
data.name |
a character string naming the supplied |
statistic |
the sum |
parameter |
the number |
p.value |
the p-value for the test. |
pts |
the coordinate matrix of the event locations (for
|
stK |
the estimated K-function as returned by
|
seD |
the standard error of the estimated |
mctest |
the observed and permutation values of the test
statistic as returned by |
The plot
-method invisibly returns NULL
.
Sebastian Meyer
Diggle, P. J.; Chetwynd, A. G.; Häggkvist, R. and Morris, S. E. (1995): Second-order analysis of space-time clustering Statistical Methods in Medical Research, 4, 124-136.
Meyer, S., Warnke, I., Rössler, W. and Held, L. (2016): Model-based testing for space-time interaction using point processes: An application to psychiatric hospital admissions in an urban area. Spatial and Spatio-temporal Epidemiology, 17, 15-25. doi:10.1016/j.sste.2016.03.002. Eprint: https://arxiv.org/abs/1512.09052.
the simple knox
test and function epitest
for testing "twinstim"
models.
if (requireNamespace("splancs")) { data("imdepi") imdepiB <- subset(imdepi, type == "B") mainpoly <- coordinates(imdepiB$W@polygons[[1]]@Polygons[[5]]) SGRID <- c(10, 25, 50, 100, 150) TGRID <- c(1, 7, 14, 21) B <- 19 # limited here for speed imdBstKtest <- stKtest(imdepiB, eps.s = SGRID, eps.t = TGRID, B = B, cores = 2, seed = 1, poly = list(mainpoly)) print(imdBstKtest) plot(imdBstKtest) }
if (requireNamespace("splancs")) { data("imdepi") imdepiB <- subset(imdepi, type == "B") mainpoly <- coordinates(imdepiB$W@polygons[[1]]@Polygons[[5]]) SGRID <- c(10, 25, 50, 100, 150) TGRID <- c(1, 7, 14, 21) B <- 19 # limited here for speed imdBstKtest <- stKtest(imdepiB, eps.s = SGRID, eps.t = TGRID, B = B, cores = 2, seed = 1, poly = list(mainpoly)) print(imdBstKtest) plot(imdBstKtest) }
The animate
-method for sts
objects
iterates over time points, plotting maps of the current|cumulative
counts|incidence via stsplot_space
, optionally
including a time series chart below the map to track the epidemic curve.
It is worth using functionality of the animation package
(e.g., saveHTML
) to directly export the
animation into a useful format.
## S3 method for class 'sts' animate(object, tps = NULL, cumulative = FALSE, population = NULL, at = 10, ..., timeplot = list(pos = 1, size = 0.3, fill = TRUE), sleep = 0.5, verbose = interactive(), draw = TRUE)
## S3 method for class 'sts' animate(object, tps = NULL, cumulative = FALSE, population = NULL, at = 10, ..., timeplot = list(pos = 1, size = 0.3, fill = TRUE), sleep = 0.5, verbose = interactive(), draw = TRUE)
object |
an object of class |
tps |
a numeric vector of one or more time points at which to plot the map.
The default |
cumulative |
logical specifying if the cumulative counts/incidence over time
should be plotted. The cumulative incidence is relative to the
population from the first time point |
population , at , ...
|
arguments for |
timeplot |
if a list and package gridExtra is available,
a time series chart of the counts along
the selected time points |
sleep |
time to wait ( |
verbose |
logical indicating if a |
draw |
logical indicating if the produced plots at each time point should
be drawn directly (the default) or not.
The setting |
(invisibly) a list of the length(tps)
sequential plot objects.
These are of class "gtable"
(from gtable)
if the timeplot
is included, otherwise of class
"\code{trellis"
.
Sebastian Meyer
the other plot types documented in stsplot
for static
time series plots and maps.
data("measlesWeserEms") ## animate the weekly counts of measles (during weeks 12-16 only, for speed) if (interactive() && require("animation")) { oldwd <- setwd(tempdir()) # to not clutter up the current working dir saveHTML(animate(measlesWeserEms, tps=12:16), title="Evolution of the measles epidemic in the Weser-Ems region", ani.width=500, ani.height=600) setwd(oldwd) } ## Not run: ## animate the weekly incidence of measles (per 100'000 inhabitants), ## and label the time series plot with dates in a specified format animate(measlesWeserEms, tps=12:16, population = measlesWeserEms@map$POPULATION / 100000, timeplot = list(as.Date = TRUE, scales = list(x = list(format = "%G/%V")))) ## End(Not run)
data("measlesWeserEms") ## animate the weekly counts of measles (during weeks 12-16 only, for speed) if (interactive() && require("animation")) { oldwd <- setwd(tempdir()) # to not clutter up the current working dir saveHTML(animate(measlesWeserEms, tps=12:16), title="Evolution of the measles epidemic in the Weser-Ems region", ani.width=500, ani.height=600) setwd(oldwd) } ## Not run: ## animate the weekly incidence of measles (per 100'000 inhabitants), ## and label the time series plot with dates in a specified format animate(measlesWeserEms, tps=12:16, population = measlesWeserEms@map$POPULATION / 100000, timeplot = list(as.Date = TRUE, scales = list(x = list(format = "%G/%V")))) ## End(Not run)
Function for simulating a time series and creating an
sts
object.
As the counts are generated using a negative binomial distribution
one also gets the (1-alpha) quantile for each timepoint (can be interpreted
as an in-control upperbound for in-control values).
The baseline and outbreaks are created as in Noufaily et al. (2012).
sts_creation(theta, beta, gamma1, gamma2, m, overdispersion, dates, sizesOutbreak, datesOutbreak, delayMax, alpha, densityDelay)
sts_creation(theta, beta, gamma1, gamma2, m, overdispersion, dates, sizesOutbreak, datesOutbreak, delayMax, alpha, densityDelay)
theta |
baseline frequency of reports |
beta |
time trend |
gamma1 |
seasonality |
gamma2 |
seasonality |
m |
seasonality |
overdispersion |
|
dates |
dates of the time series |
sizesOutbreak |
sizes of all the outbreaks (vector) |
datesOutbreak |
dates of all the outbreaks (vector) |
delayMax |
maximal delay in time units |
alpha |
alpha for getting the (1-alpha) quantile of the negative binomial distribution at each timepoint |
densityDelay |
density distribution for the delay |
Noufaily, A., Enki, D.G., Farrington, C.P., Garthwaite, P., Andrews, N.J., Charlett, A. (2012): An improved algorithm for outbreak detection in multiple surveillance systems. Statistics in Medicine, 32 (7), 1206-1222.
set.seed(12345) # Time series parameters scenario4 <- c(1.6,0,0.4,0.5,2) theta <- 1.6 beta <- 0 gamma1 <-0.4 gamma2 <- 0.5 overdispersion <- 1 m <- 1 # Dates firstDate <- "2006-01-01" lengthT=350 dates <- as.Date(firstDate) + 7 * 0:(lengthT - 1) # Maximal delay in weeks D=10 # Dates and sizes of the outbreaks datesOutbreak <- as.Date(c("2008-03-30","2011-09-25")) sizesOutbreak <- c(2,5) # Delay distribution data("salmAllOnset") in2011 <- which(isoWeekYear(epoch(salmAllOnset))$ISOYear == 2011) rT2011 <- salmAllOnset@control$reportingTriangle$n[in2011,] densityDelay <- apply(rT2011,2,sum, na.rm=TRUE)/sum(rT2011, na.rm=TRUE) # alpha for the upperbound alpha <- 0.05 # Create the sts with the full time series stsSim <- sts_creation(theta=theta,beta=beta,gamma1=gamma1,gamma2=gamma2,m=m, overdispersion=overdispersion, dates=dates, sizesOutbreak=sizesOutbreak,datesOutbreak=datesOutbreak, delayMax=D,densityDelay=densityDelay, alpha=alpha) plot(stsSim)
set.seed(12345) # Time series parameters scenario4 <- c(1.6,0,0.4,0.5,2) theta <- 1.6 beta <- 0 gamma1 <-0.4 gamma2 <- 0.5 overdispersion <- 1 m <- 1 # Dates firstDate <- "2006-01-01" lengthT=350 dates <- as.Date(firstDate) + 7 * 0:(lengthT - 1) # Maximal delay in weeks D=10 # Dates and sizes of the outbreaks datesOutbreak <- as.Date(c("2008-03-30","2011-09-25")) sizesOutbreak <- c(2,5) # Delay distribution data("salmAllOnset") in2011 <- which(isoWeekYear(epoch(salmAllOnset))$ISOYear == 2011) rT2011 <- salmAllOnset@control$reportingTriangle$n[in2011,] densityDelay <- apply(rT2011,2,sum, na.rm=TRUE)/sum(rT2011, na.rm=TRUE) # alpha for the upperbound alpha <- 0.05 # Create the sts with the full time series stsSim <- sts_creation(theta=theta,beta=beta,gamma1=gamma1,gamma2=gamma2,m=m, overdispersion=overdispersion, dates=dates, sizesOutbreak=sizesOutbreak,datesOutbreak=datesOutbreak, delayMax=D,densityDelay=densityDelay, alpha=alpha) plot(stsSim)
"sts"
Objects Using ggplot2
A simple ggplot2 variant of stsplot_time
,
based on a “tidy” version of the "sts"
object via
tidy.sts
.
It uses a date axis and thus only works for time series indexed by
dates or with a standard frequency (daily, (bi-)weekly, or monthly).
autoplot.sts(object, population = FALSE, units = NULL, as.one = FALSE, scales = "fixed", width = NULL, ...)
autoplot.sts(object, population = FALSE, units = NULL, as.one = FALSE, scales = "fixed", width = NULL, ...)
object |
an object of class |
population |
logical indicating whether |
units |
optional integer or character vector to select the units
(=columns of |
as.one |
logical indicating if all time series should be plotted
in one panel with |
scales |
passed to |
width |
bar width, passed to |
... |
unused (argument of the generic). |
a "ggplot"
object.
Sebastian Meyer
stsplot_time
for the traditional plots.
## compare traditional plot() with ggplot2-based autoplot.sts() if (requireNamespace("ggplot2")) { data("measlesDE") plot(measlesDE, units = 1:2) autoplot.sts(measlesDE, units = 1:2) } ## weekly incidence: population(measlesDE) gives population fractions, ## which we need to multiply by the total population if (require("ggplot2", quietly = TRUE)) { autoplot.sts(measlesDE, population = 1000000/82314906) + ylab("Weekly incidence [per 1'000'000 inhabitants]") }
## compare traditional plot() with ggplot2-based autoplot.sts() if (requireNamespace("ggplot2")) { data("measlesDE") plot(measlesDE, units = 1:2) autoplot.sts(measlesDE, units = 1:2) } ## weekly incidence: population(measlesDE) gives population fractions, ## which we need to multiply by the total population if (require("ggplot2", quietly = TRUE)) { autoplot.sts(measlesDE, population = 1000000/82314906) + ylab("Weekly incidence [per 1'000'000 inhabitants]") }
sts
object with a given observation dateFunction for creating an sts
object with a given observation date.
sts_observation(sts, dateObservation, cut = TRUE)
sts_observation(sts, dateObservation, cut = TRUE)
sts |
sts-object we want to set at a previous state. Needs to include a reporting triangle. |
dateObservation |
Date for which we want the state. Needs to be in the reporting triangle dates. |
cut |
Boolean indicating whether to have 0 counts after the observation date or to simply cut the sts-object |
data("salmAllOnset") salmAllOnsety2014m01d20 <- sts_observation(salmAllOnset, dateObservation="2014-01-20",cut=FALSE) plot(salmAllOnset) lines(observed(salmAllOnsety2014m01d20),type="h",col="red")
data("salmAllOnset") salmAllOnsety2014m01d20 <- sts_observation(salmAllOnset, dateObservation="2014-01-20",cut=FALSE) plot(salmAllOnset) lines(observed(salmAllOnsety2014m01d20),type="h",col="red")
"sts"
– surveillance time seriesThis is a lightweight S4 class to implement (multivariate) time
series of counts, typically from public health surveillance.
The "sts"
class supersedes the informal "disProg"
class
used in early versions of package surveillance. Converters are
available, see disProg2sts
.
The constructor function sts
can be used to setup an
"sts"
object.
For areal time series, it can also capture a map
of the regions, where the counts originate from.
See Section “Slots” below for a description of all class
components, and Section “Methods” for a list of extraction,
conversion and visualization methods.
sts(observed, start = c(2000, 1), frequency = 52, epoch = NULL, population = NULL, map = NULL, ...)
sts(observed, start = c(2000, 1), frequency = 52, epoch = NULL, population = NULL, map = NULL, ...)
observed |
a vector (for a single time series) or matrix (one
time series per column) of counts. A purely numeric data frame will
also do (transformed via |
start , frequency
|
basic characteristics of the time series data
just like for simple |
epoch |
observation times, either as an integer sequence (default)
or as a |
population |
a vector of length the number of columns in
|
map |
optional spatial data representing the regions, either of
class |
... |
further named arguments with names corresponding to slot
names (see the list below). For instance, in the public health surveillance context,
the |
epoch
:a numeric vector specifying
the time of observation, typically a week index. Depending on
the freq
slot, it could also index days or months.
Furthermore, if epochAsDate=TRUE
then epoch
is the integer representation of Date
s
giving the exact date of the observation.
freq
:number of observations per year, e.g., 52 for weekly data, 12 for monthly data.
start
:vector of length two denoting the year and the sample number (week, month, etc.) of the first observation.
observed
:matrix of size length(epoch)
times the
number of regions containing the weekly/monthly number of counts in
each region. The colnames of the matrix should match the ID values of
the shapes in the map
slot.
state
:matrix with the same dimensions as observed
containing Booleans whether at the specific time point there was an
outbreak in the region.
alarm
:matrix with the same dimensions as
observed
specifying whether an outbreak detection algorithm
declared a specific time point in the region as having an alarm.
upperbound
:matrix with upper-bound values.
neighbourhood
:symmetric matrix of size
describing the neighbourhood structure. It
may either be a binary adjacency matrix or contain neighbourhood orders
(see the Examples for how to infer the latter from the
map
).
populationFrac
:matrix
of population
fractions or absolute numbers (see multinomialTS
below)
with dimensions dim(observed)
.
map
:object of class "SpatialPolygons"
(or "SpatialPolygonsDataFrame"
)
providing a shape of the areas which are monitored or modelled.
control
:list
of settings; this is a
rather free data type to be returned by the surveillance algorithms.
epochAsDate
:a Boolean indicating
if the epoch
slot corresponds to Date
s.
multinomialTS
:a Boolean
stating whether to interpret the object as observed
out of
population
, i.e. a multinomial interpretation instead of a
count interpretation.
There is an extraction (and replacement) method for almost every slot.
The name of the method corresponds to the slot name, with three exceptions:
the freq
slot can be extracted by frequency()
,
the populationFrac
slot is accessed by population()
,
and the alarm
slot is accessed by alarms()
.
signature(x = "sts")
:
extract the epoch
slot. If the sts
object is indexed
by dates (epochAsDate
= TRUE), the returned vector is of
class Date
, otherwise numeric (usually the integer
sequence 1:nrow(x)
).
By explicitly requesting epoch(x, as.Date = TRUE)
, dates
can also be extracted if the sts
object is not internally
indexed by dates but has a standard frequency of 12 (monthly) or
52 (weekly). The transformation is based on start
and
freq
and will return the first day of each month
(freq=12
) and the Monday of each week (freq=52
),
respectively.
signature(x = "sts")
:
extract the freq
slot.
signature(x = "sts")
:
extract the start
slot.
signature(x = "sts")
:
extract the observed
slot.
signature(x = "sts")
:
extract the alarm
slot.
signature(x = "sts")
:
extract the upperbound
slot.
signature(x = "sts")
:
extract the neighbourhood
slot.
signature(x = "sts")
:
extract the populationFrac
slot.
signature(x = "sts")
:
extract the control
slot.
signature(x = "sts")
:
extract the multinomialTS
slot.
signature(x = "sts")
:
extract matrix dimensions of observed
.
This method also enables nrow(x)
and ncol(x)
.
signature(x = "sts")
:
extract the dimnames
of the observed
matrix.
This method also enables rownames(x)
and colnames(x)
.
signature(x = "sts")
:
extract the corresponding year of each observation.
signature(x = "sts")
:
extract the epoch number within the year.
signature(x = "sts")
:
subset rows (time points) and/or columns (units),
see help("[,sts-method")
.
signature(x = "sts")
:
see aggregate.sts
.
signature(x = "sts")
:
the default as.data.frame
call will collect the following
slots into a data frame: observed
, epoch
,
state
, alarm
, upperbound
, and
populationFrac
. Additional columns will be created for
freq
(potentially varying by year for weekly or daily data
if x@epochAsDate
is TRUE
) and
epochInPeriod
(the epoch fraction within the current year).
Calling the as.data.frame
method with the argument
tidy = TRUE
will return tidy.sts(x)
,
which reshapes multivariate sts
objects to the
“long” format (one row per epoch and observational unit).
The tidy format is particularly useful for standard regression
models and customized plotting.
signature(from="sts", to="ts")
and
signature(from="ts", to="sts")
,
to be called via as(stsObj, "ts")
(or as.ts(stsObj)
)
and as(tsObj, "sts")
, respectively.
convert to the xts package format.
signature(x = "sts", y = "missing")
:
entry point to a collection of plot variants.
The type
of plot is specified using a formula,
see plot.sts
for details.
a ggplot2 variant of the standard
time-series-type plot, see autoplot.sts
.
see animate.sts
.
see toLatex.sts
.
Michael Höhle and Sebastian Meyer
showClass("sts") ## create an sts object from time-series data salmonellaDF <- read.table(system.file("extdata/salmonella.agona.txt", package = "surveillance"), header = TRUE) str(salmonellaDF) salmonella <- with(salmonellaDF, sts(observed = observed, state = state, start = c(1990, 1), frequency = 52)) salmonella plot(salmonella) ## these data are also available as a legacy "disProg" object in the package data(salmonella.agona) stopifnot(all.equal(salmonella, disProg2sts(salmonella.agona))) ## A typical dataset with weekly counts of measles from several districts data("measlesWeserEms") measlesWeserEms ## reconstruct data("measlesWeserEms") from its components counts <- observed(measlesWeserEms) map <- measlesWeserEms@map populationFrac <- population(measlesWeserEms) weserems_nbOrder <- neighbourhood(measlesWeserEms) ## orders of adjacency can also be determined from the map if (requireNamespace("spdep")) { stopifnot(identical(weserems_nbOrder, nbOrder(poly2adjmat(map)))) } mymeasles <- sts(counts, start = c(2001, 1), frequency = 52, population = populationFrac, neighbourhood = weserems_nbOrder, map = map) stopifnot(identical(mymeasles, measlesWeserEms)) ## convert ts/mts object to sts z <- ts(matrix(rpois(300,10), 100, 3), start = c(1961, 1), frequency = 12) z.sts <- as(z, "sts") plot(z.sts) ## conversion of "sts" objects to the quasi-standard "xts" class if (requireNamespace("xts")) { z.xts <- as.xts.sts(z.sts) plot(z.xts) }
showClass("sts") ## create an sts object from time-series data salmonellaDF <- read.table(system.file("extdata/salmonella.agona.txt", package = "surveillance"), header = TRUE) str(salmonellaDF) salmonella <- with(salmonellaDF, sts(observed = observed, state = state, start = c(1990, 1), frequency = 52)) salmonella plot(salmonella) ## these data are also available as a legacy "disProg" object in the package data(salmonella.agona) stopifnot(all.equal(salmonella, disProg2sts(salmonella.agona))) ## A typical dataset with weekly counts of measles from several districts data("measlesWeserEms") measlesWeserEms ## reconstruct data("measlesWeserEms") from its components counts <- observed(measlesWeserEms) map <- measlesWeserEms@map populationFrac <- population(measlesWeserEms) weserems_nbOrder <- neighbourhood(measlesWeserEms) ## orders of adjacency can also be determined from the map if (requireNamespace("spdep")) { stopifnot(identical(weserems_nbOrder, nbOrder(poly2adjmat(map)))) } mymeasles <- sts(counts, start = c(2001, 1), frequency = 52, population = populationFrac, neighbourhood = weserems_nbOrder, map = map) stopifnot(identical(mymeasles, measlesWeserEms)) ## convert ts/mts object to sts z <- ts(matrix(rpois(300,10), 100, 3), start = c(1961, 1), frequency = 12) z.sts <- as(z, "sts") plot(z.sts) ## conversion of "sts" objects to the quasi-standard "xts" class if (requireNamespace("xts")) { z.xts <- as.xts.sts(z.sts) plot(z.xts) }
sts
which
allows the user to store the results of back-projecting or nowcasting
surveillance time seriesA class inheriting from class sts
, but with additional slots
to store the result and associated confidence intervals from back
projection of a sts
object.
The slots are as for "sts"
. However, two
additional slots exists.
ci
:An array containing the upper and lower limit of the confidence interval.
lambda
:Back projection component
The methods are the same as for "sts"
.
signature(from = "sts", to = "stsBP")
:
convert an object of class sts
to class stsBP
.
M. Höhle
sts
which
allows the user to store the results of back-projecting
surveillance time seriesA class inheriting from class sts
, but with additional slots
to store the results of nowcasting.
The slots are as for "sts"
. However, a number of
additional slots exists.
reportingTriangle
:An array containing the upper and lower limit of the confidence interval.
predPMF
:Predictive distribution for each nowcasted time point.
pi
:A prediction interval for each nowcasted time
point. This is calculated based on predPMF
.
truth
:An object of type sts
containing the
true number of cases.
delayCDF
:List with the CDF of the estimated delay distribution for each method.
SR
:Possible output of proper scoring rules
The methods are the same as for "sts"
.
signature(from = "sts", to = "stsNC")
:
convert an object of class sts
to class stsNC
.
signature(x = "stsNC")
: extract the
reportingTriangle
slot of an stsNC
object.
signature(x = "stsNC")
: extract the
delayCDF
slot of an stsNC
object.
signature(x = "stsNC")
: extract the
scoring rules result slot of an stsNC
object.
signature(x = "stsNC")
: extract the
prediction interval slot of an stsNC
object.
M. Höhle
Animate a sequence of nowcasts stored as a list.
animate_nowcasts(nowcasts,linelist_truth, method="bayes.trunc.ddcp", control=list(dRange=NULL,anim.dRange=NULL, plot.dRange=NULL, consistent=FALSE, sys.sleep = 1, ylim=NULL,cex.names=0.7, col=c("violetred3","#2171B5","orange","blue","black", "greenyellow")), showLambda=TRUE)
animate_nowcasts(nowcasts,linelist_truth, method="bayes.trunc.ddcp", control=list(dRange=NULL,anim.dRange=NULL, plot.dRange=NULL, consistent=FALSE, sys.sleep = 1, ylim=NULL,cex.names=0.7, col=c("violetred3","#2171B5","orange","blue","black", "greenyellow")), showLambda=TRUE)
nowcasts |
A list of objects of class |
linelist_truth |
True linelist |
method |
Which method to show (has to be present in the nowcasts) |
control |
List with control options |
showLambda |
Boolean indicating whether to show the estimate for
the epidemic curve (only applied to |
This function is experimental and not yet fully documented.
M. Höhle
https://staff.math.su.se/hoehle/blog/2016/07/19/nowCast.html for a worked through example.
Reported number of cases of the Salmonella Newport serovar in Germany
2001-2015, by date of disease onset. The slot control
contains
a matrix reportingTriangle$n
with the reporting triangle as
described in Salmon et al. (2015).
data(stsNewport)
data(stsNewport)
A sts
object.
Salmon, M., Schumacher, D., Stark, K., Höhle, M. (2015): Bayesian outbreak detection in the presence of reporting delays. Biometrical Journal, 57 (6), 1051-1067.
This page gives an overview of plot types
for objects of class "sts"
.
## S4 method for signature 'sts,missing' plot(x, type = observed ~ time | unit, ...)
## S4 method for signature 'sts,missing' plot(x, type = observed ~ time | unit, ...)
x |
an object of class |
type |
see Details. |
... |
arguments passed to the |
There are various types of plots which can be produced from an
"sts"
object. The type
argument specifies the desired
plot as a formula, which defaults to observed ~ time | unit
,
i.e., plot the time series of each unit separately. Arguments to
specific plot functions can be passed as further arguments (...).
The following list describes the plot variants:
observed ~ time | unit
The default type shows
ncol(x)
plots, each containing the time series of one
observational unit. The actual plotting per unit is done by the
function stsplot_time1
, called sequentially from
stsplot_time
.
A ggplot2-based alternative for this type of plot is
provided through an autoplot
-method
for "sts"
objects.
observed ~ time
The observations in x
are
first aggregated
over units
and the resulting univariate time-series is plotted via the
function stsplot_time
.
alarm ~ time
Generates a so called alarmplot for a
multivariate sts
object. For each time point and each
series it is shown whether there is an alarm. In case of
hierarchical surveillance the user can pass
an additional argument lvl
, which is a vector of the
same length as rows in x
specifying for each time series
its level.
observed ~ unit
produces a map of counts (or incidence) per region aggregated over
time. See stsplot_space
for optional arguments,
details and examples.
NULL
(invisibly).
The methods are called for their side-effects.
the documentation of the individual plot types
stsplot_time
, stsplot_space
,
as well as the animate
method.
This is the plot
variant of type=observed~unit
for
"sts"
objects, i.e.,
plot(stsObj, type=observed~unit, ...)
calls the function
documented below. It produces an spplot
where regions are color-coded according to disease incidence
(either absolute counts or relative to population) over a given
time period.
stsplot_space(x, tps = NULL, map = x@map, population = NULL, main = NULL, labels = FALSE, ..., at = 10, col.regions = NULL, colorkey = list(space = "bottom", labels = list(at=at)), total.args = NULL, gpar.missing = list(col = "darkgrey", lty = 2, lwd = 2), sp.layout = NULL, xlim = bbox(map)[1, ], ylim = bbox(map)[2, ])
stsplot_space(x, tps = NULL, map = x@map, population = NULL, main = NULL, labels = FALSE, ..., at = 10, col.regions = NULL, colorkey = list(space = "bottom", labels = list(at=at)), total.args = NULL, gpar.missing = list(col = "darkgrey", lty = 2, lwd = 2), sp.layout = NULL, xlim = bbox(map)[1, ], ylim = bbox(map)[2, ])
x |
an object of class |
tps |
a numeric vector of one or more time points.
The unit-specific sum over all time points |
map |
an object inheriting from |
population |
if
|
main |
a main title for the plot. If |
labels |
determines if and how the regions of the |
... |
further arguments for |
at |
either a number of levels (default: 10) for the categorization
(color-coding) of counts/incidence,
or a numeric vector of specific break points,
or a named list of a number of levels ( |
col.regions |
a vector of fill colors, sufficiently long to serve all levels
(determined by |
colorkey |
a list describing the color key, see
|
total.args |
an optional list of arguments for |
gpar.missing |
list of graphical parameters for
|
sp.layout |
optional list of additional layout items, see |
xlim , ylim
|
numeric vectors of length 2 specifying the axis limits. |
a lattice plot of class
"trellis"
, but see
spplot
.
Sebastian Meyer
the central stsplot
-documentation for an overview of
plot types, and animate.sts
for animations of
"sts"
objects.
data("measlesWeserEms") # default plot: total region-specific counts over all weeks plot(measlesWeserEms, type = observed ~ unit) stsplot_space(measlesWeserEms) # the same # cumulative incidence (per 100'000 inhabitants), # with region labels and white borders plot(measlesWeserEms, observed ~ unit, population = measlesWeserEms@map$POPULATION / 100000, labels = list(labels = "GEN", cex = 0.7, font = 3), col = "white", lwd = 2, sub = "cumulative incidence (per 100'000 inhabitants)") # incidence in a particular week, manual color breaks, display total plot(measlesWeserEms, observed ~ unit, tps = 62, population = measlesWeserEms@map$POPULATION / 100000, at = c(0, 1, 5), total.args = list(x = 0, label = "Overall incidence: ")) # if we had only observed a subset of the regions plot(measlesWeserEms[,5:11], observed ~ unit, gpar.missing = list(col = "gray", lty = 4))
data("measlesWeserEms") # default plot: total region-specific counts over all weeks plot(measlesWeserEms, type = observed ~ unit) stsplot_space(measlesWeserEms) # the same # cumulative incidence (per 100'000 inhabitants), # with region labels and white borders plot(measlesWeserEms, observed ~ unit, population = measlesWeserEms@map$POPULATION / 100000, labels = list(labels = "GEN", cex = 0.7, font = 3), col = "white", lwd = 2, sub = "cumulative incidence (per 100'000 inhabitants)") # incidence in a particular week, manual color breaks, display total plot(measlesWeserEms, observed ~ unit, tps = 62, population = measlesWeserEms@map$POPULATION / 100000, at = c(0, 1, 5), total.args = list(x = 0, label = "Overall incidence: ")) # if we had only observed a subset of the regions plot(measlesWeserEms[,5:11], observed ~ unit, gpar.missing = list(col = "gray", lty = 4))
"sts"
Objects
These are the plot
variants of type=observed~time|unit
,
type=observed~time
, and type=alarm~time
for "sts"
objects (see the central "sts"
plot
-method for
an overview of plot types).
stsplot_time(x, units=NULL, as.one=FALSE, same.scale=TRUE, par.list=list(), ...) stsplot_time1(x, k=1, ylim=NULL, axes=TRUE, xaxis.tickFreq=list("%Q"=atChange), xaxis.labelFreq=xaxis.tickFreq, xaxis.labelFormat="%G\n\n%OQ", epochsAsDate=x@epochAsDate, xlab="time", ylab="No. infected", main=NULL, type="s", lty=c(1,1,2), col=c(NA,1,4), lwd=c(1,1,1), outbreak.symbol=list(pch=3, col=3, cex=1, lwd=1), alarm.symbol=list(pch=24, col=2, cex=1, lwd=1), legend.opts=list(), dx.upperbound=0L, hookFunc=function(){}, .hookFuncInheritance=function() {}, ...) stsplot_alarm(x, lvl=rep(1,ncol(x)), xaxis.tickFreq=list("%Q"=atChange), xaxis.labelFreq=xaxis.tickFreq, xaxis.labelFormat="%G\n\n%OQ", epochsAsDate=x@epochAsDate, xlab="time", ylab="", main=NULL, outbreak.symbol=list(pch=3, col=3, cex=1, lwd=1), alarm.symbol=list(pch=24, col=2, cex=1, lwd=1), cex.yaxis=1, ...)
stsplot_time(x, units=NULL, as.one=FALSE, same.scale=TRUE, par.list=list(), ...) stsplot_time1(x, k=1, ylim=NULL, axes=TRUE, xaxis.tickFreq=list("%Q"=atChange), xaxis.labelFreq=xaxis.tickFreq, xaxis.labelFormat="%G\n\n%OQ", epochsAsDate=x@epochAsDate, xlab="time", ylab="No. infected", main=NULL, type="s", lty=c(1,1,2), col=c(NA,1,4), lwd=c(1,1,1), outbreak.symbol=list(pch=3, col=3, cex=1, lwd=1), alarm.symbol=list(pch=24, col=2, cex=1, lwd=1), legend.opts=list(), dx.upperbound=0L, hookFunc=function(){}, .hookFuncInheritance=function() {}, ...) stsplot_alarm(x, lvl=rep(1,ncol(x)), xaxis.tickFreq=list("%Q"=atChange), xaxis.labelFreq=xaxis.tickFreq, xaxis.labelFormat="%G\n\n%OQ", epochsAsDate=x@epochAsDate, xlab="time", ylab="", main=NULL, outbreak.symbol=list(pch=3, col=3, cex=1, lwd=1), alarm.symbol=list(pch=24, col=2, cex=1, lwd=1), cex.yaxis=1, ...)
x |
an object of class |
units |
optional integer or character vector to select the units (=columns of
|
as.one |
logical indicating if all time series should be plotted
in a single frame (using |
same.scale |
logical indicating if all time series should be
plotted with the same |
par.list |
a list of arguments delivered to a call of
|
k |
the unit to plot, i.e., an element of |
ylim |
the y limits of the plot(s). Ignored if
|
axes |
a logical value indicating whether both axes should be drawn on the plot. |
xaxis.tickFreq , xaxis.labelFreq , xaxis.labelFormat
|
arguments for |
epochsAsDate |
Boolean indicating whether to treat the epochs as
Date objects (or to transform them to dates such that the new x-axis
formatting is applied).
Default: Value of the |
xlab |
a title for the x axis. See |
ylab |
a title for the y axis. See |
main |
an overall title for the plot: see 'title'. |
type |
type of plot to do. |
lty |
vector of length 3 specifying the line type for the three
lines in the plot – see |
col |
Vector of length 3 specifying the color to use in the
plot. The first color is the fill color of the polygons for the
counts bars ( |
lwd |
Vector of length 3 specifying the line width of the three
elements to plot. See also the |
alarm.symbol |
a list with entries |
outbreak.symbol |
a list with entries |
legend.opts |
a list of arguments for the
where individual elements are only |
dx.upperbound |
horizontal change in the plotting of the upperbound line. Sometimes it can be convenient to offset this line a little for better visibility. |
lvl |
A vector of length |
cex.yaxis |
The magnification to be used for y-axis annotation. |
hookFunc |
a function that is called after all the basic plotting has be done, i.e., it is not possible to control formatting with this function. See Examples. |
.hookFuncInheritance |
a function which is altered by sub-classes plot method. Do not alter this function manually. |
... |
further arguments for the function |
The time series plot relies on the work-horse stsplot_time1
.
Its arguments are (almost) similar to plot.survRes
.
NULL
(invisibly).
The functions are called for their side-effects.
Michael Höhle and Sebastian Meyer
There is an autoplot
-method, which
implements ggplot2-based time-series plots of "sts"
objects.
The stsplot
help page gives an overview of other
types of plots for "sts"
objects.
data("ha.sts") print(ha.sts) plot(ha.sts, type=observed ~ time | unit) # default multivariate type plot(ha.sts, units=c("mitt", "pank")) # selected units plot(ha.sts, type=observed ~ time) # aggregated over all districts ## Hook function example hookFunc <- function() grid(NA,NULL,lwd=1) plot(ha.sts, hookFunc=hookFunc) ## another multivariate time series example plotted "as.one" data("measlesDE") plot(measlesDE, units=1:2, as.one=TRUE, legend.opts=list(cex=0.8)) ## more sophisticated plots are offered by package "xts" if (requireNamespace("xts")) plot(as.xts.sts(measlesDE)) ## Use ISO8601 date formatting (see ?strptime) and no legend data("salmNewport") plot(aggregate(salmNewport,by="unit"), xlab="Time (weeks)", xaxis.tickFreq=list("%m"=atChange,"%G"=atChange), xaxis.labelFreq=list("%G"=atMedian),xaxis.labelFormat="%G") ## Formatting also works for daily data (illustrated by artificial ## outbreak converted to sts object via 'linelist2sts') set.seed(123) exposureTimes <- as.Date("2014-03-12") + sample(x=0:25,size=99,replace=TRUE) sts <- linelist2sts(data.frame(exposure=exposureTimes), dateCol="exposure",aggregate.by="1 day") ## Plot it with larger ticks for days than usual surveillance.options("stsTickFactors"=c("%d"=1, "%W"=0.33, "%V"=0.33, "%m"=1.75, "%Q"=1.25, "%Y"=1.5, "%G"=1.5)) plot(sts,xaxis.tickFreq=list("%d"=atChange,"%m"=atChange), xaxis.labelFreq=list("%d"=at2ndChange),xaxis.labelFormat="%d-%b", xlab="Time (days)")
data("ha.sts") print(ha.sts) plot(ha.sts, type=observed ~ time | unit) # default multivariate type plot(ha.sts, units=c("mitt", "pank")) # selected units plot(ha.sts, type=observed ~ time) # aggregated over all districts ## Hook function example hookFunc <- function() grid(NA,NULL,lwd=1) plot(ha.sts, hookFunc=hookFunc) ## another multivariate time series example plotted "as.one" data("measlesDE") plot(measlesDE, units=1:2, as.one=TRUE, legend.opts=list(cex=0.8)) ## more sophisticated plots are offered by package "xts" if (requireNamespace("xts")) plot(as.xts.sts(measlesDE)) ## Use ISO8601 date formatting (see ?strptime) and no legend data("salmNewport") plot(aggregate(salmNewport,by="unit"), xlab="Time (weeks)", xaxis.tickFreq=list("%m"=atChange,"%G"=atChange), xaxis.labelFreq=list("%G"=atMedian),xaxis.labelFormat="%G") ## Formatting also works for daily data (illustrated by artificial ## outbreak converted to sts object via 'linelist2sts') set.seed(123) exposureTimes <- as.Date("2014-03-12") + sample(x=0:25,size=99,replace=TRUE) sts <- linelist2sts(data.frame(exposure=exposureTimes), dateCol="exposure",aggregate.by="1 day") ## Plot it with larger ticks for days than usual surveillance.options("stsTickFactors"=c("%d"=1, "%W"=0.33, "%V"=0.33, "%m"=1.75, "%Q"=1.25, "%Y"=1.5, "%G"=1.5)) plot(sts,xaxis.tickFreq=list("%d"=atChange,"%m"=atChange), xaxis.labelFreq=list("%d"=at2ndChange),xaxis.labelFormat="%d-%b", xlab="Time (days)")
"sts"
SlotsFor almost every slot of the "sts"
class, package surveillance
defines a generic function of the same name (and a replacement
version) to extract (or set) the corresponding slot.
See the "sts"
class documentation.
"sts"
ObjectsThe [
-method extracts parts of an
"sts"
object
using row (time) and column (unit) indices.
## S4 method for signature 'sts' x[i, j, ..., drop = FALSE]
## S4 method for signature 'sts' x[i, j, ..., drop = FALSE]
x |
an object of class |
i |
optional row index (integer or logical vector). |
j |
optional column index (character, integer, or logical vector). |
drop |
logical: Should subsetting by |
... |
ignored. |
Row indices are used to select a subset of the original time period.
The start
and epoch
slots of the time series are
adjusted accordingly.
A warning is issued if an irregular integer sequence is used to
extract rows, e.g., x[c(1,2,4),]
, which could destroy the
structure of the time series (freq
).
Column indices work as usual when indexing matrices,
so may select units by name, position or a vector of booleans.
When subsetting columns, population fractions are recomputed if and
only if x
is no multinomialTS
and already contains
population fractions.
NA
indices are not supported, negative indices are.
Note that a [<-
method (i.e., subassignment) is not implemented.
an object of class "sts"
.
data("ha.sts") # Show a (subset of a) single time series plot(ha.sts[,7]) plot(ha.sts[year(ha.sts)==2006, 7]) # Map a single time point plot(ha.sts[5*52+26,], type=observed~unit) plot(ha.sts, type=observed~unit, tps=5*52+26) # same -> ?stsplot_space # Restrict the data (and the map) to a subset of the districts plot(ha.sts[,c("pank","lich")], type=observed~unit, labels=TRUE) plot(ha.sts[,c("pank","lich"),drop=TRUE], type=observed~unit, labels=TRUE)
data("ha.sts") # Show a (subset of a) single time series plot(ha.sts[,7]) plot(ha.sts[year(ha.sts)==2006, 7]) # Map a single time point plot(ha.sts[5*52+26,], type=observed~unit) plot(ha.sts, type=observed~unit, tps=5*52+26) # same -> ?stsplot_space # Restrict the data (and the map) to a subset of the districts plot(ha.sts[,c("pank","lich")], type=observed~unit, labels=TRUE) plot(ha.sts[,c("pank","lich"),drop=TRUE], type=observed~unit, labels=TRUE)
Query, set or reset options specific to the surveillance
package, similar to what options
does for global settings.
surveillance.options(...) reset.surveillance.options()
surveillance.options(...) reset.surveillance.options()
... |
Either empty, or a sequence of option names (as strings),
or a sequence of
|
reset.surveillance.options
reverts all options to their default
values and (invisibly) returns these in a list.
For surveillance.options
, the following holds:
If no arguments are given, the current values of all package options are returned in a list.
If one option name is given, the current value of this option is returned (not in a list, just the value).
If several option names are given, the current values of these options are returned in a list.
If name=value
pairs are given, the named options
are set to the given values, and the previous values of
these options are returned in a list.
surveillance.options()
surveillance.options()
"sts"
Object to a Data Frame in Long (Tidy) Format
The resulting data frame will have a row for each time point and
observational unit, and columns corresponding to the slots of the
"sts"
object (except for populationFrac
,
which is named population
).
Some time variables are added for convenience:
year
, epochInYear
, epochInPeriod
, date
(the latter gives NA
dates if epoch(x, as.Date=TRUE)
fails, i.e., for non-standard frequency(x)
if x@epochAsDate
is false).
tidy.sts(x, ...)
tidy.sts(x, ...)
x |
an object of class |
... |
unused. |
Sebastian Meyer
data("momo") momodat <- tidy.sts(momo) head(momodat) ## tidy.sts(stsObj) is the same as as.data.frame(stsObj, tidy = TRUE) stopifnot(identical(as.data.frame(momo, tidy = TRUE), momodat))
data("momo") momodat <- tidy.sts(momo) head(momodat) ## tidy.sts(stsObj) is the same as as.data.frame(stsObj, tidy = TRUE) stopifnot(identical(as.data.frame(momo, tidy = TRUE), momodat))
toLatex
-Method for "sts"
ObjectsConvert "sts"
objects to a
character vector with LaTeX markup.
## S4 method for signature 'sts' toLatex(object, caption = "",label=" ", columnLabels = NULL, subset = NULL, alarmPrefix = "\\textbf{\\textcolor{red}{", alarmSuffix = "}}", ubColumnLabel = "UB", ...)
## S4 method for signature 'sts' toLatex(object, caption = "",label=" ", columnLabels = NULL, subset = NULL, alarmPrefix = "\\textbf{\\textcolor{red}{", alarmSuffix = "}}", ubColumnLabel = "UB", ...)
object |
an |
caption |
A caption for the table. Default is the empty string. |
label |
A label for the table. Default is the empty string. |
columnLabels |
A list of labels for each column of the resulting table. Default is NULL |
subset |
A range of values which should be displayed. If Null, then all data in the sts objects will be displayed. Else only a subset of data. Therefore range needs to be a numerical vector of indexes from 1 to length(@observed). |
alarmPrefix |
A latex compatible prefix string wrapped around a table cell iff there is an alarm;i.e. alarm = TRUE |
alarmSuffix |
A latex compatible suffix string wrapped around a table cell iff there is an alarm;i.e. alarm[i,j] = TRUE |
ubColumnLabel |
The label of the upper bound column; default is \"UB\". |
... |
further arguments passed to |
An object of class "Latex"
.
Dirk Schumacher
# Create a test object data("salmonella.agona") # Create the corresponding sts object from the old disProg object salm <- disProg2sts(salmonella.agona) control <- list(range=(260:312), noPeriods=1,populationOffset=FALSE, fitFun="algo.farrington.fitGLM.flexible", b=4,w=3,weightsThreshold=1, pastWeeksNotIncluded=3, pThresholdTrend=0.05,trend=TRUE, thresholdMethod="delta",alpha=0.1) salm <- farringtonFlexible(salm,control=control) toLatex(salm, sanitize.text.function=identity, comment=FALSE)
# Create a test object data("salmonella.agona") # Create the corresponding sts object from the old disProg object salm <- disProg2sts(salmonella.agona) control <- list(range=(260:312), noPeriods=1,populationOffset=FALSE, fitFun="algo.farrington.fitGLM.flexible", b=4,w=3,weightsThreshold=1, pastWeeksNotIncluded=3, pThresholdTrend=0.05,trend=TRUE, thresholdMethod="delta",alpha=0.1) salm <- farringtonFlexible(salm,control=control) toLatex(salm, sanitize.text.function=identity, comment=FALSE)
twinSIR
is used to fit additive-multiplicative intensity models for
epidemics as described in Höhle (2009). Estimation is driven
by (penalized) maximum likelihood in the point process frame work. Optimization
(maximization) of the (penalized) likelihood function is performed by means of
optim
.
The implementation is illustrated in Meyer et al. (2017, Section 4),
see vignette("twinSIR")
.
twinSIR(formula, data, weights, subset, knots = NULL, nIntervals = 1, lambda.smooth = 0, penalty = 1, optim.args = list(), model = TRUE, keep.data = FALSE)
twinSIR(formula, data, weights, subset, knots = NULL, nIntervals = 1, lambda.smooth = 0, penalty = 1, optim.args = list(), model = TRUE, keep.data = FALSE)
formula |
an object of class |
data |
an object inheriting from class |
weights |
an optional vector of weights to be used in the fitting process. Should be
|
subset |
an optional vector specifying a subset of observations to be used in the
fitting process. The subset |
knots |
numeric vector or |
nIntervals |
the number of intervals of constant log-baseline hazard. Defaults to 1, which means an overall constant log-baseline hazard will be fitted. |
lambda.smooth |
numeric, the smoothing parameter |
penalty |
either a single number denoting the order of the difference used to penalize
the log-baseline coefficients (defaults to 1), or a more specific penalty
matrix |
optim.args |
a list with arguments passed to the
|
model |
logical indicating if the model frame, the |
keep.data |
logical indicating if the |
A model is specified through the formula
, which has the form
~ epidemicTerm1 + epidemicTerm2 + cox(endemicVar1) *
cox(endemicVar2)
,
i.e. the right hand side has the usual form as in lm
with
some variables marked as being endemic by the special function
cox
. The left hand side of the formula is empty and will be
set internally to cbind(start, stop, event)
, which is similar to
Surv(start, stop, event, type="counting")
in package survival.
Basically, the additive-multiplicative model for the infection intensity
for individual
is
where
is the at-risk indicator, indicating if individual is
“at risk” of becoming infected at time point
.
This variable is part of the event history
data
.
is the epidemic component of the infection intensity, defined as
where is the set of infectious individuals just before time
point
,
is the coordinate vector of individual
and the function
is defined as
with unknown transmission parameters and known distance
functions
. This set of distance functions results in the set of
epidemic variables normally calculated by the converter function
as.epidata
, considering the equality
with being the
'th epidemic variable for individual
.
is the endemic (cox
) component of the infection intensity, defined
as
where is the log-baseline hazard function,
is the vector of endemic covariates of individual
and
is the vector of unknown coefficients.
To fit the model, the log-baseline hazard function is approximated by a
piecewise constant function with known knots, but unknown levels,
which will be estimated. The approximation is specified by the arguments
knots
or nIntervals
.
If a big number of knots
(or nIntervals
) is chosen, the
corresponding log-baseline parameters can be rendered identifiable by
the use of penalized likelihood inference. At present, it is the job
of the user to choose an adequate value of the smoothing parameter
lambda.smooth
. Alternatively, a data driven
lambda.smooth
smoothing parameter selection based on a mixed
model representation of an equivalent truncated power spline is offered (see
reference for further details). The following two steps are iterated
until convergence:
Given fixed smoothing parameter, the penalized likelihood is optimized for the regression components using a L-BFGS-B approach
Given fixed regression parameters, a Laplace approximation of the marginal likelihood for the smoothing parameter is numerically optimized.
Depending on the data, convergence might take a couple of iterations.
Note also that it is unwise to include endemic covariates with huge values,
as they affect the intensities on the exponential scale (after
multiplication by the parameter vector ).
With large covariate values, the
optim
method "L-BFGS-B" will likely terminate due to an infinite
log-likelihood or score function in some iteration.
twinSIR
returns an object of class
"twinSIR"
, which is a list containing the following components:
coefficients |
a named vector of coefficients. |
loglik |
the maximum of the (penalized) log-likelihood function. |
counts |
the number of log-likelihood and score function evaluations. |
converged |
logical indicating convergence of the optimization algorithm. |
fisherinfo.observed |
if requested, the negative Hessian from
|
fisherinfo |
an estimation of the Expected Fisher Information matrix. |
method |
the optimization algorithm used. |
intervals |
a numeric vector ( |
nEvents |
a numeric vector containing the number of infections in each of
the above |
model |
if requested, the model information used. This is a list with
components |
data |
if requested, the supplied |
call |
the matched call. |
formula |
the specified |
terms |
the |
There are some restrictions to modelling the infection intensity
without a baseline hazard rate, i.e. without an intercept in the
formula
.
Reason: At some point, the optimization algorithm L-BFGS-B tries to set all
transmission parameters to the boundary value 0 and to calculate
the (penalized) score function with this set of parameters (all 0). The problem
then is that the values of the infection intensities
are 0
for all
and
and especially at observed event times, which is
impossible. Without a baseline, it is not allowed to have all alpha's set to 0,
because then we would not observe any infections. Unfortunately, L-BFGS-B can
not consider this restriction. Thus, if one wants to fit a model without
baseline hazard, the control parameter
lower
must be specified in
optim.args
so that some alpha is strictly positive, e.g.
optim.args = list(lower = c(0,0.001,0.001,0))
and the initial parameter
vector par
must not be the zero vector.
Michael Höhle and Sebastian Meyer
Höhle, M. (2009), Additive-multiplicative regression models for spatio-temporal epidemics, Biometrical Journal, 51 (6), 961-978.
Meyer, S., Held, L. and Höhle, M. (2017): Spatio-temporal analysis of epidemic phenomena using the R package surveillance. Journal of Statistical Software, 77 (11), 1-55. doi:10.18637/jss.v077.i11
as.epidata
for the necessary data input structure,
plot.twinSIR
for plotting the path of the infection intensity,
profile.twinSIR
for profile likelihood estimation.
and simulate.twinSIR
for the simulation of epidemics following
the fitted model.
Furthermore, the standard extraction methods
vcov
, logLik
,
AIC
and
extractAIC
are implemented for
objects of class "twinSIR"
.
data("hagelloch") summary(hagelloch) # simple model with an overall constant baseline hazard rate fit1 <- twinSIR(~ household + cox(AGE), data = hagelloch) fit1 summary(fit1) # see also help("summary.twinSIR") plot(fit1) # see also help("plot.twinSIR") checkResidualProcess(fit1) # could be better # fit a piecewise constant baseline hazard rate with 3 intervals using # _un_penalized ML and estimated coefs from fit1 as starting values fit2 <- twinSIR(~ household, data = hagelloch, nIntervals = 3, optim.args = list(par = coef(fit1)[c(1,2,2,2)])) summary(fit2) # fit a piecewise constant baseline hazard rate with 7 intervals # using _penalized_ ML fit3 <- twinSIR(~ household, data = hagelloch, nIntervals = 7, lambda.smooth = 0.1, penalty = 1) summary(fit3) checkResidualProcess(fit3) # plot the estimated log-baseline levels plot(x=fit2$intervals, y=coef(fit2)[c(2,2:4)], type="S", ylim=c(-6, -1)) lines(x=fit3$intervals, y=coef(fit3)[c(2,2:8)], type="S", col=2) legend("right", legend=c("unpenalized 3", "penalized 7"), lty=1, col=1:2, bty="n") ## special use case: fit the model to a subset of the events only, ## while preserving epidemic contributions from the remainder ## (maybe some buffer area nodes) fit_subset <- twinSIR(~ household, data = hagelloch, subset = CL=="preschool") summary(fit_subset)
data("hagelloch") summary(hagelloch) # simple model with an overall constant baseline hazard rate fit1 <- twinSIR(~ household + cox(AGE), data = hagelloch) fit1 summary(fit1) # see also help("summary.twinSIR") plot(fit1) # see also help("plot.twinSIR") checkResidualProcess(fit1) # could be better # fit a piecewise constant baseline hazard rate with 3 intervals using # _un_penalized ML and estimated coefs from fit1 as starting values fit2 <- twinSIR(~ household, data = hagelloch, nIntervals = 3, optim.args = list(par = coef(fit1)[c(1,2,2,2)])) summary(fit2) # fit a piecewise constant baseline hazard rate with 7 intervals # using _penalized_ ML fit3 <- twinSIR(~ household, data = hagelloch, nIntervals = 7, lambda.smooth = 0.1, penalty = 1) summary(fit3) checkResidualProcess(fit3) # plot the estimated log-baseline levels plot(x=fit2$intervals, y=coef(fit2)[c(2,2:4)], type="S", ylim=c(-6, -1)) lines(x=fit3$intervals, y=coef(fit3)[c(2,2:8)], type="S", col=2) legend("right", legend=c("unpenalized 3", "penalized 7"), lty=1, col=1:2, bty="n") ## special use case: fit the model to a subset of the events only, ## while preserving epidemic contributions from the remainder ## (maybe some buffer area nodes) fit_subset <- twinSIR(~ household, data = hagelloch, subset = CL=="preschool") summary(fit_subset)
twinSIR
Models
intensityplot
methods to plot the evolution of the total infection
intensity, its epidemic proportion or its endemic proportion over time.
The default plot
method for objects of class "twinSIR"
is just a wrapper for the intensityplot
method.
The implementation is illustrated in Meyer et al. (2017, Section 4),
see vignette("twinSIR")
.
## S3 method for class 'twinSIR' plot(x, which = c("epidemic proportion", "endemic proportion", "total intensity"), ...) ## S3 method for class 'twinSIR' intensityplot(x, which = c("epidemic proportion", "endemic proportion", "total intensity"), aggregate = TRUE, theta = NULL, plot = TRUE, add = FALSE, rug.opts = list(), ...) ## S3 method for class 'simEpidata' intensityplot(x, which = c("epidemic proportion", "endemic proportion", "total intensity"), aggregate = TRUE, theta = NULL, plot = TRUE, add = FALSE, rug.opts = list(), ...)
## S3 method for class 'twinSIR' plot(x, which = c("epidemic proportion", "endemic proportion", "total intensity"), ...) ## S3 method for class 'twinSIR' intensityplot(x, which = c("epidemic proportion", "endemic proportion", "total intensity"), aggregate = TRUE, theta = NULL, plot = TRUE, add = FALSE, rug.opts = list(), ...) ## S3 method for class 'simEpidata' intensityplot(x, which = c("epidemic proportion", "endemic proportion", "total intensity"), aggregate = TRUE, theta = NULL, plot = TRUE, add = FALSE, rug.opts = list(), ...)
x |
an object of class |
which |
|
aggregate |
logical. Determines whether lines for all individual infection
intensities should be drawn ( |
theta |
numeric vector of model coefficients. If |
plot |
logical indicating if a plot is desired, defaults to |
add |
logical. If |
rug.opts |
either a list of arguments passed to the function |
... |
For the |
numeric matrix with the first column "stop"
and as many rows as there
are "stop"
time points in the event history x
. The other
columns depend on the argument aggregate
: if TRUE
, there
is only one other column named which
, which contains the values of
which
at the respective "stop"
time points. Otherwise, if
aggregate = FALSE
, there is one column for each individual, each of
them containing the individual which
at the respective "stop"
time points.
Sebastian Meyer
Meyer, S., Held, L. and Höhle, M. (2017): Spatio-temporal analysis of epidemic phenomena using the R package surveillance. Journal of Statistical Software, 77 (11), 1-55. doi:10.18637/jss.v077.i11
twinSIR
for a description of the intensity model, and
simulate.twinSIR
for the simulation of epidemic data
according to a twinSIR
specification.
data("hagelloch") plot(hagelloch) # a simplistic twinSIR model fit <- twinSIR(~ household, data = hagelloch) # overall total intensity plot(fit, which = "total") # overall epidemic proportion epi <- plot(fit, which = "epidemic", ylim = c(0, 1)) head(epi) # add overall endemic proportion = 1 - epidemic proportion ende <- plot(fit, which = "endemic", add = TRUE, col = 2) legend("topleft", legend = "endemic proportion", lty = 1, col = 2, bty = "n") # individual intensities tmp <- plot(fit, which = "total", aggregate = FALSE, col = rgb(0, 0, 0, alpha = 0.1), main = expression("Individual infection intensities " * lambda[i](t) == Y[i](t) %.% (e[i](t) + h[i](t)))) # return value: matrix of individual intensity paths str(tmp) # plot intensity path only for individuals 3 and 99 matplot(x = tmp[,1], y = tmp[,1+c(3,99)], type = "S", ylab = "Force of infection", xlab = "time", main = expression("Paths of the infection intensities " * lambda[3](t) * " and " * lambda[99](t))) legend("topright", legend = paste("Individual", c(3,99)), col = 1:2, lty = 1:2)
data("hagelloch") plot(hagelloch) # a simplistic twinSIR model fit <- twinSIR(~ household, data = hagelloch) # overall total intensity plot(fit, which = "total") # overall epidemic proportion epi <- plot(fit, which = "epidemic", ylim = c(0, 1)) head(epi) # add overall endemic proportion = 1 - epidemic proportion ende <- plot(fit, which = "endemic", add = TRUE, col = 2) legend("topleft", legend = "endemic proportion", lty = 1, col = 2, bty = "n") # individual intensities tmp <- plot(fit, which = "total", aggregate = FALSE, col = rgb(0, 0, 0, alpha = 0.1), main = expression("Individual infection intensities " * lambda[i](t) == Y[i](t) %.% (e[i](t) + h[i](t)))) # return value: matrix of individual intensity paths str(tmp) # plot intensity path only for individuals 3 and 99 matplot(x = tmp[,1], y = tmp[,1+c(3,99)], type = "S", ylab = "Force of infection", xlab = "time", main = expression("Paths of the infection intensities " * lambda[3](t) * " and " * lambda[99](t))) legend("topright", legend = paste("Individual", c(3,99)), col = 1:2, lty = 1:2)
"twinSIR"
Objects
Besides print
and summary
methods there are also some standard
extraction methods defined for objects of class "twinSIR"
:
vcov
, logLik
and especially AIC
and
extractAIC
, which extract Akaike's Information Criterion. Note that
special care is needed, when fitting models with parameter constraints such as
the epidemic effects in
twinSIR
models. Parameter
constraints reduce the average increase in the maximized loglikelihood - thus
the penalty for constrained parameters should be smaller than the factor 2 used
in the ordinary definition of AIC. To this end, these two methods offer the
calculation of the so-called one-sided AIC (OSAIC).
## S3 method for class 'twinSIR' print(x, digits = max(3, getOption("digits") - 3), ...) ## S3 method for class 'twinSIR' summary(object, correlation = FALSE, symbolic.cor = FALSE, ...) ## S3 method for class 'twinSIR' AIC(object, ..., k = 2, one.sided = NULL, nsim = 1e3) ## S3 method for class 'twinSIR' extractAIC(fit, scale = 0, k = 2, one.sided = NULL, nsim = 1e3, ...) ## S3 method for class 'twinSIR' vcov(object, ...) ## S3 method for class 'twinSIR' logLik(object, ...) ## S3 method for class 'summary.twinSIR' print(x, digits = max(3, getOption("digits") - 3), symbolic.cor = x$symbolic.cor, signif.stars = getOption("show.signif.stars"), ...)
## S3 method for class 'twinSIR' print(x, digits = max(3, getOption("digits") - 3), ...) ## S3 method for class 'twinSIR' summary(object, correlation = FALSE, symbolic.cor = FALSE, ...) ## S3 method for class 'twinSIR' AIC(object, ..., k = 2, one.sided = NULL, nsim = 1e3) ## S3 method for class 'twinSIR' extractAIC(fit, scale = 0, k = 2, one.sided = NULL, nsim = 1e3, ...) ## S3 method for class 'twinSIR' vcov(object, ...) ## S3 method for class 'twinSIR' logLik(object, ...) ## S3 method for class 'summary.twinSIR' print(x, digits = max(3, getOption("digits") - 3), symbolic.cor = x$symbolic.cor, signif.stars = getOption("show.signif.stars"), ...)
x , object , fit
|
an object of class |
digits |
integer, used for number formatting with |
correlation |
logical. if |
symbolic.cor |
logical. If |
... |
For the |
k |
numeric specifying the "weight" of the penalty to be used;
in an unconstrained fit |
one.sided |
logical or |
nsim |
when there are more than two epidemic covariates in the fit, the weights in the OSAIC formula have to be determined by simulation. Default is to use 1000 samples. Note that package quadprog is additionally required in this case. |
scale |
unused (argument of the generic). |
signif.stars |
logical. If |
The print
and summary
methods allow the compact or comprehensive
representation of the fitting results, respectively. The former only prints
the original function call, the estimated coefficients and the maximum
log-likelihood value. The latter prints the whole coefficient matrix with
standard errors, z- and p-values (see printCoefmat
), and
additionally the number of infections per log-baseline interval
,
the (one-sided) AIC and the number of log-likelihood evaluations. They both
append a big “WARNING”, if the optimization algorithm did not converge.
The estimated coefficients may be extracted by using the default
coef
-method from package stats.
The two AIC functions differ only in that AIC
can take more than one
fitted model object and that extractAIC
always returns the number of
parameters in the model (AIC
only does with more than one fitted model
object).
Concerning the choice of one-sided AIC: parameter constraints – such as the
non-negative constraints for the epidemic effects alpha in twinSIR
models – reduce the average increase in the maximized loglikelihood. Thus,
the penalty for constrained parameters should be smaller than the factor 2
used in the ordinary definition of AIC. One-sided AIC (OSAIC) suggested by
Hughes and King (2003) is such a proposal when out of
parameters have non-negative constraints:
where are
-specific weights. For more details see
Section 5.2 in Höhle (2009).
The print
methods return their first argument, invisibly, as
they always should. The vcov
and logLik
methods return the estimated variance-covariance
matrix of the parameters (here, the inverse of the estimate of the
expected Fisher information matrix), and the maximum log-likelihood
value of the model, respectively.
The summary
method returns a list containing some summary
statistics of the fitted model, which is nicely printed by the
corresponding print
method.
For the AIC
and extractAIC
methods, see
the documentation of the corresponding generic functions.
Michael Höhle and Sebastian Meyer
Hughes A, King M (2003) Model selection using AIC in the presence of one-sided information. Journal of Statistical Planning and Inference 115, pp. 397–411.
Höhle, M. (2009), Additive-Multiplicative Regression Models for Spatio-Temporal Epidemics, Biometrical Journal, 51(6):961-978.
data("hagelloch") # a simplistic twinSIR model fit <- twinSIR(~ household + cox(AGE), data = hagelloch) coef(fit) vcov(fit) logLik(fit) summary(fit, correlation = TRUE, symbolic.cor = TRUE) # AIC or OSAIC AIC(fit) AIC(fit, one.sided = FALSE) extractAIC(fit) extractAIC(fit, one.sided = FALSE) # comparing models via AIC fit2 <- update(fit, nIntervals = 2) AIC(fit, fit2) # the 2nd column should be named "OSAIC" here
data("hagelloch") # a simplistic twinSIR model fit <- twinSIR(~ household + cox(AGE), data = hagelloch) coef(fit) vcov(fit) logLik(fit) summary(fit, correlation = TRUE, symbolic.cor = TRUE) # AIC or OSAIC AIC(fit) AIC(fit, one.sided = FALSE) extractAIC(fit) extractAIC(fit, one.sided = FALSE) # comparing models via AIC fit2 <- update(fit, nIntervals = 2) AIC(fit, fit2) # the 2nd column should be named "OSAIC" here
Function to compute estimated and profile likelihood based confidence
intervals. Computations might be cumbersome!
There is a simple plot
-method for the result.
## S3 method for class 'twinSIR' profile(fitted, profile, alpha = 0.05, control = list(fnscale = -1, factr = 10, maxit = 100), ...)
## S3 method for class 'twinSIR' profile(fitted, profile, alpha = 0.05, control = list(fnscale = -1, factr = 10, maxit = 100), ...)
fitted |
an object of class |
profile |
a list with elements being numeric vectors of length 4. These vectors must
have the form
|
alpha |
|
control |
control object to use in |
... |
unused (argument of the generic). |
a list with profile log-likelihood evaluations on the grid and highest likelihood
and Wald confidence intervals. The argument profile
is also returned.
The result has class "profile.twinSIR"
, for which a simple (undocumented)
plot
-method is available.
Michael Höhle and Sebastian Meyer
data("hagelloch") fit <- twinSIR(~ household, data = hagelloch) gridsize <- if (interactive()) 35 else 5 # for fast tests prof <- profile(fit, list(c(1, NA, NA, gridsize))) prof$ci.hl plot(prof)
data("hagelloch") fit <- twinSIR(~ household, data = hagelloch) gridsize <- if (interactive()) 35 else 5 # for fast tests prof <- profile(fit, list(c(1, NA, NA, gridsize))) prof$ci.hl plot(prof)
This function simulates the infection (and removal) times of an epidemic. Besides the classical SIR type of epidemic, also SI, SIRS and SIS epidemics are supported. Simulation works via the conditional intensity of infection of an individual, given some (time varying) endemic covariates and/or some distance functions (epidemic components) as well as the fixed positions of the individuals. The lengths of the infectious and removed periods are generated following a pre-specified function (can be deterministic).
The simulate
method for objects of class
"twinSIR"
simulates new epidemic data using the model and
the parameter estimates of the fitted object.
simEpidata(formula, data, id.col, I0.col, coords.cols, subset, beta, h0, f = list(), w = list(), alpha, infPeriod, remPeriod = function(ids) rep(Inf, length(ids)), end = Inf, trace = FALSE, .allocate = NULL) ## S3 method for class 'twinSIR' simulate(object, nsim = 1, seed = 1, infPeriod = NULL, remPeriod = NULL, end = diff(range(object$intervals)), trace = FALSE, .allocate = NULL, data = object$data, ...)
simEpidata(formula, data, id.col, I0.col, coords.cols, subset, beta, h0, f = list(), w = list(), alpha, infPeriod, remPeriod = function(ids) rep(Inf, length(ids)), end = Inf, trace = FALSE, .allocate = NULL) ## S3 method for class 'twinSIR' simulate(object, nsim = 1, seed = 1, infPeriod = NULL, remPeriod = NULL, end = diff(range(object$intervals)), trace = FALSE, .allocate = NULL, data = object$data, ...)
formula |
an object of class |
data |
a data.frame containing the variables in For the |
id.col |
only if |
I0.col |
only if |
coords.cols |
only if |
subset |
an optional vector specifying a subset of the covariate history to be used in the simulation. |
beta |
numeric vector of length equal the number of endemic ( |
h0 |
either a single number to specify a constant baseline hazard
(equal to |
f , w
|
see |
alpha |
a named numeric vector of coefficients for the epidemic
covariates generated by |
infPeriod |
a function generating lengths of infectious periods. It should take one
parameter (e.g. Note that it is even possible to simulate an SI-epidemic by setting
In other words: once an individual became infected it spreads the disease forever, i.e. it will never be removed. |
remPeriod |
a function generating lengths of removal periods. Per default, once an
individual was removed it will stay in this state forever ( |
end |
a single positive numeric value specifying the time point at which the
simulation should be forced to end. By default, this is |
trace |
logical (or integer) indicating if (or how often) the sets of susceptible
and infected individuals as well as the rejection indicator (of the
rejection sampling step) should be |
.allocate |
number of blocks to initially allocate for the event history (i.e.
|
object |
an object of class |
nsim |
number of epidemics to simulate. Defaults to 1. |
seed |
an integer that will be used in the call to |
... |
unused (argument of the generic). |
A model is specified through the formula
, which has the form
cbind(start, stop) ~ cox(endemicVar1) * cox(endemicVar2)
,
i.e. the right hand side has the usual form as in lm
, but
all variables are marked as being endemic by the special function
cox
. The effects of those predictor terms are specified by
beta
. The left hand side of the formula denotes the start
and stop columns in data
. This can be omitted, if data
inherits
from class "epidata"
in which case cbind(start, stop)
will be
used. The epidemic model component is specified by the arguments
f
and w
(and the associated coefficients alpha
).
If the epidemic model component is empty and infPeriod
always returns Inf
, then one actually simulates from a pure Cox model.
The simulation algorithm used is Ogata's modified thinning. For details, see Höhle (2009), Section 4.
An object of class "simEpidata"
, which is a data.frame
with the
columns "id"
, "start"
, "stop"
, "atRiskY"
,
"event"
, "Revent"
and the coordinate columns (with the original
names from data
), which are all obligatory. These columns are followed
by all the variables appearing on the rhs of the formula
. Last but not
least, the generated columns with epidemic covariates corresponding to the
functions in the lists f
and w
are appended.
Note that objects of class "simEpidata"
also inherit from class
"epidata"
, thus all "epidata"
methods can be
applied.
The data.frame
is given the additional attributes
"eventTimes" |
numeric vector of infection time points (sorted chronologically). |
"timeRange" |
numeric vector of length 2: |
"coords.cols" |
numeric vector containing the column indices of the coordinate columns in the resulting data-frame. |
"f" |
this equals the argument |
"w" |
this equals the argument |
"config" |
a list with elements |
call |
the matched call. |
terms |
the |
If nsim > 1
epidemics are simulated by the
simulate
-method for fitted "twinSIR"
models, these are
returned in a list.
Sebastian Meyer and Michael Höhle
Höhle, M. (2009), Additive-Multiplicative Regression Models for Spatio-Temporal Epidemics, Biometrical Journal, 51(6):961-978.
The plot.epidata
and animate.epidata
methods
for plotting and animating (simulated) epidemic data, respectively.
The intensityplot.simEpidata
method for plotting paths of
infection intensities.
Function twinSIR
for fitting spatio-temporal epidemic intensity
models to epidemic data.
## Generate a data frame containing a hypothetic population with 100 individuals set.seed(1234) n <- 100 pos <- matrix(rnorm(n*2), ncol=2, dimnames=list(NULL, c("x", "y"))) pop <- data.frame(id=1:n, x=pos[,1], y=pos[,2], gender=sample(0:1, n, replace=TRUE), I0col=c(rep(1,3),rep(0,n-3)), # 3 initially infectious start=rep(0,n), stop=rep(Inf,n)) ## Simulate an SIR epidemic in this population set.seed(123) infPeriods <- setNames(c(1:3/10, rexp(n-3, rate=1)), 1:n) epi <- simEpidata( cbind(start,stop) ~ cox(gender), data = pop, id.col = "id", I0.col = "I0col", coords.cols = c("x","y"), beta = c(-2), h0 = -1, alpha = c(B1=0.1), f = list(B1=function(u) u<=1), infPeriod = function(ids) infPeriods[ids], ##remPeriod = function(ids) rexp(length(ids), rate=0.1), end = 30 # -> SIRS ) ## extract event times by id head(summary(epi)$byID) ## Plot the numbers of susceptible, infectious and removed individuals plot(epi) ## load the 1861 Hagelloch measles epidemic data("hagelloch") summary(hagelloch) plot(hagelloch) ## fit a simplistic twinSIR model fit <- twinSIR(~ household, data = hagelloch) ## simulate a new epidemic from the above model ## with simulation period = observation period, re-using observed infPeriods sim1 <- simulate(fit, data = hagelloch) plot(sim1) ## check if we find similar parameters in the simulated epidemic fitsim1 <- update(fit, data = sim1) cbind(base = coef(fit), new = coef(fitsim1)) if (surveillance.options("allExamples")) { ## simulate only 10 days, using random infPeriods ~ Exp(0.1) sim2 <- simulate(fit, data = hagelloch, seed = 2, end = 10, infPeriod = function(ids) rexp(length(ids), rate = 0.1)) plot(sim2) ## simulate from a different model with manually specified parameters set.seed(321) simepi <- simEpidata(~ cox(AGE), data = hagelloch, beta = c(0.1), h0 = -4, alpha = c(household = 0.05), f = list(household = function(u) u == 0), infPeriod = function(ids) rexp(length(ids), rate=1/8)) plot(simepi) intensityplot(simepi) ## see if we correctly estimate the parameters fitsimepi <- twinSIR(~ cox(AGE) + household, data = simepi) cbind(true = c(0.05, -4, 0.1), est = coef(fitsimepi), confint(fitsimepi)) }
## Generate a data frame containing a hypothetic population with 100 individuals set.seed(1234) n <- 100 pos <- matrix(rnorm(n*2), ncol=2, dimnames=list(NULL, c("x", "y"))) pop <- data.frame(id=1:n, x=pos[,1], y=pos[,2], gender=sample(0:1, n, replace=TRUE), I0col=c(rep(1,3),rep(0,n-3)), # 3 initially infectious start=rep(0,n), stop=rep(Inf,n)) ## Simulate an SIR epidemic in this population set.seed(123) infPeriods <- setNames(c(1:3/10, rexp(n-3, rate=1)), 1:n) epi <- simEpidata( cbind(start,stop) ~ cox(gender), data = pop, id.col = "id", I0.col = "I0col", coords.cols = c("x","y"), beta = c(-2), h0 = -1, alpha = c(B1=0.1), f = list(B1=function(u) u<=1), infPeriod = function(ids) infPeriods[ids], ##remPeriod = function(ids) rexp(length(ids), rate=0.1), end = 30 # -> SIRS ) ## extract event times by id head(summary(epi)$byID) ## Plot the numbers of susceptible, infectious and removed individuals plot(epi) ## load the 1861 Hagelloch measles epidemic data("hagelloch") summary(hagelloch) plot(hagelloch) ## fit a simplistic twinSIR model fit <- twinSIR(~ household, data = hagelloch) ## simulate a new epidemic from the above model ## with simulation period = observation period, re-using observed infPeriods sim1 <- simulate(fit, data = hagelloch) plot(sim1) ## check if we find similar parameters in the simulated epidemic fitsim1 <- update(fit, data = sim1) cbind(base = coef(fit), new = coef(fitsim1)) if (surveillance.options("allExamples")) { ## simulate only 10 days, using random infPeriods ~ Exp(0.1) sim2 <- simulate(fit, data = hagelloch, seed = 2, end = 10, infPeriod = function(ids) rexp(length(ids), rate = 0.1)) plot(sim2) ## simulate from a different model with manually specified parameters set.seed(321) simepi <- simEpidata(~ cox(AGE), data = hagelloch, beta = c(0.1), h0 = -4, alpha = c(household = 0.05), f = list(household = function(u) u == 0), infPeriod = function(ids) rexp(length(ids), rate=1/8)) plot(simepi) intensityplot(simepi) ## see if we correctly estimate the parameters fitsimepi <- twinSIR(~ cox(AGE) + household, data = simepi) cbind(true = c(0.05, -4, 0.1), est = coef(fitsimepi), confint(fitsimepi)) }
A twinstim
model as described in Meyer et al. (2012) is fitted to
marked spatio-temporal point process data. This constitutes a
regression approach for conditional intensity function modelling.
The implementation is illustrated in Meyer et al. (2017, Section 3),
see vignette("twinstim")
.
twinstim(endemic, epidemic, siaf, tiaf, qmatrix = data$qmatrix, data, subset, t0 = data$stgrid$start[1], T = tail(data$stgrid$stop,1), na.action = na.fail, start = NULL, partial = FALSE, epilink = "log", control.siaf = list(F = list(), Deriv = list()), optim.args = list(), finetune = FALSE, model = FALSE, cumCIF = FALSE, cumCIF.pb = interactive(), cores = 1, verbose = TRUE)
twinstim(endemic, epidemic, siaf, tiaf, qmatrix = data$qmatrix, data, subset, t0 = data$stgrid$start[1], T = tail(data$stgrid$stop,1), na.action = na.fail, start = NULL, partial = FALSE, epilink = "log", control.siaf = list(F = list(), Deriv = list()), optim.args = list(), finetune = FALSE, model = FALSE, cumCIF = FALSE, cumCIF.pb = interactive(), cores = 1, verbose = TRUE)
endemic |
right-hand side formula for the exponential (Cox-like
multiplicative) endemic component. May contain offsets (to be marked
by the special function |
epidemic |
formula representing the epidemic model for the event-specific
covariates (marks) determining infectivity. Offsets are not
implemented here. If omitted or |
siaf |
spatial interaction function. Possible specifications are:
If you run into “false convergence” with a non-constant
|
tiaf |
temporal interaction function. Possible specifications are:
|
qmatrix |
square indicator matrix (0/1 or |
data |
an object of class |
subset |
an optional vector evaluating to logical indicating a subset of
|
t0 , T
|
events having occurred during (-Inf;t0] are regarded as part of the
prehistory |
na.action |
how to deal with missing values in |
start |
a named vector of initial values for (a subset of) the parameters.
The names must conform to the conventions of Alternatively, |
partial |
logical indicating if a partial likelihood similar to the approach
by Diggle et al. (2010) should be used (default is |
epilink |
a character string determining the link function to be used for the
|
control.siaf |
a list with elements |
optim.args |
an argument list passed to Initial values for the parameters may be given as list element
Note that For the There may be an extra component Importantly, the Similarly, |
finetune |
logical indicating if a second maximisation should be performed with
robust Nelder-Mead |
model |
logical indicating if the model environment should be kept with the
result, which is required for
|
cumCIF |
logical (default: |
cumCIF.pb |
logical indicating if a progress bar should be shown
during the calculation of |
cores |
number of processes to use in parallel operation. By default
|
verbose |
logical indicating if information should be printed during
execution. Defaults to |
The function performs maximum likelihood inference
for the additive-multiplicative spatio-temporal intensity model
described in Meyer et al. (2012). It uses nlminb
as the
default optimizer and returns an object of class "twinstim"
.
Such objects have print
, plot
and
summary
methods.
The summary
output can be converted via corresponding
xtable
or
toLatex
methods.
Furthermore, the usual accessor methods are implemented, including
coef
, vcov
, logLik
,
residuals
, and
update
.
Additional functionality is provided by the R0
and
simulate
methods.
Returns an S3 object of class "twinstim"
, which is a list with
the following components:
coefficients |
vector containing the MLE. |
loglik |
value of the log-likelihood function at the MLE with a
logical attribute |
counts |
number of log-likelihood and score evaluations during optimization. |
converged |
either |
fisherinfo |
expected Fisher information evaluated at the
MLE. Only non- |
fisherinfo.observed |
observed Fisher information matrix
evaluated at the value of the MLE. Obtained as the negative Hessian.
Only non- |
fitted |
fitted values of the conditional intensity function at the events. |
fittedComponents |
two-column matrix with columns |
tau |
fitted cumulative ground intensities at the event times.
Only non- |
R0 |
estimated basic reproduction number for each event. This equals the spatio-temporal integral of the epidemic intensity over the observation domain (t0;T] x W for each event. |
npars |
vector describing the lengths of the 5 parameter
subvectors: endemic intercept(s) |
qmatrix |
the |
bbox |
the bounding box of |
timeRange |
the time range used for fitting: |
formula |
a list containing the four main parts of the model
specification: |
xlevels |
a record of the levels of the factors used in fitting. |
control.siaf |
see the “Arguments” section above. |
optim.args |
input optimizer arguments used to determine the MLE. |
functions |
if |
call |
the matched call. |
runtime |
the |
If model=TRUE
, the model evaluation environment is assigned to
this list and can thus be queried by calling environment()
on
the result.
twinstim
makes use of the memoise package if it is
available – and that is highly recommended for non-constant
siaf
specifications to speed up calculations. Specifically, the
necessary numerical integrations of the spatial interaction function
will be cached such that they are only calculated once for every
state of the siaf
parameters during optimization.
Sebastian Meyer
Contributions to this documentation by Michael Höhle and Mayeul Kauffmann.
Diggle, P. J., Kaimi, I. & Abellana, R. (2010): Partial-likelihood analysis of spatio-temporal point-process data. Biometrics, 66, 347-354.
Martinussen, T. and Scheike, T. H. (2006): Dynamic Regression Models for Survival Data. Springer.
Meyer, S. (2010):
Spatio-Temporal Infectious Disease Epidemiology based on Point Processes.
Master's Thesis, Ludwig-Maximilians-Universität
München.
Available as https://epub.ub.uni-muenchen.de/11703/
Meyer, S., Elias, J. and Höhle, M. (2012): A space-time conditional intensity model for invasive meningococcal disease occurrence. Biometrics, 68, 607-616. doi:10.1111/j.1541-0420.2011.01684.x
Meyer, S. and Held, L. (2014): Power-law models for infectious disease spread. The Annals of Applied Statistics, 8 (3), 1612-1639. doi:10.1214/14-AOAS743
Meyer, S., Held, L. and Höhle, M. (2017): Spatio-temporal analysis of epidemic phenomena using the R package surveillance. Journal of Statistical Software, 77 (11), 1-55. doi:10.18637/jss.v077.i11
Rathbun, S. L. (1996): Asymptotic properties of the maximum likelihood estimator for spatio-temporal point processes. Journal of Statistical Planning and Inference, 51, 55-74.
vignette("twinstim")
.
There is a simulate.twinstim
method,
which simulates the point process based on the fitted twinstim
.
A discrete-space alternative is offered by the twinSIR
modelling framework.
# Load invasive meningococcal disease data data("imdepi") ### first, fit a simple endemic-only model m_noepi <- twinstim( endemic = addSeason2formula(~ offset(log(popdensity)) + I(start/365-3.5), S=1, period=365, timevar="start"), data = imdepi, subset = !is.na(agegrp) ) ## look at the model summary summary(m_noepi) ## there is no evidence for a type-dependent endemic intercept (LR test) m_noepi_type <- update(m_noepi, endemic = ~(1|type) + .) pchisq(2*c(logLik(m_noepi_type)-logLik(m_noepi)), df=1, lower.tail=FALSE) ### add an epidemic component with just the intercept, i.e. ### assuming uniform dispersal in time and space up to a distance of ### eps.s = 200 km and eps.t = 30 days (see summary(imdepi)) m0 <- update(m_noepi, epidemic=~1, model=TRUE) ## summarize the model fit summary(m0, correlation = TRUE, symbolic.cor = TRUE) ## the default confint-method can be used for Wald-CI's confint(m0, level=0.95) ## same "untrimmed" R0 for every event (simple epidemic intercept model) summary(R0(m0, trimmed=FALSE)) ## plot the path of the fitted total intensity plot(m0, "total intensity", tgrid=500) if (surveillance.options("allExamples")) { ## extract "residual process" integrating over space (takes some seconds) res <- residuals(m0) # if the model describes the true CIF well _in the temporal dimension_, # then this residual process should behave like a stationary Poisson # process with intensity 1 plot(res, type="l"); abline(h=c(0, length(res)), lty=2) # easier, with CI and serial correlation: checkResidualProcess(m0) } ## Not run: ## NB: in contrast to nlminb(), optim's BFGS would miss the ## likelihood maximum wrt the epidemic intercept m0_BFGS <- update(m_noepi, epidemic=~1, optim.args = list(method="BFGS")) format(cbind(nlminb=coef(m0), BFGS=coef(m0_BFGS)), digits=3, scientific=FALSE) m0_BFGS$fisherinfo # singular Fisher information matrix here m0$fisherinfo logLik(m0_BFGS) logLik(m0) ## nlminb is more powerful since we make use of the analytical fisherinfo ## as estimated by the model during optimization, which optim cannot ## End(Not run) ### an epidemic-only model? ## for a purely epidemic model, all events must have potential source events ## (otherwise the intensity at the observed event would be 0) ## let's focus on the C-type for this example imdepiC <- subset(imdepi, type == "C") table(summary(imdepiC)$nSources) ## 106 events have no prior, close events (in terms of eps.s and eps.t) try(twinstim(epidemic = ~1, data = imdepiC)) # detects this problem ## let's assume spatially unbounded interaction imdepiC_infeps <- update(imdepiC, eps.s = Inf) (s <- summary(imdepiC_infeps)) table(s$nSources) ## for 11 events, there is no prior event within eps.t = 30 days ## (which is certainly true for the first event) plot(s$counter, main = "Number of infectious individuals over time (eps.t = 30)") rug(imdepiC_infeps$events$time) rug(imdepiC_infeps$events$time[s$nSources == 0], col = 2, lwd = 3) ## An endemic component would catch such events (from unobserved sources), ## otherwise a longer infectious period would need to be assumed and ## for the first event to happen, a prehistory is required (e.g., t0 = 31). ## As an example, we fit the data only until T = 638 (all events have ancestors) m_epi <- twinstim(epidemic = ~1, data = imdepiC_infeps, t0 = 31, T = 638) summary(m_epi) if (surveillance.options("allExamples")) withAutoprint({ ### full model with interaction functions (time-consuming) ## estimate an exponential temporal decay of infectivity m1_tiaf <- update(m0, tiaf=tiaf.exponential()) plot(m1_tiaf, "tiaf", scaled=FALSE) ## estimate a step function for spatial interaction summary(sourceDists <- getSourceDists(imdepi, "space")) (knots <- quantile(sourceDists, c(5,10,20,40)/100)) m1_fstep <- update(m0, siaf=knots) plot(m1_fstep, "siaf", scaled=FALSE) rug(sourceDists, ticksize=0.02) ## estimate a continuously decreasing spatial interaction function, ## here we use the kernel of an isotropic bivariate Gaussian m1 <- update(m0, siaf = siaf.gaussian()) AIC(m_noepi, m0, m1_fstep, m1) summary(m1) # e.siaf.1 is log(sigma), no test for H0: log(sigma) = 0 exp(confint(m1, "e.siaf.1")) # a confidence interval for sigma plot(m1, "siaf", scaled=FALSE) ## alternative: siaf.powerlaw() with eps.s=Inf and untie()d data, ## see vignette("twinstim") ## add epidemic covariates m2 <- update(m1, epidemic = ~ 1 + type + agegrp) AIC(m1, m2) # further improvement summary(m2) ## look at estimated R0 values by event type tapply(R0(m2), imdepi$events@data[names(R0(m2)), "type"], summary) })
# Load invasive meningococcal disease data data("imdepi") ### first, fit a simple endemic-only model m_noepi <- twinstim( endemic = addSeason2formula(~ offset(log(popdensity)) + I(start/365-3.5), S=1, period=365, timevar="start"), data = imdepi, subset = !is.na(agegrp) ) ## look at the model summary summary(m_noepi) ## there is no evidence for a type-dependent endemic intercept (LR test) m_noepi_type <- update(m_noepi, endemic = ~(1|type) + .) pchisq(2*c(logLik(m_noepi_type)-logLik(m_noepi)), df=1, lower.tail=FALSE) ### add an epidemic component with just the intercept, i.e. ### assuming uniform dispersal in time and space up to a distance of ### eps.s = 200 km and eps.t = 30 days (see summary(imdepi)) m0 <- update(m_noepi, epidemic=~1, model=TRUE) ## summarize the model fit summary(m0, correlation = TRUE, symbolic.cor = TRUE) ## the default confint-method can be used for Wald-CI's confint(m0, level=0.95) ## same "untrimmed" R0 for every event (simple epidemic intercept model) summary(R0(m0, trimmed=FALSE)) ## plot the path of the fitted total intensity plot(m0, "total intensity", tgrid=500) if (surveillance.options("allExamples")) { ## extract "residual process" integrating over space (takes some seconds) res <- residuals(m0) # if the model describes the true CIF well _in the temporal dimension_, # then this residual process should behave like a stationary Poisson # process with intensity 1 plot(res, type="l"); abline(h=c(0, length(res)), lty=2) # easier, with CI and serial correlation: checkResidualProcess(m0) } ## Not run: ## NB: in contrast to nlminb(), optim's BFGS would miss the ## likelihood maximum wrt the epidemic intercept m0_BFGS <- update(m_noepi, epidemic=~1, optim.args = list(method="BFGS")) format(cbind(nlminb=coef(m0), BFGS=coef(m0_BFGS)), digits=3, scientific=FALSE) m0_BFGS$fisherinfo # singular Fisher information matrix here m0$fisherinfo logLik(m0_BFGS) logLik(m0) ## nlminb is more powerful since we make use of the analytical fisherinfo ## as estimated by the model during optimization, which optim cannot ## End(Not run) ### an epidemic-only model? ## for a purely epidemic model, all events must have potential source events ## (otherwise the intensity at the observed event would be 0) ## let's focus on the C-type for this example imdepiC <- subset(imdepi, type == "C") table(summary(imdepiC)$nSources) ## 106 events have no prior, close events (in terms of eps.s and eps.t) try(twinstim(epidemic = ~1, data = imdepiC)) # detects this problem ## let's assume spatially unbounded interaction imdepiC_infeps <- update(imdepiC, eps.s = Inf) (s <- summary(imdepiC_infeps)) table(s$nSources) ## for 11 events, there is no prior event within eps.t = 30 days ## (which is certainly true for the first event) plot(s$counter, main = "Number of infectious individuals over time (eps.t = 30)") rug(imdepiC_infeps$events$time) rug(imdepiC_infeps$events$time[s$nSources == 0], col = 2, lwd = 3) ## An endemic component would catch such events (from unobserved sources), ## otherwise a longer infectious period would need to be assumed and ## for the first event to happen, a prehistory is required (e.g., t0 = 31). ## As an example, we fit the data only until T = 638 (all events have ancestors) m_epi <- twinstim(epidemic = ~1, data = imdepiC_infeps, t0 = 31, T = 638) summary(m_epi) if (surveillance.options("allExamples")) withAutoprint({ ### full model with interaction functions (time-consuming) ## estimate an exponential temporal decay of infectivity m1_tiaf <- update(m0, tiaf=tiaf.exponential()) plot(m1_tiaf, "tiaf", scaled=FALSE) ## estimate a step function for spatial interaction summary(sourceDists <- getSourceDists(imdepi, "space")) (knots <- quantile(sourceDists, c(5,10,20,40)/100)) m1_fstep <- update(m0, siaf=knots) plot(m1_fstep, "siaf", scaled=FALSE) rug(sourceDists, ticksize=0.02) ## estimate a continuously decreasing spatial interaction function, ## here we use the kernel of an isotropic bivariate Gaussian m1 <- update(m0, siaf = siaf.gaussian()) AIC(m_noepi, m0, m1_fstep, m1) summary(m1) # e.siaf.1 is log(sigma), no test for H0: log(sigma) = 0 exp(confint(m1, "e.siaf.1")) # a confidence interval for sigma plot(m1, "siaf", scaled=FALSE) ## alternative: siaf.powerlaw() with eps.s=Inf and untie()d data, ## see vignette("twinstim") ## add epidemic covariates m2 <- update(m1, epidemic = ~ 1 + type + agegrp) AIC(m1, m2) # further improvement summary(m2) ## look at estimated R0 values by event type tapply(R0(m2), imdepi$events@data[names(R0(m2)), "type"], summary) })
"twinstim"
The function epitest
takes a "twinstim"
model
and tests if the spatio-temporal interaction invoked by the epidemic
model component is statistically significant.
The test only works for simple epidemic models, where epidemic = ~1
(no additional parameters for event-specific infectivity),
and requires the non-canonical epilink="identity"
(see
twinstim
).
A permutation test is performed by default, which is only valid if the
endemic intensity is space-time separable.
The approach is described in detail in Meyer et al. (2016),
where it is also compared to alternative global tests for clustering
such as the knox
test.
epitest(model, data, tiles, method = "time", B = 199, eps.s = NULL, eps.t = NULL, fixed = NULL, verbose = TRUE, compress = FALSE, ...) ## S3 method for class 'epitest' coef(object, which = c("m1", "m0"), ...) ## S3 method for class 'epitest' plot(x, teststat = c("simpleR0", "D"), ...)
epitest(model, data, tiles, method = "time", B = 199, eps.s = NULL, eps.t = NULL, fixed = NULL, verbose = TRUE, compress = FALSE, ...) ## S3 method for class 'epitest' coef(object, which = c("m1", "m0"), ...) ## S3 method for class 'epitest' plot(x, teststat = c("simpleR0", "D"), ...)
model |
a simple epidemic |
data |
an object of class |
tiles |
(only used by |
method |
one of the following character strings specifying the test method:
|
B |
the number of permutations for the Monte Carlo approach.
The default number is rather low; if computationally feasible,
|
eps.s , eps.t
|
arguments for |
fixed |
optional character vector naming parameters to fix at their original
value when re-fitting the |
verbose |
the amount of tracing in the range |
compress |
logical indicating if the |
... |
further arguments for |
object , x
|
an object of class |
which |
a character string indicating either the full ( |
teststat |
a character string determining the test statistic to plot, either
|
This space-time interaction test is limited to models with
epidemic = ~1
, since covariate effects are not identifiable
under the null hypothesis of no space-time interaction.
Estimating a rich epidemic model
based on permuted data
will most likely result in singular convergence.
A similar issue might arise when the model employs parametric
interaction functions, in which case fixed=TRUE
can be used.
For further details see Meyer et al. (2016).
The test statistic is the reproduction number simpleR0
.
A likelihood ratio test of the supplied epidemic model against
the corresponding endemic-only model is also available.
By default, the null distribution of the test statistic under no
space-time interaction is obtained by a Monte Carlo permutation
approach (via permute.epidataCS
) and therefore relies on
a space-time separable endemic model component.
The plot
-method shows a truehist
of
the simulated null distribution together with the observed value.
The coef
-method extracts the parameter estimates from the B
permfits
(by default for the full model which = "m1"
).
a list (inheriting from "htest"
) with the following components:
method |
a character string indicating the type of test performed. |
data.name |
a character string giving the supplied |
statistic |
the observed test statistic. |
parameter |
the (effective) number of permutations used to calculate the p-value (only those with convergent fits are used). |
p.value |
the p-value for the test. For the |
In addition, if method != "LRT"
, the result will have the
following elements:
permfits |
the list of model fits (endemic-only and epidemic)
from the |
permstats |
a data frame with |
The plot
-method invisibly returns NULL
.
The coef
-method returns the B
x length(coef(model))
matrix of parameter estimates.
Sebastian Meyer
Meyer, S., Warnke, I., Rössler, W. and Held, L. (2016): Model-based testing for space-time interaction using point processes: An application to psychiatric hospital admissions in an urban area. Spatial and Spatio-temporal Epidemiology, 17, 15-25. doi:10.1016/j.sste.2016.03.002. Eprint: https://arxiv.org/abs/1512.09052.
data("imdepi", "imdepifit") ## test for space-time interaction of the B-cases ## assuming spatial interaction to be constant within 50 km imdepiB50 <- update(subset(imdepi, type == "B"), eps.s = 50) imdfitB50 <- update(imdepifit, data = imdepiB50, subset = NULL, epidemic = ~1, epilink = "identity", siaf = NULL, start = c("e.(Intercept)" = 0)) ## simple likelihood ratio test epitest(imdfitB50, imdepiB50, method = "LRT") ## permutation test et <- epitest(imdfitB50, imdepiB50, B = 5, # CAVE: limited here for speed verbose = 2, # (tracing does not work on Windows .seed = 1, .parallel = 1) # if parallelized) et plot(et) ## summary of parameter estimates under permutation summary(coef(et, which = "m1"))
data("imdepi", "imdepifit") ## test for space-time interaction of the B-cases ## assuming spatial interaction to be constant within 50 km imdepiB50 <- update(subset(imdepi, type == "B"), eps.s = 50) imdfitB50 <- update(imdepifit, data = imdepiB50, subset = NULL, epidemic = ~1, epilink = "identity", siaf = NULL, start = c("e.(Intercept)" = 0)) ## simple likelihood ratio test epitest(imdfitB50, imdepiB50, method = "LRT") ## permutation test et <- epitest(imdfitB50, imdepiB50, B = 5, # CAVE: limited here for speed verbose = 2, # (tracing does not work on Windows .seed = 1, .parallel = 1) # if parallelized) et plot(et) ## summary of parameter estimates under permutation summary(coef(et, which = "m1"))
twinstim
A twinstim
model as described in Meyer et al. (2012) requires
the specification of the spatial and temporal interaction functions
( and
, respectively),
i.e. how infectivity decays with increasing spatial and temporal
distance from the source of infection.
Own such functions can be specified (see
siaf
and tiaf
, respectively), but the
package already predefines some common dispersal kernels returned by
the constructor functions documented here.
See Meyer and Held (2014) for various spatial interaction functions,
and Meyer et al. (2017, Section 3, available as vignette("twinstim")
)
for an illustration of the implementation.
# predefined spatial interaction functions siaf.constant() siaf.step(knots, maxRange = Inf, nTypes = 1, validpars = NULL) siaf.gaussian(nTypes = 1, logsd = TRUE, density = FALSE, F.adaptive = FALSE, F.method = "iso", effRangeMult = 6, validpars = NULL) siaf.exponential(nTypes = 1, validpars = NULL, engine = "C") siaf.powerlaw(nTypes = 1, validpars = NULL, engine = "C") siaf.powerlaw1(nTypes = 1, validpars = NULL, sigma = 1) siaf.powerlawL(nTypes = 1, validpars = NULL, engine = "C") siaf.student(nTypes = 1, validpars = NULL, engine = "C") # predefined temporal interaction functions tiaf.constant() tiaf.step(knots, maxRange = Inf, nTypes = 1, validpars = NULL) tiaf.exponential(nTypes = 1, validpars = NULL)
# predefined spatial interaction functions siaf.constant() siaf.step(knots, maxRange = Inf, nTypes = 1, validpars = NULL) siaf.gaussian(nTypes = 1, logsd = TRUE, density = FALSE, F.adaptive = FALSE, F.method = "iso", effRangeMult = 6, validpars = NULL) siaf.exponential(nTypes = 1, validpars = NULL, engine = "C") siaf.powerlaw(nTypes = 1, validpars = NULL, engine = "C") siaf.powerlaw1(nTypes = 1, validpars = NULL, sigma = 1) siaf.powerlawL(nTypes = 1, validpars = NULL, engine = "C") siaf.student(nTypes = 1, validpars = NULL, engine = "C") # predefined temporal interaction functions tiaf.constant() tiaf.step(knots, maxRange = Inf, nTypes = 1, validpars = NULL) tiaf.exponential(nTypes = 1, validpars = NULL)
knots |
numeric vector of distances at which the step function
switches to a new height. The length of this vector determines the
number of parameters to estimate. For identifiability, the step
function has height 1 in the first interval |
maxRange |
a scalar larger than any of |
nTypes |
determines the number of parameters ((log-)scales or (log-)shapes)
of the kernels. In a multitype epidemic, the different types may
share the same spatial interaction function, in which case
|
logsd , density
|
logicals affecting the parametrization of the Gaussian kernel.
Settings different from the defaults are deprecated.
The default is to use only the kernel of the bivariate, isotropic
normal distribution ( |
F.adaptive , F.method
|
If |
effRangeMult |
determines the effective range for numerical integration
in terms of multiples of the standard deviation |
validpars |
function taking one argument, the parameter vector, indicating if it
is valid (see also |
engine |
character string specifying the implementation to use.
Prior to surveillance 0.14.0, the |
sigma |
Fixed value of |
Evaluation of twinstim
's likelihood involves cubature of the
spatial interaction function over polygonal domains. Various
approaches have been compared by Meyer (2010, Section 3.2) and a new
efficient method, which takes advantage of the assumed isotropy, has
been proposed by Meyer and Held (2014, Supplement B, Section 2) for
evaluation of the power-law kernels.
These cubature methods are available in the dedicated R package
polyCub and used by the kernels implemented in surveillance.
The readily available spatial interaction functions are defined as follows:
siaf.constant
:
siaf.step
:,
where , and
are
the parameters (heights) to estimate.
indicates
if distance
belongs to the
th interval
according to
c(0,knots,maxRange)
, where indicates
the interval
c(0,knots[1])
.
Note that siaf.step
makes use of the memoise package
if it is available – and that is highly recommended to speed up
calculations. Specifically, the areas of the intersection of a
polygonal domain (influence region) with the “rings” of the
two-dimensional step function will be cached such that they are
only calculated once for every polydomain
(in the first
iteration of the twinstim
optimization). They are used in
the integration components F
and Deriv
.
See Meyer and Held (2014) for a use case and further details.
siaf.gaussian
:
If nTypes=1
(single-type epidemic or type-invariant
siaf
in multi-type epidemic), then
for all types
.
If
density=TRUE
(deprecated), then the kernel formula above is
additionally divided by , yielding the
density of the bivariate, isotropic Gaussian distribution with
zero mean and covariance matrix
.
The standard deviation is optimized on the log-scale
(
logsd = TRUE
, not doing so is deprecated).
siaf.exponential
:
The scale parameter is estimated on the log-scale,
i.e.,
, and
is the actual model parameter.
siaf.powerlaw
:
The parameters are optimized on the log-scale to ensure positivity, i.e.,
and
,
where
is the parameter vector.
If a power-law kernel is not identifiable for the dataset at hand,
the exponential kernel or a lagged power law are useful alternatives.
siaf.powerlaw1
:,
i.e., siaf.powerlaw
with fixed .
A different fixed value for
can be specified via the
sigma
argument of siaf.powerlaw1
.
The decay parameter is estimated on the log-scale.
siaf.powerlawL
:, for
, and
otherwise,
which is a Lagged power-law kernel featuring uniform
short-range dispersal (up to distance ) and a
power-law decay (Pareto-style) from distance
onwards.
The parameters are optimized on the log-scale to ensure positivity, i.e.
and
,
where
is the parameter vector.
However, there is a caveat associated with this kernel: Its
derivative wrt
is mathematically undefined at
the threshold
. This local non-differentiability
makes
twinstim
's likelihood maximization sensitive wrt
parameter start values, and is likely to cause false convergence
warnings by nlminb
. Possible workarounds are to use
the slow and robust method="Nelder-Mead"
, or to just ignore
the warning and verify the result by sets of different start values.
siaf.student
:,
which is a reparametrized -kernel.
For
, this is the kernel of the Cauchy density with scale
sigma
. In Geostatistics, a correlation function of this
kind is known as the Cauchy model.
The parameters are optimized on the log-scale to ensure
positivity, i.e. and
, where
is the parameter vector.
The predefined temporal interaction functions are defined as follows:
tiaf.constant
:
tiaf.step
:,
where , and
are
the parameters (heights) to estimate.
indicates
if
belongs to the
th interval
according to
c(0,knots,maxRange)
, where indicates
the interval
c(0,knots[1])
.
tiaf.exponential
:,
which is the kernel of the exponential distribution.
If nTypes=1
(single-type epidemic or type-invariant
tiaf
in multi-type epidemic), then
for all types
.
The specification of an interaction function, which is a list.
See siaf
and tiaf
, respectively, for a
description of its components.
Sebastian Meyer
Meyer, S. (2010):
Spatio-Temporal Infectious Disease Epidemiology based on Point Processes.
Master's Thesis, Ludwig-Maximilians-Universität
München.
Available as https://epub.ub.uni-muenchen.de/11703/
Meyer, S., Elias, J. and Höhle, M. (2012): A space-time conditional intensity model for invasive meningococcal disease occurrence. Biometrics, 68, 607-616. doi:10.1111/j.1541-0420.2011.01684.x
Meyer, S. and Held, L. (2014): Power-law models for infectious disease spread. The Annals of Applied Statistics, 8 (3), 1612-1639. doi:10.1214/14-AOAS743
Meyer, S., Held, L. and Höhle, M. (2017): Spatio-temporal analysis of epidemic phenomena using the R package surveillance. Journal of Statistical Software, 77 (11), 1-55. doi:10.18637/jss.v077.i11
twinstim
, siaf
, tiaf
,
and package polyCub for the involved cubature methods.
# constant temporal dispersal tiaf.constant() # step function kernel tiaf.step(c(3,7), maxRange=14, nTypes=2) # exponential temporal decay tiaf.exponential() # Type-dependent Gaussian spatial interaction function using an adaptive # two-dimensional midpoint-rule to integrate it over polygonal domains siaf.gaussian(2, F.adaptive=TRUE) # Single-type Gaussian spatial interaction function (using polyCub.iso) siaf.gaussian() # Exponential kernel siaf.exponential() # Power-law kernel siaf.powerlaw() # Power-law kernel with fixed sigma = 1 siaf.powerlaw1() # "lagged" power-law siaf.powerlawL() # (reparametrized) t-kernel siaf.student() # step function kernel siaf.step(c(10,20,50), maxRange=100)
# constant temporal dispersal tiaf.constant() # step function kernel tiaf.step(c(3,7), maxRange=14, nTypes=2) # exponential temporal decay tiaf.exponential() # Type-dependent Gaussian spatial interaction function using an adaptive # two-dimensional midpoint-rule to integrate it over polygonal domains siaf.gaussian(2, F.adaptive=TRUE) # Single-type Gaussian spatial interaction function (using polyCub.iso) siaf.gaussian() # Exponential kernel siaf.exponential() # Power-law kernel siaf.powerlaw() # Power-law kernel with fixed sigma = 1 siaf.powerlaw1() # "lagged" power-law siaf.powerlawL() # (reparametrized) t-kernel siaf.student() # step function kernel siaf.step(c(10,20,50), maxRange=100)
twimstim
The function plots the fitted temporal or (isotropic) spatial
interaction function of a twinstim
object.
The implementation is illustrated in Meyer et al. (2017, Section 3),
see vignette("twinstim")
.
iafplot(object, which = c("siaf", "tiaf"), types = NULL, scaled = c("intercept", "standardized", "no"), truncated = FALSE, log = "", conf.type = if (length(pars) > 1) "MC" else "parbounds", conf.level = 0.95, conf.B = 999, xgrid = 101, col.estimate = rainbow(length(types)), col.conf = col.estimate, alpha.B = 0.15, lwd = c(3,1), lty = c(1,2), verticals = FALSE, do.points = FALSE, add = FALSE, xlim = NULL, ylim = NULL, xlab = NULL, ylab = NULL, legend = !add && (length(types) > 1), ...)
iafplot(object, which = c("siaf", "tiaf"), types = NULL, scaled = c("intercept", "standardized", "no"), truncated = FALSE, log = "", conf.type = if (length(pars) > 1) "MC" else "parbounds", conf.level = 0.95, conf.B = 999, xgrid = 101, col.estimate = rainbow(length(types)), col.conf = col.estimate, alpha.B = 0.15, lwd = c(3,1), lty = c(1,2), verticals = FALSE, do.points = FALSE, add = FALSE, xlim = NULL, ylim = NULL, xlab = NULL, ylab = NULL, legend = !add && (length(types) > 1), ...)
object |
object of class |
which |
argument indicating which of the two interaction functions to plot.
Possible values are |
types |
integer vector indicating for which event |
scaled |
character string determining if/how the the interaction function should be scaled. Possible choices are:
The first one is the default and required for the comparison of
estimated interaction functions from different models.
For backward compatibility, |
truncated |
logical indicating if the plotted interaction function should
take the maximum range of interaction ( |
log |
a character string passed to |
conf.type |
type of confidence interval to produce. |
conf.level |
the confidence level required. For |
conf.B |
number of samples for the |
xgrid |
either a numeric vector of x-values (distances from the host) where
to evaluate |
col.estimate |
vector of colours to use for the function point estimates of the different |
col.conf |
vector of colours to use for the confidence intervals of the different |
alpha.B |
alpha transparency value (as relative opacity) used for the |
lwd , lty
|
numeric vectors of length two specifying the line width and type of point estimates (first element) and confidence limits (second element), respectively. |
verticals , do.points
|
graphical settings for step function
kernels. These can be logical (as in |
add |
add to an existing plot? |
xlim , ylim
|
vectors of length two containing the x- and y-axis limit of the
plot. The default y-axis range (
|
xlab , ylab
|
labels for the axes with |
legend |
logical indicating if a legend for the |
... |
additional arguments passed to the default |
A plot is created – see e.g. Figure 3(b) in Meyer et al. (2012).
The function invisibly returns a matrix of the plotted values of the
interaction function (evaluated on xgrid
, by type). The first
column of the matrix contains the distance , and the remaining
length(types)
columns contain the (scaled) function values for
each type.
The pointwise confidence intervals of the interaction functions are
returned in similar matrices as attributes: if
length(types)==1
, there is a single attribute "CI"
,
whereas for multiple types, the attributes are named
paste0("CI.",typeNames)
(where the typeNames
are
retrieved from object$qmatrix
).
Sebastian Meyer
Meyer, S., Elias, J. and Höhle, M. (2012): A space-time conditional intensity model for invasive meningococcal disease occurrence. Biometrics, 68, 607-616. doi:10.1111/j.1541-0420.2011.01684.x
Meyer, S., Held, L. and Höhle, M. (2017): Spatio-temporal analysis of epidemic phenomena using the R package surveillance. Journal of Statistical Software, 77 (11), 1-55. doi:10.18637/jss.v077.i11
plot.twinstim
, which calls this function.
data("imdepifit") iafplot(imdepifit, "tiaf", scaled=FALSE) # tiaf.constant(), not very exciting iafplot(imdepifit, "siaf", scaled=FALSE) # scaled version uses a Monte-Carlo-CI set.seed(1) # result depends on .Random.seed iafplot(imdepifit, "siaf", scaled=TRUE, conf.type="MC", conf.B=199, col.conf=gray(0.4), conf.level=NA) # show MC samples
data("imdepifit") iafplot(imdepifit, "tiaf", scaled=FALSE) # tiaf.constant(), not very exciting iafplot(imdepifit, "siaf", scaled=FALSE) # scaled version uses a Monte-Carlo-CI set.seed(1) # result depends on .Random.seed iafplot(imdepifit, "siaf", scaled=TRUE, conf.type="MC", conf.B=199, col.conf=gray(0.4), conf.level=NA) # show MC samples
intensityplot
method to plot the evolution of the total infection
intensity, its epidemic proportion or its endemic proportion over time
or space (integrated over the other dimension) of fitted
twinstim
models (or simEpidataCS
).
The "simEpidataCS"
-method is just a wrapper around
intensityplot.twinstim
by making the "simEpidataCS"
object
"twinstim"
-compatible, i.e. enriching it by the
required model components and environment.
The intensity.twinstim
auxiliary function returns functions which
calculate the endemic or epidemic intensity at a specific time point or
location (integrated over the other dimension).
## S3 method for class 'twinstim' intensityplot(x, which = c("epidemic proportion", "endemic proportion", "total intensity"), aggregate = c("time", "space"), types = 1:nrow(x$qmatrix), tiles, tiles.idcol = NULL, plot = TRUE, add = FALSE, tgrid = 101, rug.opts = list(), sgrid = 128, polygons.args = list(), points.args = list(), cex.fun = sqrt, ...) ## S3 method for class 'simEpidataCS' intensityplot(x, ...) intensity.twinstim(x, aggregate = c("time", "space"), types = 1:nrow(x$qmatrix), tiles, tiles.idcol = NULL)
## S3 method for class 'twinstim' intensityplot(x, which = c("epidemic proportion", "endemic proportion", "total intensity"), aggregate = c("time", "space"), types = 1:nrow(x$qmatrix), tiles, tiles.idcol = NULL, plot = TRUE, add = FALSE, tgrid = 101, rug.opts = list(), sgrid = 128, polygons.args = list(), points.args = list(), cex.fun = sqrt, ...) ## S3 method for class 'simEpidataCS' intensityplot(x, ...) intensity.twinstim(x, aggregate = c("time", "space"), types = 1:nrow(x$qmatrix), tiles, tiles.idcol = NULL)
x |
an object of class |
which |
|
aggregate |
One of |
types |
event types to aggregate. By default, all types of events are aggregated, but one could also be interested in only one specific type or a subset of event types. |
tiles |
object of class |
tiles.idcol |
either a column index for |
plot |
logical indicating if a plot is desired, which defaults to |
add |
logical. If |
tgrid |
either a numeric vector of time points when to evaluate
|
rug.opts |
if a list, its elements are passed as arguments to the function
|
sgrid |
either an object of class |
polygons.args |
if a list, its elements are passed as arguments to
|
points.args |
if a list, its elements are passed as arguments to
|
cex.fun |
function which takes a vector of counts of events
at each unique location and returns a (vector of) |
... |
further arguments passed to |
If plot = FALSE
or aggregate = "time"
,
a function is returned, which takes a vector of
time points (if aggregate = "time"
) or a matrix of coordinates
(if aggregate = "space"
), and returns which
on this grid.
intensity.twinstim
returns a list containing such functions for
the endemic and epidemic intensity (but these are not vectorized).
If plot = TRUE
and aggregate = "space"
, the
trellis.object
of the spatial plot is returned.
Sebastian Meyer
plot.twinstim
, which calls intensityplot.twinstim
.
data("imdepi", "imdepifit") # for the intensityplot we need the model environment, which can be # easily added by the intelligent update method (no need to refit the model) imdepifit <- update(imdepifit, model=TRUE) ## path of the total intensity opar <- par(mfrow=c(2,1)) intensityplot(imdepifit, which="total intensity", aggregate="time", tgrid=500) plot(imdepi, "time", breaks=100) par(opar) ## time course of the epidemic proportion by event intensityplot(imdepifit, which="epidemic proportion", aggregate="time", tgrid=500, types=1) intensityplot(imdepifit, which="epidemic proportion", aggregate="time", tgrid=500, types=2, add=TRUE, col=2) legend("topright", legend=levels(imdepi$events$type), lty=1, col=1:2, title = "event type") ## endemic and total intensity in one plot intensity_endprop <- intensityplot(imdepifit, which="endemic proportion", aggregate="time", plot=FALSE) intensity_total <- intensityplot(imdepifit, which="total intensity", aggregate="time", tgrid=501, lwd=2) curve(intensity_endprop(x) * intensity_total(x), add=TRUE, col=2, lwd=2, n=501) text(2500, 0.36, labels="total", col=1, pos=2, font=2) text(2500, 0.08, labels="endemic", col=2, pos=2, font=2) ## spatial shape of the intensity (aggregated over time) # need a map of the 'stgrid' tiles, here Germany's districts load(system.file("shapes", "districtsD.RData", package="surveillance")) # total intensity (using a rather sparse 'sgrid' for speed) intensityplot(imdepifit, which="total intensity", aggregate="space", tiles=districtsD, sgrid=500, col.regions=rev(heat.colors(100))) if (surveillance.options("allExamples")) { # epidemic proportion by type maps_epiprop <- lapply(1:2, function (type) { intensityplot(imdepifit, which="epidemic", aggregate="space", types=type, tiles=districtsD, sgrid=1000, main=rownames(imdepifit$qmatrix)[type], scales=list(draw=FALSE), at=seq(0,1,by=0.1), col.regions=rev(hcl.colors(10,"YlOrRd")), colorkey=list(title=list("Epidemic proportion", cex=1))) }) plot(maps_epiprop[[1]], split=c(1,1,2,1), more=TRUE) plot(maps_epiprop[[2]], split=c(2,1,2,1)) }
data("imdepi", "imdepifit") # for the intensityplot we need the model environment, which can be # easily added by the intelligent update method (no need to refit the model) imdepifit <- update(imdepifit, model=TRUE) ## path of the total intensity opar <- par(mfrow=c(2,1)) intensityplot(imdepifit, which="total intensity", aggregate="time", tgrid=500) plot(imdepi, "time", breaks=100) par(opar) ## time course of the epidemic proportion by event intensityplot(imdepifit, which="epidemic proportion", aggregate="time", tgrid=500, types=1) intensityplot(imdepifit, which="epidemic proportion", aggregate="time", tgrid=500, types=2, add=TRUE, col=2) legend("topright", legend=levels(imdepi$events$type), lty=1, col=1:2, title = "event type") ## endemic and total intensity in one plot intensity_endprop <- intensityplot(imdepifit, which="endemic proportion", aggregate="time", plot=FALSE) intensity_total <- intensityplot(imdepifit, which="total intensity", aggregate="time", tgrid=501, lwd=2) curve(intensity_endprop(x) * intensity_total(x), add=TRUE, col=2, lwd=2, n=501) text(2500, 0.36, labels="total", col=1, pos=2, font=2) text(2500, 0.08, labels="endemic", col=2, pos=2, font=2) ## spatial shape of the intensity (aggregated over time) # need a map of the 'stgrid' tiles, here Germany's districts load(system.file("shapes", "districtsD.RData", package="surveillance")) # total intensity (using a rather sparse 'sgrid' for speed) intensityplot(imdepifit, which="total intensity", aggregate="space", tiles=districtsD, sgrid=500, col.regions=rev(heat.colors(100))) if (surveillance.options("allExamples")) { # epidemic proportion by type maps_epiprop <- lapply(1:2, function (type) { intensityplot(imdepifit, which="epidemic", aggregate="space", types=type, tiles=districtsD, sgrid=1000, main=rownames(imdepifit$qmatrix)[type], scales=list(draw=FALSE), at=seq(0,1,by=0.1), col.regions=rev(hcl.colors(10,"YlOrRd")), colorkey=list(title=list("Epidemic proportion", cex=1))) }) plot(maps_epiprop[[1]], split=c(1,1,2,1), more=TRUE) plot(maps_epiprop[[2]], split=c(2,1,2,1)) }
"twinstim"
Objects
Besides print
and summary
methods there
are also some standard extraction methods defined for objects of class
"twinstim"
: vcov
, logLik
, and
nobs
. This
also enables the use of, e.g., confint
and
AIC
. The model summary
can be exported to LaTeX
by the corresponding toLatex
or xtable
methods.
## S3 method for class 'twinstim' print(x, digits = max(3, getOption("digits") - 3), ...) ## S3 method for class 'twinstim' summary(object, test.iaf = FALSE, correlation = FALSE, symbolic.cor = FALSE, runtime = FALSE, ...) ## S3 method for class 'twinstim' coeflist(x, ...) ## S3 method for class 'twinstim' vcov(object, ...) ## S3 method for class 'twinstim' logLik(object, ...) ## S3 method for class 'twinstim' nobs(object, ...) ## S3 method for class 'summary.twinstim' print(x, digits = max(3, getOption("digits") - 3), symbolic.cor = x$symbolic.cor, signif.stars = getOption("show.signif.stars"), ...) ## S3 method for class 'summary.twinstim' toLatex(object, digits = max(3, getOption("digits") - 3), eps.Pvalue = 1e-4, align = "lrrrr", booktabs = getOption("xtable.booktabs", FALSE), withAIC = FALSE, ...) ## S3 method for class 'summary.twinstim' xtable(x, caption = NULL, label = NULL, align = c("l", "r", "r", "r"), digits = 3, display = c("s", "f", "s", "s"), ..., ci.level = 0.95, ci.fmt = "%4.2f", ci.to = "--", eps.Pvalue = 1e-4)
## S3 method for class 'twinstim' print(x, digits = max(3, getOption("digits") - 3), ...) ## S3 method for class 'twinstim' summary(object, test.iaf = FALSE, correlation = FALSE, symbolic.cor = FALSE, runtime = FALSE, ...) ## S3 method for class 'twinstim' coeflist(x, ...) ## S3 method for class 'twinstim' vcov(object, ...) ## S3 method for class 'twinstim' logLik(object, ...) ## S3 method for class 'twinstim' nobs(object, ...) ## S3 method for class 'summary.twinstim' print(x, digits = max(3, getOption("digits") - 3), symbolic.cor = x$symbolic.cor, signif.stars = getOption("show.signif.stars"), ...) ## S3 method for class 'summary.twinstim' toLatex(object, digits = max(3, getOption("digits") - 3), eps.Pvalue = 1e-4, align = "lrrrr", booktabs = getOption("xtable.booktabs", FALSE), withAIC = FALSE, ...) ## S3 method for class 'summary.twinstim' xtable(x, caption = NULL, label = NULL, align = c("l", "r", "r", "r"), digits = 3, display = c("s", "f", "s", "s"), ..., ci.level = 0.95, ci.fmt = "%4.2f", ci.to = "--", eps.Pvalue = 1e-4)
x , object
|
an object of class |
digits |
integer, used for number formatting with |
test.iaf |
logical indicating if the simple Wald z- and p-values
should be calculated for parameters of the interaction functions
|
correlation |
logical. If |
symbolic.cor |
logical. If |
runtime |
logical. If |
signif.stars |
logical. If |
eps.Pvalue |
passed to |
booktabs |
logical indicating if the |
withAIC |
logical indicating if the AIC and the log-likelihood of the model should be included below the table of coefficients in the LaTeX tabular. |
caption , label , align , display
|
see |
ci.level , ci.fmt , ci.to
|
the confidence intervals are calculated
at level |
... |
For |
The estimated coefficients and standard Wald-type confidence intervals
can be extracted using the default coef
and
confint
methods from package stats.
Note, however, that there is the useful coeflist
method to
list the coefficients by model component.
The print
and summary
methods allow the compact or comprehensive
representation of the fitting results, respectively. The former only prints
the original function call, the estimated coefficients and the maximum
log-likelihood value. The latter prints the whole coefficient matrix
with standard errors, z- and p-values (see printCoefmat
)
– separately for the endemic and the epidemic component – and
additionally the AIC, the achieved log-likelihood, the number of
log-likelihood and score evaluations, and the runtime.
They both append a big “WARNING”, if the optimization algorithm
did not converge.
The toLatex
method is essentially a
translation of the printed summary table of coefficients to LaTeX
code (using xtable). However, the xtable
method does a
different job in that it first converts coefficients to rate ratios
(RR, i.e., the exp
-transformation) and gives confidence
intervals for those instead of standard errors and z-values.
Intercepts and interaction function parameters are ignored by the
xtable
method.
The print
methods return their first argument, invisibly, as
they always should.
The vcov
method returns the estimated variance-covariance
matrix of the parameters, which is the inverse of
object$fisherinfo
(estimate of the expected Fisher
information matrix). This "fisherinfo"
is not always available
(see twinstim
), in which case
object$fisherinfo.observed
is used if available or an error is
returned otherwise.
The logLik
and nobs
methods return the maximum
log-likelihood value of the model, and the number of events (excluding
events of the prehistory), respectively.
The summary
method returns a list containing some summary
statistics of the model, which is nicely printed by the corresponding
print
method.
The toLatex
method returns a character vector of class
"Latex"
, each element containing one line of LaTeX code (see
print.Latex
).
The xtable
method returns an object of class
"xtable"
. Note that the column name of the confidence
interval, e.g. “95% CI”, contains the percent symbol that may
need to be escaped when printing the "xtable"
in the output
format (see sanitize.text.function
in
print.xtable
). This may also hold for row names.
Sebastian Meyer
# load a fit of the 'imdepi' data, see the example in ?twinstim data("imdepifit") # print method imdepifit # extract point estimates (in a single vector or listed by model component) coef(imdepifit) coeflist(imdepifit) # variance-covariance matrix of endemic parameters # (inverse of expected Fisher information) unname(vcov(imdepifit)[1:4,1:4]) # the default confint() method may be used for Wald CI's confint(imdepifit, parm="e.typeC", level=0.95) # log-likelihood and AIC of the fitted model logLik(imdepifit) AIC(imdepifit) nobs(imdepifit) # produce a summary with parameter correlations and runtime information (s <- summary(imdepifit, correlation=TRUE, symbolic.cor=TRUE, runtime=TRUE)) # create LaTeX code of coefficient table toLatex(s, digits=2) # or using the xtable-method (which produces rate ratios) xtable(s)
# load a fit of the 'imdepi' data, see the example in ?twinstim data("imdepifit") # print method imdepifit # extract point estimates (in a single vector or listed by model component) coef(imdepifit) coeflist(imdepifit) # variance-covariance matrix of endemic parameters # (inverse of expected Fisher information) unname(vcov(imdepifit)[1:4,1:4]) # the default confint() method may be used for Wald CI's confint(imdepifit, parm="e.typeC", level=0.95) # log-likelihood and AIC of the fitted model logLik(imdepifit) AIC(imdepifit) nobs(imdepifit) # produce a summary with parameter correlations and runtime information (s <- summary(imdepifit, correlation=TRUE, symbolic.cor=TRUE, runtime=TRUE)) # create LaTeX code of coefficient table toLatex(s, digits=2) # or using the xtable-method (which produces rate ratios) xtable(s)
twinstim
's
The fitted conditional intensity function from twinstim
may be visualized in at least two ways: iafplot
plots the fitted
interaction functions (as a function of the distance from the host), and
intensityplot.twinstim
plots the fitted intensity either
aggregated over space (evolution over time) or aggregated over time
(spatial surface of the cumulated intensity). The plot
method for
class "twinstim"
is just a wrapper for these two functions.
## S3 method for class 'twinstim' plot(x, which, ...)
## S3 method for class 'twinstim' plot(x, which, ...)
x |
an object of class |
which |
character. Which characteristic of the conditional intensity should
be plotted? Possible values are the ones allowed in
the functions |
... |
further arguments passed to |
See the documentation of the respective plot functions,
iafplot
or intensityplot.twinstim
.
Sebastian Meyer
# see the examples for iafplot() and intensityplot.twinstim()
# see the examples for iafplot() and intensityplot.twinstim()
twinstim
objects
Function to compute estimated and profile likelihood based confidence
intervals for twinstim
objects. Computations might be cumbersome!
WARNING: the implementation is not well tested, simply uses
optim
(ignoring optimizer settings from the original fit),
and does not return the complete set of coefficients at each grid point.
## S3 method for class 'twinstim' profile(fitted, profile, alpha = 0.05, control = list(fnscale = -1, maxit = 100, trace = 1), do.ltildeprofile=FALSE, ...)
## S3 method for class 'twinstim' profile(fitted, profile, alpha = 0.05, control = list(fnscale = -1, maxit = 100, trace = 1), do.ltildeprofile=FALSE, ...)
fitted |
an object of class |
profile |
a list with elements being numeric vectors of length 4. These vectors must
have the form
|
alpha |
|
control |
control object to use in |
do.ltildeprofile |
If |
... |
unused (argument of the generic). |
list with profile log-likelihood evaluations on the grid, and
– not implemented yet –
highest likelihood and Wald confidence intervals.
The argument profile
is also returned.
Michael Höhle
# profiling takes a while ## Not run: #Load the twinstim model fitted to the IMD data data("imdepi", "imdepifit") # for profiling we need the model environment imdepifit <- update(imdepifit, model=TRUE) #Generate profiling object for a list of parameters for the new model names <- c("h.(Intercept)","e.typeC") coefList <- lapply(names, function(name) { c(pmatch(name,names(coef(imdepifit))),NA,NA,11) }) #Profile object (necessary to specify a more loose convergence #criterion). Speed things up by using do.ltildeprofile=FALSE (the default) prof <- profile(imdepifit, coefList, control=list(reltol=0.1, REPORT=1), do.ltildeprofile=TRUE) #Plot result for one variable par(mfrow=c(1,2)) for (name in names) { with(as.data.frame(prof$lp[[name]]), matplot(grid,cbind(profile,estimated,wald), type="l",xlab=name,ylab="loglik")) legend(x="bottomleft",c("profile","estimated","wald"),lty=1:3,col=1:3) } ## End(Not run)
# profiling takes a while ## Not run: #Load the twinstim model fitted to the IMD data data("imdepi", "imdepifit") # for profiling we need the model environment imdepifit <- update(imdepifit, model=TRUE) #Generate profiling object for a list of parameters for the new model names <- c("h.(Intercept)","e.typeC") coefList <- lapply(names, function(name) { c(pmatch(name,names(coef(imdepifit))),NA,NA,11) }) #Profile object (necessary to specify a more loose convergence #criterion). Speed things up by using do.ltildeprofile=FALSE (the default) prof <- profile(imdepifit, coefList, control=list(reltol=0.1, REPORT=1), do.ltildeprofile=TRUE) #Plot result for one variable par(mfrow=c(1,2)) for (name in names) { with(as.data.frame(prof$lp[[name]]), matplot(grid,cbind(profile,estimated,wald), type="l",xlab=name,ylab="loglik")) legend(x="bottomleft",c("profile","estimated","wald"),lty=1:3,col=1:3) } ## End(Not run)
A spatial interaction function for use in twinstim
can be constructed via the siaf
function.
It checks the supplied function elements, assigns defaults for
missing arguments, and returns all checked arguments in a list.
However, for standard applications it is much easier to use one of the
pre-defined spatial interaction functions, e.g.,
siaf.gaussian
.
siaf(f, F, Fcircle, effRange, deriv, Deriv, simulate, npars, validpars = NULL)
siaf(f, F, Fcircle, effRange, deriv, Deriv, simulate, npars, validpars = NULL)
f |
the spatial interaction function. It must accept
two arguments, the first one being a (2-column) coordinate matrix, the
second one a parameter vector. For marked |
F |
function computing the integral of |
Fcircle |
optional function for fast calculation of the
(two-dimensional) integral of |
effRange |
optional function returning the “effective”
range of |
deriv |
optional derivative of |
Deriv |
function computing the integral of |
simulate |
optional function returning a sample drawn from the
spatial kernel (only required for the simulation of |
npars |
the number of parameters of the spatial interaction
function |
validpars |
optional function taking one argument, the parameter vector, indicating if it
is valid. This approach to specify parameter constraints is rarely
needed, because usual box-constrained parameters can be taken into
account by using L-BFGS-B as the optimization method in
|
list of checked arguments.
Sebastian Meyer
siaf.gaussian
for a pre-defined spatial interaction
function, and tiaf
for the temporal interaction function.
twinstim
In endemic-only twinstim
models, the conditional
intensity is a piecewise constant function independent from the history
of the process. This allows for a much more efficient simulation
algorithm than via Ogata's modified thinning as in the general
simulate.twinstim
method.
simEndemicEvents(object, tiles)
simEndemicEvents(object, tiles)
object |
an object of class |
tiles |
an object inheriting from |
Sebastian Meyer
the general simulation method simulate.twinstim
data("imdepi", "imdepifit") load(system.file("shapes", "districtsD.RData", package="surveillance")) ## Fit an endemic-only twinstim() m_noepi <- update(imdepifit, epidemic = ~0, siaf = NULL, model = TRUE, T = 120) # using a restricted time range, for speed ## Simulate events from the above endemic model set.seed(1) s1 <- simEndemicEvents(m_noepi, tiles = districtsD) class(s1) # just a "SpatialPointsDataFrame" summary(s1@data) plot(imdepi$W, lwd = 2, asp = 1) plot(s1, col = s1$type, cex = 0.5, add = TRUE) ## Compare with the generic simulation method (slower) s0 <- simulate(m_noepi, seed = 1, data = imdepi, tiles = districtsD) class(s0) # gives a full "simEpidataCS" with several methods applicable methods(class = "epidataCS") plot(s0, "time") plot(s0, "space", points.args = list(pch = 3), lwd = 2)
data("imdepi", "imdepifit") load(system.file("shapes", "districtsD.RData", package="surveillance")) ## Fit an endemic-only twinstim() m_noepi <- update(imdepifit, epidemic = ~0, siaf = NULL, model = TRUE, T = 120) # using a restricted time range, for speed ## Simulate events from the above endemic model set.seed(1) s1 <- simEndemicEvents(m_noepi, tiles = districtsD) class(s1) # just a "SpatialPointsDataFrame" summary(s1@data) plot(imdepi$W, lwd = 2, asp = 1) plot(s1, col = s1$type, cex = 0.5, add = TRUE) ## Compare with the generic simulation method (slower) s0 <- simulate(m_noepi, seed = 1, data = imdepi, tiles = districtsD) class(s0) # gives a full "simEpidataCS" with several methods applicable methods(class = "epidataCS") plot(s0, "time") plot(s0, "space", points.args = list(pch = 3), lwd = 2)
The function simEpidataCS
simulates events of a self-exciting
spatio-temporal point process of the "twinstim"
class.
Simulation works via Ogata's modified thinning of the conditional
intensity as described in Meyer et al. (2012). Note that simulation is
limited to the spatial and temporal range of stgrid
.
The simulate
method for objects of class
"twinstim"
simulates new epidemic data using the model and
the parameter estimates of the fitted object.
simEpidataCS(endemic, epidemic, siaf, tiaf, qmatrix, rmarks, events, stgrid, tiles, beta0, beta, gamma, siafpars, tiafpars, epilink = "log", t0 = stgrid$start[1], T = tail(stgrid$stop,1), nEvents = 1e5, control.siaf = list(F=list(), Deriv=list()), W = NULL, trace = 5, nCircle2Poly = 32, gmax = NULL, .allocate = 500, .skipChecks = FALSE, .onlyEvents = FALSE) ## S3 method for class 'twinstim' simulate(object, nsim = 1, seed = NULL, data, tiles, newcoef = NULL, rmarks = NULL, t0 = NULL, T = NULL, nEvents = 1e5, control.siaf = object$control.siaf, W = data$W, trace = FALSE, nCircle2Poly = NULL, gmax = NULL, .allocate = 500, simplify = TRUE, ...)
simEpidataCS(endemic, epidemic, siaf, tiaf, qmatrix, rmarks, events, stgrid, tiles, beta0, beta, gamma, siafpars, tiafpars, epilink = "log", t0 = stgrid$start[1], T = tail(stgrid$stop,1), nEvents = 1e5, control.siaf = list(F=list(), Deriv=list()), W = NULL, trace = 5, nCircle2Poly = 32, gmax = NULL, .allocate = 500, .skipChecks = FALSE, .onlyEvents = FALSE) ## S3 method for class 'twinstim' simulate(object, nsim = 1, seed = NULL, data, tiles, newcoef = NULL, rmarks = NULL, t0 = NULL, T = NULL, nEvents = 1e5, control.siaf = object$control.siaf, W = data$W, trace = FALSE, nCircle2Poly = NULL, gmax = NULL, .allocate = 500, simplify = TRUE, ...)
endemic |
see |
epidemic |
see |
siaf |
see |
tiaf |
e.g. what is returned by the generating function
|
qmatrix |
see |
rmarks |
function of single time (1st argument) and location
(2nd argument) returning a one-row For the |
events |
|
stgrid |
see |
tiles |
object inheriting from |
beta0 , beta , gamma , siafpars , tiafpars
|
these are the parameter subvectors of the |
epilink |
a character string determining the link function to be used for the
|
t0 |
|
T , nEvents
|
simulate a maximum of |
W |
see |
trace |
logical (or integer) indicating if (or how often) the current
simulation status should be |
.allocate |
number of rows (events) to initially allocate for the event history;
defaults to 500. Each time the simulated epidemic exceeds the
allocated space, the event |
.skipChecks , .onlyEvents
|
these logical arguments are not meant to be set by the user.
They are used by the |
object |
an object of class |
nsim |
number of epidemics (i.e. spatio-temporal point patterns inheriting
from class |
seed |
an object specifying how the random number generator should be
initialized for simulation (via |
data |
an object of class |
newcoef |
an optional named numeric vector of (a subset of) parameters to
replace the original point estimates in |
simplify |
logical. It is strongly recommended to set |
control.siaf |
see |
nCircle2Poly |
see |
gmax |
maximum value the temporal interaction function
|
... |
unused (arguments of the generic). |
The function simEpidataCS
returns a simulated epidemic of class
"simEpidataCS"
, which enhances the class
"epidataCS"
by the following additional components known from
objects of class "twinstim"
:
bbox
, timeRange
, formula
, coefficients
,
npars
, control.siaf
, call
, runtime
.
It has corresponding coeflist
,
residuals
,
R0
, and
intensityplot
methods.
The simulate.twinstim
method has some additional
attributes set on its result:
call
, seed
, and runtime
.
If nsim > 1
, it returns an object of class
"simEpidataCSlist"
, the form of which depends on the value of
simplify
(which is stored as an attribute simplified
):
if simplify = FALSE
, then the return value is
just a list of sequential simulations, each of class
"simEpidataCS"
. However, if simplify = TRUE
, then the
sequential simulations share all components but the simulated
events
, i.e. the result is a list with the same components as
a single object of class "simEpidataCS"
, but with events
replaced by an eventsList
containing the events
returned
by each of the simulations.
The stgrid
component of the returned "simEpidataCS"
will be truncated to the actual end of the simulation, which might
be , if the upper bound
nEvents
is reached during
simulation.
CAVE: Currently, simplify=TRUE
in simulate.twinstim
ignores that multiple simulated epidemics
(nsim > 1
) may have different stgrid
time ranges. In a "simEpidataCSlist"
, the stgrid
shared
by all of the simulated epidemics is just the stgrid
returned by the first simulation.
The more detailed the polygons in tiles
are the slower is
the algorithm. You are advised to sacrifice some shape
details for speed by reducing the polygon complexity,
for example via the mapshaper
JavaScript library wrapped by
the R package rmapshaper, or via
simplify.owin
.
Sebastian Meyer, with contributions by Michael Höhle
Douglas, D. H. and Peucker, T. K. (1973): Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Cartographica: The International Journal for Geographic Information and Geovisualization, 10, 112-122
Meyer, S., Elias, J. and Höhle, M. (2012): A space-time conditional intensity model for invasive meningococcal disease occurrence. Biometrics, 68, 607-616. doi:10.1111/j.1541-0420.2011.01684.x
The function simEndemicEvents
is a faster alternative
for endemic-only models, only returning a
"SpatialPointsDataFrame"
of simulated events.
The plot.epidataCS
and animate.epidataCS
methods for plotting and animating continuous-space epidemic data,
respectively, also work for simulated epidemics (by inheritance),
and twinstim
can be used to fit
spatio-temporal conditional intensity models also to simulated data.
data("imdepi", "imdepifit") ## load borders of Germany's districts (originally obtained from ## the German Federal Agency for Cartography and Geodesy, ## https://gdz.bkg.bund.de/), simplified by the "modified Visvalingam" ## algorithm (level=6.6%) using MapShaper.org (v. 0.1.17): load(system.file("shapes", "districtsD.RData", package="surveillance")) if (surveillance.options("allExamples")) { plot(districtsD) plot(stateD, add=TRUE, border=2, lwd=2) } ## simulate 2 realizations (over a short period, for speed) ## considering events from data(imdepi) before t=31 as prehistory ## IGNORE_RDIFF_BEGIN mysims <- simulate(imdepifit, nsim=2, seed=1, data=imdepi, tiles=districtsD, newcoef=c("e.typeC"=-1), t0=31, T=if (interactive()) 180 else 45, # for CRAN simplify=TRUE) ## IGNORE_RDIFF_END ## plot both simulations using the plot-method for simEpidataCSlist's mysims plot(mysims, aggregate="time") ## extract the second realization -> object of class simEpidataCS mysim2 <- mysims[[2]] summary(mysim2) plot(mysim2, aggregate="space") ### compare the observed _cumulative_ number of cases in the first 90 days to nsim <- 20 ### simulations from the fitted model sims <- simulate(imdepifit, nsim=nsim, seed=1, data=imdepi, t0=0, T=90, tiles=districtsD, simplify=TRUE) ## extract cusums getcsums <- function (events) { tapply(events$time, events@data["type"], function (t) cumsum(table(t)), simplify=FALSE) } csums_observed <- getcsums(imdepi$events) csums_simulated <- lapply(sims$eventsList, getcsums) ## plot it plotcsums <- function (csums, ...) { mapply(function (csum, ...) lines(as.numeric(names(csum)), csum, ...), csums, ...) invisible() } plot(c(0,90), c(0,35), type="n", xlab="Time [days]", ylab="Cumulative number of cases") plotcsums(csums_observed, col=c(2,4), lwd=3) legend("topleft", legend=levels(imdepi$events$type), col=c(2,4), lwd=1) invisible(lapply(csums_simulated, plotcsums, col=adjustcolor(c(2,4), alpha.f=0.5))) ## Not run: ### Experimental code to generate 'nsim' simulations of 'nm2add' months ### beyond the observed time period: nm2add <- 24 nsim <- 5 ### The events still infective by the end of imdepi$stgrid will be used ### as the prehistory for the continued process. origT <- tail(imdepi$stgrid$stop, 1) ## extend the 'stgrid' by replicating the last block 'nm2add' times ## (i.e., holding "popdensity" constant) stgridext <- local({ gLast <- subset(imdepi$stgrid, BLOCK == max(BLOCK)) gAdd <- gLast[rep(1:nrow(gLast), nm2add),]; rownames(gAdd) <- NULL newstart <- seq(origT, by=30, length.out=nm2add) newstop <- c(newstart[-1], max(newstart) + 30) gAdd$start <- rep(newstart, each=nlevels(gAdd$tile)) gAdd$stop <- rep(newstop, each=nlevels(gAdd$tile)) rbind(imdepi$stgrid, gAdd, make.row.names = FALSE)[,-1] }) ## create an updated "epidataCS" with the time-extended 'stgrid' imdepiext <- update(imdepi, stgrid = stgridext) newT <- tail(imdepiext$stgrid$stop, 1) ## simulate beyond the original period simsext <- simulate(imdepifit, nsim=nsim, seed=1, t0=origT, T=newT, data=imdepiext, tiles=districtsD, simplify=TRUE) ## Aside to understand the note from checking events and tiles: # marks(imdepi)["636",] # tile 09662 is attributed to this event, but: # plot(districtsD[c("09678","09662"),], border=1:2, lwd=2, axes=TRUE) # points(imdepi$events["636",]) ## this mismatch is due to polygon simplification ## plot the observed and simulated event numbers over time plot(imdepiext, breaks=c(unique(imdepi$stgrid$start),origT), cumulative=list(maxat=330)) for (i in seq_along(simsext$eventsList)) plot(simsext[[i]], add=TRUE, legend.types=FALSE, breaks=c(unique(simsext$stgrid$start),newT), subset=!is.na(source), # have to exclude the events of the prehistory cumulative=list(offset=c(table(imdepi$events$type)), maxat=330, axis=FALSE), border=NA, density=0) # no histogram abline(v=origT, lty=2, lwd=2) ## End(Not run)
data("imdepi", "imdepifit") ## load borders of Germany's districts (originally obtained from ## the German Federal Agency for Cartography and Geodesy, ## https://gdz.bkg.bund.de/), simplified by the "modified Visvalingam" ## algorithm (level=6.6%) using MapShaper.org (v. 0.1.17): load(system.file("shapes", "districtsD.RData", package="surveillance")) if (surveillance.options("allExamples")) { plot(districtsD) plot(stateD, add=TRUE, border=2, lwd=2) } ## simulate 2 realizations (over a short period, for speed) ## considering events from data(imdepi) before t=31 as prehistory ## IGNORE_RDIFF_BEGIN mysims <- simulate(imdepifit, nsim=2, seed=1, data=imdepi, tiles=districtsD, newcoef=c("e.typeC"=-1), t0=31, T=if (interactive()) 180 else 45, # for CRAN simplify=TRUE) ## IGNORE_RDIFF_END ## plot both simulations using the plot-method for simEpidataCSlist's mysims plot(mysims, aggregate="time") ## extract the second realization -> object of class simEpidataCS mysim2 <- mysims[[2]] summary(mysim2) plot(mysim2, aggregate="space") ### compare the observed _cumulative_ number of cases in the first 90 days to nsim <- 20 ### simulations from the fitted model sims <- simulate(imdepifit, nsim=nsim, seed=1, data=imdepi, t0=0, T=90, tiles=districtsD, simplify=TRUE) ## extract cusums getcsums <- function (events) { tapply(events$time, events@data["type"], function (t) cumsum(table(t)), simplify=FALSE) } csums_observed <- getcsums(imdepi$events) csums_simulated <- lapply(sims$eventsList, getcsums) ## plot it plotcsums <- function (csums, ...) { mapply(function (csum, ...) lines(as.numeric(names(csum)), csum, ...), csums, ...) invisible() } plot(c(0,90), c(0,35), type="n", xlab="Time [days]", ylab="Cumulative number of cases") plotcsums(csums_observed, col=c(2,4), lwd=3) legend("topleft", legend=levels(imdepi$events$type), col=c(2,4), lwd=1) invisible(lapply(csums_simulated, plotcsums, col=adjustcolor(c(2,4), alpha.f=0.5))) ## Not run: ### Experimental code to generate 'nsim' simulations of 'nm2add' months ### beyond the observed time period: nm2add <- 24 nsim <- 5 ### The events still infective by the end of imdepi$stgrid will be used ### as the prehistory for the continued process. origT <- tail(imdepi$stgrid$stop, 1) ## extend the 'stgrid' by replicating the last block 'nm2add' times ## (i.e., holding "popdensity" constant) stgridext <- local({ gLast <- subset(imdepi$stgrid, BLOCK == max(BLOCK)) gAdd <- gLast[rep(1:nrow(gLast), nm2add),]; rownames(gAdd) <- NULL newstart <- seq(origT, by=30, length.out=nm2add) newstop <- c(newstart[-1], max(newstart) + 30) gAdd$start <- rep(newstart, each=nlevels(gAdd$tile)) gAdd$stop <- rep(newstop, each=nlevels(gAdd$tile)) rbind(imdepi$stgrid, gAdd, make.row.names = FALSE)[,-1] }) ## create an updated "epidataCS" with the time-extended 'stgrid' imdepiext <- update(imdepi, stgrid = stgridext) newT <- tail(imdepiext$stgrid$stop, 1) ## simulate beyond the original period simsext <- simulate(imdepifit, nsim=nsim, seed=1, t0=origT, T=newT, data=imdepiext, tiles=districtsD, simplify=TRUE) ## Aside to understand the note from checking events and tiles: # marks(imdepi)["636",] # tile 09662 is attributed to this event, but: # plot(districtsD[c("09678","09662"),], border=1:2, lwd=2, axes=TRUE) # points(imdepi$events["636",]) ## this mismatch is due to polygon simplification ## plot the observed and simulated event numbers over time plot(imdepiext, breaks=c(unique(imdepi$stgrid$start),origT), cumulative=list(maxat=330)) for (i in seq_along(simsext$eventsList)) plot(simsext[[i]], add=TRUE, legend.types=FALSE, breaks=c(unique(simsext$stgrid$start),newT), subset=!is.na(source), # have to exclude the events of the prehistory cumulative=list(offset=c(table(imdepi$events$type)), maxat=330, axis=FALSE), border=NA, density=0) # no histogram abline(v=origT, lty=2, lwd=2) ## End(Not run)
stepComponent
is a wrapper around step
to select a
"twinstim"
component's model based on an information
criterion in a stepwise algorithm.
There are also stand-alone single-step methods of add1
and
drop1
.
stepComponent(object, component = c("endemic", "epidemic"), scope = list(upper = object$formula[[component]]), direction = "both", trace = 2, verbose = FALSE, ...) ## S3 method for class 'twinstim' add1(object, scope, component = c("endemic", "epidemic"), trace = 2, ...) ## S3 method for class 'twinstim' drop1(object, scope, component = c("endemic", "epidemic"), trace = 2, ...)
stepComponent(object, component = c("endemic", "epidemic"), scope = list(upper = object$formula[[component]]), direction = "both", trace = 2, verbose = FALSE, ...) ## S3 method for class 'twinstim' add1(object, scope, component = c("endemic", "epidemic"), trace = 2, ...) ## S3 method for class 'twinstim' drop1(object, scope, component = c("endemic", "epidemic"), trace = 2, ...)
object |
an object of class |
component |
one of |
scope , direction , trace
|
|
verbose |
see |
... |
further arguments passed to |
See step
and add1
, respectively.
(of this wrapper around step
) Sebastian Meyer
data("imdepi", "imdepifit") ## simple baseline model m0 <- update(imdepifit, epidemic=~1, siaf=NULL) ## AIC-based step-wise backward selection of the endemic component m0_step <- stepComponent(m0, "endemic", scope=list(lower=~I(start/365-3.5))) ## nothing is dropped from the model
data("imdepi", "imdepifit") ## simple baseline model m0 <- update(imdepifit, epidemic=~1, siaf=NULL) ## AIC-based step-wise backward selection of the endemic component m0_step <- stepComponent(m0, "endemic", scope=list(lower=~I(start/365-3.5))) ## nothing is dropped from the model
A temporal interaction function for use in twinstim
can be constructed via the tiaf
function.
It checks the supplied function elements, assigns defaults for
missing arguments, and returns all checked arguments in a list.
However, for standard applications it is much easier to use one of the
pre-defined temporal interaction functions, e.g.,
tiaf.exponential
.
tiaf(g, G, deriv, Deriv, npars, validpars = NULL)
tiaf(g, G, deriv, Deriv, npars, validpars = NULL)
g |
the temporal interaction function. It must accept
two arguments, the first one being a vector of time points, the
second one a parameter vector. For marked |
G |
a primitive of |
deriv |
optional derivative of |
Deriv |
optional primitive of |
npars |
the number of parameters of the temporal interaction
function |
validpars |
optional function taking one argument, the parameter vector, indicating if it
is valid. This approach to specify parameter constraints is rarely
needed, because usual box-constrained parameters can be taken into
account by using L-BFGS-B as the optimization method in
|
list of checked arguments.
Sebastian Meyer
tiaf.exponential
for a pre-defined temporal interaction
function, and siaf
for the spatial interaction function.
update
-method for "twinstim"
Update and (by default) re-fit a "twinstim"
. This method is
especially useful if one wants to add the model
environment
(which is required for some methods) to a fitted model object a posteriori.
## S3 method for class 'twinstim' update(object, endemic, epidemic, control.siaf, optim.args, model, ..., use.estimates = TRUE, evaluate = TRUE)
## S3 method for class 'twinstim' update(object, endemic, epidemic, control.siaf, optim.args, model, ..., use.estimates = TRUE, evaluate = TRUE)
object |
a previous |
endemic , epidemic
|
changes to the formulae – see
|
control.siaf |
a list (see |
optim.args |
see |
model |
see |
... |
Additional arguments to the call, or arguments with changed
values. |
use.estimates |
logical indicating if the estimates of
|
evaluate |
If |
If evaluate = TRUE
the re-fitted object, otherwise the updated call.
Sebastian Meyer
Inspiration and some pieces of code originate from
update.default
by the R Core Team.
data("imdepi", "imdepifit") ## add another epidemic covariate ## (but fix siaf-parameter so that this example runs quickly) imdepifit2 <- update(imdepifit, epidemic = ~. + log(popdensity), optim.args = list(fixed="e.siaf.1")) ## compare by AIC AIC(imdepifit, imdepifit2)
data("imdepi", "imdepifit") ## add another epidemic covariate ## (but fix siaf-parameter so that this example runs quickly) imdepifit2 <- update(imdepifit, epidemic = ~. + log(popdensity), optim.args = list(fixed="e.siaf.1")) ## compare by AIC AIC(imdepifit, imdepifit2)
"SpatialPolygons"
Union all subpolygons of a
"SpatialPolygons"
object.
This is a legacy wrapper for the polygon clipping engines implemented by
packages sf and polyclip.
Internally, both method
s need to convert the input polygons to a
class appropriate for the method
, so are rather inefficient.
unionSpatialPolygons(SpP, method = c("sf", "polyclip"), ...)
unionSpatialPolygons(SpP, method = c("sf", "polyclip"), ...)
SpP |
an object of class
|
method |
polygon clipping machinery to use. Default is to call
|
... |
further arguments passed to the chosen |
an object of class
"SpatialPolygons"
representing
the union of all subpolygons.
Sebastian Meyer
st_union
in package sf,
polyclip
in package polyclip.
## Load districts of Germany load(system.file("shapes", "districtsD.RData", package = "surveillance")) plot(districtsD, border = "gray", asp = 1) ## Union these districts using either "sf" or "polyclip" if (requireNamespace("sf")) { stateD <- unionSpatialPolygons(districtsD, method = "sf") plot(stateD, add = TRUE, border = 2, lwd = 2) } if (requireNamespace("polyclip")) { stateD_pc <- unionSpatialPolygons(districtsD, method = "polyclip") plot(stateD_pc, add = TRUE, border = 1, lwd = 2, lty = 2) }
## Load districts of Germany load(system.file("shapes", "districtsD.RData", package = "surveillance")) plot(districtsD, border = "gray", asp = 1) ## Union these districts using either "sf" or "polyclip" if (requireNamespace("sf")) { stateD <- unionSpatialPolygons(districtsD, method = "sf") plot(stateD, add = TRUE, border = 2, lwd = 2) } if (requireNamespace("polyclip")) { stateD_pc <- unionSpatialPolygons(districtsD, method = "polyclip") plot(stateD_pc, add = TRUE, border = 1, lwd = 2, lty = 2) }
This is a generic function intended to randomly break tied data in a
way similar to what jitter
does: tie-breaking is
performed by shifting all data points by a random amount.
The surveillance package defines methods for matrices,
"epidataCS"
, and a default method for numeric vectors.
untie(x, amount, ...) ## S3 method for class 'epidataCS' untie(x, amount = list(t=NULL, s=NULL), minsep = list(t=0, s=0), direction = "left", keep.sources = FALSE, ..., verbose = FALSE) ## S3 method for class 'matrix' untie(x, amount = NULL, minsep = 0, constraint = NULL, giveup = 1000, ...) ## Default S3 method: untie(x, amount = NULL, minsep = 0, direction = c("symmetric", "left", "right"), sort = NULL, giveup = 1000, ...)
untie(x, amount, ...) ## S3 method for class 'epidataCS' untie(x, amount = list(t=NULL, s=NULL), minsep = list(t=0, s=0), direction = "left", keep.sources = FALSE, ..., verbose = FALSE) ## S3 method for class 'matrix' untie(x, amount = NULL, minsep = 0, constraint = NULL, giveup = 1000, ...) ## Default S3 method: untie(x, amount = NULL, minsep = 0, direction = c("symmetric", "left", "right"), sort = NULL, giveup = 1000, ...)
x |
the data to be untied. |
amount |
upper bound for the random amount by which data are shifted.
|
minsep |
minimum separation of jittered points. Can only be
obeyed if much smaller than |
keep.sources |
logical ( |
constraint |
an object of class |
giveup |
number of attempts after which the algorithm should stop trying to generate new points. |
direction |
one of |
sort |
logical indicating if the jittered vector should be sorted. Defaults to doing so if the original vector was already sorted. |
... |
For the |
verbose |
logical passed to |
For numeric vectors (default method), the jittered version is the
same as for jitter(x, amount=amount)
, if
direction="symmetric"
(and amount
is non-NULL
),
and otherwise uses
x
“+-” runif(length(x), 0, amount)
.
For matrices, a vector uniformly drawn from the disc with radius
amount
is added to each point (row).
For "epidataCS"
, amount
is a list stating the amounts
for the temporal and/or spatial dimension, respectively. It then
uses the specific methods with arguments constraint=x$W
,
direction
, and sort=TRUE
. Note that this implements a
simplistic approach of tie-breaking where all events are assumed to be
subject to the same amounts of censoring, and the default amounts may
not be sensible choices.
the untied (jittered) data.
Sebastian Meyer
# vector example set.seed(123) untie(c(rep(1,3), rep(1.2, 4), rep(3,3)), direction="left", sort=FALSE) # spatial example data(imdepi) coords <- coordinates(imdepi$events) table(duplicated(coords)) plot(coords, cex=sqrt(multiplicity(coords))) set.seed(1) coords_untied <- untie(coords) stopifnot(!anyDuplicated(coords_untied)) points(coords_untied, col=2) # shifted by very small amount in this case
# vector example set.seed(123) untie(c(rep(1,3), rep(1.2, 4), rep(3,3)), direction="left", sort=FALSE) # spatial example data(imdepi) coords <- coordinates(imdepi$events) table(duplicated(coords)) plot(coords, cex=sqrt(multiplicity(coords))) set.seed(1) coords_untied <- untie(coords) stopifnot(!anyDuplicated(coords_untied)) points(coords_untied, col=2) # shifted by very small amount in this case
This function takes an sts
object and applies an univariate
surveillance algorithm to the time series of each observational unit.
wrap.algo(sts, algo, control,control.hook=function(k, control) return(control),verbose=TRUE,...) bayes(sts, control = list(range = range, b = 0, w = 6, actY = TRUE,alpha=0.05),...) rki(sts, control = list(range = range, b = 2, w = 4, actY = FALSE),...) cusum(sts, control = list(range=range, k=1.04, h=2.26, m=NULL, trans="standard",alpha=NULL),...) glrpois(sts, control = list(range=range,c.ARL=5, S=1,beta=NULL, Mtilde=1, M=-1, change="intercept",theta=NULL),...) glrnb(sts, control = list(range=range,c.ARL=5, mu0=NULL, alpha=0, Mtilde=1, M=-1, change="intercept", theta=NULL,dir=c("inc","dec"), ret=c("cases","value")),...) outbreakP(sts, control=list(range = range, k=100, ret=c("cases","value"),maxUpperboundCases=1e5),...)
wrap.algo(sts, algo, control,control.hook=function(k, control) return(control),verbose=TRUE,...) bayes(sts, control = list(range = range, b = 0, w = 6, actY = TRUE,alpha=0.05),...) rki(sts, control = list(range = range, b = 2, w = 4, actY = FALSE),...) cusum(sts, control = list(range=range, k=1.04, h=2.26, m=NULL, trans="standard",alpha=NULL),...) glrpois(sts, control = list(range=range,c.ARL=5, S=1,beta=NULL, Mtilde=1, M=-1, change="intercept",theta=NULL),...) glrnb(sts, control = list(range=range,c.ARL=5, mu0=NULL, alpha=0, Mtilde=1, M=-1, change="intercept", theta=NULL,dir=c("inc","dec"), ret=c("cases","value")),...) outbreakP(sts, control=list(range = range, k=100, ret=c("cases","value"),maxUpperboundCases=1e5),...)
sts |
Object of class |
algo |
Character string giving the function name of the algorithm
to call, e.g. |
control |
Control object as list. Depends on each algorithm. |
control.hook |
This is a function for handling multivariate objects. This argument is a function function of integer k and the current control object and which returns the appropriate control object for region k. |
verbose |
Boolean, if |
... |
currently ignored. |
An sts
object with the alarm
, upperbound
,
etc. slots filled with the results of independent and univariate
surveillance algorithm.
M. Höhle
algo.rki
, algo.farrington
,
algo.cusum
, algo.glrpois
,
algo.glrnb
, algo.outbreakP
for the exact form of the control
object.
Compute power-law weights with decay parameter d
based on a matrix of neighbourhood orders nbmat
(e.g., as obtained via nbOrder
).
Without normalization and truncation,
this is just (where
is a neighbourhood order).
This function is mainly used internally for
W_powerlaw
weights in hhh4
models.
zetaweights(nbmat, d = 1, maxlag = max(nbmat), normalize = FALSE)
zetaweights(nbmat, d = 1, maxlag = max(nbmat), normalize = FALSE)
nbmat |
numeric, symmetric matrix of neighbourhood orders. |
d |
single numeric decay parameter (default: 1). Should be positive. |
maxlag |
single numeric specifying an upper limit for the power
law. For neighbourhood orders > |
normalize |
Should the resulting weight matrix be normalized such that rows sum to 1? |
a numeric matrix with same dimensions and names as the input matrix.
Sebastian Meyer
nbmat <- matrix(c(0,1,2,2, 1,0,1,1, 2,1,0,2, 2,1,2,0), 4, 4, byrow=TRUE) zetaweights(nbmat, d=1, normalize=FALSE) # harmonic: o^-1 zetaweights(nbmat, d=1, normalize=TRUE) # rowSums=1 zetaweights(nbmat, maxlag=1, normalize=FALSE) # results in adjacency matrix
nbmat <- matrix(c(0,1,2,2, 1,0,1,1, 2,1,0,2, 2,1,2,0), 4, 4, byrow=TRUE) zetaweights(nbmat, d=1, normalize=FALSE) # harmonic: o^-1 zetaweights(nbmat, d=1, normalize=TRUE) # rowSums=1 zetaweights(nbmat, maxlag=1, normalize=FALSE) # results in adjacency matrix