Title: Qualitative Validation Methods
Description: Qualitative methods for the validation of dynamic models. It contains (i) an orthogonal set of deviance measures for absolute, relative and ordinal scale and (ii) approaches accounting for time shifts. The first approach transforms time to take time delays and speed differences into account. The second divides the time series into interval units according to their main features and finds the longest common subsequence (LCS) using a dynamic programming algorithm.
Authors: K. Gerald van den Boogaart [aut, ths], Stefanie Rost [aut], Thomas Petzoldt [aut, ths, cre]
Maintainer: Thomas Petzoldt <[email protected]>
License: GPL (>= 2)
Version: 0.3-5
Built: 2024-10-06 03:47:50 UTC
Source: https://github.com/r-forge/qualv
Qualitative methods for model validation.
This package contains functions for qualitative model comparison. Common quantitative deviance measures underestimate the similarity of patterns if there are shifts in time between measurement and simulation. Qualitative validation methods complement quantitative validation and are especially useful for comparing the patterns of observed and simulated values.
For a complete list of functions with individual help pages, use library(help = "qualV").
Jachner, S., van den Boogaart, K.G. and Petzoldt, T. (2007) Statistical methods for the qualitative assessment of dynamic models with time delay (R package qualV), Journal of Statistical Software, 22(8), 1–30. doi:10.18637/jss.v022.i08.
Various deviance measures are computed, allowing the user to identify the aspects in which two time series differ.
compareME(o, p,
          o.t = seq(0, 1, length.out = length(o)),
          p.t = seq(0, 1, length.out = length(p)),
          ignore = c("raw", "centered", "scaled", "ordered"),
          geometry = c("real", "logarithmic", "geometric", "ordinal"),
          measure = c("mad", "var", "sd"),
          type = "normalized", time = "fixed",
          ...,
          col.vars = c("time", "ignore"))

## S3 method for class 'compareME'
print(x, ..., digits = 3)

## S3 method for class 'compareME'
summary(object, ...)
o: vector of observed values,
p: vector of predicted values,
o.t: vector of observation times,
p.t: vector of times for predicted values,
ignore: a subset of c("raw", "centered", "scaled", "ordered") indicating which aspects are to be ignored,
geometry: a subset of c("real", "logarithmic", "geometric", "ordinal") indicating the geometries to be used,
measure: a subset of c("mad", "var", "sd") indicating how deviances are measured,
type: a subset of the deviance measure types (see generalME), default "normalized",
time: a subset of c("fixed", "transformed") indicating whether the time should be transformed,
...: further arguments passed to the underlying deviance functions,
col.vars: a subset of c("time", "ignore") selecting which variables appear in the columns of the resulting ftable,
digits: number of significant digits displayed,
x, object: objects of class compareME.
The function provides a simple standard interface to get a first impression of the similarities and dissimilarities of two time series spanning the same time interval. The print and summary methods extract the relevant information, rounded to an optional number of significant digits.
The result is a list of ftables containing the deviance measures of all requested combinations of parameters, with one list element per requested type of measure.
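The structure described above can be inspected directly. The following minimal sketch reuses the constructed sine example from the examples below and only looks at the shape of the returned object:

library(qualV)
x <- seq(0, 2 * pi, 0.1)
o <- 5 + sin(x) + rnorm(length(x), sd = 0.2)   # observation with random error
p <- 5 + sin(x) + 0.1                          # simulation with systematic bias
res <- compareME(o, p)
str(res, max.level = 1)   # a list of ftables, one element per measure type
summary(res)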
# a constructed example
x <- seq(0, 2*pi, 0.1)
y <- 5 + sin(x)              # a process
o <- y + rnorm(x, sd = 0.2)  # observation with random error
p <- y + 0.1                 # simulation with systematic bias

os <- ksmooth(x, o, kernel = "normal",
              bandwidth = dpill(x, o), x.points = x)$y
plot(x, o); lines(x, p); lines(x, os, col = "red")

compareME(o, p)
compareME(os, p)

# observed and measured data with non-matching time intervals
data(phyto)
compareME(obs$y, sim$y, obs$t, sim$t, time = "fixed")
tt <- timeTransME(obs$y, sim$y, obs$t, sim$t, ME = SMSLE, trials = 5)
compareME(tt$yo, tt$yp)

# show names of deviance measures
compareME(type = "name")
The efficiency factor is a dimensionless statistic which directly relates predictions to observed data.
EF(o, p)
o: vector of observed values
p: vector of corresponding predicted values
Two time series are compared. EF is an overall measure of similarity between fitted and observed values. Any model giving a negative value cannot be recommended, whereas values close to one indicate a 'near-perfect' fit.
EF: efficiency factor
Nash, J. E. and Sutcliffe, J. V. (1970) River flow forecasting through conceptual models part I - A discussion of principles. Journal of Hydrology, 10, 282-290.
Mayer, D. G. and Butler, D. G. (1993) Statistical Validation. Ecological Modelling, 68, 21-32.
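For orientation, the Nash and Sutcliffe reference above defines the efficiency criterion as one minus the ratio of the residual sum of squares to the variance of the observations around their mean. The following sketch computes that textbook formula by hand next to EF(); the observation and prediction vectors are made up for illustration, and the packaged EF() may differ in implementation details:

library(qualV)
o <- c(12, 15, 14, 16, 20, 18)   # hypothetical observations
p <- c(13, 14, 15, 15, 19, 19)   # hypothetical predictions
nse <- 1 - sum((p - o)^2) / sum((o - mean(o))^2)   # Nash-Sutcliffe efficiency
c(EF = EF(o, p), NSE = nse)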
# a constructed example
x <- seq(0, 2*pi, 0.1)
y <- 5 + sin(x)              # a process
o <- y + rnorm(x, sd = 0.2)  # observation with random error
p <- y + 0.1                 # simulation with systematic bias
plot(x, o); lines(x, p)
EF(o, p)

# observed and measured data with non-matching time intervals
data(phyto)
obsb <- na.omit(obs[match(sim$t, obs$t), ])
simb <- sim[na.omit(match(obs$t, sim$t)), ]
EF(obsb$y, simb$y)
A time series is characterised by a sequence of characters, indicating features of the time series itself, of its first or second derivative, steepness or level of values.
f.slope(x, y, f = 0.1, scale = c("mean", "range", "IQR", "sd", "none"))
f.curve(x, y, f = 0.1, scale = c("mean", "range", "IQR", "sd", "none"))
f.steep(x, y, f1 = 1, f2 = 0.1)
f.level(y, high = 0.8, low = 0.2)
x: vector of time
y: input y values
f: factor defining the limit for classifying a segment as constant ('C') or linear ('J'),
f1: factor for the upper bound of steepness
f2: factor for the lower bound of steepness
scale: method for internal scaling, one of "mean", "range", "IQR", "sd" or "none",
high: lower limit of high values
low: upper limit of low values
For the first derivative the segment between two values is characterised as increasing ('A'), decreasing ('B') or constant ('C'), and for the second derivative as convex ('K'), concave ('I') or linear ('J'). For the steepness of the first derivative the segment between two values is characterised as very steep ('S'), steep ('T') or not steep ('U'), or the values are divided into high ('H'), low ('L') or values in between ('M'). Note that for the last two cases the original values, and not the increments, are standardised (to [0, 1]).
v: interval sequence
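A minimal sketch of the coding described above, using a small made-up series; the exact letter codes depend on the chosen scaling and the factor f:

library(qualV)
x <- 1:9
y <- c(1, 2, 3, 3, 3, 2, 1, 1, 1)  # rise, plateau, fall, plateau
f.slope(x, y)   # segments coded roughly as 'A' (increasing), 'C' (constant), 'B' (decreasing)
f.level(y)      # the values themselves classified as 'H', 'M' or 'L'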
data(phyto)
bbobs <- dpill(obs$t, obs$y)
n <- tail(obs$t, n = 1) - obs$t[1] + 1
obsdpill <- ksmooth(obs$t, obs$y, kernel = "normal",
                    bandwidth = bbobs, n.points = n)
obss <- data.frame(t = obsdpill$x, y = obsdpill$y)
obss <- obss[match(sim$t, obss$t), ]

f.slope(obss$t, obss$y)
f.curve(obss$t, obss$y)
f.steep(obss$t, obss$y, f1 = 30, f2 = 10)
f.level(obss$y)
Given a set of predictions and a corresponding set of observations, the geometric reliability index (GRI) is a reliability index for the predictions.
GRI(o, p)
o: vector of observed values
p: vector of corresponding predicted values
One possible interpretation of GRI is that the simulation is accurate within a multiplicative factor GRI, i.e. the observed values fall between 1/GRI and GRI times the corresponding predicted values. Values close to one indicate a good match.
GRI: geometric reliability index
Leggett, L. R. and Williams, L. R. (1981) A reliability index for models. Ecological Modelling, 13, 303-312. doi:10.1016/0304-3800(81)90034-X
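The multiplicative-factor interpretation above can be checked directly. The sketch below reuses the matching step from the example that follows and counts how many observations fall between 1/GRI and GRI times the predictions; the coverage is approximate, not guaranteed:

library(qualV)
data(phyto)
obsb <- na.omit(obs[match(sim$t, obs$t), ])
simb <- sim[na.omit(match(obs$t, sim$t)), ]
g <- GRI(obsb$y, simb$y)
# fraction of observations lying within a factor of GRI of the predictions
mean(obsb$y >= simb$y / g & obsb$y <= simb$y * g)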
# a constructed example
x <- seq(0, 2*pi, 0.1)
y <- 5 + sin(x)              # a process
o <- y + rnorm(x, sd = 0.2)  # observation with random error
p <- y + 0.1                 # simulation with systematic bias
plot(x, o); lines(x, p)
GRI(o, p)

# observed and measured data with non-matching time intervals
data(phyto)
obsb <- na.omit(obs[match(sim$t, obs$t), ])
simb <- sim[na.omit(match(obs$t, sim$t)), ]
GRI(obsb$y, simb$y)
Determines the longest common subsequence of two strings.
LCS(a, b)
a: vector (numeric or character), missing values are not accepted
b: vector (numeric or character), missing values are not accepted
A longest common subsequence (LCS) is a common subsequence of two strings of maximum length. The LCS problem consists of finding an LCS of two given strings and its length (LLCS). A qualitative similarity index QSI is computed by dividing the LLCS by the maximum length of 'a' and 'b'.
a: the vector a
b: the vector b
LLCS: length of the LCS
LCS: longest common subsequence
QSI: quality similarity index
va: one possible LCS of a
vb: one possible LCS of b
LCS now uses a C version of the algorithm provided by Dominik Reusser.
Wagner, R. A. and Fischer, M. J. (1974) The String-to-String Correction Problem. Journal of the ACM, 21, 168-173.
Paterson, M. and Dancik, V. (1994) Longest Common Subsequences. Mathematical Foundations of Computer Science, 841, 127-142.
Gusfield, D. (1997) Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, England, ISBN 0-521-58519-8.
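The relation between LLCS and QSI described above can be verified directly; this sketch uses the same vectors as the example below:

library(qualV)
a <- c("b", "c", "a", "b", "c", "b")
b <- c("a", "b", "c", "c", "b")
res <- LCS(a, b)
res$QSI                               # quality similarity index
res$LLCS / max(length(a), length(b))  # should reproduce QSI as defined above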
# direct use
a <- c("b", "c", "a", "b", "c", "b")
b <- c("a", "b", "c", "c", "b")
print(LCS(a, b))

# a constructed example
x <- seq(0, 2 * pi, 0.1)      # time
y <- 5 + sin(x)               # a process
o <- y + rnorm(x, sd = 0.2)   # observation with random error
p <- y + 0.1                  # simulation with systematic bias
plot(x, o); lines(x, p)

lcs <- LCS(f.slope(x, o), f.slope(x, p))  # too much noise
lcs$LLCS
lcs$QSI

os <- ksmooth(x, o, kernel = "normal", bandwidth = dpill(x, o), x.points = x)$y
lcs <- LCS(f.slope(x, os), f.slope(x, p))
lcs$LLCS
lcs$QSI

# observed and measured data with non-matching time intervals
data(phyto)
bbobs <- dpill(obs$t, obs$y)
n <- tail(obs$t, n = 1) - obs$t[1] + 1
obsdpill <- ksmooth(obs$t, obs$y, kernel = "normal",
                    bandwidth = bbobs, n.points = n)
obss <- data.frame(t = obsdpill$x, y = obsdpill$y)
obss <- obss[match(sim$t, obss$t), ]
obs_f1 <- f.slope(obss$t, obss$y)
sim_f1 <- f.slope(sim$t, sim$y)
lcs <- LCS(obs_f1, sim_f1)
lcs$QSI
The data contain the day since 1.1.1994 and observed/predicted biovolumes of phytoplankton.
data(phyto)
Two data frames, each with two variables, with the following components:
obs: a data frame of observed phytoplankton concentration in Bautzen reservoir 1994 (TU Dresden, Institute of Hydrobiology, workgroup Limnology) with the elements:
  t: time code
  y: observed biovolume (mg/L)
sim: a data frame of predicted phytoplankton concentration in Bautzen reservoir 1994 (TU Dresden, Institute of Hydrobiology, workgroup Limnology) with the elements:
  t: time code
  y: predicted biovolume (mg/L)
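A quick way to inspect the two data frames described above (a minimal sketch):

library(qualV)
data(phyto)                  # loads the data frames 'obs' and 'sim'
str(obs)                     # t: time code, y: observed biovolume (mg/L)
str(sim)                     # t: time code, y: predicted biovolume (mg/L)
range(obs$t); range(sim$t)   # the time codes of observation and simulation need not match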
Different methods for calculating the difference between two vectors.
generalME(o, p,
          ignore = c("raw", "centered", "scaled", "ordered"),
          geometry = c("real", "logarithmic", "geometric", "ordinal"),
          measure = c("mad", "var", "sd"),
          type = c("dissimilarity", "normalized", "similarity",
                   "reference", "formula", "name", "function"),
          method = NULL)

MAE(o, p, type = "dissimilarity")
MAPE(o, p, type = "dissimilarity")
MSE(o, p, type = "dissimilarity")
RMSE(o, p, type = "dissimilarity")
CMAE(o, p, type = "dissimilarity")
CMSE(o, p, type = "dissimilarity")
RCMSE(o, p, type = "dissimilarity")
SMAE(o, p, type = "dissimilarity")
SMSE(o, p, type = "dissimilarity")
RSMSE(o, p, type = "dissimilarity")
MALE(o, p, type = "dissimilarity")
MAGE(o, p, type = "dissimilarity")
RMSLE(o, p, type = "dissimilarity")
RMSGE(o, p, type = "dissimilarity")
SMALE(o, p, type = "dissimilarity")
SMAGE(o, p, type = "dissimilarity")
SMSLE(o, p, type = "dissimilarity")
RSMSLE(o, p, type = "dissimilarity")
RSMSGE(o, p, type = "dissimilarity")
MAOE(o, p, type = "dissimilarity")
MSOE(o, p, type = "dissimilarity")
RMSOE(o, p, type = "dissimilarity")
o: vector of observed values
p: vector of corresponding predicted values
type: one of "dissimilarity", "normalized", "similarity", "reference", "formula", "name" or "function", specifying the form in which the result is reported,
ignore: specifies which aspects should be ignored: "raw", "centered", "scaled" or "ordered",
geometry: indicates the geometry to be used for the data and the output: "real", "logarithmic", "geometric" or "ordinal",
measure: indicates how distances should be measured: as mean absolute distances like in MAD, as squared distances like in a variance, or as the root of mean squared distances like in sd,
method: optionally, the function to be used can be specified directly, either as a function or as a string.
These comparison criteria are designed for a semiquantitative comparison of observed values o with predicted values p to validate the performance of the prediction.
The general naming convention follows the grammar scheme
[R][C|S]M[S|A][L|G|O]E
corresponding to
[Root] [Centered | Scaled] Mean [Squared | Absolute] [Logarithmic, Geometric, Ordinal] Error
R: used together with squared errors to indicate that a root is applied to the mean.
C: indicates that an additive constant is allowed.
S (before M): indicates that a scaling of the predictive sequence is allowed. Scaled implies centered for the real scale.
S (after M): indicates that squared error is used.
A: indicates that absolute error is used.
L: indicates that the error is calculated based on the logarithms of the values. This is useful for data on a relative scale.
G: indicates that the result is to be understood as a factor, similar to a geometric mean.
O: indicates that only the order of the observations is taken into account by analyzing the data by ranks scaled to the interval [0, 1].
The mean errors for squared error measures are based on the number of degrees of freedom of the residuals.
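A sketch illustrating how the naming grammar maps onto the selector generalME: allowing an additive constant ("centered") on the real scale with an sd-type measure should, following the grammar above, correspond to RCMSE; type = "name" reports which measure generalME selects.

library(qualV)
data(phyto)
obsb <- na.omit(obs[match(sim$t, obs$t), ])
simb <- sim[na.omit(match(obs$t, sim$t)), ]
generalME(obsb$y, simb$y, ignore = "centered", geometry = "real",
          measure = "sd", type = "name")
RCMSE(obsb$y, simb$y)   # the directly called measure for comparison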
generalME: selects the best deviance measure according to the description given in the parameters. It has the two additional possibilities of "name" and "function" in the type parameter.
MAE: mean absolute error
MAPE: mean absolute percentage error
MSE: mean squared error
RMSE: root mean squared error
CMAE: centered mean absolute error
CMSE: centered mean squared error
RCMSE: root centered mean squared error
SMAE: scaled mean absolute error
SMSE: scaled mean squared error
RSMSE: root scaled mean squared error
MALE: mean absolute logarithmic error
MAGE: mean absolute geometric error
MSLE: mean squared logarithmic error
MSGE: mean squared geometric error
RMSLE: root mean squared logarithmic error
SMALE: scaled mean absolute logarithmic error
SMAGE: scaled mean absolute geometric error
SMSLE: scaled mean squared logarithmic error
RSMSLE: root scaled mean squared logarithmic error
RSMSGE: root scaled mean squared geometric error
MAOE: mean absolute ordinal error
MSOE: mean squared ordinal error
RMSOE: root mean squared ordinal error
Mayer, D. G. and Butler, D. G. (1993) Statistical Validation. Ecological Modelling, 68, 21-32.
Jachner, S., van den Boogaart, K.G. and Petzoldt, T. (2007) Statistical methods for the qualitative assessment of dynamic models with time delay (R package qualV), Journal of Statistical Software, 22(8), 1–30. doi:10.18637/jss.v022.i08.
data(phyto)
obsb <- na.omit(obs[match(sim$t, obs$t), ])
simb <- sim[na.omit(match(obs$t, sim$t)), ]
o <- obsb$y
p <- simb$y

generalME(o, p, ignore = "raw", geometry = "real")

MAE(o, p)
MAPE(o, p)
MSE(o, p)
RMSE(o, p)
CMAE(o, p)
CMSE(o, p)
RCMSE(o, p)
SMAE(o, p)
SMSE(o, p)
RSMSE(o, p)
MALE(o, p)
MAGE(o, p)
RMSLE(o, p)
RMSGE(o, p)
SMALE(o, p)
SMAGE(o, p)
SMSLE(o, p)
RSMSLE(o, p)
RSMSGE(o, p)
MAOE(o, p)
MSOE(o, p)
RMSOE(o, p)

MAE(o, p)
MAPE(o, p)
MSE(o, p, type = "s")
RMSE(o, p, type = "s")
CMAE(o, p, type = "s")
CMSE(o, p, type = "s")
RCMSE(o, p, type = "s")
SMAE(o, p, type = "s")
SMSE(o, p, type = "s")
RSMSE(o, p, type = "s")
MALE(o, p, type = "s")
MAGE(o, p, type = "s")
RMSLE(o, p, type = "s")
RMSGE(o, p, type = "s")
SMALE(o, p, type = "s")
SMAGE(o, p, type = "s")
SMSLE(o, p, type = "s")
RSMSLE(o, p, type = "s")
RSMSGE(o, p, type = "s")
MAOE(o, p, type = "s")
MSOE(o, p, type = "s")
RMSOE(o, p, type = "s")
Dividing time series into interval sequences of qualitative features and determining the similarity of the qualitative behavior by means of the length of the LCS.
qvalLCS(o, p,
        o.t = seq(0, 1, length.out = length(o)),
        p.t = seq(0, 1, length.out = length(p)),
        smooth = c("none", "both", "obs", "sim"),
        feature = c("f.slope", "f.curve", "f.steep", "f.level"))

## S3 method for class 'qvalLCS'
print(x, ...)

## S3 method for class 'qvalLCS'
plot(x, y = NULL, ...,
     xlim = range(c(x$obs$x, x$sim$x)),
     ylim = range(c(x$obs$y, x$sim$y)),
     xlab = "time", ylab = " ",
     col.obs = "black", col.pred = "red",
     plot.title = paste("LLCS =", x$lcs$LLCS, ", QSI =", x$lcs$QSI),
     legend = TRUE)

## S3 method for class 'qvalLCS'
summary(object, ...)
o: vector of observed values
p: vector of predicted values
o.t: vector of observation times
p.t: vector of times for predicted values
smooth: character string deciding whether values should be smoothed before validation, one of "none", "both", "obs" or "sim"; default is no smoothing,
feature: one of "f.slope", "f.curve", "f.steep" or "f.level", the feature used to build the interval sequences,
x: a result from a call to qvalLCS,
y: unused,
...: further parameters to be passed to the underlying plotting functions,
xlim: the size of the plot in x-direction,
ylim: the size of the plot in y-direction,
xlab: the label of the x-axis of the plot,
ylab: the label of the y-axis of the plot,
col.obs: color to plot the observations,
col.pred: color to plot the predictions,
plot.title: title for the plot,
legend: logical; whether a legend is added to the plot,
object: a result from a call to qvalLCS.
Common quantitative deviance measures underestimate the similarity of patterns if there are shifts in time between measurement and simulation. These methods also assume comparable values in each time series over the whole time sequence. To compare values independently of time, the qualitative behavior of the time series can be analyzed. Here the time series are divided into interval sequences of their local shape. The comparison is carried out on the basis of these segments and not on the original time series, so shifts in time are possible, i.e. missing or additional segments are acceptable without losing similarity. The dynamic programming algorithm of the longest common subsequence (LCS) is used to determine QSI as an index of similarity of the patterns.
If selected, the data are smoothed using a weighted average and a Gaussian curve as kernel. The bandwidth is automatically selected based on the plug-in methodology (dpill, see package KernSmooth for more details).
print prints only the requested value, without additional information.
summary prints all the additional information.
plot shows a picture visualizing an LCS.
The result is an object of type qvalLCS with the following entries:
smooth: smoothing parameter
feature: feature parameter
o: xy-table of observed values
p: xy-table of predicted values
obs: xy-table of (smoothed) observed values
sim: xy-table of (smoothed) simulated values
obsf: interval sequence of the observations according to the selected feature
simf: interval sequence of the simulation according to the selected feature
lcs: output of LCS
obs.lcs: one possible LCS of the observed interval sequence
sim.lcs: one possible LCS of the simulated interval sequence
Agrawal R., K. Lin., H. Sawhney and K. Shim (1995). Fast similarity search in the presence of noise, scaling, and translation in time-series databases. In VLDB '95: Proceedings of the 21. International Conference on Very Large Data Bases, pp. 490-501. Morgan Kaufmann Publishers Inc. ISBN 1-55860-379-4.
Cuberos F., J. Ortega, R. Gasca, M. Toro and J. Torres (2002). Qualitative comparison of temporal series - QSI. Topics in Artificial Intelligence. Lecture Notes in Artificial Intelligence, 2504, 75-87.
Jachner, S., K.G. v.d. Boogaart, T. Petzoldt (2007) Statistical methods for the qualitative assessment of dynamic models with time delay (R package qualV), Journal of Statistical Software, 22(8), 1–30. doi:10.18637/jss.v022.i08.
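A minimal sketch of the workflow described above, using a constructed time-shifted series; the values and the shift are made up for illustration, and the resulting QSI will vary with the random noise:

library(qualV)
set.seed(42)
x <- seq(0, 2 * pi, 0.1)
o <- 5 + sin(x) + rnorm(length(x), sd = 0.2)  # noisy observation
p <- 5 + sin(x - 0.5)                         # time-shifted simulation
qv <- qvalLCS(o, p, smooth = "obs", feature = "f.slope")
qv$lcs$QSI     # similarity of the two interval sequences
head(qv$obsf)  # interval sequence of the (smoothed) observations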
# a constructed example
x <- seq(0, 2*pi, 0.1)
y <- 5 + sin(x)              # a process
o <- y + rnorm(x, sd = 0.2)  # observation with random error
p <- y + 0.1                 # simulation with systematic bias

qvalLCS(o, p)
qvalLCS(o, p, smooth = "both", feature = "f.curve")
qv <- qvalLCS(o, p, smooth = "obs")
print(qv)
plot(qv, ylim = c(3, 8))

# observed and measured data with non-matching time steps
data(phyto)
qvlcs <- qvalLCS(obs$y, sim$y, obs$t, sim$t, smooth = "obs")

basedate <- as.Date("1960/1/1")
qvlcs$o$x       <- qvlcs$o$x + basedate
qvlcs$obs$x     <- qvlcs$obs$x + basedate
qvlcs$sim$x     <- qvlcs$sim$x + basedate
qvlcs$obs.lcs$x <- qvlcs$obs.lcs$x + basedate
qvlcs$sim.lcs$x <- qvlcs$sim.lcs$x + basedate

plot.qvalLCS(qvlcs)
summary(qvlcs)
Various function models for isotonic bijective transformations of a time interval onto itself.
transBeta(x, p, interval = c(0, 1), inv = FALSE,
          pmin = -3, pmax = 3, p0 = c(0, 0))
transSimplex(x, p, interval = c(0, 1), inv = FALSE,
             pmin = -2, pmax = 2, p0 = c(0, 0, 0, 0, 0))
transBezier(x, p, interval = c(0, 1), inv = FALSE,
            pmin = 0, pmax = 1, p0 = c(0.25, 0.25, 0.75, 0.75))
x: a vector of values to be transformed,
p: the vector of parameters for the transformation,
interval: a vector of length 2 giving the minimum and maximum value in the transformation interval,
inv: a boolean; if true the inverse transform is computed,
pmin: a number or a vector giving the minimal useful value for the parameters. This information is not used by the function itself, but rather provides meta information about the function, used e.g. in timeTransME,
pmax: provides, similar to pmin, the maximal useful value for the parameters,
p0: provides, similar to pmin, the parameters of the identity transformation.
transBeta: The transformation provided is the distribution function of the Beta distribution with parameters exp(p[1]) and exp(p[2]), scaled to the given interval. This function is guaranteed to be strictly isotonic for every choice of p; p has length 2. The strength of the Beta transformation is its reasonable behavior for strong time deformations.
transSimplex: The transformation provides a simple linear interpolation. The interval is separated into equidistant time spans, which are transformed to non-equidistant lengths. The length of the new time spans is proportional to exp(c(p, 0)). This function is guaranteed to be strictly isotonic for every choice of p; p can have any length. The strength of the Simplex transformation is the possibility to have totally different speeds at different times.
transBezier: The transformation is provided by a Bezier curve of order length(p) / 2 + 1. The first and last control points are given by c(0, 0) and c(1, 1), and the intermediate control points are given by p[c(1, 2) + 2 * i - 2]. This function is not guaranteed to be isotonic for length(p) > 4; however, it seems useful. A major theoretical advantage is that this model is symmetric between image and coimage. The strength of the Bezier transformation is the fine-tuning of the transformation.
The value is a vector of the same length as x providing the transformed values.
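A small sketch checking the inverse transform described by the inv argument; the parameter values are arbitrary, and equality holds only up to numerical tolerance:

library(qualV)
t <- seq(0, 1, length.out = 11)
p <- c(0.5, -0.5)                  # an arbitrary parameter choice
tr   <- transBeta(t, p)            # isotonic transformation of [0, 1]
back <- transBeta(tr, p, inv = TRUE)
all.equal(back, t)                 # inv = TRUE should undo the transformation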
t <- seq(0, 1, length.out = 101)
par(mfrow = c(3, 3))
plot(t, transBeta(t, c(0, 0)), type = "l")
plot(t, transBeta(t, c(0, 1)), type = "l")
plot(t, transBeta(t, c(-1, 1)), type = "l")
plot(t, transSimplex(t, c(0)), type = "l")
plot(t, transSimplex(t, c(3, 2, 1)), type = "l")
plot(t, transSimplex(t, c(0, 2)), type = "l")
plot(t, transBezier(t, c(0, 1)), type = "l")
plot(t, transBezier(t, c(0, 1, 1, 0)), type = "l")
plot(t, transBezier(t, c(0.4, 0.6)), type = "l")
Transforming the time of predicted values by means of a monotonic mapping.
timeTransME(o, p,
            o.t = seq(0, 1, length.out = length(o)),
            p.t = seq(0, 1, length.out = length(p)),
            ignore = "scaled", geometry = "real", measure = "mad",
            type = c("dissimilarity", "normalized", "similarity", "reference"),
            interval = range(c(o.t, p.t)),
            time = c("transformed", "fixed"),
            trans = transBeta,
            p0 = eval(formals(trans)$p0),
            pmin = eval(formals(trans)$pmin, list(p = p0)),
            pmax = eval(formals(trans)$pmax, list(p = p0)),
            timeMEFactor = 0, timeME = MAE, timeMEtype = "normalized",
            timeScale = 1,
            ME = generalME(o, p, ignore, geometry, measure, type = "function"),
            MEtype = c("dissimilarity", "normalized"),
            trials = 100, debug = FALSE)

## S3 method for class 'timeTransME'
print(x, ..., digits = 3)

## S3 method for class 'timeTransME'
summary(object, ...)

## S3 method for class 'timeTransME'
plot(x, y = NULL, ..., col.obs = "black", col.pred = "green",
     col.map = "red", sub = x$call, xlab = "t",
     xlim = range(x$x),
     ylim = range(c(0, x$yo, x$yp)))
x: a result from a call to timeTransME,
object: a result from a call to timeTransME,
o: vector of observed values,
p: vector of predicted values,
o.t: vector of observation times,
p.t: vector of times for predicted values,
ignore: one of "raw", "centered", "scaled" or "ordered" (see generalME), default "scaled",
geometry: one of "real", "logarithmic", "geometric" or "ordinal" (see generalME), default "real",
measure: one of "mad", "var" or "sd" (see generalME), default "mad",
type: one of "dissimilarity", "normalized", "similarity" or "reference", the form in which the deviance is reported,
interval: a vector with two entries giving start and end time of the experiment,
time: indicates whether the time should actually be transformed. LCS is currently not implemented; use the LCS method directly,
trans: the model function for the time transformation, see transBeta, transSimplex and transBezier,
p0: the identity parameters for the time transformation. A non-identity value can be given to force specific parameters for the transformation,
pmin: number or vector providing the minimal allowed values for the parameters of the transformation,
pmax: number or vector providing the maximal allowed values for the parameters of the transformation,
timeME: the deviance measure applied to the time deformation itself; the criterion ME(o(x), p(trans(x, timep)), MEtype) + timeMEFactor * timeME(x * timeScale, trans(x, timep) * timeScale, timeMEtype) is minimized over the transformation parameters timep,
timeMEtype: the type of deviance measure ("dissimilarity" or "normalized") to be used for timeME,
timeMEFactor: a real value specifying the weighting of the time deformation against the value deformation. A value of 0 avoids any penalty for time deformation,
timeScale: a scaling applied to the time values before timeME is computed,
ME: the deviance function to be used for the data, see generalME,
MEtype: the type of mean error to be used in the calculations. This is not the type of measure to be reported,
trials: the number of random starting values to be used during the optimization of the time transformation. The optimization of the time transformation is a very critical task of this procedure, and practical tests have shown that a single local optimization typically fails to find the globally best fit. Depending on the number of parameters, a value between 100 and 10000 seems reasonable,
debug: a logical; if true, some diagnostic information for the optimization step is printed,
...: further parameters to be passed to the underlying methods,
col.obs: color to plot the observations,
col.pred: color to plot the predictions,
col.map: color to plot the mapped predictions,
sub: the sub-headline of the plot,
xlab: the label of the x-axis of the plot,
xlim: the size of the plot in x-direction,
ylim: the size of the plot in y-direction,
y: unused,
digits: number of significant digits displayed.
Common quantitative deviance measures underestimate the similarity of patterns if there are shifts in time between measurement and simulation. An alternative way to measure model performance independently of shifts in time is to transform the time of the simulation, i.e. to run the time faster or slower, and to compare the performance before and after the transformation. The applied transformation function must be monotonic. timeTransME minimizes the joint criterion ME(o(x), p(trans(x, timep)), MEtype) + timeMEFactor * timeME(x * timeScale, trans(x, timep) * timeScale, timeMEtype) to find a best fitting time transformation.
print.timeTransME prints only the requested value, without additional information.
summary.timeTransME prints all the additional information.
plot.timeTransME shows a picture visualising the fit of the transformed dataset. This can be used as a diagnostic.
The result is an object of type timeTransME with the following entries:
totalME: the requested measure with the specified type,
criterium: the "dissimilarity" measure, calculated as the minimum of ME(o(x), p(trans(x, timep)), MEtype) + timeMEFactor * timeME(x * timeScale, trans(x, timep) * timeScale, timeMEtype),
reference: the reference value of this criterion achieved without time deformation and full dissimilarity,
call: the call used to generate this deviance,
x: the times at which the series were compared, from the perspective of the observations,
xp: the transformed times at which the series were compared, from the perspective of the prediction,
yo: the interpolated values of the observations at times x,
yp: the interpolated values of the time-transformed predictions at times x,
timeME: the deviance of the time transformation: timeME(x * timeScale, trans(x, timep) * timeScale, timeMEtype),
timeMEref: the reference value of timeME,
timeMEFactor: the factor used to weight timeME with respect to ME,
timeScale: the scaling applied to time to account for a different unit,
p: the parameter of trans minimizing the criterion,
interval: the interval of time under consideration,
trans: the transformation function used for the time,
optim: contains information about the convergence of the optimization procedure and a list of secondary minima found. This additional list element occurs only if a minimisation is actually performed.
The deviance calculated by timeTransME(..., time = "fixed") and the corresponding deviance measure are different, because timeTransME does an interpolation and compares time sequences at a different spacing, while a simple deviance measure compares values only.
The CPU usage of the calculation of the minimum with time = "transformed" is very high, because the optimization is done a hundred times with random starting values for the parameters. This is necessary since, with the given objective, general purpose optimizers often run into local minima and/or do not converge. The number of iterations can be controlled with the parameter trials. Setting debug = TRUE gives an impression of how long it takes to find an improved optimum.
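The effect of timeMEFactor described above can be explored with a small constructed example; this is only a sketch, with trials kept low to reduce run time, so results vary between runs:

library(qualV)
set.seed(1)
x <- seq(0, 2 * pi, length.out = 20)
o <- 5 + sin(x) + rnorm(20, sd = 0.1)  # observation with random error
p <- 5 + sin(x - 0.8)                  # simulation with time shift
# a positive timeMEFactor penalizes the time deformation itself, trading
# value agreement against the amount of time warping
t0 <- timeTransME(o, p, trials = 5)
t1 <- timeTransME(o, p, trials = 5, timeMEFactor = 0.5)
c(no_penalty = t0$criterium, with_penalty = t1$criterium)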
set.seed(123)

## a constructed example
x <- seq(0, 2*pi, length = 10)
o <- 5 + sin(x) + rnorm(x, sd = 0.2)  # observation with random error
p <- 5 + sin(x - 1)                   # simulation with time shift

# timeTransME(o, p)                       # reasonably accurate but takes very long!
# timeTransME(o, p, trials = 5, debug = TRUE)
ttbeta <- timeTransME(o, p, trials = 5)
plot(ttbeta)

## Not run:
ttsimplex <- timeTransME(o, p, trans = transSimplex, trials = 5)
plot(ttsimplex)
ttbezier <- timeTransME(o, p, trans = transBezier, trials = 5)
plot(ttbezier)
## End(Not run)

## observed and measured data with non-matching time intervals
data(phyto)
bbobs <- dpill(obs$t, obs$y)
n <- diff(range(obs$t)) + 1
obss <- ksmooth(obs$t, obs$y, kernel = "normal", bandwidth = bbobs, n.points = n)
names(obss) <- c("t", "y")
obss <- as.data.frame(obss)[match(sim$t, obss$t), ]

tt <- timeTransME(obss$y, sim$y, obss$t, sim$t, ME = SMSE,
                  timeMEFactor = 0, time = "transform", type = "n", trials = 5)
round(tt$totalME, digits = 3)

basedate <- as.Date("1960/1/1")
plot(basedate + sim$t, sim$y, type = "l",
     ylim = c(min(obs$y, sim$y), max(obs$y, sim$y)),
     xlab = "time", ylab = "Phytoplankton (mg/L)",
     col = 2, font = 2, lwd = 2, cex.lab = 1.2, las = 1)
lines(basedate + obss$t, obss$y, lwd = 2)
points(basedate + obs$t, obs$y, lwd = 2)
lines(basedate + tt$x, tt$yp, lwd = 2, col = 2, lty = 2)
legend(basedate + 12600, 50,
       c("measurement", "smoothed measurement",
         "simulation", "transformed simulation"),
       lty = c(0, 1, 1, 2), pch = c(1, NA, NA, NA),
       lwd = 2, col = c(1, 1, 2, 2))

tt1 <- timeTransME(obs$y, sim$y, obs$t, sim$t, ME = SMSLE,
                   type = "n", time = "fixed")
tt1
plot(tt1)
summary(tt1)

## Not run:
tt2 <- timeTransME(obss$y, sim$y, obss$t, sim$t, ME = SMSLE,
                   type = "n", time = "trans", debug = TRUE)
tt2
plot(tt2)   # logarithm (SMSLE) is not appropriate for the example
summary(tt2)

tt3 <- timeTransME(obss$y, sim$y, obss$t, sim$t, ME = SMSE,
                   type = "n", time = "trans", trans = transBezier, debug = TRUE)
tt3
plot(tt3)
summary(tt3)

tt4 <- timeTransME(obss$y, sim$y, obss$t, sim$t, ME = MSOE,
                   type = "n", time = "trans", trans = transBezier, debug = TRUE)
tt4
plot(tt4)
summary(tt4)
## End(Not run)