Title: | Soft Classification Performance Measures |
---|---|
Description: | An extension of sensitivity, specificity, positive and negative predictive value to continuous predicted and reference memberships in [0, 1]. |
Authors: | C. Beleites <[email protected]> |
Maintainer: | C. Beleites <[email protected]> |
License: | GPL |
Version: | 1.0-20160527 |
Built: | 2024-11-10 05:45:11 UTC |
Source: | https://github.com/r-forge/softclassval |
Extension of sensitivity, specificity, positive and negative predictive value to continuous predicted and reference memberships in [0, 1].
C. Beleites
Checks whether `r` and `p` are valid reference and predictions. If the length of `p` is a multiple of the length of `r`, `r` is recycled to the size and shape of `p`. If `r` has additional length-1 dimensions (usually because dimensions were dropped from `p`), it is shortened to the shape of `p`.
checkrp(r, p)
r | reference |
p | prediction |
In addition, any `NA`s in `p` are transferred to `r` so that these samples are excluded from counting in `nsamples`.
`checkrp` is automatically called by the performance functions, but doing so beforehand and then setting `.checked = TRUE` can save time when several performance measures are to be calculated on the same results.
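The recycling and `NA` transfer described above can be illustrated in base R. This is only a minimal sketch, not the package's code: `recycle_reference` is a hypothetical helper, and it assumes that `p` is a matrix or array whose length is a multiple of the length of `r`.

```r
## Hypothetical stand-in for the checks described above (not the package code).
## Assumes p is a matrix or array and length(p) is a multiple of length(r).
recycle_reference <- function(r, p) {
  stopifnot(length(p) %% length(r) == 0)
  ## recycle r to the size and shape of p
  r <- array(rep(r, length.out = length(p)), dim = dim(p), dimnames = dimnames(p))
  ## transfer NAs: samples without prediction must not be counted
  r[is.na(p)] <- NA
  r
}

## 2 samples x 2 classes x 2 repetitions of predictions, one of them missing
p <- array(c(0.9, 0.2, NA, 0.7,
             0.8, 0.1, 0.5, 0.6), dim = c(2, 2, 2))
## reference given only once, as a 2 x 2 matrix
r <- matrix(c(1, 0, 0, 1), nrow = 2)

r2 <- recycle_reference(r, p)
dim(r2)      # same shape as p
r2[, , 1]    # NA where the prediction is NA
```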
`r`, possibly recycled to the length of `p` or with dimensions shortened to those of `p`.
Claudia Beleites
    ref <- softclassval:::ref
    ref
    pred <- softclassval:::pred
    pred

    ref <- checkrp (r = ref, p = pred)
    sens (r = ref, p = pred, .checked = TRUE)
These performance measures can be used with prediction and reference being continuous class memberships in [0, 1].
Calculate the soft confusion matrix
    confusion(r = stop("missing reference"), p = stop("missing prediction"),
              groups = NULL, operator = "prd", drop = FALSE, .checked = FALSE)

    confmat(r = stop("missing reference"), p = stop("missing prediction"), ...)

    sens(r = stop("missing reference"), p = stop("missing prediction"),
         groups = NULL, operator = "prd",
         op.dev = dev(match.fun(operator)),
         op.postproc = postproc(match.fun(operator)),
         eps = 1e-08, drop = FALSE, .checked = FALSE)

    spec(r = stop("missing reference"), p = stop("missing prediction"), ...)

    ppv(r = stop("missing reference"), p = stop("missing prediction"), ...,
        .checked = FALSE)

    npv(r = stop("missing reference"), p = stop("missing prediction"), ...,
        .checked = FALSE)
r | vector, matrix, or array with reference. |
p | vector, matrix, or array with predictions |
groups | grouping variable for the averaging by |
operator | the |
drop | should the results possibly be returned as vector instead of 1d array? (Note that levels of |
.checked | for internal use: the inputs are guaranteed to be of same size and shape. If |
... | handed to |
op.dev | does the operator measure deviation? |
op.postproc | if a post-processing function is needed after averaging, it can be given here. See the example. |
eps | limit below which denominator is considered 0 |
The rows of `r` and `p` are considered the samples, columns will usually hold the classes, and further dimensions are preserved but ignored.
`r` must have the same number of rows and columns as `p`; all other dimensions may be filled by recycling.
`spec`, `ppv`, and `npv` use the symmetry between the performance measures as described in the article and call `sens`.
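To make the symmetry concrete, here is a minimal base R sketch for the product operator `r * p`, without grouping or `NA` handling. The helper names (`soft_sens` and so on) are hypothetical, and the three relations are an assumption derived from the usual hard definitions (specificity is the sensitivity for the complements, predictive values swap the roles of reference and prediction); the package's own code may differ in detail.

```r
## Hypothetical sketch: sensitivity as averaged operator value, weighted by the reference.
soft_sens <- function(r, p) colSums(r * p) / colSums(r)

## Assumed symmetry relations, all expressed through soft_sens:
soft_spec <- function(r, p) soft_sens(r = 1 - r, p = 1 - p)  # complements
soft_ppv  <- function(r, p) soft_sens(r = p,     p = r)      # roles swapped
soft_npv  <- function(r, p) soft_sens(r = 1 - p, p = 1 - r)  # both

## 4 samples (rows), 2 classes (columns); the last sample is a soft reference
r <- cbind(A = c(1,   1, 0,   0.5), B = c(0,   0, 1,   0.5))
p <- cbind(A = c(0.8, 1, 0.2, 0.5), B = c(0.2, 0, 0.8, 0.5))

soft_sens(r, p)
soft_spec(r, p)
soft_ppv(r, p)
soft_npv(r, p)
```

In this two-class example the memberships of each sample add up to 1, so the specificity for class A coincides with the sensitivity for class B.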
numeric of size (ngroups x `dim (p) [-1]`) with the respective performance measure
Claudia Beleites
see the literature in citation ("softclassval")
Operators: prd
For the complete confusion matrix, confmat
    ref <- softclassval:::ref
    ref
    pred <- softclassval:::pred
    pred

    ## Single elements or diagonal of confusion matrix
    confusion (r = ref, p = pred)

    ## complete confusion matrix
    cm <- confmat (r = softclassval:::ref, p = pred) [1,,]
    cm

    ## Sensitivity-Specificity matrix:
    cm / rowSums (cm)

    ## Matrix with predictive values:
    cm / rep (colSums (cm), each = nrow (cm))

    ## sensitivities
    sens (r = ref, p = pred)

    ## specificities
    spec (r = ref, p = pred)

    ## predictive values
    ppv (r = ref, p = pred)
    npv (r = ref, p = pred)
The operators measure either a performance (i.e. accordance between reference and prediction) or a deviation. `dev (op) == TRUE` marks operators measuring deviation.
    dev(op)
    dev (op) <- value
op | the operator (function) |
value | logical indicating the operator type |
logical indicating the type of operator. `NULL` if the attribute is missing.
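Since a missing attribute yields `NULL`, the flag appears to be stored as an attribute on the operator function. The following base R sketch shows how such a getter and replacement function can be built; `is_dev`, `is_dev<-` and the attribute name `"dev"` are assumptions made for illustration, not the package's internals.

```r
## Hypothetical accessor pair; the attribute name "dev" is an assumption.
is_dev <- function(op) attr(op, "dev")

`is_dev<-` <- function(op, value) {
  stopifnot(is.logical(value), length(value) == 1)
  attr(op, "dev") <- value
  op                      # replacement functions must return the modified object
}

myop <- function(r, p) r * abs(r - p)

is_dev(myop)              # NULL: no attribute attached yet
is_dev(myop) <- TRUE
is_dev(myop)              # TRUE
myop(1, 0.75)             # the operator itself is unchanged
```

Keeping the flag on the function means that code receiving the operator can look up its type without a separate lookup table.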
Claudia Beleites
    dev (wRMSE)

    myop <- function (r, p) p * (r == 1)
    dev (myop) <- TRUE
Converts a factor with hard class memberships into a membership matrix
factor2matrix(f)
f | factor with class labels |
matrix of size `length (f)` x `nlevels (f)`
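As an illustration of the conversion, the following base R sketch builds a membership matrix of this size from a factor. `factor_to_membership` is a hypothetical re-implementation for demonstration and not the package's code; the column naming in particular is an assumption.

```r
## Hypothetical re-implementation: one row per sample, one column per level.
factor_to_membership <- function(f) {
  ## compare each sample's level code with every level index; * 1 turns logical into 0/1
  m <- outer(as.integer(f), seq_len(nlevels(f)), "==") * 1
  colnames(m) <- levels(f)
  m
}

f <- factor(c("a", "b", "a", "c"))
m <- factor_to_membership(f)
m

## going back: pick the column with membership 1 in each row
back <- factor(colnames(m)[max.col(m, ties.method = "first")], levels = colnames(m))
back
```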
Claudia Beleites
`hardclasses` for the inverse
The operators may work only for hard classes (see `and`). `hard (op) == TRUE` marks hard operators.
    hard(op)
    hard (op) <- value
op | the operator (function) |
value | logical indicating the operator type |
logical indicating the type of operator. `NULL` if the attribute is missing.
Claudia Beleites
    hard (and)

    myop <- function (r, p) p * (r == 1)
    hard (myop) <- TRUE
`hardclasses` converts the soft class labels in `x` into a factor with hard class memberships and `NA` for soft samples.
    hardclasses(x, classdim = 2L, soft.name = NA, tol = 1e-05, drop = TRUE)

    harden(x, classdim = 2L, tol = 1e-06, closed = TRUE)
x | matrix or array holding the class memberships |
classdim | dimension that holds the classes, default columns |
soft.name | level for soft samples |
tol | tolerance: samples with membership >= 1 - tol are considered to be hard samples of the respective class. |
drop | see |
closed | logical indicating whether the system should be treated as closed-world (i.e. all memberships add to 1) |
`harden` hardens the soft
factor array of shape `dim (x) [-classdim]`
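The tolerance rule for `tol` (membership >= 1 - tol counts as hard) can be sketched in base R for the simple case of a matrix with classes in columns. `hard_labels` is a hypothetical helper written for this illustration; it does not reproduce `classdim`, `soft.name`, `drop`, or the closed-world handling of the package functions.

```r
## Hypothetical sketch for a matrix x with samples in rows and classes in columns.
hard_labels <- function(x, tol = 1e-5) {
  is_hard <- x >= 1 - tol                 # membership close enough to 1
  idx <- apply(is_hard, 1, function(row) {
    hit <- which(row)
    ## exactly one class qualifies: hard sample; otherwise treat as soft
    if (length(hit) == 1) hit else NA_integer_
  })
  factor(colnames(x)[idx], levels = colnames(x))
}

x <- rbind(c(1, 0), c(0.3, 0.7), c(0, 1))
colnames(x) <- c("A", "B")

hard_labels(x)              # second sample is soft and becomes NA
hard_labels(x, tol = 0.5)   # threshold at 0.5: second sample is assigned to B
```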
Claudia Beleites
`factor2matrix` for the inverse
    softclassval:::pred
    harden (softclassval:::pred)
    harden (softclassval:::pred, closed = FALSE)

    ## classical threshold at 0.5
    harden (softclassval:::pred, tol = 0.5)

    ## grey zone: NA for memberships between 0.25 and 0.75
    harden (softclassval:::pred, tol = 0.25)

    ## threshold at 0.7 = 0.5 + 0.2:
    harden (softclassval:::pred - 0.2, tol = 0.5)
    harden (softclassval:::pred - 0.2, tol = 0.5, closed = FALSE)
Count number of samples
nsamples(r = r, groups = NULL, operator = "prd", hard.operator)
r | reference class labels with samples in rows. |
groups | grouping variable for the averaging by |
operator | the |
hard.operator | optional: a logical determining whether only hard samples should be counted |
Basically, the reference is summed up. For hard operators, the reference is hardened first: soft values, i.e. `r` in (0, 1), are set to NA.
number of samples in each group (rows) for each class (columns) and all further dimensions of ref.
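A minimal base R sketch of this counting rule, without grouping, might look as follows. `count_samples` and its `hard` argument are hypothetical names for illustration; the assumption that `NA` references simply drop out of the sum follows the description above but is not taken from the package code.

```r
## Hypothetical sketch: sum the reference memberships per class (column).
count_samples <- function(r, hard = FALSE) {
  ## for hard operators, soft reference values in (0, 1) are set to NA first
  if (hard) r[which(r > 0 & r < 1)] <- NA
  colSums(r, na.rm = TRUE)   # NA references are not counted
}

## 4 samples, 2 classes; the last sample is soft
r <- cbind(A = c(1, 1, 0, 0.5), B = c(0, 0, 1, 0.5))

count_samples(r)                # soft sample contributes 0.5 to each class
count_samples(r, hard = TRUE)   # soft sample is ignored
```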
Claudia Beleites
    ref <- softclassval:::ref
    ref
    nsamples (ref)
    nsamples (ref, hard.operator = TRUE)
The postprocessing function is applied during performance calculation after averaging but before `dev` is applied. This is the place where the root is taken of root mean squared errors.
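The order of the steps matters because the root of an average differs from the average of roots. The following base R sketch walks through the steps for a wRMSE-like measure on a small made-up example. The final step, turning the deviation into a performance value via `1 - x`, is an assumption about what applying `dev` means and is not stated in this documentation.

```r
## Made-up reference and prediction memberships for one class
r <- c(1, 1, 0.5, 0)
p <- c(0.9, 0.6, 0.5, 0.2)

weighted_sq <- r * (r - p)^2              # 1. operator, applied element-wise
averaged    <- sum(weighted_sq) / sum(r)  # 2. averaging over the samples
rooted      <- sqrt(averaged)             # 3. post-processing after averaging
performance <- 1 - rooted                 # 4. ASSUMPTION: deviation converted last

## taking the root before averaging would give a different number
root_first <- sum(sqrt(weighted_sq)) / sum(r)

c(averaged = averaged, rooted = rooted, root_first = root_first,
  performance = performance)
```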
    postproc(op)
    postproc (op) <- value
op | the operator (function) |
value | function (or its name or symbol) to do the post-processing. |
`postproc (op)` retrieves the postprocessing function (or `NULL` if none is attached)
logical indicating the type of operator. `NA` if the attribute is missing.
Claudia Beleites
    postproc (wRMSE)

    myop <- function (r, p) p * (r == 1)
    postproc (myop) <- `sqrt`
Run the unit tests attached to the functions via svUnit
softclassval.unittest()
invisibly `TRUE` if the tests pass, `NA` if svUnit is not available. Stops if errors are encountered.
Claudia Beleites
And operators for the soft performance calculation. The predefined operators are:
Name | Definition | dev? | postproc? | hard? | Explanation |
---|---|---|---|---|---|
gdl | pmin (r, p) | FALSE | | FALSE | the Gödel-operator (weak conjunction) |
luk | pmax (r + p - 1, 0) | FALSE | | FALSE | Łukasiewicz-operator (strong conjunction) |
prd | r * p | FALSE | | FALSE | product operator |
and | r * p | FALSE | | TRUE | Boolean conjunction: accepts only 0 or 1, otherwise yields NA |
wMAE | r * abs (r - p) | TRUE | | FALSE | for weighted mean absolute error |
wRMAE | r * abs (r - p) | TRUE | sqrt | FALSE | for weighted root mean absolute error (bound for RMSE) |
wMSE | r * (r - p)^2 | TRUE | | FALSE | for weighted mean squared error |
wRMSE | r * (r - p)^2 | TRUE | sqrt | FALSE | for root weighted mean squared error |
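The tabulated definitions can be tried out without the package by copying them into stand-alone functions. The `op_` names below are hypothetical and chosen only to avoid clashing with the package's own operators; attributes such as `dev`, `hard` and `postproc` are left out of this sketch.

```r
## Stand-alone copies of the tabulated definitions (hypothetical names).
op_gdl  <- function(r, p) pmin(r, p)            # weak conjunction
op_luk  <- function(r, p) pmax(r + p - 1, 0)    # strong conjunction
op_prd  <- function(r, p) r * p                 # product
op_wMAE <- function(r, p) r * abs(r - p)        # weighted absolute error
op_wMSE <- function(r, p) r * (r - p)^2         # weighted squared error

r <- 0.7
p <- 0.8
res <- c(gdl = op_gdl(r, p), luk = op_luk(r, p), prd = op_prd(r, p),
         wMAE = op_wMAE(r, p), wMSE = op_wMSE(r, p))
res

## the deviation operators are weighted by the reference and thus not commutative
op_wMAE(1, 0.5)
op_wMAE(0.5, 1)
```

For memberships strictly between 0 and 1 the three conjunction operators give different values, here ordered as luk <= prd <= gdl.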
    strong(r, p)
    luk(r, p)
    weak(r, p)
    gdl(r, p)
    prd(r, p)
    and(r, p)
    wMAE(r, p)
    wRMAE(r, p)
    wMSE(r, p)
    wRMSE(r, p)
r | reference vector, matrix, or array with numeric values in [0, 1], for |
p | prediction vector, matrix, or array with numeric values in [0, 1], for |
numeric of the same size as p
Claudia Beleites
see the literature in citation ("softclassval")
Performance measures: sens
    ops <- c ("luk", "gdl", "prd", "and", "wMAE", "wRMAE", "wMSE", "wRMSE")

    ## make a nice table
    lastline <- function (f){
      body <- body (get (f))    ## function body
      body <- deparse (body)
      body [length (body) - 1]  ## last line is closing brace
    }

    data.frame (source = sapply (ops, lastline),
                dev = sapply (ops, function (f) dev (get (f))),
                hard = sapply (ops, function (f) hard (get (f))),
                postproc = I (lapply (ops, function (f) postproc (get (f))))
                )

    x <- softclassval:::v
    x

    luk (0.7, 0.8)

    ## The behaviour of the operators

    ## op (x, 1)
    cbind (x, sapply (c ("luk", "gdl", "prd", "wMAE", "wRMAE", "wMSE", "wRMSE"),
                      function (op, x) get (op) (x, 1), x))

    ## op (x, 0)
    cbind (x, sapply (c ("luk", "gdl", "prd", "wMAE", "wRMAE", "wMSE", "wRMSE"),
                      function (op, x) get (op) (x, 0), x))

    ## op (x, x)
    cbind (x, sapply (c ("luk", "gdl", "prd", "wMAE", "wRMAE", "wMSE", "wRMSE"),
                      function (op, x) get (op) (x, x), x))

    ## Note that the deviation operators are not commutative
    ## (due to the weighting by reference)
    zapsmall (
      cbind (sapply (c ("luk", "gdl", "prd", "wMAE", "wRMAE", "wMSE", "wRMSE"),
                     function (op, x) get (op) (1, x), x)) -
      cbind (sapply (c ("luk", "gdl", "prd", "wMAE", "wRMAE", "wMSE", "wRMSE"),
                     function (op, x) get (op) (x, 1), x))
    )