Title: | Polychoric and Polyserial Correlations |
---|---|
Description: | Computes polychoric and polyserial correlations by quick "two-step" methods or ML, optionally with standard errors; tetrachoric and biserial correlations are special cases. |
Authors: | John Fox [aut, cre], Adrian Dusa [ctb] |
Maintainer: | John Fox <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.8-1 |
Built: | 2024-11-24 06:10:15 UTC |
Source: | https://github.com/r-forge/polycor |
hetcor
computes a heterogenous correlation matrix, consisting of Pearson product-moment
correlations between numeric variables, polyserial correlations between numeric
and ordinal variables, and polychoric correlations between ordinal variables.
The detectCores
function is imported from the parallel package and re-exported.
hetcor(data, ..., ML = FALSE, std.err = TRUE, use=c("complete.obs", "pairwise.complete.obs"), bins=4, pd=TRUE, parallel=FALSE, ncores=detectCores(logical=FALSE), thresholds=FALSE) ## S3 method for class 'data.frame' hetcor(data, ML = FALSE, std.err = TRUE, use = c("complete.obs", "pairwise.complete.obs"), bins=4, pd=TRUE, parallel=FALSE, ncores=detectCores(logical=FALSE), thresholds=FALSE, ...) ## Default S3 method: hetcor(data, ..., ML = FALSE, std.err = TRUE, use=c("complete.obs", "pairwise.complete.obs"), bins=4, pd=TRUE, parallel=FALSE, ncores=detectCores(logical=FALSE), thresholds=FALSE) ## S3 method for class 'hetcor' print(x, digits = max(3, getOption("digits") - 3), ...) ## S3 method for class 'hetcor' as.matrix(x, ...) detectCores(all.tests=FALSE, logical=TRUE)
hetcor(data, ..., ML = FALSE, std.err = TRUE, use=c("complete.obs", "pairwise.complete.obs"), bins=4, pd=TRUE, parallel=FALSE, ncores=detectCores(logical=FALSE), thresholds=FALSE) ## S3 method for class 'data.frame' hetcor(data, ML = FALSE, std.err = TRUE, use = c("complete.obs", "pairwise.complete.obs"), bins=4, pd=TRUE, parallel=FALSE, ncores=detectCores(logical=FALSE), thresholds=FALSE, ...) ## Default S3 method: hetcor(data, ..., ML = FALSE, std.err = TRUE, use=c("complete.obs", "pairwise.complete.obs"), bins=4, pd=TRUE, parallel=FALSE, ncores=detectCores(logical=FALSE), thresholds=FALSE) ## S3 method for class 'hetcor' print(x, digits = max(3, getOption("digits") - 3), ...) ## S3 method for class 'hetcor' as.matrix(x, ...) detectCores(all.tests=FALSE, logical=TRUE)
data |
a data frame consisting of factors, ordered factors, logical variables, character variables, and/or numeric variables, or the first of several variables. |
... |
variables and/or arguments to be passed down. |
ML |
if |
std.err |
if |
bins |
number of bins to use for continuous variables in testing bivariate normality; the default is 4. |
pd |
if |
parallel |
if |
ncores |
the number of cores to use for parallel computations; the default is the number of physical cores detected. |
use |
if |
thresholds |
if |
x |
an object of class |
digits |
number of significant digits. |
all.tests |
logical, apply all known tests; default is |
logical |
if |
hetcor
returns an object of class "hetcor"
with the following components:
correlations |
the correlation matrix. |
type |
the type of each correlation: |
std.errors |
the standard errors of the correlations, if requested. |
n |
the number (or numbers) of observations on which the correlations are based. |
tests |
p-values for tests of bivariate normality for each pair of variables. |
NA.method |
the method by which any missing data were handled: |
ML |
|
thresholds |
optionally, according to the |
Be careful with character variables (as opposed to factors), the values of which are
ordered alphabetically. Thus, e.g., the values "disagree"
, "neutral"
,
"agree"
are ordered "agree"
, "disagree"
, "neutral"
.
Although the function reports standard errors for product-moment correlations, transformations (the most well known is Fisher's z-transformation) are available that make the approach to asymptotic normality much more rapid.
John Fox [email protected]
Drasgow, F. (1986) Polychoric and polyserial correlations. Pp. 68-74 in S. Kotz and N. Johnson, eds., The Encyclopedia of Statistics, Volume 7. Wiley.
Olsson, U. (1979) Maximum likelihood estimation of the polychoric correlation coefficient. Psychometrika 44, 443-460.
Rodriguez, R.N. (1982) Correlation. Pp. 193-204 in S. Kotz and N. Johnson, eds., The Encyclopedia of Statistics, Volume 2. Wiley.
Ghosh, B.K. (1966) Asymptotic expansion for the moments of the distribution of correlation coefficient. Biometrika 53, 258-262.
Olkin, I., and Pratt, J.W. (1958) Unbiased estimation of certain correlation coefficients. Annals of Mathematical Statistics 29, 201-211.
polychor
, polyserial
, nearPD
,
detectCores
if(require(mvtnorm)){ set.seed(12345) R <- matrix(0, 4, 4) R[upper.tri(R)] <- runif(6) diag(R) <- 1 R <- cov2cor(t(R) %*% R) round(R, 4) # population correlations data <- rmvnorm(1000, rep(0, 4), R) round(cor(data), 4) # sample correlations } if(require(mvtnorm)){ x1 <- data[,1] x2 <- data[,2] y1 <- cut(data[,3], c(-Inf, .75, Inf)) y2 <- cut(data[,4], c(-Inf, -1, .5, 1.5, Inf)) data <- data.frame(x1, x2, y1, y2) hetcor(data) # Pearson, polychoric, and polyserial correlations, 2-step est. } if(require(mvtnorm)){ hetcor(x1, x2, y1, y2, ML=TRUE) # Pearson, polychoric, polyserial correlations, ML est. } ## Not run: hc <- hetcor(data, ML=TRUE) # parallel computation: hc.m <- hetcor(data, ML=TRUE, parallel=TRUE, ncores=min(2, detectCores())) hc.m all.equal(hc, hc.m) # error handling: data$y1[data$y2 == "(0.5,1.5]"] <- NA hetcor(data) ## End(Not run)
if(require(mvtnorm)){ set.seed(12345) R <- matrix(0, 4, 4) R[upper.tri(R)] <- runif(6) diag(R) <- 1 R <- cov2cor(t(R) %*% R) round(R, 4) # population correlations data <- rmvnorm(1000, rep(0, 4), R) round(cor(data), 4) # sample correlations } if(require(mvtnorm)){ x1 <- data[,1] x2 <- data[,2] y1 <- cut(data[,3], c(-Inf, .75, Inf)) y2 <- cut(data[,4], c(-Inf, -1, .5, 1.5, Inf)) data <- data.frame(x1, x2, y1, y2) hetcor(data) # Pearson, polychoric, and polyserial correlations, 2-step est. } if(require(mvtnorm)){ hetcor(x1, x2, y1, y2, ML=TRUE) # Pearson, polychoric, polyserial correlations, ML est. } ## Not run: hc <- hetcor(data, ML=TRUE) # parallel computation: hc.m <- hetcor(data, ML=TRUE, parallel=TRUE, ncores=min(2, detectCores())) hc.m all.equal(hc, hc.m) # error handling: data$y1[data$y2 == "(0.5,1.5]"] <- NA hetcor(data) ## End(Not run)
Computes the polychoric correlation (and its standard error) between two ordinal variables or from their contingency table, under the assumption that the ordinal variables dissect continuous latent variables that are bivariate normal. Either the maximum-likelihood estimator or a (possibly much) quicker “two-step” approximation is available. For the ML estimator, the estimates of the thresholds and the covariance matrix of the estimates are also available.
polychor(x, y, ML = FALSE, control = list(), std.err = FALSE, maxcor=.9999, start, thresholds=FALSE)
polychor(x, y, ML = FALSE, control = list(), std.err = FALSE, maxcor=.9999, start, thresholds=FALSE)
x |
a contingency table of counts or an ordered categorical variable; the latter can be numeric, logical, a factor, an ordered factor, or a character variable, but if a factor, its levels should be in proper order, and the values of a character variable are ordered alphabetically. |
y |
if |
ML |
if |
control |
optional arguments to be passed to the |
std.err |
if |
maxcor |
maximum absolute correlation (to insure numerical stability). |
start |
optional start value(s): if a single number, start value for the correlation; if a list with the elements |
thresholds |
if |
The ML estimator is computed by maximizing the bivariate-normal likelihood with respect to the
thresholds for the two variables (;
) and
the population correlation (
). Here,
and
are respectively the number of levels
of
and
. The likelihood is maximized numerically using the
optim
function,
and the covariance matrix of the estimated parameters is based on the numerical Hessian computed by optim
.
The two-step estimator is computed by first estimating the thresholds (
and
) separately from the marginal distribution of each variable. Then the
one-dimensional likelihood for
is maximized numerically, using
optim
if standard errors are
requested, or optimise
if they are not. The standard error computed treats the thresholds as fixed.
If std.err
or thresholds
is TRUE
,
returns an object of class "polycor"
with the following components:
type |
set to |
rho |
the polychoric correlation. |
row.cuts |
estimated thresholds for the row variable ( |
col.cuts |
estimated thresholds for the column variable ( |
var |
the estimated variance of the correlation, or, for the ML estimate, the estimated covariance matrix of the correlation and thresholds. |
n |
the number of observations on which the correlation is based. |
chisq |
chi-square test for bivariate normality. |
df |
degrees of freedom for the test of bivariate normality. |
ML |
|
Othewise, returns the polychoric correlation.
John Fox [email protected]
Drasgow, F. (1986) Polychoric and polyserial correlations. Pp. 68–74 in S. Kotz and N. Johnson, eds., The Encyclopedia of Statistics, Volume 7. Wiley.
Olsson, U. (1979) Maximum likelihood estimation of the polychoric correlation coefficient. Psychometrika 44, 443-460.
hetcor
, polyserial
, print.polycor
,
optim
if(require(mvtnorm)){ set.seed(12345) data <- rmvnorm(1000, c(0, 0), matrix(c(1, .5, .5, 1), 2, 2)) x <- data[,1] y <- data[,2] cor(x, y) # sample correlation } if(require(mvtnorm)){ x <- cut(x, c(-Inf, .75, Inf)) y <- cut(y, c(-Inf, -1, .5, 1.5, Inf)) polychor(x, y) # 2-step estimate } if(require(mvtnorm)){ polychor(x, y, ML=TRUE, std.err=TRUE) # ML estimate }
if(require(mvtnorm)){ set.seed(12345) data <- rmvnorm(1000, c(0, 0), matrix(c(1, .5, .5, 1), 2, 2)) x <- data[,1] y <- data[,2] cor(x, y) # sample correlation } if(require(mvtnorm)){ x <- cut(x, c(-Inf, .75, Inf)) y <- cut(y, c(-Inf, -1, .5, 1.5, Inf)) polychor(x, y) # 2-step estimate } if(require(mvtnorm)){ polychor(x, y, ML=TRUE, std.err=TRUE) # ML estimate }
Computes the polyserial correlation (and its standard error) between a quantitative variable and an ordinal variable, based on the assumption that the joint distribution of the quantitative variable and a latent continuous variable underlying the ordinal variable is bivariate normal. Either the maximum-likelihood estimator or a quicker “two-step” approximation is available. For the ML estimator the estimates of the thresholds and the covariance matrix of the estimates are also available.
polyserial(x, y, ML = FALSE, control = list(), std.err = FALSE, maxcor=.9999, bins=4, start, thresholds=FALSE)
polyserial(x, y, ML = FALSE, control = list(), std.err = FALSE, maxcor=.9999, bins=4, start, thresholds=FALSE)
x |
a numerical variable. |
y |
an ordered categorical variable; can be numeric, logical, a factor, an ordered factor, or a character variables, but if a factor, its levels should be in proper order, and the values of a character variable are ordered alphabetically. |
ML |
if |
control |
optional arguments to be passed to the |
std.err |
if |
maxcor |
maximum absolute correlation (to insure numerical stability). |
bins |
the number of bins into which to dissect |
start |
optional start value(s): if a single number, start value for the correlation; if a list with the elements |
thresholds |
if |
The ML estimator is computed by maximizing the bivariate-normal likelihood with respect to the
thresholds for (
) and
the population correlation (
). The likelihood is maximized numerically using the
optim
function,
and the covariance matrix of the estimated parameters is based on the numerical Hessian computed by optim
.
The two-step estimator is computed by first estimating the thresholds
()
from the marginal distribution of
. Then if the standard error of
is requested, the
one-dimensional likelihood for
is maximized numerically, using
optim
if standard errors are
requested; the standard error computed treats the thresholds as fixed. If the standard error isn't request,
is computed directly.
If std.err
or thresholds
is TRUE
,
returns an object of class "polycor"
with the following components:
type |
set to |
rho |
the polyserial correlation. |
cuts |
estimated thresholds for the ordinal variable ( |
var |
the estimated variance of the correlation, or, for the ML estimator, the estimated covariance matrix of the correlation and thresholds. |
n |
the number of observations on which the correlation is based. |
chisq |
chi-square test for bivariate normality. |
df |
degrees of freedom for the test of bivariate normality. |
ML |
|
Othewise, returns the polyserial correlation.
John Fox [email protected]
Drasgow, F. (1986) Polychoric and polyserial correlations. Pp. 68–74 in S. Kotz and N. Johnson, eds., The Encyclopedia of Statistics, Volume 7. Wiley.
hetcor
, polychor
, print.polycor
,
optim
if(require(mvtnorm)){ set.seed(12345) data <- rmvnorm(1000, c(0, 0), matrix(c(1, .5, .5, 1), 2, 2)) x <- data[,1] y <- data[,2] cor(x, y) # sample correlation } if(require(mvtnorm)){ y <- cut(y, c(-Inf, -1, .5, 1.5, Inf)) polyserial(x, y) # 2-step estimate } if(require(mvtnorm)){ polyserial(x, y, ML=TRUE, std.err=TRUE) # ML estimate }
if(require(mvtnorm)){ set.seed(12345) data <- rmvnorm(1000, c(0, 0), matrix(c(1, .5, .5, 1), 2, 2)) x <- data[,1] y <- data[,2] cor(x, y) # sample correlation } if(require(mvtnorm)){ y <- cut(y, c(-Inf, -1, .5, 1.5, Inf)) polyserial(x, y) # 2-step estimate } if(require(mvtnorm)){ polyserial(x, y, ML=TRUE, std.err=TRUE) # ML estimate }
Some standard methods for objects of class polycor
, produced by
polychor
and polyserial
, including print
, summary
, coef
, and vcov
. The summary
method simply invokes the print
method.
## S3 method for class 'polycor' print(x, digits = max(3, getOption("digits") - 3), ...) ## S3 method for class 'polycor' summary(object, ...) ## S3 method for class 'polycor' coef(object, correlation=TRUE, thresholds=TRUE, ...) ## S3 method for class 'polycor' vcov(object, correlation=TRUE, thresholds=TRUE, ...)
## S3 method for class 'polycor' print(x, digits = max(3, getOption("digits") - 3), ...) ## S3 method for class 'polycor' summary(object, ...) ## S3 method for class 'polycor' coef(object, correlation=TRUE, thresholds=TRUE, ...) ## S3 method for class 'polycor' vcov(object, correlation=TRUE, thresholds=TRUE, ...)
x , object
|
an object of class |
digits |
number of significant digits to be printed. |
correlation |
return the estimated correlation or sampling variance of the correlation. |
thresholds |
return the estimated thresholds or sampling variances/covariances of the thresholds. |
... |
pass arguments from |
John Fox [email protected]
if(require(mvtnorm)){ set.seed(12345) data <- rmvnorm(1000, c(0, 0), matrix(c(1, .5, .5, 1), 2, 2)) x <- data[,1] y <- data[,2] cor(x, y) # sample correlation } if(require(mvtnorm)){ x <- cut(x, c(-Inf, .75, Inf)) y <- cut(y, c(-Inf, -1, .5, 1.5, Inf)) print(polychor(x, y, ML=TRUE, std.err=TRUE), digits=3) # polychoric correlation, ML estimate } if(require(mvtnorm)){ coef(polychor(x, y, ML=TRUE, std.err=TRUE)) } if(require(mvtnorm)){ vcov(polychor(x, y, ML=TRUE, std.err=TRUE)) }
if(require(mvtnorm)){ set.seed(12345) data <- rmvnorm(1000, c(0, 0), matrix(c(1, .5, .5, 1), 2, 2)) x <- data[,1] y <- data[,2] cor(x, y) # sample correlation } if(require(mvtnorm)){ x <- cut(x, c(-Inf, .75, Inf)) y <- cut(y, c(-Inf, -1, .5, 1.5, Inf)) print(polychor(x, y, ML=TRUE, std.err=TRUE), digits=3) # polychoric correlation, ML estimate } if(require(mvtnorm)){ coef(polychor(x, y, ML=TRUE, std.err=TRUE)) } if(require(mvtnorm)){ vcov(polychor(x, y, ML=TRUE, std.err=TRUE)) }