Title: | Simple and Canonical Correspondence Analysis |
---|---|
Description: | Performs simple and canonical CA (covariates on rows/columns) on a two-way frequency table (with missings) by means of SVD. Different scaling methods (standard, centroid, Benzecri, Goodman) as well as various plots including confidence ellipsoids are provided. |
Authors: | Patrick Mair [aut, cre], Jan De Leeuw [aut] |
Maintainer: | Patrick Mair <[email protected]> |
License: | GPL-2 |
Version: | 1.1-4 |
Built: | 2024-11-18 06:10:32 UTC |
Source: | https://github.com/r-forge/psychor |
This function performs simple and canonical CA for possibly incomplete tables based on SVD. Different scaling methods for row and column scores are provided.
anacor(tab, ndim = 2, row.covariates, col.covariates,
       scaling = c("Benzecri","Benzecri"), ellipse = FALSE, eps = 1e-06)

## S3 method for class 'anacor'
print(x,...)
tab |
Data frame of dimension n times m with frequencies. Missings are coded as |
ndim |
Number of dimensions. |
row.covariates |
Matrix with n rows containing covariates for the row scores. |
col.covariates |
Matrix with m rows containing covariates for the column scores. |
scaling |
A vector with two elements. The first one corresponds to the method for row scaling, the second one for column scaling. Available scaling methods are |
ellipse |
If |
eps |
Convergence criterion for reconstitution algorithm. |
x |
Object of class |
... |
Additional arguments ignored. |
Missing values in tab are imputed using the reconstitution algorithm. Setting scaling to "standard" leads to standard coordinates. Principal coordinates can be computed by means of Benzecri decomposition. Goodman scaling is based on Fisher-Maung decomposition. For large datasets it is suggested to set ellipse = FALSE. If ellipse = TRUE, make sure that there are no rows or columns that consist entirely of zeros.
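The relation between the SVD and the two kinds of coordinates mentioned above can be sketched in base R. This is a minimal illustration of textbook simple CA on a complete table, not the package's implementation: the table `N` is hypothetical, the variable names are invented for the example, and treating principal coordinates as the counterpart of Benzecri scaling follows the sentence above rather than an inspection of the package code.

```r
# Hypothetical 3 x 4 frequency table (not one of the package datasets)
N <- matrix(c(20, 10,  5,  5,
               8, 15, 12,  5,
               4,  6, 14, 16),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("r", 1:3), paste0("c", 1:4)))

n      <- sum(N)
P      <- N / n                  # correspondence matrix
r_mass <- rowSums(P)             # row masses
c_mass <- colSums(P)             # column masses

# Standardized residuals from independence; centering removes the trivial solution
S   <- diag(1 / sqrt(r_mass)) %*% (P - r_mass %o% c_mass) %*% diag(1 / sqrt(c_mass))
dec <- svd(S)
k   <- 2                         # number of dimensions, playing the role of ndim
sv  <- dec$d[1:k]

# Standard coordinates: unit weighted variance on every dimension
row_std <- diag(1 / sqrt(r_mass)) %*% dec$u[, 1:k]
col_std <- diag(1 / sqrt(c_mass)) %*% dec$v[, 1:k]

# Principal coordinates: standard coordinates stretched by the singular values
row_pri <- sweep(row_std, 2, sv, "*")
col_pri <- sweep(col_std, 2, sv, "*")

# n times the total inertia reproduces Pearson's chi-square statistic
chisq_svd <- n * sum(dec$d^2)
chisq_ref <- unname(suppressWarnings(chisq.test(N, correct = FALSE))$statistic)
```

Comparing `chisq_svd` with `chisq_ref` is a quick sanity check that the decomposition accounts for the whole chi-square value.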
row.scores |
Scaled row scores. |
col.scores |
Scaled column scores. |
ndim |
Number of dimensions extracted. |
chisq |
Total chi-square value. |
chisq.decomp |
Chi-square decomposition across dimensions with p-values. |
singular.values |
Singular values without trivial solution. |
se.singular.values |
Standard errors for the singular values. |
stestmat |
Test results for singular values (only for |
left.singvec |
Left singular vectors without trivial solution. |
right.singvec |
Right singular vectors without trivial solution. |
eigen.values |
Eigenvalues for the fitted dimensionality. |
eigenvall |
Full vector of eigenvalues (principal inertias). |
datname |
Name of the dataset. |
tab |
Table with imputed frequencies in case of missings. |
row.covariates |
Matrix with row covariates. |
col.covariates |
Matrix with column covariates. |
scaling |
Scaling Method. |
bdmat |
List of matrices with observed and fitted Benzecri distances for rows and columns. |
rmse |
Root mean squared error of Benzecri distances (rows and columns). |
row.acov |
Covariance matrix for row scores. |
col.acov |
Covariance matrix for column scores. |
cancoef |
List containing canonical coefficients (CCA only). |
sitescores |
List containing the site scores (CCA only). |
isetcor |
List containing the intraset correlations (CCA only). |
Jan de Leeuw, Patrick Mair
De Leeuw, J. and Mair, P. (2009). Simple and Canonical Correspondence Analysis Using the R Package anacor. Journal of Statistical Software, 31(5), 1-18. https://www.jstatsoft.org/v31/i05/
## simple CA on Tocher data, symmetric standard coordinates
data(tocher)
res <- anacor(tocher)
res

## simple CA on Tocher data, asymmetric coordinates
res <- anacor(tocher, scaling = c("standard", "Benzecri"))
res

## 2- and 5-dimensional solutions for bitterling data, Benzecri scaling
data(bitterling)
res1 <- anacor(bitterling, ndim = 2, scaling = c("Benzecri", "Benzecri"))
res2 <- anacor(bitterling, ndim = 5, scaling = c("Benzecri", "Benzecri"))
res1
res2

## Canonical CA on Maxwell data, Goodman scaling
data(maxwell)
res <- anacor(maxwell$table, row.covariates = maxwell$row.covariates,
              scaling = c("Goodman", "Goodman"))
res
This dataset concerns reproductive behavior of male bitterlings with data derived from 13 sequences using a moving time-window of size two.
data(bitterling)
A frequency table with bitterling reproductive behavior at time point 1 (rows) and at time point 2 (columns).
jk
jerking
tu
turning beats
hb
head butting
chs
chasing
ft
fleeing
qu
quivering
le
leading
hdp
head down posture
sk
skimming
sn
snapping
chf
chafing
ffl
fin flickering
Wiepkema, P.R. (1961). An ethological analysis of the reproductive behavior of the bitterling (Rhodeus amarus Bloch). Archives Néerlandaises de Zoologie, 14, 103-199.
data(bitterling)
Utility function to produce a Burt matrix out of a data-frame.
burtTable(data)
data |
Data frame to be converted. |
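As a rough illustration of what a Burt matrix contains, the sketch below builds one by hand in base R as the cross-product of an indicator supermatrix. The data frame `df` and the helper `indicator()` are hypothetical and only serve the example; the package function may differ in labeling and in how it treats special cases.

```r
# Hypothetical categorical data frame (not the sleeping-bag data)
df <- data.frame(
  size  = factor(c("S", "M", "L", "M", "S", "L")),
  color = factor(c("red", "red", "blue", "blue", "red", "blue"))
)

# Crisp indicator matrix for one factor: one 0/1 column per level
indicator <- function(f) {
  G <- outer(as.character(f), levels(f), "==") * 1
  colnames(G) <- levels(f)
  G
}

# Indicator supermatrix: indicator blocks of all variables side by side
G <- do.call(cbind, lapply(names(df), function(v) {
  Gv <- indicator(df[[v]])
  colnames(Gv) <- paste(v, colnames(Gv), sep = ".")
  Gv
}))

# Burt matrix: all pairwise cross-tabulations, with the marginal counts
# of each variable on the diagonal
B <- crossprod(G)
B
```

The off-diagonal block of `B` equals `table(df$size, df$color)`, which is an easy way to check the construction.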
## sleeping bags
data(sleeping)
sleeping_cat <- sleeping
temp_cat <- cut(sleeping$Temperature, c(-20, -1, 7), labels = c("warm", "cold"))
sleeping_cat$Temperature <- temp_cat
weight_cat <- cut(sleeping$Weight, c(700, 1100, 2200), labels = c("light", "heavy"))
sleeping_cat$Weight <- weight_cat
price_cat <- cut(sleeping$Price, c(100, 250, 350, 700), labels = c("cheap", "medium", "expensive"))
sleeping_cat$Price <- price_cat
sleeping_cat
burtTable(sleeping_cat)
This utility function expands a matrix or data frame to an indicator supermatrix and optionally converts this to a data frame again. By default NA becomes zero and constant rows and columns are eliminated.
expandFrame(tab, clean = TRUE, zero = TRUE, returnFrame = TRUE)
tab |
Data frame (factors). Missings are coded as |
clean |
If |
zero |
If |
returnFrame |
If |
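The default behavior described above (missings become zero, constant rows and columns are dropped) can be mimicked in base R as follows. This is a sketch under assumptions: `expand_sketch()` and the data frame `df` are hypothetical, and "constant" is interpreted here as "all entries identical", which may not match the package's exact rule.

```r
# Hypothetical data with a missing value and an unused factor level
df <- data.frame(
  a = factor(c("x", "y", NA, "x"), levels = c("x", "y", "z")),
  b = factor(c("u", "u", "v", "v"))
)

expand_sketch <- function(tab, clean = TRUE, zero = TRUE) {
  blocks <- lapply(names(tab), function(v) {
    f <- tab[[v]]
    G <- outer(as.character(f), levels(f), "==") * 1   # a missing value gives an NA row
    if (zero) G[is.na(G)] <- 0                          # missings become all-zero rows
    colnames(G) <- paste(v, levels(f), sep = ".")
    G
  })
  G <- do.call(cbind, blocks)
  if (clean) {
    # Drop columns and rows without variation (assumed meaning of "constant")
    keep_col <- apply(G, 2, function(z) length(unique(z)) > 1)
    keep_row <- apply(G, 1, function(z) length(unique(z)) > 1)
    G <- G[keep_row, keep_col, drop = FALSE]
  }
  G
}

G <- expand_sketch(df)
G
```

In this example the unused level `z` yields an all-zero column that is removed, and the third observation contributes zeros to the whole block of variable `a`.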
## sleeping bags
data(sleeping)
sleeping_cat <- sleeping
temp_cat <- cut(sleeping$Temperature, c(-20, -1, 7), labels = c("warm", "cold"))
sleeping_cat$Temperature <- temp_cat
weight_cat <- cut(sleeping$Weight, c(700, 1100, 2200), labels = c("light", "heavy"))
sleeping_cat$Weight <- weight_cat
price_cat <- cut(sleeping$Price, c(100, 250, 350, 700), labels = c("cheap", "medium", "expensive"))
sleeping_cat$Price <- price_cat
sleeping_cat
expandFrame(sleeping_cat)
Cross-classification of midparent height and adult children's height (in inches), taken from the Records of Family Faculties.
data(galton)
A frequency table with 11 times 14 height classifications in inches.
Galton, F. (1889). Natural Inheritance. London: Macmillan.
data(galton)
Table with occupational status of fathers versus occupational status of their sons for a sample of 3497 British families.
data(glass)
Rows represent occupation of fathers, columns occupation of sons.
PROF
professional and high administrative
EXEC
managerial and executive
HSUP
higher supervisory
LSUP
lower supervisory
SKIL
skilled manual and routine nonmanual
SEMI
semi-skilled manual
UNSK
unskilled manual
Glass, D.V. (1954). Social Mobility in Britain. Glencoe: Free Press.
data(glass) ## maybe str(glass) ; plot(glass) ...
This data set is a hypothetical data set originally contrived by Maxwell (1961) to demonstrate his method of discriminant analysis. The data consist of three criterion groups (schizophrenic, manic-depressive, and anxiety state) and four binary predictor variables, each indicating the presence (1) or absence (0) of a certain symptom. The four symptoms are anxiety, suspicion, schizophrenic type of thought disorders, and delusions of guilt. These four binary variables were factorially combined to form 16 distinct patterns of symptoms (predictor patterns). Each pattern corresponds to a row of the table, which cross-classifies 620 patients by the 16 symptom patterns and the three criterion groups.
data(maxwell)
A list with the frequency table as the first element and the row covariates as the second.
This dataset can be used for canonical CA. The binary predictor variables can be considered as row covariates.
Maxwell, A.E. (1961). Canonical variate analysis when the variables are dichotomous. Educational and Psychological Measurement, 21, 259-271.
data(maxwell) ## maybe str(maxwell) ; plot(maxwell) ...
This function takes a data frame, a vector of types, a list of knot vectors, and a vector of orders. It returns a list of codings for the variables, i.e., crisp indicator, numerical version, or fuzzy indicator.
mkIndiList(data, type = rep("C",dim(data)[2]), knots, ord)
data |
Data frame to be converted. |
type |
If |
knots |
List of knot sequences for type-F coding. |
ord |
Vector with b-spline order for type-F coding. |
For the fuzzy coding, the variable values need to be provided as integers. The knots argument is a list in which each element contains the vector of knots (breaks) for one variable. The spline order is given by the ord argument as a vector (again, one entry per variable). See the bsplines help file for more details.
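To give an intuition for fuzzy coding, the sketch below spreads each value over the two neighboring knots with piecewise-linear weights. It is a hand-written illustration, not the package's B-spline routine: `fuzzy_linear()` is a hypothetical helper, and the number and shape of the columns produced by `mkIndiList` for a given `ord` may differ.

```r
# Piecewise-linear ("triangular") fuzzy coding of x on a knot sequence.
# Each value gets weights on the two knots that enclose it; weights sum to 1.
fuzzy_linear <- function(x, knots) {
  stopifnot(all(x >= min(knots)), all(x <= max(knots)))
  G <- matrix(0, length(x), length(knots))
  for (i in seq_along(x)) {
    # Index of the knot interval containing x[i] (last interval is closed)
    j <- findInterval(x[i], knots, rightmost.closed = TRUE)
    w <- (x[i] - knots[j]) / (knots[j + 1] - knots[j])
    G[i, j]     <- 1 - w
    G[i, j + 1] <- w
  }
  G
}

x <- c(1, 2, 3, 4, 6)
G <- fuzzy_linear(x, knots = c(1, 3, 5, 6))
G
```

A value that coincides with a knot receives a crisp 0/1 row, while a value halfway between two knots is split evenly between them. Multiplying `G` by the knot vector recovers the original values, which is the sense in which this coding is a soft version of the crisp indicator.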
## sleeping bags crisp and numeric
data(sleeping)
sleeping_cat <- sleeping
temp_cat <- cut(sleeping$Temperature, c(-20, -1, 7), labels = c("warm", "cold"))
sleeping_cat$Temperature <- temp_cat
weight_cat <- cut(sleeping$Weight, c(700, 1100, 2200), labels = c("light", "heavy"))
sleeping_cat$Weight <- weight_cat
price_cat <- cut(sleeping$Price, c(100, 250, 350, 700), labels = c("cheap", "medium", "expensive"))
sleeping_cat$Price <- price_cat
sleeping_cat
mkIndiList(sleeping_cat)                                       ## crisp
mkIndiList(sleeping_cat, type = rep("A", ncol(sleeping_cat)))  ## numeric
mkIndiList(sleeping_cat, type = c("A","A","A","C"))            ## mixed

## artificial data fuzzy coding
x1 <- sample(1:6, 20, replace = TRUE)
x2 <- sample(1:3, 20, replace = TRUE)
data <- data.frame(x1,x2)
knots <- list(c(1,3,5,6), c(1,2,3))
ord <- c(2,1)
mkIndiList(data, type = c("F","F"), knots = knots, ord = ord)

## Also mixed indicator versions are possible
mkIndiList(data, type = c("C","F"), knots = knots, ord = ord)
These functions produce various plots for objects of class "anacor".
## S3 method for class 'anacor'
plot(x, plot.type = "jointplot", plot.dim = c(1,2), col.row = "cadetblue",
     col.column = "coral1", catlabels = list(label.row = TRUE, label.col = TRUE,
     col.row = "cadetblue", col.column = "coral1", cex = 0.8, pos = 3),
     legpos = "top", arrows = c(FALSE, FALSE), conf = 0.95, wlines = 0, asp = 1,
     pch = 20, xlab, ylab, main, type, xlim, ylim, cex.axis2, ...)
x |
Object of class |
plot.type |
Type of plot to be produced (details see below): 2-D and 3-D for |
plot.dim |
Vector of length 2 with dimensions to be plotted. For |
col.row |
Color row categories |
col.column |
Color column categories |
catlabels |
Various parameter settings for labels |
legpos |
Position of the legend (for |
conf |
Ellipsoid confidence level for |
arrows |
Whether arrows from the origin to the row scores (first element) or column scores (second element) should be drawn. |
wlines |
For |
asp |
Aspect ratio. |
pch |
Symbol for plotting points. |
xlab |
Label x-axis. |
ylab |
Label y-axis. |
xlim |
Scale x-axis. |
ylim |
Scale y-axis. |
main |
Plot title. |
type |
Whether points, lines or both should be plotted; for |
cex.axis2 |
For |
... |
Additional graphical parameters. |
The following plot types are provided: "jointplot" plots row and column scores into the same device; "rowplot" and "colplot" plot the row scores and column scores, respectively, in separate devices. For these types of plots 3-dimensional versions are provided. The graph plot is an unlabeled version of the joint plot where the points are connected by lines. The wlines option steers the line thickness, which indicates the connection strength.
The regression plot ("regplot") provides two plots. First, the unscaled solution is plotted. A frequency grid for the row categories (x-axis) and column categories (y-axis) is produced. The regression lines are based on the category-weighted means of the relative frequencies: the blue line is based on the column-wise means on the x-axis and the column categories on the y-axis; the red line is based on the row categories on the x-axis and the row-wise means on the y-axis. In a second device the scaled solution is plotted. The frequency grid is determined by the row scores (x-axis) and the column scores (y-axis). Now, instead of the row/column categories, the column scores (black line, y-axis) and the row scores (red line, x-axis) are used.
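The quantities behind the two regression plots can be computed by hand, which also shows why the scaled plot is of interest. The sketch below uses a hypothetical table `N` and invented variable names; it computes the category-weighted means for the unscaled case and then repeats the row-wise means with first-dimension standard scores in place of the category numbers. It illustrates the idea only and does not reproduce the plotting code of the package.

```r
N <- matrix(c(20, 10,  5,
               8, 15, 12,
               4,  6, 14), nrow = 3, byrow = TRUE)   # hypothetical table

# Unscaled solution: categories are simply numbered 1, 2, ...
row_cat <- seq_len(nrow(N))
col_cat <- seq_len(ncol(N))
rowwise_mean <- as.vector(N %*% col_cat) / rowSums(N)      # y-value per row category
colwise_mean <- as.vector(t(N) %*% row_cat) / colSums(N)   # x-value per column category

# Scaled solution: replace category numbers by first-dimension standard scores
P      <- N / sum(N)
r_mass <- rowSums(P)
c_mass <- colSums(P)
S   <- diag(1 / sqrt(r_mass)) %*% (P - r_mass %o% c_mass) %*% diag(1 / sqrt(c_mass))
dec <- svd(S)
x <- dec$u[, 1] / sqrt(r_mass)    # row scores
y <- dec$v[, 1] / sqrt(c_mass)    # column scores
rowwise_mean_scaled <- as.vector(N %*% y) / rowSums(N)

# With CA scores the row-wise means lie on a straight line through the origin
# whose slope is the first singular value
cbind(x, rowwise_mean_scaled, slope_times_x = dec$d[1] * x)
```

In the unscaled plot the means generally follow a curve; after scaling, the regression of column scores on row scores becomes linear, which is what the second device is meant to show.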
The transformation plot ("transplot") plots the row/column categories against the row/column scores. The Benzecri plot ("benzplot") plots the observed distances against the fitted distances. It is assumed that the CA result is Benzecri scaled. The ordination diagram ("orddiag") for CCA produces a joint plot and includes the column and row covariates based on intraset correlations.
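The comparison shown in the Benzecri plot can be sketched in base R for the rows of a table. The sketch assumes that the observed distances are chi-square distances between row profiles and that the fitted distances are Euclidean distances between principal row coordinates; the table `N` and the helper `fitted_dist()` are hypothetical and not part of the package.

```r
N <- matrix(c(20, 10,  5,  5,
               8, 15, 12,  5,
               4,  6, 14, 16,
              10,  9,  8,  7), nrow = 4, byrow = TRUE)   # hypothetical table

P      <- N / sum(N)
r_mass <- rowSums(P)
c_mass <- colSums(P)

# "Observed" distances: chi-square distances between row profiles
profiles <- P / r_mass
d_obs <- dist(sweep(profiles, 2, sqrt(c_mass), "/"))

# "Fitted" distances: Euclidean distances between principal row coordinates
S   <- diag(1 / sqrt(r_mass)) %*% (P - r_mass %o% c_mass) %*% diag(1 / sqrt(c_mass))
dec <- svd(S)
fitted_dist <- function(k) {
  Fk <- sweep(dec$u[, 1:k, drop = FALSE] / sqrt(r_mass), 2, dec$d[1:k], "*")
  dist(Fk)
}

d_fit2 <- fitted_dist(2)                 # two-dimensional solution
d_full <- fitted_dist(length(dec$d))     # all dimensions
rmse2  <- sqrt(mean((d_obs - d_fit2)^2)) # misfit of the 2-D solution
```

With all dimensions retained the fitted distances coincide with the observed ones, so points in the plot fall on the diagonal; with fewer dimensions the fitted distances can only be shorter, and `rmse2` summarizes the gap.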
Jan de Leeuw, Patrick Mair
De Leeuw, J. and Mair, P. (2009). Simple and Canonical Correspondence Analysis Using the R Package anacor. Journal of Statistical Software, 31(5), 1-18. https://www.jstatsoft.org/v31/i05/
## symmetric map
data(tocher)
res <- anacor(tocher)
plot(res, conf = NULL, main = "Symmetric Map")

## simple CA on Tocher data, asymmetric coordinates
res <- anacor(tocher, scaling = c("standard", "Benzecri"))
res

## Regression plots using Glass data
data(glass)
res <- anacor(glass)
plot(res, plot.type = "regplot", xlab = "fathers occupation", ylab = "sons occupation")

## Benzecri Plots for bitterling data
data(bitterling)
res1 <- anacor(bitterling, ndim = 2, scaling = c("Benzecri", "Benzecri"))
res2 <- anacor(bitterling, ndim = 5, scaling = c("Benzecri", "Benzecri"))
res2
plot(res1, plot.type = "benzplot", main = "Benzecri Distances (2D)")
plot(res2, plot.type = "benzplot", main = "Benzecri Distances (5D)")

## Column score plot, transformation plot, and ordination diagram for canonical CA
data(maxwell)
res <- anacor(maxwell$table, row.covariates = maxwell$row.covariates,
              scaling = c("Goodman", "Goodman"))
res
plot(res, plot.type = "colplot", xlim = c(-1.5,1), conf = NULL)
plot(res, plot.type = "transplot", legpos = "topright")
plot(res, plot.type = "orddiag")
This data set provides 4 variables measured on 21 sleeping bags. The variables are temperature, weight, price, and material.
sleeping
A data frame of dimension 21 times 4.
Prediger, S. (1997). Symbolic objects in formal concept analysis. In G. Mineau, and A. Fall (eds.), Proceedings of the 2nd International Symposium on Knowledge, Retrieval, Use, and Storage for Efficiency.
data(sleeping)
sleeping
Abundance of hunting spiders in a Dutch dune area.
data(spider)
A list of data frames containing the frequency table (28 observations) and the row covariates.
Table:
Alopacce
Abundance of Alopecosa accentuata.
Alopcune
Abundance of Alopecosa cuneata.
Alopfabr
Abundance of Alopecosa fabrilis.
Arctlute
Abundance of Arctosa lutetiana.
Arctperi
Abundance of Arctosa perita.
Auloalbi
Abundance of Aulonia albimana.
Pardlugu
Abundance of Pardosa lugubris.
Pardmont
Abundance of Pardosa monticola.
Pardnigr
Abundance of Pardosa nigriceps.
Pardpull
Abundance of Pardosa pullata.
Trocterr
Abundance of Trochosa terricola.
Zoraspin
Abundance of Zora spinimana.
Row covariates:
WaterCon
Log percentage of soil dry mass.
BareSand
Log percentage cover of bare sand.
FallTwig
Log percentage cover of fallen leaves and twigs.
CoveMoss
Log percentage cover of the moss layer.
CoveHerb
Log percentage cover of the herb layer.
ReflLux
Reflection of the soil surface with cloudless sky.
Van der Aart, P.J.M. and Smeenk-Enserink, N. (1975). Correlations between distributions of hunting spiders (Lycosidae, Ctenidae) and environmental characteristics in a dune area. Netherlands Journal of Zoology, 25, 1–45.
data(spider)
str(spider)
This dataset provides a cross-classification of subjects according to their mental health status and parents' socio-economic status.
data(srole)
Mental health has four categories (rows): well, mild symptom formation, moderate symptom formation, and impaired. There are six categories of socio-economic status in the columns.
Srole, L., Langner, T.S., Michael, S.T., Opler, M.K., & Rennie, T.A.C. (1962). Mental health in the metropolis: The midtown Manhattan study. New York: McGraw-Hill.
data(srole) ## maybe str(srole) ; plot(srole) ...
Eye color and hair color cross-classification of 5387 Scottish school children.
data(tocher)
Frequency table with eye color in the rows (blue, light, medium, dark) and hair color in the columns (fair, red, medium, dark, black).
Maung, K. (1941). Discriminant analysis of Tocher's eye colour data for Scottish school children. Annals of Eugenics, 11, 64-67.
data(tocher) ## maybe str(tocher) ; plot(tocher) ...