Package: corpora 0.6

Stephanie Evert

corpora: Statistics and Data Sets for Corpus Frequency Data

Utility functions for the statistical analysis of corpus frequency data. This package is a companion to the open-source course "Statistical Inference: A Gentle Introduction for Computational Linguists and Similar Creatures" ('SIGIL').

Authors: Stephanie Evert [cre, aut]

Downloads:
  • Source: corpora_0.6.tar.gz
  • Windows: corpora_0.6.zip (r-4.5, r-4.4, r-4.3)
  • macOS: corpora_0.6.tgz (r-4.4-any, r-4.3-any)
  • Linux: corpora_0.6.tar.gz (r-4.5-noble, r-4.4-noble)
  • WebAssembly: corpora_0.6.tgz (r-4.4-emscripten, r-4.3-emscripten)
  • Manual: corpora.pdf | corpora.html
  • API: corpora/json
  • NEWS

# Install 'corpora' in R:
install.packages('corpora', repos = c('https://r-forge.r-universe.dev', 'https://cloud.r-project.org'))


Bug tracker: https://r-forge.r-project.org/projects/sigil

Datasets:
  • BNCInChargeOf - Collocations of the phrase "in charge of"
  • BNCbiber - Biber's (1988) register features for the British National Corpus
  • BNCcomparison - Comparison of written and spoken noun frequencies in the British National Corpus
  • BNCdomains - Distribution of domains in the British National Corpus
  • BNCmeta - Metadata for the British National Corpus
  • BNCqueries - Per-text frequency counts for a selection of BNCweb corpus queries
  • BrownBigrams - Bigrams of adjacent words from the Brown corpus
  • BrownLOBPassives - Frequency counts of passive verb phrases in the Brown and LOB corpora
  • BrownPassives - Frequency counts of passive verb phrases in the Brown corpus
  • BrownStats - Basic statistics of texts in the Brown corpus
  • DistFeatBrownFam - Latent dimension scores from a distributional analysis of the Brown Family corpora
  • KrennPPV - German PP-Verb collocation candidates annotated by Brigitte Krenn
  • LOBPassives - Frequency counts of passive verb phrases in the LOB corpus
  • LOBStats - Basic statistics of texts in the LOB corpus
  • PassiveBrownFam - By-text frequencies of passive verb phrases in the Brown Family corpora
  • VSS - A small corpus of very short stories with linguistic annotations


Score: 2.83 | 34 scripts | 428 downloads | 1 mention | 19 exports | 0 dependencies

Last updated 4 months ago from 0e519e6902. Checks: OK: 1, NOTE: 6. Indexed: yes.

Target            Result  Date
Doc / Vignettes   OK      Sep 29 2024
R-4.5-win         NOTE    Sep 29 2024
R-4.5-linux       NOTE    Sep 29 2024
R-4.4-win         NOTE    Sep 29 2024
R-4.4-mac         NOTE    Sep 29 2024
R-4.3-win         NOTE    Sep 29 2024
R-4.3-mac         NOTE    Sep 29 2024

Exports: alpha.col, binom.pval, chisq, chisq.pval, colVector, cont.table, corpora.palette, fisher.pval, keyness, prop.cint, qw, rowVector, sample.df, simulated.census, simulated.language.course, simulated.wikipedia, stars.pval, z.score, z.score.pval

Dependencies: none

Readme and manuals

Help Manual

Help pages (topic - title):
  • corpora-package, corpora - corpora: Statistical Inference from Corpus Frequency Data
  • binom.pval - P-values of the binomial test for frequency counts (corpora)
  • BNCbiber - Biber's (1988) register features for the British National Corpus
  • BNCcomparison - Comparison of written and spoken noun frequencies in the British National Corpus
  • BNCdomains - Distribution of domains in the British National Corpus (BNC)
  • BNCInChargeOf - Collocations of the phrase "in charge of" (BNC)
  • BNCmeta - Metadata for the British National Corpus (XML edition)
  • BNCqueries - Per-text frequency counts for a selection of BNCweb corpus queries
  • BrownBigrams - Bigrams of adjacent words from the Brown corpus
  • BrownLOBPassives - Frequency counts of passive verb phrases in the Brown and LOB corpora
  • BrownPassives - Frequency counts of passive verb phrases in the Brown corpus
  • BrownStats - Basic statistics of texts in the Brown corpus
  • chisq - Pearson's chi-squared statistic for frequency comparisons (corpora)
  • chisq.pval - P-values of Pearson's chi-squared test for frequency comparisons (corpora)
  • cont.table - Build contingency tables for frequency comparison (corpora)
  • alpha.col, corpora.palette - Colour palettes for linguistic visualization (corpora)
  • DistFeatBrownFam - Latent dimension scores from a distributional analysis of the Brown Family corpora
  • fisher.pval - P-values of Fisher's exact test for frequency comparisons (corpora)
  • keyness - Compute best-practice keyness measures (corpora)
  • KrennPPV - German PP-Verb collocation candidates annotated by Brigitte Krenn (2000)
  • LOBPassives - Frequency counts of passive verb phrases in the LOB corpus
  • LOBStats - Basic statistics of texts in the LOB corpus
  • PassiveBrownFam - By-text frequencies of passive verb phrases in the Brown Family corpora
  • prop.cint - Confidence interval for proportion based on frequency counts (corpora)
  • qw - Split string into words, similar to qw() in Perl (corpora)
  • colVector, rowVector - Propagate vector to single-row or single-column matrix (corpora)
  • sample.df - Random samples from data frames (corpora)
  • FakeCensus, simulated.census - Simulated census data for examples and illustrations (corpora)
  • LanguageCourse, simulated.language.course - Simulated study on effectiveness of language course (corpora)
  • simulated.wikipedia, WackypediaStats - Simulated type and token counts for Wikipedia articles (corpora)
  • stars.pval - Show p-values as significance stars (corpora)
  • VSS - A small corpus of very short stories with linguistic annotations
  • z.score - The z-score statistic for frequency counts (corpora)
  • z.score.pval - P-values of the z-score test for frequency counts (corpora)
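The z.score and z.score.pval help pages cover the z-score test for a single frequency count: k occurrences in a sample of n tokens, tested against a null proportion p. The base R code below is a minimal sketch of the underlying statistic. It assumes the textbook form z = (k - np) / sqrt(np(1 - p)) with no continuity correction, and it computes a two-sided p-value from the standard normal distribution. It does not call the corpora package. The names z_sketch and z_pval_sketch are hypothetical, and the package's own functions may apply a continuity correction or use other defaults.

```r
# Hypothetical helper (not part of corpora): z-score of an observed
# frequency k in a sample of n tokens under the null proportion p.
# Assumption: no continuity correction is applied.
z_sketch <- function(k, n, p) {
  expected <- n * p
  (k - expected) / sqrt(n * p * (1 - p))
}

# Hypothetical helper: two-sided p-value from the standard normal distribution.
z_pval_sketch <- function(k, n, p) {
  2 * pnorm(-abs(z_sketch(k, n, p)))
}

# Illustrative numbers: 30 occurrences in 100 tokens when 20% are expected.
z_obs <- z_sketch(30, 100, 0.2)        # (30 - 20) / sqrt(16) = 2.5
p_obs <- z_pval_sketch(30, 100, 0.2)   # about 0.0124
print(c(z = z_obs, p.value = p_obs))
```

With large n this normal approximation is usually adequate. For small expected counts, an exact binomial test (binom.pval in the package, or base R's binom.test) is the safer choice.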
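The chisq, chisq.pval, fisher.pval and cont.table help pages deal with frequency comparisons: k1 hits out of n1 tokens in one sample versus k2 hits out of n2 tokens in another. The base R code below sketches these tests, assuming the comparison is framed as a 2x2 contingency table of hits and non-hits. It computes Pearson's X² by hand and cross-checks the result against base R's chisq.test (without Yates' correction) and fisher.test. It does not use the corpora functions, whose arguments and defaults may differ. The names cont_table_sketch and pearson_x2_sketch are hypothetical.

```r
# Hypothetical helper (not part of corpora): 2x2 contingency table with
# hits and non-hits (rows) for sample 1 and sample 2 (columns).
cont_table_sketch <- function(k1, n1, k2, n2) {
  matrix(c(k1, n1 - k1, k2, n2 - k2), nrow = 2,
         dimnames = list(c("hit", "other"), c("sample1", "sample2")))
}

# Pearson's X^2 = sum over cells of (O - E)^2 / E,
# where E = row total * column total / grand total.
pearson_x2_sketch <- function(tab) {
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  sum((tab - expected)^2 / expected)
}

# Illustrative numbers: 50 vs. 80 hits in two samples of 1000 tokens each.
tab <- cont_table_sketch(k1 = 50, n1 = 1000, k2 = 80, n2 = 1000)
x2 <- pearson_x2_sketch(tab)   # about 7.40

# Base R cross-checks: chi-squared test without Yates' correction,
# and Fisher's exact test on the same table.
x2_base  <- unname(chisq.test(tab, correct = FALSE)$statistic)
p_chisq  <- chisq.test(tab, correct = FALSE)$p.value
p_fisher <- fisher.test(tab)$p.value
print(c(X2 = x2, X2.base = x2_base, p.chisq = p_chisq, p.fisher = p_fisher))
```

Fisher's exact test avoids the large-sample approximation behind the chi-squared p-value. This matters most when some expected cell counts are small.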
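The prop.cint help page covers confidence intervals for a proportion estimated from frequency counts. Base R provides two standard intervals of this kind. binom.test returns the exact Clopper-Pearson interval, and prop.test with correct = FALSE returns the Wilson score interval. The sketch below computes both and checks the Wilson interval against its closed-form expression. Whether either interval matches a particular method option of prop.cint is an assumption this sketch does not confirm, and the counts are illustrative only.

```r
# Illustrative numbers: 30 occurrences in a sample of 100 tokens.
k <- 30
n <- 100

# Exact (Clopper-Pearson) 95% interval from base R's binomial test.
ci_exact <- as.numeric(binom.test(k, n, conf.level = 0.95)$conf.int)

# Wilson score 95% interval: prop.test without continuity correction.
ci_wilson <- as.numeric(prop.test(k, n, conf.level = 0.95, correct = FALSE)$conf.int)

# The same Wilson interval from its closed form:
# (k + z^2/2 +/- z * sqrt(k(n - k)/n + z^2/4)) / (n + z^2)
z_crit <- qnorm(0.975)
centre <- (k + z_crit^2 / 2) / (n + z_crit^2)
half_width <- z_crit * sqrt(k * (n - k) / n + z_crit^2 / 4) / (n + z_crit^2)
ci_wilson_manual <- c(centre - half_width, centre + half_width)

print(rbind(exact = ci_exact, wilson = ci_wilson, wilson.manual = ci_wilson_manual))
```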
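The stars.pval help page covers displaying p-values as significance stars. The package's exact cut-offs are not stated here. The sketch below therefore assumes the convention familiar from R's printed model summaries: *** below 0.001, ** below 0.01, * below 0.05, and . below 0.1. The name stars_sketch is hypothetical, and treating each threshold as a strict "below" is a choice made for this sketch.

```r
# Hypothetical helper (not part of corpora): map p-values to significance
# stars using assumed conventional thresholds (strict "below" each cut-off).
stars_sketch <- function(p) {
  cutpoints <- c(0, 0.001, 0.01, 0.05, 0.1, 1)
  symbols <- c("***", "**", "*", ".", "")
  # findInterval returns the index of the interval each p falls into;
  # rightmost.closed = TRUE makes p = 1 fall into the last interval.
  symbols[findInterval(p, cutpoints, rightmost.closed = TRUE)]
}

print(stars_sketch(c(0.0005, 0.005, 0.03, 0.07, 0.5)))
```

Because findInterval is vectorised, a whole column of p-values can be converted in a single call.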