Package: tm 0.7-14

Kurt Hornik

tm: Text Mining Package

A framework for text mining applications within R.

Authors: Ingo Feinerer [aut], Kurt Hornik [aut, cre], Artifex Software, Inc. [ctb, cph]

tm_0.7-14.tar.gz
tm_0.7-14.zip (r-4.5), tm_0.7-14.zip (r-4.4), tm_0.7-14.zip (r-4.3)
tm_0.7-14.tgz (r-4.4-x86_64), tm_0.7-14.tgz (r-4.4-arm64), tm_0.7-14.tgz (r-4.3-x86_64), tm_0.7-14.tgz (r-4.3-arm64)
tm_0.7-14.tar.gz (r-4.5-noble), tm_0.7-14.tar.gz (r-4.4-noble)
tm_0.7-14.tgz (r-4.4-emscripten), tm_0.7-14.tgz (r-4.3-emscripten)
tm.pdf | tm.html
tm/json (API)
NEWS

# Install 'tm' in R:
install.packages('tm', repos = c('https://r-forge.r-universe.dev', 'https://cloud.r-project.org'))
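
Once the package is installed, a typical session might look like the following minimal sketch. It uses only functions from the export list on this page (VCorpus, VectorSource, tm_map, content_transformer, removePunctuation, stripWhitespace, DocumentTermMatrix, inspect, findFreqTerms); the two example sentences are made up for illustration and are not part of the package.

```r
library(tm)

# Two made-up example sentences (illustrative data, not shipped with tm).
docs <- c("Text mining with R is fun.",
          "The tm package provides a framework for text mining.")

# Build an in-memory (volatile) corpus from a character vector.
corpus <- VCorpus(VectorSource(docs))

# Apply a few of the exported transformations, one after another.
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, stripWhitespace)

# Count terms per document, then list terms that occur at least twice overall.
dtm <- DocumentTermMatrix(corpus)
inspect(dtm)
findFreqTerms(dtm, lowfreq = 2)
```

Base R functions such as tolower are wrapped in content_transformer() here so that tm_map() returns documents rather than bare character vectors.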

Bug tracker: https://r-forge.r-project.org/projects/tm

Uses libs:
  • c++ - GNU Standard C++ Library v3
Datasets:
  • acq - 50 Exemplary News Articles from the Reuters-21578 Data Set of Topic acq
  • crude - 20 Exemplary News Articles from the Reuters-21578 Data Set of Topic crude
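
As a rough sketch of how the two bundled datasets can be explored, assuming the package is installed, both can be loaded with data() and then treated like any other corpus:

```r
library(tm)

# Load the two example corpora listed above.
data("acq")
data("crude")

# Number of documents in each corpus.
length(acq)
length(crude)

# Metadata and text of the first article in the crude corpus.
meta(crude[[1]])
content(crude[[1]])
```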

On CRAN:

12.63 score, 96 packages, 14k scripts, 46k downloads, 94 mentions, 79 exports, 7 dependencies

Last updated 3 months ago from:a8e682042e. Checks: OK: 2, ERROR: 7. Indexed: yes.

Target | Result | Date
Doc / Vignettes | OK | Oct 17 2024
R-4.5-win-x86_64 | ERROR | Oct 17 2024
R-4.5-linux-x86_64 | OK | Oct 17 2024
R-4.4-win-x86_64 | ERROR | Oct 17 2024
R-4.4-mac-x86_64 | ERROR | Oct 17 2024
R-4.4-mac-aarch64 | ERROR | Oct 17 2024
R-4.3-win-x86_64 | ERROR | Oct 17 2024
R-4.3-mac-x86_64 | ERROR | Oct 17 2024
R-4.3-mac-aarch64 | ERROR | Oct 17 2024

Exports: as.DocumentTermMatrix, as.TermDocumentMatrix, as.VCorpus, Boost_tokenizer, content_transformer, Corpus, DataframeSource, DirSource, Docs, DocumentTermMatrix, DublinCore, DublinCore<-, eoi, findAssocs, findFreqTerms, findMostFreqTerms, FunctionGenerator, getElem, getMeta, getReaders, getSources, getTokenizers, getTransformations, Heaps_plot, inspect, MC_tokenizer, nDocs, nTerms, PCorpus, pGetElem, PlainTextDocument, read_dtm_Blei_et_al, read_dtm_MC, readDataframe, readDOC, reader, readPDF, readPlain, readRCV1, readRCV1asPlain, readReut21578XML, readReut21578XMLasPlain, readTagged, readXML, removeNumbers, removePunctuation, removeSparseTerms, removeWords, scan_tokenizer, SimpleCorpus, SimpleSource, stemCompletion, stemDocument, stepNext, stopwords, stripWhitespace, TermDocumentMatrix, termFreq, Terms, tm_filter, tm_index, tm_map, tm_parLapply, tm_parLapply_engine, tm_reduce, tm_term_score, URISource, VCorpus, VectorSource, weightBin, WeightFunction, weightSMART, weightTf, weightTfIdf, writeCorpus, XMLSource, XMLTextDocument, Zipf_plot, ZipSource

Dependencies: BH, cli, NLP, Rcpp, rlang, slam, xml2

Extensions

Rendered from extensions.Rnw using utils::Sweave on Oct 17 2024.

Last update: 2017-09-10
Started: 2012-01-14

Introduction to the tm Package

Rendered from tm.Rnw using utils::Sweave on Oct 17 2024.

Last update: 2024-08-13
Started: 2012-01-14

Readme and manuals

Help Manual

Help page | Topics
50 Exemplary News Articles from the Reuters-21578 Data Set of Topic acq | acq
Content Transformers | content_transformer
Corpora | Corpus
20 Exemplary News Articles from the Reuters-21578 Data Set of Topic crude | crude
Data Frame Source | DataframeSource
Directory Source | DirSource
Access Document IDs and Terms | Docs nDocs nTerms Terms
Find Associations in a Term-Document Matrix | findAssocs findAssocs.DocumentTermMatrix findAssocs.TermDocumentMatrix
Find Frequent Terms | findFreqTerms
Find Most Frequent Terms | findMostFreqTerms findMostFreqTerms.DocumentTermMatrix findMostFreqTerms.TermDocumentMatrix findMostFreqTerms.term_frequency
Read Document-Term Matrices | read_dtm_Blei_et_al read_dtm_MC
Tokenizers | getTokenizers
Transformations | getTransformations
Parallelized 'lapply' | tm_parLapply tm_parLapply_engine
Inspect Objects | inspect inspect.PCorpus inspect.TermDocumentMatrix inspect.TextDocument inspect.VCorpus
Metadata Management | DublinCore DublinCore<- meta meta.PCorpus meta.PlainTextDocument meta.SimpleCorpus meta.VCorpus meta.XMLTextDocument meta<-.PCorpus meta<-.PlainTextDocument meta<-.SimpleCorpus meta<-.VCorpus meta<-.XMLTextDocument
Permanent Corpora | PCorpus
Plain Text Documents | PlainTextDocument
Visualize a Term-Document Matrix | plot.TermDocumentMatrix
Read In a Text Document from a Data Frame | readDataframe
Read In a MS Word Document | readDOC
Readers | FunctionGenerator getReaders Reader
Read In a PDF Document | readPDF
Read In a Text Document | readPlain
Read In a Reuters Corpus Volume 1 Document | readRCV1 readRCV1asPlain
Read In a Reuters-21578 XML Document | readReut21578XML readReut21578XMLasPlain
Read In a POS-Tagged Word Text Document | readTagged
Read In an XML Document | readXML
Remove Numbers from a Text Document | removeNumbers removeNumbers.character removeNumbers.PlainTextDocument
Remove Punctuation Marks from a Text Document | removePunctuation removePunctuation.character removePunctuation.PlainTextDocument
Remove Sparse Terms from a Term-Document Matrix | removeSparseTerms
Remove Words from a Text Document | removeWords removeWords.character removeWords.PlainTextDocument
Simple Corpora | SimpleCorpus
Sources | close.SimpleSource eoi eoi.SimpleSource getElem getElem.DataframeSource getElem.DirSource getElem.URISource getElem.VectorSource getElem.XMLSource getMeta getMeta.DataframeSource getSources length.SimpleSource open.SimpleSource pGetElem pGetElem.DataframeSource pGetElem.DirSource pGetElem.URISource pGetElem.VectorSource reader reader.SimpleSource SimpleSource Source stepNext stepNext.SimpleSource
Complete Stems | stemCompletion
Stem Words | stemDocument stemDocument.character stemDocument.PlainTextDocument
Stopwords | stopwords
Strip Whitespace from a Text Document | stripWhitespace stripWhitespace.PlainTextDocument
Term-Document Matrix | as.DocumentTermMatrix as.TermDocumentMatrix DocumentTermMatrix TermDocumentMatrix
Term Frequency Vector | termFreq
Text Documents | TextDocument
Combine Corpora, Documents, Term-Document Matrices, and Term Frequency Vectors | c.TermDocumentMatrix c.term_frequency c.TextDocument c.VCorpus
Filter and Index Functions on Corpora | tm_filter tm_filter.PCorpus tm_filter.SimpleCorpus tm_filter.VCorpus tm_index tm_index.PCorpus tm_index.SimpleCorpus tm_index.VCorpus
Transformations on Corpora | tm_map tm_map.PCorpus tm_map.SimpleCorpus tm_map.VCorpus
Combine Transformations | tm_reduce
Compute Score for Matching Terms | tm_term_score tm_term_score.DocumentTermMatrix tm_term_score.PlainTextDocument tm_term_score.TermDocumentMatrix tm_term_score.term_frequency
Tokenizers | Boost_tokenizer MC_tokenizer scan_tokenizer
Uniform Resource Identifier Source | URISource
Volatile Corpora | as.VCorpus VCorpus
Vector Source | VectorSource
Weight Binary | weightBin
Weighting Function | WeightFunction
SMART Weightings | weightSMART
Weight by Term Frequency | weightTf
Weight by Term Frequency - Inverse Document Frequency | weightTfIdf
Write a Corpus to Disk | writeCorpus
XML Source | XMLSource
XML Text Documents | XMLTextDocument
Explore Corpus Term Frequency Characteristics | Heaps_plot Zipf_plot
ZIP File Source | ZipSource