I am interested in retrieving machine-readable meta information about R packages.
For example, when I go to CRAN I can see a short description about the package, before I download it: https://cran.r-project.org/web/packages/MASS/
I could not find a way to get anything other than HTML from the CRAN server. I would like to avoid parsing HTML and instead retrieve meta information about packages in a more convenient format (e.g., JSON).
I saw that each R package (at least to my knowledge) has a YAML-like (?) description text inside its source code package (the file is called DESCRIPTION). However, so far I could only find this kind of description inside tar archives, which means that I would have to download the package before I can access its description.
Here is an example of the DESCRIPTION from the MASS package:
Package: MASS
Priority: recommended
Version: 7.3-55
Date: 2022-01-12
Revision: $Rev: 3559 $
Depends: R (>= 3.3.0), grDevices, graphics, stats, utils
Imports: methods
Suggests: lattice, nlme, nnet, survival
Authors@R: c(person("Brian", "Ripley", role = c("aut", "cre", "cph"),
                    email = "[email protected]"),
             person("Bill", "Venables", role = "ctb"),
             person(c("Douglas", "M."), "Bates", role = "ctb"),
             person("Kurt", "Hornik", role = "trl",
                    comment = "partial port ca 1998"),
             person("Albrecht", "Gebhardt", role = "trl",
                    comment = "partial port ca 1998"),
             person("David", "Firth", role = "ctb"))
Description: Functions and datasets to support Venables and Ripley,
  "Modern Applied Statistics with S" (4th edition, 2002).
Title: Support Functions and Datasets for Venables and Ripley's MASS
LazyData: yes
ByteCompile: yes
License: GPL-2 | GPL-3
URL: http://www.stats.ox.ac.uk/pub/MASS4/
Contact: <[email protected]>
NeedsCompilation: yes
Packaged: 2022-01-13 05:06:37 UTC; ripley
Author: Brian Ripley [aut, cre, cph],
  Bill Venables [ctb],
  Douglas M. Bates [ctb],
  Kurt Hornik [trl] (partial port ca 1998),
  Albrecht Gebhardt [trl] (partial port ca 1998),
  David Firth [ctb]
Maintainer: Brian Ripley <[email protected]>
Repository: CRAN
Date/Publication: 2022-01-13 08:05:04 UTC
Any suggestions on how to get that directly in a machine-readable and convenient form?
I tried to look it up, but search engines have not turned up anything useful so far.
Edit / Clarification: I am looking for a solution that does not rely on R, but rather a web API that is independent of the framework / language used to retrieve the metadata.
CodePudding user response:
Does tools::CRAN_package_db() have all the information you want? (See here for some discussion)
> dd <- tools::CRAN_package_db()
> names(dd)
[1] "Package" "Version"
[3] "Priority" "Depends"
[5] "Imports" "LinkingTo"
[7] "Suggests" "Enhances"
[9] "License" "License_is_FOSS"
[11] "License_restricts_use" "OS_type"
[13] "Archs" "MD5sum"
[15] "NeedsCompilation" "Additional_repositories"
[17] "Author" "Authors@R"
[19] "Biarch" "BugReports"
[21] "BuildKeepEmpty" "BuildManual"
[23] "BuildResaveData" "BuildVignettes"
[25] "Built" "ByteCompile"
[27] "Classification/ACM" "Classification/ACM-2012"
[29] "Classification/JEL" "Classification/MSC"
[31] "Classification/MSC-2010" "Collate"
[33] "Collate.unix" "Collate.windows"
[35] "Contact" "Copyright"
[37] "Date" "Description"
[39] "Encoding" "KeepSource"
[41] "Language" "LazyData"
[43] "LazyDataCompression" "LazyLoad"
[45] "MailingList" "Maintainer"
[47] "Note" "Packaged"
[49] "RdMacros" "StagedInstall"
[51] "SysDataCompression" "SystemRequirements"
[53] "Title" "Type"
[55] "URL" "UseLTO"
[57] "VignetteBuilder" "ZipData"
[59] "Published" "Path"
[61] "X-CRAN-Comment" "Reverse depends"
[63] "Reverse imports" "Reverse linking to"
[65] "Reverse suggests" "Reverse enhances"
CodePudding user response:
An acceptable solution is the METACRAN API that is available here: https://crandb.r-pkg.org/
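Since the question asks for something that does not depend on R, here is a minimal Python sketch of querying that API for a single package. It assumes that a GET request to https://crandb.r-pkg.org/<package> returns a JSON object whose keys mirror the DESCRIPTION fields (Package, Version, Title, License, ...). The helper names package_url, fetch_package and summarize are made up for this illustration and are not part of any library.

```python
import json
import urllib.parse
import urllib.request

# Base URL taken from the answer above.
CRANDB_BASE = "https://crandb.r-pkg.org"


def package_url(name):
    """Build the URL for one package (hypothetical helper)."""
    return f"{CRANDB_BASE}/{urllib.parse.quote(name)}"


def fetch_package(name):
    """Download and decode the JSON document for one package.

    Needs network access. Assumes the service answers with a JSON object.
    """
    with urllib.request.urlopen(package_url(name)) as response:
        return json.load(response)


def summarize(meta, fields=("Package", "Version", "Title", "License")):
    """Pick a few fields from the metadata; missing fields become None."""
    return {field: meta.get(field) for field in fields}
```

Calling summarize(fetch_package("MASS")) should then give a small dictionary with the selected fields, provided the service is reachable and the response has the assumed shape.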
CodePudding user response:
You can download https://cloud.r-project.org/src/contrib/PACKAGES.gz (or even in uncompressed form https://cloud.r-project.org/src/contrib/PACKAGES). It contains information about all currently available packages in DCF format, using some of the fields from DESCRIPTION files, and some others.
You don't need to use cloud.r-project.org; any of the CRAN mirrors will do.
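If you want to read that file without R, the DCF format is simple enough to parse by hand: records are separated by blank lines, each field is a "Key: value" line, and a line starting with whitespace continues the previous field. Below is a rough Python sketch under those assumptions. The function names parse_dcf and fetch_packages_index are made up for this example, and decoding the file as UTF-8 with replacement of undecodable bytes is a guess on my part, not something CRAN documents here.

```python
import gzip
import urllib.request


def parse_dcf(text):
    """Parse DCF text into a list of dicts, one dict per record."""
    records = []
    current = {}
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current record.
            if current:
                records.append(current)
            current, last_key = {}, None
        elif line[0] in " \t":
            # Continuation line: append it to the previous field's value.
            if last_key is not None:
                joined = current[last_key] + " " + line.strip()
                current[last_key] = joined.strip()
        else:
            # New field: split on the first colon only.
            key, _, value = line.partition(":")
            last_key = key.strip()
            current[last_key] = value.strip()
    if current:
        records.append(current)
    return records


def fetch_packages_index(
    url="https://cloud.r-project.org/src/contrib/PACKAGES.gz",
):
    """Download and parse the package index (needs network access)."""
    with urllib.request.urlopen(url) as response:
        raw = gzip.decompress(response.read())
    # Assumption: treat the file as UTF-8 and replace undecodable bytes.
    return parse_dcf(raw.decode("utf-8", errors="replace"))
```

The result is a plain list of dictionaries, so something like {r["Package"]: r for r in fetch_packages_index()} gives a lookup table by package name. Note that continuation lines are joined with a single space, which is fine for fields like Depends but loses the original line breaks.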