We won’t discuss GitHub in depth. I’ll assume that the user already knows the basics of Git(Hub): Creating a repo online and start using it locally. (A good reference is “Pro Git” and “Happy Git and GitHub for the useR”)
We won’t discuss software engineering/design; we’ll look at some tools and dev cycles. (For that, you have “CodeComplete”)
What we’ll be using today
devtools: “Package development tools for R” (read more).
roxygen2: “A ‘Doxygen’-like in-source documentation system for Rd, collation, and ‘NAMESPACE’ files” (read more).
covr: (An R package for) “Test Coverage for Packages” (read more).
RStudio: “RStudio is an integrated development environment (IDE) for R” (read more).
valgrind: (if you use C/C++ code) “a memory error detector” (read more).
Why ‘spend’ time writing an R package?
To name a few:
Easy way to share code: Type install.packages and voila!
Already standardized: So you don’t need to think about how to structure it.
CRAN checks everything for you: Force yourself to code things right.
Challenges in Irreproducible Research (nature topic)
Not so obvious examples
Implementing a pipeline: A collection of functions that are used to process data in a specific way. e.g., CDC’s epinow2-pipeline (link).
Branding: You can create an extension for Rmarkdown, quarto, or ggplot2 (to name a few) that contains your company’s colors and logos. e.g., University of Southern California’s IMAGE grant (link) (see also this for quarto formats.)
Shiny apps: So it is easier to share them with other users, e.g., the epiworldRShiny implements a shiny-version of the epiworldR R package.
Sharing Data: There are multiple R packages that only contain data, for instance, the gapminder package (link).
Packages are the fundamental units of reproducible R code. They include reusable R functions, the documentation that describes how to use them, and sample data.
What’s an R package, anyway? (cont.)
Folders
R Where the R scripts live, e.g., addnums.r, boring-pkg.r.
tests Where the tests live, e.g., testthat/test-addnums.r (using testthat or tinytest).
vignettes Where vignettes live, e.g., vignettes/mode_details_on_addnums.rmd.
data Where the data lives, e.g., fakedata.rda, aneattable.csv.
man Where the manuals live, e.g., addnums.Rd, boringpkg.Rd.
And files
DESCRIPTION The metadata of the package: name, description, version, author(s), dependencies, etc.
NAMESPACE Declares the name of the objects (functions mainly) that will be part of the package (and will be visible to the user)
Set up the structure: Create folders and files R/, data/, src/, tests/, man/, vignettes/, DESCRIPTION, NAMESPACE(?)
Code! For f in F do
Write f
Document f: what it does, what the inputs are, what it returns, and examples.
Add some tests: is it doing what it is supposed to do?
R CMD Check: Will CRAN and the rest of the functions ‘like’ it?
Commit your changes!: Update ChangeLog+news.md, and commit (so travis and friends run)!
next f
Submit your package to CRAN/Bioconductor (read more)
Handson
Step 1: Set up the structure
You have several methods: package.skeleton(), Rstudio’s “New Project”, devtools::new(), etc. You can also use usethis::create_package().
library(usethis)# I'm using a temp folder, but you should use something else like# "C:/Users/vegayon/Documents/boringpkg"tmp_folder <-tempfile("boringpkg") # Creating the packageusethis::create_package(path = tmp_folder,roxygen =TRUE,rstudio =TRUE,fields =list(Package ="boringpkg",Version ="0.0.0.9000",Title ="A boring package",Description ="A boring package" ))# We can inspect what things it created:list.files(tmp_folder, recursive =TRUE)
Try it out!
Step 1: Set up the structure (cont. 1)
And, using the usethis package, we can add some extras. Locally:
# Creating a README for the projectusethis::use_readme_rmd()# Infrastructure for testingusethis::use_testthat()# A LICENSE file (required by CRAN)# The line "license: MIT + file" in the DESCRIPTION fileusethis::use_mit_license(copyright_holder ="George G. Vega Yon")usethis::use_news_md()
If you are planning to use GitHub:
# Some basic CI infrastructureusethis::use_github_action_check_standard()
Try it out! Other things we are not setting up: Codecoverage (checkout the usethis::use_coverage() function).
Step 1: Set up the structure (cont. 2)
What did happen?
use_readme_rmd(): Creates a readme file that will be in the project’s main folder. Think about it as the home page of your project.
use_testthat(): The basic infrastructure for testing packages will be created.
use_mit_license(copyright_holder = "George G. Vega Yon"): Creates the LICENSE file and puts it under ‘George G. Vega Yon’.
use_news_md(): Creates the news.md file which us used for tracking changes and communicating them to the users (e.g. netdiffuseR)
Now that we have the structure, we can start coding!
Step 2.1 and 2.2: Write f and Document it
Using roxygen2 is very straight forward. For our fist pice of code, we create the R/addnums.r file (right).
Type devtools::document(), or press Ctrl + Shift + D (RStudio will: create the manual and the NAMESPACE). Make sure you activate this option in RStudio (not the default).
Notice that the output is defined using S3-type objects (read more)
#' The title of -addnums-#'#' Here is a brief description#' #' @param a Numeric scalar. A brief description.#' @param b Numeric scalar. A brief description.#' #' @details Computes the sum of \code{x} and \code{y}.#' @return A list of class \code{boringpkg_addnums}:#' \item{a}{Numeric scalar.}#' \item{b}{Numeric scalar.}#' \item{ab}{Numeric scalar. the sum of \code{a} and \code{b}}#' @examples#' foo(1, 2)#' #' @exportaddnums <-function(a, b) { ans <- a + bstructure(list(a = a, b = b, ab = ans) , class ="boringpkg_addnums")}
Detour: S3 classes
Here we are leveraging S3 classes. In R, the S3 is a simple way to do object oriented programming. The way it works:
Objects should have a class attribute (e.g., class(x) <- "myclass").
You can then define “methods” for that class, e.g., print.myclass and plot.myclass.
These are valid as long as there is a generic function (e.g., print and plot) that can handle the class.
You can inspect the methods using the methods() function, e.g.,
[1] add1 anova coerce confint cooks.distance
[6] deviance drop1 effects extractAIC family
[11] formula influence initialize logLik model.frame
[16] nobs predict print profile residuals
[21] rstandard rstudent show sigma slotsFromS3
[26] summary vcov weights
see '?methods' for accessing help and source code
Detour: S3 classes (cont.)
You can also define your own methods, e.g.,:
my_generic <-function(x) UseMethod("my_generic")my_generic.default <-function(x) "I don't know how to handle this"my_generic.numeric <-function(x) "I know how to handle this"my_generic("a")
[1] "I don't know how to handle this"
my_generic(1)
[1] "I know how to handle this"
Step 2.1 and 2.2: Write f and Document it (cont. 1)
Here we are defining a method for plot.boringpkg_addnums. Notice that the plot function already has a generic method, which is why we only need to define the method for boringpkg_addnums.
#' @rdname addnums#' @export#' @param x An object of class \code{boringpkg_addnums}.#' @param y Ignored.#' @param ... Further arguments passed to#' \code{\link[graphics:plot.window]{plot.window}}.plot.boringpkg_addnums <-function(x, y =NULL, ...) { graphics::plot.new() graphics::plot.window(xlim =range(unlist(c(0,x))), ylim =c(-.5,1)) graphics::axis(1)with(x, graphics::segments(0, 1, ab, col ="blue", lwd=3))with(x, graphics::segments(0, 0, a, col ="green", lwd=3))with(x, graphics::segments(a, .5, a + b, col ="red", lwd=3)) graphics::legend("bottom", col =c("blue", "green", "red"), legend =c("a+b", "a", "b"), bty ="n", ncol =3, lty =1, lwd=3)}
Here, we are adding a function that is documented in addnums. The plot method (left).
You’ll need to update the man/*Rd every time that you add new roxygen content. Press Ctrl + Shift + D, and RStudio will do it for you (or enter devtools::document() which calls roxygen2::roxygenize()).
Notice that functions from foreign packages are called using the :: operator (read more here and here). Not a requirement, but makes the code easier to maintain (and less error-prone… because of R’s scoping rules).
Step 2.1 and 2.2: Write f and Document it (cont. 2)
Since we added foreign functions, we need to add them to the NAMESPACE and to the DESCRIPTION files:
R/boring-pkg.r
#' @importFrom graphics plot.new plot.window axis legendNULL#' boringpkg#'#' A (not so) boring collection of functions#'#' @description We add stuff up... You can access to the project#' website at \url{https://github.com/USCbiostats/software-dev/tree/master/rpkgs/boringpkg}#'#' @docType package#' @name boringpkg#'#' @author George G. Vega YonNULL
DESCRIPTION
Package: boringpkgType: PackageTitle: What the Package Does (Title Case)Version:0.1.0Author: Who wrote itMaintainer: The package maintainer <yourself@somewhere.net>Description: More about what it does (maybe more than one line) Use four spaces when indenting paragraphs within the Description.License: What license is it under?Encoding: UTF-8LazyData:trueSuggests: testthat, covrRoxygenNote:5.0.1Imports: graphics
The Suggests and RoxygenNote fields were added automagically by devtools.
Step 2.3: Add some tests
Test are run every time that you run R CMD check
In the tests/testthat/ dir, add/edit a source file with tests, e.g. test-addnums.r
test_that("addnums(a, b) = a+b", {# Preparing the test a <-1 b <--2# Calling the function ans0 <- a+b ans1 <-addnums(a, b)# Are these equal?expect_equal(ans0, ans1$ab)})test_that("Plot returns -boringpkg_foo-", {expect_s3_class(plot(addnums(1,2)), "boringpkg_foo" ) # This will generate an error as the function plot.boringpkg_addnum# does not return an object of that class.})
You can run the tests using Ctrl + Shift + t
Furthermore, tests are used to evaluate code coverage.
Step 2.4: R CMD check
Running R CMD check is easy with RStudio
If you don’t have C/C++ code Just press Ctrl + Shift + E
If you have C/C++ code, use R CMD Check with valgrind (check for segfaults)
$ R CMD build boringpkg/$ R CMD check --as-cran --use-valgrind boringpkg*.tar.gz
You can ask RStudio to use valgrind too.
For an extensive list of the checks, see the section 1.3.1 Checking packagesin the “Writing R extensions” manual.
How does the output of R CMD check looks like?
==> devtools::check()Updating boringpkg documentationLoading boringpkgSetting env vars ---------------------------------------------------------------CFLAGS : -Wall -pedanticCXXFLAGS: -Wall -pedanticBuilding boringpkg --------------------------------------------------------------'/usr/lib/R/bin/R' --no-site-file --no-environ --no-save --no-restore --quiet \ CMD build \ '/home/vegayon/Dropbox/usc/core_project/CodingStandards/rpkgs/boringpkg' \ --no-resave-data --no-manual * checking for file ‘/home/vegayon/Dropbox/usc/core_project/CodingStandards/rpkgs/boringpkg/DESCRIPTION’ ... OK* preparing ‘boringpkg’:* checking DESCRIPTION meta-information ... OK* checking for LF line-endings in source and make files* checking for empty or unneeded directories* building ‘boringpkg_0.1.999.tar.gz’Setting env vars ---------------------------------------------------------------_R_CHECK_CRAN_INCOMING_USE_ASPELL_: TRUE_R_CHECK_CRAN_INCOMING_ : FALSE_R_CHECK_FORCE_SUGGESTS_ : FALSEChecking boringpkg --------------------------------------------------------------'/usr/lib/R/bin/R' --no-site-file --no-environ --no-save --no-restore --quiet \ CMD check '/tmp/RtmpBYGSto/boringpkg_0.1.999.tar.gz' --as-cran --timings \ --no-manual * using log directory ‘/home/vegayon/Dropbox/usc/core_project/CodingStandards/rpkgs/boringpkg.Rcheck’* using R version 3.3.2 (2016-10-31)* using platform: x86_64-pc-linux-gnu (64-bit)* using session charset: UTF-8* using options ‘--no-manual --as-cran’* checking for file ‘boringpkg/DESCRIPTION’ ... OK* checking extension type ... Package* this is package ‘boringpkg’ version ‘0.1.999’* package encoding: UTF-8* checking package namespace information ... OK* checking package dependencies ... OK* checking if this is a source package ... OK* checking if there is a namespace ... OK* checking for executable files ... OK* checking for hidden files and directories ... OK* checking for portable file names ... OK* checking for sufficient/correct file permissions ... OK* checking whether package ‘boringpkg’ can be installed ... OK* checking installed package size ... OK* checking package directory ... OK* checking DESCRIPTION meta-information ... OK* checking top-level files ... OK* checking for left-over files ... OK* checking index information ... OK* checking package subdirectories ... OK* checking R files for non-ASCII characters ... OK* checking R files for syntax errors ... OK* checking whether the package can be loaded ... OK* checking whether the package can be loaded with stated dependencies ... OK* checking whether the package can be unloaded cleanly ... OK* checking whether the namespace can be loaded with stated dependencies ... OK* checking whether the namespace can be unloaded cleanly ... OK* checking loading without being on the library search path ... OK* checking dependencies in R code ... OK* checking S3 generic/method consistency ... OK* checking replacement functions ... OK* checking foreign function calls ... OK* checking R code for possible problems ... OK* checking Rd files ... OK* checking Rd metadata ... OK* checking Rd line widths ... OK* checking Rd cross-references ... OK* checking for missing documentation entries ... OK* checking for code/documentation mismatches ... OK* checking Rd \usage sections ... OK* checking Rd contents ... OK* checking for unstated dependencies in examples ... OK* checking examples ... OK* checking for unstated dependencies in ‘tests’ ... OK* checking tests ... Running ‘testthat.R’ OK* DONEStatus: OKR CMD check results0 errors | 0 warnings | 0 notesR CMD check succeeded
Step 2.4: R CMD check (cont. 1)
R packages can be shared by “building” them. For example, the following command:
R CMD build ~/Documents/boringpkg
Will generate package the R package into a file named boringpkg_0.1.0.tar.gz that can later be distributed. Users can install this package either through command line:
Add the new files to the tree, commit your changes, and push them to Github
$ git add R/newfile.r man/newfile.Rd$ git commit -a -m "Adding new function"$ git push
Once uploaded, I recommend checking Travis and AppVeyor to see if everything went well.
Step 3: Submit your package to CRAN/Bioconductor
You can release your package using devtools::release() function.
The devtools package was built to make life easier. Such is it’s creator’s confidence in the release function that it comes with a warranty!
If a devtools bug causes one of the CRAN maintainers to treat you impolitely, I will personally send you a handwritten apology note. Please forward me the email and your address, and I’ll get a card in the mail.
Dev Cycle (once again!)
The most important step
Code! For f in F do
Write f
Document f: what it does, what the inputs are, what it returns, examples.
Add some tests: is it doing what is supposed to do?
R CMD Check: Will CRAN, and the rest of the functions, ‘like’ it?
Commit your changes!: Update ChangeLog+news.md, and commit (so travis and friends run)!