Intro to Rcpp

PHS 7045: Advanced Programming

George G. Vega Yon, Ph.D.

The University of Utah

2024-09-24

Intro

Before we start

  1. You need to have Rcpp installed in your system:

    install.packages("Rcpp")
  2. You need to have a compiler

    • Windows: You can download Rtools from here.

    • MacOS: It is a bit complicated… Here are some options:

      • CRAN’s manual to get the clang, clang++, and gfortran compilers here.

      • A great guide by the coatless professor here

And that’s it!

R is great, but…

  • The problem:

    • As we saw, R is very fast… once vectorized

    • What to do if your model cannot be vectorized?

  • The solution: Use C/C++/Fotran! It works with R!

  • The problem to the solution: What R user knows any of those!?

  • R has had an API (application programming interface) for integrating C/C++ code with R for a long time.

  • Unfortunately, it is not very straightforward

Enter Rcpp

The ‘Rcpp’ package provides R functions as well as C++ classes which offer a seamless integration of R and C++

Why bother?

  • To draw ten numbers from a normal distribution with sd = 100.0 using R C API:

    SEXP stats = PROTECT(R_FindNamespace(mkString("stats")));
    SEXP rnorm = PROTECT(findVarInFrame(stats, install("rnorm")));
    SEXP call = PROTECT(
      LCONS( rnorm, CONS(ScalarInteger(10), CONS(ScalarReal(100.0),
      R_NilValue))));
    SET_TAG(CDDR(call),install("sd"));
    SEXP res = PROTECT(eval(call, R_GlobalEnv));
    UNPROTECT(4);
    return res;
  • Using Rcpp:

    Environment stats("package:stats");
    Function rnorm = stats["rnorm"];
    return rnorm(10, Named("sd", 100.0));

The Rcpp ecosystem

  • Rcpp (link): The main API that exposes C++ to R.

  • RcppArmadillo (link): A package that provides a high-level interface to the Armadillo C++ library for linear algebra (great for sparse matrices).

  • RcppEigen (link): A package that provides a high-level interface to the Eigen C++ library for linear algebra (great for file-backed matrices).

  • RcppParallel (link) and RcppThread (link): Packages that provide parallel computing capabilities.

  • You can find execellent examples using Rcpp in https://gallery.rcpp.org.

  • There’s also the cpp11 package (we won’t cover it in this course).

Example 1: Looping over a vector

1#include<Rcpp.h>

2using namespace Rcpp;

3// [[Rcpp::export]]
NumericVector add1(NumericVector x) {

4  NumericVector ans(x.size());

  for (int i = 0; i < x.size(); ++i)
    ans[i] = x[i] + 1;

  return ans;
}
1
We include the header using <Rcpp.h>, not "Rcpp.h".
2
This is to avoid typing Rcpp:: before every object.
3
This is a comment that tells Rcpp to export this function to R.
4
Create a vector of the same size as x.
add1(1:10)
 [1]  2  3  4  5  6  7  8  9 10 11

Example 1: Looping over a vector (vers 2)

Make it sweeter by adding some “sugar” (the Rcpp kind)

#include<Rcpp.h>

using namespace Rcpp;

// [[Rcpp::export]]
NumericVector add1Cpp(NumericVector x) {
  return x + 1;
}
add1Cpp(1:10)
 [1]  2  3  4  5  6  7  8  9 10 11

How much fast?

Compared to this:

add1R <- function(x) {
  for (i in 1:length(x))
    x[i] <- x[i] + 1
  x
}

microbenchmark::microbenchmark(add1R(1:1000), add1Cpp(1:1000))
Unit: microseconds
            expr    min     lq     mean  median      uq      max neval
   add1R(1:1000) 50.459 51.709 81.72481 54.4175 56.3135 2722.751   100
 add1Cpp(1:1000)  2.125  4.376 12.56858  6.5220  8.7090  564.293   100

Obviously vectorization in R would be faster, but this is just an example.

C++ in R

Main differences between R and C++

  1. One is compiled, and the other interpreted

  2. Indexing objects: In C++ the indices range from 0 to (n - 1), whereas in R is from 1 to n.

  3. All expressions end with a ; (optional in R).

  4. In C++ object need to be declared, in R not (dynamic).

C++/Rcpp fundamentals: Types

Besides C-like data types (double, int, char, and bool), we can use the following types of objects with Rcpp:

  • Matrices: NumericMatrix, IntegerMatrix, LogicalMatrix, CharacterMatrix

  • Vectors: NumericVector, IntegerVector, LogicalVector, CharacterVector

  • And more!: DataFrame, List, Function, Environment

Parts of “an Rcpp program”

1#include<Rcpp.h>

2using namespace Rcpp

3// [[Rcpp::export]]
4NumericVector add1(NumericVector x) {

  NumericVector ans(x.size());

  for (int i = 0; i < x.size(); ++i)
    ans[i] = x[i] + 1;

  return ans;
}
1
The #include<Rcpp.h> is similar to library(...) in R, it brings in all that we need to write C++ code for Rcpp.
2
using namespace Rcpp is somewhat similar to detach(...). This simplifies syntax. If we don’t include this, all calls to Rcpp members need to be explicit, e.g., instead of typing NumericVector, we would need to type Rcpp::NumericVector
3
The // starts a comment in C++, in this case, the // [[Rcpp::export]] comment is a flag Rcpp uses to “export” this C++ function to R.
4
It is the first part of the function definition. We are creating a function that returns a NumericVector, is called add1, has a single input element named x that is also a NumericVector.

Parts of “an Rcpp program” (cont’d)

#include<Rcpp.h>

using namespace Rcpp 

// [[Rcpp::export]]
NumericVector add1(NumericVector x) {

5  NumericVector ans(x.size());

6  for (int i = 0; i < x.size(); ++i)
7    ans[i] = x[i] + 1;
  
8  return ans;
}
5
Here, we are declaring an object called ans, which is a NumericVector with an initial size equal to the size of x. Notice that .size() is called a “member function” of the x object, which is of class NumericVector.
6
We are declaring a for-loop (three parts):
7
ans[i] = x[i] + 1 set the i-th element of ans equal to the i-th element of x plus 1.
8
return ans exists the function returning the vector ans.

C++/Rcpp fundamentals

Now, where to execute/run this?

  • You can use the sourceCpp function from the Rcpp package to run .cpp scripts (this is what I do most of the time).
  • There’s also cppFunction, which allows compiling a single function.
  • Write an R package that works with Rcpp.

For now, let’s use the first option.

Example running .cpp file

Using the norm.cpp file (which you can download):

#include <Rcpp.h>

using namespace Rcpp;

// [[Rcpp::export]]
double normRcpp(NumericVector x) {
  
  return sqrt(sum(pow(x, 2.0)));
  
}

We can compile and obtain this function using this line Rcpp::sourceCpp("norm.cpp"):

Rcpp::sourceCpp("norm.cpp")
normRcpp(1:10)
[1] 19.62142
sqrt(sum((1:10)^2))
[1] 19.62142

Examlpe running .cpp file (cont’d)

Let’s do it again but see the verbose output:

Rcpp::sourceCpp("norm.cpp", verbose = TRUE)

Generated extern "C" functions 
--------------------------------------------------------


#include <Rcpp.h>
#ifdef RCPP_USE_GLOBAL_ROSTREAM
Rcpp::Rostream<true>&  Rcpp::Rcout = Rcpp::Rcpp_cout_get();
Rcpp::Rostream<false>& Rcpp::Rcerr = Rcpp::Rcpp_cerr_get();
#endif

// normRcpp
double normRcpp(NumericVector x);
RcppExport SEXP sourceCpp_1_normRcpp(SEXP xSEXP) {
BEGIN_RCPP
    Rcpp::RObject rcpp_result_gen;
    Rcpp::RNGScope rcpp_rngScope_gen;
    Rcpp::traits::input_parameter< NumericVector >::type x(xSEXP);
    rcpp_result_gen = Rcpp::wrap(normRcpp(x));
    return rcpp_result_gen;
END_RCPP
}

Generated R functions 
-------------------------------------------------------

`.sourceCpp_1_DLLInfo` <- dyn.load('/tmp/Rtmp47BJnZ/sourceCpp-x86_64-pc-linux-gnu-1.0.13/sourcecpp_1922657a201c/sourceCpp_2.so')

normRcpp <- Rcpp:::sourceCppFunction(function(x) {}, FALSE, `.sourceCpp_1_DLLInfo`, 'sourceCpp_1_normRcpp')

rm(`.sourceCpp_1_DLLInfo`)

Building shared library
--------------------------------------------------------

DIR: /tmp/Rtmp47BJnZ/sourceCpp-x86_64-pc-linux-gnu-1.0.13/sourcecpp_1922657a201c

/usr/local/lib/R/bin/R CMD SHLIB -o 'sourceCpp_2.so' 'norm.cpp' 

Your turn

Now, get ready for some Rcpp action!

Problem 1: Adding vectors

  1. Using what you have just learned about Rcpp, write a function to add two vectors of the same length. Use the following template
#include <Rcpp.h>

using namespace Rcpp;

// [[Rcpp::export]]
NumericVector add_vectors([declare vector 1], [declare vector 2]) {
  
  ... magick ...
  
  return [something];
}
  1. Now, we have to check for lengths. Use the stop function to make sure lengths match. Add the following lines in your code
if ([some condition])
  stop("an arbitrary error message :)");

Problem 2: Fibonacci series

Fibonacci Spiral

Each element of the sequence is determined by the following:

\[ F(n) = \left\{\begin{array}{ll} n, & \mbox{ if }n \leq 1\\ F(n - 1) + F(n - 2), & \mbox{otherwise} \end{array}\right. \]

Using recursions, we can implement this algorithm in R as follows:

fibR <- function(n) {
  if (n <= 1)
    return(n)
  fibR(n - 1) + fibR(n - 2)
}

# Is it working?
c(
  fibR(0), fibR(1), fibR(2),
  fibR(3), fibR(4), fibR(5),
  fibR(6)
)
[1] 0 1 1 2 3 5 8

Now, let’s translate this code into Rcpp and see how much speed boost we get!

Problem 2: Fibonacci series (solution)

Here is the full solution (hidden!):

Code
#include <Rcpp.h>

// [[Rcpp::export]]
int fibCpp(int n) {
  if (n <= 1)
    return n;
  
  return fibCpp(n - 1) + fibCpp(n - 2);
  
}
microbenchmark::microbenchmark(fibR(20), fibCpp(20))
Unit: microseconds
       expr      min        lq       mean    median       uq      max neval cld
   fibR(20) 2684.844 2771.6205 2991.03323 2828.3850 2992.139 4894.129   100  a 
 fibCpp(20)   18.204   19.0445   28.50812   20.3155   22.796  711.842   100   b

Exposing C++ classes to R

Exposing C++ classes to R

#include <Rcpp.h>

class Person {
public:
  Person(std::string name, int age) : 
    name(name), age(age) {}
  std::string get_name() const { return name; }
  int get_age() const { return age; }
  void print() const {Rprintf(
    "%s is %d years old\n", name.c_str(), age);
    }

private:
  std::string name;
  int age;
};

// [[Rcpp::export]]
Rcpp::XPtr<Person> create_person(
  std::string name, int age
  ) {
  return Rcpp::XPtr<Person>(new Person(name, age));
}

// [[Rcpp::export]]
std::string get_name(SEXP person) {
  Rcpp::XPtr<Person> p(person);
  return p->get_name();
}

// [[Rcpp::export]]
int print_person(SEXP person) {
  Rcpp::XPtr<Person> p(person);
  p->print();
  return 0;
}
  • We define a class Person with a constructor, three methods, and two private members.
  • The print method uses Rprintf to print to the R console.
  • (Wrapping the class) The create_person function creates a pointer (wrapper) to a Person object (XPtr<Person>). Notice the new keyword.
  • (Unwrapping the class) To unwrap the class, we use the XPtr constructor on an SEXP object (S expression, from when R was called S!). We use -> to access the class methods.

Now let’s use it!

Exposing C++ classes to R (cont’d)

# Construct a person
myperson <- create_person("Jorge", 30)

# Get the name
get_name(myperson)
## [1] "Jorge"

# Default print is not very informative
myperson
## <pointer: 0x55555ab3b4c0>
print_person(myperson) # Custom print
## Jorge is 30 years old
## [1] 0

Fin

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.4.1 (2024-06-14)
 os       Ubuntu 22.04.4 LTS
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Etc/UTC
 date     2024-09-24
 pandoc   3.3 @ /usr/bin/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package        * version date (UTC) lib source
 cachem           1.1.0   2024-05-16 [1] RSPM (R 4.4.0)
 cli              3.6.3   2024-06-21 [1] RSPM (R 4.4.0)
 devtools         2.4.5   2022-10-11 [1] RSPM (R 4.4.0)
 digest           0.6.36  2024-06-23 [1] RSPM (R 4.4.0)
 ellipsis         0.3.2   2021-04-29 [1] RSPM (R 4.4.0)
 evaluate         0.24.0  2024-06-10 [1] RSPM (R 4.4.0)
 fastmap          1.2.0   2024-05-15 [1] RSPM (R 4.4.0)
 fs               1.6.4   2024-04-25 [1] RSPM (R 4.4.0)
 glue             1.7.0   2024-01-09 [1] RSPM (R 4.4.0)
 htmltools        0.5.8.1 2024-04-04 [1] RSPM (R 4.4.0)
 htmlwidgets      1.6.4   2023-12-06 [1] RSPM (R 4.4.0)
 httpuv           1.6.15  2024-03-26 [1] RSPM (R 4.4.0)
 jsonlite         1.8.8   2023-12-04 [1] RSPM (R 4.4.0)
 knitr            1.48    2024-07-07 [1] RSPM (R 4.4.0)
 later            1.3.2   2023-12-06 [1] RSPM (R 4.4.0)
 lifecycle        1.0.4   2023-11-07 [1] RSPM (R 4.4.0)
 magrittr         2.0.3   2022-03-30 [1] RSPM (R 4.4.0)
 memoise          2.0.1   2021-11-26 [1] RSPM (R 4.4.0)
 microbenchmark   1.5.0   2024-09-04 [1] RSPM (R 4.4.0)
 mime             0.12    2021-09-28 [1] RSPM (R 4.4.0)
 miniUI           0.1.1.1 2018-05-18 [1] RSPM (R 4.4.0)
 pkgbuild         1.4.4   2024-03-17 [1] RSPM (R 4.4.0)
 pkgload          1.4.0   2024-06-28 [1] RSPM (R 4.4.0)
 profvis          0.3.8   2023-05-02 [1] RSPM (R 4.4.0)
 promises         1.3.0   2024-04-05 [1] RSPM (R 4.4.0)
 purrr            1.0.2   2023-08-10 [1] RSPM (R 4.4.0)
 R6               2.5.1   2021-08-19 [1] RSPM (R 4.4.0)
 Rcpp             1.0.13  2024-07-17 [1] RSPM (R 4.4.0)
 remotes          2.5.0   2024-03-17 [1] RSPM (R 4.4.0)
 rlang            1.1.4   2024-06-04 [1] RSPM (R 4.4.0)
 rmarkdown        2.27    2024-05-17 [1] RSPM (R 4.4.0)
 sessioninfo      1.2.2   2021-12-06 [1] RSPM (R 4.4.0)
 shiny            1.9.1   2024-08-01 [1] RSPM (R 4.4.0)
 stringi          1.8.4   2024-05-06 [1] RSPM (R 4.4.0)
 stringr          1.5.1   2023-11-14 [1] RSPM (R 4.4.0)
 urlchecker       1.0.1   2021-11-30 [1] RSPM (R 4.4.0)
 usethis          3.0.0   2024-07-29 [1] RSPM (R 4.4.0)
 vctrs            0.6.5   2023-12-01 [1] RSPM (R 4.4.0)
 xfun             0.46    2024-07-18 [1] RSPM (R 4.4.0)
 xtable           1.8-4   2019-04-21 [1] RSPM (R 4.4.0)
 yaml             2.3.10  2024-07-26 [1] RSPM (R 4.4.0)

 [1] /usr/local/lib/R/site-library
 [2] /usr/local/lib/R/library

──────────────────────────────────────────────────────────────────────────────