<- 1
x stopifnot(is.character(x))
Error: is.character(x) is not TRUE
In the RStudio (aka Posit) menu bar, notice the Debug
and Profile
drop-down menus. These are quick starting points for finding code bugs/bottlenecks. Studying the below chapters will give insight into finding where and how to improve code efficiency.
Required reading from Advanced R by Hadley Wickham:
Supplemental reading:
Norman Matloof’s Ch. 13.1-13.3.5: Debugging and Ch. 14: Performance Enhancement: Speed and Memory is slightly outdated by recommending RProf
to measure performance (as compared to profvis
used in RStudio) but has good content on speed and memory allocation.
Review the table of vectorization commands
Perform calculations one-at-a-time ONLY if necessary. Pause to think if you’re performing the same operation over and over again.
“Urn 1 contains ten blue marbles and eight yellow ones. In urn 2, the mixture is six blue and six yellow. We draw a marble at random from urn 1, transfer it to urn 2, and then draw a marble at random from urn 2. What is the probability that that second marble is blue? This is easy to find analytically, but we’ll use simulation.” Create a function that generates 100K replicates and returns the estimated probability.
When are loops faster than *apply (and vice-versa)
We saw some solutions to the urn problem that were more time consuming despite using an apply statement. What is happening here?
debug
functionssystem.time
and bench::mark
profvis::provis
rbind
’ing)data.table
as appropriatercpp
)How do you organize files on your computer?
A friend of mine stored all documents in the downloads folder! When it came time to find a file, it was like finding a needle in a haystack (there also wasn’t a uniform naming convention).
“Most good software developers agree that code should be written in a modular manner. Your first-level code should not be longer than, say, a dozen lines, with much of it consisting of function calls. And those functions should not be too lengthy and should call other functions if necessary. This makes the code easier to organize during the writing stage and easier for others to understand when it comes time for the code to be extended.”
Top-down modular level:
With debugging start top-down. Assume lower-level functions are correct until finishing debugging / checking top-level modules.
An example from collaboration:
An example from an R package: SeqSGPV
Question for HW 1: What can be “modularized”?
Pro-actively stop a function (or provide a warning or message) if a condition is not met (example similar to here):
<- 1
x stopifnot(is.character(x))
Error: is.character(x) is not TRUE
Here, is an equivalent stop command using if-logic. deparse(subsitute([var]))
prints the name of the object:
<- 1
x if(!is.character(x)){
stop(paste0(deparse(substitute(x)), " is not a character"))
}
Error in eval(expr, envir, enclos): x is not a character
<- 1
x if(!is.character(x)){
warning(paste0(deparse(substitute(x)), " is not a character"))
}
Warning: x is not a character
For a more in-depth consideration of debugging, see https://adv-r.hadley.nz/debugging.html. The examples below are from this book.
In R-Studio, notice the “Debug” drop-down menu. Under “on error” it has three options:
traceback
and debugonce
optionsdebugonce
The traceback
function lists call that provided the error and any sub-calls prior to it.
Options to debug a function from top-to-bottom:
debug
and (undebug)
debugonce
This will interactively walk you through the function call with the following commands:
n
: next line of codes
: step into the next functionf
: continue forward through the current {}
code block (example, a loop or the rest of the function)c
: continue the rest of the function to see if it’ll complete as anticipatedQ
: quit interactivewhere
: shows the trace stack (the possibly multiple layers of a function call)any r command
may be run when in debugger modeAn example where f()
calls g()
which calls h()
which calls i()
which is a function with conditional logic …
<- function(a) g(a)
f <- function(b) h(b)
g <- function(c) i(c)
h <- function(d) {
i if (!is.numeric(d)) {
stop("`d` must be numeric", call. = FALSE)
}+ 10
d
}
# Traceback and debug using Rstudio 'Rerun with Debug'
f("a")
# Enter debugging mode for a function until deactivating debugging mode
debug(f)
f("a")
f("a")
undebug(f)
# Enter debug mode for a single call
debugonce(f)
f("a")
f("a")
View
and debug
are good ways to learn the underlying code to a function.
View(lm)
debugonce(lm)
lm(1:10~1,method = "model.frame")
You can use the browser
function to enter debugging mode partially through a function
<- function(x, remainder){
f
# Step 1, add 1 to even elements
%%2] <- x[x%%2] + 1
x[x
# Step 2, see if each value in sequence is even or odd
<- x %% remainder
out
if(length(unique(out))!= 2) browser()
# Step 2, for elements with 0 remainder, add 1
== 0] <- x[out == 0] + 1
x[out
# Step 3, report number of unique elements
length(unique(x))
}f(x=1:10,remainder=3)
The browser function can be helpful if you want to debug a loop mid-way through iterations.
<- function(){
f for (i in 1:10^5){
print(i)
if(i == 100) browser()
}
}f()
While embedding browser
into code can be helpful, you must remember to remove the browser calls when done. Rstudio also provides a feature to set breakpoints by clicking on the line-number where you’d want to start debugging. See also breakpoints.
Using the urn problem, set breakpoints and run through debugonce()
.
Find a set of code you’ve written and know well to share with a classmate.
As the classmate,
When you get your code back, can you find and fix the bug?
For a quick profiling of code, use system.time()
.
<- runif(10000000)
x <- runif(10000000)
y <- system.time(z <- x + y)
t1
<- numeric(10000000)
z <- system.time({
t2 for (i in 1:length(x)) {
<- x[i] + y[i]
z[i]
}
})
# Time duration
t1
user system elapsed
0.005 0.008 0.015
t2
user system elapsed
0.407 0.010 0.421
# The loop is longer by a factor of:
"elapsed"] / t1["elapsed"] t2[
elapsed
28.06667
Why is t2
28.07? times longer?
:
and [
(subsetting) which add to the collective set of functions called (i.e. the “stack frame”).system.time
is not as precise for quick calls and can require many iterations to get a meaningful timing (example: 10e6 in above example). bench::mark
can provide more precise timings (see warm up problem).
relative
option in mark
does not report in raw time but in comparing different callscheck=FALSE
allows for running different code which may return different results (example: due to sampling)profvis::profvis
is a good go-to for evaluating performance of multiple calls.
We’ll work through Ch. 23: Measuring Performance together. Begin by the below script as a file called “profiling-example.R”:
<- function() {
f ::pause(0.1)
profvisg()
h()
}<- function() {
g ::pause(0.1)
profvish()
}<- function() {
h ::pause(0.1)
profvis }
Install the profvis
package if needed, and call:
source("profiling-example.R")
::profvis(f()) profvis
Alternatively, call
<- function() {
f ::pause(0.1)
profvisg()
h()
}<- function() {
g ::pause(0.1)
profvish()
}<- function() {
h ::pause(0.1)
profvis
}::profvis(f()) profvis
Now, we’ll turn to the notes Ch. 23: Measuring Performance as we walk through the:
Notice in the example below (and even better as seen in Ch. 23: Measuring Performance) how much time is spent in
::profvis({
profvis<- integer()
x for (i in 1:1e4) {
<- c(x, i)
x
} })
Run your urn solution (or another posted here) through profvis::profvis()
.
The bench::mark
function provides more accurate time of functions and allows for checking whether two functions return the same results (check=TRUE
). We’ve seen this here.
library(bench)
::mark( <solutions>, relative = TRUE, check = FALSE) bench
Parts to highlight:
“It’s easy to get caught up in trying to remove all bottlenecks. Don’t! Your time is valuable and is better spent analysing your data, not eliminating possible inefficiencies in your code. Be pragmatic: don’t spend hours of your time to save seconds of computer time. To enforce this advice, you should set a goal time for your code and optimise only up to that goal. This means you will not eliminate all bottlenecks.”
Be aware of all the steps required for a function. If it requires additional steps, see if there are alternative strategies that provide equivalent solutions
Avoiding copies is when you iteratively build up an object that includes previous values. For example:
# NOTE: an example of what to AVOID
<- 1
out for (i in 1:10){
<- c(out,i)
out
}
# BETTER: Pre-allocate space
<- rep(NA,11)
out 1] <- 1
out[for (i in 1:10){
+1] <- i
out[i }
*check for existing solutions; also, appreciate the value of cpp
::session_info() devtools
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.4.0 (2024-04-24)
os macOS Sonoma 14.5
system aarch64, darwin20
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/Denver
date 2024-09-03
pandoc 3.1.11 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
cachem 1.0.8 2023-05-01 [1] CRAN (R 4.4.0)
cli 3.6.2 2023-12-11 [1] CRAN (R 4.4.0)
devtools 2.4.5 2022-10-11 [1] CRAN (R 4.4.0)
digest 0.6.35 2024-03-11 [1] CRAN (R 4.4.0)
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.4.0)
evaluate 0.23 2023-11-01 [1] CRAN (R 4.4.0)
fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.4.0)
fs 1.6.4 2024-04-25 [1] CRAN (R 4.4.0)
glue 1.7.0 2024-01-09 [1] CRAN (R 4.4.0)
htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.4.0)
htmlwidgets 1.6.4 2023-12-06 [1] CRAN (R 4.4.0)
httpuv 1.6.15 2024-03-26 [1] CRAN (R 4.4.0)
jsonlite 1.8.8 2023-12-04 [1] CRAN (R 4.4.0)
knitr 1.46 2024-04-06 [1] CRAN (R 4.4.0)
later 1.3.2 2023-12-06 [1] CRAN (R 4.4.0)
lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.4.0)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.4.0)
memoise 2.0.1 2021-11-26 [1] CRAN (R 4.4.0)
mime 0.12 2021-09-28 [1] CRAN (R 4.4.0)
miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.4.0)
pkgbuild 1.4.4 2024-03-17 [1] CRAN (R 4.4.0)
pkgload 1.3.4 2024-01-16 [1] CRAN (R 4.4.0)
profvis 0.3.8 2023-05-02 [1] CRAN (R 4.4.0)
promises 1.3.0 2024-04-05 [1] CRAN (R 4.4.0)
purrr 1.0.2 2023-08-10 [1] CRAN (R 4.4.0)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.4.0)
Rcpp 1.0.12 2024-01-09 [1] CRAN (R 4.4.0)
remotes 2.5.0 2024-03-17 [1] CRAN (R 4.4.0)
rlang 1.1.3 2024-01-10 [1] CRAN (R 4.4.0)
rmarkdown 2.28 2024-08-17 [1] CRAN (R 4.4.0)
rstudioapi 0.16.0 2024-03-24 [1] CRAN (R 4.4.0)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.4.0)
shiny 1.8.1.1 2024-04-02 [1] CRAN (R 4.4.0)
stringi 1.8.4 2024-05-06 [1] CRAN (R 4.4.0)
stringr 1.5.1 2023-11-14 [1] CRAN (R 4.4.0)
urlchecker 1.0.1 2021-11-30 [1] CRAN (R 4.4.0)
usethis 3.0.0 2024-07-29 [1] CRAN (R 4.4.0)
vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.4.0)
xfun 0.43 2024-03-25 [1] CRAN (R 4.4.0)
xtable 1.8-4 2019-04-21 [1] CRAN (R 4.4.0)
yaml 2.3.8 2023-12-11 [1] CRAN (R 4.4.0)
[1] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library
──────────────────────────────────────────────────────────────────────────────