Lab 02 - R Essentials

Learning goals

Today’s lab practices vectorization and labeling objects. Although the second half entails Bayesian statistics, the emphasis is on coding. We will walk through what is needed for the Bayesian estimation.

The second half of the lab is also a first step toward completing HW 1.

Lab

Urn problem revisited (see problem here)

  1. Take your solution to urn problem and turn it into a function. The function should take inputs with defaults: nreps=100000, b1=10, y1=8, b2=6, y2=6 which has default values.

  2. Post function here.

  3. We will evaluate the performance of each function and then discuss how to speed up the function.

library(bench)
bench::mark( <solutions>, relative = TRUE, check = FALSE)
  1. For the submission, provide your original function, a version that is updated for more efficiency (if applicable), and a brief summary of what you learned for future coding.

Estimating the probability one treatment arm is better than another

Create a simulated dataset that enrolls 40 participants (\(i = 1, \dots, 40\)) equally randomized to 4 arms (\(t=0, 1, 2, 3, 4\)). Generate outcomes supposing the probability of success under each arm, \(p_t\), is 0.35.

Begin by creating a matrix with 10 rows and 4 columns. Label the columns to indicate each treatment arm and generate the observed outcomes under each treatment arm. The total number of observations and successes under arm \(t\) is, respectively, \(n_t\) and \(y_t\).

Compare the probability of success under each experimental arm compared to control, i.e., Pr (\(p_t > p_0\)).

Here, we’ll use a bayesian framework to estimate the distribution of \(p_t\) under each arm. With Bernoulli outcomes and a Beta(\(\alpha_t = 0.35, \beta_t = 0.65\)) prior distribution on the success rate, the posterior distribution is Pr(\(p_t\) | \(y_t\)) \(\sim\) Beta(\(\alpha_t + y_t, \beta_t + n_t - y_t\)).

To estimate Pr (\(p_t > p_0\)), use vectors take many [such as 1000] random draws from each arm’s posterior distribution: rbeta(n=1000, 0.35 + y_t, 0.65 + n_t - y_t). Use vectorization to compare how often a randomly generated experimental success rate is greater than a randomly generated control success rate.

Declare the trial a “success” if the maximum of Pr (\(p_t > p_0\)) > \(\delta\) = 0.025.

Throughout the work, use labels wherever appropriate. Submit this as your lab assignment.

Next steps

With minor modifications, this is the set-up for the first of two trial designs of HW 1. If you have extra time, begin working on HW 1.