Intro to C++

PHS 7045: Advanced Programming

George G. Vega Yon, Ph.D.

The University of Utah

2024-09-17

Introduction

Learning objectives:

Understand the basics of C++ programming: syntax, types, and classes.
Learn how to write a simple C++ program, compile it, and run it.
Understand the differences between C++ and R.

We will need a compiler:

Windows: You can download Rtools from here.
MacOS: It is a bit complicated… Here are some options:
- CRAN’s manual to get the clang, clang++, and gfortran compilers here.
- A great guide by the coatless professor here
If you don’t have a compiler installed, you can join the class via posit.cloud.

Start!

Hello world

The program

#include<iostream> // (1)

int main() { // (2)
  std::cout << "Hello world" << std::endl; // (3)
  return 0; // (4)
}

1: The equivalent to library() in R. iostream is part of the standard library.
2: C++ is type-explicit, so we always declare the data types (here, main returns an int).
3: Like in R, we have namespaces. We access the cout stream object from std (the standard library namespace). Also, each statement ends with a semicolon (;).
4: Explicit return.

We can use g++ to compile the code (-std=c++14 is the C++14 standard):

g++ -std=c++14 hello-world.cpp -o hello-world
./hello-world

Hello world

Example: Computing the mean

Download the program here.

#include<iostream> // To print
#include<vector>   // To use vectors

int main() {

  // Defining the data
  std::vector< double > dat = {1.0, 2.5, 4.4};
  
  // Making room for the output
  double ans = 0.0;

  // Looping through the data
  for (int i = 0; i < dat.size(); ++i)
    ans = ans + dat[i];

  ans = ans/dat.size();

  // Print out the value to the screen
  std::cout << "The mean of dat is " << ans << std::endl;

  // Returning
  return 0;

}

Loading the libraries

Creating a vector of doubles with three values.

A for-loop that starts from zero and stops before reaching the size of the vector, incrementing by one (++i).

To compile it, run the following command:

g++ -std=c++14 means.cpp -o means
./means

The mean of dat is 2.63333

Example: Computing the mean (take 2)

We can leverage modern C++ to make the code shorter with std::accumulate()

#include<cstdio>   // To print with printf
#include<vector>   // To use vectors
#include<numeric>  // To use the accumulate function

int main() {

  // Defining the data
  std::vector< double > dat = {1.0, 2.5, 4.4};
  
  // Making room for the output
  double ans = std::accumulate(
    dat.begin(), dat.end(), 0.0
    );
  ans /= dat.size();

  // Print out the value to the screen
  printf("The mean of dat is %.2f\n", ans);

  // Returning
  return 0;

}

Using the numeric header, which provides the accumulate function.

The std::accumulate function sums the elements of the vector, starting from the initial value 0.0.

To compile it, run the following command:

g++ -std=c++14 means2.cpp -o means2
./means2

Language fundamentals

Differences with R

Here are some differences between C++ and R:

Feature	C++	R
Type	Compiled	Interpreted
Type explicit?	Yes	No
Index starts at	0	1
`for` loop	`for (int i = 0; i < n; ++i)`	`for (i in 1:n)`
Line ending	“`;`”	“`\n`” (implicit)

Compiled: the code is translated to machine code before running, allowing for faster execution. Interpreted: the code is read and executed statement by statement at run time, allowing interactive programming.
Type explicit: in C++, we always declare the type of the variables. In R, we don’t need to.

Fundamental types

Adapted from W3 Schools:

int my_num           = 5;       // Integer (whole number)
float my_float_num   = 5.99;    // Floating point number
double my_double_num = 9.98;    // Floating point number
char my_letter       = 'D';     // Character
bool my_boolean      = true;    // Boolean
std::string my_text  = "Hello"; // String

Vectors in C++ play a role similar to vectors and lists in R, but all the elements of a vector must have the same type:

std::vector< int > my_vector = {1, 2, 3, 4, 5};
std::vector< std::string > my_str_vector = {"a", "b", "c"};
std::vector< std::vector< int > > my_matrix = {{1, 2}, {3, 4}};

Vectors in C++

Vectors make life easier, avoiding the need to manage memory.
Vectors store their elements in contiguous memory, allowing for fast access.
Vectors have many methods to manipulate the data:

my_vector.push_back(6); // Add an element
my_vector.pop_back();   // Remove the last element
my_vector.size();       // Number of elements
my_vector[0];           // Access the first element
my_vector.at(0);        // Access the first element (bounds-checked, safer)

Vectors in C++: Looping

Looping through vectors can be done in different ways:

// Suppose we have this:
std::vector< int > my_vector = {1, 2, 3, 4, 5};

// Typical loop
for (int i = 0; i < my_vector.size(); ++i) {
  std::cout << my_vector[i] << std::endl;
}

// Using vector's iterators (begin and end)
// and the auto keyword
for (auto i = my_vector.begin(); i != my_vector.end(); ++i) {
  std::cout << *i << std::endl;
}

// Using range-based for loop (with the auto keyword)
for (auto i: my_vector) {
  std::cout << i << std::endl;
}

The typical loop accesses the elements by index.

Using iterators is a bit more complex: i behaves like a pointer to the value, so we read the value with *i.

The range-based for loop is simpler and cleaner.

Important keywords

Types can be accompanied by keywords and qualifiers:

const int x = 5;           // (1)
double fun(int x);         // (2)
double fun(const int x);   // (3)
double fun(int & x);       // (4)
double fun(const int & x); // (5)
double fun(int * x);       // (6)

1: const: the value of x cannot be changed. Trying to modify it will result in a compilation error.
2: x is passed by copy (not ideal for large objects). It can be modified inside the function.
3: x is still a copy, but it cannot be modified.
4: &: passing by reference. Ideal for large objects. It can be modified.
5: const &: passing by reference, but cannot be modified.
6: *: passing by pointer. The value can be modified. NOT RECOMMENDED FOR C++.

Important keywords: Example with pointers

The following code (pointers.cpp) illustrates how these keywords work:

#include <cstdio> // For the std version of printf

void set_x_copy(int x, int y) {x = y;}; 
void set_x(int * x, int y) {*x = y;};   
void set_x_ref(int & x, int y) {x = y;};

int main() {

    int x = 0;

    set_x_copy(x, 3);
    std::printf("x = %d\n", x);
    set_x(&x, 2);
    std::printf("x = %d\n", x);
    set_x_ref(x, 1);
    std::printf("x = %d\n", x);

    return 0;

}

set_x_copy: x is passed by copy, and thus the value is not modified.

set_x: x is passed by pointer, and thus the value is modified. *x accesses the value stored at x, and &x is its memory address.

set_x_ref: x is passed by reference, and thus the value is modified. Passing by reference is the preferred way in C++.

To compile and run the code:

g++ -std=c++14 pointers.cpp -o pointers
./pointers

x = 0
x = 2
x = 1

Classes in C++

Example class (you can download the file here):

#ifndef PERSON_HPP // (1)
#define PERSON_HPP

#include<string>
#include<iostream>

class Person {
private: // (2)
    std::string name;
    int age;
    double height;

public: // (3)
  // Constructor
  Person(std::string n, int a, double h) { // (4)
    name = n;
    age = a;
    height = h;
  };

  // Default constructor
  Person() : name("Unknown"), age(0), height(0.0) {};

  // Destructor
  ~Person() { // (5)
    std::cout <<
      this->name + " destroyed" << // (6)
      std::endl;
  };

  // Getters and setters
  std::string get_name() { return name; }; // (7)
  void set_name(std::string n) { name = n; };
};

#endif

1: The #ifndef + #define + #endif combination is the include guard. It avoids multiple inclusions of the header.
2: Private members: only accessible within the class.
3: Public members: accessible from outside the class.
4: Constructor: initializes the object.
5: Destructor: called when the object is destroyed (for example, when it goes out of scope).
6: Members can be accessed with this->.
7: Getters and setters: methods to access and modify private members.

Classes in C++ (cont.)

Using the class (you can download the file here):

#include<string>
#include<iostream>
#include "person.hpp"

int main() {
  Person p1; // Default constructor
  Person p2("John", 30, 1.80); // Other constructor

  std::cout << p1.get_name() << std::endl;
  std::cout << p2.get_name() << std::endl;

  return 0;
}

Compiling and executing the program:

g++ -std=c++14 person.cpp -o person
./person

Unknown
John
John destroyed
Unknown destroyed

Notice that the destructor is called when p1 and p2 go out of scope (in reverse order of construction).

Classes in C++: Declaration and Implementation

A good practice is to separate the declaration (bones) from the implementation (meat).
Looking at an extract of the class Person:

// ---------------------------------------
// Declarations: Arguments and data types
// ---------------------------------------
class Person {
private:
    std::string name;
    int age;
    double height;

public:
  // Constructor
  Person(std::string n, int a, double h);

  // Getters and setters
  std::string get_name();
};

// ---------------------------------------
// Implementation: Body of the functions
// ---------------------------------------
inline Person::Person(std::string n, int a, double h) {
  name = n;
  age = a;
  height = h;
};

inline std::string Person::get_name() {
  return name;
};

Overloading

In C++, we can have multiple functions with the same name, but different arguments. This is called overloading.
The compiler will choose the correct function based on the arguments. Instead of giving each version its own name (add_int, add_double, add_float), we can give them all the same name (add):

int add_int(int x, int y) {
  return x + y;
}

double add_double(double x, double y) {
  return x + y;
}

float add_float(float x, float y) {
  return x + y;
}

int add(int x, int y) {
  return x + y;
}

double add(double x, double y) {
  return x + y;
}

float add(float x, float y) {
  return x + y;
}

Templates

In C++, we can use templates to create functions or classes that can work with any data type.
This is useful when we want to create a function that works with int, double, float, etc.

int add(int x, int y) {
  return x + y;
}

double add(double x, double y) {
  return x + y;
}

float add(float x, float y) {
  return x + y;
}

template<typename T> // (1)
T add(T x, T y) {
  return x + y;
}

template<> // (2)
float add(float x, float y) {
  std::cout<< "This is a float!" << std::endl;
  return x + y;
}

1: Template declaration (the generic type is T).
2: Specialization for float.

Templates (cont.)

Classes can also be templated (defined in template_class.cpp):

#include<iostream>

template<typename T>
class MyAdder {
private:
  T x;
  T y;
public:
  MyAdder(T x, T y) : x(x), y(y) {};

  T add() {
    return x + y;
  };
};

int main() {
  MyAdder<int> a(1, 2);
  MyAdder<double> b(1.0, 2.0);

  std::cout << a.add() << std::endl;
  std::cout << b.add() << std::endl;

  return 0;
}

The class is templated. The type parameter T can be any type that supports the + operator.

Inside the class, T can be used wherever a type is expected (members, arguments, and return values).

The class is instantiated with int and double. The compiler will generate two classes during compilation.

g++ -std=c++14 template_class.cpp -o template_class
./template_class

3
3

Compared with R

Simulating pi

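One way to estimate \(\pi\) is to simulate points uniformly in the square \([-1, 1] \times [-1, 1]\) and count how many fall inside the unit circle.

The area of a circle is \(A = \pi r^2\), thus \(\pi = \frac{A}{r^2}\). With \(r = 1\), the circle has area \(\pi\) and the square has area 4, so the proportion of points inside the circle approximates \(\pi / 4\); multiplying it by 4 gives the estimate.
The following is an optimized R function to do this: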

my_pi_sim <- function(n) {
  xy <- matrix(runif(n*2, min=-1, max=1), ncol = 2)
  message(
    sprintf(
      "pi approx to: %.4f",
      mean(sqrt(rowSums(xy^2)) <= 1) * 4
    )
  )
}

set.seed(331)
my_pi_sim(1e6)

pi approx to: 3.1393

Let’s see how we can do this in C++.

Simulating pi in C++

#include <cmath>   // std::sqrt and std::pow
#include <cstddef> // std::size_t
#include <cstdio>  // std::printf
#include <random>  // (1)

int main() {
  
  // Setting the seed
  std::mt19937 rng_engine; // (2)
  rng_engine.seed(123);

  std::uniform_real_distribution<double> dist(-1.0, 1.0); // (3)

  // Number of simulations
  std::size_t n_sims = 5e6;

  // Counting the points that fall inside the circle
  double pi_approx = 0.0;
  for (std::size_t i = 0u; i < n_sims; ++i)
  {

    // Generating a point in the square [-1, 1] x [-1, 1]
    double x = dist(rng_engine);
    double y = dist(rng_engine);

    // Distance to the origin (named r to avoid shadowing dist)
    double r = std::sqrt(
        std::pow(x, 2.0) + std::pow(y, 2.0) // (4)
        );

    // Checking if the point is inside the unit circle 
    if (r <= 1.0)
      pi_approx += 1.0;

  }

  std::printf("pi approx to %.4f\n", 4.0*pi_approx/n_sims);

  return 0;

}

1: Library for random numbers and statistical distributions.
2: Random number engine (used in comb. with the distributions).
3: Uniform distribution between -1 and 1.
4: std::pow is the power function.

g++ -std=c++14 pi.cpp -o pi
./pi

pi approx to 3.1420

Extended example: Writing a summary class

Writing a summary class

The task is to write a class that computes the mean, standard deviation, minimum and maximum of a vector.
The class should be a template class so it can deal with double and int.

Full program

You can download the full C++ code here and the header file here:

#ifndef SUMMARY_HPP
#define SUMMARY_HPP

#include <vector>
#include <numeric>
#include <cmath>
#include <cstdio>

template<typename T>
class Summarizer {
private:
    const std::vector<T>* dat = nullptr;
    double n;

public:
    // Constructors
    Summarizer(const std::vector<T> & dat_);

    // Calculators
    double mean() const;
    double sd() const;
    T min() const;
    T max() const;

    // Printer
    void print() const;
    
};

template<typename T>
inline Summarizer<T>::Summarizer(const std::vector<T> & dat_) {
    dat = &dat_;
    n = dat->size();
};

template<typename T>
inline double Summarizer<T>::mean() const {
    return std::accumulate(
        dat->begin(), dat->end(), 0.0
        ) / n;
};

template<typename T>
inline double Summarizer<T>::sd() const {
    double m = mean();
    double sum = 0.0;
    for (auto & i: *dat)
        sum += std::pow(i - m, 2.0);
    return std::sqrt(sum / (dat->size() - 1));
};

template<typename T>
inline T Summarizer<T>::min() const {
    T min = (*dat)[0];
    for (std::size_t i = 1u; i < dat->size(); ++i)
        if ((*dat)[i] < min)
            min = (*dat)[i];
    return min;
};

template<typename T>
inline T Summarizer<T>::max() const {
    T max = (*dat)[0];
    for (std::size_t i = 1u; i < dat->size(); ++i)
        if ((*dat)[i] > max)
            max = (*dat)[i];
    return max;
};

template<>
inline void Summarizer<double>::print() const {
    std::printf("Summary for double data\n");
    std::printf("Mean : %.2f\n", mean());
    std::printf("SD   : %.2f\n", sd());
    std::printf("Min  : %.2f\n", min());
    std::printf("Max  : %.2f\n", max());
};

template<>
inline void Summarizer<int>::print() const {
    std::printf("Summary for int data\n");
    std::printf("Mean : %.2f\n", mean());
    std::printf("SD   : %.2f\n", sd());
    std::printf("Min  : %d\n", min());
    std::printf("Max  : %d\n", max());
};

#endif

Details: Declaration of the class

template<typename T>
class Summarizer {
private:
    const std::vector<T>* dat = nullptr;
    double n;

public:
    // Constructors
    Summarizer(const std::vector<T> & dat_);

    // Calculators
    double mean() const;
    double sd() const;
    T min() const;
    T max() const;

    // Printer
    void print() const;
    
};

Details: The constructor

template<typename T>
inline Summarizer<T>::Summarizer(const std::vector<T> & dat_) {
    dat = &dat_;
    n = dat->size();
};

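The implementation of the constructor is done outside of the class declaration.
The inline keyword allows the function to be defined in a header that is included from several source files; it is also a hint (which the compiler may ignore) to insert the code in the place where the function is called.
Here, the data is passed by reference and its address is stored as a pointer, so the vector must outlive the Summarizer object.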

Details: The mean function

template<typename T>
inline double Summarizer<T>::mean() const {
    return std::accumulate(
        dat->begin(), dat->end(), 0.0
        ) / n;
};

The function is declared as const to tell the compiler that the function does not modify the object (the instance it is called on).
The mean function uses the std::accumulate function.
Since dat is a pointer to a vector, we can access the members of dat via the -> operator (otherwise it would be using a . operator).

Running the example

#include "summary.hpp"

int main() {
    // Some data
    std::vector< double > dat = {1.0, 2.5, 4.4};
    std::vector< int > dat2 = {1, 2, 3, 4, 5};

    // Summarize the data
    Summarizer<double> s_double(dat);
    s_double.print();

    Summarizer<int> s_int(dat2);
    s_int.print();

    return 0;

}

g++ -std=c++14 summary.cpp -o summary
./summary

Summary for double data
Mean : 2.63
SD   : 1.70
Min  : 1.00
Max  : 4.40
Summary for int data
Mean : 3.00
SD   : 1.58
Min  : 1
Max  : 5