Writing Scientific Documents with Quarto

Author

George G. Vega Yon, Ph.D.
george.vegayon@utah.edu

University of Utah

Published

January 12, 2023

Quarto is a modern publishing system developed by Posit (formerly RStudio) that provides a robust framework for creating many kinds of documents, including scientific papers, books, websites, and presentations. Quarto can work with R, Python, Julia, and Observable.

Quarto is built on top of Pandoc, which is a universal document converter that can convert between many different formats, including Markdown, HTML, LaTeX, and PDF. Quarto uses Pandoc to convert your documents into the desired format.

Quarto files

  • These are plain-text (not binary) files

    ---
    title: "Hello World"
    author: "Truly Yours"
    date: "`r Sys.Date()`"
    format: html
    ---
    
    # First level header
    
    ## Second level header
    
    Some text that goes along with the document
    
    Code chunks can have tags, like the one here
    
    
    ```{r first-code-chunk}
    sqrt(pi)
    ```
    
    
    And also, they can have options. For example, if you don't want the source code
    to be printed out, you add the option `echo: false` as in the following code
    chunk
    
    
    ```{r second-code-chunk}
    #| echo: false
    plot(USArrests)
    ```
    

Main components of a qmd file

  • The header: Information about the document in yaml format

    ---
    title: "Hello World"
    author: "Truly Yours"
    date: "`r Sys.Date()`"
    format: html
    ---
  • R code chunks (without options)

    
    ```{r first-code-chunk}
    sqrt(pi)
    ```
    
  • R code chunks (with options)

    
    ```{r second-code-chunk}
    #| echo: false
    plot(USArrests)
    ```
    

  • Some other options include:

    • cache: Logical. When true, the result of the code chunk is saved so it doesn’t need to be recomputed every time (handy for time-consuming code!)

    • message: Logical. When false, it suppresses whatever messages the R code in the chunk generates.

    • fig-cap (fig.cap in knitr’s dot notation): Character. Specifies the caption of the plots generated within the chunk.

    More here.
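
    As a rough illustration of how these options combine, the body of a chunk might look like the sketch below. The label fig-arrests, the caption text, and the object name arrest_means are made up for this example; the `#|` lines are read by Quarto as chunk options and are plain comments to R itself.

```r
#| label: fig-arrests
#| cache: true
#| message: false
#| fig-cap: "Scatterplot matrix of the USArrests data"

# Column means of the built-in USArrests data; with cache: true the
# result is stored and reused on later renders
arrest_means <- colMeans(USArrests)
arrest_means

# The caption given in fig-cap is attached to this figure
plot(USArrests)
```

    Because caching keeps old results around, it is usually worth enabling only for chunks that are slow to run.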

How it works

Source: Quarto website https://quarto.org/docs/faq/rmarkdown.html

  • The quarto command-line tool (quarto render) passes the qmd file to knitr

  • knitr executes the R code (or whatever code is there) and creates an md file (markdown, not Rmarkdown)

  • Then the md file is passed to pandoc, which ultimately compiles the document in the desired format as specified in the format option of the header.
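
    The steps above can be triggered from R with a short sketch like the following. It assumes the quarto command-line tool is installed and on the PATH; the file name hello.qmd is hypothetical, and the render step is skipped when quarto is not found.

```r
# Build a minimal qmd file in a temporary directory (hypothetical file name)
qmd_file <- file.path(tempdir(), "hello.qmd")
fence    <- strrep("`", 3)  # the three backticks that delimit a code chunk

writeLines(c(
  "---",
  "title: \"Hello World\"",
  "format: html",
  "---",
  "",
  paste0(fence, "{r}"),
  "sqrt(pi)",
  fence
), qmd_file)

# quarto hands the file to knitr, knitr writes markdown, and pandoc
# produces the HTML file next to the source
if (nzchar(Sys.which("quarto"))) {
  system2("quarto", c("render", shQuote(qmd_file), "--to", "html"))
}
```

    Passing `--to html` on the command line has the same effect as setting `format: html` in the header.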

Quarto supports other languages

  • The following code chunk requires the reticulate R package (R interface to Python)

    
    ```{python some-py-code}
    print("Hello World")
    import this
    ```
    
    Hello World
    The Zen of Python, by Tim Peters
    
    Beautiful is better than ugly.
    Explicit is better than implicit.
    Simple is better than complex.
    Complex is better than complicated.
    Flat is better than nested.
    Sparse is better than dense.
    Readability counts.
    Special cases aren't special enough to break the rules.
    Although practicality beats purity.
    Errors should never pass silently.
    Unless explicitly silenced.
    In the face of ambiguity, refuse the temptation to guess.
    There should be one-- and preferably only one --obvious way to do it.
    Although that way may not be obvious at first unless you're Dutch.
    Now is better than never.
    Although never is often better than *right* now.
    If the implementation is hard to explain, it's a bad idea.
    If the implementation is easy to explain, it may be a good idea.
    Namespaces are one honking great idea -- let's do more of those!

Tables with Quarto

  • Suppose that we want to include the following data as a table in our document

    # Loading the packages (dplyr provides %>%, group_by, summarise, and arrange)
    library(gapminder)
    library(dplyr)
    
    # Calculating stats at the year level
    stats_by_year <- gapminder %>%
      group_by(year) %>%
      summarise(
        `Life Expectancy` = mean(lifeExp),
        `Population`      = mean(pop),
        `GDP pp`          = mean(gdpPercap)
      ) %>%
      arrange(year)
    
    stats_by_year
    # A tibble: 12 × 4
        year `Life Expectancy` Population `GDP pp`
       <int>             <dbl>      <dbl>    <dbl>
     1  1952              49.1  16950402.    3725.
     2  1957              51.5  18763413.    4299.
     3  1962              53.6  20421007.    4726.
     4  1967              55.7  22658298.    5484.
     5  1972              57.6  25189980.    6770.
     6  1977              59.6  27676379.    7313.
     7  1982              61.5  30207302.    7519.
     8  1987              63.2  33038573.    7901.
     9  1992              64.2  35990917.    8159.
    10  1997              65.0  38839468.    9090.
    11  2002              65.7  41457589.    9918.
    12  2007              67.0  44021220.   11680.

    There are at least two ways of doing it.

Tabulation with knitr

  • The knitr package provides the function kable to print tables.

  • It has the nice feature that you don’t need to be explicit about the format, i.e., it will automatically guess what type of document you are working with.

    knitr::kable(
        head(stats_by_year),
        caption = "Year stats from the gapminder data",
        format.args = list(big.mark=",")
        )

    Year stats from the gapminder data

    year     Life Expectancy    Population    GDP pp
    -----    ---------------    ----------    ---------
    1,952    49.05762           16,950,402    3,725.276
    1,957    51.50740           18,763,413    4,299.408
    1,962    53.60925           20,421,007    4,725.812
    1,967    55.67829           22,658,298    5,483.653
    1,972    57.64739           25,189,980    6,770.083
    1,977    59.57016           27,676,379    7,313.166

  • Check out kableExtra, which provides extensions to the kable function.
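
    Note that format.args applies to every numeric column, which is why the years above are printed as 1,952. One possible workaround, sketched here on a two-row stand-in for stats_by_year (the name year_stats is made up, and the values are copied from the printed output above), is to convert the year to character before calling kable. The format is set to "pipe" only to make the result predictable outside of a rendered document.

```r
# First two rows of the yearly summary shown above (values as printed)
year_stats <- data.frame(
  year       = c(1952L, 1957L),
  Population = c(16950402, 18763413)
)

# Convert the year to character so big.mark is not applied to it
year_stats$year <- as.character(year_stats$year)

tab <- knitr::kable(
  year_stats,
  format      = "pipe",
  caption     = "Year stats from the gapminder data",
  format.args = list(big.mark = ",")
)
tab
```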

Tabulation with pander

  • Another (very cool) R package is pander

  • It provides helper functions to work with pandoc’s markdown format

  • This means that you don’t need to think about what is the final output format

    pander::pandoc.table(
      head(stats_by_year), 
      caption = "Year stats from the gapminder data"
      )

    Year stats from the gapminder data

    year    Life Expectancy    Population    GDP pp
    ----    ---------------    ----------    ------
    1952    49.06              16950402      3725
    1957    51.51              18763413      4299
    1962    53.61              20421007      4726
    1967    55.68              22658298      5484
    1972    57.65              25189980      6770
    1977    59.57              27676379      7313

Regression tables

  • Many functions are available for including regression output in a document

  • Suppose that we run the following models on the diamonds dataset

    data(diamonds, package="ggplot2")
    
    # Four linear models of price with different sets of covariates
    model1 <- lm(price ~ carat, data = diamonds)
    model2 <- lm(price ~ carat + depth, data = diamonds)
    model3 <- lm(price ~ carat + table, data = diamonds)
    model4 <- lm(price ~ carat + depth + table, data = diamonds)
    
    # Let's put it all in a list to handle it together
    models <- list(model1, model2, model3, model4)
  • How can we include these in our report/paper?

Regression tables with texreg

  • The R package texreg

    texreg::htmlreg(models, doctype=FALSE)

    Statistical models

                   Model 1        Model 2        Model 3        Model 4
    (Intercept)    -2256.36***    4045.33***     1961.99***     13003.44***
                   (13.06)        (286.21)       (171.81)       (390.92)
    carat          7756.43***     7765.14***     7820.04***     7858.77***
                   (14.07)        (14.01)        (14.22)        (14.15)
    depth                         -102.17***                    -151.24***
                                  (4.64)                        (4.82)
    table                                        -74.30***      -104.47***
                                                 (3.02)         (3.14)
    R²             0.85           0.85           0.85           0.85
    Adj. R²        0.85           0.85           0.85           0.85
    Num. obs.      53940          53940          53940          53940

    ***p < 0.001; **p < 0.01; *p < 0.05

  • It also has the functions texreg, for LaTeX tables, and screenreg, for plain-text output

  • The drawback: you have to be explicit about the type of table that you want to print
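
    One possible way around this, assuming the document is rendered through knitr, is to pick the texreg function according to the output format. In the sketch below, print_models is a hypothetical helper (it is not part of texreg), and the model fitted on the built-in cars data is only a stand-in; knitr::is_html_output() and knitr::is_latex_output() report the format being rendered. In a real document, the chunk would also need the option `results: asis` so the HTML or LaTeX is passed through untouched.

```r
# Hypothetical helper: choose the texreg function from the output format
print_models <- function(models, ...) {
  if (knitr::is_html_output()) {
    texreg::htmlreg(models, doctype = FALSE, ...)   # HTML documents
  } else if (knitr::is_latex_output()) {
    texreg::texreg(models, ...)                     # PDF via LaTeX
  } else {
    texreg::screenreg(models, ...)                  # plain text otherwise
  }
}

# Stand-in model on a built-in dataset, only to show the call
fit <- lm(dist ~ speed, data = cars)
out <- print_models(list(fit))
out
```

    Run interactively, outside of a rendering session, this falls through to screenreg and prints a plain-text table.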

Regression tables with memisc

  • The R package memisc

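    library(memisc)

    # Build the table of the four models with the selected summary statistics
    tab <- mtable(
      `Model 1` = model1,
      `Model 2` = model2,
      `Model 3` = model3,
      `Model 4` = model4,
      summary.stats = c("sigma", "R-squared", "F", "p", "N")
    )

    # Print the table as HTML to the standard output
    write.mtable(tab, file = stdout(), format = "HTML")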

                   Model 1         Model 2         Model 3         Model 4
    (Intercept)    −2256.361***    4045.333***     1961.992***     13003.441***
                   (13.055)        (286.205)       (171.811)       (390.918)
    carat          7756.426***     7765.141***     7820.038***     7858.771***
                   (14.067)        (14.009)        (14.225)        (14.151)
    depth                          −102.165***                     −151.236***
                                   (4.635)                         (4.820)
    table                                          −74.301***      −104.473***
                                                   (3.018)         (3.141)
    R-squared      0.849           0.851           0.851           0.854
    sigma          1548.562        1541.649        1539.946        1526.094
    F              304050.906      153634.765      154034.567      104890.460
    p              0.000           0.000           0.000           0.000
    N              53940           53940           53940           53940

    Significance: *** = p < 0.001; ** = p < 0.01; * = p < 0.05

Plots with Quarto

  • In the case of plots, these just work!

    library(ggplot2)

    ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
      geom_point() +
      ggtitle("Plots with Quarto just work")
