Writing Scientific Documents with Quarto

Author

George G. Vega Yon, Ph.D.
george.vegayon@utah.edu

University of Utah

Published

January 12, 2023

Quarto is a modern publishing system developed by Posit (formerly RStudio) that provides a robust framework for creating all sorts of documents, including scientific papers, books, websites, and presentations. Quarto can work with R, Python, Julia, and Observable. Some great applications of Quarto include:

Creating lab reports to share with your team.
Creating websites (like this workshop).
Writing books and scientific papers.
Creating presentations that are heavy on coding.

Quarto is built on top of Pandoc, a universal document converter that converts between many formats, including Markdown, HTML, and LaTeX, and can also produce PDF. Quarto uses Pandoc to convert your documents into the desired format.

Quarto files

Quarto (.qmd) files are plain-text (not binary) files. Here is a minimal example:

---
title: "Hello World"
author: "Truly Yours"
date: "`r Sys.Date()`"
format: html
---

# First level header

## Second level header

Some text that goes along with the document

Code chunks can have tags, like the one here


```{r first-code-chunk}
sqrt(pi)
```


And also, they can have options. For example, if you don't want the source code
to be printed out, you add the option `echo: false` as in the following code
chunk


```{r second-code-chunk}
#| echo: false
plot(USArrests)
```

Main components of a qmd file

The header: information about the document, in YAML format

---
title: "Hello World"
author: "Truly Yours"
date: "`r Sys.Date()`"
format: html
---
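Because the header is ordinary YAML, any YAML parser can read it. As a minimal sketch, assuming the yaml R package is installed (Quarto parses the header with its own tooling, not with this package), the fields below are parsed into an R list. The `date` field is left out because its inline R code would not be evaluated by a plain YAML parser.

```r
# Header fields from the example above, without the inline-R date
header <- '
title: "Hello World"
author: "Truly Yours"
format: html
'

# Parse the YAML text into a named R list
meta <- yaml::yaml.load(header)
str(meta)
```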

R code chunks (without options)

```{r first-code-chunk}
sqrt(pi)
```

R code chunks (with options)

```{r second-code-chunk}
#| echo: false
plot(USArrests)
```

Some other options include:
- cache: Logical; when true, the results of the code chunk are saved so they don’t need to be recomputed every time the document is rendered (handy for time-consuming code!).
- message: Logical; when false, the messages generated by the R code in the chunk are suppressed.
- fig-cap: Character; the caption of the figures generated within the chunk.
More options are listed in the Quarto documentation on code-cell options.
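As a sketch of how these options combine in Quarto's `#|` syntax, the chunk below caches its results, hides the message it emits, and captions its plot. The label and the use of the built-in `mtcars` data are illustrative choices, not part of the original workshop; run outside Quarto, the `#|` lines are just R comments.

```r
#| label: mpg-by-weight
#| cache: true
#| message: false
#| fig-cap: "Fuel efficiency against weight in the mtcars data"

# This message is hidden in the rendered document because of `message: false`
message("Fitting the model...")

# A simple linear fit, stored in the cache together with the chunk's output
fit <- lm(mpg ~ wt, data = mtcars)

# The plot receives the caption given in `fig-cap`
plot(mpg ~ wt, data = mtcars)
abline(fit)
```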

How it works

Source: Quarto website https://quarto.org/docs/faq/rmarkdown.html

When you render a qmd file with R code chunks (e.g., with `quarto render`), Quarto passes it to knitr.
knitr executes the R code (or other supported code, such as Python through reticulate) and creates an md file (plain markdown, not R Markdown).
Then the md file is passed to Pandoc, which compiles the document into the format specified by the `format` option of the header.
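The knitr step can be approximated from a plain R session by calling `knitr::knit()` on a tiny chunk held in a character vector. This is an illustrative sketch, not how Quarto invokes knitr internally; the point is that the result is ordinary markdown containing the code and its output, which is what Pandoc receives next.

```r
# A minimal chunk, written the same way as in a qmd file
src <- c(
  "```{r}",
  "sqrt(4)",
  "```"
)

# knit() runs the chunk and returns plain markdown (code plus its printed output)
md <- knitr::knit(text = src, quiet = TRUE)
cat(md)
```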

Quarto supports other languages

The following code chunk requires the reticulate R package (an R interface to Python):

```{python some-py-code}
print("Hello World")
import this
```

Hello World

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
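When R and Python chunks share one knitr-rendered document, reticulate also lets R read Python objects through its `py` object. The sketch below imitates that from a plain R session with `reticulate::py_run_string()`; it assumes reticulate and a working Python installation are available, and the variable name `greeting` is illustrative.

```r
library(reticulate)

# Run a line of Python; the variable lives in Python's __main__ module
py_run_string("greeting = 'Hello World'")

# Read the Python variable from R through the `py` object
py$greeting
```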

Tables with Quarto

Suppose that we want to include the following data as a table in our document:

# Loading the packages
library(gapminder)
library(dplyr) # provides %>%, group_by(), summarise(), and arrange()

# Calculating stats at the year level
stats_by_year <- gapminder %>%
  group_by(year) %>%
  summarise(
    `Life Expectancy` = mean(lifeExp),
    `Population`      = mean(pop),
    `GDP pp`          = mean(gdpPercap)
  ) %>%
  arrange(year)

stats_by_year

# A tibble: 12 × 4
    year `Life Expectancy` Population `GDP pp`
   <int>             <dbl>      <dbl>    <dbl>
 1  1952              49.1  16950402.    3725.
 2  1957              51.5  18763413.    4299.
 3  1962              53.6  20421007.    4726.
 4  1967              55.7  22658298.    5484.
 5  1972              57.6  25189980.    6770.
 6  1977              59.6  27676379.    7313.
 7  1982              61.5  30207302.    7519.
 8  1987              63.2  33038573.    7901.
 9  1992              64.2  35990917.    8159.
10  1997              65.0  38839468.    9090.
11  2002              65.7  41457589.    9918.
12  2007              67.0  44021220.   11680.

There are at least two ways of doing it

Tabulation with `knitr`

The knitr package provides the function kable to print tables.

A nice feature is that you don’t need to specify the output format: kable detects whether the document is being rendered to HTML, LaTeX, or Markdown and formats the table accordingly.


knitr::kable(
    head(stats_by_year),
    caption = "Year stats from the gapminder data",
    format.args = list(big.mark=",")
    )

Year stats from the gapminder data
year	Life Expectancy	Population	GDP pp
1,952	49.05762	16,950,402	3,725.276
1,957	51.50740	18,763,413	4,299.408
1,962	53.60925	20,421,007	4,725.812
1,967	55.67829	22,658,298	5,483.653
1,972	57.64739	25,189,980	6,770.083
1,977	59.57016	27,676,379	7,313.166
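Note that the year column above also picked up the thousands separator ("1,952"), because `format.args` is applied to every numeric column. A minimal workaround, sketched here with a two-row stand-in for `stats_by_year` (values copied from the table above), is to convert the year to character before calling kable. The example also passes `format = "pipe"` to show that the output format can be fixed explicitly instead of guessed.

```r
# Two rows copied from the table above, standing in for stats_by_year
stats <- data.frame(
  year       = c(1952L, 1957L),
  Population = c(16950402, 18763413)
)

# format.args only touches numeric columns, so a character year is left as-is
stats$year <- as.character(stats$year)

tab <- knitr::kable(
  stats,
  format      = "pipe",                 # force a markdown pipe table
  format.args = list(big.mark = ",")    # thousands separator for numbers
)
tab
```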

Check out the kableExtra package, which provides extensions to the kable function.

Tabulation with `pander`

Another (very cool) R package is pander.
It provides helper functions to work with Pandoc’s markdown format.

This means that you don’t need to think about the final output format: Pandoc converts that markdown into whatever format the document is rendered to.


pander::pandoc.table(
  head(stats_by_year), 
  caption = "Year stats from the gapminder data"
  )

Year stats from the gapminder data
year	Life Expectancy	Population	GDP pp
1952	49.06	16950402	3725
1957	51.51	18763413	4299
1962	53.61	20421007	4726
1967	55.68	22658298	5484
1972	57.65	25189980	6770
1977	59.57	27676379	7313
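To see the Pandoc markdown that pander produces, `pander::pandoc.table.return()` returns the table as a string instead of printing it. In this sketch the `mtcars` subset is illustrative, and `style = "rmarkdown"` selects pipe tables (pander's default style is "multiline"); Pandoc can then turn the same text into HTML or LaTeX.

```r
library(pander)

# Return the markdown table as a character string rather than printing it
md <- pandoc.table.return(
  head(mtcars[, c("mpg", "cyl", "wt")], 3),
  style   = "rmarkdown",             # pipe-table flavour of Pandoc markdown
  caption = "First rows of mtcars"
)
cat(md)
```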

Regression tables

Many R packages provide functions to include regression output in a document.

Suppose that we fit the following linear models to the diamonds dataset (from the ggplot2 package):

data(diamonds, package = "ggplot2")

# Linear models of price on carat, adding depth and/or table as covariates
model1 <- lm(price ~ carat, data = diamonds)
model2 <- lm(price ~ carat + depth, data = diamonds)
model3 <- lm(price ~ carat + table, data = diamonds)
model4 <- lm(price ~ carat + depth + table, data = diamonds)

# Let's put them all in a list to handle them together
models <- list(model1, model2, model3, model4)

How can we include these in our report/paper?

Regression tables with `texreg`

The R package texreg can print these models as an HTML table with htmlreg():

texreg::htmlreg(models, doctype=FALSE)

Statistical models
	Model 1	Model 2	Model 3	Model 4
(Intercept)	-2256.36***	4045.33***	1961.99***	13003.44***
	(13.06)	(286.21)	(171.81)	(390.92)
carat	7756.43***	7765.14***	7820.04***	7858.77***
	(14.07)	(14.01)	(14.22)	(14.15)
depth		-102.17***		-151.24***
		(4.64)		(4.82)
table			-74.30***	-104.47***
			(3.02)	(3.14)
R²	0.85	0.85	0.85	0.85
Adj. R²	0.85	0.85	0.85	0.85
Num. obs.	53940	53940	53940	53940
***p < 0.001; **p < 0.01; *p < 0.05

texreg also provides texreg(), for LaTeX tables, and screenreg(), for plain-text output.
The drawback is that you have to state explicitly which type of table to print, so the code has to change when the output format does.
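One way around this is to pick the texreg function from the output format at render time. The sketch below assumes knitr's `is_latex_output()` and `is_html_output()` report the format Quarto is rendering to; the wrapper name `print_models` is hypothetical. In a Quarto chunk you will likely also need `#| output: asis` so the raw HTML or LaTeX is passed through unchanged.

```r
library(texreg)

# Hypothetical helper: choose the texreg function that matches the output format
print_models <- function(models, ...) {
  if (knitr::is_latex_output()) {
    texreg(models, ...)                     # LaTeX table for PDF output
  } else if (knitr::is_html_output()) {
    htmlreg(models, doctype = FALSE, ...)   # HTML table for web output
  } else {
    screenreg(models, ...)                  # plain text everywhere else
  }
}

# Outside a rendering session neither check is TRUE, so screenreg() is used
fit <- lm(mpg ~ wt, data = mtcars)
print_models(list(fit), custom.model.names = "MPG model")
```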

Regression tables with `memisc`

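The R package memisc provides mtable() for the same purpose:

library(memisc)

# Build the table, naming each model and choosing the summary statistics
tab <- mtable(
  `Model 1` = model1,
  `Model 2` = model2,
  `Model 3` = model3,
  `Model 4` = model4,
  summary.stats = c("sigma", "R-squared", "F", "p", "N")
)

# Write it out as HTML (format = "LaTeX" gives a LaTeX table instead)
write.mtable(tab, file = stdout(), format = "HTML")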

	Model 1	Model 2	Model 3	Model 4
(Intercept)	−2256.361***	4045.333***	1961.992***	13003.441***
	(13.055)	(286.205)	(171.811)	(390.918)
carat	7756.426***	7765.141***	7820.038***	7858.771***
	(14.067)	(14.009)	(14.225)	(14.151)
depth		−102.165***		−151.236***
		(4.635)		(4.820)
table			−74.301***	−104.473***
			(3.018)	(3.141)
R-squared	0.849	0.851	0.851	0.854
sigma	1548.562	1541.649	1539.946	1526.094
F	304050.906	153634.765	154034.567	104890.460
p	0.000	0.000	0.000	0.000
N	53940	53940	53940	53940
Significance: *** = p < 0.001; ** = p < 0.01; * = p < 0.05

Plots with Quarto

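In the case of plots, no extra work is needed: figures produced by a code chunk are embedded in the rendered document automatically.

library(ggplot2)

ggplot(diamonds, aes(x = carat, y = price, color = cut)) + 
  geom_point() +
  ggtitle("Plots with Quarto just work")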
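Figure options also apply to plot chunks. As a sketch (the label, caption, and sizes are illustrative), a chunk whose label starts with `fig-` and that has a `fig-cap` becomes a numbered figure in Quarto, which the text can then cite as `@fig-price-carat`.

```r
#| label: fig-price-carat
#| fig-cap: "Diamond price against carat, coloured by cut"
#| fig-width: 7
#| fig-height: 4
library(ggplot2)

# Same plot as above; the title moves into the figure caption
p <- ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point(alpha = 0.3)  # transparency eases overplotting of ~54,000 points
p
```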
