---
title: "Format columns as markdown"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Format columns as markdown}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

This vignette shows how to format **flextable** columns as markdown with the **ftExtra** package.

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = rmarkdown::pandoc_available("2.7.2") &&
    ("dplyr" %in% rownames(installed.packages()))
)
```

```{r setup}
library(flextable)
library(ftExtra)
```

# Why markdown?

The **flextable** package is an excellent package that allows fine control over table styling and exports tables to a variety of formats (HTML, MS Word, PDF).
When the output format is MS Word in particular, this package is arguably the best solution in R.

On the other hand, styling text with the **flextable** package often requires considerable effort.
The following example subscripts numeric values in chemical formulas.

```{r}
df <- data.frame(Oxide = c("SiO2", "Al2O3"), stringsAsFactors = FALSE)
ft <- flextable::flextable(df)

ft %>%
  flextable::compose(
    i = 1, j = "Oxide",
    value = flextable::as_paragraph(
      "SiO", as_sub("2")
    )
  ) %>%
  flextable::compose(
    i = 2, j = "Oxide",
    value = flextable::as_paragraph(
      "Al", as_sub("2"), "O", as_sub("3")
    )
  )
```

The above example has two problems:

1. This is just a *manual* rewriting of the table.
    - Users have to state explicitly which characters to subscript.
    - For fine formatting, users have to apply `compose` to each cell one by one.
1. Users have to learn many functions from the **flextable** package.
    - `compose`, `as_paragraph`, and `as_sub` in the above example.

The first point can be solved with a `for` loop, but the code becomes quite complex.

```{r}
df <- data.frame(Oxide = c("SiO2", "Fe2O3"), stringsAsFactors = FALSE)
ft <- flextable::flextable(df)

for (i in seq_len(nrow(df))) {
  ft <- flextable::compose(
    ft,
    i = i,
    j = "Oxide",
    value = flextable::as_paragraph(
      list_values = df$Oxide[i] %>%
        stringr::str_replace_all("([2-9]+)", " \\1 ") %>%
        stringr::str_split(" ", simplify = TRUE) %>%
        purrr::map_if(
          function(x) stringr::str_detect(x, "[2-9]+"),
          flextable::as_sub
        )
    )
  )
}
ft
```

The **ftExtra** package provides an easy solution by introducing markdown.
Because markdown expresses formatting in plain text, all users have to do is manipulate character columns with their favorite tools, such as the **dplyr** and **stringr** packages.

1. Preprocess a data frame to decorate texts with markdown syntax.
2. Convert the data frame into a flextable object with the `flextable` function or the `as_flextable` function.
3. Format markdown columns with `colformat_md`.

The following example elegantly simplifies the prior example.

```{r}
df <- data.frame(Oxide = c("SiO2", "Fe2O3"), stringsAsFactors = FALSE)

df %>%
  dplyr::mutate(
    Oxide = stringr::str_replace_all(Oxide, "([2-9]+)", "~\\1~")
  ) %>%
  flextable::flextable() %>%
  ftExtra::colformat_md()
```
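The regular expression above only matches the digits 2 to 9, which is enough for these two oxides. As a minimal sketch for formulas with counts such as 10 or 22, the hypothetical helper `subscript_formula()` below (not part of **ftExtra**) wraps any run of digits that follows a letter or a closing parenthesis. It uses only base R and makes no attempt to handle charges or hydrates.

```r
# Hypothetical helper (not part of ftExtra): wrap digits that follow a letter
# or a closing parenthesis in "~", which is Pandoc's subscript syntax.
subscript_formula <- function(x) {
  gsub("([A-Za-z)])([0-9]+)", "\\1~\\2~", x)
}

formulas <- c("SiO2", "Fe2O3", "C10H22", "Ca(OH)2")
md <- subscript_formula(formulas)
md
```

The resulting character vector can replace the `Oxide` column before calling `flextable()` and `colformat_md()`.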

The `colformat_md` function is smart enough to detect character columns, so users can start without learning its arguments.
Of course, it is also possible to choose columns explicitly.
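As a sketch of choosing columns, the code below passes a column name to the `j` argument of `colformat_md()` so that only `Oxide` is parsed as markdown and `Note` is left verbatim. The data frame and its column names are made up for illustration, and the rendering step is guarded so that the code also runs where **ftExtra** is not installed.

```r
df <- data.frame(
  Oxide = c("SiO~2~", "Fe~2~O~3~"),
  Note = c("*left as is*", "*left as is*"),
  stringsAsFactors = FALSE
)

if (requireNamespace("ftExtra", quietly = TRUE)) {
  ft <- flextable::flextable(df)
  # Only the Oxide column is parsed; the asterisks in Note stay literal
  ft <- ftExtra::colformat_md(ft, j = "Oxide")
  ft
}
```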

Another workflow is to read a markdown-formatted table from an external file.
Again, markdown is plain text by design and can easily be embedded in any format, such as CSV and Excel.
So users can do something like:

```r
readr::read_csv("example.csv") %>%
  flextable::flextable() %>%
  ftExtra::colformat_md()
```
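To try this workflow without an `example.csv` on disk, the sketch below writes a small markdown-formatted table to a temporary file and reads it back. It uses base R's `read.csv()` instead of **readr**, and the file contents are invented for illustration.

```r
# Write a tiny markdown-formatted CSV to a temporary file
csv <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    Oxide = c("SiO~2~", "Fe~2~O~3~"),
    Remark = c("**major**", "*minor*")
  ),
  csv,
  row.names = FALSE
)

# The markdown markup survives the round trip as plain text
df <- read.csv(csv, stringsAsFactors = FALSE)

if (requireNamespace("ftExtra", quietly = TRUE)) {
  ftExtra::colformat_md(flextable::flextable(df))
}
```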

By default, the **ftExtra** package employs Pandoc's markdown, which is also employed by R Markdown.
This enables a consistent user experience when using the **ftExtra** package in R Markdown.

# Basic examples

The example below shows that the `colformat_md()` function parses markdown text in a flextable object.

```{r}
data.frame(
  a = c("**bold**", "*italic*"),
  b = c("^superscript^", "~subscript~"),
  c = c("`code`", "[underline]{.underline}"),
  d = c(
    "*[**~ft~^Extra^**](https://ftextra.atusy.net/) is*",
    "[Cool]{.underline shading.color='skyblue'}"
  ),
  stringsAsFactors = FALSE
) %>%
  flextable() %>%
  colformat_md()
```

The table header can also be formatted by passing `part = "header"` or `part = "all"` to `colformat_md()`.
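A minimal sketch of header formatting follows. It assumes the markdown lives in the column names, so `check.names = FALSE` is needed to stop `data.frame()` from rewriting them into syntactic names. The column names and values are illustrative.

```r
df <- data.frame(
  "**Oxide**" = c("SiO~2~", "Fe~2~O~3~"),
  "*Remark*" = c("major", "minor"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# The markdown in the column names is kept as is
names(df)

if (requireNamespace("ftExtra", quietly = TRUE)) {
  # part = "all" parses markdown in both the header and the body
  ftExtra::colformat_md(flextable::flextable(df), part = "all")
}
```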

The supported syntax is:

- **bold**
- *italic*
- `code`
- ^superscript^
- ~subscript~
- link
- footnote
- image
- line break
- citations
- math
- attributes with Span, Link, Code, and so on
    - to underline: `[foo]{.underline}`
    - to color: `[foo]{color=red}`
    - to highlight: `[foo]{shading.color=gray}`
    - to change font: `[foo]{font.family=Roboto}`
    - and the combinations of the above

Notes:

- Other syntax may result in unexpected behavior.
- Multiple paragraphs are collapsed into a single paragraph with the separator given to the `.sep` argument (default: `"\n\n"`).
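Because the attribute syntax listed above is plain text, it can be generated from data. The sketch below uses a hypothetical helper, `md_span()`, built on base R's `sprintf()` to color negative values; the helper name, the threshold, and the colors are arbitrary choices for illustration.

```r
# Hypothetical helper: wrap text in a Pandoc span with a color attribute
md_span <- function(text, color) {
  sprintf("[%s]{color=%s}", text, color)
}

value <- c(-1.2, 0.4, 3.1)
df <- data.frame(
  value = md_span(as.character(value), ifelse(value < 0, "red", "black")),
  stringsAsFactors = FALSE
)
df$value

if (requireNamespace("ftExtra", quietly = TRUE)) {
  ftExtra::colformat_md(flextable::flextable(df))
}
```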

# Footnotes

An easy way to add a footnote is to write it inline with `^[...]`.

```{r}
data.frame(
  package = "ftExtra",
  description = "Extensions for 'Flextable'^[Supports of footnotes]",
  stringsAsFactors = FALSE
) %>%
  flextable() %>%
  colformat_md() %>%
  flextable::autofit(add_w = 0.5)
```

Reference symbols can be configured by `footnote_options()`.
Of course, markdown can be used inside footnotes as well.

```{r}
data.frame(
  package = "ftExtra^[Short of *flextable extra*]",
  description = "Extensions for 'Flextable'^[Supports of footnotes]",
  stringsAsFactors = FALSE
) %>%
  flextable() %>%
  colformat_md(
    .footnote_options = footnote_options(
      ref = "i",
      prefix = "[",
      suffix = "]",
      start = 2,
      inline = TRUE,
      sep = "; "
    )
  ) %>%
  flextable::autofit(add_w = 0.5)
```

To add multiple footnotes to a cell, use the regular footnote syntax.

```{r}
data.frame(
  x =
    "foo[^a]^,^ [^b]

[^a]: aaa

[^b]: bbb",
  stringsAsFactors = FALSE
) %>%
  flextable() %>%
  colformat_md()
```
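Writing such cells by hand gets tedious, so the text can also be assembled programmatically. The sketch below defines a hypothetical helper, `with_footnotes()`, that labels the notes `a`, `b`, and so on (at most 26 notes) and reproduces the cell text used above.

```r
# Hypothetical helper: append reference-style footnotes to a piece of text
with_footnotes <- function(text, notes) {
  ids <- letters[seq_along(notes)]
  # Footnote references, separated by a superscript comma
  refs <- paste0("[^", ids, "]", collapse = "^,^ ")
  # Footnote definitions, one block per note
  defs <- paste0("[^", ids, "]: ", notes, collapse = "\n\n")
  paste0(text, refs, "\n\n", defs)
}

x <- with_footnotes("foo", c("aaa", "bbb"))
cat(x)

if (requireNamespace("ftExtra", quietly = TRUE)) {
  df <- data.frame(x = x, stringsAsFactors = FALSE)
  ftExtra::colformat_md(flextable::flextable(df))
}
```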

Experimentally, reference symbols can be formatted by a user-defined function.

```{r}
#' Custom formatter of reference symbols
#'
#' @param n n-th reference symbol (integer)
#' @param part where footnote exists: "body" or "header"
#' @param footer whether to format symbols in the footer: `TRUE` or `FALSE`
#'
#' @return a character vector which will further be processed as markdown texts
ref <- function(n, part, footer) {
  # Header uses letters and body uses integers for the symbols
  s <- if (part == "header") {
    letters[n]
  } else {
    as.character(n)
  }

  # Suffix symbols with ": " (a colon and a space) in the footer
  if (footer) {
    return(paste0(s, ":\\ "))
  }

  # Use superscript in the header and the body
  return(paste0("^", s, "^"))
}

# Apply custom function to format a table with footnotes
tibble::tibble(
  "header1^[note a]" = c("x^[note 1]", "y"),
  "header2" = c("a", "b^[note 2]")
) %>%
  flextable() %>%
  # process header first
  colformat_md(
    part = "header", .footnote_options = footnote_options(ref = ref)
  ) %>%
  # process body next
  colformat_md(
    part = "body", .footnote_options = footnote_options(ref = ref)
  ) %>%
  # tweak width for visibility
  flextable::autofit(add_w = 0.2)
```

Some notes:

- `colformat_md()` should be applied separately to the header and the body. In other words, `part = "all"` is not recommended because it may order footnotes unexpectedly.
- The result of `footnote_options(ref)` should not be shared between the header and the body.

  ```r
  # DO NOT SHARE fopts among header and body
  fopts <- footnote_options(ref)
  ... %>%
    colformat_md(part = "header", .footnote_options = fopts) %>%
    colformat_md(part = "body", .footnote_options = fopts)
  ```

# Images

Images can be inserted, optionally with width and/or height attributes.
Specifying only one of them scales the other to keep the aspect ratio.

```{r}
data.frame(
  R = sprintf("![](%s)", file.path(R.home("doc"), "html", "logo.jpg")),
  stringsAsFactors = FALSE
) %>%
  flextable() %>%
  colformat_md() %>%
  flextable::autofit()
```

The R logo is distributed by The R Foundation with the [CC-BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/) license.

# Line breaks

By default, soft line breaks become spaces.

```{r}
data.frame(linebreak = c("a\nb"), stringsAsFactors = FALSE) %>%
  flextable() %>%
  colformat_md()
```

Pandoc's markdown supports hard line breaks by adding a backslash or two spaces at the end of a line.

```{r}
data.frame(linebreak = c("a\\\nb"), stringsAsFactors = FALSE) %>%
  flextable() %>%
  colformat_md()
```
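The chunk above uses the backslash form. For completeness, here is a sketch of the other form, with two spaces before the newline. Trailing whitespace is easily lost in editors and spreadsheets, so the backslash is often the more robust choice.

```r
# Two spaces before "\n" also make a hard line break in Pandoc's markdown
df <- data.frame(linebreak = "a  \nb", stringsAsFactors = FALSE)

if (requireNamespace("ftExtra", quietly = TRUE)) {
  ftExtra::colformat_md(flextable::flextable(df))
}
```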

It is also possible to treat `\n` as a hard line break by enabling Pandoc's `hard_line_breaks` extension.

```{r}
data.frame(linebreak = c("a\nb"), stringsAsFactors = FALSE) %>%
  flextable() %>%
  colformat_md(md_extensions = "+hard_line_breaks")
```

Markdown treats consecutive line breaks as a separator of blocks such as paragraphs.
However, the **flextable** package lacks support for multiple paragraphs in a cell.
As a workaround, `colformat_md` collapses them into a single paragraph with the separator given to `.sep` (default: `"\n\n"`).

```{r}
data.frame(linebreak = c("a\n\nb"), stringsAsFactors = FALSE) %>%
  flextable() %>%
  colformat_md(.sep = "\n\n")
```

# Citations

Citations are experimentally supported.
Note that no citation list is generated for the table.
The list is expected to be produced by R Markdown.

First, create a `ftExtra.bib` file like the one below.

```{r, echo=FALSE, collapse=FALSE, class.output="bibtex", warning=FALSE, comment=""}
knitr::write_bib("ftExtra")
```

Second, specify it, and optionally a CSL file, within the YAML front matter.

```yaml
---
bibliography: ftExtra.bib
# csl: https://raw.githubusercontent.com/citation-style-language/styles/master/apa.csl
---
```

Finally, cite the references within tables.

```{r, eval=FALSE}
data.frame(
  Cite = c("@R-ftExtra", "[@R-ftExtra]", "[-@R-ftExtra]"),
  stringsAsFactors = FALSE
) %>%
  flextable() %>%
  colformat_md() %>%
  flextable::autofit(add_w = 0.2)
```

<!--
NOTE: Avoid strange errors from some environments by error=TRUE
(e.g., Fedora Linux, R-devel, GCC on R-hub)
> pandoc-citeproc: Error in $: Incompatible API versions:
> encoded with [1,20] but attempted to decode with [1,17,0,4].
-->

```{r, echo=FALSE, warning=FALSE, error=TRUE}
tf <- tempfile(fileext = ".bib")
knitr::write_bib("ftExtra", tf)
data.frame(
  Cite = c("@R-ftExtra", "[@R-ftExtra]", "[-@R-ftExtra]"),
  stringsAsFactors = FALSE
) %>%
  flextable() %>%
  colformat_md(pandoc_args = c("--bibliography", tf)) %>%
  flextable::autofit(add_w = 0.2)
```

If a citation style such as Vancouver requires citations to be numbered sequentially and consistently with the body text,
manually offset the numbers, for example with `colformat_md(.cite_offset = 5)`.

# Math

The rendering of math is also possible.

```{r}
data.frame(
  math = "$e^{i\\theta} = \\cos \\theta + i \\sin \\theta$",
  stringsAsFactors = FALSE
) %>%
  flextable() %>%
  colformat_md() %>%
  flextable::autofit(add_w = 0.2)
```

Note that the results may be imperfect.
This feature relies on Pandoc's HTML writer, which

> render TeX math as far as possible using Unicode characters \
> https://pandoc.org/MANUAL.html#math-rendering-in-html

# Emoji

Pandoc's markdown provides an extension, `emoji`.
To use it with `colformat_md()`, specify `md_extensions="+emoji"`.

```{r}
data.frame(emoji = c(":+1:"), stringsAsFactors = FALSE) %>%
  flextable() %>%
  colformat_md(md_extensions = "+emoji")
```

# Other input formats

`colformat_md` supports a variety of input formats.
The input can even be HTML, despite the name of the function.

```{r}
data.frame(
  x = "H<sub>2</sub>O",
  stringsAsFactors = FALSE
) %>%
  flextable() %>%
  colformat_md(.from = "html")
```

Note that multiple paragraphs are not supported if `.from` is not `"markdown"`.
Below is an example with commonmark.

```{r}
data.frame(
  x = "foo\n\nbar",
  stringsAsFactors = FALSE
) %>%
  flextable() %>%
  colformat_md(.from = "commonmark")
```