Getting started with logrittr

Motivation

In SAS, every DATA step prints a log that tells you exactly what happened:

NOTE: There were 120000 observations read from WORK.SALES.
NOTE: 7153 observations were deleted.
NOTE: The data set WORK.SALES has 112847 observations and 11 variables.

In R, dplyr pipelines are silent by default. logrittr fills that gap with %>=%, a pipe operator that logs the following at each step:

row count before/after (with signed delta)
column count before/after (with signed delta)
column names added or dropped
elapsed time

logrittr masks no functions and has no dependencies beyond cli and stringr, which it uses for coloring and formatting in the console.
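To make the logged quantities concrete, here is a rough base-R sketch of how a logging pipe of this kind could be written. The %log% operator is hypothetical and is not logrittr's implementation; it assumes both sides of each step are data frames and only illustrates the idea of comparing the data before and after a step.

```r
# Hypothetical operator for illustration only (NOT logrittr's code).
`%log%` <- function(lhs, rhs) {
  call <- substitute(rhs)                      # unevaluated step, e.g. subset(x > 2)
  if (!is.call(call)) stop("right-hand side must be a function call")

  # Insert the left-hand side as the first argument of the call.
  piped <- as.call(append(as.list(call), quote(.lhs), after = 1))
  env <- new.env(parent = parent.frame())
  env$.lhs <- lhs                              # forces earlier steps first

  t0 <- proc.time()[["elapsed"]]
  out <- eval(piped, env)
  ms <- (proc.time()[["elapsed"]] - t0) * 1000

  # Row and column counts after the step, with signed deltas.
  message(sprintf("%-30s rows: %d %+d  cols: %d %+d  [%.1f ms]",
                  deparse(call)[1],
                  nrow(out), nrow(out) - nrow(lhs),
                  ncol(out), ncol(out) - ncol(lhs), ms))

  # Column names that disappeared or appeared.
  dropped <- setdiff(names(lhs), names(out))
  added <- setdiff(names(out), names(lhs))
  if (length(dropped)) message("  dropped: ", paste(dropped, collapse = ", "))
  if (length(added)) message("  added: ", paste(added, collapse = ", "))
  out
}

df <- data.frame(x = 1:5)
out <- df %log% subset(x > 2) %log% transform(y = x * 2)
```

Each step emits one message line, plus an "added" or "dropped" line when the column set changes. The real %>=% operator additionally handles formatting, nesting and options.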

Tip for Fira Code users: with ligatures enabled, %>=% renders as a single wide arrow, close to a regular pipe.

Installation

install.packages("logrittr", repos = "https://guillaumepressiat.r-universe.dev")

# alternatively
# remotes::install_github("GuillaumePressiat/logrittr")

Basic usage

library(logrittr)
library(dplyr)

iris %>=%
  as_tibble() %>=%
  filter(Sepal.Length < 5) %>=%
  mutate(rn = row_number()) %>=%
  group_by(Species) %>=%
  summarise(n = n_distinct(rn))

── iris  [rows:       150  cols:    5] ────────────────────────────────────────────
ℹ as_tibble()                                       rows:  150 +0     cols:  5 +0    [   0.0 ms]
ℹ filter(Sepal.Length < 5)                          rows:   22 -128   cols:  5 +0    [   1.0 ms]
ℹ mutate(rn = row_number())                         rows:   22 +0     cols:  6 +1    [   1.0 ms]
  added: rn
ℹ group_by(Species)                                 rows:   22 +0     cols:  6 +0    [   1.0 ms]
ℹ summarise(n = n_distinct(rn))                     rows:    3 -19    cols:  2 -4    [   1.0 ms]
  dropped: Sepal.Length, Sepal.Width, Petal.Length, and 2 others
  added: n

%>=% is fully composable with |> and %>%. Use it only where you want visibility, and fall back to the native pipe for the rest.

Nested pipelines

When %>=% appears inside an argument (e.g. inside semi_join()), the nested steps are automatically indented with a > prefix so they are visually distinct from the main pipeline:

iris %>=%
  as_tibble() %>=%
  filter(Sepal.Length < 5) %>=%
  mutate(rn = row_number()) %>=%
  semi_join(
    iris %>% as_tibble() %>=%
      filter(Species == "setosa"),
    by = "Species"
  ) %>=%
  group_by(Species) %>=%
  summarise(n = n_distinct(rn))

── iris  [rows:       150  cols:    5] ─────────────────────────────────────
ℹ as_tibble()                                              rows:   150 +0     cols:  5 +0    [   1.0 ms]
ℹ filter(Sepal.Length < 5)                                 rows:    22 -128   cols:  5 +0    [   1.0 ms]
ℹ mutate(rn = row_number())                                rows:    22 +0     cols:  6 +1    [   1.0 ms]
  added: rn
ℹ > filter(Species == "setosa")                            rows:    50 -100   cols:  5 +0    [   2.0 ms]
ℹ semi_join(iris %>% as_tibble() %>=% filter(Species ==    rows:    20 -2     cols:  6 +0    [  32.0 ms]
  "setosa"), by = "Species")
ℹ group_by(Species)                                        rows:    20 +0     cols:  6 +0    [   1.0 ms]
ℹ summarise(n = n_distinct(rn))                            rows:     1 -19    cols:  2 -4    [   1.0 ms]
  dropped: Sepal.Length, Sepal.Width, Petal.Length, and 2 others
  added: n

Options

All display options are controlled via logrittr_options():

logrittr_options()
#> $wrap_width
#> [1] 52
#> $big_mark
#> [1] " "
#> $lang
#> [1] "en"
#> $max_cols
#> [1] 5

Language

Switch to French with lang = "fr" (the metrics line uses lignes instead of rows):

logrittr_options(lang = "fr")

iris %>=%
  select(Species, Sepal.Length, Sepal.Width) %>=%
  filter(Sepal.Length > 5)

── iris  [lignes:       150  cols:    5] ─────────────────────────────────────────
ℹ select(Species, Sepal.Length, Sepal.Width)      lignes:   150 +0    cols:    3 -2  [   3.0 ms]
  dropped: Petal.Length, Petal.Width
ℹ filter(Sepal.Length > 5)                        lignes:   118 -32   cols:    3 +0  [   1.0 ms]

Thousands separator

logrittr_options(lang = "en", big_mark = ",")

big <- data.frame(x = seq_len(1e6), y = rnorm(1e6))
big %>=% filter(x > 500000)

── big  [rows: 1,000,000  cols:    2] ────────────────────────────────────────────
ℹ filter(x > 5e+05)                       rows:   500,000 -500000   cols:    2 +0    [  11.0 ms]

Or with an underscore (big_mark = "_"):

── big  [rows: 1_000_000  cols:    2] ────────────────────────────────────────────
ℹ filter(x > 5e+05)                       rows:   500_000 -500000   cols:    2 +0    [  11.0 ms]
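The effect of big_mark is the same as that of the big.mark argument of base R's format(). Whether logrittr calls format() internally is an assumption; the snippet below only shows how the three separators used in this article render for an integer count.

```r
# Base R only: thousands separators applied to an integer row count.
n <- 1000000L
with_comma      <- format(n, big.mark = ",")
with_underscore <- format(n, big.mark = "_")
with_space      <- format(n, big.mark = " ")

print(c(with_comma, with_underscore, with_space))
```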

Column name truncation

When a select or join adds or drops many columns, only the first max_cols names are shown to keep the log readable:

logrittr_options(max_cols = 2, lang = "en")

iris %>=%
  as_tibble() %>=%
  select(Species, Sepal.Length)

── iris  [rows:       150  cols:    5] ───────────────────────────────────────────────
ℹ as_tibble()                                 rows:   150 +0    cols:  5 +0   [   0.0 ms]
ℹ select(Species, Sepal.Length)               rows:   150 +0    cols:  2 -3   [   1.0 ms]
  dropped: Sepal.Width, Petal.Length, and 1 other

Use max_cols = Inf to always display all names:

logrittr_options(max_cols = Inf)
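The truncation rule described in this section can be sketched in a few lines of base R. The helper truncate_names() is hypothetical (it is not part of logrittr), and the exact wording of the suffix is inferred from the log shown above.

```r
# Hypothetical helper: show at most `max_cols` names, then summarise the rest.
truncate_names <- function(x, max_cols = 5) {
  if (length(x) <= max_cols) {
    return(paste(x, collapse = ", "))          # also covers max_cols = Inf
  }
  shown <- x[seq_len(max_cols)]
  n_rest <- length(x) - max_cols
  paste0(paste(shown, collapse = ", "),
         ", and ", n_rest, " other", if (n_rest > 1) "s" else "")
}

dropped <- c("Sepal.Width", "Petal.Length", "Petal.Width")
short <- truncate_names(dropped, max_cols = 2)
full  <- truncate_names(dropped, max_cols = Inf)
```

With max_cols = 2 the third name is folded into the "and 1 other" suffix, while max_cols = Inf returns every name.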

Restoring defaults

logrittr_options() invisibly returns the previous values, which makes it easy to restore the state after a temporary change:

old <- logrittr_options(lang = "fr", big_mark = ",")
# ... work ...
do.call(logrittr_options, old)  # restore previous state
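Inside a function, the same save-and-restore pattern can be combined with on.exit() so that the previous options come back even if the code fails. The wrapper with_logrittr_options() below is a hypothetical helper, not part of the package; it relies only on the behaviour described above, namely that logrittr_options() returns the previous values and accepts them back through do.call().

```r
# Hypothetical helper: run `code` with temporary logrittr options.
with_logrittr_options <- function(code, ...) {
  old <- logrittr::logrittr_options(...)       # set new values, keep the old ones
  # Restore the previous state when the function exits, even on error.
  on.exit(do.call(logrittr::logrittr_options, old), add = TRUE)
  force(code)                                  # the pipeline is evaluated (and logged) here
}
```

A call such as with_logrittr_options(iris %>=% filter(Sepal.Length > 5), lang = "fr") would then log that one pipeline in French and leave the session options untouched afterwards.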

Using logrittr with lumberjack

If you already use the lumberjack package, logrittr_logger plugs directly into its %L>% pipe. The same console output as %>=% is produced, and you keep access to all lumberjack features (run_file(), custom loggers, etc.).

library(lumberjack)
library(dplyr)

iris %L>%
  start_log(log = logrittr_logger$new(), label = "Iris Example") %L>%
  as_tibble() %L>%
  filter(Sepal.Length < 5) %L>%
  mutate(rn = row_number()) %L>%
  group_by(Species) %L>%
  summarise(n = n_distinct(rn)) %L>%
  dump_log(stop = TRUE)

── Iris Example  [rows:       150  cols:    5] ───────────────────────────────────────
ℹ as_tibble()                                      rows:       150 +0        cols:    5 +0    [    NA ms]
ℹ filter(Sepal.Length < 5)                         rows:        22 -128      cols:    5 +0    [  47.0 ms]
ℹ mutate(rn = row_number())                        rows:        22 +0        cols:    6 +1    [   5.0 ms]
  added: rn
ℹ group_by(Species)                                rows:        22 +0        cols:    6 +0    [  11.0 ms]
ℹ summarise(n = n_distinct(rn))                    rows:         3 -19       cols:    2 -4    [   4.0 ms]
  dropped: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, rn
  added: n
✔ Log from Iris Example step written to ~/Documents/GitHub/logrittr/Iris Example_simple.csv

The first step always shows NA ms because lumberjack does not provide a start time: elapsed time is measured as the interval between consecutive steps, so the first step has nothing to be measured against.
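The arithmetic behind that NA can be illustrated with base R. The timestamps below are illustrative values, chosen so that the intervals match the log above; they are not measurements taken from lumberjack.

```r
# Illustrative timestamps (in ms) at which each of the five steps was logged.
stamps_ms <- c(0, 47, 52, 63, 67)

# Elapsed time is the gap to the previous step; the first step has no predecessor.
elapsed_ms <- c(NA, diff(stamps_ms))
print(elapsed_ms)
```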