dplyr: Why do some operations work "rowwise" without calling rowwise() and others don't?

Time:01-08

I am still trying to figure out how rowwise works exactly in R/dplyr.

For example, I have this code:

library(dplyr)
df = data.frame(
  group = c("a", "a", "a", "b", "b", "c"),
  var1 = 1:6,
  var2 = 7:12
)

df %>%
  mutate(
    concatNotRW = paste0(var1, "-", group), # works on rows
    meanNotRW = mean(c(var1, var2)), # does not work on rows
    charsNotRW = strsplit(concatNotRW, "-") # works on rows
  ) %>%
  rowwise() %>%
  mutate(
    concatRW = paste0(var1, "-", group), # all work on rows
    meanRW = mean(c(var1, var2)),
    charsRW = strsplit(concatRW, "-")
  ) -> res

The res dataframe looks like this:

  group  var1  var2 concatNotRW meanNotRW charsNotRW concatRW meanRW charsRW  
  <chr> <int> <int> <chr>           <dbl> <list>     <chr>     <dbl> <list>   
1 a         1     7 1-a               6.5 <chr [2]>  1-a           4 <chr [2]>
2 a         2     8 2-a               6.5 <chr [2]>  2-a           5 <chr [2]>
3 a         3     9 3-a               6.5 <chr [2]>  3-a           6 <chr [2]>
4 b         4    10 4-b               6.5 <chr [2]>  4-b           7 <chr [2]>
5 b         5    11 5-b               6.5 <chr [2]>  5-b           8 <chr [2]>
6 c         6    12 6-c               6.5 <chr [2]>  6-c           9 <chr [2]>

What I do not understand is why paste0 can take each cell of a row and paste them together (essentially performing a rowwise operation), yet mean can't do that. What am I missing, and are there any rules on what already works rowwise without calling rowwise()? I did not find much information in the rowwise() vignette here: https://dplyr.tidyverse.org/articles/rowwise.html

Answer:

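paste can take several vectors as input through its variadic argument (...) and returns a vector as long as the longest input, pairing the elements up, whereas mean takes a single vector x and uses its variadic argument only for further options (trim, na.rm, etc.), returning a single value. Here we need rowMeans. Regarding strsplit, it is vectorized over its input strings and returns a list with one vector of split elements per string, which is why charsNotRW is a list column.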
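A small base-R sketch of the difference described above, using made-up vectors x and y (illustrative only, not from the question). It compares the output lengths of paste0, mean and strsplit, and shows that a second vector passed to mean is not treated as extra data: the default method reads it as `trim`, which must be a single number.

```r
x <- 1:3
y <- c("a", "b", "c")

# paste0() is vectorized over all of its ... arguments: one string per element
s <- paste0(x, "-", y)     # "1-a" "2-b" "3-c"

# mean() reduces its single vector x to one value
m <- mean(c(1:3, 7:9))     # 5

# A second vector in mean() is not more data: mean.default() matches it
# to `trim`, which must be numeric of length one, so this call errors
err <- try(mean(1:3, 7:9), silent = TRUE)
inherits(err, "try-error") # TRUE

# strsplit() is vectorized over its input strings but returns a list,
# holding one character vector per string (hence the list column)
parts <- strsplit(s, "-")
length(parts)              # 3
parts[[1]]                 # "1" "a"
```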

library(dplyr)
df %>%
  mutate(
    concatNotRW = paste0(var1, "-", group),
    meanNotRW = rowMeans(across(c(var1, var2))),
    charsNotRW = strsplit(concatNotRW, "-") 
  )
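For comparison, here is a sketch of the same row means computed both ways: once with the vectorized rowMeans, and once with rowwise() plus c_across(), which collects the current row's columns into one vector so that mean sees only that row. It assumes dplyr 1.0.0 or later (for c_across()); the names `vec`, `rw` and `m` are just illustrative.

```r
library(dplyr)

df <- data.frame(
  group = c("a", "a", "a", "b", "b", "c"),
  var1 = 1:6,
  var2 = 7:12
)

# Vectorized: rowMeans() computes every row mean in a single call
vec <- df %>%
  mutate(m = rowMeans(across(c(var1, var2))))

# Row by row: rowwise() makes mean() run once per row,
# and c_across() gathers that row's var1 and var2 into one vector
rw <- df %>%
  rowwise() %>%
  mutate(m = mean(c_across(c(var1, var2)))) %>%
  ungroup()

vec$m                   # 4 5 6 7 8 9
all.equal(vec$m, rw$m)  # TRUE
```

The vectorized version is usually preferable for large data, since rowwise() calls mean once per row.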

> mean(c(1:5, 6:10))
[1] 5.5

Note that we pass a single vector to mean here, created by concatenating 1:5 and 6:10, so all ten numbers are averaged together

whereas

> paste(1:5, 6:10)
[1] "1 6"  "2 7"  "3 8"  "4 9"  "5 10"

two separate vectors are passed to paste, which combines them element by element.
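To get one mean per pair of elements without dplyr, a sketch in base R: either use vectorized arithmetic directly, or call mean once per pair with mapply, which is roughly what rowwise() does (evaluate the expression once per row).

```r
a <- 1:5
b <- 6:10

paste(a, b)    # pairs elements: "1 6" "2 7" "3 8" "4 9" "5 10"
mean(c(a, b))  # pools all ten numbers: 5.5

# Element-wise means with plain vectorized arithmetic
pair_means <- (a + b) / 2  # 3.5 4.5 5.5 6.5 7.5

# Or call mean() once for each pair of elements
pair_means_loop <- mapply(function(x, y) mean(c(x, y)), a, b)
```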


To split the concatenated column into two columns (instead of a list column), we can use separate from tidyr:

library(tidyr)
df %>%
  mutate(
    concatNotRW = paste0(var1, "-", group),
    meanNotRW = rowMeans(across(c(var1, var2)))
  ) %>%
  separate(concatNotRW, into = c("ind", "chars"))
 group var1 var2 ind chars meanNotRW
1     a    1    7   1     a         4
2     a    2    8   2     a         5
3     a    3    9   3     a         6
4     b    4   10   4     b         7
5     b    5   11   5     b         8
6     c    6   12   6     c         9
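In newer tidyr releases separate() is superseded; assuming tidyr 1.3.0 or later is installed, a sketch of the same split with separate_wider_delim(), which splits on the literal delimiter given in `delim`:

```r
library(dplyr)
library(tidyr)  # separate_wider_delim() assumes tidyr >= 1.3.0

df <- data.frame(
  group = c("a", "a", "a", "b", "b", "c"),
  var1 = 1:6,
  var2 = 7:12
)

res <- df %>%
  mutate(
    concatNotRW = paste0(var1, "-", group),
    meanNotRW = rowMeans(across(c(var1, var2)))
  ) %>%
  # split "1-a" etc. on "-" into two new character columns
  separate_wider_delim(concatNotRW, delim = "-", names = c("ind", "chars"))

res
```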

Whether an operation works row by row without rowwise() depends on the function. If the function is vectorized, it operates on the whole column at once, element by element, and doesn't need rowwise. Here, paste is vectorized over its variadic input and returns one value per element, whereas mean is a summary function: it takes a single vector and reduces it to a single value. Suppose we have a function that checks each value with if/else; it is not vectorized, because if expects a single logical value. In that case, we can either use rowwise() or wrap the function with Vectorize().
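A minimal sketch of that last point, using a hypothetical function `label_scalar()` (not from the question) that only handles one value because it relies on if/else. Since R 4.2.0, an if condition of length greater than one is an error, so the function fails on a whole column; Vectorize() wraps it so it is called once per element, and ifelse() is a natively vectorized alternative.

```r
# Hypothetical scalar-only function: `if` needs a single TRUE/FALSE
label_scalar <- function(x) {
  if (x > 3) "big" else "small"
}

# Calling it on a whole vector fails (error in R >= 4.2.0)
err <- try(label_scalar(1:6), silent = TRUE)

# Vectorize() wraps it with mapply(), calling it once per element
label_vec <- Vectorize(label_scalar)
labels <- label_vec(1:6)  # "small" "small" "small" "big" "big" "big"

# ifelse() works on whole vectors directly
labels_ifelse <- ifelse(1:6 > 3, "big", "small")
```

Inside mutate(), label_vec(var1) would then work without rowwise().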
