I am still trying to figure out how rowwise works exactly in R/dplyr.
For example, I have this code:
library(dplyr)
df = data.frame(
  group = c("a", "a", "a", "b", "b", "c"),
  var1 = 1:6,
  var2 = 7:12
)
df %>%
  mutate(
    concatNotRW = paste0(var1, "-", group),   # works row by row
    meanNotRW = mean(c(var1, var2)),          # does not work row by row
    charsNotRW = strsplit(concatNotRW, "-")   # works row by row
  ) %>%
  rowwise() %>%
  mutate(
    concatRW = paste0(var1, "-", group),      # all of these work row by row
    meanRW = mean(c(var1, var2)),
    charsRW = strsplit(concatRW, "-")
  ) -> res
The res data frame looks like this:
  group  var1  var2 concatNotRW meanNotRW charsNotRW concatRW meanRW charsRW
  <chr> <int> <int> <chr>           <dbl> <list>     <chr>     <dbl> <list>
1 a         1     7 1-a               6.5 <chr [2]>  1-a           4 <chr [2]>
2 a         2     8 2-a               6.5 <chr [2]>  2-a           5 <chr [2]>
3 a         3     9 3-a               6.5 <chr [2]>  3-a           6 <chr [2]>
4 b         4    10 4-b               6.5 <chr [2]>  4-b           7 <chr [2]>
5 b         5    11 5-b               6.5 <chr [2]>  5-b           8 <chr [2]>
6 c         6    12 6-c               6.5 <chr [2]>  6-c           9 <chr [2]>
What I do not understand is why paste0 can take each cell of a row and paste them together (essentially performing a rowwise operation), yet mean can't do that. What am I missing, and are there any rules about what already works row by row without the call to rowwise()? I did not find much information in the rowwise() vignette here: https://dplyr.tidyverse.org/articles/rowwise.html
CodePudding user response:
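paste accepts several vectors through its variadic argument (...) and returns a vector as long as the longest input, combining the inputs element by element. mean, in contrast, takes a single vector x as its data; its remaining arguments (trim, na.rm, and anything passed through ...) are options, and it returns a single value. For a mean per row we need rowMeans. As for strsplit, it returns a list with one element per input string, each holding the split pieces, so it also works element by element.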
library(dplyr)
df %>%
  mutate(
    concatNotRW = paste0(var1, "-", group),
    meanNotRW = rowMeans(across(c(var1, var2))),
    charsNotRW = strsplit(concatNotRW, "-")
  )
> mean(c(1:5, 6:10))
[1] 5.5
Note that here we pass a single vector, built by concatenating 1:5 and 6:10 with c(), so mean sees ten numbers and returns one value, whereas
> paste(1:5, 6:10)
[1] "1 6" "2 7" "3 8" "4 9" "5 10"
passes two separate vectors to paste, which combines them element by element.
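The difference can be checked directly on the columns from the question. This is only a minimal base R sketch; the standalone vectors var1 and var2 stand in for the data frame columns, and the names pasted, overall and per_row are chosen here for illustration.

```r
# Stand-ins for the two numeric columns of the question's data frame
var1 <- 1:6
var2 <- 7:12

# paste0() combines its inputs position by position: one string per row
pasted <- paste0(var1, "-", var2)

# c() glues both columns into one vector of 12 numbers,
# so mean() returns a single overall value
overall <- mean(c(var1, var2))

# rowMeans() averages across the columns for each row
per_row <- rowMeans(cbind(var1, var2))

# plain arithmetic is element-wise too, so this gives the same per-row result
per_row_arith <- (var1 + var2) / 2

print(length(pasted))
print(overall)
print(per_row)
```

Inside mutate(), a length-one result such as overall is recycled to every row, which is why meanNotRW shows the same value throughout.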
For splitting the column into two columns, we can use separate from tidyr:
library(tidyr)
df %>%
  mutate(
    concatNotRW = paste0(var1, "-", group),
    meanNotRW = rowMeans(across(c(var1, var2)))
  ) %>%
  separate(concatNotRW, into = c("ind", "chars"))
  group var1 var2 ind chars meanNotRW
1     a    1    7   1     a         4
2     a    2    8   2     a         5
3     a    3    9   3     a         6
4     b    4   10   4     b         7
5     b    5   11   5     b         8
6     c    6   12   6     c         9
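If the list returned by strsplit is preferred over separate, the pieces can also be pulled out by position. The following is a base R sketch rather than part of the answer's approach; the names concat, pieces, ind and chars are illustrative.

```r
# Rebuild the concatenated column from the question's values
concat <- paste0(1:6, "-", c("a", "a", "a", "b", "b", "c"))

# strsplit() returns a list with one character vector per input string
pieces <- strsplit(concat, "-", fixed = TRUE)

# Pick the first and second piece of every list element;
# vapply() checks that each result is a single string
ind <- vapply(pieces, `[`, character(1), 1)
chars <- vapply(pieces, `[`, character(1), 2)

print(ind)
print(chars)
```

Note that ind is a character vector here, just as the ind column produced by separate is unless conversion is requested.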
Whether an operation needs rowwise depends on the function. If the function is vectorized, it works on the whole column and doesn't need rowwise. Here, both paste and mean accept vectors, but in different ways: paste is vectorized over its variadic inputs and returns one result per element, whereas mean takes a single vector and reduces it to a single value. Suppose we have a function that checks each value with if/else; it is not vectorized, because if/else expects a single logical value. In that case, we can either use rowwise or wrap the function with Vectorize.
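As a rough illustration of the last point, here is a sketch with a made-up helper, size_label, which is not from the question. It uses a plain if/else, so it only handles one value at a time until it is wrapped with Vectorize.

```r
# Hypothetical helper: `if` needs a single TRUE/FALSE, so this function
# is not vectorized. Called on a whole column it errors in R >= 4.2
# (older versions warn and use only the first element).
size_label <- function(x) {
  if (x > 3) "big" else "small"
}

# Vectorize() returns a wrapper that applies the function to each element
size_label_v <- Vectorize(size_label)

var1 <- 1:6
labels <- size_label_v(var1)
print(labels)

# ifelse() is already vectorized and needs no wrapper
labels_ifelse <- ifelse(var1 > 3, "big", "small")
print(identical(labels, labels_ifelse))
```

In a dplyr pipeline, size_label_v(var1) could be used directly inside mutate(), while the unwrapped size_label(var1) would need rowwise() first. Where a vectorized equivalent such as ifelse exists, it is usually the simpler choice.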