Is there a way to force mutate in R to use single values rather than whole column values?


For example, I have tried:

df <- tibble(x = c(1,0,0,1), y = c(1,1,0,0), z = c(0,1,1,0))
df <- df %>% mutate(pos_seq = c(x,y,z))
df <- df %>% rowwise() %>% mutate(pos_seq = c(x,y,z))

both of which give errors due to different sizes.

And I have tried:

df <- tibble(x = c(1,0,0,1), y = c(1,1,0,0), z = c(0,1,1,0))
df <- df %>% mutate(pos_seq = list(c(x,y,z)))
df <- df %>% rowwise() %>% mutate(pos_seq = list(c(x,y,z)))

which makes pos_seq a list of the full x column, y column, z column, not just the single row values.

Same problem when I use a different way of 'aggregating' x/y/z, e.g. mutate(pos_str = paste(c(x,y,z), collapse = "")). I don't understand why something like sum() works on single row values, but other functions don't. Can I force it?

I want this result:

x y z pos_seq pos_str
1 1 0 c(1,1,0) "110"
0 1 1 c(0,1,1) "011"
0 0 1 c(0,0,1) "001"
1 0 0 c(1,0,0) "100"

In reality, I want to run a complex function on a dataset that takes multiple variables from each row, including characters and vectors, and some of its decisions rely on aggregates like "pos_seq" or "pos_str". This demo seems to be the root of my problems.

CodePudding user response:

You could use list and c in combination with rowwise for your pos_seq column and use paste0 with collapse to create one string of the values for your pos_str column like this:

library(dplyr)

df <- tibble(x = c(1,0,0,1), y = c(1,1,0,0), z = c(0,1,1,0))
df %>% 
  rowwise() %>% 
  mutate(pos_seq = list(c(x,y,z)),
         pos_str = paste0(pos_seq, collapse = ""))
#> # A tibble: 4 × 5
#> # Rowwise: 
#>       x     y     z pos_seq   pos_str
#>   <dbl> <dbl> <dbl> <list>    <chr>  
#> 1     1     1     0 <dbl [3]> 110    
#> 2     0     1     1 <dbl [3]> 011    
#> 3     0     0     1 <dbl [3]> 001    
#> 4     1     0     0 <dbl [3]> 100

Created on 2022-07-11 by the reprex package (v2.0.1)
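
As for why sum() seemed to work while c() did not, a likely explanation is the length of the result rather than the function itself. The sketch below (the names lens_plain, res, total and n are made up for illustration) checks what length c(x, y, z) has with and without rowwise(). It assumes dplyr is installed.

```r
library(dplyr)

df <- tibble(x = c(1, 0, 0, 1), y = c(1, 1, 0, 0), z = c(0, 1, 1, 0))

# Without rowwise(), x, y and z are whole columns, so c(x, y, z)
# concatenates all three columns into one long vector.
lens_plain <- df %>%
  summarise(n = length(c(x, y, z))) %>%
  pull(n)

# With rowwise(), every expression is evaluated one row at a time.
res <- df %>%
  rowwise() %>%
  mutate(
    total   = sum(x, y, z),        # length 1, fits into a single row as-is
    n       = length(c(x, y, z)),  # number of values c() returns per row
    pos_seq = list(c(x, y, z))     # multi-value result wrapped in a length-1 list
  ) %>%
  ungroup()
```

Under rowwise(), sum() returns a single value per row, which mutate() can store directly, while c(x, y, z) returns several values per row and therefore has to be wrapped in list() to fit into one cell.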

CodePudding user response:

Here is a one-liner that does the job.

cbind(df, list2DF(list(pos_seq=apply(df, 1, list))), pos_str=Reduce(paste0, df))
#   x y z pos_seq pos_str
# 1 1 1 0 1, 1, 0     110
# 2 0 1 1 0, 1, 1     011
# 3 0 0 1 0, 0, 1     001
# 4 1 0 0 1, 0, 0     100
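
For comparison, here is a slightly longer base R sketch of the same idea. It is not the one-liner above but a variant using Map() and do.call(), assuming a plain data.frame with the same three columns. It stores a plain numeric vector in each pos_seq cell.

```r
df <- data.frame(x = c(1, 0, 0, 1), y = c(1, 1, 0, 0), z = c(0, 1, 1, 0))

# Map() walks the three columns in parallel, so each call of the
# anonymous function only sees the values of one row.
df$pos_seq <- Map(function(x, y, z) c(x, y, z), df$x, df$y, df$z)

# paste0() is vectorised: passing the columns as separate arguments
# glues them together element-wise, giving one string per row.
df$pos_str <- do.call(paste0, df[c("x", "y", "z")])
```

Because Map() passes one value from each column per call, this pattern also extends to a custom function that needs several variables from each row.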