R Change smallest value in group based on condition

Time:12-09

I would like to know how to change the smallest non-zero value in a group when a marker value (here "x") occurs exactly once in that group.

For example, given the data frame:

# z mixes numbers with 'x', so the whole column is stored as character
df1 <- data.frame(x = rep(letters[1:3], each = 4),
                  y = rep('var', 12),
                  z = c(10, 0, 'x', 40, 1, 2, 3, 6, 1, 'x', 'x', 6))

df1

   x   y  z
1  a var 10
2  a var  0
3  a var  x
4  a var 40
5  b var  1
6  b var  2
7  b var  3
8  b var  6
9  c var  1
10 c var  x
11 c var  x
12 c var  6

I would like df1[1, 3] to change to "x", because group a (in column x) contains exactly one "x" and 10 is the smallest non-zero value in that group. The result should be the data frame:

   x   y  z
1  a var  x
2  a var  0
3  a var  x
4  a var 40
5  b var  1
6  b var  2
7  b var  3
8  b var  6
9  c var  1
10 c var  x
11 c var  x
12 c var  6

Thanks!

CodePudding user response:

We group by 'x' and build an if/else condition on the number of 'x' values in 'z'. If that count is 1, we replace with 'x' every 'z' value equal to the minimum of the numeric values in the group, where zeros are first converted to NA with na_if so that they are ignored. Otherwise 'z' is returned unchanged.

library(dplyr)
library(stringr)
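df1 %>% 
   group_by(x) %>% 
   # only touch groups that contain exactly one 'x'
   mutate(z = if(sum(z == 'x') == 1) replace(z, 
       # smallest numeric value, ignoring zeros (na_if) and non-numeric entries
       z == min(as.numeric(str_subset(na_if(z, '0'), '^[0-9.]+$')),
           na.rm = TRUE), 'x') else z) %>% 
   ungroup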

-output

# A tibble: 12 × 3
   x     y     z    
   <chr> <chr> <chr>
 1 a     var   x    
 2 a     var   0    
 3 a     var   x    
 4 a     var   40   
 5 b     var   1    
 6 b     var   2    
 7 b     var   3    
 8 b     var   6    
 9 c     var   1    
10 c     var   x    
11 c     var   x    
12 c     var   6    
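
For readers without dplyr, the same per-group rule can be sketched in base R. This is only an illustration of the logic described above, not the answer's own code: the helper name replace_smallest is made up here, and it assumes that anything as.numeric() cannot parse (such as "x") should simply be ignored when looking for the minimum.

```r
# Hypothetical helper: applies the rule to the z values of a single group
replace_smallest <- function(z) {
  # Non-numeric entries such as "x" become NA; the coercion warning is expected
  num_z <- suppressWarnings(as.numeric(z))
  # Candidate values: numeric and not zero
  nonzero <- num_z[!is.na(num_z) & num_z != 0]
  # Replace only when the group holds exactly one "x"
  if (sum(z == "x") == 1 && length(nonzero) > 0) {
    z[!is.na(num_z) & num_z == min(nonzero)] <- "x"
  }
  z
}

# Same data as in the question, with z written out as character
df1 <- data.frame(x = rep(letters[1:3], each = 4),
                  y = rep("var", 12),
                  z = c("10", "0", "x", "40", "1", "2", "3", "6",
                        "1", "x", "x", "6"),
                  stringsAsFactors = FALSE)

# ave() runs the helper once per level of x and keeps the row order
df1$z <- ave(df1$z, df1$x, FUN = replace_smallest)
```

Because the comparison is done on the numeric copy num_z, values such as "9" and "10" are ordered as numbers rather than as text.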

CodePudding user response:

I think akrun's solution is better, but here is another idea, since I like data.table more than dplyr:

library(data.table)
df1 = data.table(df1)

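for (i in unique(df1$x)) {
  if (length(df1[x==i & z=="x", z]) == 1){
    # z is character, so convert before min(); as text, "9" would sort after "10"
    m = min(as.numeric(df1[x==i & !z %in% c("0", "x"), z]))
    df1[x==i & z==as.character(m), z:="x"]
  }
}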

And the output:

 > df1
    x   y  z
 1: a var  x
 2: a var  0
 3: a var  x
 4: a var 40
 5: b var  1
 6: b var  2
 7: b var  3
 8: b var  6
 9: c var  1
10: c var  x
11: c var  x
12: c var  6
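
The explicit loop over groups could also be written with data.table's by argument. The following is a sketch of that variant under the same assumptions as the question (a character column z in which "x" is the only non-numeric value); the table name dt is chosen here for illustration.

```r
library(data.table)

# Same data as in the question, built directly as a data.table
dt <- data.table(x = rep(letters[1:3], each = 4),
                 y = "var",
                 z = c("10", "0", "x", "40", "1", "2", "3", "6",
                       "1", "x", "x", "6"))

# Update z by reference, one group of x at a time
dt[, z := {
  # "x" becomes NA here; the coercion warning is expected and suppressed
  num_z <- suppressWarnings(as.numeric(z))
  nonzero <- num_z[!is.na(num_z) & num_z != 0]
  if (sum(z == "x") == 1 && length(nonzero) > 0) {
    replace(z, !is.na(num_z) & num_z == min(nonzero), "x")
  } else {
    z
  }
}, by = x]
```

Grouping with by avoids filtering the table repeatedly inside a loop, which tends to matter once the number of groups grows.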