Home > Back-end >  Easiest way to replace non-NA values by column index
Easiest way to replace non-NA values by column index

Time:12-16

I have a data frame like the following, with some NAs:

mydf=data.frame(ID=LETTERS[1:10], aaa=runif(10), bbb=runif(10), ccc=runif(10), ddd=runif(10))
mydf[c(1,4,5,7:10),2]=NA
mydf[c(1,2,4:8),3]=NA
mydf[c(3,4,6:10),4]=NA
mydf[c(1,3,4,6,9,10),5]=NA

> mydf
   ID       aaa       bbb        ccc       ddd
1   A        NA        NA 0.08844614        NA
2   B 0.4912790        NA 0.88925139 0.1233173
3   C 0.1325188 0.1389260         NA        NA
4   D        NA        NA         NA        NA
5   E        NA        NA 0.60750723 0.6357998
6   F 0.8218579        NA         NA        NA
7   G        NA        NA         NA 0.5988206
8   H        NA        NA         NA 0.4008338
9   I        NA 0.8784563         NA        NA
10  J        NA 0.2959320         NA        NA

What I want to accomplish here is the following:

1- replace non-NA values by column index -1, so that the output looks like this:

> mydf
   ID       aaa       bbb        ccc       ddd
1   A        NA        NA          3        NA
2   B         1        NA          3         4
3   C         1         2         NA        NA
4   D        NA        NA         NA        NA
5   E        NA        NA          3         4
6   F         1        NA         NA        NA
7   G        NA        NA         NA         4
8   H        NA        NA         NA         4
9   I        NA         2         NA        NA
10  J        NA         2         NA        NA

2- Then I would like to add an extra column that shows the following:

  • 0 for all NAs in a row
  • 0 for a row with more than 1 non-NA value
  • the actual value when it is the only non-NA value in a row

The final result should look like this:

> mydf
   ID       aaa       bbb        ccc       ddd       final
1   A        NA        NA          3        NA           3
2   B         1        NA          3         4           0
3   C         1         2         NA        NA           0
4   D        NA        NA         NA        NA           0
5   E        NA        NA          3         4           0
6   F         1        NA         NA        NA           1
7   G        NA        NA         NA         4           4
8   H        NA        NA         NA         4           4
9   I        NA         2         NA        NA           2
10  J        NA         2         NA        NA           2

I could probably do all this with an ugly for loop, then aggregate for the final column, and substitute by 0 where appropriate...

But I was wondering if there would be a clean way to do this with some apply calls in just a few lines...

Thanks!

CodePudding user response:

You could do:

mydf[-1] <- sapply(1:4, \(x) x * mydf[x 1]/mydf[x 1])

mydf$final <- apply(mydf[-1], 1, function(x) { 
  if(all(is.na(x)) | sum(!is.na(x)) > 1) 0 else na.omit(x)
})

Result:

mydf
#>    ID aaa bbb ccc ddd final
#> 1   A  NA  NA   3  NA     3
#> 2   B   1  NA   3   4     0
#> 3   C   1   2  NA  NA     0
#> 4   D  NA  NA  NA  NA     0
#> 5   E  NA  NA   3   4     0
#> 6   F   1  NA  NA  NA     1
#> 7   G  NA  NA  NA   4     4
#> 8   H  NA  NA  NA   4     4
#> 9   I  NA   2  NA  NA     2
#> 10  J  NA   2  NA  NA     2

Created on 2022-12-16 with reprex v2.0.2

CodePudding user response:

Here is an idea,

mydf1 <- cbind.data.frame(ID = mydf$ID, mapply(function(x, y) replace(x, !is.na(x), y), 
                                               mydf, 
                                               seq(ncol(mydf)) - 1)[,-1])

mydf1$final <- apply(mydf1[-1], 1, \(i) 
       ifelse(sum(is.na(i)) == (ncol(mydf) - 1) | sum(!is.na(i)) > 1, 0, i[!is.na(i)]))

mydf1
   ID  aaa  bbb  ccc  ddd final
1   A <NA> <NA>    3 <NA>     3
2   B    1 <NA>    3    4     0
3   C    1    2 <NA> <NA>     0
4   D <NA> <NA> <NA> <NA>     0
5   E <NA> <NA>    3    4     0
6   F    1 <NA> <NA> <NA>     1
7   G <NA> <NA> <NA>    4     4
8   H <NA> <NA> <NA>    4     4
9   I <NA>    2 <NA> <NA>     2
10  J <NA>    2 <NA> <NA>     2
  • Related