Count pairs of non-NA observations by row in selected columns using R

Time:10-16

I have a dataframe:

  id    cog com emo
AUD-002 12  34  24
PAR-044 NA  28  38
BRE-019 0   NA  51
2-1-GRE NA  31  68

For every pair of the columns cog, com and emo, I want to flag, row by row, whether both values are non-NA (1 if both are present, 0 otherwise).

My required output is:

  id     cog com  emo cog-com cog-emo com-emo
AUD-002  12  34   24   1        1       1
PAR-044  NA  28   38   0        0       1
BRE-019  0   NA   51   0        1       0
2-1-GRE  NA  31   68   0        0       1

A possibly related question is "Count non-NA observations by row in selected columns", but its answers count all non-NA entries per row rather than per pair of columns. I can get what I need with multiple statements like this:

library(dplyr)
df = df %>%
  mutate(count_cog_com = rowSums(!is.na(select(., 2:3))) - 1)

df = df %>%
  mutate(count_cog_emo = rowSums(!is.na(select(., 2,4))) - 1)

df = df %>%
  mutate(count_com_emo = rowSums(!is.na(select(., 3:4))) - 1)

But I don't want to do this on my actual data, which has many more columns. Is there an easy dplyr way to achieve this? Can these statements be combined somehow? Thank you for your help!

The dput is as below:

dput(df)
structure(list(id = structure(c(2L, 4L, 3L, 1L), 
.Label = c("2-1-GRE", "AUD-002", "BRE-019", "PAR-044"), class = "factor"), cog = c(12L, NA, 0L, NA), 
com = c(34L, 28L, NA, 31L), 
emo = c(24L, 38L, 51L, 68L)), 
row.names = c(NA, -4L), class = "data.frame")

CodePudding user response:

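Here is a base R way.
The function combn generates all combinations of the elements of its first argument (here, the columns of df[-1], taken 2 at a time) and optionally applies a function to each one. In this case the function counts the non-NA values per row and subtracts 1, which gives 1 when both values are present and 0 when exactly one is missing (note that it gives -1 when both are missing). The column names are built the same way, by pasting each pair of names together.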

df <-
  structure(list(
    id = structure(c(2L, 4L, 3L, 1L), 
                   .Label = c("2-1-GRE", "AUD-002", "BRE-019", "PAR-044"), 
                   class = "factor"), 
    cog = c(12L, NA, 0L, NA), 
    com = c(34L, 28L, NA, 31L), 
    emo = c(24L, 38L, 51L, 68L)), 
    row.names = c(NA, -4L), class = "data.frame")

tmp <- combn(df[-1], 2, \(x) rowSums(!is.na(x)) - 1L)
colnames(tmp) <- combn(names(df)[-1], 2, paste, collapse = "_")
df <- cbind(df, tmp)
rm(tmp)

df
#>        id cog com emo cog_com cog_emo com_emo
#> 1 AUD-002  12  34  24       1       1       1
#> 2 PAR-044  NA  28  38       0       0       1
#> 3 BRE-019   0  NA  51       0       1       0
#> 4 2-1-GRE  NA  31  68       0       0       1

Created on 2022-10-15 with reprex v2.0.2
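
To make the two building blocks above easier to follow, here is a minimal sketch that runs them in isolation. The data frame `toy` and its columns `a` and `b` are made up for illustration and are not part of the question's data.

```r
# combn() with a function argument applies that function to every combination.
# Here each pair of column names is pasted into one label.
cols <- c("cog", "com", "emo")
pair_names <- combn(cols, 2, paste, collapse = "_")
print(pair_names)
# "cog_com" "cog_emo" "com_emo"

# Hypothetical two-column data: both present, one missing, both missing.
toy <- data.frame(a = c(1, NA, NA), b = c(2, 3, NA))

# Number of non-NA values per row, minus 1.
counts <- unname(rowSums(!is.na(toy)) - 1L)
print(counts)
# 1 0 -1
```

The last value shows why the row-sum approach is only safe when at most one value per pair can be missing: a row with both values missing yields -1 rather than 0.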


Edit

In answer to the request in the comments: yes, it is possible. Have the anonymous function called by combn combine the two non-NA indicators with the logical && and coerce the result to integer. This returns 0 if at least one of the two values is NA and 1 otherwise.

The line that needs to be changed is this:

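tmp <- combn(df[-1], 2, \(x) as.integer(apply(!is.na(x), 1, \(y) y[1] && y[2])))

A complete code run (the row PAR-044 is printed with com = NA, so this output appears to come from a variant of the data in which both cog and com are missing in that row):

tmp <- combn(df[-1], 2, \(x) as.integer(apply(!is.na(x), 1, \(y) y[1] && y[2])))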
colnames(tmp) <- combn(names(df)[-1], 2, paste, collapse = "_")
df <- cbind(df, tmp)
rm(tmp)

df
#>        id cog com emo cog_com cog_emo com_emo
#> 1 AUD-002  12  34  24       1       1       1
#> 2 PAR-044  NA  NA  38       0       0       0
#> 3 BRE-019   0  NA  51       0       1       0
#> 4 2-1-GRE  NA  31  68       0       0       1

Created on 2022-10-15 with reprex v2.0.2

More readable but equivalent:

tmp <- combn(df[-1], 2, \(x) {
  not_na <- apply(!is.na(x), 1, \(y) y[1] && y[2])
  as.integer(not_na)
})
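
As a possible alternative to the row-wise apply, the same 0/1 flags can be computed with the vectorized & operator on whole columns. The following is only a sketch: the helper name `pair_flags` and the data frame `demo` are hypothetical and do not come from the original answer.

```r
# Hypothetical helper: for every pair of the given columns, append a 0/1
# column that is 1 only when both values in that row are non-NA.
pair_flags <- function(data, cols) {
  # List of all 2-element combinations of the column names.
  pairs <- combn(cols, 2, simplify = FALSE)
  flags <- lapply(pairs, function(p) {
    # Vectorized over rows; & works elementwise, unlike &&.
    as.integer(!is.na(data[[p[1]]]) & !is.na(data[[p[2]]]))
  })
  names(flags) <- vapply(pairs, paste, character(1), collapse = "_")
  cbind(data, as.data.frame(flags))
}

# Made-up data, including a row where both cog and com are missing.
demo <- data.frame(
  id  = c("a", "b", "c"),
  cog = c(12L, NA, NA),
  com = c(34L, NA, 31L),
  emo = c(24L, 38L, 68L)
)

res <- pair_flags(demo, c("cog", "com", "emo"))
print(res)
```

Because the columns are selected by name, the helper does not depend on the id column being first, and a row with both values missing gets 0 for that pair.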