How to differentiate and sort data in R

Here is part of my data

dat <- read.table(text = " Name1   Weight1 Name2   Weight2 Name3   Weight3 Name4   Weight4 Name5   Weight5 Name6   Weight6 Name7   Weight7 Name8   Weight8 Name9   Weight9 Name10  Weight10

Rose    Y   Moli    N   Ali     N   Mo      Y   Ko      N   Rose    N   Ali     N   Moli    N   Rose    N   Ko      Y
Ali Y   Bob     N   Bob     N   Magg    N   Alo     N   Sarah   N   Ali Y   Rose    N   Bob     N   Sarah   N
Rose    Y   Moli    Y   Ali     N   Mo      N   Ko      N   Rose    N   Ali     Y   Moli    N   Rose    Y   Ko      Y
    ", header=TRUE)

The logic is this: when two or more different names say "Y" in a row, the result is Y. When two names are the same and both say "Y", the result is N; row 2 (Ali) is an example. Next, I want to count the Y values. The expected outcome is:

No  Weight
3   Y
2   N
4   Y

CodePudding user response：

I'm not sure I fully understand the logic, but it seems the result should be N only when a single individual accounts for all the Y values in a row. Otherwise, when two or more distinct individuals have Y, the result is always Y.

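res = apply(dat, 1, function(l) {
  # reshape the row into (name, weight) pairs: X1 = name, X2 = weight
  df = data.frame(matrix(as.character(l), ncol = 2, byrow = TRUE))
  # keep only the pairs that said "Y"
  df = subset(df, X2 == 'Y')
  if (length(unique(df$X1)) == 1) {
    # a single individual accounts for every "Y": count the repeats, report N
    return(data.frame(No = nrow(df), Weight = 'N'))
  } else {
    # several distinct individuals said "Y": count them, report Y
    return(data.frame(No = length(unique(df$X1)), Weight = 'Y'))
  }
})
# stack the per-row results into one data frame
do.call(rbind, res)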
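
One edge case is worth noting: a row with no "Y" at all has zero distinct names, so the function above falls into the else branch and reports Weight 'Y' with No = 0. The following is a minimal sketch of a variant that treats such rows as "N", assuming that is the desired behaviour (the question does not say). `summarise_row` is a hypothetical helper name, and it takes the names and weights of one row as two plain vectors:

```r
# Hypothetical helper: summarise one row from its names (nm) and weights (wt).
summarise_row <- function(nm, wt) {
  y_names <- nm[wt == "Y"]                 # names that said "Y"
  n_distinct <- length(unique(y_names))    # how many different names said "Y"
  if (n_distinct >= 2) {
    # two or more different names said "Y"
    data.frame(No = n_distinct, Weight = "Y")
  } else {
    # zero or one distinct name said "Y"
    # (assumption: a row with no "Y" at all is also reported as "N")
    data.frame(No = length(y_names), Weight = "N")
  }
}

# Row 2 of the question: Ali says "Y" twice, nobody else does
row2 <- summarise_row(
  nm = c("Ali", "Bob", "Bob", "Magg", "Alo", "Sarah", "Ali", "Rose", "Bob", "Sarah"),
  wt = c("Y", "N", "N", "N", "N", "N", "Y", "N", "N", "N")
)

# A made-up row with no "Y" at all
none <- summarise_row(nm = c("Ali", "Bob"), wt = c("N", "N"))
```

Here `row2` comes out as No = 2 with Weight "N", matching the expected outcome in the question, and `none` comes out as No = 0 with Weight "N".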

CodePudding user response：

Your conditions are not mutually exclusive. For example, row 3 meets both conditions:

There are 4 names with weight = "Y": Rose, Moli, Ali, and Ko.
There is also a name that is repeated, and has "Y": Rose.

Therefore, I've shown below how to get both conditions separately:

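library(data.table)
# add a row index so each original row can be tracked after reshaping
setDT(dat)[, row := .I]
# reshape to long format: one (row, weight, name) record per Name/Weight pair
dat = cbind(
  melt(dat[, .SD, .SDcols = patterns("row|^W")], id.vars = "row")[, .(row, weight = value)],
  melt(dat[, .SD, .SDcols = patterns("row|^N")], id.vars = "row")[, .(name = value)]
)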

# rows with two or more different names with "Y"
dat[weight=="Y", uniqueN(name), by=row][V1>=2, row]

Output

[1] 1 3

# rows with two same names, both with weight "Y"
dat[weight=="Y", .N, by=.(name,row)][N>=2, row]

Output

[1] 2 3
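
If the first condition is given priority whenever both hold (an assumption, but one that reproduces the expected table in the question), the two checks can be combined into a single summary per row. The sketch below is self-contained: `long` is a hypothetical stand-in for the long-format table built above, listing only the "Y" entries of the three rows in the question.

```r
library(data.table)

# Hypothetical stand-in for the long-format table: the "Y" entries
# of rows 1 to 3, copied from the question's data.
long <- data.table(
  row    = rep(1:3, times = c(3, 2, 5)),
  name   = c("Rose", "Mo", "Ko",                     # row 1
             "Ali", "Ali",                           # row 2
             "Rose", "Moli", "Ali", "Rose", "Ko"),   # row 3
  weight = "Y"
)

# Per row: number of distinct names with "Y" and total number of "Y" entries
res <- long[weight == "Y", .(distinct = uniqueN(name), total = .N), by = row]

# Assumption: "two or more different names" wins when both conditions hold.
# No is the distinct count for "Y" rows and the raw count for "N" rows.
res[, No := ifelse(distinct >= 2L, distinct, total)]
res[, Weight := ifelse(distinct >= 2L, "Y", "N")]

print(res[, .(row, No, Weight)])
```

Note that a row without any "Y" would not appear in `res` at all, because the filter `weight == "Y"` removes it before grouping.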