Here is part of my data:
dat <- read.table(text = "Name1 Weight1 Name2 Weight2 Name3 Weight3 Name4 Weight4 Name5 Weight5 Name6 Weight6 Name7 Weight7 Name8 Weight8 Name9 Weight9 Name10 Weight10
Rose Y Moli N Ali N Mo Y Ko N Rose N Ali N Moli N Rose N Ko Y
Ali Y Bob N Bob N Magg N Alo N Sarah N Ali Y Rose N Bob N Sarah N
Rose Y Moli Y Ali N Mo N Ko N Rose N Ali Y Moli N Rose Y Ko Y
", header=TRUE)
The logic is: when two or more different names in a row say "Y", the result for that row is Y. When the only "Y" values in a row come from the same repeated name, the result is N. Row 2 (Ali) is an example of this. Next, I want to count the Y values. So the expected outcome is:
No Weight
3 Y
2 N
4 Y
CodePudding user response:
I'm not sure I fully understand the logic, but it seems the result should be N only when a single individual accounts for every Y in the row. Otherwise, when two or more distinct individuals have Y, the result is always Y.
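res <- apply(dat, 1, function(l) {
  # Reshape the row into two columns: X1 = name, X2 = weight
  df <- data.frame(matrix(as.character(l), ncol = 2, byrow = TRUE))
  # Keep only the entries that say "Y"
  df <- subset(df, X2 == "Y")
  if (length(unique(df$X1)) == 1) {
    # A single individual repeats "Y": count the entries, result is N
    return(data.frame(No = nrow(df), Weight = "N"))
  } else {
    # Otherwise count the distinct individuals, result is Y
    return(data.frame(No = length(unique(df$X1)), Weight = "Y"))
  }
})
# Combine the per-row results into one data frame
do.call(rbind, res)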
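For reference, here is a minimal self-contained sketch of the same rule that picks the Name and Weight columns by prefix instead of reshaping each row. The helper name `summarise_row` is my own, and returning N with a count of 0 for a row that has no Y at all is an assumption, since the question does not cover that case.

```r
dat <- read.table(text = "Name1 Weight1 Name2 Weight2 Name3 Weight3 Name4 Weight4 Name5 Weight5 Name6 Weight6 Name7 Weight7 Name8 Weight8 Name9 Weight9 Name10 Weight10
Rose Y Moli N Ali N Mo Y Ko N Rose N Ali N Moli N Rose N Ko Y
Ali Y Bob N Bob N Magg N Alo N Sarah N Ali Y Rose N Bob N Sarah N
Rose Y Moli Y Ali N Mo N Ko N Rose N Ali Y Moli N Rose Y Ko Y
", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: summarise one row from its names and weights
summarise_row <- function(nms, wts) {
  yes <- nms[wts == "Y"]                 # names that said Y, repeats included
  n_distinct <- length(unique(yes))
  if (n_distinct >= 2) {
    # Two or more different names said Y: count distinct names, result is Y
    data.frame(No = n_distinct, Weight = "Y")
  } else {
    # Zero or one distinct name said Y: count the Y entries, result is N
    # (the zero-Y case is an assumption; the question does not define it)
    data.frame(No = length(yes), Weight = "N")
  }
}

# Column positions of the Name* and Weight* columns, in matching order
name_cols   <- grep("^Name",   names(dat))
weight_cols <- grep("^Weight", names(dat))

res <- do.call(rbind, lapply(seq_len(nrow(dat)), function(i) {
  summarise_row(unlist(dat[i, name_cols]), unlist(dat[i, weight_cols]))
}))
res
```

On the sample data this should give the same three rows as the expected outcome in the question.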
CodePudding user response:
Your conditions are not mutually exclusive. For example, row 3 meets both conditions:
- There are 4 names with weight = "Y": Rose, Moli, Ali, and Ko
- There is also a name that is repeated, and has "Y": Rose.
Therefore, I've shown below how to get both conditions separately:
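library(data.table)
# Convert to data.table by reference and add a row index
setDT(dat)[, row := .I]
# Melt the Weight* and Name* columns separately, then bind them side by side;
# both melts return rows in the same order, so weights and names stay aligned
dat <- cbind(
  melt(dat[, .SD, .SDcols = patterns("row|^W")], id.vars = "row")[, .(row, weight = value)],
  melt(dat[, .SD, .SDcols = patterns("row|^N")], id.vars = "row")[, .(name = value)]
)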
# rows with two or more different names with "Y"
dat[weight=="Y", uniqueN(name), by=row][V1>=2, row]
Output
[1] 1 3
# rows with two same names, both with weight "Y"
dat[weight=="Y", .N, by=.(name,row)][N>=2, row]
Output
[1] 2 3
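If the goal is the single No/Weight table from the question, the two checks can be combined per row. This is only a sketch under my reading of the question: rows with two or more distinct Y names get Y and the number of distinct names, and every other row gets N and the number of Y entries. The names `long`, `n_names`, and `n_yes` are my own, and the block rebuilds `dat` from scratch so it can be run on its own.

```r
library(data.table)

dat <- as.data.table(read.table(text = "Name1 Weight1 Name2 Weight2 Name3 Weight3 Name4 Weight4 Name5 Weight5 Name6 Weight6 Name7 Weight7 Name8 Weight8 Name9 Weight9 Name10 Weight10
Rose Y Moli N Ali N Mo Y Ko N Rose N Ali N Moli N Rose N Ko Y
Ali Y Bob N Bob N Magg N Alo N Sarah N Ali Y Rose N Bob N Sarah N
Rose Y Moli Y Ali N Mo N Ko N Rose N Ali Y Moli N Rose Y Ko Y
", header = TRUE, stringsAsFactors = FALSE))
dat[, row := .I]

# Long format: one line per (row, position) with its name and weight
long <- melt(dat, id.vars = "row",
             measure.vars = patterns("^Name", "^Weight"),
             value.name = c("name", "weight"))

# Per row: number of distinct names saying Y, and number of Y entries
counts <- long[weight == "Y",
               .(n_names = uniqueN(name), n_yes = .N),
               keyby = row]

# Assumed rule: >= 2 distinct names gives Y, otherwise N.
# Rows without any Y are dropped by the filter above.
res <- counts[, .(row,
                  No = ifelse(n_names >= 2, n_names, n_yes),
                  Weight = ifelse(n_names >= 2, "Y", "N"))]
res
```

Row 3 is the case where the two counts differ: it has five Y entries but only four distinct names, and the expected outcome in the question uses the distinct count.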