I want to take each feature from the Reference column, search for it across all the other columns of the dataset, and sum up the values that correspond to it. I have a dataset like this:
Reference  X1         X2    X3         X4
Feature A  Feature A  0.99  Feature A  0.99
Feature B  Feature B  0.77  Feature C  0.89
Feature C  Feature C  0.89  Feature D  0.65
Feature D  Feature D  0.65  Feature B  0.77
Feature E
I want to make a new data frame with each feature name and its summed score. For example, the new data frame should look like this:
Feature column  Score
Feature A       1.98
Feature B       1.54
CodePudding user response:
Perhaps this helps
stack(tapply(as.matrix(df1[c(3, 5)]), as.matrix(df1[c(2, 4)]), FUN = sum))[2:1]
        ind values
1 Feature A   1.98
2 Feature B   1.54
3 Feature C   1.78
4 Feature D   1.30
data
df1 <- structure(list(Reference = c("Feature A", "Feature B", "Feature C",
"Feature D"), X1 = c("Feature A", "Feature B", "Feature C", "Feature D"
), X2 = c(0.99, 0.77, 0.89, 0.65), X3 = c("Feature A", "Feature C",
"Feature D", "Feature B"), X4 = c(0.99, 0.89, 0.65, 0.77)),
class = "data.frame", row.names = c(NA,
-4L))
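To make the one-liner above easier to follow, here is a sketch that unpacks it into named steps. It assumes the same layout as `df1` (feature names in columns 2 and 4, their scores in columns 3 and 5); the intermediate names `scores`, `labels`, `sums`, and `out` are only illustrative, and the last step builds the result with `data.frame()` rather than `stack()`.

```r
# Same data as df1 above, rebuilt so the example is self-contained.
df1 <- data.frame(
  Reference = c("Feature A", "Feature B", "Feature C", "Feature D"),
  X1 = c("Feature A", "Feature B", "Feature C", "Feature D"),
  X2 = c(0.99, 0.77, 0.89, 0.65),
  X3 = c("Feature A", "Feature C", "Feature D", "Feature B"),
  X4 = c(0.99, 0.89, 0.65, 0.77),
  stringsAsFactors = FALSE
)

# Numeric matrix of scores (X2, X4) and a same-shaped
# character matrix of the feature names next to them (X1, X3).
scores <- as.matrix(df1[c(3, 5)])
labels <- as.matrix(df1[c(2, 4)])

# tapply() groups the scores by the label in the matching cell
# and sums each group.
sums <- tapply(scores, labels, FUN = sum)

# Turn the named result into a two-column data frame.
out <- data.frame(Feature = names(sums), Score = as.vector(sums))
out
```

Because the two matrices have the same shape, each score is paired with the name that sits in the same position, which is what lets `tapply()` do the grouping in one call.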
CodePudding user response:
This is a little complex because of the format of your data, but you could do:
# df is the data frame from the question
result <- sapply(df$Reference, function(i) {
  sum(as.numeric(df[do.call(rbind,
    lapply(seq_along(df[-1]), function(j) {
      # column j + 1 is checked for the feature name; its score is in column j + 2
      if(any(df[[j + 1]] == i)) {
        rows <- which(df[[j + 1]] == i)
        cbind(rows, rep(j + 2, length(rows)))
      } else NULL}))]))
})
data.frame(Feature = names(result), Score = c(result), row.names = NULL)
#>     Feature Score
#> 1 Feature A  1.98
#> 2 Feature B  1.54
#> 3 Feature C  1.78
#> 4 Feature D  1.30
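Since the difficulty comes from the wide layout, another option might be to reshape the name/score pairs into a long format first and then sum per feature. The sketch below is an alternative to the code above, not a rewrite of it; it assumes the pairs are exactly X1/X2 and X3/X4, and the names `long` and `totals` are illustrative.

```r
# Example data matching the question (without the empty Feature E row).
df1 <- data.frame(
  Reference = c("Feature A", "Feature B", "Feature C", "Feature D"),
  X1 = c("Feature A", "Feature B", "Feature C", "Feature D"),
  X2 = c(0.99, 0.77, 0.89, 0.65),
  X3 = c("Feature A", "Feature C", "Feature D", "Feature B"),
  X4 = c(0.99, 0.89, 0.65, 0.77),
  stringsAsFactors = FALSE
)

# Stack the (name, score) column pairs on top of each other:
# X1/X2 first, then X3/X4.
long <- data.frame(
  Feature = c(df1$X1, df1$X3),
  Score   = c(df1$X2, df1$X4),
  stringsAsFactors = FALSE
)

# Sum the scores per feature, then keep only the features
# that appear in the Reference column.
totals <- aggregate(Score ~ Feature, data = long, FUN = sum)
totals <- totals[totals$Feature %in% df1$Reference, ]
totals
```

With more than two pairs, the same idea extends by adding the extra name and score columns to the two `c()` calls.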