I am working to extend the following example based on the question Using a dataframe to input values to a function and aggregating the outputs
I've added a third team (Mary Bill), but am now having some problems because the "team3" contains a player that doesn't have any data, so the output is just "Mary"
The other issue I have is that the output of "team1" moves from "Mary Frank" in the input in "teams" to "Frank Mary" in the output (that is, the order is around the other way).
This makes it difficult to reconcile the output back to the teams.
I can make team1 and team2 work if I use regex and the start and finish of the teams$V2, but that doesn't help with missing data (ie. Bill).
Modified input:
input = structure(list(V1 = c("Team_2022", "Team_2022", "Team_2022"), V2 = c("Frank", "Mary", "John"), V3 = c("Sydney", "Sydney", "Sydney"), V4 = c(55, 76, 14)), row.names = c(NA, -3L), class = c("data.table", "data.frame"))
teams = structure(list(V1 = c("team1", "team2", "team3"), V2 = c("Mary Frank","Mary John", "Mary Bill")), class = "data.frame", row.names = c(NA, -3L))
result = function(data, combination)
{
purrr::map_df(lapply(combination,function(x){paste(x,collapse = '|')}), function(x){
df = data[grepl(x,data$V2),] %>% group_by(V3) %>% summarize(V1= paste(V1[1]), V2= paste(V2,collapse = ' '), V4= sum(V4))
data = df[,c(2,3,1,4)]
return(data)
}
)
}
apply(teams, 1, function(x) {
y <- list(str_split(x[2], ' \\ ', simplify = T))
result(input, y)
}) %>% do.call("rbind", .)
Output
structure(list(V1 = c("Team_2022", "Team_2022", "Team_2022"),
V2 = c("Frank Mary", "Mary John", "Mary"), V3 = c("Sydney",
"Sydney", "Sydney"), V4 = c(131, 90, 76)), row.names = c(NA,
-3L), class = c("tbl_df", "tbl", "data.frame"))
Can anyone suggest how I can get to my desired output - either keeps the team1, team2 in the same order so I can join them back afterwards; or joins the team name as part of the input?
output1 = structure(list(V1 = c("Team_2022", "Team_2022", "Team_2022"),
V2 = c("Mary Frank","Mary John", "Mary Bill"), V3 = c("Sydney",
"Sydney", "Sydney"), V4 = c(131, 90, 76)), row.names = c(NA,
-3L), class = c("tbl_df", "tbl", "data.frame"))
output2 = structure(list(V1 = c("team1", "team2", "team3"), V2 = c("Team_2022", "Team_2022", "Team_2022"),
V3 = c("Frank Mary", "John Mary","Mary Bill"), V4 = c("Sydney",
"Sydney", "Sydney"), V5 = c(131, 90, 76)), row.names = c(NA,
-3L), class = c("tbl_df", "tbl", "data.frame"))
Or even (because I could at least tie the team3 back to missing values):
output3 = structure(list(V1 = c("team1", "team2", "team3"), V2 = c("Team_2022", "Team_2022", "Team_2022"),
V3 = c("Frank Mary", "John Mary","Mary"), V4 = c("Sydney", "Sydney", "Sydney"), V5 = c(69, 90, 14)), row.names = c(NA,
-3L), class = c("tbl_df", "tbl", "data.frame"))
CodePudding user response:
library(dplyr); library(tidyr)
teams %>%
separate_rows(V2) %>%
left_join(input, by = c("V2" = "V2")) %>%
replace_na(list(V4 = 0)) %>%
group_by(V1.x) %>% fill(V1.y, V3) %>%
summarize(V2 = paste(V2, collapse = " "),
V3 = first(V3),
V4 = sum(V4))
Result
# A tibble: 3 × 4
V1.x V2 V3 V4
<chr> <chr> <chr> <dbl>
1 team1 Mary Frank Sydney 131
2 team2 Mary John Sydney 90
3 team3 Mary Bill Sydney 76