I have this data that have these columns: goal for, goal against, and scorers. In one observation, cell in scorers column might have multiple scorers such as: "Tammy Abraham (34), Zaniolo (90)". How should I count each scorers goal at every match for the whole dataframe? To illustrate:
#create reprex to ask about goalscorer tally
gf <- c(5,4,0,0,1)
ga <- c(3,3,0,2,0)
scorer <- c("a,b,c,d,e","a,b,c,d", NA, NA,"e")
dat <- data.frame(gf,ga,scorer)
This is what the sample looks like:
gf ga scorer
1 5 3 a,b,c,d,e
2 4 3 a,b,c,d
3 0 0 <NA>
4 0 2 <NA>
5 1 0 e
Would anyone help? Also, is this reprex, enough?
I only put the goal for and goal against here in the example, but in the actual data, there are many more observations like formation, weather, competition etc. Is it possible to say that scorer a tends to score in a low scoring match in wet condition on 4-4-2 formation? Something like that. But the problem is, the scorer a is found together with other scorers in the scorer column.
CodePudding user response:
A tidyverse approach. You may want to strsplit
the scorer
string into a list and then unnest
the list:
library(dplyr)
library(tidyr)
dat |>
mutate(scorer = strsplit(scorer, ",")) |>
unnest(cols = c(scorer))
# A tibble: 12 × 3
gf ga scorer
<dbl> <dbl> <chr>
1 5 3 a
2 5 3 b
3 5 3 c
4 5 3 d
5 5 3 e
6 4 3 a
7 4 3 b
8 4 3 c
9 4 3 d
10 0 0 NA
11 0 2 NA
12 1 0 e
CodePudding user response:
Counting the number of goals per scorer and game:
library(dplyr)
library(tidyr)
dat %>%
# make match ID:
mutate(match_ID = row_number()) %>%
# split `scorer`:
separate_rows(scorer) %>%
# remove rows where `scorer` is `NA`:
filter(!is.na(scorer)) %>%
# group by:
group_by(match_ID,scorer) %>%
# count number of goals per match:
summarise(N_goals = n())
# A tibble: 10 × 3
# Groups: match_ID [3]
match_ID scorer N_goals
<int> <chr> <int>
1 1 a 1
2 1 b 1
3 1 c 1
4 1 d 1
5 1 e 1
6 2 a 1
7 2 b 1
8 2 c 1
9 2 d 1
10 5 e 1