I have the following table:
MatchID | Player | Event |
---|---|---|
1096 | Marcel | NA |
1096 | Juan | G70'IO85' |
1090 | Andre | P43'G87' |
I want to create two new columns based on the column Event: Event_type and Event_time. The event type is the letter code and the event time is the number that follows it. For example, in G70', the event type is G and the time is 70. An event type can have more than one letter (IO in IO85'). The resulting table should show each event on a separate row; P43'G87', for example, contains two events. If Event is empty (NA), the new columns should be NA as well.
The resulting table should look like this:
MatchID | Player | Event_type | Event_time |
---|---|---|---|
1096 | Marcel | NA | NA |
1096 | Juan | G | 70 |
1096 | Juan | IO | 85 |
1090 | Andre | P | 43 |
1090 | Andre | G | 87 |
My approach was to use the following functions:

```r
df %>%
  mutate(event_type = str_match_all(Event, "[A-Z]+"),
         event_time = str_match_all(Event, "[0-9]+"))
```
The issue is that this doesn't give each event its own row. When an Event contains two events, such as P43'G87', both end up in the same row ("P,G"). How can I split them into one row per event, as shown in the expected table?
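For reference, the `str_*_all()` attempt above is close. Those functions return a list column (one vector of matches per row), and `tidyr::unnest()` can expand a list column into one row per element. The sketch below assumes dplyr, tidyr and stringr are installed, and it swaps `str_match_all()` for `str_extract_all()`, which returns plain character vectors instead of matrices. It is an illustration of the unnesting idea, not the method used in the answer.

```r
library(dplyr)
library(tidyr)
library(stringr)

df <- data.frame(MatchID = c(1096L, 1096L, 1090L),
                 Player = c("Marcel", "Juan", "Andre"),
                 Event = c(NA, "G70'IO85'", "P43'G87'"),
                 stringsAsFactors = FALSE)

res <- df %>%
  # each cell becomes a character vector of matches, e.g. c("P", "G")
  mutate(Event_type = str_extract_all(Event, "[A-Z]+"),
         Event_time = str_extract_all(Event, "[0-9]+")) %>%
  # expand both list columns in parallel (one row per match);
  # keep_empty = TRUE keeps a row even if an Event has no matches
  unnest(c(Event_type, Event_time), keep_empty = TRUE) %>%
  mutate(Event_time = as.integer(Event_time)) %>%
  select(-Event)

print(res)
```

Both list columns hold the same number of matches in each row, so `unnest()` can expand them side by side.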
Answer:
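We split the 'Event' column at the `'` with `separate_rows()` to expand the rows, then `filter()` out the blank (`""`) elements, and finally `extract()` the non-digits (`(\\D+)`) and the digits (`(\\d+)`) into two separate columns, 'Event_type' and 'Event_time'. `convert = TRUE` turns Event_time into an integer. Because `nzchar(NA)` is `TRUE`, the row with a missing Event is kept and ends up with NA in both new columns.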
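```r
library(dplyr)
library(tidyr)

df %>%
  # one row per piece between apostrophes, e.g. "G70", "IO85", ""
  separate_rows(Event, sep = "'") %>%
  # drop the empty trailing pieces (NA events are kept)
  filter(nzchar(Event)) %>%
  # letters -> Event_type, digits -> Event_time
  extract(Event, into = c("Event_type", "Event_time"),
          "(\\D+)(\\d+)", convert = TRUE)
```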
Output:

```
# A tibble: 5 × 4
  MatchID Player Event_type Event_time
    <int> <chr>  <chr>           <int>
1    1096 Marcel <NA>               NA
2    1096 Juan   G                  70
3    1096 Juan   IO                 85
4    1090 Andre  P                  43
5    1090 Andre  G                  87
```
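To see what the first two steps of the pipeline do, here is a sketch that stops after `separate_rows()` and after `filter()`. It assumes dplyr and tidyr are installed and rebuilds the same example data. Splitting on `'` leaves an empty string after each trailing apostrophe, which is why the `filter()` step is needed.

```r
library(dplyr)
library(tidyr)

df <- data.frame(MatchID = c(1096L, 1096L, 1090L),
                 Player = c("Marcel", "Juan", "Andre"),
                 Event = c(NA, "G70'IO85'", "P43'G87'"),
                 stringsAsFactors = FALSE)

# After splitting: each trailing "'" leaves an empty piece,
# and the missing Event stays as a single NA row
step1 <- df %>% separate_rows(Event, sep = "'")
print(step1$Event)

# After filtering: the empty pieces are gone; nzchar(NA) is TRUE,
# so the NA row is still there
step2 <- step1 %>% filter(nzchar(Event))
print(step2$Event)
```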
Data:

```r
df <- structure(list(MatchID = c(1096L, 1096L, 1090L),
                     Player = c("Marcel", "Juan", "Andre"),
                     Event = c(NA, "G70'IO85'", "P43'G87'")),
                class = "data.frame", row.names = c(NA, -3L))
```
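If the tidyverse is not available, the same reshaping can be sketched in base R. This is a swapped-in alternative, not the answer's method. It assumes every event is upper-case letters, then digits, then an apostrophe, and the helper name `parse_events()` is hypothetical.

```r
df <- structure(list(MatchID = c(1096L, 1096L, 1090L),
                     Player = c("Marcel", "Juan", "Andre"),
                     Event = c(NA, "G70'IO85'", "P43'G87'")),
                class = "data.frame", row.names = c(NA, -3L))

# Hypothetical helper: turn one Event string into a data frame with one row
# per event. Assumes each event is upper-case letters, digits, then "'".
parse_events <- function(x) {
  if (is.na(x)) {
    # a missing Event becomes a single all-NA row, as in the expected table
    return(data.frame(Event_type = NA_character_, Event_time = NA_integer_,
                      stringsAsFactors = FALSE))
  }
  # all matches in the string, e.g. c("G70'", "IO85'")
  tokens <- regmatches(x, gregexpr("[A-Z]+[0-9]+'", x))[[1]]
  data.frame(Event_type = sub("^([A-Z]+)[0-9]+'$", "\\1", tokens),
             Event_time = as.integer(sub("^[A-Z]+([0-9]+)'$", "\\1", tokens)),
             stringsAsFactors = FALSE)
}

# Repeat MatchID/Player once per parsed event, then stack the pieces
pieces <- lapply(seq_len(nrow(df)), function(i) {
  ev <- parse_events(df$Event[i])
  cbind(df[rep(i, nrow(ev)), c("MatchID", "Player")], ev)
})
res <- do.call(rbind, pieces)
rownames(res) <- NULL
print(res)
```

The tidyr version is shorter; the base-R sketch only trades that brevity for having no package dependencies.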