Create new columns and a row per each string pattern instance in dplyr-CodePudding

I have the following table:

MatchID	Player	Event
1096	Marcel	NA
1096	Juan	G70'IO85'
1090	Andre	P43'G87'

I want to create two new columns based on the column Event. The new columns are: Event_type and Event_time, where the event type is the letter and event time is the numeric value after the letter. For example, in G70', the event type is G, and the time is 70. Note that the resulting table should show each event in a separate line. P43'G87' has two events for example. If Event has nothing, then show the new columns as NAs.

The resulting table should look like this:

MatchID	Player	Event_type	Event_time
1096	Marcel	NA	NA
1096	Juan	G	70
1096	Juan	IO	85
1090	Andre	P	43
1090	Andre	G	87

My approach to the problem is by using the following functions:

df%mutate(event_type=str_match_all(Event,"[A-Z] "),
          event_time=str_match_all(Event,"[0-9] "))

The issue is that this doesn't give each event in a separate row. If it finds two events like in here: P43'G87', it just puts in the same row: "P,G". How can I separate them into a single line per event like it's shown in the expected table?

CodePudding user response：

We split the 'Event' column at the ' to expand the rows, then filter out the blank ("") elements, and capture the non digits ((\\D )) and the digits ((\\d )) separately from the 'Event' to create two columns 'Event_type' and 'Event_time'

library(dplyr)
library(tidyr)
df %>% 
  separate_rows(Event, sep = "'") %>% 
  filter(nzchar(Event)) %>%
   extract(Event, into = c("Event_type", "Event_time"), 
     "(\\D )(\\d )", convert = TRUE)

-output

# A tibble: 5 × 4
  MatchID Player Event_type Event_time
    <int> <chr>  <chr>           <int>
1    1096 Marcel <NA>               NA
2    1096 Juan   G                  70
3    1096 Juan   IO                 85
4    1090 Andre  P                  43
5    1090 Andre  G                  87

data

df <- structure(list(MatchID = c(1096L, 1096L, 1090L), Player = c("Marcel", 
"Juan", "Andre"), Event = c(NA, "G70'IO85'", "P43'G87'")), 
class = "data.frame", row.names = c(NA, 
-3L))