Home > OS >  Create new columns and a row per each string pattern instance in dplyr
Create new columns and a row per each string pattern instance in dplyr

Time:11-29

I have the following table:

MatchID Player Event
1096 Marcel NA
1096 Juan G70'IO85'
1090 Andre P43'G87'

I want to create two new columns based on the column Event. The new columns are: Event_type and Event_time, where the event type is the letter and event time is the numeric value after the letter. For example, in G70', the event type is G, and the time is 70. Note that the resulting table should show each event in a separate line. P43'G87' has two events for example. If Event has nothing, then show the new columns as NAs.

The resulting table should look like this:

MatchID Player Event_type Event_time
1096 Marcel NA NA
1096 Juan G 70
1096 Juan IO 85
1090 Andre P 43
1090 Andre G 87

My approach to the problem is by using the following functions:

df%mutate(event_type=str_match_all(Event,"[A-Z] "),
          event_time=str_match_all(Event,"[0-9] "))

The issue is that this doesn't give each event in a separate row. If it finds two events like in here: P43'G87', it just puts in the same row: "P,G". How can I separate them into a single line per event like it's shown in the expected table?

CodePudding user response:

We split the 'Event' column at the ' to expand the rows, then filter out the blank ("") elements, and capture the non digits ((\\D )) and the digits ((\\d )) separately from the 'Event' to create two columns 'Event_type' and 'Event_time'

library(dplyr)
library(tidyr)
df %>% 
  separate_rows(Event, sep = "'") %>% 
  filter(nzchar(Event)) %>%
   extract(Event, into = c("Event_type", "Event_time"), 
     "(\\D )(\\d )", convert = TRUE)

-output

# A tibble: 5 × 4
  MatchID Player Event_type Event_time
    <int> <chr>  <chr>           <int>
1    1096 Marcel <NA>               NA
2    1096 Juan   G                  70
3    1096 Juan   IO                 85
4    1090 Andre  P                  43
5    1090 Andre  G                  87

data

df <- structure(list(MatchID = c(1096L, 1096L, 1090L), Player = c("Marcel", 
"Juan", "Andre"), Event = c(NA, "G70'IO85'", "P43'G87'")), 
class = "data.frame", row.names = c(NA, 
-3L))
  • Related