I have the following dataframe Teams
:
Team Year
012 Hortney 2017
012 Hortney 2018
013 James 2017
013 James 2018
014 Ilωus hero 2017
014 Ilωus hero 2018
015 Hortna 2017
015 Hortna 2018
016 Exclus race 2017
#with 25 000 more rows
And would like to transform it into the below df:
code name Year
012 Hortney 2017
012 Hortney 2018
013 James 2017
013 James 2018
014 Ilωus hero 2017
014 Ilωus hero 2018
015 Hortna 2017
015 Hortna 2018
016 Exclus race 2017
#with 25 000 more rows
I've tried this code separate(Team, c("code", "name"))
but it makes the names of data strange (especially the Greek letter (ω), where everything after it disappears, and I must have ω intact for later coding. The last part of the name disappears as well in Exclus. Like this: (within brackets what I'm looking for)
code name Year
012 Hortney 2017
012 Hortney 2018
013 James 2018
014 Il 2017 (Ilωus hero)
014 Il 2018 (Ilωus hero)
015 Hortna 2017
015 Hortna 2018
016 Exclus 2017 (Exclus race)
#With 25 00 more rows
Anyone have any ideas?
CodePudding user response:
Try this
library(dplyr)
Teams |>
mutate(code = gsub("\\D" , "" , Team) ,
name = trimws(gsub("\\d" , "" , Team))) |>
select(code , name , Year)
- output
code name Year
1 012 Hortney 2017
2 012 Hortney 2018
3 013 James 2017
4 013 James 2018
5 014 Ilωus hero 2017
6 014 Ilωus hero 2018
7 015 Hortna 2017
8 015 Hortna 2018
9 016 Exclus race 2017
CodePudding user response:
With some regex and the stringr
package:
require(stringr)
Teams = data.frame(Team = c(
"012 Hortney", " 013 James ", " 018 Alain Philippe have a very long name"),
Year = c( 2017, 2018, 2017) ) # The data for reproducible example
dplyr::mutate(Teams,
Team = str_squish(Team), # Supress the unwanted space in variable Team
code = str_extract(Team, "[0-9]*"),
# Extract the first successive digits in the variable Team
name = str_extract(Team, "[:alpha:] [ ?[:alpha:]]*") ) %>%
# Extract the first successive letters of the variable Team, possibly with a space between the letters.
dplyr::select(code, name, Year)
Output
code name Year
1 012 Hortney 2017
2 013 James 2018
3 018 Alain Philippe have a very long name 2017
Please note that the variable 'code' will be exclusively composed of the first digits in the variable'Team', with no "." or "-" between the numbers. For example a number like '01.18' before the team name will result in a 'code' variable with the value '01': the regex will be blocked by the character ('.') in the team number.