I have a data frame of team names and abbreviations. Currently, the team name column also contains the abbreviation. I'm trying to remove the abbreviation from the team name column to avoid repeating information.
Here's my current data frame:
Team Abbr. | Team Name |
---|---|
ARK | ArkansasARK |
BSU | Boise StateBSU |
DART | DartmouthDART |
My desired output is this:
Team Abbr. | Team Name |
---|---|
ARK | Arkansas |
BSU | Boise State |
DART | Dartmouth |
Thanks!
CodePudding user response:
Here is a base R solution. Loop over the rows with `apply` and, in the 2nd column, replace the 1st column's value with the empty string.
df1 <- read.table(text = "
'Team Abbr.' 'Team Name'
ARK ArkansasARK
BSU 'Boise StateBSU'
DART DartmouthDART
", header = TRUE)
df1
#> Team.Abbr. Team.Name
#> 1 ARK ArkansasARK
#> 2 BSU Boise StateBSU
#> 3 DART DartmouthDART
df1$Team.Name <- apply(df1, 1, \(x) sub(x[1], "", x[2]))
df1
#> Team.Abbr. Team.Name
#> 1 ARK Arkansas
#> 2 BSU Boise State
#> 3 DART Dartmouth
Created on 2022-02-16 by the reprex package (v2.0.1)
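One caveat with the call above: `sub()` treats the abbreviation as a regular expression, so an abbreviation containing a metacharacter such as `.` or `+` could match more than intended. A minimal sketch of a variant, assuming the same column names as above, that matches literally with `fixed = TRUE` and walks the two columns with `mapply` instead of `apply` (the data frame is rebuilt here only so the snippet runs on its own):

```r
# Same data as above, rebuilt so the snippet is self-contained
df1 <- data.frame(
  Team.Abbr. = c("ARK", "BSU", "DART"),
  Team.Name = c("ArkansasARK", "Boise StateBSU", "DartmouthDART"),
  stringsAsFactors = FALSE
)

# Walk the two columns in parallel; fixed = TRUE makes sub() treat
# the abbreviation as a literal string rather than a regex
df1$Team.Name <- mapply(
  function(abbr, name) sub(abbr, "", name, fixed = TRUE),
  df1$Team.Abbr., df1$Team.Name,
  USE.NAMES = FALSE
)
df1
```

For the sample data this should give the same three names as the `apply` version.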
CodePudding user response:
We may use `str_remove`, which is vectorized over both pattern and string.
library(dplyr)
library(stringr)
df1 %>%
  mutate(TeamName = str_remove(TeamName, fixed(TeamAbbr)))
TeamAbbr TeamName
1 ARK Arkansas
2 BSU Boise State
3 DART Dartmouth
If we want to do this with a column from a different dataset and the lengths differ, one option is to paste (`str_c`) the elements together with `|` (OR) as the pattern.
df2 %>%
  mutate(TeamName = str_remove(TeamName, str_c(TeamAbbr, collapse = "|")))
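With an alternation pattern, the first match anywhere in the string is the one removed, which may not be the trailing abbreviation. A rough sketch in base R of anchoring the pattern to the end of the string; the vectors `abbrs` and `team_names` are hypothetical, the third name is made up to show the difference, and the abbreviations are assumed to contain no regex metacharacters:

```r
# Hypothetical inputs: abbreviations in one vector, names in another
abbrs <- c("ARK", "BSU", "DART")
team_names <- c("ArkansasARK", "Boise StateBSU", "DARTfordDART")

# Build "(ARK|BSU|DART)$": the group keeps the alternation together,
# and $ only allows a match at the end of the string
pattern <- paste0("(", paste(abbrs, collapse = "|"), ")$")

# Unanchored, the first match wins, so the leading "DART" is removed
sub(paste(abbrs, collapse = "|"), "", team_names)
#> [1] "Arkansas"    "Boise State" "fordDART"

# Anchored, only the trailing abbreviation is removed
cleaned <- sub(pattern, "", team_names)
cleaned
#> [1] "Arkansas"    "Boise State" "DARTford"
```

The same anchored pattern string should also work as the pattern argument of `str_remove`.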
data
df1 <- structure(list(TeamAbbr = c("ARK", "BSU", "DART"),
TeamName = c("ArkansasARK",
"Boise StateBSU", "DartmouthDART")), class = "data.frame", row.names = c(NA,
-3L))