Remove characters from a character column in R based on values in another column

I have a data frame of team names and abbreviations. Currently, the team name column also contains the abbreviation. I'm trying to remove the abbreviation from the team name column to avoid repeating information.

Here's my current data frame:

Team Abbr.	Team Name
ARK	ArkansasARK
BSU	Boise StateBSU
DART	DartmouthDART

My desired output is this:

Team Abbr.	Team Name
ARK	Arkansas
BSU	Boise State
DART	Dartmouth

Thanks!

CodePudding user response：

Here is a base R solution. Loop over the rows with apply and, in each row, replace the 1st column's value where it occurs in the 2nd column with the empty string.

df1 <- read.table(text = "
'Team Abbr.'    'Team Name'
ARK     ArkansasARK
BSU     'Boise StateBSU'
DART    DartmouthDART
", header = TRUE)
df1
#>   Team.Abbr.      Team.Name
#> 1        ARK    ArkansasARK
#> 2        BSU Boise StateBSU
#> 3       DART  DartmouthDART

df1$Team.Name <- apply(df1, 1, \(x) sub(x[1], "", x[2]))
df1
#>   Team.Abbr.   Team.Name
#> 1        ARK    Arkansas
#> 2        BSU Boise State
#> 3       DART   Dartmouth

Created on 2022-02-16 by the reprex package (v2.0.1)
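As a possible variation on the row loop above, the same result can be sketched with vectorized base R functions. This version is not part of the original answer: it assumes the abbreviation always sits at the very end of the team name, and it strips it by position with endsWith and substr instead of pattern matching. The helper name has_suffix is only illustrative.

```r
# Same data as in the answer above.
df1 <- data.frame(
  Team.Abbr. = c("ARK", "BSU", "DART"),
  Team.Name  = c("ArkansasARK", "Boise StateBSU", "DartmouthDART"),
  stringsAsFactors = FALSE
)

# TRUE where the team name ends with that row's abbreviation.
has_suffix <- endsWith(df1$Team.Name, df1$Team.Abbr.)

# Drop the trailing abbreviation only where it is present;
# other names are left unchanged.
df1$Team.Name <- ifelse(
  has_suffix,
  substr(df1$Team.Name, 1, nchar(df1$Team.Name) - nchar(df1$Team.Abbr.)),
  df1$Team.Name
)
df1
```

Because nothing is interpreted as a regular expression here, abbreviations that contain characters such as . or ( need no escaping.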

CodePudding user response：

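We can use str_remove, which is vectorized over both the string and the pattern, so each team name is paired with the abbreviation from the same row. Wrapping the pattern in fixed() makes it match literally rather than as a regular expression.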

library(dplyr)
library(stringr)
df1 %>% 
  mutate(TeamName = str_remove(TeamName, fixed(TeamAbbr)))
#>   TeamAbbr    TeamName
#> 1      ARK    Arkansas
#> 2      BSU Boise State
#> 3     DART   Dartmouth
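To illustrate why fixed() is used in the call above, here is a small sketch with made-up values (the team name and abbreviation below are hypothetical, not from the question). It compares the default regex interpretation of the pattern with the literal one.

```r
library(stringr)

# Hypothetical values: the abbreviation contains parentheses, which are
# grouping operators in a regular expression.
team_name <- "Miami (OH)M(OH)"
team_abbr <- "M(OH)"

# As a regex, "M(OH)" means the text "MOH", which does not occur in the
# name, so nothing is removed.
as_regex <- str_remove(team_name, team_abbr)

# fixed() matches the abbreviation character for character.
as_literal <- str_remove(team_name, fixed(team_abbr))

as_regex
as_literal
```

With the abbreviations in the question (plain letters only) both forms behave the same; fixed() mainly guards against abbreviations that happen to contain regex metacharacters.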

If the abbreviations come from a column in a different dataset and the lengths differ, one option is to paste (str_c) the elements together with | (OR) and use the result as a single pattern

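# TeamAbbr must resolve to the abbreviations from the other dataset
# (for example df1$TeamAbbr); df2 is assumed to hold the TeamName column.
df2 %>% 
  mutate(TeamName = str_remove(TeamName, str_c(TeamAbbr, collapse = "|")))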
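Since df2 is not defined in the answer, here is a minimal self-contained sketch of the same idea. The contents of df2 are hypothetical, and the abbreviations are assumed to live in df1$TeamAbbr; the intermediate variable pattern is only there to show what the combined regex looks like.

```r
library(stringr)

# Lookup table of abbreviations (same values as df1 in this answer).
df1 <- data.frame(TeamAbbr = c("ARK", "BSU", "DART"),
                  stringsAsFactors = FALSE)

# Hypothetical second dataset with a different number of rows.
df2 <- data.frame(
  TeamName = c("DartmouthDART", "ArkansasARK", "Boise StateBSU", "ArkansasARK"),
  stringsAsFactors = FALSE
)

# One regex that matches any of the abbreviations: "ARK|BSU|DART".
pattern <- str_c(df1$TeamAbbr, collapse = "|")

# Remove the first match of any abbreviation from each team name.
df2$TeamName <- str_remove(df2$TeamName, pattern)
df2
```

Note that the combined pattern is a regular expression and removes the first match anywhere in the name. If the abbreviation is always a suffix, wrapping the pattern as str_c("(", pattern, ")$") would restrict the match to the end of the string.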

data

df1 <- structure(list(
  TeamAbbr = c("ARK", "BSU", "DART"),
  TeamName = c("ArkansasARK", "Boise StateBSU", "DartmouthDART")
), class = "data.frame", row.names = c(NA, -3L))