Remove characters from a character column in R based on values in another column

Time:02-17

I have a data frame of team names and abbreviations. Currently, the team name column also contains the abbreviation. I'm trying to remove the abbreviation from the team name column to avoid repeating information.

Here's my current data frame:

Team Abbr.   Team Name
ARK          ArkansasARK
BSU          Boise StateBSU
DART         DartmouthDART

My desired output is this:

Team Abbr.   Team Name
ARK          Arkansas
BSU          Boise State
DART         Dartmouth

Thanks!

CodePudding user response:

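Here is a base R solution. Loop over the rows with apply and, in each row, use sub to replace the first column's value where it occurs in the second column with the empty string. Note that sub treats its pattern as a regular expression and removes only the first match.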

df1 <- read.table(text = "
'Team Abbr.'    'Team Name'
ARK     ArkansasARK
BSU     'Boise StateBSU'
DART    DartmouthDART
", header = TRUE)
df1
#>   Team.Abbr.      Team.Name
#> 1        ARK    ArkansasARK
#> 2        BSU Boise StateBSU
#> 3       DART  DartmouthDART

df1$Team.Name <- apply(df1, 1, \(x) sub(x[1], "", x[2]))
df1
#>   Team.Abbr.   Team.Name
#> 1        ARK    Arkansas
#> 2        BSU Boise State
#> 3       DART   Dartmouth

Created on 2022-02-16 by the reprex package (v2.0.1)
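
Since sub removes the first match wherever it occurs, a variant that strips the abbreviation only when it sits at the end of the name may be safer. The following is a minimal sketch, not part of the original answer: strip_suffix is a hypothetical helper name, and it assumes both inputs are character vectors of the same length. It relies on endsWith, which is vectorized over both arguments in base R (3.3.0 or later).

```r
# Hypothetical helper: drop `abbr` from the end of `name`, element by element.
# Names that do not end with their abbreviation are returned unchanged.
strip_suffix <- function(name, abbr) {
  has_suffix <- endsWith(name, abbr)
  ifelse(has_suffix,
         substr(name, 1, nchar(name) - nchar(abbr)),
         name)
}

team_abbr <- c("ARK", "BSU", "DART")
team_name <- c("ArkansasARK", "Boise StateBSU", "DartmouthDART")

cleaned <- strip_suffix(team_name, team_abbr)
cleaned
```

Because the comparison is a literal suffix check rather than a regular expression, abbreviations containing characters such as "." or "(" would need no escaping.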

CodePudding user response:

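We can use str_remove, which is vectorized over both the string and the pattern. Wrapping the pattern in fixed() matches it literally instead of as a regular expression: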

library(dplyr)
library(stringr)
df1 %>%
  mutate(TeamName = str_remove(TeamName, fixed(TeamAbbr)))
#>   TeamAbbr    TeamName
#> 1      ARK    Arkansas
#> 2      BSU Boise State
#> 3     DART   Dartmouth

If the abbreviations come from a column in a different dataset and the lengths differ, one option is to paste (str_c) the elements together with | (OR) to form a single pattern:

df2 %>%
  mutate(TeamName = str_remove(TeamName, str_c(TeamAbbr, collapse = "|")))
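
To make the alternation idea concrete, here is a small base R sketch. It is illustrative only: df2 and its contents are made up for the example, the abbreviations are assumed to contain no regex metacharacters, and the "$" anchor (not in the answer above) restricts the match to the end of the name.

```r
# Abbreviations taken from one data frame.
df1 <- data.frame(TeamAbbr = c("ARK", "BSU", "DART"))

# Hypothetical second data frame with a different number of rows.
df2 <- data.frame(TeamName = c("DartmouthDART", "ArkansasARK"))

# Build a single pattern: any of the abbreviations, anchored at the end.
# Assumes the abbreviations are plain letters (no regex metacharacters).
pattern <- paste0("(", paste(df1$TeamAbbr, collapse = "|"), ")$")

# Remove whichever abbreviation matches at the end of each name.
df2$TeamName <- sub(pattern, "", df2$TeamName)
df2
```

With this approach any abbreviation in the list can be removed from any name, so it only makes sense when the row-by-row pairing between abbreviation and name is unavailable.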

data

df1 <- structure(list(
  TeamAbbr = c("ARK", "BSU", "DART"),
  TeamName = c("ArkansasARK", "Boise StateBSU", "DartmouthDART")
), class = "data.frame", row.names = c(NA, -3L))