I have a data frame that looks like this:
names | value |
---|---|
John123abc | 1 |
George12894xyz | 2 |
Mary789qwe | 3 |
I want to clean the values in the column "names" so that only the name itself remains, without the trailing digits and characters. The code that follows each name differs from row to row, and I have about 100,000 rows. I think I need something like starts_with("John") = "John".
Ideally I want the new data frame to look like this:
names | value |
---|---|
John | 1 |
George | 2 |
Mary | 3 |
How can I do this in R using dplyr?
library(tidyverse)
names = c("John123abc","George12894xyz","Mary789qwe")
value = c(1,2,3)
dat = tibble(names,value)
CodePudding user response:
Using stringr::str_remove you could do:
library(tidyverse)
names = c("John123abc","George12894xyz","Mary789qwe")
value = c(1,2,3)
dat = tibble(names,value)
dat |>
  mutate(names = str_remove(names, "\\d+.*$"))
#> # A tibble: 3 × 2
#>   names  value
#>   <chr>  <dbl>
#> 1 John       1
#> 2 George     2
#> 3 Mary       3
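If the name is always the leading run of letters, another option is to extract that run instead of removing what follows it. This is only a sketch: the `[A-Za-z]` character class is my assumption (plain ASCII names with no spaces or accents), and a row with no leading letters would come back as NA.

```r
library(dplyr)
library(stringr)
library(tibble)

# Same example data as in the question
dat <- tibble(
  names = c("John123abc", "George12894xyz", "Mary789qwe"),
  value = c(1, 2, 3)
)

# Keep only the letters at the start of each string;
# everything from the first non-letter onward is dropped.
res <- dat |>
  mutate(names = str_extract(names, "^[A-Za-z]+"))
```

Both approaches are vectorised, so 100,000 rows should not be a problem. Which one fits better depends on whether it is easier to describe the part you keep or the part you throw away.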
CodePudding user response:
Using base R
dat$names <- trimws(dat$names, whitespace = "\\d+.*")
-output
> dat
# A tibble: 3 × 2
  names  value
  <chr>  <dbl>
1 John       1
2 George     2
3 Mary       3
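The same idea can also be written with `sub()`, which some may find easier to read than `trimws()` because the pattern is applied exactly as written. A minimal sketch, assuming every name is followed by at least one digit (the data frame is rebuilt here only so the snippet runs on its own):

```r
# Same example data as in the question, as a plain data frame
dat <- data.frame(
  names = c("John123abc", "George12894xyz", "Mary789qwe"),
  value = c(1, 2, 3)
)

# Delete everything from the first digit to the end of the string.
# Strings without any digit are left unchanged.
dat$names <- sub("[0-9].*$", "", dat$names)
```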