I have a data frame that looks like this:
var |
---|
A_CAT |
B_DOG |
A_CAT |
F_HORSE |
GEORGE_DOG |
HeLeN_CAT |
and I want it to look like this:
var | var_new |
---|---|
A_CAT | CAT |
B_DOG | DOG |
A_CAT | CAT |
F_HORSE | HORSE |
GEORGE_DOG | DOG |
HeLeN_CAT | CAT |
How can I do this in R?
library(tidyverse)
var <- c("A_CAT", "B_DOG", "A_CAT", "F_HORSE", "GEORGE_DOG", "HeLeN_CAT")
df <- tibble(var)
df
Answer:
df %>%
mutate(var_new = str_remove(var, '.*_'))  # remove everything up to and including the last "_"
# A tibble: 6 × 2
var var_new
<chr> <chr>
1 A_CAT CAT
2 B_DOG DOG
3 A_CAT CAT
4 F_HORSE HORSE
5 GEORGE_DOG DOG
6 HeLeN_CAT CAT
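Because `.*` is greedy, the pattern `.*_` removes everything up to the *last* underscore. That makes no difference for the question's data, where every value has exactly one underscore, but it would for values with several. A minimal base R sketch, assuming a hypothetical input `x` that is not part of the question, and using `sub()` in place of `str_remove()` (both replace only the first match):

```r
# Hypothetical input: the last element has two underscores
x <- c("A_CAT", "GEORGE_DOG", "A_B_CAT")

# Greedy ".*_" consumes as much as possible, so it stops at the LAST underscore
after_last <- sub(".*_", "", x)

# "^[^_]*_" can only cross non-underscore characters, so it stops at the FIRST underscore
after_first <- sub("^[^_]*_", "", x)

print(after_last)   # "CAT" "DOG" "CAT"
print(after_first)  # "CAT" "DOG" "B_CAT"
```

Pick the pattern that matches what "the part after the underscore" should mean for your real data.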
Answer:
Using base R's sub():
df$var_new <- sub(".*_(.*)$", "\\1", df$var)
df
# A tibble: 6 × 2
var var_new
<chr> <chr>
1 A_CAT CAT
2 B_DOG DOG
3 A_CAT CAT
4 F_HORSE HORSE
5 GEORGE_DOG DOG
6 HeLeN_CAT CAT
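In the `sub()` call, the pattern matches the whole string, and the replacement `"\\1"` puts back only what the parenthesised group `(.*)` captured, i.e. the text after the last underscore. A minimal sketch comparing it with another base R option, `regmatches()` plus `regexpr()`, which extracts the trailing run of non-underscore characters instead (an alternative technique, not part of the answer above):

```r
x <- c("A_CAT", "B_DOG", "A_CAT", "F_HORSE", "GEORGE_DOG", "HeLeN_CAT")

# Replace the whole string with capture group 1 (the text after the last "_")
with_group <- sub(".*_(.*)$", "\\1", x)

# Extract the final block of non-"_" characters.
# Note: regmatches() drops elements with no match, so this assumes every element matches.
trailing <- regmatches(x, regexpr("[^_]+$", x))

print(with_group)  # "CAT" "DOG" "CAT" "HORSE" "DOG" "CAT"
```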
Answer:
We could use str_extract() to extract the desired strings. Prefixing the pattern with (?i) makes the match case-insensitive. This approach works when the set of possible values is known in advance:
library(dplyr)
library(stringr)
df %>%
mutate(var_new = str_extract(var, "(?i)CAT|Dog|Horse"))
var var_new
1 A_CAT CAT
2 B_DOG DOG
3 A_CAT CAT
4 F_HORSE HORSE
5 GEORGE_DOG DOG
6 HeLeN_CAT CAT
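For comparison, a minimal base R sketch of the same case-insensitive extraction, assuming the animal names are known in advance as in the answer above. Here the `ignore.case = TRUE` argument of `regexpr()` plays the role of the inline `(?i)` flag:

```r
x <- c("A_CAT", "B_DOG", "A_CAT", "F_HORSE", "GEORGE_DOG", "HeLeN_CAT")

# Find the first case-insensitive match of any known animal name
pos <- regexpr("cat|dog|horse", x, ignore.case = TRUE)

# regmatches() returns the matched text in its original case
animals <- regmatches(x, pos)

print(animals)  # "CAT" "DOG" "CAT" "HORSE" "DOG" "CAT"
```

One difference to keep in mind: regmatches() silently drops elements with no match, whereas str_extract() returns NA for them, so the stringr version is safer inside mutate().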