How to identify and then extract any special character from a column and save it as a new list in R

Time:02-23

I have a data frame in R with a 'NAMES' column that contains some special characters, some more obvious than others.

Input

NAMES

�OS� M�REN�

P*TE* CAR** **

#LEX ##OPPS
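To make the question reproducible, the input can be rebuilt as a data frame. This is a sketch under one assumption: each "�" shown above is the Unicode replacement character U+FFFD. The column name `NAMES` follows the input listing.

```r
# Assumption: every "�" in the question is the replacement character U+FFFD
df <- data.frame(
  NAMES = c("\uFFFDOS\uFFFD M\uFFFDREN\uFFFD", "P*TE* CAR** **", "#LEX ##OPPS"),
  stringsAsFactors = FALSE  # keep NAMES as character rather than factor (matters before R 4.0)
)
df
```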

Desired output

A list of the special characters that occur in the column, e.g.:

[#, *, �, ...]

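I am currently flagging which rows contain these characters with the following code, but what I actually want is to identify the characters themselves and create a new list of them, i.e. everything that is not a letter or a space (ASCII symbols such as # and * as well as non-ASCII ones such as �).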

Code

library(dplyr)
# TRUE for every row whose name contains a character that is not a letter or a space
df %>% mutate(
  has_non_letters = grepl("[^\\p{L} ]", NAMES, perl = TRUE)
)
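The flagging pattern can also be used to pull the characters out instead of just detecting them. A minimal base R sketch, assuming "�" is U+FFFD: `gregexpr()` locates every match of the same `[^\\p{L} ]` pattern in each name, `regmatches()` extracts the matched text, and `unique()` collapses the result into one vector.

```r
df <- data.frame(
  NAMES = c("\uFFFDOS\uFFFD M\uFFFDREN\uFFFD", "P*TE* CAR** **", "#LEX ##OPPS"),
  stringsAsFactors = FALSE
)

# Same pattern as the flagging code: anything that is not a letter or a space
pattern <- "[^\\p{L} ]"

# gregexpr() finds every match in every name; regmatches() returns the matched text per row
matches <- regmatches(df$NAMES, gregexpr(pattern, df$NAMES, perl = TRUE))

# Flatten the per-row list and keep each special character once, in order of first appearance
specials <- unique(unlist(matches))
specials
```

Because the pattern is Unicode-aware (`\p{L}` with `perl = TRUE`), accented letters are treated as letters and are not reported.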

Answer:

Base R approach:

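# Split every name into single characters and keep each character once
x <- unique(unlist(strsplit(df$NAMES, "")))
# Blank out digits, ASCII letters, slashes, apostrophes and spaces
x <- gsub("[0-9A-Za-z/' ]", "", x)
# Drop the blanked-out entries, leaving only the special characters
x <- x[x != ""]
x
[1] "�" "*" "#"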
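The whitelist `[0-9A-Za-z/' ]` only treats ASCII letters as ordinary, so an accented letter such as "é" would end up in the result. A sketch of a Unicode-aware filter, assuming letters in any script, digits and whitespace should all count as ordinary; the fourth name "José" is hypothetical and only there to show the difference.

```r
# The first three names follow the question (assuming "�" is U+FFFD);
# "José" is a hypothetical extra entry with an accented letter
nm <- c("\uFFFDOS\uFFFD M\uFFFDREN\uFFFD", "P*TE* CAR** **", "#LEX ##OPPS", "Jos\u00e9")

# Split into single characters and keep each one once
chars <- unique(unlist(strsplit(nm, "")))

# Letters in any script (\p{L}), digits (\p{N}) and whitespace (\s) are ordinary
ordinary <- grepl("^[\\p{L}\\p{N}\\s]$", chars, perl = TRUE)

specials <- chars[!ordinary]
specials
```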

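First answer: for this example we could strip the capital letters with stringr. Note that this returns the leftover characters for each row rather than a single list: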

library(dplyr)
library(stringr)

df %>% 
  mutate(x = str_remove_all(NAMES, '[A-Z]')) %>% 
  pull(x)
[1] "�� ��"    "** ** **" "# ##" 
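To turn those per-row leftovers into one list of distinct characters, the spaces can be removed as well and the strings split into single characters. A sketch using only stringr, assuming (as in the example) that the names are upper case and "�" is U+FFFD; `str_split()` with an empty pattern splits into individual characters.

```r
library(stringr)

df <- data.frame(
  NAMES = c("\uFFFDOS\uFFFD M\uFFFDREN\uFFFD", "P*TE* CAR** **", "#LEX ##OPPS"),
  stringsAsFactors = FALSE
)

# As in the answer, drop capital letters, and this time spaces too
leftovers <- str_remove_all(df$NAMES, "[A-Z ]")

# An empty pattern splits each string into single characters; keep each one once
specials <- unique(unlist(str_split(leftovers, "")))
specials
```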