I have a data field consisting of firm names that may contain special characters such as @, / and -. I need to identify whether each value in the field contains any special characters. I have tried the suggestions listed in "R check if string contains special characters", "How do I deal with special characters like \^$.?*|+()[{ in my regex?" and "R, check if special character in string", but they are not giving the correct results.
The last two firm names should give a FALSE in the check field but none of the three approaches is yielding the right result. Please suggest how to correct my code. Thanks.
library(stringr)  # provides str_count()

df <- data.frame(ID = c(1,2,3,4,5,6,7,8,9,10),
                 Firm = c("Xi'an Feibao Technology Co Ltd", "A&B PVT LTD",
                          "Wonik Pne Co Ltd/Old", "Wooree E&L Co Ltd",
                          "X-Fab Silicon Foundries SE", "Yongsan S&C", "T-Gaia Corp",
                          "Suntech Co Ltd/Seoul", "IBM", "31 Inc"))
df$nwords <- str_count(df$Firm, "\\w+")
df$check1 <- grepl('[^[:alnum:]]', df$Firm)
df$check2 <- grepl('[^[:punct:]]', df$Firm)
pattern <- "/|:|\\?|<|>|\\|\\\\|\\|-|&|'|*"
df$check3 <- grepl(pattern, df$Firm)
> print(df)
   ID                           Firm nwords check1 check2 check3
1   1 Xi'an Feibao Technology Co Ltd      6   TRUE   TRUE   TRUE
2   2                    A&B PVT LTD      4   TRUE   TRUE   TRUE
3   3           Wonik Pne Co Ltd/Old      5   TRUE   TRUE   TRUE
4   4              Wooree E&L Co Ltd      5   TRUE   TRUE   TRUE
5   5     X-Fab Silicon Foundries SE      5   TRUE   TRUE   TRUE
6   6                    Yongsan S&C      3   TRUE   TRUE   TRUE
7   7                    T-Gaia Corp      3   TRUE   TRUE   TRUE
8   8           Suntech Co Ltd/Seoul      4   TRUE   TRUE   TRUE
9   9                            IBM      1  FALSE   TRUE   TRUE
10 10                         31 Inc      2   TRUE   TRUE   TRUE
Answer:
This seems to work: match the [[:punct:]] class directly instead of negating it.
grepl('[[:punct:]]', df$Firm)
#[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE
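As a rough sketch of why the checks in the question return TRUE almost everywhere, and of two alternative patterns, the snippet below reuses the firm names from the question. The names `firms`, `has_special`, `special_set` and `has_listed` are made up for illustration, and the characters in `special_set` are only an assumed example list, not a definitive definition of "special".

```r
firms <- c("Xi'an Feibao Technology Co Ltd", "A&B PVT LTD",
           "Wonik Pne Co Ltd/Old", "Wooree E&L Co Ltd",
           "X-Fab Silicon Foundries SE", "Yongsan S&C", "T-Gaia Corp",
           "Suntech Co Ltd/Seoul", "IBM", "31 Inc")

# check1: [^[:alnum:]] matches any character that is not a letter or digit.
# A space qualifies, so every multi-word name is flagged, including "31 Inc".
grepl("[^[:alnum:]]", firms)

# check2: [^[:punct:]] matches any character that is NOT punctuation,
# so any name containing a letter, digit or space is flagged.
grepl("[^[:punct:]]", firms)

# Alternative 1: exclude whitespace as well as letters and digits.
has_special <- grepl("[^[:alnum:][:space:]]", firms)

# Alternative 2: list the characters explicitly in one bracket expression
# (hypothetical list; a leading "-" is taken literally inside brackets).
special_set <- "[-/:?<>|&'*@]"
has_listed <- grepl(special_set, firms)

print(data.frame(Firm = firms, has_special, has_listed))
```

On this data both alternatives should agree with `grepl('[[:punct:]]', df$Firm)`. They can differ on other input: `[[:punct:]]` covers only punctuation, while the negated alnum/space class also flags any other character that is neither alphanumeric nor whitespace.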