R function to search in character string

Time:04-24

I have a data frame with two columns, NAICS_CD and top_3.

After loading this data frame in R, I want to add an is_present column: 1 when the number in NAICS_CD is present in top_3 for that row, and 0 otherwise. How can I do this in R?

CodePudding user response:

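We could use str_detect from the stringr package together with ifelse. str_detect is vectorized over both the string and the pattern, so each NAICS_CD is checked against the top_3 value in the same row: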

library(dplyr)
library(stringr)

df %>% 
  mutate(is_present = ifelse(str_detect(top_3, as.character(NAICS_CD)), 1, 0))

output:

  NAICS_CD                        top_3 is_present
1   541611 ["541611","541618","611430"]          1
2   812990 ["561720","561740","561790"]          0
3   424950 ["444120","711510","811121"]          0
4   722330 ["311991","722310","722320"]          0
5   722320 ["722320","722330","722310"]          1
6   531180 ["531110","531190","531111"]          0
7   484121 ["484121","484110","484230"]          1
8   531311 ["531110","531311","531111"]          1

data:

df <- structure(list(NAICS_CD = c(541611L, 812990L, 424950L, 722330L, 
722320L, 531180L, 484121L, 531311L), top_3 = c("[\"541611\",\"541618\",\"611430\"]", 
"[\"561720\",\"561740\",\"561790\"]", "[\"444120\",\"711510\",\"811121\"]", 
"[\"311991\",\"722310\",\"722320\"]", "[\"722320\",\"722330\",\"722310\"]", 
"[\"531110\",\"531190\",\"531111\"]", "[\"484121\",\"484110\",\"484230\"]", 
"[\"531110\",\"531311\",\"531111\"]")), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8"))

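Note that str_detect treats the code as a regular expression and matches substrings. That is fine for six-digit codes like these, but if exact whole-code matching matters, a minimal base R sketch (assuming top_3 always stores its codes as runs of digits, as in the sample data) could extract the codes and compare them with %in%:

```r
# Sample data from the answer, rebuilt here so the snippet runs on its own
df <- data.frame(
  NAICS_CD = c(541611L, 812990L, 424950L, 722330L,
               722320L, 531180L, 484121L, 531311L),
  top_3 = c('["541611","541618","611430"]',
            '["561720","561740","561790"]',
            '["444120","711510","811121"]',
            '["311991","722310","722320"]',
            '["722320","722330","722310"]',
            '["531110","531190","531111"]',
            '["484121","484110","484230"]',
            '["531110","531311","531111"]'),
  stringsAsFactors = FALSE
)

# Extract every run of digits from each top_3 string;
# the result is a list with one character vector per row
codes <- regmatches(df$top_3, gregexpr("[0-9]+", df$top_3))

# Compare each NAICS_CD with the codes extracted from its own row
hit <- mapply(function(code, set) code %in% set,
              as.character(df$NAICS_CD), codes)

# as.integer converts TRUE/FALSE to 1/0 and drops the names mapply adds
df$is_present <- as.integer(hit)
df
```

On the sample data this gives the same is_present values as the str_detect version.
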
CodePudding user response:

You can use grepl to check whether a pattern (here, the target number) occurs in a string. grepl returns one TRUE or FALSE per element of its input, so wrap it in any to collapse the result to a single value that ifelse can use to assign 1 or 0 to the is_present column. For example:

top3_row1 <- '["541611", "541618","611430"]'
is_present <- ifelse(any(grepl("541611", top3_row1)), 1, 0)
is_present
[1] 1

To apply this to your data frame, you can use a for loop, among other approaches. For example:

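mydf$is_present <- 0  # create the column before filling it
for (k in seq_len(nrow(mydf))) {
  # compare each code only with the top_3 value in the same row
  mydf$is_present[k] <- ifelse(any(grepl(mydf$NAICS_CD[k], mydf$top_3[k])), 1, 0)
}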
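
As one of the other ways, a loop-free sketch could pair each code with the top_3 string from its own row via mapply. The three-row mydf below is a stand-in built from the sample data, and fixed = TRUE is an assumption on my part that the codes should be matched as literal text rather than as regular expressions:

```r
# Stand-in data: three rows taken from the sample data in the question
mydf <- data.frame(
  NAICS_CD = c(541611L, 722330L, 722320L),
  top_3 = c('["541611","541618","611430"]',
            '["311991","722310","722320"]',
            '["722320","722330","722310"]'),
  stringsAsFactors = FALSE
)

# mapply walks both vectors in parallel, so the code in row k is
# compared only with the top_3 string in row k
found <- mapply(
  function(code, codes) grepl(code, codes, fixed = TRUE),
  as.character(mydf$NAICS_CD),
  mydf$top_3
)

# as.integer converts TRUE/FALSE to 1/0 and drops the names mapply adds
mydf$is_present <- as.integer(found)
mydf$is_present
```

The second row is the interesting one: 722330 does not occur in its own top_3 string, although it does occur in the third row's, so a per-row comparison is what keeps it at 0.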