I have an R data frame in which one of the columns is a comma-delimited string. I want to add a new column to the data set that shows whether that column contains a particular value.
For example:
> data <- data.frame(a = 1:5, b = c("123", "6475,320", "475", "905,1204,543", "567,475"))
> data
  a            b
1 1          123
2 2     6475,320
3 3          475
4 4 905,1204,543
5 5      567,475
I want to create a new column to indicate whether b contains 475, which would leave me with:
  a            b has_475
1 1          123   FALSE
2 2     6475,320   FALSE
3 3          475    TRUE
4 4 905,1204,543   FALSE
5 5      567,475    TRUE
CodePudding user response:
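You can use word boundaries ('\b') to match the number as a whole token. This ensures that values such as 1475 or 24756 are not matched. The output below includes an extra row 6 (b = 1475), which is not in the question's data, to show this.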
data$has_475 <- grepl('\\b475\\b', data$b)
data
  a            b has_475
1 1          123   FALSE
2 2     6475,320   FALSE
3 3          475    TRUE
4 4 905,1204,543   FALSE
5 5      567,475    TRUE
6 6         1475   FALSE
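The same idea can be wrapped up for an arbitrary value. This is only a minimal sketch: the helper name has_value is hypothetical (it is not part of the answer above), and it assumes the value being searched for consists only of digits or letters, so that '\b' is meaningful on both sides.

```r
# Hypothetical helper: TRUE where the comma-delimited string x contains
# `value` as a whole token. Assumes `value` is made of word characters only.
has_value <- function(x, value) {
  grepl(paste0("\\b", value, "\\b"), x)
}

# The question's data plus the extra "1475" row used in the output above
data <- data.frame(
  a = 1:6,
  b = c("123", "6475,320", "475", "905,1204,543", "567,475", "1475")
)

data$has_475 <- has_value(data$b, "475")
data$has_320 <- has_value(data$b, "320")
```

Note that '\b' treats any non-word character as a boundary, not just a comma, so this relies on the column holding nothing but comma-separated numbers.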
CodePudding user response:
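You can use this regular expression, which requires 475 to be preceded by the start of the string or a comma and followed by a comma or the end of the string: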
data["has_475"] <- grepl("(^|,)475(,|$)", data$b)
Output:
  a            b has_475
1 1          123   FALSE
2 2     6475,320   FALSE
3 3          475    TRUE
4 4 905,1204,543   FALSE
5 5      567,475    TRUE
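If you would rather avoid regular expressions altogether, one possible sketch is to split each string on commas and test for an exact match. This is an alternative technique, not what either answer above does, and it assumes there are no spaces around the commas.

```r
# stringsAsFactors = FALSE keeps b as character on R versions before 4.0,
# where strsplit() would otherwise fail on a factor column
data <- data.frame(
  a = 1:5,
  b = c("123", "6475,320", "475", "905,1204,543", "567,475"),
  stringsAsFactors = FALSE
)

# Split each entry of b into its comma-separated pieces (a list of character vectors)
parts <- strsplit(data$b, ",", fixed = TRUE)

# For each row, check whether "475" is exactly one of the pieces
data$has_475 <- vapply(parts, function(p) "475" %in% p, logical(1))
```

Because the comparison is on whole pieces, 6475 and 1475 cannot match by accident, at the cost of an extra pass over the data compared with a single grepl call.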