Reducing nested if else statements with grepl in R

Time:09-17

In R, I have a data frame, which has a column 'food' with 100 different string values.

For instance:

id<-c("1", "2", "3", "4", "5", "6")
food <- c("X1_", "X2_", "X3_", "X4_", "X5_", "X100_")
df <- data.frame(id, food)

I would like to create a new column ‘food_final’ based on the strings in the column ‘food’. I started writing the code using nested ifelse calls and grepl, but with 100 different string values, 100 nested ifelse calls is clearly not the cleanest way of doing this, and in any case there is a limit to how deeply they can be nested.

Example of what I have tried so far:

df$food_final<-ifelse(grepl("X1_", df$food, ignore.case=TRUE), "1",
                      ifelse(grepl("X2_", df$food, ignore.case=TRUE), "2",
                             ifelse(grepl("X3_", df$food, ignore.case=TRUE), "3",
                                    ifelse(grepl("X4_", df$food, ignore.case=TRUE), "4",
                                        ifelse(grepl("X5_", df$food, ignore.case=TRUE), "5",
                                             ifelse(grepl("X100_", df$food, ignore.case=TRUE), "100", NA))))))

What is the best way of creating this new column 'food_final', instead of using so many nested ifelse statements?

Thank you in advance.

CodePudding user response:

You might just be able to use a single line solution with the help of sub:

df$food_final <- sub("^X(\\d+)_$", "\\1", df$food)
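
One caveat worth noting: sub returns its input unchanged when the pattern does not match, so an unexpected value would pass through silently instead of becoming NA as in the nested ifelse. A minimal sketch of guarding against that, assuming every valid value looks like X<digits>_ (the helper name extract_food_id and the "unknown" test value are made up for illustration):

```r
# Hypothetical helper: pull the digits out of strings shaped like "X<digits>_",
# returning NA for anything that does not fit that shape.
extract_food_id <- function(food) {
  pattern <- "^X(\\d+)_$"
  food <- as.character(food)
  matched <- grepl(pattern, food, ignore.case = TRUE)
  out <- rep(NA_character_, length(food))
  # Only rewrite the elements that actually matched the pattern
  out[matched] <- sub(pattern, "\\1", food[matched], ignore.case = TRUE)
  out
}

food <- c("X1_", "X2_", "X100_", "unknown")
result <- extract_food_id(food)
```

With the data frame from the question this could be used as df$food_final <- extract_food_id(df$food).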

CodePudding user response:

If you're just trying to extract the number from the string, I like to use parse_number from readr. Note that it returns a numeric vector rather than strings.

library(readr)
df$food_final <- parse_number(df$food)

CodePudding user response:

In case you want to extract the number:

df$food_final <- gsub("\\D", "", df$food)

df
#  id  food food_final
#1  1   X1_          1
#2  2   X2_          2
#3  3   X3_          3
#4  4   X4_          4
#5  5   X5_          5
#6  6 X100_        100

Or, in case the mapping between patterns and values is arbitrary, you can do basically the same as your nested ifelse with a named vector of patterns, taking the first pattern that matches each row:

x <- c("1"="X1_", "2"="X2_", "3"="X3_", "4"="X4_", "5"="X5_", "100"="X100_")
apply(sapply(x, grepl, df$food, ignore.case=TRUE), 1, function(y) names(x)[y][1])
#[1] "1"   "2"   "3"   "4"   "5"   "100"

Or using Reduce:

x <- c("1"="X1_", "2"="X2_", "3"="X3_", "4"="X4_", "5"="X5_", "100"="X100_")
Reduce(function(a,b) {
  i <- is.na(a)
  a[i][grepl(x[b], df$food[i], ignore.case=TRUE)] <- b
  a
}, names(x), rep(NA, nrow(df)))
#[1] "1"   "2"   "3"   "4"   "5"   "100"
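
If each value of food corresponds to exactly one key, an exact lookup table may be simpler than pattern matching. The following is a sketch under that assumption; note that it matches whole strings (case-insensitively), not substrings as grepl does, and the contents of lookup are illustrative only:

```r
# Illustrative lookup table: names are the values found in `food`,
# elements are the labels wanted in `food_final`.
lookup <- c("X1_" = "1", "X2_" = "2", "X3_" = "3",
            "X4_" = "4", "X5_" = "5", "X100_" = "100")

df <- data.frame(id = c("1", "2", "3", "4", "5", "6"),
                 food = c("X1_", "X2_", "X3_", "X4_", "X5_", "X100_"))

# match() returns NA for values missing from the table, so unmapped
# foods become NA, like the final branch of the nested ifelse.
idx <- match(toupper(df$food), toupper(names(lookup)))
df$food_final <- unname(lookup[idx])
```

For 100 values, the lookup vector could be built from two columns of a mapping table rather than typed by hand.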

CodePudding user response:

You can also use str_extract to extract just the digits:

library(stringr)
df$food_final <- str_extract(df$food, "\\d+")
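
If adding a dependency on stringr is not desirable, roughly the same result can be had in base R with regexpr and regmatches. This is only a sketch; the "none" value is added to show that non-matching strings end up as NA, which regmatches alone would drop:

```r
food <- c("X1_", "X2_", "X100_", "none")

# regexpr() gives the position of the first run of digits, or -1 if none
m <- regexpr("[0-9]+", food)

# regmatches() returns only the matched pieces, so fill them into
# a vector pre-populated with NA to keep the original length.
digits <- rep(NA_character_, length(food))
digits[m != -1] <- regmatches(food, m)
```

The result is a character vector; as.integer(digits) would convert it to numbers if needed.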

CodePudding user response:

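We could use extract_numeric from the tidyr package. Note that this function is deprecated in favor of readr::parse_number, so it emits a message in current tidyr versions:

library(dplyr)
library(tidyr)

df %>% 
  mutate(food_final = extract_numeric(food))

output:

  id  food food_final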
1  1   X1_          1
2  2   X2_          2
3  3   X3_          3
4  4   X4_          4
5  5   X5_          5
6  6 X100_        100