In R, I have a data frame, which has a column 'food' with 100 different string values.
For instance:
id <- c("1", "2", "3", "4", "5", "6")
food <- c("X1_", "X2_", "X3_", "X4_", "X5_", "X100_")
df <- data.frame(id, food)
I would like to create a new column 'food_final' based on the strings in the column 'food'. I started writing the code using nested ifelse and grepl calls, but with 100 different string values, 100 nested ifelse calls is clearly not the cleanest way of doing this, and in any case there is a limit to how deeply they can be nested.
Example of what I have tried so far:
df$food_final <- ifelse(grepl("X1_", df$food, ignore.case = TRUE), "1",
  ifelse(grepl("X2_", df$food, ignore.case = TRUE), "2",
    ifelse(grepl("X3_", df$food, ignore.case = TRUE), "3",
      ifelse(grepl("X4_", df$food, ignore.case = TRUE), "4",
        ifelse(grepl("X5_", df$food, ignore.case = TRUE), "5",
          ifelse(grepl("X100_", df$food, ignore.case = TRUE), "100", NA))))))
What is the best way of creating this new column 'food_final', instead of using so many nested ifelse statements?
Thank you in advance.
CodePudding user response:
You might be able to do this in a single line with sub, capturing the digits between the leading X and the trailing underscore:
df$food_final <- sub("^X(\\d+)_$", "\\1", df$food)
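One thing to keep in mind with sub is that strings which do not match the pattern are returned unchanged rather than as NA, unlike the final NA branch of the nested ifelse. A minimal sketch of how that could be guarded, assuming a made-up non-matching value "banana" that is not in the original data:

```r
# Hypothetical input: "banana" is an illustrative value, not from the question
food <- c("X1_", "X100_", "banana")
pattern <- "^X(\\d+)_$"

# sub() leaves non-matching strings as they are
extracted <- sub(pattern, "\\1", food)

# Keep the extracted digits only where the pattern matched, NA otherwise
food_final <- ifelse(grepl(pattern, food), extracted, NA_character_)
print(food_final)
```

Add ignore.case = TRUE to both calls if the case-insensitive matching from the question is needed.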
CodePudding user response:
If you're just trying to extract the number from the string, I like to use parse_number from readr. Note that it returns a numeric vector rather than character strings.
library(readr)
df$food_final <- parse_number(df$food)
CodePudding user response:
In case you want to extract the number:
df$food_final <- gsub("\\D", "", df$food)
df
#  id  food food_final
#1  1   X1_          1
#2  2   X2_          2
#3  3   X3_          3
#4  4   X4_          4
#5  5   X5_          5
#6  6 X100_        100
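Or, in case the mapping between patterns and values is arbitrary, you can do basically the same thing as your nested ifelse, driven by a named vector whose names are the output values and whose elements are the patterns: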
x <- c("1"="X1_", "2"="X2_", "3"="X3_", "4"="X4_", "5"="X5_", "100"="X100_")
apply(sapply(x, grepl, df$food, ignore.case=TRUE), 1, function(y) names(x)[y][1])
#[1] "1" "2" "3" "4" "5" "100"
Or using Reduce:
x <- c("1"="X1_", "2"="X2_", "3"="X3_", "4"="X4_", "5"="X5_", "100"="X100_")
Reduce(function(a, b) {
  i <- is.na(a)  # rows not yet assigned a value
  a[i][grepl(x[b], df$food[i], ignore.case = TRUE)] <- b
  a
}, names(x), rep(NA, nrow(df)))
#[1] "1" "2" "3" "4" "5" "100"
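If the values in food match the keys exactly (not just as substrings), a plain named-vector lookup may be simpler than looping over grepl. This is only a sketch under that assumption; the vector name lookup and the unmapped value "Y7_" are made up for illustration:

```r
# Small illustrative data frame; "Y7_" is a hypothetical value with no mapping
df <- data.frame(id = c("1", "2", "3"),
                 food = c("X1_", "X100_", "Y7_"),
                 stringsAsFactors = FALSE)

# Hypothetical lookup table: names are the input strings, elements the outputs
lookup <- c("X1_" = "1", "X2_" = "2", "X100_" = "100")

# Index by name; strings without an entry become NA
df$food_final <- unname(lookup[df$food])
print(df$food_final)
```

Unlike the grepl versions, this comparison is case-sensitive, so the inputs would need normalising first (for example with toupper) if case varies.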
CodePudding user response:
You can also use str_extract to extract just the digits (it returns the first run of digits as a character string, or NA if there is none):
library(stringr)
df$food_final <- str_extract(df$food, "\\d+")
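If adding a package is not an option, roughly the same result can be had in base R with regexpr and regmatches. This is a sketch rather than a drop-in replacement for str_extract; the value "none" is made up to show the no-match case:

```r
# Hypothetical input: "none" contains no digits
food <- c("X1_", "X100_", "none")

# Position of the first run of digits in each string (-1 where absent)
m <- regexpr("[0-9]+", food)

# regmatches() returns only the matched pieces, so fill a vector of NAs
digits <- rep(NA_character_, length(food))
digits[m > 0] <- regmatches(food, m)
print(digits)
```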
CodePudding user response:
We could use extract_numeric from the tidyr package (note that it is deprecated in favour of readr::parse_number):
library(dplyr)
library(tidyr)
df %>%
  mutate(final_food = extract_numeric(food))
Output:
  id  food final_food
1  1   X1_          1
2  2   X2_          2
3  3   X3_          3
4  4   X4_          4
5  5   X5_          5
6  6 X100_        100