I have 6 columns of numeric and non-numeric data in R as follows:
V1 V2 V3 V4 V5 V6
1 N abc M 'apple' 2 60
2 C pqr R 'banana' 3 70
3 N pqr M 'tomato' 1 50
4 D abc A 'cheese' 2 300
5 D uio R 'potato' 1 60
6 C xyz A 'milk' 5 200
7 N gef M 'milk' 6 500
8 D wvy A 'yogurt' 1 300
9 C abc A 'apple' 7 100
10 C abc R 'potato' 8 100
I want to group the data into 2 groups according to some characteristics using the 6 columns.
For example: Grocery category: V1 is 'N' or 'C', and V3 is 'M', and V4 is 'apple', 'banana', 'potato' or 'tomato'; in addition, if V1 is 'N', V6 must be <= 100$, etc. Milk category: whatever does not belong to grocery.
How would I do it?
I tried using case_when but it didn't work.
CodePudding user response:
Here is an approach using case_when. I'm not sure what you tried that didn't work; please let me know if I can clarify further.
You can use %in% to see if a particular letter or word is contained in a vector, as an alternative to having multiple "OR" operations.
The final TRUE case will be considered if there are no TRUE evaluations earlier in the case_when statement.
Edit: Added additional logic that would consider V6 in the event that V1 is "N".
library(dplyr)  # provides %>%, mutate() and case_when()

df %>%
  mutate(category = case_when(
    (V1 == "C" | (V1 == "N" & V6 <= 100)) &
      V3 == "M" &
      V4 %in% c("apple", "banana", "potato", "tomato") ~ "grocery",
    TRUE ~ "milk"
  ))
Output
V1 V2 V3 V4 V5 V6 category
1 N abc M apple 2 60 grocery
2 C pqr R banana 3 700 milk
3 N pqr M tomato 1 50 grocery
4 D abc A cheese 2 300 milk
5 D uio R potato 1 60 milk
6 C xyz A milk 5 20 milk
7 N gef M milk 6 500 milk
8 D wvy A yogurt 1 30 milk
9 C abc A apple 7 600 milk
10 C abc R potato 8 400 milk
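As a quick check of the %in% point above, here is a minimal base R sketch comparing chained "OR" comparisons with %in%. The vector v4 is a made-up example, not the asker's data; the NA entry is included only to show one case where the two forms differ.

```r
# Hypothetical example vector (not the asker's actual data)
v4 <- c("apple", "cheese", "potato", "milk", NA)

# Chained OR comparisons: an NA input yields NA
chained <- v4 == "apple" | v4 == "banana" | v4 == "potato" | v4 == "tomato"

# The same membership test with %in%: an NA input yields FALSE
membership <- v4 %in% c("apple", "banana", "potato", "tomato")

print(chained)
print(membership)
```

For the non-missing values both forms agree, so %in% is mostly a readability choice; it also never returns NA, which can make the condition easier to reason about.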
CodePudding user response:
Probably not the nicest way, but I would use ifelse().
Sample code:
# Label rows using V1 and V4 only (V3 and V6 are not checked here)
category <- ifelse(df$V1 %in% c("N", "C") &
                     df$V4 %in% c("apple", "banana", "potato", "tomato"),
                   "Grocery", "Milk")
category <- as.data.frame(category)
# Put the new category column in front of the original columns
ex <- cbind(category, df)
ex
Output:
category V1 V2 V3 V4 V5 V6
1 Grocery N abc M apple 2 60
2 Grocery C pqr R banana 3 700
3 Grocery N pqr M tomato 1 50
4 Milk D abc A cheese 2 300
5 Milk D uio R potato 1 60
6 Milk C xyz A milk 5 20
7 Milk N gef M milk 6 500
8 Milk D wvy A yogurt 1 30
9 Grocery C abc A apple 7 600
10 Grocery C abc R potato 8 400
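The ifelse() call above checks only V1 and V4. If the V3 == "M" condition and the V6 <= 100 limit for V1 == "N" from the question should also apply, a base R sketch could look like the following. The data frame is rebuilt here from the output table above, which is an assumption about the data actually used, and the helper names v1_ok and is_grocery are only illustrative.

```r
# Assumed data, copied from the printed output (not confirmed by the asker)
df <- data.frame(
  V1 = c("N", "C", "N", "D", "D", "C", "N", "D", "C", "C"),
  V2 = c("abc", "pqr", "pqr", "abc", "uio", "xyz", "gef", "wvy", "abc", "abc"),
  V3 = c("M", "R", "M", "A", "R", "A", "M", "A", "A", "R"),
  V4 = c("apple", "banana", "tomato", "cheese", "potato",
         "milk", "milk", "yogurt", "apple", "potato"),
  V5 = c(2, 3, 1, 2, 1, 5, 6, 1, 7, 8),
  V6 = c(60, 700, 50, 300, 60, 20, 500, 30, 600, 400),
  stringsAsFactors = FALSE
)

# V1 must be "C", or "N" with V6 at most 100
v1_ok <- df$V1 == "C" | (df$V1 == "N" & df$V6 <= 100)

# Combine with the V3 and V4 conditions
is_grocery <- v1_ok &
  df$V3 == "M" &
  df$V4 %in% c("apple", "banana", "potato", "tomato")

# Everything that is not grocery falls into the milk category
df$category <- ifelse(is_grocery, "Grocery", "Milk")
print(df)
```

Splitting the condition into named logical vectors keeps each rule readable and makes it easy to inspect which rows pass which test before they are combined.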