Need help categorizing data into 2 groups according to multiple columns in R?

Time:03-07

I have 6 columns of numeric and non-numeric data in R, as follows:

   V1  V2 V3       V4 V5  V6
1   N abc  M  'apple'  2  60
2   C pqr  R 'banana'  3  70
3   N pqr  M 'tomato'  1  50
4   D abc  A 'cheese'  2 300
5   D uio  R 'potato'  1  60
6   C xyz  A   'milk'  5 200
7   N gef  M   'milk'  6 500
8   D wvy  A 'yogurt'  1 300
9   C abc  A  'apple'  7 100
10  C abc  R 'potato'  8 100
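To make the example reproducible, the table above can be typed in as a data frame. This is a sketch that assumes the quotes around the V4 values are only how they were displayed, so V4 is stored as plain character strings, and that V5 and V6 are numeric.

```r
# Reproducible copy of the table above.
# Assumption: V4 values are plain strings (the quotes were display only).
df <- data.frame(
  V1 = c("N", "C", "N", "D", "D", "C", "N", "D", "C", "C"),
  V2 = c("abc", "pqr", "pqr", "abc", "uio", "xyz", "gef", "wvy", "abc", "abc"),
  V3 = c("M", "R", "M", "A", "R", "A", "M", "A", "A", "R"),
  V4 = c("apple", "banana", "tomato", "cheese", "potato",
         "milk", "milk", "yogurt", "apple", "potato"),
  V5 = c(2, 3, 1, 2, 1, 5, 6, 1, 7, 8),
  V6 = c(60, 70, 50, 300, 60, 200, 500, 300, 100, 100),
  stringsAsFactors = FALSE  # keep text columns as character, not factor
)

str(df)
```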

I want to assign each row to one of 2 groups based on conditions across the 6 columns.

For example, a row belongs to the Grocery category if V1 is 'N' or 'C', V3 is 'M', and V4 is 'apple', 'banana', 'potato', or 'tomato'; in addition, when V1 is 'N', V6 must also be at most $100, and so on. The Milk category is whatever does not belong to Grocery.

How would I do it?

I tried using case_when(), but it didn't work.

User response:

Here is an approach using case_when(). I'm not sure what you tried that didn't work; please let me know if I can clarify further.

You can use %in% to check whether a particular letter or word is contained in a vector, as an alternative to chaining multiple OR (|) comparisons.
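For example, with a made-up vector of V4 values (not the question's data), a single `%in%` test gives the same result as a chain of `==` comparisons joined by `|`. The last two lines show the one practical difference, which is how a missing value is handled.

```r
V4 <- c("apple", "milk", "potato", "cheese")

# Chained OR comparisons ...
with_or <- V4 == "apple" | V4 == "banana" | V4 == "potato" | V4 == "tomato"

# ... give the same result as one %in% test against a vector
with_in <- V4 %in% c("apple", "banana", "potato", "tomato")

identical(with_or, with_in)  # TRUE

# Difference for missing values: == gives NA, %in% gives FALSE
NA_character_ == "apple"     # NA
NA_character_ %in% "apple"   # FALSE
```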

The final TRUE ~ "milk" case acts as a default: it applies to any row for which none of the earlier conditions in case_when() evaluated to TRUE.
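A small illustration of that top-to-bottom, first-match behaviour, using made-up numbers rather than the question's data (it assumes the dplyr package is installed):

```r
library(dplyr)

x <- c(5, 50, 500)

# Conditions are checked in order; the first one that is TRUE wins.
# The final TRUE ~ ... line catches every value not matched earlier.
size <- case_when(
  x < 10  ~ "small",
  x < 100 ~ "medium",
  TRUE    ~ "large"
)

size  # c("small", "medium", "large")
```

Note that 5 also satisfies `x < 100`, but it is labelled "small" because the earlier condition matched first. dplyr 1.1.0 and later also accept `.default = "large"` in place of the final `TRUE ~` line.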

Edit: added logic so that V6 is checked only when V1 is "N".

library(dplyr)

df %>%
  mutate(category = case_when(
    # V6 is checked only when V1 is "N"
    (V1 == "C" | (V1 == "N" & V6 <= 100)) &
      V3 == "M" &
      V4 %in% c("apple", "banana", "potato", "tomato") ~ "grocery",
    # fallback for every row not matched above
    TRUE ~ "milk"
  ))

Output

   V1  V2 V3     V4 V5  V6 category
1   N abc  M  apple  2  60  grocery
2   C pqr  R banana  3 700     milk
3   N pqr  M tomato  1  50  grocery
4   D abc  A cheese  2 300     milk
5   D uio  R potato  1  60     milk
6   C xyz  A   milk  5  20     milk
7   N gef  M   milk  6 500     milk
8   D wvy  A yogurt  1  30     milk
9   C abc  A  apple  7 600     milk
10  C abc  R potato  8 400     milk
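One common pitfall when writing a rule like this by hand is operator precedence: in R, `&` binds more tightly than `|`, so the parentheses around the V1 test above are doing real work. A minimal sketch with made-up values (not the question's data) comparing the two readings; this is offered as a general pitfall, not a diagnosis of the asker's attempt, which isn't shown.

```r
V1 <- c("N", "C", "C")
V3 <- c("R", "R", "M")

# Without parentheses this is read as  V1 == "N" | (V1 == "C" & V3 == "M"),
# so a row with V1 == "N" passes even though V3 is not "M"
unbracketed <- V1 == "N" | V1 == "C" & V3 == "M"

# With parentheses the V3 test applies to both V1 values
bracketed <- (V1 == "N" | V1 == "C") & V3 == "M"

unbracketed  # TRUE FALSE TRUE
bracketed    # FALSE FALSE TRUE
```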

User response:

Probably not the nicest way, but I would use ifelse():

Sample code:

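# Note: this version checks only V1 and V4
category <- ifelse((df$V1 == "N" | df$V1 == "C") &
                     (df$V4 == "apple" | df$V4 == "banana" |
                        df$V4 == "potato" | df$V4 == "tomato"),
                   "Grocery", "Milk")
category <- as.data.frame(category)

# put the new category column in front of the original columns
ex <- cbind(category, df)
ex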

Output:

   category V1  V2 V3       V4 V5  V6
1   Grocery  N abc  M    apple  2  60
2   Grocery  C pqr  R   banana  3 700
3   Grocery  N pqr  M   tomato  1  50
4      Milk  D abc  A   cheese  2 300
5      Milk  D uio  R   potato  1  60
6      Milk  C xyz  A     milk  5  20
7      Milk  N gef  M     milk  6 500
8      Milk  D wvy  A youghurt  1  30
9   Grocery  C abc  A    apple  7 600
10  Grocery  C abc  R   potato  8 400
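The `ifelse()` version above tests only V1 and V4, which is why rows such as row 2 (V3 = "R") come out as Grocery. A sketch of the same base-R approach with the question's V3 and V6 conditions added, typed in from the question's table. It assumes V4 is stored without quotes and includes only the columns the rule needs.

```r
# Question's data, restricted to the columns used by the rule
# (assumption: V4 values are plain strings without quotes)
df <- data.frame(
  V1 = c("N", "C", "N", "D", "D", "C", "N", "D", "C", "C"),
  V3 = c("M", "R", "M", "A", "R", "A", "M", "A", "A", "R"),
  V4 = c("apple", "banana", "tomato", "cheese", "potato",
         "milk", "milk", "yogurt", "apple", "potato"),
  V6 = c(60, 70, 50, 300, 60, 200, 500, 300, 100, 100),
  stringsAsFactors = FALSE
)

produce <- c("apple", "banana", "potato", "tomato")

# Grocery: V1 is "C", or V1 is "N" with V6 <= 100;
# V3 must be "M" and V4 must be one of the produce items
df$category <- ifelse(
  (df$V1 == "C" | (df$V1 == "N" & df$V6 <= 100)) &
    df$V3 == "M" &
    df$V4 %in% produce,
  "Grocery", "Milk"
)

df$category  # rows 1 and 3 are "Grocery"; all other rows are "Milk"
```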