In R, indicate whether another column has more than one unique value per ID-CodePudding

Background

Here's a dataset, d:

d <- data.frame(ID = c("a","a","b","b"),                  
                product_code = c("B78","X31","C12","C12"),                  
                stringsAsFactors=FALSE)

It looks like this:

The Problem and desired output

I'm trying to make an indicator column multiple_products that's marked 1 for IDs which have more than one unique product_code and 0 for those that don't. Here's what I'm looking for:

My attempts haven't worked yet, though.

What I've Tried

Here's my current code:

d <- d %>% 
  group_by(ID) %>%
  mutate(multiple_products = if_else(length(unique(d$product_code)) > 1, 1, 0)) %>%
  ungroup()

And this is the result:

Any thoughts?

CodePudding user response：

The d$ should be taken out as this will extract the whole column by removing the group attributes. Also, there is n_distinct. In addition, there is no need for ifelse or if_else as logical values (TRUE/FALSE) can be directly coerced to 1/0 as these are storage values by either using as.integer or

library(dplyr)
d %>% 
  group_by(ID) %>%
  mutate(multiple_products =  (n_distinct(product_code) > 1)) %>%
  ungroup()

-output

# A tibble: 4 x 3
  ID    product_code multiple_products
  <chr> <chr>                    <int>
1 a     B78                          1
2 a     X31                          1
3 b     C12                          0
4 b     C12                          0

CodePudding user response：

solution with data.table;

library(data.table)

setDT(d)

d[,multiple_products:=rleid(product_code),by=ID][
    ,multiple_products:=ifelse(max(multiple_products)>1,1,0),by=ID]
    
d

output;

  ID    product_code multiple_products
  <chr> <chr>                    <int>
1 a     B78                          1
2 a     X31                          1
3 b     C12                          0
4 b     C12                          0

CodePudding user response：

A base R option using ave

transform(
  d,
  multiple_products =  (ave(match(product_code, unique(product_code)), ID, FUN = var) > 0
  )
)

gives

  ID product_code multiple_products
1  a          B78                 1
2  a          X31                 1
3  b          C12                 0
4  b          C12                 0