R: If my vector grouped by ID contains a certain value, return one value, else return other value

Time:06-21

Each ID has multiple rows, and I'd like to create a column that tells me whether any of the runs (rows within that ID) contains "orange".

If none of the rows for that ID contains "orange", I'd like it to return "apple" instead.

I'm guessing it's something like

data_desired <- data %>%
group_by("ID") %>%
mutate(AnyOrange = ...)

but that's where I'm stuck. Sample data and desired outcome are below:

library(tidyverse)

data <- tribble(
  ~ID, ~Run, ~Oranges,
  #--/---/---
  "a", 1, "orange",
  "a", 2,  "orange",
  "b", 1, "apple",
  "b", 2, "apple",
  "b", 3, "orange",
  "c", 1, "apple",
  "c", 2, "apple"
)

# Desired outcome (stored under its own name so the sample data is not overwritten)
data_desired <- tribble(
  ~ID, ~Run, ~Oranges, ~AnyOrange,
  #--/---/---/---
  "a", 1, "orange","orange",
  "a", 2,  "orange","orange",
  "b", 1, "apple","orange",
  "b", 2, "apple","orange",
  "b", 3, "orange","orange",
  "c", 1, "apple","apple",
  "c", 2, "apple","apple"
)

CodePudding user response:

data %>%
  group_by(ID) %>%
  mutate(AnyOrange = ifelse(any(Oranges == 'orange'), 'orange', 'apple'))

# A tibble: 7 x 4
# Groups:   ID [3]
  ID      Run Oranges AnyOrange
  <chr> <dbl> <chr>   <chr>    
1 a         1 orange  orange   
2 a         2 orange  orange   
3 b         1 apple   orange   
4 b         2 apple   orange   
5 b         3 orange  orange   
6 c         1 apple   apple    
7 c         2 apple   apple    
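
Since any() collapses each group to a single TRUE or FALSE, the same idea can also be written with a plain if/else, and mutate() recycles the length-one result to every row of the group. This is only a minimal sketch on the question's sample data; the name `result` is my own and not part of the original answer.

```r
library(dplyr)
library(tibble)

# Same sample data as in the question
data <- tribble(
  ~ID, ~Run, ~Oranges,
  "a", 1, "orange",
  "a", 2, "orange",
  "b", 1, "apple",
  "b", 2, "apple",
  "b", 3, "orange",
  "c", 1, "apple",
  "c", 2, "apple"
)

result <- data %>%
  group_by(ID) %>%
  # any() gives one logical per group; the chosen string is recycled to all rows
  mutate(AnyOrange = if (any(Oranges == "orange")) "orange" else "apple") %>%
  ungroup()

print(result)
```

Writing the literal "apple" in the else branch means the fallback does not depend on what the first row of the group happens to contain.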

CodePudding user response:

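Column names should be unquoted inside tidyverse functions, so use group_by(ID) rather than group_by("ID"). After grouping by ID, use match() to get the index of the first 'orange' value, use that index to subset Oranges, and then coalesce() the result with the original Oranges column. If a group has no 'orange', match() returns NA, the subset is NA, and coalesce() falls back to the original values.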

library(dplyr)
data %>%
  group_by(ID) %>%
  mutate(AnyOrange = coalesce(Oranges[match('orange', Oranges)], Oranges)) %>%
  ungroup()

-output

# A tibble: 7 × 4
  ID      Run Oranges AnyOrange
  <chr> <dbl> <chr>   <chr>    
1 a         1 orange  orange   
2 a         2 orange  orange   
3 b         1 apple   orange   
4 b         2 apple   orange   
5 b         3 orange  orange   
6 c         1 apple   apple    
7 c         2 apple   apple  
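
To see why the match()/coalesce() combination works, it may help to run the three steps on plain vectors outside of mutate(). The sketch below is illustrative only: the vector names are made up, and it assumes a recent dplyr release in which coalesce() recycles a length-one first argument.

```r
library(dplyr)

with_orange    <- c("apple", "apple", "orange")  # like ID "b"
without_orange <- c("apple", "apple")            # like ID "c"

# Step 1: position of the first "orange", or NA if there is none
idx_with    <- match("orange", with_orange)      # 3
idx_without <- match("orange", without_orange)   # NA

# Step 2: subsetting with that index gives "orange" or a single NA
picked_with    <- with_orange[idx_with]          # "orange"
picked_without <- without_orange[idx_without]    # NA

# Step 3: coalesce() keeps "orange" where found, else falls back to the column
filled_with    <- coalesce(picked_with, with_orange)
filled_without <- coalesce(picked_without, without_orange)

print(filled_with)     # "orange" "orange" "orange"
print(filled_without)  # "apple" "apple"
```

Note that the fallback here is the original column, so a group without 'orange' keeps whatever values it already had rather than a fixed "apple".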

CodePudding user response:

Here is an alternative dplyr approach. It is essentially the same as @onyambu's solution, but uses the %in% operator:

data %>% 
  group_by(ID) %>% 
  mutate(AnyOrange = ifelse("orange" %in% Oranges, "orange","apple"))

  ID      Run Oranges AnyOrange
  <chr> <dbl> <chr>   <chr>    
1 a         1 orange  orange   
2 a         2 orange  orange   
3 b         1 apple   orange   
4 b         2 apple   orange   
5 b         3 orange  orange   
6 c         1 apple   apple    
7 c         2 apple   apple   
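
One practical difference between %in% and == shows up only if Oranges can contain missing values, which the sample data does not. The sketch below uses a made-up vector `fruit` to illustrate it; treat it as an aside under that assumption rather than part of the answer above.

```r
# Hypothetical group with a missing value and no "orange"
fruit <- c(NA, "apple")

eq_test    <- any(fruit == "orange")                # NA: the comparison with NA is unknown
in_test    <- "orange" %in% fruit                   # FALSE: %in% never returns NA
eq_na_rm   <- any(fruit == "orange", na.rm = TRUE)  # FALSE once NAs are dropped

print(ifelse(eq_test, "orange", "apple"))  # NA
print(ifelse(in_test, "orange", "apple"))  # "apple"
```

So with %in%, a group containing NA but no "orange" still gets "apple", whereas the == version needs na.rm = TRUE to behave the same way.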