R - search for two conditions across two sets of columns

I have a data frame called fruits where each row has up to 3 fruits with their corresponding color. Color1 goes with Fruit1, Color2 with Fruit2, and Color3 with Fruit3.

  Color1 Color2 Color3 Fruit1 Fruit2 Fruit3
1    red  green  green  apple  mango   kiwi
2 yellow  green    red banana   plum  mango
3  green    red         grape  apple       
4 yellow                apple

Using dplyr, I can return the rows that contain apples (1, 3 and 4), and I can return the rows with red (1, 2 and 3):

library(dplyr)

red <- filter_at(fruits, vars(Color1:Color3), any_vars(. == "red"))       # rows 1, 2 and 3
apple <- filter_at(fruits, vars(Fruit1:Fruit3), any_vars(. == "apple"))   # rows 1, 3 and 4
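filter_at() and any_vars() are superseded in recent dplyr releases. A minimal sketch of the same two filters written with if_any(), assuming dplyr 1.0.4 or later (where if_any() was added). Like the originals, each filter looks at one set of columns only, so neither can tell which color belongs to which fruit:

```r
library(dplyr)

# The question's data, rebuilt so the block runs on its own
fruits <- data.frame(
  Color1 = c("red", "yellow", "green", "yellow"),
  Color2 = c("green", "green", "red", ""),
  Color3 = c("green", "red", "", ""),
  Fruit1 = c("apple", "banana", "grape", "apple"),
  Fruit2 = c("mango", "plum", "apple", ""),
  Fruit3 = c("kiwi", "mango", "", ""),
  stringsAsFactors = FALSE
)

# Rows where any Color column is "red" (rows 1, 2 and 3)
red <- filter(fruits, if_any(Color1:Color3, ~ .x == "red"))

# Rows where any Fruit column is "apple" (rows 1, 3 and 4)
apple <- filter(fruits, if_any(Fruit1:Fruit3, ~ .x == "apple"))
```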

But how do I return only the red apples, i.e. just the first row (Color1 = red, Fruit1 = apple) and the third row (Color2 = red, Fruit2 = apple)?

Thanks.

P.S. Here is the code for the table:

Color1 <- c("red", "yellow", "green", "yellow")
Color2 <- c("green", "green", "red", "")
Color3 <- c("green", "red", "", "")
Fruit1 <- c("apple", "banana", "grape", "apple")
Fruit2 <- c("mango", "plum", "apple", "")
Fruit3 <- c("kiwi", "mango", "", "")

fruits <- data.frame(Color1, Color2, Color3, Fruit1, Fruit2, Fruit3)

Answer:

You can work with the two sets of columns independently, create a logical matrix for each, then combine them element-wise with &.

Up front, two caveats:

- if you have NA values in your data, this will need some modifications to work properly;
- this presumes that both sets of columns are in the same order; for instance, if your columns were ordered "Color1, Color2, Color3" and "Fruit3, Fruit2, Fruit1", then this will not pair things correctly.
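A minimal sketch of one such modification for the NA caveat, assuming the empty slots hold NA instead of "" (fruits_na and pairs_match are illustrative names). A comparison involving NA yields NA, so rowSums() returns NA for any row with a missing pair and filter() drops that row even when another pair matched; na.rm = TRUE counts those comparisons as non-matches:

```r
library(dplyr)

# The question's data with NA (hypothetical) in place of the empty strings
fruits_na <- data.frame(
  Color1 = c("red", "yellow", "green", "yellow"),
  Color2 = c("green", "green", "red", NA),
  Color3 = c("green", "red", NA, NA),
  Fruit1 = c("apple", "banana", "grape", "apple"),
  Fruit2 = c("mango", "plum", "apple", NA),
  Fruit3 = c("kiwi", "mango", NA, NA),
  stringsAsFactors = FALSE
)

# Logical matrix: TRUE where ColorN is "red" and FruitN is "apple"
pairs_match <- select(fruits_na, starts_with("Color")) == "red" &
  select(fruits_na, starts_with("Fruit")) == "apple"

# Without na.rm, row 3 sums to NA (its third pair is NA) and would be dropped
plain_sums <- rowSums(pairs_match)

# With na.rm = TRUE, NA comparisons are ignored, so rows 1 and 3 are kept
red_apples <- filter(fruits_na, rowSums(pairs_match, na.rm = TRUE) > 0)
```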

Assuming dplyr is loaded (library(dplyr)):

select(fruits, starts_with("Color")) == "red"
#   Color1 Color2 Color3
# 1   TRUE  FALSE  FALSE
# 2  FALSE  FALSE   TRUE
# 3  FALSE   TRUE  FALSE
# 4  FALSE  FALSE  FALSE
select(fruits, starts_with("Fruit")) == "apple"
#   Fruit1 Fruit2 Fruit3
# 1   TRUE  FALSE  FALSE
# 2  FALSE  FALSE  FALSE
# 3  FALSE   TRUE  FALSE
# 4   TRUE  FALSE  FALSE
select(fruits, starts_with("Color")) == "red" & select(fruits, starts_with("Fruit")) == "apple"
#   Color1 Color2 Color3
# 1   TRUE  FALSE  FALSE
# 2  FALSE  FALSE  FALSE
# 3  FALSE   TRUE  FALSE
# 4  FALSE  FALSE  FALSE
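The combined matrix also records which pair matched. A small sketch using base R's which() with arr.ind = TRUE, which returns the row and column index of every TRUE; here the column index is the pair number (1 = Color1/Fruit1, and so on). It rebuilds the question's data, and the names m and hits are only for illustration:

```r
library(dplyr)

fruits <- data.frame(
  Color1 = c("red", "yellow", "green", "yellow"),
  Color2 = c("green", "green", "red", ""),
  Color3 = c("green", "red", "", ""),
  Fruit1 = c("apple", "banana", "grape", "apple"),
  Fruit2 = c("mango", "plum", "apple", ""),
  Fruit3 = c("kiwi", "mango", "", ""),
  stringsAsFactors = FALSE
)

# Same combined logical matrix as above
m <- select(fruits, starts_with("Color")) == "red" &
  select(fruits, starts_with("Fruit")) == "apple"

# One row per TRUE: "row" is the data row, "col" is the pair number
hits <- which(m, arr.ind = TRUE)
hits
```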

From here, use rowSums() to count the matching pairs in each row and keep the rows with at least one:

fruits %>%
  filter(
    rowSums(
      select(., starts_with("Color")) == "red" &
        select(., starts_with("Fruit")) == "apple"
    ) > 0)
#   Color1 Color2 Color3 Fruit1 Fruit2 Fruit3
# 1    red  green  green  apple  mango   kiwi
# 3  green    red      .  grape  apple      .

Data. Because I didn't have yours initially, I first crafted this with "." in place of the empty strings (reading empty columns from the printed table takes more effort than I initially had time for); that is why "." appears in the output above.

fruits <- structure(list(Color1 = c("red", "yellow", "green", "yellow"), Color2 = c("green", "green", "red", "."), Color3 = c("green", "red", ".", "."), Fruit1 = c("apple", "banana", "grape", "apple"), Fruit2 = c("mango", "plum", "apple", "."), Fruit3 = c("kiwi", "mango", ".", ".")), class = "data.frame", row.names = c("1", "2", "3", "4"))
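The same rowSums() idea can be wrapped in a small helper so the color and fruit become arguments. A minimal sketch; filter_pairs() is a hypothetical name, it keeps the same-order assumption from the caveats, and it passes na.rm = TRUE so a missing pair does not drop the row:

```r
library(dplyr)

fruits <- data.frame(
  Color1 = c("red", "yellow", "green", "yellow"),
  Color2 = c("green", "green", "red", ""),
  Color3 = c("green", "red", "", ""),
  Fruit1 = c("apple", "banana", "grape", "apple"),
  Fruit2 = c("mango", "plum", "apple", ""),
  Fruit3 = c("kiwi", "mango", "", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows where some pair (ColorN, FruitN) equals
# (color, fruit). Assumes Color* and Fruit* columns are in matching order.
filter_pairs <- function(data, color, fruit) {
  hits <- select(data, starts_with("Color")) == color &
    select(data, starts_with("Fruit")) == fruit
  filter(data, rowSums(hits, na.rm = TRUE) > 0)
}

red_apples <- filter_pairs(fruits, "red", "apple")        # rows 1 and 3
green_mangoes <- filter_pairs(fruits, "green", "mango")   # row 1 (Color2/Fruit2)
```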

Answer:

I feel that your data might be in a less-than-ideal (i.e., not tidy) shape.

The task you're trying to achieve may be easier if you tidy up the data first.

library(tidyverse)
Color1 <- c("red", "yellow", "green", "yellow") 
Color2 <- c("green", "green", "red", "") 
Color3 <- c("green", "red", "", "") 
Fruit1 <- c("apple", "banana", "grape", "apple") 
Fruit2 <- c("mango", "plum", "apple", "") 
Fruit3 <- c("kiwi", "mango", "", "") 
fruits <- data.frame (Color1, Color2, Color3, Fruit1, Fruit2, Fruit3) 

long_fruits <- fruits %>% 
  ## following r2evans's suggestion, include a row identifier so the data can be re-pivoted if needed
  rownames_to_column("row_id") %>%
  ## split names like "Color1" into the value column ("Color") and the pair number ("1")
  pivot_longer(-"row_id", names_to = c(".value", "ID"), names_pattern = "(\\w+)(\\d+)")

long_fruits
#> # A tibble: 12 × 4
#>    row_id ID    Color    Fruit   
#>    <chr>  <chr> <chr>    <chr>   
#>  1 1      1     "red"    "apple" 
#>  2 1      2     "green"  "mango" 
#>  3 1      3     "green"  "kiwi"  
#>  4 2      1     "yellow" "banana"
#>  5 2      2     "green"  "plum"  
#>  6 2      3     "red"    "mango" 
#>  7 3      1     "green"  "grape" 
#>  8 3      2     "red"    "apple" 
#>  9 3      3     ""       ""      
#> 10 4      1     "yellow" "apple" 
#> 11 4      2     ""       ""      
#> 12 4      3     ""       ""

long_fruits %>%
  filter(Fruit == "apple", Color == "red")
#> # A tibble: 2 × 4
#>   row_id ID    Color Fruit
#>   <chr>  <chr> <chr> <chr>
#> 1 1      1     red   apple
#> 2 3      2     red   apple

Created on 2021-12-21 by the reprex package (v2.0.1)
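If you need the original wide rows (as in the question) rather than the long ones, the row_id column turns that into a join. A minimal sketch, assuming the question's fruits data; semi_join() keeps the rows of its first argument that have a match in the second, and fruits_id and red_apple_ids are illustrative names:

```r
library(dplyr)
library(tidyr)
library(tibble)

fruits <- data.frame(
  Color1 = c("red", "yellow", "green", "yellow"),
  Color2 = c("green", "green", "red", ""),
  Color3 = c("green", "red", "", ""),
  Fruit1 = c("apple", "banana", "grape", "apple"),
  Fruit2 = c("mango", "plum", "apple", ""),
  Fruit3 = c("kiwi", "mango", "", ""),
  stringsAsFactors = FALSE
)

# Wide data with an explicit row identifier ("1" to "4")
fruits_id <- rownames_to_column(fruits, "row_id")

long_fruits <- fruits_id %>%
  pivot_longer(-row_id, names_to = c(".value", "ID"), names_pattern = "(\\w+)(\\d+)")

# row_ids with at least one red apple pair
red_apple_ids <- long_fruits %>%
  filter(Fruit == "apple", Color == "red") %>%
  distinct(row_id)

# Back to the wide layout: the rows with row_id "1" and "3"
red_apples <- fruits_id %>%
  semi_join(red_apple_ids, by = "row_id")
```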

Answer:

Here's an alternative solution using tidyverse/purrr:

This will match the columns that end in the same number (e.g., Color1 and Fruit1, Color20 and Fruit20). Note that ends_with() is a plain suffix match, so with ten or more pairs ends_with("1") would also pick up Color11 and Fruit11.

There is an assumption that each color has a matching fruit column; otherwise the positional indexing of the second and third selected columns will fail. You can also replace "red" and "apple" with different values if needed.

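library(dplyr)
library(purrr)

# Return the rows of `data` whose Color<num> is "red" and Fruit<num> is "apple"
subset_func <- function(data, num) {
  
  # Keep an id plus the Color/Fruit pair ending in `num`; [[ ]] extracts vectors
  out <- data %>% 
    mutate(id = row_number()) %>% 
    select(id, ends_with(num)) %>% 
    filter(.[[2]] == "red" & .[[3]] == "apple") 

  # Return the matching rows of the original (wide) data
  data %>% 
    mutate(id = row_number()) %>% 
    filter(id %in% out$id) %>% 
    select(-id)
  
}

# Check each pair (1, 2, 3) and bind the results together
map_df(as.character(1:3), ~subset_func(fruits, .))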

This gives us:

  Color1 Color2 Color3 Fruit1 Fruit2 Fruit3
1    red  green  green  apple  mango   kiwi
3  green    red      .  grape  apple      .