I have a data frame called fruits where each row has up to 3 fruits with their corresponding color. Color1 goes with Fruit1, Color2 with Fruit2, and Color3 with Fruit3.
  Color1 Color2 Color3 Fruit1 Fruit2 Fruit3
1    red  green  green  apple  mango   kiwi
2 yellow  green    red banana   plum  mango
3  green    red         grape  apple
4 yellow                apple
Using dplyr, I can return the rows that contain apples (1, 3 and 4). And I can return the rows with red (1, 2 and 3).
red <- filter_at(fruits, vars(Color1:Color3), any_vars(. == "red"))
apple <- filter_at(fruits, vars(Fruit1:Fruit3), any_vars(. == "apple"))
But how do I return only red apples, i.e. just the first row (Color1 = red, Fruit1 = apple) and the third (Color2 = red, Fruit2 = apple)?
Thanks.
P.S. Here is the code for the table:
Color1 <- c("red", "yellow", "green", "yellow")
Color2 <- c("green", "green", "red", "")
Color3 <- c("green", "red", "", "")
Fruit1 <- c("apple", "banana", "grape", "apple")
Fruit2 <- c("mango", "plum", "apple", "")
Fruit3 <- c("kiwi", "mango", "", "")
fruits <- data.frame (Color1, Color2, Color3, Fruit1, Fruit2, Fruit3)
CodePudding user response:
You can work with the sets of columns independently, create logical matrices, then combine them logically with &.
Up front:
- if you have NA values in your data, this will need some modifications to work properly;
- this presumes that all columns are in the same order; for instance, if your columns were ordered "Color1, Color2, Color3" and "Fruit3, Fruit2, Fruit1", then this will not pair things correctly.
Assuming dplyr:
select(fruits, starts_with("Color")) == "red"
# Color1 Color2 Color3
# 1 TRUE FALSE FALSE
# 2 FALSE FALSE TRUE
# 3 FALSE TRUE FALSE
# 4 FALSE FALSE FALSE
select(fruits, starts_with("Fruit")) == "apple"
# Fruit1 Fruit2 Fruit3
# 1 TRUE FALSE FALSE
# 2 FALSE FALSE FALSE
# 3 FALSE TRUE FALSE
# 4 TRUE FALSE FALSE
select(fruits, starts_with("Color")) == "red" & select(fruits, starts_with("Fruit")) == "apple"
# Color1 Color2 Color3
# 1 TRUE FALSE FALSE
# 2 FALSE FALSE FALSE
# 3 FALSE TRUE FALSE
# 4 FALSE FALSE FALSE
From here,
fruits %>%
  filter(
    rowSums(
      select(., starts_with("Color")) == "red" &
        select(., starts_with("Fruit")) == "apple"
    ) > 0
  )
# Color1 Color2 Color3 Fruit1 Fruit2 Fruit3
# 1 red green green apple mango kiwi
# 3 green red . grape apple .
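On the NA caveat mentioned up front, here is a minimal sketch, assuming the empty cells are stored as NA rather than "" or ".". The fruits_na data frame and the red_apples_na name are made up for illustration. Comparing against NA yields NA, so a row such as row 3, which does contain a red apple, sums to NA and filter() drops it unless rowSums() is told to ignore missing values.

```r
library(dplyr)

# Hypothetical variant of the question's data, with NA in place of the empty strings
fruits_na <- data.frame(
  Color1 = c("red", "yellow", "green", "yellow"),
  Color2 = c("green", "green", "red", NA),
  Color3 = c("green", "red", NA, NA),
  Fruit1 = c("apple", "banana", "grape", "apple"),
  Fruit2 = c("mango", "plum", "apple", NA),
  Fruit3 = c("kiwi", "mango", NA, NA)
)

is_red   <- select(fruits_na, starts_with("Color")) == "red"
is_apple <- select(fruits_na, starts_with("Fruit")) == "apple"

# Without na.rm, rows 3 and 4 sum to NA because of the NA cells
rowSums(is_red & is_apple)

# With na.rm = TRUE, the NA cells are skipped and row 3 is kept
red_apples_na <- fruits_na %>%
  filter(rowSums(is_red & is_apple, na.rm = TRUE) > 0)
red_apples_na
```

Whether skipping NA is the right choice depends on what a missing value means in your data; this only shows one way to keep it from hiding real matches.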
Data. Because I didn't have yours initially, I first crafted this with "." in place of the empty cells (since reading empty columns takes more effort than I initially had time for).
fruits <- structure(list(Color1 = c("red", "yellow", "green", "yellow"), Color2 = c("green", "green", "red", "."), Color3 = c("green", "red", ".", "."), Fruit1 = c("apple", "banana", "grape", "apple"), Fruit2 = c("mango", "plum", "apple", "."), Fruit3 = c("kiwi", "mango", ".", ".")), class = "data.frame", row.names = c("1", "2", "3", "4"))
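On the column-order caveat: if you cannot rely on the Color and Fruit columns appearing in the same order, one option is to build both sets of column names from the shared numeric suffix. This is a base R sketch, not a tested general solution; fruits_shuffled, is_red_apple and red_apples_paired are hypothetical names, and it assumes every ColorN has a FruitN counterpart.

```r
# Hypothetical data: same values as the question, but with the Fruit columns reversed
fruits_shuffled <- data.frame(
  Color1 = c("red", "yellow", "green", "yellow"),
  Color2 = c("green", "green", "red", ""),
  Color3 = c("green", "red", "", ""),
  Fruit3 = c("kiwi", "mango", "", ""),
  Fruit2 = c("mango", "plum", "apple", ""),
  Fruit1 = c("apple", "banana", "grape", "apple")
)

# Take the suffixes from the Color columns, then build both name sets from them
suffixes   <- sub("^Color", "", grep("^Color", names(fruits_shuffled), value = TRUE))
color_cols <- paste0("Color", suffixes)
fruit_cols <- paste0("Fruit", suffixes)

# Column k of each matrix now refers to the same suffix, whatever the column order
is_red_apple <- fruits_shuffled[color_cols] == "red" &
  fruits_shuffled[fruit_cols] == "apple"

red_apples_paired <- fruits_shuffled[rowSums(is_red_apple) > 0, ]
red_apples_paired
```

Indexing by name puts the two matrices in matching order before they are combined with &, which is the only thing the positional version depends on.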
CodePudding user response:
I feel that your data might be in a less-than-ideal shape (not tidy).
The task you're trying to achieve would probably be easier if you tidy up the data first.
library(tidyverse)
Color1 <- c("red", "yellow", "green", "yellow")
Color2 <- c("green", "green", "red", "")
Color3 <- c("green", "red", "", "")
Fruit1 <- c("apple", "banana", "grape", "apple")
Fruit2 <- c("mango", "plum", "apple", "")
Fruit3 <- c("kiwi", "mango", "", "")
fruits <- data.frame (Color1, Color2, Color3, Fruit1, Fruit2, Fruit3)
long_fruits <- fruits %>%
  ## following r2evans's suggestion to include a row identifier in order to allow re-pivoting if needed
  rownames_to_column("row_id") %>%
  pivot_longer(-"row_id", names_to = c(".value", "ID"), names_pattern = "(\\w+)(\\d+)")
long_fruits
#> # A tibble: 12 × 4
#> row_id ID Color Fruit
#> <chr> <chr> <chr> <chr>
#> 1 1 1 "red" "apple"
#> 2 1 2 "green" "mango"
#> 3 1 3 "green" "kiwi"
#> 4 2 1 "yellow" "banana"
#> 5 2 2 "green" "plum"
#> 6 2 3 "red" "mango"
#> 7 3 1 "green" "grape"
#> 8 3 2 "red" "apple"
#> 9 3 3 "" ""
#> 10 4 1 "yellow" "apple"
#> 11 4 2 "" ""
#> 12 4 3 "" ""
long_fruits %>%
filter(Fruit == "apple", Color == "red")
#> # A tibble: 2 × 4
#> row_id ID Color Fruit
#> <chr> <chr> <chr> <chr>
#> 1 1 1 red apple
#> 2 3 2 red apple
Created on 2021-12-21 by the reprex package (v2.0.1)
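If you need the matching rows back in the original wide shape, the row identifier makes that possible. This is a sketch under the same assumptions as above (column names of the form Color1, Fruit1, and so on); hit_ids and wide_red_apples are names chosen here for illustration.

```r
library(dplyr)
library(tidyr)
library(tibble)

fruits <- data.frame(
  Color1 = c("red", "yellow", "green", "yellow"),
  Color2 = c("green", "green", "red", ""),
  Color3 = c("green", "red", "", ""),
  Fruit1 = c("apple", "banana", "grape", "apple"),
  Fruit2 = c("mango", "plum", "apple", ""),
  Fruit3 = c("kiwi", "mango", "", "")
)

long_fruits <- fruits %>%
  rownames_to_column("row_id") %>%
  pivot_longer(-row_id, names_to = c(".value", "ID"), names_pattern = "(\\w+)(\\d+)")

# Identifiers of the original rows that contain at least one red apple
hit_ids <- long_fruits %>%
  filter(Fruit == "apple", Color == "red") %>%
  pull(row_id)

# Keep every Color/Fruit pair from those rows, then spread back to Color1..Fruit3
wide_red_apples <- long_fruits %>%
  filter(row_id %in% hit_ids) %>%
  pivot_wider(names_from = ID, values_from = c(Color, Fruit), names_sep = "")
wide_red_apples
```

Filtering on row_id rather than on the matching pairs themselves keeps the other fruits of each row, so the re-pivoted result has the same columns as the original data plus row_id.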
CodePudding user response:
Here's an alternate solution using tidyverse/purrr:
This will match the columns that end in the same number (i.e., Color1 and Fruit1, Color20 and Fruit20).
This assumes that each color has a matching fruit; otherwise the indexing (.[[2]] and .[[3]], the color and fruit columns that follow id) will fail. You can also replace "red" and "apple" with different values if needed.
subset_func <- function(data, num) {
  # ids of the rows whose Color<num> is "red" and Fruit<num> is "apple";
  # [[ ]] extracts plain vectors, which filter() expects
  out <- data %>%
    mutate(id = row_number()) %>%
    select(id, ends_with(num)) %>%
    filter(.[[2]] == "red" & .[[3]] == "apple")
  # return the full original rows for those ids
  data %>%
    mutate(id = row_number()) %>%
    filter(id %in% out$id) %>%
    select(-id)
}
map_df(as.character(1:3), ~subset_func(fruits, .))
This gives us:
Color1 Color2 Color3 Fruit1 Fruit2 Fruit3
1 red green green apple mango kiwi
3 green red . grape apple .
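One thing to be aware of: because the results for each suffix are bound together, a row that holds a red apple in more than one position would come back once per match. If that matters, a possible variation is to collect the matching row positions first and subset once. This is only a sketch; fruits_twice and match_ids are hypothetical names, and the data below is made up to trigger the duplicate case.

```r
library(dplyr)
library(purrr)

# Hypothetical data: row 1 has a red apple in both position 1 and position 2
fruits_twice <- data.frame(
  Color1 = c("red", "yellow"),
  Color2 = c("red", "green"),
  Fruit1 = c("apple", "banana"),
  Fruit2 = c("apple", "plum")
)

# Row positions where the Color/Fruit pair ending in `num` is a red apple
match_ids <- function(data, num) {
  cols <- select(data, ends_with(num))
  which(cols[[1]] == "red" & cols[[2]] == "apple")
}

# Gather positions across all suffixes, drop repeats, then subset once
ids <- map(as.character(1:2), ~ match_ids(fruits_twice, .x)) %>%
  unlist() %>%
  unique() %>%
  sort()

red_apples_once <- slice(fruits_twice, ids)
red_apples_once
```

Here row 1 matches for both suffixes but appears only once in the result.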