Fill Columns with TRUE or FALSE based on other Dataframe-CodePudding

Unfortunately I have a problem and would have to ask for your support.

I have a list of the most popular pet names. From the dataframe x, it appears that "Jerry" is the most popular mouse name, "Garfield" is the second most popular cat name, and so on.

x <- data.frame(animals = c("Mice", "Mice", "Mice", "Cats", "Cats", "Cats", "Dogs", "Dogs", "Dogs" ,"Birds", "Birds", "Birds"),
            names = c("Jerry", "Speedy", "Feivel", "Tom", "Garfield", "Pumpkin", "Snoopy", "Pluto", "Sirius", "Tweety", "Donald", "Hedwig"),
            rank = rep(c(1,2,3),4))

Dataframe y also contains pets and their names:

y <- data.frame(animals = c("Mice", "Cats", "Cats", "Dogs", "Birds"),
            names = c("Pam", "Michael", "Garfield", "Sirius", "Tweety"))

I would now like to find out if the names of the animals in dataframe y are among the most popular animal names and if so, what the rank of that name would be. Thus, three columns should be created for each of the three ranks. If there is then a match with the most popular pet names a TRUE should be entered in the column with the corresponding rank. For example, "Pam" is not one of the most popular mouse names, so FALSE is entered in each of the three columns (or NA, I wouldn't care about that). On the other hand, Garfield is one of the most popular cat names, so the value TRUE should be entered in the rank2 column. This result should be stored in dataframe z.

z <-  data.frame(animals = c("Mice", "Cats", "Cats", "Dogs", "Birds"),
             names = c("Pam", "Michael", "Garfield", "Sirius", "Tweety"),
             rank1 = c(FALSE, FALSE, FALSE, FALSE, TRUE),
             rank2 = c(FALSE, FALSE, TRUE, FALSE, FALSE),
             rank3 = c(FALSE, FALSE, FALSE, TRUE, FALSE))

I already thought about using the functions spread() or pivot_wider() from the tidyr package, but unfortunately I can't get the right result.

I would be very happy about your help and thank you very much in advance!

CodePudding user response：

The simplest way to get ranks on your y object is to use left_join. Since you just want to create three simple columns, it is probably easiest to just use mutate:

library(dplyr)

x <- data.frame(animals = c("Mice", "Mice", "Mice", "Cats", "Cats", "Cats", "Dogs", "Dogs", "Dogs" ,"Birds", "Birds", "Birds"),
                names = c("Jerry", "Speedy", "Feivel", "Tom", "Garfield", "Pumpkin", "Snoopy", "Pluto", "Sirius", "Tweety", "Donald", "Hedwig"),
                rank = rep(c(1,2,3),4))

y <- data.frame(animals = c("Mice", "Cats", "Cats", "Dogs", "Birds"),
                names = c("Pam", "Michael", "Garfield", "Sirius", "Tweety"))

z = left_join(y, x) %>% 
  mutate(rank1 = rank == 1,
         rank2 = rank == 2,
         rank3 = rank == 3) %>% 
  select(-rank) %>% # removing the rank column
  print()
#Joining, by = c("animals", "names")
#  animals    names rank1 rank2 rank3
#1    Mice      Pam    NA    NA    NA
#2    Cats  Michael    NA    NA    NA
#3    Cats Garfield FALSE  TRUE FALSE
#4    Dogs   Sirius FALSE FALSE  TRUE
#5   Birds   Tweety  TRUE FALSE FALSE

CodePudding user response：

Another approach with left_join. What is different with the above solutions is that I only use one left_join, and use an ifelse statement to assign NA to FALSE, others to TRUE.

library(tidyverse)

left_join(y, x, by = c("animals", "names")) %>% 
  pivot_wider(names_from = "rank", values_from = "rank", names_prefix = "rank", names_sort = T) %>% 
  mutate(across(starts_with("rank"), ~ifelse(is.na(.x), F, T))) %>% 
  select(-rankNA)

# A tibble: 5 × 5
  animals names    rank1 rank2 rank3
  <chr>   <chr>    <lgl> <lgl> <lgl>
1 Mice    Pam      FALSE FALSE FALSE
2 Cats    Michael  FALSE FALSE FALSE
3 Cats    Garfield FALSE TRUE  FALSE
4 Dogs    Sirius   FALSE FALSE TRUE 
5 Birds   Tweety   TRUE  FALSE FALSE

CodePudding user response：

You can use pivot_wider after just one ìnner_join to dummy encode the ranks into multiple logical columns. This will discard animals which are not appearing in x and thus have a unknown rank.

library(tidyverse)

x <- data.frame(
  animals = c("Mice", "Mice", "Mice", "Cats", "Cats", "Cats", "Dogs", "Dogs", "Dogs", "Birds", "Birds", "Birds"),
  names = c("Jerry", "Speedy", "Feivel", "Tom", "Garfield", "Pumpkin", "Snoopy", "Pluto", "Sirius", "Tweety", "Donald", "Hedwig"),
  rank = rep(c(1, 2, 3), 4)
)


y <- data.frame(
  animals = c("Mice", "Cats", "Cats", "Dogs", "Birds"),
  names = c("Pam", "Michael", "Garfield", "Sirius", "Tweety")
)

y %>%
  inner_join(x) %>%
  # dummy encoding
  mutate(present = TRUE) %>%
  pivot_wider(
    names_from = rank, values_from = present, names_prefix = "rank",  
    values_fill = list(present = FALSE)
  )
#> Joining, by = c("animals", "names")
#> # A tibble: 3 × 5
#>   animals names    rank2 rank3 rank1
#>   <chr>   <chr>    <lgl> <lgl> <lgl>
#> 1 Cats    Garfield TRUE  FALSE FALSE
#> 2 Dogs    Sirius   FALSE TRUE  FALSE
#> 3 Birds   Tweety   FALSE FALSE TRUE

^{Created on 2022-04-07 by the reprex package (v2.0.0)}

CodePudding user response：

One option is to serially left_join() x onto y and then convert the NA into logical at the end.

library(tidyverse)

x <- data.frame(animals = c("Mice", "Mice", "Mice", "Cats", "Cats", "Cats", "Dogs", "Dogs", "Dogs" ,"Birds", "Birds", "Birds"),
            names = c("Jerry", "Speedy", "Feivel", "Tom", "Garfield", "Pumpkin", "Snoopy", "Pluto", "Sirius", "Tweety", "Donald", "Hedwig"),
            rank = rep(c(1,2,3),4))

y <- data.frame(animals = c("Mice", "Cats", "Cats", "Dogs", "Birds"),
            names = c("Pam", "Michael", "Garfield", "Sirius", "Tweety"))

y %>% 
  left_join(x %>% filter(rank == 1) %>% rename(rank1 = rank)) %>% 
  left_join(x %>% filter(rank == 2) %>% rename(rank2 = rank)) %>% 
  left_join(x %>% filter(rank == 3) %>% rename(rank3 = rank)) %>% 
  mutate(across(starts_with("rank"), ~if_else(is.na(.x), F, T)))
#>   animals    names rank1 rank2 rank3
#> 1    Mice      Pam FALSE FALSE FALSE
#> 2    Cats  Michael FALSE FALSE FALSE
#> 3    Cats Garfield FALSE  TRUE FALSE
#> 4    Dogs   Sirius FALSE FALSE  TRUE
#> 5   Birds   Tweety  TRUE FALSE FALSE

^{Created on 2022-04-07 by the reprex package (v2.0.1)}