Unfortunately I have a problem and would have to ask for your support.
I have a list of the most popular pet names. From the dataframe x, it appears that "Jerry" is the most popular mouse name, "Garfield" is the second most popular cat name, and so on.
x <- data.frame(animals = c("Mice", "Mice", "Mice", "Cats", "Cats", "Cats", "Dogs", "Dogs", "Dogs" ,"Birds", "Birds", "Birds"),
names = c("Jerry", "Speedy", "Feivel", "Tom", "Garfield", "Pumpkin", "Snoopy", "Pluto", "Sirius", "Tweety", "Donald", "Hedwig"),
rank = rep(c(1,2,3),4))
Dataframe y also contains pets and their names:
y <- data.frame(animals = c("Mice", "Cats", "Cats", "Dogs", "Birds"),
names = c("Pam", "Michael", "Garfield", "Sirius", "Tweety"))
I would now like to find out if the names of the animals in dataframe y are among the most popular animal names and if so, what the rank of that name would be. Thus, three columns should be created for each of the three ranks. If there is then a match with the most popular pet names a TRUE should be entered in the column with the corresponding rank. For example, "Pam" is not one of the most popular mouse names, so FALSE is entered in each of the three columns (or NA, I wouldn't care about that). On the other hand, Garfield is one of the most popular cat names, so the value TRUE should be entered in the rank2 column. This result should be stored in dataframe z.
z <- data.frame(animals = c("Mice", "Cats", "Cats", "Dogs", "Birds"),
names = c("Pam", "Michael", "Garfield", "Sirius", "Tweety"),
rank1 = c(FALSE, FALSE, FALSE, FALSE, TRUE),
rank2 = c(FALSE, FALSE, TRUE, FALSE, FALSE),
rank3 = c(FALSE, FALSE, FALSE, TRUE, FALSE))
I already thought about using the functions spread()
or pivot_wider()
from the tidyr
package, but unfortunately I can't get the right result.
I would be very happy about your help and thank you very much in advance!
CodePudding user response:
The simplest way to get ranks on your y
object is to use left_join
. Since you just want to create three simple columns, it is probably easiest to just use mutate
:
library(dplyr)
x <- data.frame(animals = c("Mice", "Mice", "Mice", "Cats", "Cats", "Cats", "Dogs", "Dogs", "Dogs" ,"Birds", "Birds", "Birds"),
names = c("Jerry", "Speedy", "Feivel", "Tom", "Garfield", "Pumpkin", "Snoopy", "Pluto", "Sirius", "Tweety", "Donald", "Hedwig"),
rank = rep(c(1,2,3),4))
y <- data.frame(animals = c("Mice", "Cats", "Cats", "Dogs", "Birds"),
names = c("Pam", "Michael", "Garfield", "Sirius", "Tweety"))
z = left_join(y, x) %>%
mutate(rank1 = rank == 1,
rank2 = rank == 2,
rank3 = rank == 3) %>%
select(-rank) %>% # removing the rank column
print()
#Joining, by = c("animals", "names")
# animals names rank1 rank2 rank3
#1 Mice Pam NA NA NA
#2 Cats Michael NA NA NA
#3 Cats Garfield FALSE TRUE FALSE
#4 Dogs Sirius FALSE FALSE TRUE
#5 Birds Tweety TRUE FALSE FALSE
CodePudding user response:
Another approach with left_join
. What is different with the above solutions is that I only use one left_join
, and use an ifelse
statement to assign NA
to FALSE
, others to TRUE
.
library(tidyverse)
left_join(y, x, by = c("animals", "names")) %>%
pivot_wider(names_from = "rank", values_from = "rank", names_prefix = "rank", names_sort = T) %>%
mutate(across(starts_with("rank"), ~ifelse(is.na(.x), F, T))) %>%
select(-rankNA)
# A tibble: 5 × 5
animals names rank1 rank2 rank3
<chr> <chr> <lgl> <lgl> <lgl>
1 Mice Pam FALSE FALSE FALSE
2 Cats Michael FALSE FALSE FALSE
3 Cats Garfield FALSE TRUE FALSE
4 Dogs Sirius FALSE FALSE TRUE
5 Birds Tweety TRUE FALSE FALSE
CodePudding user response:
You can use pivot_wider
after just one ìnner_join
to dummy encode the ranks into multiple logical columns. This will discard animals which are not appearing in x
and thus have a unknown rank.
library(tidyverse)
x <- data.frame(
animals = c("Mice", "Mice", "Mice", "Cats", "Cats", "Cats", "Dogs", "Dogs", "Dogs", "Birds", "Birds", "Birds"),
names = c("Jerry", "Speedy", "Feivel", "Tom", "Garfield", "Pumpkin", "Snoopy", "Pluto", "Sirius", "Tweety", "Donald", "Hedwig"),
rank = rep(c(1, 2, 3), 4)
)
y <- data.frame(
animals = c("Mice", "Cats", "Cats", "Dogs", "Birds"),
names = c("Pam", "Michael", "Garfield", "Sirius", "Tweety")
)
y %>%
inner_join(x) %>%
# dummy encoding
mutate(present = TRUE) %>%
pivot_wider(
names_from = rank, values_from = present, names_prefix = "rank",
values_fill = list(present = FALSE)
)
#> Joining, by = c("animals", "names")
#> # A tibble: 3 × 5
#> animals names rank2 rank3 rank1
#> <chr> <chr> <lgl> <lgl> <lgl>
#> 1 Cats Garfield TRUE FALSE FALSE
#> 2 Dogs Sirius FALSE TRUE FALSE
#> 3 Birds Tweety FALSE FALSE TRUE
Created on 2022-04-07 by the reprex package (v2.0.0)
CodePudding user response:
One option is to serially left_join()
x
onto y
and then convert the NA
into logical
at the end.
library(tidyverse)
x <- data.frame(animals = c("Mice", "Mice", "Mice", "Cats", "Cats", "Cats", "Dogs", "Dogs", "Dogs" ,"Birds", "Birds", "Birds"),
names = c("Jerry", "Speedy", "Feivel", "Tom", "Garfield", "Pumpkin", "Snoopy", "Pluto", "Sirius", "Tweety", "Donald", "Hedwig"),
rank = rep(c(1,2,3),4))
y <- data.frame(animals = c("Mice", "Cats", "Cats", "Dogs", "Birds"),
names = c("Pam", "Michael", "Garfield", "Sirius", "Tweety"))
y %>%
left_join(x %>% filter(rank == 1) %>% rename(rank1 = rank)) %>%
left_join(x %>% filter(rank == 2) %>% rename(rank2 = rank)) %>%
left_join(x %>% filter(rank == 3) %>% rename(rank3 = rank)) %>%
mutate(across(starts_with("rank"), ~if_else(is.na(.x), F, T)))
#> animals names rank1 rank2 rank3
#> 1 Mice Pam FALSE FALSE FALSE
#> 2 Cats Michael FALSE FALSE FALSE
#> 3 Cats Garfield FALSE TRUE FALSE
#> 4 Dogs Sirius FALSE FALSE TRUE
#> 5 Birds Tweety TRUE FALSE FALSE
Created on 2022-04-07 by the reprex package (v2.0.1)