I have two tibbles and need to index the data in one tibble and insert some specific data in the other tibble based on a variable in the first tibble.
I have two tibbles:
library(dplyr)
# Set seed
set.seed(10)
# Get data
df1 <- starwars %>%
select(name,species) %>%
filter(species %in% c("Human","Wookiee","Droid")) %>%
mutate(Fav_colour = sample(c("blue","red","green"),n(),replace=TRUE))
# Random table with typical colour preference
df2 <- tibble(Colour = c("blue","green","red"),
Human = c(0.5,0.3,0.1),
Wookiee = c(0.2,0.8,0.5),
Droid = c(0.3,0.1,0.5))
In df1 I need to insert the typical colour preference based on the species. To do this I can iterate through each row of the tibble in a for loop, add the relevant data, then compile into a list.
# Make empty list
data <- list()
# Iterate through each row
for (x in 1:nrow(df1)) {
# Take a slice
tmp <- slice(df1, x)
# Add new column to slice based on data in slice (species)
tmp$Typical <- df2 %>%
select(Colour,tmp$species) %>%
arrange(desc(.data[[tmp$species]])) %>%
slice_head(n = 1) %>%
select(Colour) %>%
as.character()
#Add data to list
data[[x]] <- tmp
}
#Recompile df1
df1 <- list.rbind(data)
I think there must be a more efficient way to do this, but can't figure out how to obtain filtered and arranged values from df2 without putting it through a for loop. I don't know how to do this, but is sapply with a function perhaps better option? What is the dplyr way of doing this without a for loop?
CodePudding user response:
It sounds like you want from df2
the largest value per species. If we pivot_longer
to make the species be specified in one column, and the value in another, we can group by species and keep the largest value. This lookup table (with colour value per species) can be joined to the original data.
df1 %>%
left_join(
df2 %>%
tidyr::pivot_longer(2:4, names_to = "species") %>%
group_by(species) %>%
slice_max(value)
)
Result
Joining with `by = join_by(species)`
# A tibble: 43 × 5
name species Fav_colour Colour value
<chr> <chr> <chr> <chr> <dbl>
1 Luke Skywalker Human green blue 0.5
2 C-3PO Droid blue red 0.5
3 R2-D2 Droid red red 0.5
4 Darth Vader Human green blue 0.5
5 Leia Organa Human red blue 0.5
6 Owen Lars Human green blue 0.5
7 Beru Whitesun lars Human green blue 0.5
8 R5-D4 Droid green red 0.5
9 Biggs Darklighter Human green blue 0.5
10 Obi-Wan Kenobi Human green blue 0.5
# … with 33 more rows
# ℹ Use `print(n = ...)` to see more rows
CodePudding user response:
Please check the alternate approach without the use of loop, check the df4 dataframe
df3 <- df2 %>%
pivot_longer(c('Human','Wookiee','Droid'),names_to = 'species', values_to = 'score') %>%
arrange(species, desc(score)) %>%
group_by(species) %>% slice(1)
df4 <- df1 %>% left_join(df3, by='species') %>% rename(Typical = Colour) %>% select(-score)