Home > Software engineering >  Rowwise mutation of tibble without for loop in R
Rowwise mutation of tibble without for loop in R

Time:01-18

I have two tibbles and need to index the data in one tibble and insert some specific data in the other tibble based on a variable in the first tibble.

I have two tibbles:

library(dplyr)

# Set seed
set.seed(10)

# Get data
df1 <-  starwars %>% 
  select(name,species) %>%
  filter(species %in% c("Human","Wookiee","Droid")) %>%
  mutate(Fav_colour = sample(c("blue","red","green"),n(),replace=TRUE))

# Random table with typical colour preference
df2 <- tibble(Colour = c("blue","green","red"),
                   Human = c(0.5,0.3,0.1),
                   Wookiee = c(0.2,0.8,0.5),
                   Droid = c(0.3,0.1,0.5))

In df1 I need to insert the typical colour preference based on the species. To do this I can iterate through each row of the tibble in a for loop, add the relevant data, then compile into a list.

# Make empty list
data <- list()

# Iterate through each row
for (x in 1:nrow(df1)) {
  # Take a slice
  tmp <- slice(df1, x)
  
  # Add new column to slice based on data in slice (species)
  tmp$Typical <- df2 %>%
    select(Colour,tmp$species) %>%
    arrange(desc(.data[[tmp$species]])) %>% 
    slice_head(n = 1) %>% 
    select(Colour) %>% 
    as.character()
    
  #Add data to list
  data[[x]] <- tmp
}

#Recompile df1
df1 <- list.rbind(data)

I think there must be a more efficient way to do this, but can't figure out how to obtain filtered and arranged values from df2 without putting it through a for loop. I don't know how to do this, but is sapply with a function perhaps better option? What is the dplyr way of doing this without a for loop?

CodePudding user response:

It sounds like you want from df2 the largest value per species. If we pivot_longer to make the species be specified in one column, and the value in another, we can group by species and keep the largest value. This lookup table (with colour value per species) can be joined to the original data.

df1 %>%
  left_join(
    df2 %>% 
      tidyr::pivot_longer(2:4, names_to = "species") %>%
      group_by(species) %>%
      slice_max(value)
  )

Result

Joining with `by = join_by(species)`
# A tibble: 43 × 5
   name               species Fav_colour Colour value
   <chr>              <chr>   <chr>      <chr>  <dbl>
 1 Luke Skywalker     Human   green      blue     0.5
 2 C-3PO              Droid   blue       red      0.5
 3 R2-D2              Droid   red        red      0.5
 4 Darth Vader        Human   green      blue     0.5
 5 Leia Organa        Human   red        blue     0.5
 6 Owen Lars          Human   green      blue     0.5
 7 Beru Whitesun lars Human   green      blue     0.5
 8 R5-D4              Droid   green      red      0.5
 9 Biggs Darklighter  Human   green      blue     0.5
10 Obi-Wan Kenobi     Human   green      blue     0.5
# … with 33 more rows
# ℹ Use `print(n = ...)` to see more rows

CodePudding user response:

Please check the alternate approach without the use of loop, check the df4 dataframe

df3 <- df2 %>% 
pivot_longer(c('Human','Wookiee','Droid'),names_to = 'species', values_to = 'score') %>% 
arrange(species, desc(score)) %>% 
  group_by(species) %>% slice(1)


df4 <- df1 %>% left_join(df3, by='species') %>% rename(Typical = Colour) %>% select(-score)
  • Related