Function in R / dplyr similar to VLOOKUP when using mutate()

Time:12-02

I want to use dplyr's mutate() to subtract from one variable a value that I have to look up based on two criteria.

Here's an example of what I want to do:

# Generating the data
mode_of_travel <- c("car", "car", "plane", "plane", "train", "train")
variant <- c("slow","fast","slow","fast","slow","fast") 
speed <- c(5, 7, 10, 14, 6, 7)
df <- data.frame(mode_of_travel, variant, speed)
# Data result
  mode_of_travel variant speed
1            car    slow     5
2            car    fast     7
3          plane    slow    10
4          plane    fast    14
5          train    slow     6
6          train    fast     7

Now I want to find, for every row, the difference between its speed and the speed of the "slow" variant of the same mode of travel:

library(dplyr)
# Computing each row's speed difference from its mode's "slow" variant
# (the slow speeds 5, 10 and 6 are typed in by hand here)
df %>% mutate(speed_difference = speed - case_when(mode_of_travel == "car" ~ 5,
                                                  mode_of_travel == "plane" ~ 10,
                                                  mode_of_travel == "train" ~ 6))

so the output looks like this:

  mode_of_travel variant speed speed_difference
1            car    slow     5                0
2            car    fast     7                2
3          plane    slow    10                0
4          plane    fast    14                4
5          train    slow     6                0
6          train    fast     7                1

But of course I don't want to do this manually by typing the values into case_when(). How can this be done properly?

Thanks :)

CodePudding user response:

I would describe this as: within each mode_of_travel group, you want to subtract the "slow" variant's speed from the current row's speed:

df %>%
  group_by(mode_of_travel) %>%
  mutate(speed_difference = speed - speed[variant == "slow"]) %>%
  ungroup()
# # A tibble: 6 × 4
#   mode_of_travel variant speed speed_difference
#   <chr>          <chr>   <dbl>            <dbl>
# 1 car            slow        5                0
# 2 car            fast        7                2
# 3 plane          slow       10                0
# 4 plane          fast       14                4
# 5 train          slow        6                0
# 6 train          fast        7                1

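This code assumes there is exactly one "slow" variant within each mode of travel group. If a group has none, or more than one, speed[variant == "slow"] no longer has length one, and the subtraction may either raise an error or silently recycle values.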
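
If you want something closer to a literal VLOOKUP, one option is to build a small lookup table of the "slow" speeds and join it back onto the data. This is only a sketch of that alternative, not part of the answer above; the names slow_lookup and slow_speed are made up for illustration.

```r
library(dplyr)

df <- data.frame(
  mode_of_travel = c("car", "car", "plane", "plane", "train", "train"),
  variant = c("slow", "fast", "slow", "fast", "slow", "fast"),
  speed = c(5, 7, 10, 14, 6, 7)
)

# Lookup table: one row per mode_of_travel holding its "slow" speed.
# `slow_lookup` and `slow_speed` are hypothetical names for this sketch.
slow_lookup <- df %>%
  filter(variant == "slow") %>%
  select(mode_of_travel, slow_speed = speed)

# left_join() plays the role of VLOOKUP: every row picks up the slow speed
# of its own mode, which is then subtracted and dropped again.
result <- df %>%
  left_join(slow_lookup, by = "mode_of_travel") %>%
  mutate(speed_difference = speed - slow_speed) %>%
  select(-slow_speed)

print(result)
```

With a join, a mode that has no "slow" row should end up with NA in speed_difference instead of an error, while a mode with several "slow" rows would get duplicated rows, so it is still worth checking the lookup table first.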
