Home > front end >  How to apply a sequence of custom operations on a group by
How to apply a sequence of custom operations on a group by

Time:05-11

I have a dataframe of the form:

ID Rank Var1 Var2
a 3 na 6
b 2 3 na
a 1 na 5
a 2 2 1
b 1 7 1

I am trying to apply a sequence of operations:

  1. Group by ID column
  2. Sort by Rank Column descending
  3. For each variable column, select the highest ranked non na value So the output would be
ID Var1 Var 2
a 2 6
b 3 1

So far I have

df %>% group_by(ID) %>% arange(desc(Rank))

But am unsure how to then filter each column for non NA and select the highest. The NA filter should be applied on a column basis only - if Var 1 has na, it should be excluded from the calculation of Var 1, but not for Var2.

CodePudding user response:

Select the first non-NA value across the selected variables after grouping:

library(dplyr)

df %>% 
  na_if("na") %>% 
  group_by(ID) %>% 
  arrange(desc(Rank), .by_group = T) %>% 
  summarise(across(Var1:Var2, ~ head(.x[!is.na(.x)], 1)))

# A tibble: 2 × 3
  ID    Var1  Var2 
  <chr> <chr> <chr>
1 a     2     6    
2 b     3     1    

or with first:

df %>% 
  na_if("na") %>% 
  group_by(ID) %>% 
  summarise(across(Var1:Var2, ~ first(.x[!is.na(.x)], order_by = "Rank")))

data

df <- read.table(header = T, text = "ID Rank    Var1    Var2
a   3   na  6
b   2   3   na
a   1   na  5
a   2   2   1
b   1   7   1")

CodePudding user response:

library(tidyverse)

tribble(
  ~ID, ~Rank, ~Var1, ~Var2,
  "a", 3, NA, 6,
  "b", 2, 3, NA,
  "a", 1, NA, 5,
  "a", 2, 2, 1,
  "b", 1, 7, 1
) |> 
  group_by(ID) |> 
  arrange(ID, desc(Rank)) |> 
  fill(everything(), .direction = "up") |> 
  summarise(across(starts_with("Var"), first))
#> # A tibble: 2 × 3
#>   ID     Var1  Var2
#>   <chr> <dbl> <dbl>
#> 1 a         2     6
#> 2 b         3     1

Created on 2022-05-11 by the reprex package (v2.0.1)

  • Related