How can I get a value for mutate from a different row in R/tidyverse?-CodePudding

In the dataframe (tibble) from the reprex below, there are two groups of rows I would like to group by. I could almost group by doc_num, but there are rows with different values of doc_num but belong in the same group. Specifically, the PO-100s go with the 15s and the PO-101s go with the 17s.

I understand that my attempt below fails because I have no way to reference the doc_num in the current row in that cur_data call since it would think I'm referencing the doc_num column in cur_data. But I'm not sure of how else to proceed.

It is guaranteed that only one value of pur_ord will be populated per group of distinct doc_num values, i.e. per grp.

The expected output is the same example_df but with a new column, grp, which I can group by later and has the values of five 15s and then six 17s (i.e. c(rep(15, 5), rep(17, 6))

How can I achieve this?

library(tidyverse)
set.seed(123)
example_df <- tibble(
  doc_num = c(rep(15, 4), "PO-100", rep(17, 4), rep("PO-101", 2)),
  pur_ord = c("Purchase Order #PO-100", rep(NA_character_, 4),
              "Purchase Order #PO-101", rep(NA_character_, 5)),
  x = rnorm(11),
  y = sample(LETTERS, 11)
)

example_df
#> # A tibble: 11 × 4
#>    doc_num pur_ord                      x y    
#>    <chr>   <chr>                    <dbl> <chr>
#>  1 15      Purchase Order #PO-100 -0.560  Y    
#>  2 15      <NA>                   -0.230  I    
#>  3 15      <NA>                    1.56   C    
#>  4 15      <NA>                    0.0705 H    
#>  5 PO-100  <NA>                    0.129  G    
#>  6 17      Purchase Order #PO-101  1.72   J    
#>  7 17      <NA>                    0.461  Z    
#>  8 17      <NA>                   -1.27   S    
#>  9 17      <NA>                   -0.687  D    
#> 10 PO-101  <NA>                   -0.446  N    
#> 11 PO-101  <NA>                    1.22   A

example_df %>% 
  mutate(
    grp = case_when(
      !str_starts(doc_num, "PO") ~ doc_num,
      TRUE ~ cur_data() %>% 
        filter(str_detect(pur_ord, doc_num)) %>% 
        pull(doc_num)
    )
  )
#> Error in `mutate()`:
#> ! Problem while computing `grp = case_when(...)`.
#> Caused by error in `case_when()`:
#> ! `TRUE ~ cur_data() %>% filter(str_detect(pur_ord, doc_num)) %>%
#>   pull(doc_num)` must be length 11 or one, not 0.

^{Created on 2022-03-04 by the reprex package (v2.0.1)}

CodePudding user response：

Here's a dplyr solution that should work even if the data aren't strictly ordered.

# 1. Group by each value of "doc_num" and find the unique "pur_ord".
# 2. Copy the PO number for each row of each group.
# 3. Ensure that this value is always present.
# 4. Use the PO number as a lookup into the original "doc_num" column.



new_df <- example_df %>% 
  group_by(doc_num) %>% 
  mutate(
    po_col = max(pur_ord, na.rm = T),
    po_col = gsub('.*(PO-\\d )', '\\1', po_col),
    po_col = ifelse(!is.na(po_col), po_col, doc_num)
  ) %>% 
  group_by(po_col) %>% 
  mutate(
    grp = unique(doc_num[!grepl('PO', doc_num)])
  ) %>% 
  ungroup %>% 
  select(-po_col)
    
   doc_num pur_ord                      x y     grp  
   <chr>   <chr>                    <dbl> <chr> <chr>
 1 15      Purchase Order #PO-100 -0.560  Y     15   
 2 15      NA                     -0.230  I     15   
 3 15      NA                      1.56   C     15   
 4 15      NA                      0.0705 H     15   
 5 PO-100  NA                      0.129  G     15   
 6 17      Purchase Order #PO-101  1.72   J     17   
 7 17      NA                      0.461  Z     17   
 8 17      NA                     -1.27   S     17   
 9 17      NA                     -0.687  D     17   
10 PO-101  NA                     -0.446  N     17   
11 PO-101  NA                      1.22   A     17

CodePudding user response：

Assuming your data is structured and ordered (i.e. the PO-XXX will always come last per desired group), you can do:

library(tidyverse)
example_df %>%
  mutate(grp = if_else(str_detect(doc_num, '^PO-'), NA_character_, doc_num)) %>%
  fill(grp)

# A tibble: 11 x 5
   doc_num pur_ord                      x y     grp  
   <chr>   <chr>                    <dbl> <chr> <chr>
 1 15      Purchase Order #PO-100 -0.560  Y     15   
 2 15      <NA>                   -0.230  I     15   
 3 15      <NA>                    1.56   C     15   
 4 15      <NA>                    0.0705 H     15   
 5 PO-100  <NA>                    0.129  G     15   
 6 17      Purchase Order #PO-101  1.72   J     17   
 7 17      <NA>                    0.461  Z     17   
 8 17      <NA>                   -1.27   S     17   
 9 17      <NA>                   -0.687  D     17   
10 PO-101  <NA>                   -0.446  N     17   
11 PO-101  <NA>                    1.22   A     17