Replace values based on presence of a string by group

Time:07-29

I have a data frame with a grouping variable "id" and a string variable "id_c". Within each group, there may be an 'id_c' with one or more trailing >.

example_df <- data.frame(
         id = c(1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 5),
         id_c = c("1", "1" , "1>", "2", "2", "3", "3", "4", "4", "4", "4>>", "5", "5>"))

   id id_c
1   1    1 #
2   1    1 #
3   1   1> # one trailing > in group 1
4   2    2 
5   2    2  
6   3    3  
7   3    3 
8   4    4  #
9   4    4  #
10  4    4  #
11  4  4>>  # two trailing > in group 4 
12  5    5  #
13  5   5>  # one trailing > in group 5

For each 'id', if there is an 'id_c' value with trailing > or >>, I want to append the same > or >> to the remaining rows of that group (i.e. those originally lacking >). It is a little hard to describe in words, so here is my desired output:

   id id_c 
1   1   1> 
2   1   1> 
3   1   1> 
4   2    2 
5   2    2 
6   3    3 
7   3    3 
8   4  4>> 
9   4  4>> 
10  4  4>> 
11  4  4>> 
12  5   5> 
13  5   5> 

CodePudding user response:

## the initial version of your question used vectors
id <- c(1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 5)
id_c <- c("1", "1", "1>", "2", "2", "3", "3", "4", "4", "4", "4>>", "5", "5>")

A base R approach, using look-up:

## rows with ">"
rowid <- grep(">", id_c)
## look-up index
lookup <- match(id, id[rowid], nomatch = 0L)
## replacement using look-up
repl <- id_c[rowid][lookup]
## fill-in
id_c[lookup > 0L] <- repl

id_c
# [1] "1>"  "1>"  "1>"  "2"   "2"   "3"   "3"   "4>>" "4>>" "4>>" "4>>" "5>" 
#[13] "5>" 

The idea is not that transparent, but the code is vectorized and no type conversion or string manipulation is involved.
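
To make the look-up steps easier to follow, here is a minimal sketch that wraps them in a function and applies it to the data frame from the question. The name `propagate_marker` is made up for illustration, and the sketch assumes that each group has at most one distinct marked value (if there are several, `match()` picks the first marked row of that group).

```r
# Hypothetical helper: same look-up idea as above, wrapped in a function.
propagate_marker <- function(id, id_c) {
  # positions of the values that contain ">"
  rowid <- grep(">", id_c, fixed = TRUE)
  # for every row: index into the marked rows of the same id, 0 if none
  lookup <- match(id, id[rowid], nomatch = 0L)
  # zero indices are dropped, so only rows of marked groups are overwritten
  id_c[lookup > 0L] <- id_c[rowid][lookup]
  id_c
}

example_df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 5),
  id_c = c("1", "1", "1>", "2", "2", "3", "3", "4", "4", "4", "4>>", "5", "5>"),
  stringsAsFactors = FALSE
)

example_df$id_c <- propagate_marker(example_df$id, example_df$id_c)
```

For this data, `lookup` is 1 for the rows of group 1, 2 for group 4, 3 for group 5 and 0 elsewhere, which is why groups 2 and 3 are left untouched.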

CodePudding user response:

Here's a dplyr approach. We first group_by the id column and find which records contain one or more ">" symbols. We also flag the records that originally lack ">", so that only those get the symbol appended; otherwise a record that already ends in ">" would receive an extra one.

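library(dplyr)
library(tidyr)
library(stringr)  # for str_extract()

example_df %>% 
  group_by(id) %>% 
  # extract the run of ">" (NA if absent) and flag the rows that lack it
  mutate(new_id_c = str_extract(id_c, ">+"),
         flag = is.na(new_id_c)) %>% 
  # copy the marker to the other rows of the group, in either direction
  fill(new_id_c, .direction = "updown") %>% 
  # append the marker to flagged rows only; drop the helper columns
  mutate(new_id_c = ifelse(flag & !is.na(new_id_c), paste0(id_c, new_id_c), id_c), .keep = "unused")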

# A tibble: 13 × 2
# Groups:   id [5]
      id new_id_c
   <dbl> <chr>   
 1     1 1>      
 2     1 1>      
 3     1 1>      
 4     2 2       
 5     2 2       
 6     3 3       
 7     3 3       
 8     4 4>>     
 9     4 4>>     
10     4 4>>     
11     4 4>>     
12     5 5>      
13     5 5>      
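
For comparison, the same extract / fill / append steps can be sketched in base R without extra packages. This is only an illustration, not part of the answer above: the variable names are made up, and picking the longest marker per group is an assumed tie-break rule for groups that might contain several different markers.

```r
example_df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 5),
  id_c = c("1", "1", "1>", "2", "2", "3", "3", "4", "4", "4", "4>>", "5", "5>"),
  stringsAsFactors = FALSE
)

# Step 1: keep only the trailing ">" part of each value ("" if there is none)
marker <- sub("^[^>]*", "", example_df$id_c)

# Step 2: within each id, use the longest marker found (assumed rule);
# ave() recycles the single value over all rows of the group
group_marker <- ave(marker, example_df$id,
                    FUN = function(m) m[which.max(nchar(m))])

# Step 3: append the marker only to rows that do not already carry one
needs_marker <- marker == ""
example_df$new_id_c <- ifelse(needs_marker,
                              paste0(example_df$id_c, group_marker),
                              example_df$id_c)
```

Groups without any ">" get an empty `group_marker`, so `paste0()` leaves their values unchanged.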