I have a data frame with a grouping variable "id" and a string variable "id_c". Within each group, there may be one 'id_c' value with one or more trailing ">" characters.
example_df <- data.frame(
  id   = c(1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 5),
  id_c = c("1", "1", "1>", "2", "2", "3", "3", "4", "4", "4", "4>>", "5", "5>"))
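id id_c
1 1 1 # lacks >, to be filled
2 1 1 # lacks >, to be filled
3 1 1> # one trailing > in group 1
4 2 2
5 2 2
6 3 3
7 3 3
8 4 4 # lacks >, to be filled
9 4 4 # lacks >, to be filled
10 4 4 # lacks >, to be filled
11 4 4>> # two trailing > in group 4
12 5 5 # lacks >, to be filled
13 5 5> # one trailing > in group 5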
For each 'id', if one 'id_c' value has a trailing ">" or ">>", I want to append that same ">" or ">>" to the remaining rows of the group (i.e. the rows originally lacking ">"). It is a little hard to describe in words, so here is my desired output:
id id_c
1 1 1>
2 1 1>
3 1 1>
4 2 2
5 2 2
6 3 3
7 3 3
8 4 4>>
9 4 4>>
10 4 4>>
11 4 4>>
12 5 5>
13 5 5>
Answer:
## the initial version of your question used vectors
id <- c(1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 5)
id_c <- c("1", "1", "1>", "2", "2", "3", "3", "4", "4", "4", "4>>", "5", "5>")
A base R approach, using look-up:
## rows with ">"
rowid <- grep(">", id_c)
## look-up index
lookup <- match(id, id[rowid], nomatch = 0L)
## replacement using look-up
repl <- id_c[rowid][lookup]
## fill-in
id_c[lookup > 0L] <- repl
id_c
# [1] "1>" "1>" "1>" "2" "2" "3" "3" "4>>" "4>>" "4>>" "4>>" "5>"
#[13] "5>"
The idea is not very transparent, but the code is fully vectorized and involves no type conversion or string manipulation.
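To make the look-up less opaque, here is the same code run on the example vectors, with the intermediate values spelled out in comments. The key trick is that `match(..., nomatch = 0L)` marks rows without a ">" row in their group with 0, and indexing with 0 silently drops that element, so `repl` ends up with exactly one value per row to fill.

```r
# Same vectors as in the answer above
id   <- c(1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 5)
id_c <- c("1", "1", "1>", "2", "2", "3", "3", "4", "4", "4", "4>>", "5", "5>")

# Positions of the rows that already carry a ">" suffix: 3, 11, 13
rowid <- grep(">", id_c)

# For every row, which of those marked rows shares its id (0 = none):
# 1 1 1 0 0 0 0 2 2 2 2 3 3
lookup <- match(id, id[rowid], nomatch = 0L)

# Zero indices are dropped, so repl holds one value per row to be filled:
# "1>" "1>" "1>" "4>>" "4>>" "4>>" "4>>" "5>" "5>"
repl <- id_c[rowid][lookup]

# TRUE: repl lines up with the rows selected by lookup > 0L in the fill-in step
length(repl) == sum(lookup > 0L)
```

Because `match()` returns only the first match, this assumes each id has at most one row containing ">"; if a group had several, all rows of that id would copy the first marked row.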
Answer:
Here's a dplyr approach. We first group_by the id column and extract the run of ">" symbols (if any) from each record. We also need to flag the records that originally have ">", so that we skip them when appending; otherwise they would get an additional ">". Finally, we fill the extracted suffix within each group and append it to the flagged records.
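library(dplyr)
library(tidyr)    # fill()
library(stringr)  # str_extract()

example_df %>%
  group_by(id) %>%
  mutate(new_id_c = str_extract(id_c, ">+"),  # ">" or ">>", NA if none
         flag = is.na(new_id_c)) %>%           # TRUE for rows lacking ">"
  fill(new_id_c, .direction = "updown") %>%     # spread the suffix within each id
  mutate(new_id_c = ifelse(flag & !is.na(new_id_c), paste0(id_c, new_id_c), id_c),
         .keep = "unused")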
# A tibble: 13 × 2
# Groups: id [5]
id new_id_c
<dbl> <chr>
1 1 1>
2 1 1>
3 1 1>
4 2 2
5 2 2
6 3 3
7 3 3
8 4 4>>
9 4 4>>
10 4 4>>
11 4 4>>
12 5 5>
13 5 5>
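For comparison, here is a minimal base R sketch of the same idea that avoids the flag column: strip any trailing ">" first, then re-append each group's suffix, so rows that already had the suffix are not doubled. It assumes ">" only ever appears as a trailing run; picking the longest suffix when a group has several different ones is an arbitrary choice made for this sketch, not something the question specifies.

```r
example_df <- data.frame(
  id   = c(1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 5),
  id_c = c("1", "1", "1>", "2", "2", "3", "3", "4", "4", "4", "4>>", "5", "5>")
)

# Value without its trailing run of ">" (e.g. "4>>" -> "4")
base <- sub(">+$", "", example_df$id_c)

# The trailing run itself: "", ">" or ">>"
suffix <- substring(example_df$id_c, nchar(base) + 1)

# Within each id, keep the longest suffix; ave() recycles the single
# returned string over all rows of the group
grp_suffix <- ave(suffix, example_df$id,
                  FUN = function(s) s[which.max(nchar(s))])

# Re-append the group suffix; rows that already had it are not doubled
example_df$id_c <- paste0(base, grp_suffix)
example_df$id_c
```

Stripping first trades the flag column for a small amount of string manipulation, which the look-up answer above avoids entirely.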