I am trying to concatenate certain row values (Strings) given varying conditions in R. I have flagged the row values in Flag (the flagging criteria are irrelevant in this example).
Notation: B is the beginning of a run and E is the end. 0 marks a row outside any run. 1 marks any row inside a run other than B and E. Your solution does not need to follow my convention.
Rules: Every run must begin with B and end with E. There can be any number of 1s in a run. All Strings positioned between B and E (both inclusive) are to be concatenated in the order in which they appear in the run, and the result replaces the B-string. 0-strings remain in the data frame. 1- and E-strings are removed after concatenation.
I haven't come up with anything close to the desired output.
set.seed(128)
df2 <- data.frame(Strings = sample(letters, 17, replace = TRUE),
                  Flag = c(0,"B",1,1,"E","B","E","B","E",0,"B",1,1,1,"E",0,0))
Strings Flag
1 d 0
2 r B
3 q 1
4 r 1
5 v E
6 f B
7 y E
8 u B
9 c E
10 x 0
11 h B
12 w 1
13 x 1
14 t 1
15 j E
16 d 0
17 j 0
Intermediate output.
Strings Flag Result
1 d 0 d
2 r B r q r v
3 q 1 q
4 r 1 r
5 v E v
6 f B f y
7 y E y
8 u B u c
9 c E c
10 x 0 x
11 h B h w x t j
12 w 1 w
13 x 1 x
14 t 1 t
15 j E j
16 d 0 d
17 j 0 j
Desired output.
Result
1 d
2 r q r v
3 f y
4 u c
5 x
6 h w x t j
7 d
8 j
CodePudding user response:
Using dplyr:
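library(dplyr)
set.seed(128)
df2 <- data.frame(Strings = sample(letters, 17, replace = TRUE),
                  Flag = c(0,"B",1,1,"E","B","E","B","E",0,"B",1,1,1,"E",0,0))
df2 %>%
  # start a new group at every B row and at every row that follows an E
  group_by(group = cumsum((Flag == "B") | (lag(Flag, 1, "0") == "E"))) %>%
  # on B rows, paste the whole group together; other rows keep their own string
  mutate(Result = if_else(Flag == "B", paste0(Strings, collapse = " "), Strings)) %>%
  # drop the 1 and E rows, which are now folded into their B row
  filter(!(Flag %in% c("1", "E"))) %>%
  ungroup() %>%
  select(-group, -Strings, -Flag)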
#> # A tibble: 8 × 1
#> Result
#> <chr>
#> 1 d
#> 2 r q r v
#> 3 f y
#> 4 u c
#> 5 x
#> 6 h w x t j
#> 7 d
#> 8 j
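To see why the grouping key works, it may help to compute it on its own. The following is a minimal base R sketch, assuming the key combines the two tests with a logical OR; the names prev and group are illustrative only, and prev is a hand-rolled stand-in for dplyr's lag(Flag, 1, "0").

```r
# Flags copied from the question
Flag <- c(0,"B",1,1,"E","B","E","B","E",0,"B",1,1,1,"E",0,0)

# Previous row's flag, padded with "0" for the first row
# (assumed equivalent of dplyr::lag(Flag, 1, "0"))
prev <- c("0", head(Flag, -1))

# The key increments at every B row and at every row right after an E
group <- cumsum(Flag == "B" | prev == "E")
group
```

Each run gets its own key, and so does the first 0 row after a run. Consecutive 0 rows (16 and 17 here) share a key, which is harmless because only B rows are pasted and every other surviving row keeps its own string.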
CodePudding user response:
Here is a solution that might help, though I am not sure I have understood the requirement correctly:
library(dplyr)
df2 %>%
  # every B row and every 0 row starts a new group
  mutate(Flag2 = cumsum(Flag == 'B' | Flag == '0')) %>%
  group_by(Flag2) %>%
  summarise(Result = paste0(Strings, collapse = ' '))
# A tibble: 8 × 2
Flag2 Result
<int> <chr>
1 1 d
2 2 r q r v
3 3 f y
4 4 u c
5 5 x
6 6 h w x t j
7 7 d
8 8 j
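The same idea can be sketched in base R without dplyr. This is only a sketch under two assumptions: every run is well-formed (it starts with B and ends with E, as the question's rules require), and the Strings column is hard-coded from the table printed in the question rather than regenerated with sample(). The names grp and res are illustrative.

```r
# Example data, with Strings typed in from the question's printed table
df2 <- data.frame(
  Strings = c("d","r","q","r","v","f","y","u","c","x","h","w","x","t","j","d","j"),
  Flag    = c(0,"B",1,1,"E","B","E","B","E",0,"B",1,1,1,"E",0,0),
  stringsAsFactors = FALSE
)

# A new group starts at every B row and at every 0 row
grp <- cumsum(df2$Flag %in% c("B", "0"))

# Split the strings by group and paste each group together in row order
res <- unname(vapply(split(df2$Strings, grp), paste, character(1), collapse = " "))
res
```

If a run could be missing its B or its E, the 1 and E rows would be attached to whichever B or 0 row came before them, so the flags would need to be validated first.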