Concatenate row values given varying conditions in R

Time:10-20

I am trying to concatenate certain row values (Strings) given varying conditions in R. I have flagged the row values in Flag (the flagging criteria are irrelevant in this example).

Notations: B is the beginning of a run and E the end. 0 is outside the run. 1 denotes any strings excluding B and E in the run. Your solution does not need to follow my convention.

Rules: Every run must begin with B and end with E. There can be any number of 1s in the run. All Strings positioned between B and E (both inclusive) are to be concatenated in the order in which they appear in the run, and the result replaces the B-string. 0-strings remain in the dataframe. 1- and E-strings are removed after concatenation.

I haven't come up with anything close to the desired output.

set.seed(128)
df2 <- data.frame(Strings = sample(letters, 17, replace = TRUE), 
                  Flag = c(0,"B",1,1,"E","B","E","B","E",0,"B",1,1,1,"E",0,0))

   Strings Flag
1        d    0
2        r    B
3        q    1
4        r    1
5        v    E
6        f    B
7        y    E
8        u    B
9        c    E
10       x    0
11       h    B
12       w    1
13       x    1
14       t    1
15       j    E
16       d    0
17       j    0

Intermediate output.

   Strings Flag    Result
1        d    0         d
2        r    B   r q r v
3        q    1         q
4        r    1         r
5        v    E         v
6        f    B       f y
7        y    E         y
8        u    B       u c
9        c    E         c
10       x    0         x
11       h    B h w x t j
12       w    1         w
13       x    1         x
14       t    1         t
15       j    E         j
16       d    0         d
17       j    0         j

Desired output.

     Result
1         d
2   r q r v
3       f y
4       u c
5         x
6 h w x t j
7         d
8         j

CodePudding user response:

Using dplyr:

library(dplyr)

set.seed(128)
df2 <- data.frame(Strings = sample(letters, 17, replace = TRUE), 
                  Flag = c(0,"B",1,1,"E","B","E","B","E",0,"B",1,1,1,"E",0,0))

df2 %>% 
  group_by(group = cumsum((Flag == "B") + (lag(Flag, 1, "0") == "E"))) %>% 
  mutate(Result=if_else(Flag=="B", paste0(Strings,collapse = " "),Strings)) %>% 
  filter(!(Flag %in% c("1", "E"))) %>% ungroup() %>% 
  select(-group, -Strings, -Flag)

#> # A tibble: 8 × 1
#>   Result   
#>   <chr>    
#> 1 d        
#> 2 r q r v  
#> 3 f y      
#> 4 u c      
#> 5 x        
#> 6 h w x t j
#> 7 d        
#> 8 j
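
To see why the grouping works, it may help to build the key on its own. The sketch below is base R only and is my reading of the idea, not part of the answer above: the key goes up by one at every B and again on the row right after every E, so each run and each stretch outside a run gets its own group. The vectors `strings` and `flag` are copied from the table printed in the question, and `prev_flag` is a hypothetical stand-in for `lag(Flag, 1, "0")`.

```r
# Data copied from the printed table in the question (no random sampling needed).
strings <- c("d", "r", "q", "r", "v", "f", "y", "u", "c", "x",
             "h", "w", "x", "t", "j", "d", "j")
flag    <- c("0", "B", "1", "1", "E", "B", "E", "B", "E", "0",
             "B", "1", "1", "1", "E", "0", "0")

# Base-R stand-in for dplyr::lag(Flag, 1, "0"): shift down by one, pad with "0".
prev_flag <- c("0", head(flag, -1))

# The key increases at each "B" and again on the row right after each "E".
group <- cumsum((flag == "B") + (prev_flag == "E"))

print(data.frame(strings, flag, group))
```

Note that the last two 0 rows end up in the same group. That is harmless in the pipeline above, because only B rows receive the pasted string and only 1 and E rows are filtered out.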

CodePudding user response:

Here is a solution that might help you. However, I am still not sure if I got your point correctly:

library(dplyr)

df2 %>%
  mutate(Flag2 = cumsum(Flag == 'B' | Flag == '0')) %>%
  group_by(Flag2) %>%
  summarise(Result = paste0(Strings, collapse = ' '))


# A tibble: 8 × 2
  Flag2 Result   
  <int> <chr>    
1     1 d        
2     2 r q r v  
3     3 f y      
4     4 u c      
5     5 x        
6     6 h w x t j
7     7 d        
8     8 j  
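
If you would rather avoid the dplyr dependency, the same grouping idea can be sketched in base R with `tapply`. This is only an illustration under the assumption that every 1 or E row belongs to the run opened by the nearest preceding B; the names `run_id` and `out` are made up for the example, and the data is again copied from the table in the question.

```r
# Data copied from the printed table in the question.
strings <- c("d", "r", "q", "r", "v", "f", "y", "u", "c", "x",
             "h", "w", "x", "t", "j", "d", "j")
flag    <- c("0", "B", "1", "1", "E", "B", "E", "B", "E", "0",
             "B", "1", "1", "1", "E", "0", "0")

# A new group starts at every "B" (start of a run) and every "0" (outside a run).
run_id <- cumsum(flag %in% c("B", "0"))

# Paste the strings of each group together, keeping their original order.
result <- tapply(strings, run_id, paste, collapse = " ")

# Drop the group names and return a one-column data frame.
out <- data.frame(Result = as.vector(unname(result)), stringsAsFactors = FALSE)
print(out)
```

Because 1 and E never start a new group, they are absorbed into the preceding B row, so no separate filtering step is needed.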