Convert strings A1.B1.C1 A1.B2.C1 to A1.B1 B2.C1

I would like to convert this v_in string into this v_out in R:

v_in <- 'A1.B1.C1 A1.B2.C1 A1.B2.C2'
v_out <- 'A1.B1 B2.C1 C2'

Another example:

v_in <- 'A1.B1.C1 A2.B1.C1 A3.B1.C1'
v_out <- 'A1 A2 A3.B1.C1'

Here are some other examples:

v_in <- 'FRA.UNR.A DEU.UNR.A'
v_out <- 'FRA DEU.UNR.A'

v_in <- 'FRA.UNR.A DEU.UNR.A ITA.GDP.A'
v_out <- 'FRA DEU ITA.UNR GDP.A'

v_in <- 'FRA.UNR.A FRA.GDP.Q'
v_out <- 'FRA.UNR GDP.A Q'

v_in <- 'A.B.C A.D.E A.D.F G.H.I'
v_out <- 'A G.B D H.C E F I'

The input pattern is: S1 S2 S3 (where S1 is A.B.C, and likewise for S2 and S3).

The output should be: X.Y.Z (where X is the unique A codes separated by spaces, Y is the unique B codes separated by spaces, and likewise for Z).

CodePudding user response：

Here is a solution:

In BASE R:

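my_fun <- function(v){
  # Split each input string on spaces to get its individual codes
  sapply(strsplit(v, '[ ]'), 
     function(x) 
      # Read the codes into one column per dot-separated part. The constant
      # `sep` column puts all rows in one group for aggregate() and is then
      # picked up as the `sep` argument of paste() by do.call().
      do.call(paste, aggregate(.~sep, cbind(read.table(text=x, sep='.'), sep ='.'), 
                   function(y)paste(unique(y), collapse = ' '))))
}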

my_fun(v_in)  # v_in is the character vector listed under DATA below
[1] "A1.B1 B2.C1 C2"        "A1 A2 A3.B1.C1"        "FRA DEU.UNR.A"        
[4] "FRA DEU ITA.UNR GDP.A" "FRA.UNR GDP.A Q"
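The same rule can also be spelled out step by step without read.table(), which keeps every field as character. This is only a minimal sketch: the helper name collapse_codes is hypothetical (it is not part of the answer above), and it assumes every code in a string has the same number of dot-separated parts.

```r
# Hypothetical helper: collapse one string of space-separated,
# dot-delimited codes into the unique values found at each position.
collapse_codes <- function(s) {
  # Split on spaces, then split each code on literal dots
  parts <- strsplit(strsplit(s, " ", fixed = TRUE)[[1]], ".", fixed = TRUE)
  # Assumption: all codes have the same number of fields
  stopifnot(length(unique(lengths(parts))) == 1)
  # One row per code, one column per position (character matrix)
  m <- do.call(rbind, parts)
  # Unique values per column in order of first appearance, joined by
  # spaces; the columns are then joined by dots
  paste(apply(m, 2, function(col) paste(unique(col), collapse = " ")),
        collapse = ".")
}

collapse_codes("A.B.C A.D.E A.D.F G.H.I")
# [1] "A G.B D H.C E F I"

# Apply it to several strings at once
x <- c("FRA.UNR.A DEU.UNR.A ITA.GDP.A", "FRA.UNR.A FRA.GDP.Q")
vapply(x, collapse_codes, character(1), USE.NAMES = FALSE)
# [1] "FRA DEU ITA.UNR GDP.A" "FRA.UNR GDP.A Q"
```

Unlike the three-part examples in the question, this sketch works for any number of dot-separated parts, as long as that number is the same for every code in the string.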

In tidyverse:

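library(tidyverse)
# Keep the original string in v_in and work on a copy in id
data.frame(v_in, id = v_in) %>%
  # One row per space-separated code
  separate_rows(id, sep='[ ]') %>%
  # Split each code on the dots into three columns
  separate(id, c('COUNTRY', 'VARIABLE', 'FREQUENCY')) %>%
  group_by(v_in) %>%
  # Unique values per column, joined by spaces
  summarise(across(everything(), ~paste0(unique(.x), collapse = ' '))) %>%
  # Rejoin the three columns with dots
  unite(v_out, -v_in, sep='.')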


  v_in                          v_out                
  <chr>                         <chr>                
1 A1.B1.C1 A1.B2.C1 A1.B2.C2    A1.B1 B2.C1 C2       
2 A1.B1.C1 A2.B1.C1 A3.B1.C1    A1 A2 A3.B1.C1       
3 FRA.UNR.A DEU.UNR.A           FRA DEU.UNR.A        
4 FRA.UNR.A DEU.UNR.A ITA.GDP.A FRA DEU ITA.UNR GDP.A
5 FRA.UNR.A FRA.GDP.Q           FRA.UNR GDP.A Q

DATA:

v_in <- c("A1.B1.C1 A1.B2.C1 A1.B2.C2", "A1.B1.C1 A2.B1.C1 A3.B1.C1", 
"FRA.UNR.A DEU.UNR.A", "FRA.UNR.A DEU.UNR.A ITA.GDP.A", "FRA.UNR.A FRA.GDP.Q")