I would like to convert the string v_in into v_out in R:
v_in <- 'A1.B1.C1 A1.B2.C1 A1.B2.C2'
v_out <- 'A1.B1 B2.C1 C2'
Another example:
v_in <- 'A1.B1.C1 A2.B1.C1 A3.B1.C1'
v_out <- 'A1 A2 A3.B1.C1'
Here are some other examples:
v_in <- 'FRA.UNR.A DEU.UNR.A'
v_out <- 'FRA DEU.UNR.A'
v_in <- 'FRA.UNR.A DEU.UNR.A ITA.GDP.A'
v_out <- 'FRA DEU ITA.UNR GDP.A'
v_in <- 'FRA.UNR.A FRA.GDP.Q'
v_out <- 'FRA.UNR GDP.A Q'
v_in <- 'A.B.C A.D.E A.D.F G.H.I'
v_out <- 'A G.B D H.C E F I'
The input pattern is S1 S2 S3, where S1 has the form A.B.C, and the same holds for S2 and S3.
The output should be X.Y.Z, where X is the unique A codes (separated by spaces), Y is the unique B codes (separated by spaces), and likewise for Z.
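The rule just described can also be written directly in base R. This is a sketch, not part of the original answer: `collapse_codes` is a hypothetical helper name, and it assumes every code in a string has the same number of dot-separated components. It splits each string into codes, stacks the codes' components into a character matrix (one row per code), and keeps the unique values of each column in order of first appearance.

```r
# Hypothetical helper: collapse 'S1 S2 S3' into 'X.Y.Z'.
# Assumes all codes in one string have the same number of '.'-separated parts.
collapse_codes <- function(v) {
  vapply(strsplit(v, ' ', fixed = TRUE), function(tokens) {
    # one row per code, one column per component (A, B, C)
    parts <- do.call(rbind, strsplit(tokens, '.', fixed = TRUE))
    # unique values of each column, in order of first appearance
    cols <- apply(parts, 2, function(col) paste(unique(col), collapse = ' '))
    paste(cols, collapse = '.')
  }, character(1))
}

collapse_codes(c('A1.B1.C1 A1.B2.C1 A1.B2.C2', 'A.B.C A.D.E A.D.F G.H.I'))
# returns "A1.B1 B2.C1 C2" and "A G.B D H.C E F I"
```

Because it splits with `fixed = TRUE` and never converts types, every component is kept exactly as written.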
Here is a solution:
In base R:
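my_fun <- function(v){
  sapply(strsplit(v, '[ ]'), function(x) {
    # one row per code, one column (V1, V2, V3) per dot-separated part;
    # colClasses stops read.table() from turning codes like "T"/"F" into logicals
    parts <- read.table(text = x, sep = '.', colClasses = 'character')
    # the constant 'sep' column gives aggregate() a single group; do.call() then
    # passes it to paste() as its sep argument, joining V1, V2, V3 with '.'
    agg <- aggregate(. ~ sep, cbind(parts, sep = '.'),
                     function(y) paste(unique(y), collapse = ' '))
    do.call(paste, agg)
  })
}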
my_fun(v_in)
[1] "A1.B1 B2.C1 C2" "A1 A2 A3.B1.C1" "FRA DEU.UNR.A"
[4] "FRA DEU ITA.UNR GDP.A" "FRA.UNR GDP.A Q"
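The least obvious step in my_fun is `do.call(paste, ...)`. aggregate() returns a one-row data frame whose columns are sep, V1, V2 and V3, and do.call() turns list names into argument names, so the sep column is matched to paste()'s `sep` argument while V1 to V3 are the strings being joined. The snippet below reproduces that call on a hand-built list (`one_row` is an illustrative name) whose values mirror the first example.

```r
# Stand-in for the one-row result of aggregate() in my_fun,
# using the values from the first example
one_row <- list(sep = '.', V1 = 'A1', V2 = 'B1 B2', V3 = 'C1 C2')

# do.call() turns list names into argument names, so this is the same as
# paste(V1 = 'A1', V2 = 'B1 B2', V3 = 'C1 C2', sep = '.')
joined <- do.call(paste, one_row)
joined
# [1] "A1.B1 B2.C1 C2"
```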
With the tidyverse:
library(tidyverse)
data.frame(v_in, id = v_in) %>%
  separate_rows(id, sep = '[ ]') %>%
  separate(id, c('COUNTRY', 'VARIABLE', 'FREQUENCY')) %>%
  group_by(v_in) %>%
  summarise(across(everything(), ~ paste0(unique(.x), collapse = ' '))) %>%
  unite(v_out, -v_in, sep = '.')
  v_in                          v_out
  <chr>                         <chr>
1 A1.B1.C1 A1.B2.C1 A1.B2.C2    A1.B1 B2.C1 C2
2 A1.B1.C1 A2.B1.C1 A3.B1.C1    A1 A2 A3.B1.C1
3 FRA.UNR.A DEU.UNR.A           FRA DEU.UNR.A
4 FRA.UNR.A DEU.UNR.A ITA.GDP.A FRA DEU ITA.UNR GDP.A
5 FRA.UNR.A FRA.GDP.Q           FRA.UNR GDP.A Q
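The tidyverse version calls separate() without a `sep` argument, so tidyr falls back to its default regular expression, which splits on any run of non-alphanumeric characters rather than only on dots. That is harmless for the codes in this question, but a code containing, say, an underscore would be split into extra pieces (which separate() drops with a warning by default). The base R snippet below applies the same patterns with strsplit() to a hypothetical code to show the difference. In the pipeline itself, passing `sep = '[.]'` to separate() would restrict splitting to dots; a bare `'.'` would not, because `sep` is a regular expression and `.` matches any character.

```r
code <- 'EA_19.UNR.A'   # hypothetical code containing an underscore

# separate()'s default sep, "[^[:alnum:]]+", matches any run of
# non-alphanumeric characters, so the underscore also acts as a delimiter
default_split <- strsplit(code, '[^[:alnum:]]+')[[1]]
default_split
# returns c("EA", "19", "UNR", "A")

# '[.]' matches a literal dot only, so 'EA_19' stays in one piece
dot_split <- strsplit(code, '[.]')[[1]]
dot_split
# returns c("EA_19", "UNR", "A")
```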
DATA:
v_in <- c("A1.B1.C1 A1.B2.C1 A1.B2.C2", "A1.B1.C1 A2.B1.C1 A3.B1.C1",
"FRA.UNR.A DEU.UNR.A", "FRA.UNR.A DEU.UNR.A ITA.GDP.A", "FRA.UNR.A FRA.GDP.Q")
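One caveat for the base R answer: read.table() guesses column types by default, so a component made up only of the letters T and F would be read as logical and pasted back as TRUE/FALSE. None of the codes above trigger this; the snippet below demonstrates it on hypothetical codes and shows that passing `colClasses = 'character'` keeps the fields exactly as written.

```r
codes <- c('FRA.GDP.T', 'DEU.GDP.F')   # hypothetical codes ending in T and F

# default: type.convert() turns a column of only "T"/"F" into logical
default_read <- read.table(text = codes, sep = '.')
class(default_read$V3)
# [1] "logical"
paste(unique(default_read$V3), collapse = ' ')
# [1] "TRUE FALSE"

# colClasses = 'character' skips the type conversion for every column
char_read <- read.table(text = codes, sep = '.', colClasses = 'character')
paste(unique(char_read$V3), collapse = ' ')
# [1] "T F"
```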