How to dynamically intercalate columns by pattern in R?

Time:10-08

This is a follow-up question. How can I dynamically intercalate (interleave) the columns of a bigger data set?

  • Rationale: I used a for loop to import 16 data frames. After that, I merged them all like this:
### Merge all data frames (ps: I got this code here on SO :)
 mergefun <- function(x, y) merge(x, y, by = "ID", all = TRUE)
 merged_DF <- Reduce(mergefun, dataList)
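
To make the merge step concrete, here is a minimal sketch with three toy data frames. The names df_a, df_b, df_c and their values are invented for illustration; they stand in for the 16 imported data frames.

```r
# Toy stand-ins for the imported data frames (hypothetical values)
df_a <- data.frame(ID = 1:3, NARR_G1_50_AAA = c(0.1, 0.2, 0.3))
df_b <- data.frame(ID = 1:3, NARR_G2_50_AAA = c(1.1, 1.2, 1.3))
df_c <- data.frame(ID = 1:3, ARG_G1_50_AAA  = c(2.1, 2.2, 2.3))
dataList <- list(df_a, df_b, df_c)

# Full outer join on "ID", applied pairwise from left to right
mergefun  <- function(x, y) merge(x, y, by = "ID", all = TRUE)
merged_DF <- Reduce(mergefun, dataList)

# "ID" comes first, followed by the other columns in merge order
names(merged_DF)
```

The columns come out in the order in which the data frames were merged, which is why a separate reordering step is needed afterwards.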

Each data frame has an "ID" column (identical across all of them), but the other column names differ (I created them based on answers to my other posts). Hence,

  • In total, I have these columns (the header of each data frame):
ID NARR_G1_50_AAA NARR_G1_50_AAC NARR_G1_50_AC NARR_G1_50_AB
ID NARR_G1_100_AAA NARR_G1_100_AAC NARR_G1_100_AC NARR_G1_100_AB
ID NARR_G1_150_AAA NARR_G1_150_AAC NARR_G1_150_AC NARR_G1_150_AB
ID NARR_G1_200_AAA NARR_G1_200_AAC NARR_G1_200_AC NARR_G1_200_AB

ID NARR_G2_50_AAA NARR_G2_50_AAC NARR_G2_50_AC NARR_G2_50_AB
ID NARR_G2_100_AAA NARR_G2_100_AAC NARR_G2_100_AC NARR_G2_100_AB
ID NARR_G2_150_AAA NARR_G2_150_AAC NARR_G2_150_AC NARR_G2_150_AB
ID NARR_G2_200_AAA NARR_G2_200_AAC NARR_G2_200_AC NARR_G2_200_AB

ID ARG_G1_50_AAA ARG_G1_50_AAC ARG_G1_50_AC ARG_G1_50_AB
ID ARG_G1_100_AAA ARG_G1_100_AAC ARG_G1_100_AC ARG_G1_100_AB
ID ARG_G1_150_AAA ARG_G1_150_AAC ARG_G1_150_AC ARG_G1_150_AB
ID ARG_G1_200_AAA ARG_G1_200_AAC ARG_G1_200_AC ARG_G1_200_AB

ID ARG_G2_50_AAA ARG_G2_50_AAC ARG_G2_50_AC ARG_G2_50_AB
ID ARG_G2_100_AAA ARG_G2_100_AAC ARG_G2_100_AC ARG_G2_100_AB
ID ARG_G2_150_AAA ARG_G2_150_AAC ARG_G2_150_AC ARG_G2_150_AB
ID ARG_G2_200_AAA ARG_G2_200_AAC ARG_G2_200_AC ARG_G2_200_AB

I need to arrange the columns of the joined data frame in these two orders:

SET 1 :

###Desired output 1:
NARR_G1_50_AAA, NARR_G2_50_AAA,
NARR_G1_50_AAC, NARR_G2_50_AAC,  
NARR_G1_50_AC, NARR_G2_50_AC, 
NARR_G1_50_AB, NARR_G2_50_AB,
ARG_G1_50_AAA, ARG_G2_50_AAA,
ARG_G1_50_AAC, ARG_G2_50_AAC,  
ARG_G1_50_AC, ARG_G2_50_AC, 
ARG_G1_50_AB, ARG_G2_50_AB........then with 100,150 and 200

SET 2 :

###Desired output 2:
NARR_G1_50_AAA, ARG_G1_50_AAA, 
NARR_G2_50_AAA, ARG_G2_50_AAA,  
NARR_G1_50_AAC, ARG_G1_50_AAC, 
NARR_G2_50_AAC, ARG_G2_50_AAC,
NARR_G1_50_AC, ARG_G1_50_AC, 
NARR_G2_50_AC, ARG_G2_50_AC,
NARR_G1_50_AB, ARG_G1_50_AB, 
NARR_G2_50_AB, ARG_G2_50_AB,........then with 100,150 and 200
  • I've tried many things, but I can't get the desired orders. The closest I got was this:
library(dplyr)  # provides %>% and select()
dfPaired <- merged_DF %>%   ### still doesn't produce the desired output
  # dplyr::select(sort(names(.))) %>%
  dplyr::select(order(gsub("G1", "G2", names(.))))

Question:

  • How can I get the desired orders (set 1 and set 2) without manually intercalating the columns in select() ?

  • Further notes:

SET 1:

I need to intercalate "G1" and "G2" within each variable, in increasing order of the number (50, then 100, then 150, then 200). Ex: NARR_G1_50_AAA, NARR_G2_50_AAA... There are 4 variables per number (AAA, AAC, AC and AB).

SET 2:

I need to intercalate "NARR" and "ARG" within each of G1 and G2, again in increasing order of the number (50, then 100, then 150, then 200). Ex: NARR_G1_50_AAA, ARG_G1_50_AAA, NARR_G2_50_AAA, ARG_G2_50_AAA...

Thanks in advance :)

CodePudding user response:

If a custom order is needed, one option is to split the column names at "_", convert the non-numeric parts to factors with levels in the desired order, and then order the names by those parts.

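lvls1 <- c("NARR", "ARG")
lvls2 <- c("G1", "G2")
lvls3 <- c("AAA", "AAC", "AC", "AB")
# v1 holds the column names to reorder (defined under 'data' below).
# With the real data: v1 <- names(merged_DF)[-1]  # assuming 'ID' is the first column
# Split each name at "_" into four columns (V1..V4); V3 is read as numeric
d1 <- read.table(text = v1, header = FALSE, sep = "_")
# Convert the non-numeric parts to factors with the custom level order
i1 <- !sapply(d1, is.numeric)
d1[i1] <- Map(factor, d1[i1], levels = list(lvls1, lvls2, lvls3))
# Sort by number, then NARR/ARG, then suffix, then G1/G2 (Set 1)
v2 <- v1[do.call(order, d1[c(3, 1, 4, 2)])]
library(dplyr)
merged_DF %>%
  select(ID, all_of(v2))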

where v2 is

> v2
 [1] "NARR_G1_50_AAA"  "NARR_G2_50_AAA"  "NARR_G1_50_AAC"  "NARR_G2_50_AAC"  "NARR_G1_50_AC"   "NARR_G2_50_AC"   "NARR_G1_50_AB"   "NARR_G2_50_AB"  
 [9] "ARG_G1_50_AAA"   "ARG_G2_50_AAA"   "ARG_G1_50_AAC"   "ARG_G2_50_AAC"   "ARG_G1_50_AC"    "ARG_G2_50_AC"    "ARG_G1_50_AB"    "ARG_G2_50_AB"   
[17] "NARR_G1_100_AAA" "NARR_G2_100_AAA" "NARR_G1_100_AAC" "NARR_G2_100_AAC" "NARR_G1_100_AC"  "NARR_G2_100_AC"  "NARR_G1_100_AB"  "NARR_G2_100_AB" 
[25] "ARG_G1_100_AAA"  "ARG_G2_100_AAA"  "ARG_G1_100_AAC"  "ARG_G2_100_AAC"  "ARG_G1_100_AC"   "ARG_G2_100_AC"   "ARG_G1_100_AB"   "ARG_G2_100_AB"  
[33] "NARR_G1_150_AAA" "NARR_G2_150_AAA" "NARR_G1_150_AAC" "NARR_G2_150_AAC" "NARR_G1_150_AC"  "NARR_G2_150_AC"  "NARR_G1_150_AB"  "NARR_G2_150_AB" 
[41] "ARG_G1_150_AAA"  "ARG_G2_150_AAA"  "ARG_G1_150_AAC"  "ARG_G2_150_AAC"  "ARG_G1_150_AC"   "ARG_G2_150_AC"   "ARG_G1_150_AB"   "ARG_G2_150_AB" 
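
The order shown above corresponds to Set 1. For Set 2, a minimal sketch, assuming the same split-and-factor approach, changes only the priority of the sort keys: number first, then the suffix, then G1/G2, then NARR/ARG. The names cols_demo, d_demo, i_demo and v_set2 are illustrative, and cols_demo is a small invented subset rather than the full set of columns.

```r
# Small illustrative subset of column names, in arbitrary order
cols_demo <- c("ARG_G2_50_AAA", "NARR_G1_100_AAA", "NARR_G2_50_AAA",
               "ARG_G1_50_AAC", "NARR_G1_50_AAA", "ARG_G1_50_AAA",
               "NARR_G1_50_AAC", "ARG_G1_100_AAA")

lvls1 <- c("NARR", "ARG")
lvls2 <- c("G1", "G2")
lvls3 <- c("AAA", "AAC", "AC", "AB")

# Split at "_": V1 = NARR/ARG, V2 = G1/G2, V3 = number, V4 = suffix
d_demo <- read.table(text = cols_demo, header = FALSE, sep = "_")

# Turn the non-numeric parts into factors with the custom level order
i_demo <- !sapply(d_demo, is.numeric)
d_demo[i_demo] <- Map(factor, d_demo[i_demo],
                      levels = list(lvls1, lvls2, lvls3))

# Set 2 key priority: number (V3), suffix (V4), G1/G2 (V2), NARR/ARG (V1)
v_set2 <- cols_demo[do.call(order, d_demo[c(3, 4, 2, 1)])]
v_set2
```

With the real data, the resulting vector could be passed to select(ID, all_of(v_set2)) in the same way as v2.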

data

# the column names in random order; the code above sorts them
v1 <- c("NARR_G1_100_AB", "NARR_G1_150_AAC", "NARR_G2_50_AB", "NARR_G1_150_AB", 
"NARR_G2_100_AAA", "NARR_G1_100_AAC", "ARG_G1_150_AC", "ARG_G2_50_AAA", 
"ARG_G2_150_AAA", "ARG_G1_50_AAA", "ARG_G2_100_AC", "NARR_G1_150_AAA", 
"NARR_G2_100_AC", "ARG_G1_50_AC", "NARR_G1_100_AAA", "ARG_G2_50_AB", 
"NARR_G1_150_AC", "ARG_G2_50_AAC", "ARG_G2_150_AB", "NARR_G2_100_AAC", 
"NARR_G2_150_AAA", "NARR_G1_100_AC", "ARG_G1_150_AB", "ARG_G1_50_AAC", 
"NARR_G1_50_AC", "ARG_G2_150_AAC", "NARR_G1_50_AAA", "NARR_G2_150_AB", 
"NARR_G2_150_AAC", "ARG_G1_150_AAA", "ARG_G2_50_AC", "NARR_G2_50_AC", 
"ARG_G1_150_AAC", "ARG_G1_100_AC", "ARG_G1_100_AAA", "NARR_G1_50_AAC", 
"NARR_G2_150_AC", "ARG_G1_100_AAC", "ARG_G2_100_AAA", "ARG_G2_100_AAC", 
"NARR_G1_50_AB", "NARR_G2_100_AB", "ARG_G2_100_AB", "ARG_G1_50_AB", 
"NARR_G2_50_AAA", "ARG_G1_100_AB", "ARG_G2_150_AC", "NARR_G2_50_AAC"
)