this is a follow up question . I wanna know how can I intercalate dynamically the columns in the bigger data set?
- Rationale: I've conducted a for-loop to import 16 dataframes. After that, I did this to merge all dataframes:
### Merge all dataframes: (ps: I got this code here in SO :)
mergefun <- function(x, y) merge(x, y, by= "ID", all = T)
merged_DF <- Reduce(mergefun, dataList)
Each dataframes has an "ID" column (which is the same for every one), but they have different column names (the ones I've created based on the other posts' answer). Hence,
- I have, in total (the
head()
of each dataframe):
ID NARR_G1_50_AAA NARR_G1_50_AAC NARR_G1_50_AC NARR_G1_50_AB
ID NARR_G1_100_AAA NARR_G1_100_AAC NARR_G1_100_AC NARR_G1_100_AB
ID NARR_G1_150_AAA NARR_G1_150_AAC NARR_G1_150_AC NARR_G1_150_AB
ID NARR_G1_200_AAA NARR_G1_200_AAC NARR_G1_200_AC NARR_G1_200_AB
ID NARR_G2_50_AAA NARR_G2_50_AAC NARR_G2_50_AC NARR_G2_50_AB
ID NARR_G2_100_AAA NARR_G2_100_AAC NARR_G2_100_AC NARR_G2_100_AB
ID NARR_G2_150_AAA NARR_G2_150_AAC NARR_G2_150_AC NARR_G2_150_AB
ID NARR_G2_200_AAA NARR_G2_200_AAC NARR_G2_200_AC NARR_G2_200_AB
ID ARG_G1_50_AAA ARG_G1_50_AAC ARG_G1_50_AC ARG_G1_50_AB
ID ARG_G1_100_AAA ARG_G1_100_AAC ARG_G1_100_AC ARG_G1_100_AB
ID ARG_G1_150_AAA ARG_G1_150_AAC ARG_G1_150_AC ARG_G1_150_AB
ID ARG_G1_200_AAA ARG_G1_200_AAC ARG_G1_200_AC ARG_G1_200_AB
ID ARG_G2_50_AAA ARG_G2_50_AAC ARG_G2_50_AC ARG_G2_50_AB
ID ARG_G2_100_AAA ARG_G2_100_AAC ARG_G2_100_AC ARG_G2_100_AB
ID ARG_G2_150_AAA ARG_G2_150_AAC ARG_G2_150_AC ARG_G2_150_AB
ID ARG_G2_200_AAA ARG_G2_200_AAC ARG_G2_200_AC ARG_G2_200_AB
I need two arrange the joined dataframe columns in these two orders:
SET 1 :
###Desired output 1:
NARR_G1_50_AAA, NARR_G2_50_AAA,
NARR_G1_50_AAC, NARR_G2_50_AAC,
NARR_G1_50_AC, NARR_G2_50_AC,
NARR_G1_50_AB, NARR_G2_50_AB,
ARG_G1_50_AAA, ARG_G2_50_AAA,
ARG_G1_50_AAC, ARG_G2_50_AAC,
ARG_G1_50_AC, ARG_G2_50_AC,
ARG_G1_50_AB, ARG_G2_50_AB........then with 100,150 and 200
SET 2 :
###Desired output 2:
NARR_G1_50_AAA, ARG_G1_50_AAA,
NARR_G2_50_AAA, ARG_G2_50_AAA,
NARR_G1_50_AAC, ARG_G1_50_AAC,
NARR_G2_50_AAC, ARG_G2_50_AAC,
NARR_G1_50_AC, ARG_G1_50_AC,
NARR_G2_50_AC, ARG_G2_50_AC,
NARR_G1_50_AB, ARG_G1_50_AB,
NARR_G2_50_AB, ARG_G2_50_AB,........then with 100,150 and 200
- I've tried many things, but I can't get the desired orders...the closer I got was this:
dfPaired <- merged_DF %>% ###still doesn't produce the desired output
# dplyr::select(sort(names(.))) %>%
dplyr::select(order(gsub("G1", "G2", names(.)))) %>%
Question:
How can I get the desired orders (set 1 and set 2) without manually intercalating the columns in
select()
?Further notes:
SET 1:
I need to intercalate (in increasing order 50, then 100, then 150, then 200) "G1" and "G2" within each variable. Ex: NARR_G1_50_AAA, NARR_G2_50_AAA... There are 4 per number (AAA, AAB, AC and AB)
SET 2:
I need to intercalate (in increasing order 50, then 100, then 150, then 200) "NARR" and "ARG" comparing G1 and G2. Such as: NARR_G1_50_AAA, NARR_G2_50_AAA... thanks in advance :)
CodePudding user response:
If it should be custom order, an option would be to split up the column names at _
, then convert to factor
with levels
specified in the order we wanted
lvls1 <- c("NARR", "ARG")
lvls2 <- c("G1", "G2")
lvls3 <- c("AAA", "AAC", "AC", "AB")
#v1 <- names(merged_DF)[-1] # assuming 'ID' is the first column
d1 <- read.table(text = v1, header = FALSE, sep = "_")
i1 <- !sapply(d1, is.numeric)
d1[i1] <- Map(factor, d1[i1], levels = list(lvls1, lvls2, lvls3))
v2 <- v1[do.call(order, d1[c(3, 1,4, 2)])]
library(dplyr)
merged_DF %>%
select(ID, all_of(v2))
where v2
is
> v2
[1] "NARR_G1_50_AAA" "NARR_G2_50_AAA" "NARR_G1_50_AAC" "NARR_G2_50_AAC" "NARR_G1_50_AC" "NARR_G2_50_AC" "NARR_G1_50_AB" "NARR_G2_50_AB"
[9] "ARG_G1_50_AAA" "ARG_G2_50_AAA" "ARG_G1_50_AAC" "ARG_G2_50_AAC" "ARG_G1_50_AC" "ARG_G2_50_AC" "ARG_G1_50_AB" "ARG_G2_50_AB"
[17] "NARR_G1_100_AAA" "NARR_G2_100_AAA" "NARR_G1_100_AAC" "NARR_G2_100_AAC" "NARR_G1_100_AC" "NARR_G2_100_AC" "NARR_G1_100_AB" "NARR_G2_100_AB"
[25] "ARG_G1_100_AAA" "ARG_G2_100_AAA" "ARG_G1_100_AAC" "ARG_G2_100_AAC" "ARG_G1_100_AC" "ARG_G2_100_AC" "ARG_G1_100_AB" "ARG_G2_100_AB"
[33] "NARR_G1_150_AAA" "NARR_G2_150_AAA" "NARR_G1_150_AAC" "NARR_G2_150_AAC" "NARR_G1_150_AC" "NARR_G2_150_AC" "NARR_G1_150_AB" "NARR_G2_150_AB"
[41] "ARG_G1_150_AAA" "ARG_G2_150_AAA" "ARG_G1_150_AAC" "ARG_G2_150_AAC" "ARG_G1_150_AC" "ARG_G2_150_AC" "ARG_G1_150_AB" "ARG_G2_150_AB"
data
# it is a random order of the column names which is ordered in the code
v1 <- c("NARR_G1_100_AB", "NARR_G1_150_AAC", "NARR_G2_50_AB", "NARR_G1_150_AB",
"NARR_G2_100_AAA", "NARR_G1_100_AAC", "ARG_G1_150_AC", "ARG_G2_50_AAA",
"ARG_G2_150_AAA", "ARG_G1_50_AAA", "ARG_G2_100_AC", "NARR_G1_150_AAA",
"NARR_G2_100_AC", "ARG_G1_50_AC", "NARR_G1_100_AAA", "ARG_G2_50_AB",
"NARR_G1_150_AC", "ARG_G2_50_AAC", "ARG_G2_150_AB", "NARR_G2_100_AAC",
"NARR_G2_150_AAA", "NARR_G1_100_AC", "ARG_G1_150_AB", "ARG_G1_50_AAC",
"NARR_G1_50_AC", "ARG_G2_150_AAC", "NARR_G1_50_AAA", "NARR_G2_150_AB",
"NARR_G2_150_AAC", "ARG_G1_150_AAA", "ARG_G2_50_AC", "NARR_G2_50_AC",
"ARG_G1_150_AAC", "ARG_G1_100_AC", "ARG_G1_100_AAA", "NARR_G1_50_AAC",
"NARR_G2_150_AC", "ARG_G1_100_AAC", "ARG_G2_100_AAA", "ARG_G2_100_AAC",
"NARR_G1_50_AB", "NARR_G2_100_AB", "ARG_G2_100_AB", "ARG_G1_50_AB",
"NARR_G2_50_AAA", "ARG_G1_100_AB", "ARG_G2_150_AC", "NARR_G2_50_AAC"
)