Home > other >  R Merging non-unique columns to consolidate data frame
R Merging non-unique columns to consolidate data frame

Time:01-16

I'm having issues figuring out how to merge non-unique columns that look like this:

2_2 2_3 2_4 2_2 3_2
1 2 3 NA NA
2 3 -1 NA NA
NA NA NA 3 -2
NA NA NA -2 4

To make them look like this:

2_2 2_3 2_4 3_2
1 2 3 NA
2 3 -1 NA
3 NA NA -2
-2 NA NA 4

Essentially reshaping any non-unique columns. I have a large data set to work with so this is becoming an issue!

CodePudding user response:

Note that data.frame doesn't allow for duplicate column names. Even if we create those, it may get modified when we apply functions as make.unique is automatically applied. Assuming we created the data.frame with duplicate names, an option is to use split.default to split the data into list of subset of data, then loop over the list with map and use coalesce

library(dplyr)
library(purrr)
 map_dfc(split.default(df1, names(df1)),~ invoke(coalesce, .x))

-output

# A tibble: 4 × 4
  `2_2` `2_3` `2_4` `3_2`
  <int> <int> <int> <int>
1     1     2     3    NA
2     2     3    -1    NA
3     3    NA    NA    -2
4    -2    NA    NA     4

data

df1 <- structure(list(`2_2` = c(1L, 2L, NA, NA), `2_3` = c(2L, 3L, NA, 
NA), `2_4` = c(3L, -1L, NA, NA), `2_2` = c(NA, NA, 3L, -2L), 
    `3_2` = c(NA, NA, -2L, 4L)), class = "data.frame", row.names = c(NA, 
-4L))

CodePudding user response:

Also using coalesce:

You use non-syntactic names. R is strict in using names see here https://adv-r.hadley.nz/names-values.html and also notice the explanation by @akrun:

library(dplyr)
df %>% 
  mutate(X2_2 = coalesce(X2_2, X2_2.1), .keep="unused")

  X2_2 X2_3 X2_4 X3_2
1    1    2    3   NA
2    2    3   -1   NA
3    3   NA   NA   -2
4   -2   NA   NA    4
  •  Tags:  
  • r
  • Related