I'm having issues figuring out how to merge non-unique columns that look like this:
2_2 | 2_3 | 2_4 | 2_2 | 3_2 |
---|---|---|---|---|
1 | 2 | 3 | NA | NA |
2 | 3 | -1 | NA | NA |
NA | NA | NA | 3 | -2 |
NA | NA | NA | -2 | 4 |
To make them look like this:
2_2 | 2_3 | 2_4 | 3_2 |
---|---|---|---|
1 | 2 | 3 | NA |
2 | 3 | -1 | NA |
3 | NA | NA | -2 |
-2 | NA | NA | 4 |
Essentially reshaping any non-unique columns. I have a large data set to work with so this is becoming an issue!
CodePudding user response:
Note that data.frame
doesn't allow for duplicate column names. Even if we create those, it may get modified when we apply functions as make.unique
is automatically applied. Assuming we created the data.frame with duplicate names, an option is to use split.default
to split the data into list of subset of data, then loop over the list
with map
and use coalesce
library(dplyr)
library(purrr)
map_dfc(split.default(df1, names(df1)),~ invoke(coalesce, .x))
-output
# A tibble: 4 × 4
`2_2` `2_3` `2_4` `3_2`
<int> <int> <int> <int>
1 1 2 3 NA
2 2 3 -1 NA
3 3 NA NA -2
4 -2 NA NA 4
data
df1 <- structure(list(`2_2` = c(1L, 2L, NA, NA), `2_3` = c(2L, 3L, NA,
NA), `2_4` = c(3L, -1L, NA, NA), `2_2` = c(NA, NA, 3L, -2L),
`3_2` = c(NA, NA, -2L, 4L)), class = "data.frame", row.names = c(NA,
-4L))
CodePudding user response:
Also using coalesce
:
You use non-syntactic names. R is strict in using names see here https://adv-r.hadley.nz/names-values.html and also notice the explanation by @akrun:
library(dplyr)
df %>%
mutate(X2_2 = coalesce(X2_2, X2_2.1), .keep="unused")
X2_2 X2_3 X2_4 X3_2
1 1 2 3 NA
2 2 3 -1 NA
3 3 NA NA -2
4 -2 NA NA 4