I have a dataframe much like this one
ID Col2 Col3 Col4 Col5
ID_1 JDH 43 FJS f8j
ID_8 ABC 73 *NA* *NA*
ID_7 AFE 03 *NA* k32
ID_8 *NA* *NA* FDS z7d
where I have some rows with the same name/ID (ID_8 in this example) that have different values in the following columns. I now want to merge these rows into a new row ID_8_New without NAs. The dataframe should look something like this:
ID Col2 Col3 Col4 Col5
ID_1 JDH 43 FJS f8j
ID_8 ABC 73 *NA* *NA*
ID_7 AFE 03 *NA* k32
ID_8 *NA* *NA* FDS z7d
ID_8_New ABC 73 FDS z7d
Does anyone have any tips for me how I can accomplish this? My real dataframe has multiple of those rows. Thank you!
CodePudding user response:
We could filter
library(dplyr)
library(stringr)
library(tidyr)
df1 %>%
group_by(ID) %>%
filter(n() > 1) %>%
summarise(across(everything(), ~ .x[order(is.na(.x))]), .groups = 'drop')%>%
drop_na() %>%
mutate(ID = str_c(ID, '_new')) %>%
bind_rows(df1, .)
-output
ID Col2 Col3 Col4 Col5
1 ID_1 JDH 43 FJS f8j
2 ID_8 ABC 73 <NA> <NA>
3 ID_7 AFE 3 <NA> k32
4 ID_8 <NA> NA FDS z7d
5 ID_8_new ABC 73 FDS z7d
-output
df1 <- structure(list(ID = c("ID_1", "ID_8", "ID_7", "ID_8"),
Col2 = c("JDH",
"ABC", "AFE", NA), Col3 = c(43L, 73L, 3L, NA), Col4 = c("FJS",
NA, NA, "FDS"), Col5 = c("f8j", NA, "k32", "z7d")),
class = "data.frame", row.names = c(NA,
-4L))
CodePudding user response:
You can return the maximum value per group:
library(dplyr)
dat %>%
group_by(ID) %>%
summarise(across(everything(), ~ max(.x, na.rm = T)))
output
# A tibble: 3 × 5
ID Col2 Col3 Col4 Col5
<chr> <chr> <chr> <chr> <chr>
1 ID_1 JDH 43 FJS f8j
2 ID_7 AFE 03 NA k32
3 ID_8 ABC 73 FDS z7d