For example imagine there is a dataset that looks like this
Edit:Added Date and Num column for extra context
ID|Date |Col1|Col2|Col3|Num
1 10-10 Y 5
1 10-10 Y Y 5
1 10-10 Y 5
2 09-17 Y 6
2 09-17 Y 6
3 12-14 Y 7
3 12-14 Y 7
4 06-06 Y 8
4 06-06
Is there a way to have an output that looks like this?
ID|Date |Col1|Col2|Col3|Num
1 10-10 Y Y Y 5
2 09-17 Y Y 6
3 12-14 Y Y 7
4 06-06 Y 8
structure(list(ID = c("000001", "000001", "000001", "000001",
"000001", "000001", "000001", "000001", "000001", "000001", "000002",
"000002", "000002", "000003", "000003", "000003", "000003", "000003",
"000003", "000003"), Date = structure(c(1570492800, 1570492800,
1570492800, 1570492800, 1570492800, 1570492800, 1570492800, 1570492800,
1570492800, 1570492800, 1570665600, 1570665600, 1570665600, 1570838400,
1570838400, 1570838400, 1570838400, 1570838400, 1570838400, 1570838400
), tzone = "UTC", class = c("POSIXct", "POSIXt")), Col1 = c(NA,
NA, NA, NA, "Y", NA, NA, "Y", NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA), Col2 = c("Y", "Y", "Y", "Y", "Y", "Y", "Y",
"Y", "Y", "Y", "Y", "Y", "Y", NA, NA, NA, NA, "Y", "Y", "Y"),
Col3 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, "Y", NA, NA, "Y", "Y", "Y"), Num1 = c(1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 861, 861, 861, 497, 497, 497, 497, 497,
497, 497)), row.names = c(NA, -20L), class = c("tbl_df",
"tbl", "data.frame"))
CodePudding user response:
We could group by 'ID', 'Date' and summarise
the rest of the columns (everything()
) by looping across
them and extracting the first non-NA element
library(dplyr)
df2 %>%
group_by(ID, Date) %>%
summarise(across(everything(), ~.x[!is.na(.x)][1]),
.groups = 'drop')
-output
# A tibble: 3 × 6
ID Date Col1 Col2 Col3 Num1
<chr> <dttm> <chr> <chr> <chr> <dbl>
1 000001 2019-10-08 00:00:00 Y Y <NA> 1
2 000002 2019-10-10 00:00:00 <NA> Y <NA> 861
3 000003 2019-10-12 00:00:00 <NA> Y Y 497
CodePudding user response:
A base R option with aggregate
ID Date Col1 Col2 Col3 Num1
1 000001 2019-10-08 Y Y 1
2 000002 2019-10-10 Y 861
3 000003 2019-10-12 Y Y 497