I have this dataframe where each cod_user has a presence (1) or absence (0) on each month. I would like to merge the values from the rows to one single row for each cod_user that gathers all the values.
# A tibble: 24 × 5
cod_user ene feb mar abr
<chr> <dbl> <dbl> <dbl> <dbl>
1 ES7823 1 NA NA NA
2 AR3442 1 NA NA NA
3 CO9382 1 NA NA NA
4 ES5611 1 NA NA NA
5 IT9982 1 NA NA NA
6 PT6628 1 NA NA NA
7 ES7823 NA 1 NA NA
8 AR3442 NA 1 NA NA
9 CO9382 NA 0 NA NA
10 ES5611 NA 1 NA NA
11 IT9982 NA 1 NA NA
12 PT6628 NA 0 NA NA
13 ES7823 NA NA 1 NA
14 AR3442 NA NA 1 NA
15 CO9382 NA NA 0 NA
16 ES5611 NA NA 1 NA
17 IT9982 NA NA 0 NA
18 PT6628 NA NA 0 NA
19 ES7823 NA NA NA 0
20 AR3442 NA NA NA 1
21 CO9382 NA NA NA 0
22 ES5611 NA NA NA 1
23 IT9982 NA NA NA 0
24 PT6628 NA NA NA 0
I want my output to look like this:
cod_user ene feb mar abr
<chr> <dbl> <dbl> <dbl> <dbl>
1 ES7823 1 1 1 0
2 AR3442 1 1 1 1
3 CO9382 1 0 0 0
Can someone help me? Thank you.
CodePudding user response:
Reshape the data to long format, get rid of the NA
's and reshape back to wide format.
suppressPackageStartupMessages({
library(dplyr)
library(tidyr)
})
df1 %>%
pivot_longer(-cod_user) %>%
drop_na() %>%
pivot_wider(cod_user)
#> # A tibble: 6 × 5
#> cod_user ene feb mar abr
#> <chr> <int> <int> <int> <int>
#> 1 ES7823 1 1 1 0
#> 2 AR3442 1 1 1 1
#> 3 CO9382 1 0 0 0
#> 4 ES5611 1 1 1 1
#> 5 IT9982 1 1 0 0
#> 6 PT6628 1 0 0 0
Created on 2022-12-18 with reprex v2.0.2
Data
df1 <- "cod_user ene feb mar abr
1 ES7823 1 NA NA NA
2 AR3442 1 NA NA NA
3 CO9382 1 NA NA NA
4 ES5611 1 NA NA NA
5 IT9982 1 NA NA NA
6 PT6628 1 NA NA NA
7 ES7823 NA 1 NA NA
8 AR3442 NA 1 NA NA
9 CO9382 NA 0 NA NA
10 ES5611 NA 1 NA NA
11 IT9982 NA 1 NA NA
12 PT6628 NA 0 NA NA
13 ES7823 NA NA 1 NA
14 AR3442 NA NA 1 NA
15 CO9382 NA NA 0 NA
16 ES5611 NA NA 1 NA
17 IT9982 NA NA 0 NA
18 PT6628 NA NA 0 NA
19 ES7823 NA NA NA 0
20 AR3442 NA NA NA 1
21 CO9382 NA NA NA 0
22 ES5611 NA NA NA 1
23 IT9982 NA NA NA 0
24 PT6628 NA NA NA 0"
df1 <- read.table(textConnection(df1), header = TRUE)
Created on 2022-12-18 with reprex v2.0.2
CodePudding user response:
Here is an alternative approach:
library(dplyr)
coalesce_by_column <- function(df) {
return(dplyr::coalesce(!!! as.list(df)))
}
df1 %>%
group_by(cod_user) %>%
summarise(across(everything(), coalesce_by_column))
cod_user ene feb mar abr
<chr> <int> <int> <int> <int>
1 AR3442 1 1 1 1
2 CO9382 1 0 0 0
3 ES5611 1 1 1 1
4 ES7823 1 1 1 0
5 IT9982 1 1 0 0
6 PT6628 1 0 0 0