Home > Enterprise >  Is there a way in R to merge rows based on condition?
Is there a way in R to merge rows based on condition?

Time:12-19

I have this dataframe where each cod_user has a presence (1) or absence (0) on each month. I would like to merge the values from the rows to one single row for each cod_user that gathers all the values.

# A tibble: 24 × 5
   cod_user   ene   feb   mar   abr
   <chr>    <dbl> <dbl> <dbl> <dbl>
 1 ES7823       1    NA    NA    NA
 2 AR3442       1    NA    NA    NA
 3 CO9382       1    NA    NA    NA
 4 ES5611       1    NA    NA    NA
 5 IT9982       1    NA    NA    NA
 6 PT6628       1    NA    NA    NA
 7 ES7823      NA     1    NA    NA
 8 AR3442      NA     1    NA    NA
 9 CO9382      NA     0    NA    NA
10 ES5611      NA     1    NA    NA
11 IT9982      NA     1    NA    NA
12 PT6628      NA     0    NA    NA
13 ES7823      NA    NA     1    NA
14 AR3442      NA    NA     1    NA
15 CO9382      NA    NA     0    NA
16 ES5611      NA    NA     1    NA
17 IT9982      NA    NA     0    NA
18 PT6628      NA    NA     0    NA
19 ES7823      NA    NA    NA     0
20 AR3442      NA    NA    NA     1
21 CO9382      NA    NA    NA     0
22 ES5611      NA    NA    NA     1
23 IT9982      NA    NA    NA     0
24 PT6628      NA    NA    NA     0

I want my output to look like this:

  cod_user   ene   feb   mar   abr
  <chr>    <dbl> <dbl> <dbl> <dbl>
1 ES7823       1     1     1     0
2 AR3442       1     1     1     1
3 CO9382       1     0     0     0

Can someone help me? Thank you.

CodePudding user response:

Reshape the data to long format, get rid of the NA's and reshape back to wide format.

suppressPackageStartupMessages({
  library(dplyr)
  library(tidyr)
})

df1 %>%
  pivot_longer(-cod_user) %>%
  drop_na() %>%
  pivot_wider(cod_user)
#> # A tibble: 6 × 5
#>   cod_user   ene   feb   mar   abr
#>   <chr>    <int> <int> <int> <int>
#> 1 ES7823       1     1     1     0
#> 2 AR3442       1     1     1     1
#> 3 CO9382       1     0     0     0
#> 4 ES5611       1     1     1     1
#> 5 IT9982       1     1     0     0
#> 6 PT6628       1     0     0     0

Created on 2022-12-18 with reprex v2.0.2


Data

df1 <- "cod_user   ene   feb   mar   abr
 1 ES7823       1    NA    NA    NA
 2 AR3442       1    NA    NA    NA
 3 CO9382       1    NA    NA    NA
 4 ES5611       1    NA    NA    NA
 5 IT9982       1    NA    NA    NA
 6 PT6628       1    NA    NA    NA
 7 ES7823      NA     1    NA    NA
 8 AR3442      NA     1    NA    NA
 9 CO9382      NA     0    NA    NA
10 ES5611      NA     1    NA    NA
11 IT9982      NA     1    NA    NA
12 PT6628      NA     0    NA    NA
13 ES7823      NA    NA     1    NA
14 AR3442      NA    NA     1    NA
15 CO9382      NA    NA     0    NA
16 ES5611      NA    NA     1    NA
17 IT9982      NA    NA     0    NA
18 PT6628      NA    NA     0    NA
19 ES7823      NA    NA    NA     0
20 AR3442      NA    NA    NA     1
21 CO9382      NA    NA    NA     0
22 ES5611      NA    NA    NA     1
23 IT9982      NA    NA    NA     0
24 PT6628      NA    NA    NA     0"
df1 <- read.table(textConnection(df1), header = TRUE)

Created on 2022-12-18 with reprex v2.0.2

CodePudding user response:

Here is an alternative approach:

library(dplyr)

coalesce_by_column <- function(df) {
  return(dplyr::coalesce(!!! as.list(df)))
}

df1 %>%
  group_by(cod_user) %>% 
  summarise(across(everything(), coalesce_by_column))       
  cod_user   ene   feb   mar   abr
  <chr>    <int> <int> <int> <int>
1 AR3442       1     1     1     1
2 CO9382       1     0     0     0
3 ES5611       1     1     1     1
4 ES7823       1     1     1     0
5 IT9982       1     1     0     0
6 PT6628       1     0     0     0
  • Related