Home > Enterprise >  r collapse binary response to single response
r collapse binary response to single response

Time:10-01

This is my dataset

 State   Response
 AL      NO
 AR      NO
 AK      NO
 AZ      NO
 CA      YES
 CA      NO
 CO      YES
 CO      NO
 FL      NO
 GA      NO

Some states are both Yes and No like CA, CO.. How can I collapse the Yes, No values per state and change the Yes, No to just Yes. This is my expected output

 State   Response
 AL      NO
 AR      NO
 AK      NO
 AZ      NO
 CA      YES
 CO      YES
 FL      NO
 GA      NO

Any suggestions much appreciated. Thanks.

CodePudding user response:

We can arrange the rows by 'State' and logical vector of 'Response', then filter the unique rows

library(dplyr)
df1 %>%
   arrange(State, Response != "YES") %>%
   filter(!duplicated(State))

-output

   State Response
1    AK       NO
2    AL       NO
3    AR       NO
4    AZ       NO
5    CA      YES
6    CO      YES
7    FL       NO
8    GA       NO

data

df1 <- structure(list(State = c("AL", "AR", "AK", "AZ", "CA", "CA", 
"CO", "CO", "FL", "GA"), Response = c("NO", "NO", "NO", "NO", 
"YES", "NO", "YES", "NO", "NO", "NO")), class = "data.frame", row.names = c(NA, 
-10L))

CodePudding user response:

I'd recommend distinct() (with .keep_all = TRUE to keep all columns, but drop duplicate rows) from dplyr. For example:

State <- c("AL", "AR", "AK", "AZ", "CA", "CA", "CO", "CO", "FL", "GA")
Response <- c("NO", "NO", "NO", "NO", "YES", "NO", "YES", "NO", "NO", "NO")

data <- bind_cols(
  State, Response
) %>% 
  rename(
    State = `...1`,
    Response = `...2`
  ) %>% 
  distinct(State, .keep_all= TRUE)

data

Output:

> data
# A tibble: 8 × 2
  State Response
  <chr> <chr>   
1 AL    NO      
2 AR    NO      
3 AK    NO      
4 AZ    NO      
5 CA    YES     
6 CO    YES     
7 FL    NO      
8 GA    NO 

CodePudding user response:

An alternative solution:

library(dplyr)
df %>% 
  count(State) %>% 
  mutate(Response = ifelse(n>1,"YES","NO")) %>% 
  select(-n)

EDITED:

library(dplyr)
df %>% 
  count(State) %>% 
  mutate(Response = ifelse(n>1,"YES",NA)) %>%
  replace(is.na(.),"NO") %>%
  select(-n)

EDITED 2:

library(dplyr)
df %>% 
  add_count(State) %>% 
  mutate(Response = ifelse(n>1,"YES",Response)) %>% 
  distinct(select(.,-n))
  • Related