This is my dataset
State Response
AL NO
AR NO
AK NO
AZ NO
CA YES
CA NO
CO YES
CO NO
FL NO
GA NO
Some states are both Yes and No like CA, CO.. How can I collapse the Yes, No values per state and change the Yes, No to just Yes. This is my expected output
State Response
AL NO
AR NO
AK NO
AZ NO
CA YES
CO YES
FL NO
GA NO
Any suggestions much appreciated. Thanks.
CodePudding user response:
We can arrange
the rows by 'State' and logical vector of 'Response', then filter
the unique rows
library(dplyr)
df1 %>%
arrange(State, Response != "YES") %>%
filter(!duplicated(State))
-output
State Response
1 AK NO
2 AL NO
3 AR NO
4 AZ NO
5 CA YES
6 CO YES
7 FL NO
8 GA NO
data
df1 <- structure(list(State = c("AL", "AR", "AK", "AZ", "CA", "CA",
"CO", "CO", "FL", "GA"), Response = c("NO", "NO", "NO", "NO",
"YES", "NO", "YES", "NO", "NO", "NO")), class = "data.frame", row.names = c(NA,
-10L))
CodePudding user response:
I'd recommend distinct()
(with .keep_all = TRUE
to keep all columns, but drop duplicate rows) from dplyr
. For example:
State <- c("AL", "AR", "AK", "AZ", "CA", "CA", "CO", "CO", "FL", "GA")
Response <- c("NO", "NO", "NO", "NO", "YES", "NO", "YES", "NO", "NO", "NO")
data <- bind_cols(
State, Response
) %>%
rename(
State = `...1`,
Response = `...2`
) %>%
distinct(State, .keep_all= TRUE)
data
Output:
> data
# A tibble: 8 × 2
State Response
<chr> <chr>
1 AL NO
2 AR NO
3 AK NO
4 AZ NO
5 CA YES
6 CO YES
7 FL NO
8 GA NO
CodePudding user response:
An alternative solution:
library(dplyr)
df %>%
count(State) %>%
mutate(Response = ifelse(n>1,"YES","NO")) %>%
select(-n)
EDITED:
library(dplyr)
df %>%
count(State) %>%
mutate(Response = ifelse(n>1,"YES",NA)) %>%
replace(is.na(.),"NO") %>%
select(-n)
EDITED 2:
library(dplyr)
df %>%
add_count(State) %>%
mutate(Response = ifelse(n>1,"YES",Response)) %>%
distinct(select(.,-n))