My data is like this:
country supporter1 supporter2 supporter3 supporter4 supporter5
USA Albania Germany USA NA NA
France USA France NA NA NA
UK UK Chile Peru NA NA
Germany USA Iran Mexico India Pakistan
USA China Spain NA NA NA
Cuba Cuba UK Germany South Korea NA
China Russia NA NA NA NA
What I want to do is to create a new variable when the country column and one of the remaining supporter columns (supporter 1, supporter 2, supporter 3, supporter 4, and supporter 5) are the same (for instance country France and supporter2 France are the same). In this case, the new variable should take 1, 0 otherwise.
I expect to have this:
country supporter1 supporter2 supporter3 supporter4 supporter5 new variable
USA Albania Germany USA NA NA 1
France USA France NA NA NA 1
UK UK Chile Peru NA NA 1
Germany USA Iran Mexico India Pakistan 0
USA China Spain NA NA NA 0
Cuba Cuba UK Germany South Korea NA 1
China Russia NA NA NA NA 0
CodePudding user response:
Update dplyr only solution Using if_any
:
library(dplyr)
df %>%
rowwise() %>%
mutate(new_var = as.integer(as.logical(if_any(starts_with("supporter"), ~ . %in% country))))
country supporter1 supporter2 supporter3 supporter4 supporter5 new_var
<chr> <chr> <chr> <chr> <chr> <chr> <int>
1 USA Albania Germany USA NA NA 1
2 France USA France NA NA NA 1
3 UK UK Chile Peru NA NA 1
4 Germany USA Iran Mexico India Pakistan 0
5 USA China Spain NA NA NA 0
6 Cuba Cuba UK Germany South Korea NA 1
7 China Russia NA NA NA NA 0
First answer: also correct: Here is one possible solution:
- calculate
rowwise
- check in cols
supporter1
tosupporter5
if country is included unite
all new columns to one and with an ifelse statement take1
or0
library(dplyr)
library(stringr)
library(tidyr)
df %>%
rowwise() %>%
mutate(across(supporter1:supporter5, ~ifelse(. %in% country, 1,0), .names = "new_{col}")) %>%
unite(New_Col, starts_with('new'), na.rm = TRUE, sep = ' ') %>%
mutate(New_Col = ifelse(str_detect(New_Col, "1"), 1,0))
country supporter1 supporter2 supporter3 supporter4 supporter5 New_Col
<chr> <chr> <chr> <chr> <chr> <chr> <dbl>
1 USA Albania Germany USA NA NA 1
2 France USA France NA NA NA 1
3 UK UK Chile Peru NA NA 1
4 Germany USA Iran Mexico India Pakistan 0
5 USA China Spain NA NA NA 0
6 Cuba Cuba UK Germany South Korea NA 1
7 China Russia NA NA NA NA 0
CodePudding user response:
Here is a base R solution.
First mapply
checks for equality of suporter*
and country
. NA
's are considered to return FALSE
. Then as.integer/rowSums
transforms rows with at least one TRUE
into 1, otherwise 0.
eq <- mapply(\(x, y){x == y & !is.na(x)}, df1[-1], df1[1])
as.integer(rowSums(eq) != 0)
#[1] 1 1 1 0 0 1 0
df1$new_variable <- as.integer(rowSums(eq) != 0)
Data
df1 <- read.table(text = "
country supporter1 supporter2 supporter3 supporter4 supporter5
USA Albania Germany USA NA NA
France USA France NA NA NA
UK UK Chile Peru NA NA
Germany USA Iran Mexico India Pakistan
USA China Spain NA NA NA
Cuba Cuba UK Germany 'South Korea' NA
China Russia NA NA NA NA
", header = TRUE)
CodePudding user response:
Another solution is checking per row whether country
is present in one of the columns:
df <- data.frame(country=c("USA","France","UK","Germany","USA","Cuba","China"),
supporter1=c("Albania","USA","UK","USA","China","Cuba","Russia"),
supporter2=c("Germany","France","Chile","Iran","Spain","UK","NA"),
supporter3=c("USA","NA","Peru","Mexico","NA","Germany","NA"),
supporter4=c("NA","NA","NA","India","NA","South Korea","NA"),
supporter5=c("NA","NA","NA","Pakistan","NA","NA","NA"))
That would give:
df$new <- sapply(seq(1,nrow(df)), function(x) ifelse(df$country[x] %in% df[x,2:6],1,0))
> df$new
[1] 1 1 1 0 0 1 0