I am studying voting in home and host country elections among the Turkish diaspora in Europe.
Since I already have a yes-no variable (meaning did you vote or not) for the home country election, I'd like to have the same for all host countries of my dataset.
As it stands, I have one variable per country, each coded as a party name, "none" or NA. For example, for Germany, D$partyvotedger is either "SPD", "none" or NA. They are character vectors.
1 NA
2 SPD
3 none / not chosen / not going to choose
4 NA
5 none / not chosen / not going to choose
6 none / not chosen / not going to choose
7 none / not chosen / not going to choose
8 NA
9 none / not chosen / not going to choose
10 none / not chosen / not going to choose
# … with 2,347 more rows
#
I'd like to combine all four countries into one variable coded "yes", "no" or NA, where "yes" occurs whenever a party name appears and "no" whenever "none" appears.
The result should look something like this (this is the vote in the home country election, where 0 = no and 1 = yes):
`D$vote_turkey`
<dbl>
1 0
2 0
3 1
4 1
5 1
6 1
7 1
8 0
9 0
10 1
Does anyone have an idea of what function or code structure I should use? Do I need to create a yes-no variable for each country first, before building a single combined one?
Thanks in advance for your answer.
All the best.
CodePudding user response:
Here is my answer. First I create an example data set based on what you have provided.
library(dplyr)
#creating the example data frame
example <- data.frame("partyvotedgr" = c(NA,
                                         "SPD",
                                         "none / not chosen / not going to choose",
                                         NA,
                                         "none / not chosen / not going to choose",
                                         "none / not chosen / not going to choose",
                                         "none / not chosen / not going to choose",
                                         NA,
                                         "none / not chosen / not going to choose",
                                         "none / not chosen / not going to choose",
                                         "CDU",
                                         "DIE LINKE"))
#checking what the table looks like
example
partyvotedgr
1 <NA>
2 SPD
3 none / not chosen / not going to choose
4 <NA>
5 none / not chosen / not going to choose
6 none / not chosen / not going to choose
7 none / not chosen / not going to choose
8 <NA>
9 none / not chosen / not going to choose
10 none / not chosen / not going to choose
11 CDU
12 DIE LINKE
I noticed that you're currently focusing on Germany so I added some additional political parties for better simulation.
Then you declare a character vector that lists the parties in the target country. For Germany, for instance, it would look like this.
#party names that count as having voted
party_value <- c("SPD", "CDU", "DIE LINKE")
Of course you should add more values to cover all the parties.
Then you create a column (`vote_ger`) similar to `vote_turkey`. I use NA as the base value.
#creating the 'vote_ger' column
example <- example %>%
  mutate(vote_ger = NA_character_)
Next, you change the `vote_ger` values according to the values in the `partyvotedgr` column.
#adjusting the vote_ger values
#creating the yes values according to the party_value
example <- example %>%
  mutate(vote_ger = ifelse(partyvotedgr %in% party_value,
                           "yes",
                           vote_ger))
Please note that the `%in%` operator tests set membership and returns logical values (either TRUE or FALSE). In this case, it returns whether each element of the `partyvotedgr` column is an element of the `party_value` vector.
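As a small illustration of that behaviour, here is a sketch on a toy vector (the vector `x` is made up for this example). It also shows one detail that matters for this approach: `%in%` treats NA as "no match", whereas `==` propagates NA.

```r
# Toy vector standing in for the partyvotedgr column (made up for illustration)
x <- c(NA, "SPD", "none / not chosen / not going to choose", "CDU")
party_value <- c("SPD", "CDU", "DIE LINKE")

# One logical value per element of x; NA is simply not matched
in_result <- x %in% party_value
print(in_result)

# By contrast, == returns NA wherever x is NA
eq_result <- x == "SPD"
print(eq_result)
```

Because `%in%` returns FALSE for the NA rows, the `ifelse()` step leaves their `vote_ger` value at the NA base value.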
You follow almost exactly the same process to assign the "no" values in the `vote_ger` column.
#creating the no values if they have the value 'none / not chosen / not going to choose' in the partyvotedgr column
example <- example %>%
  mutate(vote_ger = ifelse(partyvotedgr %in% "none / not chosen / not going to choose",
                           "no",
                           vote_ger))
This time, the `vote_ger` column gets a "no" wherever the `partyvotedgr` column holds the value "none / not chosen / not going to choose".
After all these steps, the data set `example` looks like this.
#the result
example
partyvotedgr vote_ger
1 <NA> <NA>
2 SPD yes
3 none / not chosen / not going to choose no
4 <NA> <NA>
5 none / not chosen / not going to choose no
6 none / not chosen / not going to choose no
7 none / not chosen / not going to choose no
8 <NA> <NA>
9 none / not chosen / not going to choose no
10 none / not chosen / not going to choose no
11 CDU yes
12 DIE LINKE yes
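To come back to the question of a single variable across several host countries, one possible sketch in base R follows. It rests on assumptions that are not in the question: the second column name `partyvotedfr` and the result name `vote_host` are hypothetical, and a respondent counts as "yes" as soon as a party name appears in any country column.

```r
# Hypothetical data: partyvotedfr and vote_host are invented names
D <- data.frame(
  partyvotedgr = c(NA, "SPD", "none / not chosen / not going to choose", NA),
  partyvotedfr = c(NA, NA, NA, "none / not chosen / not going to choose"),
  stringsAsFactors = FALSE
)
no_value <- "none / not chosen / not going to choose"
country_cols <- c("partyvotedgr", "partyvotedfr")

# Logical matrix: TRUE = party named, FALSE = "none", NA = missing
voted <- as.matrix(D[country_cols]) != no_value

# Assumed rule: "yes" if any country has a party, NA if every country is missing
any_yes <- rowSums(voted, na.rm = TRUE) > 0
all_na <- rowSums(!is.na(voted)) == 0
D$vote_host <- ifelse(all_na, NA_character_, ifelse(any_yes, "yes", "no"))
print(D)
```

With this layout there is no need to build a separate yes-no column per country first, and the list of party names does not have to be complete, since anything other than the "none" string or NA counts as a vote.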
CodePudding user response:
The solution is very simple in base R because comparison operators are vectorised.
If the "no" value is always the same string, then `!=` will be enough:
example$vote <- as.numeric(example$partyvotedgr != "none / not chosen / not going to choose")
example
partyvotedgr vote
1 <NA> NA
2 SPD 1
3 none / not chosen / not going to choose 0
4 <NA> NA
5 none / not chosen / not going to choose 0
6 none / not chosen / not going to choose 0
7 none / not chosen / not going to choose 0
8 <NA> NA
9 none / not chosen / not going to choose 0
10 none / not chosen / not going to choose 0
11 CDU 1
12 DIE LINKE 1
The `!=` comparison returns a logical vector of TRUE, FALSE and NA, and `as.numeric` turns TRUE/FALSE into 1/0 while leaving NA as NA. Consider whether it may be better to keep them as TRUE/FALSE.
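If there are several options that count as "no", the operator would be `%in%`, as in `!(example$partyvotedgr %in% c("none", "not chosen", "not going to choose"))`. Note that `%in%` never returns NA, so missing responses come out as TRUE here and have to be set back to NA separately.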
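One caveat with the `%in%` variant is that missing responses are not preserved, because `%in%` returns FALSE rather than NA for NA input. A minimal sketch of one way to keep them, using a made-up vector `x` in place of the real column:

```r
# Made-up responses standing in for a party column
x <- c(NA, "SPD", "none", "not chosen")
no_values <- c("none", "not chosen", "not going to choose")

# Negated %in% alone: the NA respondent is counted as having voted
naive <- !(x %in% no_values)
print(naive)

# Restore NA explicitly before converting to 1/0
vote <- ifelse(is.na(x), NA, !(x %in% no_values))
print(as.numeric(vote))
```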
Thank you to B_Heidel for the example df.