Change multiple party-choice variables into one yes/no variable for my whole sample

I am studying voting in home- and host-country elections among the Turkish diaspora in Europe.

Since I already have a yes-no variable (meaning did you vote or not) for the home country election, I'd like to have the same for all host countries of my dataset.

As it is, I have one variable per country, each coded as a party name, "none", or NA. For example, for Germany, D$partyvotedger is either a party name such as "SPD", "none", or NA. They are character vectors.

 1 NA
 2 SPD                                    
 3 none / not chosen / not going to choose
 4 NA                                     
 5 none / not chosen / not going to choose
 6 none / not chosen / not going to choose
 7 none / not chosen / not going to choose
 8 NA                                     
 9 none / not chosen / not going to choose
10 none / not chosen / not going to choose
# … with 2,347 more rows
#

I'd like to combine all four countries into one variable coded "yes", "no", or NA, where "yes" occurs whenever a party name appears and "no" whenever "none" appears.

I'd like to get something like this (this is the vote in the home-country election, where 0 = no and 1 = yes):

`D$vote_turkey`
             <dbl>
 1               0
 2               0
 3               1
 4               1
 5               1
 6               1
 7               1
 8               0
 9               0
10               1

Does anyone have an idea of what function or code structure I should use? Do I need to create a yes/no variable for each country first, before building the combined one?

Thanks in advance for your answer.

All the best.

CodePudding user response：

Here is my answer. First I create an example data set based on what you have provided.

library(dplyr)

#creating the example data frame
example <- data.frame("partyvotedgr" = c(NA,                                     
                                           "SPD",                               
                                           "none / not chosen / not going to choose",
                                           NA,
                                           "none / not chosen / not going to choose",
                                           "none / not chosen / not going to choose",
                                           "none / not chosen / not going to choose",
                                           NA,
                                           "none / not chosen / not going to choose",
                                           "none / not chosen / not going to choose",
                                           "CDU",
                                           "DIE LINKE"))
#checking what the table looks like
example
                              partyvotedgr
1                                     <NA>
2                                      SPD
3  none / not chosen / not going to choose
4                                     <NA>
5  none / not chosen / not going to choose
6  none / not chosen / not going to choose
7  none / not chosen / not going to choose
8                                     <NA>
9  none / not chosen / not going to choose
10 none / not chosen / not going to choose
11                                     CDU
12                               DIE LINKE

I noticed that you're currently focusing on Germany so I added some additional political parties for better simulation.

Then, you declare a character vector that contains the parties in a target country. For instance, it would look like this for Germany.

#party names that count as a vote (extend this to cover your data)
party_value <- c("SPD", "CDU", "DIE LINKE")

Of course you should add more values to cover all the parties.

Then you create a new column (vote_ger), analogous to vote_turkey, and fill it with NA as the base value.

#creating the 'vote_ger' column
example <- example %>% 
  mutate(vote_ger = NA_character_)

Next, you set the vote_ger values according to the values in the partyvotedgr column.

#adjusting the vote_ger values
#creating the yes values according to the party_value
example <- example %>%
  mutate(vote_ger=ifelse(partyvotedgr %in% party_value,
                         "yes",
                         vote_ger))

Please note that the %in% operator tests set membership and returns a logical value (TRUE or FALSE) for each element. Here it reports, for each element of the partyvotedgr column, whether that element appears in the party_value vector.
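To see why the NA rows stay NA in this approach, it may help to compare %in% with == on a small vector. This is only an illustrative sketch; the vector x is made up for the demonstration. For a missing value, %in% returns FALSE, so ifelse() falls through to the existing vote_ger value, which is still NA.

```r
# Hypothetical mini-vector, not the real data
x <- c("SPD", NA, "none / not chosen / not going to choose")

# %in% never returns NA: a missing value is simply "not in the set"
in_result <- x %in% c("SPD", "CDU")
print(in_result)   # TRUE FALSE FALSE

# == propagates NA instead
eq_result <- x == "SPD"
print(eq_result)   # TRUE NA FALSE
```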

You follow almost exactly the same process to assign the "no" values in the vote_ger column.

#creating the no values if they have the value 'none / not chosen / not going to choose' in the partyvotedgr column
example <- example %>%
  mutate(vote_ger=ifelse(partyvotedgr %in% "none / not chosen / not going to choose",
                         "no",
                         vote_ger))

This time, the vote_ger column gets "no" wherever the partyvotedgr column has the value "none / not chosen / not going to choose".

After these steps, the example data set looks like this.

#the result
example

                              partyvotedgr vote_ger
1                                     <NA>     <NA>
2                                      SPD      yes
3  none / not chosen / not going to choose       no
4                                     <NA>     <NA>
5  none / not chosen / not going to choose       no
6  none / not chosen / not going to choose       no
7  none / not chosen / not going to choose       no
8                                     <NA>     <NA>
9  none / not chosen / not going to choose       no
10 none / not chosen / not going to choose       no
11                                     CDU      yes
12                               DIE LINKE      yes
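If listing every party by hand is impractical, the three steps can probably be condensed into a single case_when() call. The sketch below rests on an assumption that is not confirmed by the question: every non-missing value other than the "none" label is a party name. The variable none_label and the shortened example data frame are introduced here only for illustration.

```r
library(dplyr)

# Assumed: this is the only label that means "did not vote"
none_label <- "none / not chosen / not going to choose"

# Shortened illustrative data, same column name as above
example <- data.frame(
  partyvotedgr = c(NA, "SPD", none_label, "CDU", "DIE LINKE")
)

example <- example %>%
  mutate(vote_ger = case_when(
    is.na(partyvotedgr)        ~ NA_character_,  # missing stays missing
    partyvotedgr == none_label ~ "no",           # explicit non-vote
    TRUE                       ~ "yes"           # anything else: a party
  ))

print(example)
```

The trade-off is that a typo or an unexpected label in the data would silently be counted as "yes", whereas the explicit party_value vector above leaves such rows as NA.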

CodePudding user response：

The solution is very simple in base R because R vectorises comparison operators for you.

If the "no" value is always the same string, then != will be enough:

example$vote <- as.numeric(example$partyvotedgr != "none / not chosen / not going to choose")

example
                              partyvotedgr vote
1                                     <NA>   NA
2                                      SPD    1
3  none / not chosen / not going to choose    0
4                                     <NA>   NA
5  none / not chosen / not going to choose    0
6  none / not chosen / not going to choose    0
7  none / not chosen / not going to choose    0
8                                     <NA>   NA
9  none / not chosen / not going to choose    0
10 none / not chosen / not going to choose    0
11                                     CDU    1
12                               DIE LINKE    1

The != comparison returns a logical vector of TRUE, FALSE and NA, and as.numeric turns TRUE/FALSE into 1/0 while leaving NA as NA. Consider whether it may be better to keep them as TRUE/FALSE.
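As a small illustration of what keeping the logical vector allows, the sketch below counts voters and computes the share of voters among respondents who answered. The vector is made-up data using the same labels as the example, and none_label is a name introduced here.

```r
none_label <- "none / not chosen / not going to choose"

# Hypothetical responses: one missing, two parties, two "none"
partyvotedgr <- c(NA, "SPD", none_label, none_label, "CDU")

voted <- partyvotedgr != none_label   # NA TRUE FALSE FALSE TRUE

n_voters <- sum(voted, na.rm = TRUE)   # number of TRUE values
turnout  <- mean(voted, na.rm = TRUE)  # share of TRUE among non-missing

print(n_voters)  # 2
print(turnout)   # 0.5
```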

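If there are several options that count as "no", the operator would be %in%, as in !(example$partyvotedgr %in% c("none", "not chosen", "not going to choose")). Be aware that %in% returns FALSE rather than NA for missing values, so on its own this expression codes the NA rows as TRUE; wrapping it as ifelse(is.na(example$partyvotedgr), NA, !(example$partyvotedgr %in% c(...))) keeps them missing.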
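The question also asks for a single variable across all host countries. A minimal base R sketch of one way to do that follows. It assumes this combination rule: 1 if a party is named in any country column, 0 if at least one column says "none" and no party is named, NA if every column is missing. The column partyvotednl, the data values, and the helper voted_flag() are hypothetical and stand in for the real country columns.

```r
# Labels that count as "did not vote" (extend as needed)
none_labels <- c("none / not chosen / not going to choose")

# TRUE = named a party, FALSE = "none", NA = no answer
voted_flag <- function(x) ifelse(is.na(x), NA, !(x %in% none_labels))

# Hypothetical data: two host-country columns
D <- data.frame(
  partyvotedger = c("SPD", none_labels[1], NA, NA, none_labels[1]),
  partyvotednl  = c(NA, NA, "PvdA", NA, "PvdA")
)
party_cols <- c("partyvotedger", "partyvotednl")

# One logical column per country, bound into a matrix
flags <- do.call(cbind, lapply(D[party_cols], voted_flag))

answered <- rowSums(!is.na(flags)) > 0      # any non-missing answer?
any_vote <- rowSums(flags, na.rm = TRUE) > 0  # any party named?

D$vote_host <- ifelse(answered, as.numeric(any_vote), NA)
print(D)
```

So, under this rule, there is no need to store a separate yes/no variable per country first; the per-country flags exist only as an intermediate matrix.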

Thank you to B_Heidel for the example df.