Creating a dummy variable based on whether words appear in multiple columns

Time:05-12

I'm working with a large data frame of cross-national time-series data on protest patterns. I would like to create a dummy variable based on whether any of a select group of phrases appears in any of several columns. Sample data are included below. In words, here is what I want to do:

If any of the phrases (1) political behavior, (2) police brutality, or (3) removal of politician appears in any of protesterdemand1, protesterdemand2, protesterdemand3, or protesterdemand4, then create a dummy variable sensitive_issue that takes a value of 1, and 0 otherwise.

Thanks!

dat <- structure(list(Country = c("Canada", "Canada", "Canada", "Canada", 
"Canada", "Canada"), COWcode = c(20L, 20L, 20L, 20L, 20L, 20L
), Year = c(1990L, 1990L, 1990L, 1990L, 1990L, 1990L), Region = c("North America", 
"North America", "North America", "North America", "North America", 
"North America"), Protest = c(1L, 1L, 1L, 1L, 1L, 1L), protesterviolence = c(0L, 
0L, 0L, 1L, 1L, 0L), protesterdemand1 = c("political behavior, process", 
"political behavior, process", "political behavior, process", 
"land farm issue", "political behavior, process", "police brutality"
), protesterdemand2 = c("labor wage dispute", "", "", "", "", 
""), protesterdemand3 = c("", "", "", "", "", ""), protesterdemand4 = c("", 
"", "", "", "", ""), stateresponse1 = c("ignore", "ignore", "ignore", 
"accomodation", "crowd dispersal", "crowd dispersal"), stateresponse2 = c("", 
"", "", "", "arrests", "shootings"), stateresponse3 = c("", "", 
"", "", "accomodation", ""), stateresponse4 = c("", "", "", "", 
"", ""), stateresponse5 = c("", "", "", "", "", ""), stateresponse6 = c("", 
"", "", "", "", ""), stateresponse7 = c("", "", "", "", "", ""
), participants = c("1000s", "1000", "500", "100s", "950", "200"
), participants_category = c("", "", "", "", "", "")), row.names = c(NA, 
6L), class = "data.frame")

CodePudding user response:

base R

# One logical column per demand column: TRUE where any of the phrases matches
found <- sapply(dat[c("protesterdemand1", "protesterdemand2", "protesterdemand3", "protesterdemand4")],
                grepl, pattern = "political behavior|police brutality|removal of politician",
                ignore.case = TRUE) # ignore.case is a precaution; drop it if matching should be case-sensitive
found
#      protesterdemand1 protesterdemand2 protesterdemand3 protesterdemand4
# [1,]             TRUE            FALSE            FALSE            FALSE
# [2,]             TRUE            FALSE            FALSE            FALSE
# [3,]             TRUE            FALSE            FALSE            FALSE
# [4,]            FALSE            FALSE            FALSE            FALSE
# [5,]             TRUE            FALSE            FALSE            FALSE
# [6,]             TRUE            FALSE            FALSE            FALSE

dat$sensitive_issue <- rowSums(found) > 0

dat
#   Country COWcode Year        Region Protest protesterviolence            protesterdemand1   protesterdemand2 protesterdemand3
# 1  Canada      20 1990 North America       1                 0 political behavior, process labor wage dispute                 
# 2  Canada      20 1990 North America       1                 0 political behavior, process                                    
# 3  Canada      20 1990 North America       1                 0 political behavior, process                                    
# 4  Canada      20 1990 North America       1                 1             land farm issue                                    
# 5  Canada      20 1990 North America       1                 1 political behavior, process                                    
# 6  Canada      20 1990 North America       1                 0            police brutality                                    
#   protesterdemand4  stateresponse1 stateresponse2 stateresponse3 stateresponse4 stateresponse5 stateresponse6 stateresponse7
# 1                           ignore                                                                                          
# 2                           ignore                                                                                          
# 3                           ignore                                                                                          
# 4                     accomodation                                                                                          
# 5                  crowd dispersal        arrests   accomodation                                                            
# 6                  crowd dispersal      shootings                                                                           
#   participants participants_category sensitive_issue
# 1        1000s                                  TRUE
# 2         1000                                  TRUE
# 3          500                                  TRUE
# 4         100s                                 FALSE
# 5          950                                  TRUE
# 6          200                                  TRUE
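
The result above is a logical TRUE/FALSE column, while the question asks for 1 and 0. A minimal sketch of one way to get an integer dummy with the same approach, wrapping the comparison in as.integer(); the toy data frame and its values are hypothetical stand-ins for the real data, and selecting the columns by name pattern is an assumption about how the demand columns are named:

```r
# Toy data standing in for the question's data frame (hypothetical values)
toy <- data.frame(
  protesterdemand1 = c("political behavior, process", "land farm issue",
                       "police brutality", "tax policy"),
  protesterdemand2 = c("labor wage dispute", "", "", "removal of politician"),
  protesterdemand3 = c("", "", "", ""),
  protesterdemand4 = c("", "", "", ""),
  stringsAsFactors = FALSE
)

# Pick up every demand column by name instead of typing each one
demand_cols <- grep("^protesterdemand[0-9]+$", names(toy), value = TRUE)

pattern <- "political behavior|police brutality|removal of politician"

# One logical column per demand column: TRUE where the pattern matches
found <- sapply(toy[demand_cols], grepl, pattern = pattern, ignore.case = TRUE)

# Collapse across columns and store as 0/1 instead of TRUE/FALSE
toy$sensitive_issue <- as.integer(rowSums(found) > 0)
toy$sensitive_issue
```

Selecting the columns with grep() avoids retyping (and mistyping) the individual column names; if the real columns follow a different naming scheme, the explicit character vector from the answer works just as well.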