I'm working with a large data frame on cross-national time-series protest patterns. I would like to create a dummy variable based on whether any of a select group of phrases appears in any of the protester-demand columns. I have included sample data below. Here is, in words, what I want to do:
If any of the phrases (1) political behavior, (2) police brutality, or (3) removal of politician appears in protesterdemand1, protesterdemand2, protesterdemand3, or protesterdemand4, then create a dummy variable sensitive_issue that takes the value 1, and 0 otherwise.
Thanks!
dat <- structure(list(Country = c("Canada", "Canada", "Canada", "Canada",
"Canada", "Canada"), COWcode = c(20L, 20L, 20L, 20L, 20L, 20L
), Year = c(1990L, 1990L, 1990L, 1990L, 1990L, 1990L), Region = c("North America",
"North America", "North America", "North America", "North America",
"North America"), Protest = c(1L, 1L, 1L, 1L, 1L, 1L), protesterviolence = c(0L,
0L, 0L, 1L, 1L, 0L), protesterdemand1 = c("political behavior, process",
"political behavior, process", "political behavior, process",
"land farm issue", "political behavior, process", "police brutality"
), protesterdemand2 = c("labor wage dispute", "", "", "", "",
""), protesterdemand3 = c("", "", "", "", "", ""), protesterdemand4 = c("",
"", "", "", "", ""), stateresponse1 = c("ignore", "ignore", "ignore",
"accomodation", "crowd dispersal", "crowd dispersal"), stateresponse2 = c("",
"", "", "", "arrests", "shootings"), stateresponse3 = c("", "",
"", "", "accomodation", ""), stateresponse4 = c("", "", "", "",
"", ""), stateresponse5 = c("", "", "", "", "", ""), stateresponse6 = c("",
"", "", "", "", ""), stateresponse7 = c("", "", "", "", "", ""
), participants = c("1000s", "1000", "500", "100s", "950", "200"
), participants_category = c("", "", "", "", "", "")), row.names = c(NA,
6L), class = "data.frame")
Answer:
base R
demand_cols <- c("protesterdemand1", "protesterdemand2", "protesterdemand3", "protesterdemand4")
# one logical column per demand column: TRUE where any of the phrases matches
found <- sapply(dat[demand_cols], grepl,
                pattern = "political behavior|police brutality|removal of politician",
                ignore.case = TRUE)  # ignore.case is just in case; drop it if matching should be case-sensitive
found
#      protesterdemand1 protesterdemand2 protesterdemand3 protesterdemand4
# [1,]             TRUE            FALSE            FALSE            FALSE
# [2,]             TRUE            FALSE            FALSE            FALSE
# [3,]             TRUE            FALSE            FALSE            FALSE
# [4,]            FALSE            FALSE            FALSE            FALSE
# [5,]             TRUE            FALSE            FALSE            FALSE
# [6,]             TRUE            FALSE            FALSE            FALSE
# a row is flagged if at least one column matched; as.integer() turns TRUE/FALSE into 1/0
dat$sensitive_issue <- as.integer(rowSums(found) > 0)
# print only the relevant columns (the other columns are unchanged)
dat[c("protesterdemand1", "protesterdemand2", "sensitive_issue")]
#              protesterdemand1   protesterdemand2 sensitive_issue
# 1 political behavior, process labor wage dispute               1
# 2 political behavior, process                                  1
# 3 political behavior, process                                  1
# 4             land farm issue                                  0
# 5 political behavior, process                                  1
# 6            police brutality                                  1
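If the phrase list grows, it may be safer to match each phrase literally instead of building one regular expression, and to avoid a pitfall of `sapply()`: on a one-row data frame it simplifies to a plain named vector, and `rowSums()` then errors. The sketch below handles both, under the assumption that literal, case-insensitive substring matching is what you want. `flag_phrases()` is a hypothetical helper name, not a base R function, and the small data frame only copies the demand columns from the example in the question.

```r
# Hypothetical helper (not from the original answer): flags rows where any of
# `cols` contains any of `phrases` as a literal, case-insensitive substring.
flag_phrases <- function(df, cols, phrases) {
  phrases <- tolower(phrases)
  # For each column, a logical vector: TRUE where the cell contains a phrase.
  hits <- lapply(df[cols], function(x) {
    x <- tolower(x)
    # fixed = TRUE treats each phrase as plain text, not as a regex
    Reduce(`|`, lapply(phrases, function(p) grepl(p, x, fixed = TRUE)))
  })
  # Combine the columns with element-wise OR, then convert TRUE/FALSE to 1/0.
  as.integer(Reduce(`|`, hits))
}

# Toy data copied from the demand columns of the example in the question.
dat <- data.frame(
  protesterdemand1 = c("political behavior, process", "political behavior, process",
                       "political behavior, process", "land farm issue",
                       "political behavior, process", "police brutality"),
  protesterdemand2 = c("labor wage dispute", "", "", "", "", ""),
  protesterdemand3 = "",
  protesterdemand4 = "",
  stringsAsFactors = FALSE
)

demand_cols <- paste0("protesterdemand", 1:4)
sensitive <- c("political behavior", "police brutality", "removal of politician")

dat$sensitive_issue <- flag_phrases(dat, demand_cols, sensitive)
dat$sensitive_issue
# [1] 1 1 1 0 1 1

# Unlike sapply() + rowSums(), this also works on a one-row data frame.
flag_phrases(dat[6, ], demand_cols, sensitive)
# [1] 1
```

Combining results with element-wise OR via `Reduce()` keeps everything as plain logical vectors, so the result has one entry per row regardless of how many rows or columns are involved.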