I have election data where two of the elections fail to differentiate between the contest for Treasurer at the state level and the contest for County Treasurer. Data is presented like so:
> elections
# A tibble: 13 x 6
year election contest data party number
<dbl> <chr> <chr> <chr> <chr> <dbl>
1 2010 general Treasurer Registered Voters NP 628
2 2010 general Treasurer Times Counted NP 320
3 2010 general Treasurer Times Blank Voted NP 4
4 2010 general Treasurer Times Over Voted NP 0
5 2010 general Treasurer Scott K. Summers G 15
6 2010 general Treasurer Dan Rutherford R 245
7 2010 general Treasurer Robin Kelly D 51
8 2010 general Treasurer James Pauly Lib 5
9 2010 general Treasurer Registered Voters NP 628
10 2010 general Treasurer Times Counted NP 320
11 2010 general Treasurer Times Blank Voted NP 38
12 2010 general Treasurer Times Over Voted NP 0
13 2010 general Treasurer Rebecca C. McNeil R 282
Lines 9-13 should be "County Treasurer," as most years are. The data is by precinct so manually editing the data would be a pain.
Both of the years that have this issue, Rebecca C. McNeil is the only candidate, so it's easy to identify where the rows I need to change are and use mutate
to fix the contest associated with the candidate.
library(dplyr)
> elections %>%
mutate(contest = case_when(
str_detect(data, "Rebecca C") ~ "County Treasurer",
TRUE ~ contest))
# A tibble: 13 x 6
year election contest data party number
<dbl> <chr> <chr> <chr> <chr> <dbl>
1 2010 general Treasurer Registered Voters NP 628
2 2010 general Treasurer Times Counted NP 320
3 2010 general Treasurer Times Blank Voted NP 4
4 2010 general Treasurer Times Over Voted NP 0
5 2010 general Treasurer Scott K. Summers G 15
6 2010 general Treasurer Dan Rutherford R 245
7 2010 general Treasurer Robin Kelly D 51
8 2010 general Treasurer James Pauly Lib 5
9 2010 general Treasurer Registered Voters NP 628
10 2010 general Treasurer Times Counted NP 320
11 2010 general Treasurer Times Blank Voted NP 38
12 2010 general Treasurer Times Over Voted NP 0
13 2010 general County Treasurer Rebecca C. McNeil R 282
But I need the 4 preceding rows to also have "County Treasurer" for the contest, with desired output as:
# A tibble: 13 x 6
year election contest data party number
<dbl> <chr> <chr> <chr> <chr> <dbl>
1 2010 general Treasurer Registered Voters NP 628
2 2010 general Treasurer Times Counted NP 320
3 2010 general Treasurer Times Blank Voted NP 4
4 2010 general Treasurer Times Over Voted NP 0
5 2010 general Treasurer Scott K. Summers G 15
6 2010 general Treasurer Dan Rutherford R 245
7 2010 general Treasurer Robin Kelly D 51
8 2010 general Treasurer James Pauly Lib 5
9 2010 general County Treasurer Registered Voters NP 628
10 2010 general County Treasurer Times Counted NP 320
11 2010 general County Treasurer Times Blank Voted NP 38
12 2010 general County Treasurer Times Over Voted NP 0
13 2010 general County Treasurer Rebecca C. McNeil R 282
I've been thinking I could possibly do this through adding row numbers and using them for identification but I haven't figured out how to make that work. Any help would be appreciated.
CodePudding user response:
Here's an example of how you can add a row index and count back from the target row with edits.
# example data
library(tidyverse)
df <- tibble(a = c(rep("election_A", 9), "election_B"), b = letters[1:10])
df
# A tibble: 10 × 2
a b
<chr> <chr>
1 election_A a
2 election_A b
3 election_A c
4 election_A d
5 election_A e
6 election_A f
7 election_A g
8 election_A h
9 election_A i
10 election_B j # <-- anchor row for updating previous rows
Now count back from the 'election_B'
row and update the values in column b
:
lag_ct <- 4
target <- "election_B"
new_label <- "foo"
df %>%
rowid_to_column() %>%
mutate(b = if_else(rowid >= rowid[a == target] - lag_ct, new_label, b))
# A tibble: 10 × 3
rowid a b
<int> <chr> <chr>
1 1 election_A a
2 2 election_A b
3 3 election_A c
4 4 election_A d
5 5 election_A e
6 6 election_A foo
7 7 election_A foo
8 8 election_A foo
9 9 election_A foo
10 10 election_B foo