Home > Software design >  How to use dplyr to edit subset of rows based on relative location?
How to use dplyr to edit subset of rows based on relative location?

Time:12-13

I have election data where two of the elections fail to differentiate between the contest for Treasurer at the state level and the contest for County Treasurer. Data is presented like so:

> elections
# A tibble: 13 x 6
    year election contest   data              party number
   <dbl> <chr>    <chr>     <chr>             <chr>  <dbl>
 1  2010 general  Treasurer Registered Voters NP       628
 2  2010 general  Treasurer Times Counted     NP       320
 3  2010 general  Treasurer Times Blank Voted NP         4
 4  2010 general  Treasurer Times Over Voted  NP         0
 5  2010 general  Treasurer Scott K. Summers  G         15
 6  2010 general  Treasurer Dan Rutherford    R        245
 7  2010 general  Treasurer Robin Kelly       D         51
 8  2010 general  Treasurer James Pauly       Lib        5
 9  2010 general  Treasurer Registered Voters NP       628
10  2010 general  Treasurer Times Counted     NP       320
11  2010 general  Treasurer Times Blank Voted NP        38
12  2010 general  Treasurer Times Over Voted  NP         0
13  2010 general  Treasurer Rebecca C. McNeil R        282

Lines 9-13 should be "County Treasurer," as most years are. The data is by precinct so manually editing the data would be a pain.

Both of the years that have this issue, Rebecca C. McNeil is the only candidate, so it's easy to identify where the rows I need to change are and use mutate to fix the contest associated with the candidate.

library(dplyr)

> elections %>% 
    mutate(contest = case_when(
      str_detect(data, "Rebecca C") ~ "County Treasurer",
      TRUE ~ contest))

# A tibble: 13 x 6
    year election contest          data              party number
   <dbl> <chr>    <chr>            <chr>             <chr>  <dbl>
 1  2010 general  Treasurer        Registered Voters NP       628
 2  2010 general  Treasurer        Times Counted     NP       320
 3  2010 general  Treasurer        Times Blank Voted NP         4
 4  2010 general  Treasurer        Times Over Voted  NP         0
 5  2010 general  Treasurer        Scott K. Summers  G         15
 6  2010 general  Treasurer        Dan Rutherford    R        245
 7  2010 general  Treasurer        Robin Kelly       D         51
 8  2010 general  Treasurer        James Pauly       Lib        5
 9  2010 general  Treasurer        Registered Voters NP       628
10  2010 general  Treasurer        Times Counted     NP       320
11  2010 general  Treasurer        Times Blank Voted NP        38
12  2010 general  Treasurer        Times Over Voted  NP         0
13  2010 general  County Treasurer Rebecca C. McNeil R        282

But I need the 4 preceding rows to also have "County Treasurer" for the contest, with desired output as:

# A tibble: 13 x 6
    year election contest          data              party number
   <dbl> <chr>    <chr>            <chr>             <chr>  <dbl>
 1  2010 general  Treasurer        Registered Voters NP       628
 2  2010 general  Treasurer        Times Counted     NP       320
 3  2010 general  Treasurer        Times Blank Voted NP         4
 4  2010 general  Treasurer        Times Over Voted  NP         0
 5  2010 general  Treasurer        Scott K. Summers  G         15
 6  2010 general  Treasurer        Dan Rutherford    R        245
 7  2010 general  Treasurer        Robin Kelly       D         51
 8  2010 general  Treasurer        James Pauly       Lib        5
 9  2010 general  County Treasurer Registered Voters NP       628
10  2010 general  County Treasurer Times Counted     NP       320
11  2010 general  County Treasurer Times Blank Voted NP        38
12  2010 general  County Treasurer Times Over Voted  NP         0
13  2010 general  County Treasurer Rebecca C. McNeil R        282

I've been thinking I could possibly do this through adding row numbers and using them for identification but I haven't figured out how to make that work. Any help would be appreciated.

CodePudding user response:

Here's an example of how you can add a row index and count back from the target row with edits.

# example data
library(tidyverse)

df <- tibble(a = c(rep("election_A", 9), "election_B"), b = letters[1:10])
df

# A tibble: 10 × 2
   a          b    
   <chr>      <chr>
 1 election_A a    
 2 election_A b    
 3 election_A c    
 4 election_A d    
 5 election_A e    
 6 election_A f    
 7 election_A g    
 8 election_A h    
 9 election_A i    
10 election_B j   # <-- anchor row for updating previous rows

Now count back from the 'election_B' row and update the values in column b:

lag_ct <- 4
target <- "election_B"
new_label <- "foo"

df %>% 
  rowid_to_column() %>% 
  mutate(b = if_else(rowid >= rowid[a == target] - lag_ct, new_label, b))
    
# A tibble: 10 × 3
rowid a          b    
<int> <chr>      <chr>
1     1 election_A a    
2     2 election_A b    
3     3 election_A c    
4     4 election_A d    
5     5 election_A e    
6     6 election_A foo  
7     7 election_A foo  
8     8 election_A foo  
9     9 election_A foo  
10   10 election_B foo  

  • Related