Home > Software engineering >  In R dplyr, retain value only for specific previous value
In R dplyr, retain value only for specific previous value

Time:10-07

I have the following data frame with Buy, Sell or do Nothing by date:

df <- structure(list(date = structure(c(17448, 17449, 17450, 17451, 
17452, 17455, 17456, 17457, 17458, 17459), class = "Date"), Activity = c("Buy", 
"Nothing", "Nothing", "Sell", "Nothing", "Sell", "Buy", "Nothing", 
"Nothing", "Nothing")), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))

I want to keep the sell/buy, only if the previous command was the opposite. I.e. if there are two consecutive "sells" (possibly with "nothing" in between), I want to only keep the first one (and replace the second one with "nothing").

How can I accomplish this without a loop? Consider a huge data frame

Expected output (Only row 6 has changed):

structure(list(date = structure(c(17448, 17449, 17450, 17451, 
17452, 17455, 17456, 17457, 17458, 17459), class = "Date"), Activity = 
c("Buy", 
"Nothing", "Nothing", "Sell", "Nothing", "Nothing", "Buy", "Nothing", 
"Nothing", "Nothing")), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))

CodePudding user response:

I'm not sure how well this generalises so you probably need to do some testing on a larger dataset, but here is an option

df %>%
    mutate(tmp = na_if(Activity, "Nothing")) %>%
    fill(tmp) %>%
    mutate(
        Activity = replace(
            Activity, lag(tmp) == tmp & Activity == tmp, "Nothing")) %>%
    select(-tmp)
## A tibble: 10 × 2
#   date       Activity
#   <date>     <chr>   
# 1 2017-10-09 Buy     
# 2 2017-10-10 Nothing 
# 3 2017-10-11 Nothing 
# 4 2017-10-12 Sell    
# 5 2017-10-13 Nothing 
# 6 2017-10-16 Nothing 
# 7 2017-10-17 Buy     
# 8 2017-10-18 Nothing 
# 9 2017-10-19 Nothing 
#10 2017-10-20 Nothing 

The idea is to create a TRUE/FALSE flag (defined by the logical expression lag(tmp) == tmp & Activity == tmp) and based on (lagged) values in a tmp column that has all "Nothing" values replaced with the previous value.

  • Related