I have the following data frame with Buy, Sell or do Nothing by date:
df <- structure(list(date = structure(c(17448, 17449, 17450, 17451,
17452, 17455, 17456, 17457, 17458, 17459), class = "Date"), Activity = c("Buy",
"Nothing", "Nothing", "Sell", "Nothing", "Sell", "Buy", "Nothing",
"Nothing", "Nothing")), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
I want to keep the sell/buy, only if the previous command was the opposite. I.e. if there are two consecutive "sells" (possibly with "nothing" in between), I want to only keep the first one (and replace the second one with "nothing").
How can I accomplish this without a loop? Consider a huge data frame
Expected output (Only row 6 has changed):
structure(list(date = structure(c(17448, 17449, 17450, 17451,
17452, 17455, 17456, 17457, 17458, 17459), class = "Date"), Activity =
c("Buy",
"Nothing", "Nothing", "Sell", "Nothing", "Nothing", "Buy", "Nothing",
"Nothing", "Nothing")), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
CodePudding user response:
I'm not sure how well this generalises so you probably need to do some testing on a larger dataset, but here is an option
df %>%
mutate(tmp = na_if(Activity, "Nothing")) %>%
fill(tmp) %>%
mutate(
Activity = replace(
Activity, lag(tmp) == tmp & Activity == tmp, "Nothing")) %>%
select(-tmp)
## A tibble: 10 × 2
# date Activity
# <date> <chr>
# 1 2017-10-09 Buy
# 2 2017-10-10 Nothing
# 3 2017-10-11 Nothing
# 4 2017-10-12 Sell
# 5 2017-10-13 Nothing
# 6 2017-10-16 Nothing
# 7 2017-10-17 Buy
# 8 2017-10-18 Nothing
# 9 2017-10-19 Nothing
#10 2017-10-20 Nothing
The idea is to create a TRUE
/FALSE
flag (defined by the logical expression lag(tmp) == tmp & Activity == tmp
) and based on (lagged) values in a tmp
column that has all "Nothing"
values replaced with the previous value.