Home > database >  Add new variable with arithmetic conditions
Add new variable with arithmetic conditions

Time:04-08

the randomly generated data frame contains ID, Dates, and Earnings. I changed up the data frame format so that each column represents a date and its values corresponds to the earnings.

I want to create a new variable named "Date_over100 " that would determine the date when one's cumulative earnings have exceeded 100. I have put below a reproducible code that would generate the data frame. I assume conditional statements or loops would be used to achieve this. I would appreciate all the help there is. Thanks in advance!

ID <- c(1:10)
Date <- sample(seq(as.Date('2021/01/01'), as.Date('2021/01/11'), by="day", replace=T), 10)
Earning <- round(runif(10,30,50),digits = 2)
df <- data.frame(ID,Date,Earning,check.names = F)

df1 <- df%>%
  arrange(Date)%>%
  pivot_wider(names_from = Date, values_from = Earning)

df1 <- as.data.frame(df1)
df1[is.na(df1)] <- round(runif(sum(is.na(df1)),min=30,max=50),digits = 2)

CodePudding user response:

I go back to long format for the calculation, then join to the wide data:

library(dplyr)
library(tidyr)

df1 %>% pivot_longer(cols = -ID, names_to = "date") %>%
  group_by(ID) %>%
  summarize(Date_over_100 = Date[which.max(cumsum(value) > 100)]) %>%
  right_join(df1, by = "ID")
# # A tibble: 10 × 12
#       ID Date_over_100 `2021-01-04` `2021-01-01` `2021-01-08` `2021-01-11` `2021-01-02` `2021-01-09`
#    <int> <date>               <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>
#  1     1 2021-01-08            45.0         46.2         40.1         47.4         47.5         48.8
#  2     2 2021-01-08            36.7         30.3         36.2         47.5         41.4         41.7
#  3     3 2021-01-08            49.5         46.0         45.0         43.9         45.4         37.1
#  4     4 2021-01-08            31.0         48.7         47.3         40.4         40.8         35.5
#  5     5 2021-01-08            48.2         35.2         32.1         44.2         35.4         49.7
#  6     6 2021-01-08            40.8         37.6         31.8         40.3         38.3         42.5
#  7     7 2021-01-08            37.9         42.9         36.8         46.0         39.8         33.6
#  8     8 2021-01-08            47.7         47.8         39.7         46.4         43.8         46.5
#  9     9 2021-01-08            32.9         42.0         41.8         32.8         33.9         35.5
# 10    10 2021-01-08            34.5         40.1         42.7         35.9         44.8         31.8
# # … with 4 more variables: 2021-01-10 <dbl>, 2021-01-03 <dbl>, 2021-01-07 <dbl>, 2021-01-05 <dbl>
  • Related