Home > other >  Converting dates to days with different starting dates in one data frame
Converting dates to days with different starting dates in one data frame

Time:09-17

I'm having some trouble trying to do a count of days based on starting dates. I basically just want a count of days passed since the starting date by product.

I think it is best illustrated by example. This is what I start with:

df1 <- data.frame(Dates = seq(as.Date("2021/1/1"), as.Date("2021/1/15"), "days"),
                  Product = rep(c(rep("Banana", 5), rep("Apple", 5), rep("Orange", 5)))
                  )
        Dates Product
1  2021-01-01  Banana
2  2021-01-02  Banana
3  2021-01-03  Banana
4  2021-01-04  Banana
5  2021-01-05  Banana
6  2021-01-06   Apple
7  2021-01-07   Apple
8  2021-01-08   Apple
9  2021-01-09   Apple
10 2021-01-10   Apple
11 2021-01-11  Orange
12 2021-01-12  Orange
13 2021-01-13  Orange
14 2021-01-14  Orange
15 2021-01-15  Orange

I currently have several measurements for each product that I need to plot as number of days rather than dates and I cannot make the transformation.

And this is what I want:

desired_df <- data.frame(Dates = seq(as.Date("2021/1/1"), as.Date("2021/1/15"), "days"),
                         Product = rep(c(rep("Banana", 5), rep("Apple", 5), rep("Orange", 5))),
                         Days = rep(seq(0, 4), 3)
)
        Dates Product Days
1  2021-01-01  Banana    0
2  2021-01-02  Banana    1
3  2021-01-03  Banana    2
4  2021-01-04  Banana    3
5  2021-01-05  Banana    4
6  2021-01-06   Apple    0
7  2021-01-07   Apple    1
8  2021-01-08   Apple    2
9  2021-01-09   Apple    3
10 2021-01-10   Apple    4
11 2021-01-11  Orange    0
12 2021-01-12  Orange    1
13 2021-01-13  Orange    2
14 2021-01-14  Orange    3
15 2021-01-15  Orange    4

So far I've tried a few approaches, but none works.

df2 <- df1 %>%
  mutate(Days = Dates - Dates[1])


df3 <- df1 %>% 
  group_by(Product) %>% 
  mutate(Days = Dates - Dates[1])
        Dates Product    Days


starter_dates <- df1 %>% 
  aggregate(by = list(df1$Product), FUN = first)
  Group.1      Dates Product
1   Apple 2021-01-06   Apple
2  Banana 2021-01-01  Banana
3  Orange 2021-01-11  Orange

df4 <- df1 %>% 
  mutate(
    Days = case_when(Product == starter_dates$Product ~ Dates - starter_dates$Dates)
  )

But none produced what I want. How can I calculate the number of days from first appearance?

EDIT:

This is what I get from suggested answers:

> df1 %>% group_by(Product) %>% mutate(Days = as.numeric(Dates - Dates[1]))
# A tibble: 15 x 3
# Groups:   Product [3]
   Dates      Product  Days
   <date>     <chr>   <dbl>
 1 2021-01-01 Banana      0
 2 2021-01-02 Banana      1
 3 2021-01-03 Banana      2
 4 2021-01-04 Banana      3
 5 2021-01-05 Banana      4
 6 2021-01-06 Apple       5
 7 2021-01-07 Apple       6
 8 2021-01-08 Apple       7
 9 2021-01-09 Apple       8
10 2021-01-10 Apple       9
11 2021-01-11 Orange     10
12 2021-01-12 Orange     11
13 2021-01-13 Orange     12
14 2021-01-14 Orange     13
15 2021-01-15 Orange     14

Ensuring no conflicts from other packages, below now works.

df1 %>% group_by(Product) %>%
        mutate(Days=lubridate::day(Dates)-first(lubridate::day(Dates)))

CodePudding user response:

We can subtract the "Date", for every row, from the first "Date" value:

df1 %>% group_by(Product) %>%
        mutate(Days=lubridate::day(Dates)-first(lubridate::day(Dates)))

# A tibble: 15 x 3
# Groups:   Product [3]
   Dates      Product  Days
   <date>     <chr>   <int>
 1 2021-01-01 Banana      0
 2 2021-01-02 Banana      1
 3 2021-01-03 Banana      2
 4 2021-01-04 Banana      3
 5 2021-01-05 Banana      4
 6 2021-01-06 Apple       0
 7 2021-01-07 Apple       1
 8 2021-01-08 Apple       2
 9 2021-01-09 Apple       3
10 2021-01-10 Apple       4
11 2021-01-11 Orange      0
12 2021-01-12 Orange      1
13 2021-01-13 Orange      2
14 2021-01-14 Orange      3
15 2021-01-15 Orange      4

CodePudding user response:

Since using tidyverse is not a requirement, here a base R solution:

data.frame( df1, Days=as.vector( sapply( unique(df1$Product),
   function(x) df1$Dates[df1$Product==x] - df1$Dates[df1$Product==x][1] ) ) )

        Dates Product Days
1  2021-01-01  Banana    0
2  2021-01-02  Banana    1
3  2021-01-03  Banana    2
4  2021-01-04  Banana    3
5  2021-01-05  Banana    4
6  2021-01-06   Apple    0
7  2021-01-07   Apple    1
8  2021-01-08   Apple    2
9  2021-01-09   Apple    3
10 2021-01-10   Apple    4
11 2021-01-11  Orange    0
12 2021-01-12  Orange    1
13 2021-01-13  Orange    2
14 2021-01-14  Orange    3
15 2021-01-15  Orange    4
  • Related