I'm having some trouble trying to do a count of days based on starting dates. I basically just want a count of days passed since the starting date by product.
I think it is best illustrated by example. This is what I start with:
df1 <- data.frame(Dates = seq(as.Date("2021/1/1"), as.Date("2021/1/15"), "days"),
Product = rep(c(rep("Banana", 5), rep("Apple", 5), rep("Orange", 5)))
)
Dates Product
1 2021-01-01 Banana
2 2021-01-02 Banana
3 2021-01-03 Banana
4 2021-01-04 Banana
5 2021-01-05 Banana
6 2021-01-06 Apple
7 2021-01-07 Apple
8 2021-01-08 Apple
9 2021-01-09 Apple
10 2021-01-10 Apple
11 2021-01-11 Orange
12 2021-01-12 Orange
13 2021-01-13 Orange
14 2021-01-14 Orange
15 2021-01-15 Orange
I currently have several measurements for each product that I need to plot as number of days rather than dates and I cannot make the transformation.
And this is what I want:
desired_df <- data.frame(Dates = seq(as.Date("2021/1/1"), as.Date("2021/1/15"), "days"),
Product = rep(c(rep("Banana", 5), rep("Apple", 5), rep("Orange", 5))),
Days = rep(seq(0, 4), 3)
)
Dates Product Days
1 2021-01-01 Banana 0
2 2021-01-02 Banana 1
3 2021-01-03 Banana 2
4 2021-01-04 Banana 3
5 2021-01-05 Banana 4
6 2021-01-06 Apple 0
7 2021-01-07 Apple 1
8 2021-01-08 Apple 2
9 2021-01-09 Apple 3
10 2021-01-10 Apple 4
11 2021-01-11 Orange 0
12 2021-01-12 Orange 1
13 2021-01-13 Orange 2
14 2021-01-14 Orange 3
15 2021-01-15 Orange 4
So far I've tried a few approaches, but none works.
df2 <- df1 %>%
mutate(Days = Dates - Dates[1])
df3 <- df1 %>%
group_by(Product) %>%
mutate(Days = Dates - Dates[1])
Dates Product Days
starter_dates <- df1 %>%
aggregate(by = list(df1$Product), FUN = first)
Group.1 Dates Product
1 Apple 2021-01-06 Apple
2 Banana 2021-01-01 Banana
3 Orange 2021-01-11 Orange
df4 <- df1 %>%
mutate(
Days = case_when(Product == starter_dates$Product ~ Dates - starter_dates$Dates)
)
But none produced what I want. How can I calculate the number of days from first appearance?
EDIT:
This is what I get from suggested answers:
> df1 %>% group_by(Product) %>% mutate(Days = as.numeric(Dates - Dates[1]))
# A tibble: 15 x 3
# Groups: Product [3]
Dates Product Days
<date> <chr> <dbl>
1 2021-01-01 Banana 0
2 2021-01-02 Banana 1
3 2021-01-03 Banana 2
4 2021-01-04 Banana 3
5 2021-01-05 Banana 4
6 2021-01-06 Apple 5
7 2021-01-07 Apple 6
8 2021-01-08 Apple 7
9 2021-01-09 Apple 8
10 2021-01-10 Apple 9
11 2021-01-11 Orange 10
12 2021-01-12 Orange 11
13 2021-01-13 Orange 12
14 2021-01-14 Orange 13
15 2021-01-15 Orange 14
Ensuring no conflicts from other packages, below now works.
df1 %>% group_by(Product) %>%
mutate(Days=lubridate::day(Dates)-first(lubridate::day(Dates)))
CodePudding user response:
We can subtract the "Date", for every row, from the first
"Date" value:
df1 %>% group_by(Product) %>%
mutate(Days=lubridate::day(Dates)-first(lubridate::day(Dates)))
# A tibble: 15 x 3
# Groups: Product [3]
Dates Product Days
<date> <chr> <int>
1 2021-01-01 Banana 0
2 2021-01-02 Banana 1
3 2021-01-03 Banana 2
4 2021-01-04 Banana 3
5 2021-01-05 Banana 4
6 2021-01-06 Apple 0
7 2021-01-07 Apple 1
8 2021-01-08 Apple 2
9 2021-01-09 Apple 3
10 2021-01-10 Apple 4
11 2021-01-11 Orange 0
12 2021-01-12 Orange 1
13 2021-01-13 Orange 2
14 2021-01-14 Orange 3
15 2021-01-15 Orange 4
CodePudding user response:
Since using tidyverse is not a requirement, here a base R solution:
data.frame( df1, Days=as.vector( sapply( unique(df1$Product),
function(x) df1$Dates[df1$Product==x] - df1$Dates[df1$Product==x][1] ) ) )
Dates Product Days
1 2021-01-01 Banana 0
2 2021-01-02 Banana 1
3 2021-01-03 Banana 2
4 2021-01-04 Banana 3
5 2021-01-05 Banana 4
6 2021-01-06 Apple 0
7 2021-01-07 Apple 1
8 2021-01-08 Apple 2
9 2021-01-09 Apple 3
10 2021-01-10 Apple 4
11 2021-01-11 Orange 0
12 2021-01-12 Orange 1
13 2021-01-13 Orange 2
14 2021-01-14 Orange 3
15 2021-01-15 Orange 4