I am fairly new to programming. I am working on my portfolio, and am looking at a dataset regarding the price of food from distribution centers to a grocery store. What I am looking at is a set of data with the price, item, and date of transaction. What I am looking for is to find the rate of change from the distribution center to the store, and when it happened.
Note: the price of the item changes from the distribution center.
Here is an example of what I am looking at:
Date | Item | Price | Kg |
---|---|---|---|
01.02.2022 | Apple | $1.00 | 1 |
02.02.2022 | Meat | $4.00 | 1 |
03.02.2022 | Fish | $3.00 | 1 |
03.02.2022 | Bread | $1.00 | 1 |
15.02.2022 | Meat | $5.00 | 1 |
15.02.2022 | Meat | $3.00 | 1 |
16.02.2022 | Apple | $2.00 | 1 |
20.02.2022 | Fish | $3.00 | 1 |
25.02.2022 | Apple | $0.50 | 1 |
As you can see, the price for the same quantity for the same product changes randomly over time. What I would like to analyse is:
- The rate of change per item
- When the change occured
This is the ideal outcome:
item | kg | 1st_price | 1st_price_date | 2nd_price | 2nd_price_date | amount_of_change |
---|---|---|---|---|---|---|
Apple | 1 | $1.00 | 01.02.2022 | $2.00 | 16.02.2022 | $1.00 |
Meat | 1 | $4.00 | 02.02.2022 | $5.00 | 15.02.2022 | $1.00 |
Bread | 1 | $1.00 | 03.02.2022 | N/A | N/A | N/A |
Fish | 1 | $3.00 | 03.02.2022 | $3.00 | 20.02.2022 | $0.00 |
#Continuing the table below. These columns would go to the right of the columns above. #Unfortunetly, StackOverflow was not able to create a table with everything together. #total_change is for the entire period
item | 3rd_price | 3rd_price_date | amount_of_change | change_duration_period | total_change |
---|---|---|---|---|---|
Apple | $0.50 | 25.02.2022 | -$1.50 | 01.02.2022-25.02.2022 | -$0.50 |
Meat | $3.00 | 15.02.2022 | -$2.00 | 02.02.2022-1502.2022 | -$1.00 |
Bread | N/A | N/A | N/A | 03.02.2022-03.02-2022 | $0.00 |
Fish | $3.00 | 20.02.2022 | $0.00 | 03.02.2022-20.02.2022 | $0.00 |
As you can see, some items can have more price changes per month than others depending on the item. Some items have drastic changes, some have no changes at all.
Presuming there are over 14,000 unique items what would you recommend to gather the data an place them in a table as seen in the "Ideal outcome" section?
I am still new to programming, please don't be too harsh!
Thanks!
CodePudding user response:
Something like this?
library(tidyverse)
df %>%
mutate(Date = as.Date(Date, format = "%d.%m.%Y"),
Price = parse_number(Price)) %>%
group_by(Item) %>%
arrange(Date) %>%
mutate(appearance = row_number(),
change = Price - lag(Price)) %>%
ungroup() %>%
pivot_wider(names_from = appearance,
values_from = c(Date, Price, change),
names_vary = "slowest")
Result
# A tibble: 4 × 11
Item Kg Date_1 Price_1 change_1 Date_2 Price_2 change_2 Date_3 Price_3 change_3
<chr> <int> <date> <dbl> <dbl> <date> <dbl> <dbl> <date> <dbl> <dbl>
1 Apple 1 2022-02-01 1 NA 2022-02-16 2 1 2022-02-25 0.5 -1.5
2 Meat 1 2022-02-02 4 NA 2022-02-15 5 1 2022-02-15 3 -2
3 Fish 1 2022-02-03 3 NA 2022-02-20 3 0 NA NA 0
4 Bread 1 2022-02-03 1 NA NA NA 0 NA NA 0
Source data
df <- data.frame(
stringsAsFactors = FALSE,
Date = c("01.02.2022","02.02.2022",
"03.02.2022","03.02.2022","15.02.2022","15.02.2022",
"16.02.2022","20.02.2022","25.02.2022"),
Item = c("Apple","Meat","Fish",
"Bread","Meat","Meat","Apple","Fish","Apple"),
Price = c("$1.00","$4.00","$3.00",
"$1.00","$5.00","$3.00","$2.00","$3.00","$0.50"),
Kg = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)
)