My dataframe:
amount | counter |
---|---|
120 | 1 |
778 | 1 |
4 | 2 |
322 | 3 |
51 | 4 |
6 | 5 |
17 | 6 |
1 | 6 |
489 | 7 |
224 | 8 |
33 | 9 |
56 | 9 |
... | ... |
What I want to do:
The counter variable is counting in which week my data is occuring. I want to group the dataframe by 4 weeks, meaning I have to group by the counter variable e.g., 1 to 4, 5 to 8 and so on. Then I want to sort the groups ascending by amount and only keep the rows with the 30 highest amounts per group.
What functions I might need:
group_by (dplyr package) and filter (dplyr package)
What I tried:
I found out that I will need the "group_by" and "filter" functions from the dplyr package. I read thru the documentation and took a look at code examples but am not able to refer it to my usecase.
CodePudding user response:
Four-weekly groupings
If you want to do this four-weekly using dplyr
, you can do this:
NUM_WEEKS <- 4
TOP_N <- 30
dat |>
mutate(
four_week_group = cut(
counter, breaks = seq(
from = 0,
to = max(counter) NUM_WEEKS,
by = NUM_WEEKS,
)
),
group_num = as.integer(four_week_group)
) |>
group_by(four_week_group) |>
arrange(four_week_group, desc(amount)) |>
filter(row_number() <= TOP_N)
# # A tibble: 12 x 4
# # Groups: four_week_group [3]
# amount counter four_week_group group_num
# <int> <int> <fct> <int>
# 1 778 1 (0,4] 1
# 2 322 3 (0,4] 1
# 3 120 1 (0,4] 1
# 4 51 4 (0,4] 1
# 5 4 2 (0,4] 1
# 6 489 7 (4,8] 2
# 7 224 8 (4,8] 2
# 8 17 6 (4,8] 2
# 9 6 5 (4,8] 2
# 10 1 6 (4,8] 2
# 11 56 9 (8,12] 3
# 12 33 9 (8,12] 3
The output is the same as the input in this case (apart from the row group) as all groups have <30 rows.
Monthly groupings
You could do the same with monthly groupings as well if you provide a year:
YEAR <- 2022
dat |>
mutate(
date_to_use = as.Date(
paste(YEAR, counter, 1, sep="-"),
"%Y-%U-%u"),
month = format(date_to_use, "%B")
)
# amount counter date_to_use month
# 1 120 1 2022-01-03 January
# 2 778 1 2022-01-03 January
# 3 4 2 2022-01-10 January
# 4 322 3 2022-01-17 January
# 5 51 4 2022-01-24 January
# 6 6 5 2022-01-31 January
# 7 17 6 2022-02-07 February
# 8 1 6 2022-02-07 February
# <etc>
The rest of the steps would be the same (using group_by(month)
of course).
Data
dat <- structure(list(amount = c(120L, 778L, 4L, 322L, 51L, 6L, 17L,
1L, 489L, 224L, 33L, 56L), counter = c(1L, 1L, 2L, 3L, 4L, 5L,
6L, 6L, 7L, 8L, 9L, 9L)), class = "data.frame", row.names = c(NA,
-12L))
# # A tibble: 12 x 3
# # Groups: four_week_group [3]
# amount counter four_week_group
# <int> <int> <fct>
# 1 778 1 (0,4]
# 2 489 7 (4,8]
# 3 322 3 (0,4]
# 4 224 8 (4,8]
# 5 120 1 (0,4]
# 6 56 9 (8,12]
# 7 51 4 (0,4]
# 8 33 9 (8,12]
# 9 17 6 (4,8]
# 10 6 5 (4,8]
# 11 4 2 (0,4]
# 12 1 6 (4,8]