I am trying to use dplyr to compute some basic statistics. The two parts of my data that I'm interested in are the dates and the outcomes of an event.
Each row of my data records one of three events: reward, stop, or none (meaning neither a reward nor a stop occurred). I want to find which days have the highest average number of rewards. I would also like the count of overall occurrences and the number of stops, rewards, and nones per day.
I have had some success obtaining the unique days and the overall occurrences per day, but I am struggling to get the remaining values. When I try to adjust the group_by, it causes problems with finding the unique days.
df %>%
mutate(ind, ind2 = case_when(ind=="Reward"~1, ind=="Stop"~0, ind=="None"~0)) %>%
group_by(time2) %>%
count(time2, sort = TRUE)
Here, I try to create a new column that converts the event to a binary format so that I can then calculate the average reward per day. This code is not required in an answer; it is just an example.
Desired output:
Date        num_occ  stop  reward  none  avg_reward
2022-01-03        3     1       1     1   0.3333333
2022-01-04        9     5       3     1   0.3333333
2022-01-05        2     1       1     0   0.5
2022-01-06        3     3       0     0   0
My question is, how can I calculate the average reward occurrences per day as well as obtain count information regarding the number of overall (reward, stop, none) occurrences per day, number of stops per day, number of rewards per day, and number of nones per day?
Example data:
structure(list(values = c(0, 18, 3, 2, 1, 9, 15, 13, 0, 12, 8,
2, 3, 7, 6, 3), ind = structure(c(1L, 2L, 3L, 3L, 3L, 3L, 2L,
2L, 1L, 2L, 3L, 3L, 2L, 3L, 3L, 3L), .Label = c("None", "Reward",
"Stop"), class = "factor"), entry = c(TRUE, TRUE, TRUE, TRUE,
TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
TRUE), time = structure(c(1641234180, 1641235020, 1641308400,
1641312840, 1641312900, 1641316920, 1641322920, 1641325080, 1641325560,
1641328740, 1641329220, 1641393900, 1641412140, 1641491040, 1641491640,
1641493200), class = c("POSIXct", "POSIXt"), tzone = ""), time2 = structure(c(18995,
18995, 18996, 18996, 18996, 18996, 18996, 18996, 18996, 18996,
18996, 18997, 18997, 18998, 18998, 18998), class = "Date")), row.names = c(NA,
16L), class = "data.frame")
CodePudding user response:
Something like this may work for you
library(tidyverse)
example_data <- structure(list(values = c(0, 18, 3, 2, 1, 9, 15, 13, 0, 12, 8,
2, 3, 7, 6, 3), ind = structure(c(1L, 2L, 3L, 3L, 3L, 3L, 2L,
2L, 1L, 2L, 3L, 3L, 2L, 3L, 3L, 3L), .Label = c("None", "Reward",
"Stop"), class = "factor"), entry = c(TRUE, TRUE, TRUE, TRUE,
TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
TRUE), time = structure(c(1641234180, 1641235020, 1641308400,
1641312840, 1641312900, 1641316920, 1641322920, 1641325080, 1641325560,
1641328740, 1641329220, 1641393900, 1641412140, 1641491040, 1641491640,
1641493200), class = c("POSIXct", "POSIXt"), tzone = ""), time2 = structure(c(18995,
18995, 18996, 18996, 18996, 18996, 18996, 18996, 18996, 18996,
18996, 18997, 18997, 18998, 18998, 18998), class = "Date")), row.names = c(NA,
16L), class = "data.frame")
example_data |>
  group_by(day = time |> lubridate::as_date()) |>
  summarise(num_occ = n(),
            stop = sum(ind == 'Stop'),
            Reward = sum(ind == 'Reward'),
            None = sum(ind == 'None'),
            # sum of `values` over the Reward rows only (logical subsetting)
            sum_reward = sum(values[ind == 'Reward'])
  )
#> # A tibble: 4 x 6
#>   day        num_occ  stop Reward  None sum_reward
#>   <date>       <int> <int>  <int> <int>      <dbl>
#> 1 2022-01-03       2     0      1     1         18
#> 2 2022-01-04       9     5      3     1         40
#> 3 2022-01-05       2     1      1     0          3
#> 4 2022-01-06       3     3      0     0          0
Created on 2022-02-10 by the reprex package (v2.0.1)
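If the goal is the avg_reward column from the question (the share of a day's events that are rewards), one option is to take the mean of a logical comparison, since mean(ind == "Reward") counts TRUE as 1 and FALSE as 0. The following is a minimal sketch rather than a definitive solution. It assumes grouping on the existing time2 column, rebuilds only the time2 and ind columns of the example data under the hypothetical name events, and borrows the output column names from the question's desired table.

```r
library(dplyr)

# Stand-in for the question's data: only the `time2` and `ind` columns,
# rebuilt from the same day numbers and factor codes as the example data.
events <- data.frame(
  time2 = as.Date(rep(c(18995, 18996, 18997, 18998), times = c(2, 9, 2, 3)),
                  origin = "1970-01-01"),
  ind = factor(c("None", "Reward", "Stop")[
    c(1, 2, 3, 3, 3, 3, 2, 2, 1, 2, 3, 3, 2, 3, 3, 3)
  ])
)

daily <- events %>%
  group_by(Date = time2) %>%
  summarise(
    num_occ    = n(),                     # all events that day
    stop       = sum(ind == "Stop"),      # TRUE counts as 1
    reward     = sum(ind == "Reward"),
    none       = sum(ind == "None"),
    avg_reward = mean(ind == "Reward")    # share of events that are rewards
  ) %>%
  arrange(desc(avg_reward))               # days with the highest share first

daily
```

Sorting with arrange(desc(avg_reward)) puts the days with the highest reward share at the top, which addresses the "which days have the highest average" part of the question.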
CodePudding user response:
Using @Bruno's sample data (thanks!)
## count totals per day
d1 <- (example_data
%>% count(time2)
)
## count number of each event type per day; convert to wide format
d2 <- (example_data
%>% count(time2, ind)
%>% pivot_wider(names_from = "ind", values_from = n)
%>% replace_na(list(None = 0, Reward = 0, Stop = 0))
)
## combine and compute averages
(full_join(d1, d2, by = "time2")
%>% mutate(avg_reward = Reward/n)
)
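The count-and-pivot approach above can also be written as a single pipeline, which avoids the join. This is a sketch under a few assumptions: it uses the values_fill argument of pivot_wider() (a scalar fill value needs tidyr 1.1.0 or later) in place of replace_na(), it rebuilds only the time2 and ind columns of the sample data under the hypothetical name events, and it relies on every event type appearing at least once in the data so that the None, Reward, and Stop columns all exist.

```r
library(dplyr)
library(tidyr)

# Stand-in for the sample data: only the `time2` and `ind` columns.
events <- data.frame(
  time2 = as.Date(rep(c(18995, 18996, 18997, 18998), times = c(2, 9, 2, 3)),
                  origin = "1970-01-01"),
  ind = factor(c("None", "Reward", "Stop")[
    c(1, 2, 3, 3, 3, 3, 2, 2, 1, 2, 3, 3, 2, 3, 3, 3)
  ])
)

daily_wide <- events %>%
  count(time2, ind) %>%                       # one row per day and event type
  pivot_wider(names_from = ind,
              values_from = n,
              values_fill = 0) %>%            # missing day/type pairs become 0
  mutate(num_occ    = None + Reward + Stop,   # total events per day
         avg_reward = Reward / num_occ)       # share of events that are rewards

daily_wide
```

Computing num_occ from the three count columns keeps everything in one table, at the cost of hard-coding the event names.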