Home > Mobile >  count weekdays and group by two column in R
count weekdays and group by two column in R

Time:11-28

How to transform data frame, given dataset assume this is a large dataset

datestamp <- c("2020-04-26 17:45:14","2020-04-17 17:08:54","2020-04-01 17:54:13","2020-04-07 12:50:19","2020-04-18 10:22:59")
member_casual <- c("member","member","member","member","casual")
df <- data.frame(datestamp, member_casual)

Desire dataset

member_casual <- c("member", "casual")
monday <- c(0,0)
tuesday <- c(1,0)
wednesday <- c(1,0)
thursday <- c(0,0)
friday <- c(1,0)
saturday <- c(0,1)
sunday <- c(1,0)
df <- data.frame(member_casual,monday,tuesday,wednesday,thursday,friday,saturday,sunday)

I want to know which days is the most counted

CodePudding user response:

Here I way to do it, I create a larger data.frame to exemplify

library(dplyr)
library(lubridate)
library(tidyr)

datestamp <- seq(as.Date("2000/1/1"), as.Date("2003/1/1"), by = "day")
member_casual <- rep(c("member","casual"),length(datestamp),replace = TRUE)

df <- data.frame(datestamp  , member_casual)
df
df %>%
  mutate(
    weekday = wday(datestamp,label = TRUE,abbr = FALSE),
    aux = 1) %>%
  pivot_wider(names_from = weekday,values_from = aux,values_fill = 0)

# A tibble: 2,194 x 9
   datestamp  member_casual sábado domingo `segunda-feira` `terça-feira` `quarta-feira` `quinta-feira` `sexta-feira`
   <date>     <chr>          <dbl>   <dbl>           <dbl>         <dbl>          <dbl>          <dbl>         <dbl>
 1 2000-01-01 member             1       0               0             0              0              0             0
 2 2000-01-02 casual             0       1               0             0              0              0             0
 3 2000-01-03 member             0       0               1             0              0              0             0
 4 2000-01-04 casual             0       0               0             1              0              0             0
 5 2000-01-05 member             0       0               0             0              1              0             0
 6 2000-01-06 casual             0       0               0             0              0              1             0
 7 2000-01-07 member             0       0               0             0              0              0             1
 8 2000-01-08 casual             1       0               0             0              0              0             0
 9 2000-01-09 member             0       1               0             0              0              0             0
10 2000-01-10 casual             0       0               1             0              0              0             0
# ... with 2,184 more rows

*My columns have different names because my default language is portuguese.

But if you just wanna know which days is the most counted this works too

df %>% 
  count(weekday = wday(datestamp,label = TRUE,abbr = FALSE),sort = TRUE)

        weekday   n
1       domingo 314
2 segunda-feira 314
3   terça-feira 314
4  quarta-feira 314
5        sábado 314
6  quinta-feira 312
7   sexta-feira 312

CodePudding user response:

An approach that grows with data. Used on slightly extended data. Grouping by date and day first to catch duplicated days or multiple hours per day.

df %>% 
  group_by(day = strftime(datestamp, format = "%A"), 
           date=strftime(datestamp, format="%F")) %>% 
  distinct(date, day, member_casual) %>% 
  group_by(member_casual, day) %>% 
  add_count(day) %>% 
  ungroup() %>% 
  pivot_wider(member_casual, names_from=day, values_from=n, values_fill=0)
# A tibble: 2 × 6
  member_casual Sunday Friday Wednesday Tuesday Saturday
  <chr>          <int>  <int>     <int>   <int>    <int>
1 member             1      1         1       1        0
2 casual             1      0         0       0        1

ext. data

df <- structure(list(datestamp = c("2020-04-26 17:45:14", "2020-04-17 17:08:54",
"2020-04-01 17:54:13", "2020-04-07 12:50:19", "2020-04-18 10:22:59",
"2020-04-26 17:45:14", "2020-04-17 17:08:54", "2020-04-01 17:54:13",
"2020-04-07 12:50:19", "2020-04-18 10:22:59", "2020-04-18 10:22:59",
"2020-04-19 10:22:59"), member_casual = c("member", "member",
"member", "member", "casual", "member", "member", "member", "member",
"casual", "casual", "casual")), row.names = c(NA, -12L), class = c("tbl_df",
"tbl", "data.frame"))
  • Related