How to transform data frame, given dataset assume this is a large dataset
datestamp <- c("2020-04-26 17:45:14","2020-04-17 17:08:54","2020-04-01 17:54:13","2020-04-07 12:50:19","2020-04-18 10:22:59")
member_casual <- c("member","member","member","member","casual")
df <- data.frame(datestamp, member_casual)
Desire dataset
member_casual <- c("member", "casual")
monday <- c(0,0)
tuesday <- c(1,0)
wednesday <- c(1,0)
thursday <- c(0,0)
friday <- c(1,0)
saturday <- c(0,1)
sunday <- c(1,0)
df <- data.frame(member_casual,monday,tuesday,wednesday,thursday,friday,saturday,sunday)
I want to know which days is the most counted
CodePudding user response:
Here I way to do it, I create a larger data.frame to exemplify
library(dplyr)
library(lubridate)
library(tidyr)
datestamp <- seq(as.Date("2000/1/1"), as.Date("2003/1/1"), by = "day")
member_casual <- rep(c("member","casual"),length(datestamp),replace = TRUE)
df <- data.frame(datestamp , member_casual)
df
df %>%
mutate(
weekday = wday(datestamp,label = TRUE,abbr = FALSE),
aux = 1) %>%
pivot_wider(names_from = weekday,values_from = aux,values_fill = 0)
# A tibble: 2,194 x 9
datestamp member_casual sábado domingo `segunda-feira` `terça-feira` `quarta-feira` `quinta-feira` `sexta-feira`
<date> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2000-01-01 member 1 0 0 0 0 0 0
2 2000-01-02 casual 0 1 0 0 0 0 0
3 2000-01-03 member 0 0 1 0 0 0 0
4 2000-01-04 casual 0 0 0 1 0 0 0
5 2000-01-05 member 0 0 0 0 1 0 0
6 2000-01-06 casual 0 0 0 0 0 1 0
7 2000-01-07 member 0 0 0 0 0 0 1
8 2000-01-08 casual 1 0 0 0 0 0 0
9 2000-01-09 member 0 1 0 0 0 0 0
10 2000-01-10 casual 0 0 1 0 0 0 0
# ... with 2,184 more rows
*My columns have different names because my default language is portuguese.
But if you just wanna know which days is the most counted this works too
df %>%
count(weekday = wday(datestamp,label = TRUE,abbr = FALSE),sort = TRUE)
weekday n
1 domingo 314
2 segunda-feira 314
3 terça-feira 314
4 quarta-feira 314
5 sábado 314
6 quinta-feira 312
7 sexta-feira 312
CodePudding user response:
An approach that grows with data. Used on slightly extended data. Grouping by date and day first to catch duplicated days or multiple hours per day.
df %>%
group_by(day = strftime(datestamp, format = "%A"),
date=strftime(datestamp, format="%F")) %>%
distinct(date, day, member_casual) %>%
group_by(member_casual, day) %>%
add_count(day) %>%
ungroup() %>%
pivot_wider(member_casual, names_from=day, values_from=n, values_fill=0)
# A tibble: 2 × 6
member_casual Sunday Friday Wednesday Tuesday Saturday
<chr> <int> <int> <int> <int> <int>
1 member 1 1 1 1 0
2 casual 1 0 0 0 1
ext. data
df <- structure(list(datestamp = c("2020-04-26 17:45:14", "2020-04-17 17:08:54",
"2020-04-01 17:54:13", "2020-04-07 12:50:19", "2020-04-18 10:22:59",
"2020-04-26 17:45:14", "2020-04-17 17:08:54", "2020-04-01 17:54:13",
"2020-04-07 12:50:19", "2020-04-18 10:22:59", "2020-04-18 10:22:59",
"2020-04-19 10:22:59"), member_casual = c("member", "member",
"member", "member", "casual", "member", "member", "member", "member",
"casual", "casual", "casual")), row.names = c(NA, -12L), class = c("tbl_df",
"tbl", "data.frame"))