this is my data:
## # A tibble: 41,685 × 3
## dow rider_count Date
## <chr> <dbl> <int>
## 1 Fri 17 20160101
## 2 Fri 20 20160101
## 3 Fri 17 20160101
## 4 Fri 17 20160101
## 5 Fri NA 20160101
## 6 Fri 63 20160101
## 7 Fri 12 20160101
## 8 Fri 14 20160101
## 9 Fri NA 20160101
## 10 Fri 28 20160101
## # … with 41,675 more rows
There are many more rows as the date goes on. I want a dataframe that has the sum of rider_count for each date. So that it looks something like this:
## # A tibble: 41,685 × 3
## dow rider_count Date
## <chr> <dbl> <int>
## 1 Fri .. 20160101
## 2 Sat .. 20160102
## 3 Sun .. 20160103
## 4 Mon .. 20160104
## 5 Tue .. 20160105
## 6 Wed .. 20160106
## 7 Thu .. 20160107
## 8 Fri .. 20160108
## 9 Sat .. 20160109
## 10 Sun .. 20160110
## # … with 41,675 more rows
I have tried the below, but it doesn't give me the sum of rider_count as it only removes those that had the same date instead of summing the rider_count as you can see here:
new_both$Date <- as.factor(new_both$Date)
new_both[!duplicated(new_both[,c('Date')]),]
## # A tibble: 1,097 × 3
## dow rider_count Date
## <chr> <dbl> <fct>
## 1 Fri 17 20160101
## 2 Sat 21 20160102
## 3 Sun 208 20160103
## 4 Mon 434 20160104
## 5 Tue 292 20160105
## 6 Wed 336 20160106
## 7 Thu 323 20160107
## 8 Fri 95 20160108
## 9 Sat 228 20160109
## 10 Sun 310 20160110
## # … with 1,087 more rows
Any tips I could apply? Thanks.
CodePudding user response:
Maybe you need something like this. We could group then summarise:
library(dplyr)
df %>%
group_by(Date, dow) %>%
summarise(Sum_rider_count = sum(rider_count, na.rm = TRUE))
Date dow Sum_rider_count
<int> <chr> <int>
1 20160101 Fri 188