Is there an easy function to count the number of observations that fall within part of a date and store that count as a new variable? For example, if I have 5000 observations with dates in the form y-m-d, is there an easy way to count the observations for the month of January only?
The data look like this:
show(tweets$created_at.x)
[1] "2021-12-27 CET" "2021-12-10 CET" "2021-12-25 CET" "2021-12-16 CET" "2021-12-30 CET" "2021-12-26 CET"
[7] "2021-12-26 CET" "2021-12-27 CET" "2021-12-26 CET" "2021-12-26 CET" "2021-12-26 CET" "2021-12-27 CET"
...
(there are over 40000 tweets)
Since I have far too many observations to analyze alongside my other data set (50000 versus 2400), I want to reduce them to counts so that I can run the analysis.
For reference, the dates in my other data set look like this:
show(df$created_at)
[1] "2021-05-21 03:00:51 CEST" "2020-10-13 16:27:30 CEST" "2020-06-11 01:02:52 CEST" "2021-01-12 09:22:27 CET"
[5] "2021-01-30 21:03:28 CET" "2020-12-16 19:35:08 CET" "2021-02-03 03:50:48 CET" "2020-04-23 11:35:34 CEST"
CodePudding user response:
You could count the observations by year/month like this.
I've also included a second example where the input is a list rather than a data frame.
library(tidyverse)
library(tsibble)
# Made-up example data
df <- tribble(
  ~date,
  "2022-01-02 CET",
  "2022-01-05 CET",
  "2022-02-01 CET",
  "2022-02-08 CET",
  "2022-03-06 CET"
)
df |>
  mutate(
    date = as.Date(date),
    yr_month = yearmonth(date)
  ) |>
  count(yr_month)
#> # A tibble: 3 × 2
#>   yr_month     n
#>      <mth> <int>
#> 1 2022 Jan     2
#> 2 2022 Feb     2
#> 3 2022 Mar     1
# Where Tweets data is a list
tweets <-
  list(
    created_at.x = c(
      "2021-12-27 CET",
      "2021-12-10 CET",
      "2021-12-25 CET",
      "2021-12-16 CET",
      "2021-12-30 CET",
      "2021-12-26 CET"
    )
  )
tweets$created_at.x |>
  as_tibble() |>
  mutate(
    date = as.Date(value),
    yr_month = yearmonth(date)
  ) |>
  count(yr_month)
#> # A tibble: 1 × 2
#>   yr_month     n
#>      <mth> <int>
#> 1 2021 Dec     6
Created on 2022-05-27 by the reprex package (v2.0.1)
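If you would rather not load tsibble, the same monthly count can be sketched in base R, and it also covers the "January only" part of the question. This is a minimal sketch rather than the only way to do it: the vector `created_at` and the names `monthly` and `n_january` are made up for illustration, and the sample dates are invented to mimic the shape of tweets$created_at.x.

```r
# Hypothetical sample in the same "YYYY-MM-DD TZ" shape as tweets$created_at.x
created_at <- c(
  "2021-12-27 CET", "2021-12-10 CET", "2022-01-03 CET",
  "2022-01-15 CET", "2022-01-30 CET", "2022-02-01 CET"
)

# Parse the leading "YYYY-MM-DD"; trailing characters such as " CET" are ignored
dates <- as.Date(created_at, format = "%Y-%m-%d")

# Observations per year-month, returned as a data frame with columns yr_month and n
monthly <- as.data.frame(
  table(yr_month = format(dates, "%Y-%m")),
  responseName = "n"
)

# Observations that fall in January, regardless of year
n_january <- sum(format(dates, "%m") == "01")
```

Here `monthly` holds one row per year-month and `n_january` is a single number. `format()` accepts date-time (POSIXct) values as well, so the same `format(x, "%Y-%m")` step should also work on a column like df$created_at if it is stored as date-times; for POSIXct the month is taken in the object's own time zone.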