Creating a new variable from data (from dates) for research

Time:05-28

Is there an easy function to count the number of observations that fall within part of a date and turn that count into a new variable? For example, if I have 5000 observations with dates in the form y-m-d, is there an easy way to count how many observations I have for the month of January only?

The data looks like this:

show(tweets$created_at.x)
[1] "2021-12-27 CET" "2021-12-10 CET" "2021-12-25 CET" "2021-12-16 CET" "2021-12-30 CET" "2021-12-26 CET"
[7] "2021-12-26 CET" "2021-12-27 CET" "2021-12-26 CET" "2021-12-26 CET" "2021-12-26 CET" "2021-12-27 CET"
...
(there are over 40000 tweets)

Since I have far too many observations to analyse them alongside my other data set (50000 versus 2400), I want to count them so that the analysis becomes feasible.

For reference, this is how the dates appear in my other data set:

show(df$created_at)
[1] "2021-05-21 03:00:51 CEST" "2020-10-13 16:27:30 CEST" "2020-06-11 01:02:52 CEST" "2021-01-12 09:22:27 CET"
[5] "2021-01-30 21:03:28 CET" "2020-12-16 19:35:08 CET" "2021-02-03 03:50:48 CET" "2020-04-23 11:35:34 CEST"

CodePudding user response:

You could count the observations by year and month like this.

I have included a second example where the input is a list.

library(tidyverse)
library(tsibble)

# Made-up example data
df <- tribble(~date,
              "2022-01-02 CET",
              "2022-01-05 CET",
              "2022-02-01 CET",
              "2022-02-08 CET",
              "2022-03-06 CET",
) 

df |> 
  mutate(
    date = as.Date(date),
    yr_month = yearmonth(date)) |> 
  count(yr_month)
#> # A tibble: 3 × 2
#>   yr_month     n
#>      <mth> <int>
#> 1 2022 Jan     2
#> 2 2022 Feb     2
#> 3 2022 Mar     1

# Where Tweets data is a list
tweets <-
  list(
    created_at.x = c(
      "2021-12-27 CET",
      "2021-12-10 CET",
      "2021-12-25 CET",
      "2021-12-16 CET",
      "2021-12-30 CET",
      "2021-12-26 CET"
    )
  )

tweets$created_at.x |> 
  as_tibble() |> 
  mutate(
    date = as.Date(value),
    yr_month = yearmonth(date)) |> 
  count(yr_month)
#> # A tibble: 1 × 2
#>   yr_month     n
#>      <mth> <int>
#> 1 2021 Dec     6

Created on 2022-05-27 by the reprex package (v2.0.1)
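
If you would rather not load extra packages, or you only need the count for a single month such as January, the same idea can be sketched in base R. This is a minimal sketch on made-up data in the same shape as tweets$created_at.x; the names created, dates, yr_month, monthly_counts and n_january are placeholders chosen for illustration, not something from the question.

```r
# Made-up sample in the same "YYYY-MM-DD TZ" shape as tweets$created_at.x
created <- c("2022-01-02 CET", "2022-01-05 CET", "2022-02-01 CET",
             "2022-02-08 CET", "2022-03-06 CET")

# as.Date() parses the leading YYYY-MM-DD part; trailing characters
# beyond what the format needs are ignored
dates <- as.Date(created, format = "%Y-%m-%d")

# Label each observation with its year and month, e.g. "2022-01"
yr_month <- format(dates, "%Y-%m")

# Count observations per year-month
monthly_counts <- table(yr_month)

# Turn the counts into a data frame so they can be used as a new variable
monthly_df <- as.data.frame(monthly_counts, responseName = "n")

# Count January observations only, regardless of year
n_january <- sum(format(dates, "%m") == "01")
```

Because as.Date() only reads as much of each string as the format requires, the same call should also handle the longer date-time strings in df$created_at, which would let you build matching monthly counts for both data sets.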

Tags: r