Reading a date, time text file and converting to string using strptime()?-CodePudding

I have a text file of many rows containing date and time and the end goal is for me to group together the number of rows per week that their date values are in. This is so that I can plot a scatter diagram with x values being the week number and y values being the frequency. For example the text file (dates.txt):

Mon May 11 22:51:27 2013
Mon May 11 22:58:34 2013
Wed May 13 23:15:27 2013
Thu May 14 04:11:22 2013
Sat May 16 19:46:55 2013
Sat May 16 22:29:54 2013
Sun May 17 02:08:45 2013
Sun May 17 23:55:15 2013
Mon May 18 00:42:07 2013

So from here, week 1 will have a frequency of 6 and week 2 will have a frequency of 1

As I want to plot a scatter diagram for this, I want to convert them to text value first using strptime() with format %a %b

my attempt so far has been

time_stamp <- strptime(time_stamp, format='%a.%b')

However it shows the input string is too long. I'm very new to R-studio so could somebody please help me figure this out?

Thank you

@jay.sf

trying:

strftime(strptime(readLines('timestamp.txt'), '%c'), '%a.%b')

output:

[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[16] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[31] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[46] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[61] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
...
[991] NA NA NA NA NA NA NA NA NA NA
[ reached getOption("max.print") -- omitted 659 entries ]

CodePudding user response：

You need to first read (or assign) the data, parse it to a date type and then use that to e.g. get the number of the week.

Here is one example

text <- "Mon May 11 22:51:27 2013
Mon May 11 22:58:34 2013
Wed May 13 23:15:27 2013
Thu May 14 04:11:22 2013
Sat May 16 19:46:55 2013
Sat May 16 22:29:54 2013
Sun May 17 02:08:45 2013
Sun May 17 23:55:15 2013
Mon May 18 00:42:07 2013"

data <- read.table(text=text, sep='\n', col.names="dates")
data$parse <- anytime::anytime(data$dates)
data$week <- as.integer(format(data$parse, "%V"))

data

The result is a new data.frame object:

> data
                     dates               parse week
1 Mon May 11 22:51:27 2013 2013-05-11 22:51:27   19
2 Mon May 11 22:58:34 2013 2013-05-11 22:58:34   19
3 Wed May 13 23:15:27 2013 2013-05-13 23:15:27   20
4 Thu May 14 04:11:22 2013 2013-05-14 04:11:22   20
5 Sat May 16 19:46:55 2013 2013-05-16 19:46:55   20
6 Sat May 16 22:29:54 2013 2013-05-16 22:29:54   20
7 Sun May 17 02:08:45 2013 2013-05-17 02:08:45   20
8 Sun May 17 23:55:15 2013 2013-05-17 23:55:15   20
9 Mon May 18 00:42:07 2013 2013-05-18 00:42:07   20
>

CodePudding user response：

You could use readLines() to avoid the data frame, then read time using strptime, and finally strftime to format the output.

strftime(strptime(readLines('dates.txt'), '%c'), '%a.%b')
# [1] "Sat.May" "Sat.May" "Mon.May" "Tue.May" "Thu.May" "Thu.May" "Fri.May" "Fri.May" "Sat.May"

Edit

So it appears that your dates have a time zone abbreviation "Mon Apr 06 23:49:29 PDT 2009". Since it is constant during the dates we can specify it literally in the pattern.

We will use '%d_%m' for strftime to get something numeric seperated by _ with which we feed strsplit and then type.convert into numerics.

Finally we unlist, create a matrix that we fill byrow, and plot the guy.

strptime(readLines('timestamp.txt'), '%a %b %d %H:%M:%S PDT %Y') |>
  strftime('%d_%m') |>
  strsplit('_') |>
  type.convert(as.is=TRUE) |>
  unlist() |>
  matrix(ncol=2, byrow=TRUE) |>
  plot(pch=20, col=4, main='My Plot', xlab='day', ylab='month')

Note: Please use R>=4.1 for the |> pipes.