I need to parse a txt file like this:
2021 Sep 27 15:54:50 avg_dur = 0.321 s
2021 Sep 27 15:54:52 avg_dur = 0.036 s
2021 Sep 27 15:54:54 avg_dur = 0.350 s
2021 Sep 27 15:54:56 avg_dur = 0.317 s
I am interested in parsing the date and the number into an R data frame. I am trying a parser like this (only for the date):
df <- read_table("myFile.txt", col_names = FALSE, col_types = cols(X1 = col_datetime(format = "%Y %b %d %H:%M:%S")))
But it doesn't work:
Warning: 31502 parsing failures.
row col expected actual file
1 X1 date like %Y %b %d %H:%M:%S 2021 'uclStats/91.211.159.43-dash_d1_gwv_vos-u5.log-avg'
2 X1 date like %Y %b %d %H:%M:%S 2021 'uclStats/91.211.159.43-dash_d1_gwv_vos-u5.log-avg'
3 X1 date like %Y %b %d %H:%M:%S 2021 'uclStats/91.211.159.43-dash_d1_gwv_vos-u5.log-avg'
4 X1 date like %Y %b %d %H:%M:%S 2021 'uclStats/91.211.159.43-dash_d1_gwv_vos-u5.log-avg'
5 X1 date like %Y %b %d %H:%M:%S 2021 'uclStats/91.211.159.43-dash_d1_gwv_vos-u5.log-avg'
... ... ........................... ...... ...................................................
See problems(...) for more details.
The problem is clearly that read_table splits the columns on whitespace, so it is trying to parse the first column on its own (just 2021) with the format for the whole date-time.
What is the correct way to parse this txt file into a data frame?
Regards, S.
CodePudding user response:
1) read.zoo Read it into a zoo object, z, and then convert that to a data frame (or just leave it as a zoo object). We have used Lines in the Note at the end for reproducibility, but text = Lines can be replaced with "myFile.txt".
library(zoo)
z <- read.zoo(text = Lines, sep = "=",
format = "%Y %b %d %H:%M:%S", tz = "", comment.char = "s")
fortify.zoo(z)
giving this data frame having POSIXct and numeric columns:
                Index     z
1 2021-09-27 15:54:50 0.321
2 2021-09-27 15:54:52 0.036
3 2021-09-27 15:54:54 0.350
4 2021-09-27 15:54:56 0.317
2) Base R Read it into a data frame dd and then convert the first column to POSIXct.
dd <- read.table(text = Lines, sep = "=", comment.char = "s")
dd$V1 <- as.POSIXct(dd$V1, format = "%Y %b %d %H:%M:%S")
Note
Lines <- "2021 Sep 27 15:54:50 avg_dur = 0.321 s
2021 Sep 27 15:54:52 avg_dur = 0.036 s
2021 Sep 27 15:54:54 avg_dur = 0.350 s
2021 Sep 27 15:54:56 avg_dur = 0.317 s"
CodePudding user response:
This should get you started: read the text file and replace the spaces (or whatever string separates the columns) with a comma (or semicolon, etc.). Then pass the result to read.csv using the text= argument. Finally, use any of the many date parsers to convert the strings to date types.
1. Creating example data
txt <- "2021 Sep 27 15:54:50 avg_dur = 0.321 s
2021 Sep 27 15:54:52 avg_dur = 0.036 s
2021 Sep 27 15:54:54 avg_dur = 0.350 s
2021 Sep 27 15:54:56 avg_dur = 0.317 s"
2. Read the data using read_lines. In your case txt is the path to the text file.
library(readr)  # provides read_lines
read.csv(text=gsub(" ", ", ", read_lines(txt)), sep=",", header = FALSE)
Returns:
V1 V2 V3 V4
1 2021 Sep 27 15:54:50 avg_dur = 0.321 s
2 2021 Sep 27 15:54:52 avg_dur = 0.036 s
3 2021 Sep 27 15:54:54 avg_dur = 0.350 s
4 2021 Sep 27 15:54:56 avg_dur = 0.317 s
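The last step mentioned above, converting the strings to date types, is not shown in this answer. A minimal base R sketch of one way to do it is below: it pulls the timestamp and the number out of each line with a regular expression and then converts them. The names lines, pattern and df, and the choice of tz = "UTC", are assumptions made for illustration; the sample lines are copied from the question, and for a real file readLines("myFile.txt") would supply them.

```r
# Sample lines copied from the question; for a real file use readLines("myFile.txt").
lines <- c(
  "2021 Sep 27 15:54:50 avg_dur = 0.321 s",
  "2021 Sep 27 15:54:52 avg_dur = 0.036 s",
  "2021 Sep 27 15:54:54 avg_dur = 0.350 s",
  "2021 Sep 27 15:54:56 avg_dur = 0.317 s"
)

# Group 1 captures the timestamp, group 2 the number after "=".
pattern <- "^([0-9]{4} [A-Za-z]{3} [0-9]{1,2} [0-9]{2}:[0-9]{2}:[0-9]{2}) avg_dur = ([0-9.]+) s$"

df <- data.frame(
  # %b matches abbreviated month names in the current locale (English assumed here).
  # tz = "UTC" is an assumption; use the time zone the log was written in.
  time = as.POSIXct(sub(pattern, "\\1", lines),
                    format = "%Y %b %d %H:%M:%S", tz = "UTC"),
  avg_dur = as.numeric(sub(pattern, "\\2", lines))
)

str(df)
```

Lines that do not match the pattern are passed through unchanged by sub, so the numeric conversion gives NA for them, which makes malformed rows easy to spot with is.na(df$avg_dur).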