Parsing txt file in R

Time:09-28

I need to parse a txt file like this:

2021 Sep 27 15:54:50     avg_dur     =      0.321 s
2021 Sep 27 15:54:52     avg_dur     =      0.036 s
2021 Sep 27 15:54:54     avg_dur     =      0.350 s
2021 Sep 27 15:54:56     avg_dur     =      0.317 s

I am interested in parsing the date and the number into an R data frame. I am trying a parser like this (only for the date):

library(readr)

df <- read_table("myFile.txt", col_names = FALSE, col_types = cols(X1 = col_datetime(format = "%Y %b %d %H:%M:%S")))

But it doesn't work:

Warning: 31502 parsing failures.
row col                    expected actual                                                file
  1  X1 date like %Y %b %d %H:%M:%S   2021 'uclStats/91.211.159.43-dash_d1_gwv_vos-u5.log-avg'
  2  X1 date like %Y %b %d %H:%M:%S   2021 'uclStats/91.211.159.43-dash_d1_gwv_vos-u5.log-avg'
  3  X1 date like %Y %b %d %H:%M:%S   2021 'uclStats/91.211.159.43-dash_d1_gwv_vos-u5.log-avg'
  4  X1 date like %Y %b %d %H:%M:%S   2021 'uclStats/91.211.159.43-dash_d1_gwv_vos-u5.log-avg'
  5  X1 date like %Y %b %d %H:%M:%S   2021 'uclStats/91.211.159.43-dash_d1_gwv_vos-u5.log-avg'
... ... ........................... ...... ...................................................
See problems(...) for more details.

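The problem is clearly that read_table splits each line on whitespace, so X1 holds only the year (2021, as the "actual" column shows), and it then tries to parse that single field with the format for the whole date-time.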
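A minimal base R sketch of this diagnosis, assuming one sample line from the file (the names line, fields and stamp are only illustrative): splitting on whitespace the way read_table does spreads the date-time over four fields, and only re-joining them lets the full format match. The LC_TIME switch to the C locale is an extra precaution, since %b only recognises month abbreviations of the current locale.

```r
# One line from the question's file
line <- "2021 Sep 27 15:54:50     avg_dur     =      0.321 s"

# Splitting on runs of whitespace gives 8 fields;
# the date-time is spread over the first four of them
fields <- strsplit(trimws(line), "[[:space:]]+")[[1]]
print(fields)

# Parsing the year field alone with the full format gives NA ...
as.POSIXct(fields[1], format = "%Y %b %d %H:%M:%S", tz = "UTC")

# ... while re-joining the first four fields parses correctly.
# %b expects month names of the current locale, so parse in the "C" locale.
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))
stamp <- as.POSIXct(paste(fields[1:4], collapse = " "),
                    format = "%Y %b %d %H:%M:%S", tz = "UTC")
invisible(Sys.setlocale("LC_TIME", old_locale))
print(stamp)
```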

What is the correct way to parse this txt file into a data frame?

Regards, S.

Answer:

1) read.zoo. Read it into a zoo object, z, and then convert that to a data frame (or just leave it as a zoo object). For reproducibility we use Lines, defined in the Note at the end; to read your file instead, replace text = Lines with "myFile.txt".

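library(zoo)

# sep = "=" splits each line at the equals sign;
# comment.char = "s" drops the trailing " s" unit so the value is numeric
z <- read.zoo(text = Lines, sep = "=", 
  format = "%Y %b %d %H:%M:%S", tz = "", comment.char = "s")
fortify.zoo(z)  # convert the zoo series to a data frame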

giving this data frame having POSIXct and numeric columns:

                Index     z
1 2021-09-27 15:54:50 0.321
2 2021-09-27 15:54:52 0.036
3 2021-09-27 15:54:54 0.350
4 2021-09-27 15:54:56 0.317

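2) Base R. Read it into a data frame, dd, and then convert the first column to POSIXct. The first column still contains the avg_dur label after the date-time, but as.POSIXct ignores any characters left over once the format has been matched.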

dd <- read.table(text = Lines, sep = "=", comment.char = "s")
dd$V1 <- as.POSIXct(dd$V1, format = "%Y %b %d %H:%M:%S")
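A slightly extended sketch of the same base R approach, assuming you want descriptive column names (time and avg_dur are hypothetical names chosen here) and a fixed UTC time zone. It also switches LC_TIME to the C locale while parsing, because %b only matches month abbreviations of the current locale and "Sep" may fail in a non-English one.

```r
Lines <- "2021 Sep 27 15:54:50     avg_dur     =      0.321 s
2021 Sep 27 15:54:52     avg_dur     =      0.036 s
2021 Sep 27 15:54:54     avg_dur     =      0.350 s
2021 Sep 27 15:54:56     avg_dur     =      0.317 s"

# sep = "=" gives two fields per line; comment.char = "s" discards the
# trailing unit so the second field is read as numeric
dd <- read.table(text = Lines, sep = "=", comment.char = "s",
                 col.names = c("time", "avg_dur"))

# Parse in the "C" locale so %b matches English month abbreviations;
# the trailing "avg_dur" text in the first field is ignored by the parser
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))
dd$time <- as.POSIXct(dd$time, format = "%Y %b %d %H:%M:%S", tz = "UTC")
invisible(Sys.setlocale("LC_TIME", old_locale))

str(dd)
```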

Note

Lines <- "2021 Sep 27 15:54:50     avg_dur     =      0.321 s
2021 Sep 27 15:54:52     avg_dur     =      0.036 s
2021 Sep 27 15:54:54     avg_dur     =      0.350 s
2021 Sep 27 15:54:56     avg_dur     =      0.317 s"

Answer:

This should get you started: read the text file and replace the runs of spaces (or whatever string separates the columns) with a comma (or a semicolon, etc.). Then pass the result to read.csv using the text= argument. Finally, use any of the many date parsers to convert the date strings to date-time values.

1. Create example data

txt <- "2021 Sep 27 15:54:50     avg_dur     =      0.321 s
2021 Sep 27 15:54:52     avg_dur     =      0.036 s
2021 Sep 27 15:54:54     avg_dur     =      0.350 s
2021 Sep 27 15:54:56     avg_dur     =      0.317 s"

2. Read the data using read_lines from readr. In your case, txt is the path to the text file.

library(readr)  # for read_lines()

read.csv(text = gsub("     ", ", ", read_lines(txt)), sep = ",", header = FALSE)

Returns:

                    V1       V2 V3        V4
1 2021 Sep 27 15:54:50  avg_dur  =   0.321 s
2 2021 Sep 27 15:54:52  avg_dur  =   0.036 s
3 2021 Sep 27 15:54:54  avg_dur  =   0.350 s
4 2021 Sep 27 15:54:56  avg_dur  =   0.317 s
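The result above still holds the date-time and the duration as text. One way to finish the last step in base R, sketched under the assumption that columns are separated by runs of two or more spaces (true for the sample, but check your file): collapse those runs to commas, let read.csv strip leftover padding, then convert V1 to POSIXct and strip the unit from V4. For a real file, readLines("myFile.txt") would replace the strsplit() call.

```r
txt <- "2021 Sep 27 15:54:50     avg_dur     =      0.321 s
2021 Sep 27 15:54:52     avg_dur     =      0.036 s
2021 Sep 27 15:54:54     avg_dur     =      0.350 s
2021 Sep 27 15:54:56     avg_dur     =      0.317 s"

# Split the example string into lines (use readLines("myFile.txt") for a file)
raw_lines <- strsplit(txt, "\n", fixed = TRUE)[[1]]

# Turn every run of 2+ spaces into a comma; single spaces inside the date stay
csv_lines <- gsub(" {2,}", ",", raw_lines)

df <- read.csv(text = csv_lines, header = FALSE, strip.white = TRUE)

# Parse the date-time column in the "C" locale so %b matches "Sep"
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))
df$V1 <- as.POSIXct(df$V1, format = "%Y %b %d %H:%M:%S", tz = "UTC")
invisible(Sys.setlocale("LC_TIME", old_locale))

# Drop the trailing " s" unit and convert the durations to numbers
df$V4 <- as.numeric(sub(" s$", "", df$V4))

str(df)
```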