I have a file that has multiple "header lines" (identical) for each section (ID) and has multiple IDs. see below;
IDS: df-1 df-1 df-1
Date/Time L A Z
yyy-mm-dd H:M:S in ft/s2 ft
2022-04-28 13:30:00 NA 0 NA
2022-04-28 13:45:00 NA 2.968 NA
2022-04-28 14:00:00 2.427 3.124 22.818
IDS: dat-2 dat-2 dat-2
Date/Time L A Z
yyy-mm-dd H:M:S in ft/s2 ft
2022-04-29 10:15:00 1.352 2.137 5.552
2022-04-29 10:30:00 1.365 2.303 6.065
This is a tab-separated file, and I tried using tsconvert
package to no avail.
My first idea is to remove the header lines and only keep the ID lines, split the data at those lines and assign the ID to each section, and then bind the data into a single dataframe with an ID
column and four other columns for Datetime
, L
, A
, and Z
.
However, this seems to involve a lot of preprocessing. I was hoping for a better (read direct) solution.
CodePudding user response:
An option is to read with readLines
and then do the changes
i1 <- grepl('IDS', lines)
ids <- gsub("IDS:\\s (\\S )\\s .*", "\\1", lines[i1])
df1 <- do.call(rbind, Map(cbind, id = ids, lapply(split(lines,
cumsum(i1)), function(x)
read.table(text = sub("^(\\d{4}.*:\\d{2})\\s ", "'\\1' ",
x[-(1:3)]), header = FALSE, col.names = strsplit(x[2], "\\s ")[[1]],
check.names = FALSE))))
row.names(df1) <- NULL
-output
> df1
id Date/Time L A Z
1 df-1 2022-04-28 13:30:00 NA 0.000 NA
2 df-1 2022-04-28 13:45:00 NA 2.968 NA
3 df-1 2022-04-28 14:00:00 2.427 3.124 22.818
4 dat-2 2022-04-29 10:15:00 1.352 2.137 5.552
5 dat-2 2022-04-29 10:30:00 1.365 2.303 6.065
data
lines <- readLines('file.txt')