I have a text file of the following format: //
DATASET
..... unnecessary lines.....
TIMEUNITS SECONDS
TS 0 1.98849600e 08
3.30000000e-03 1.25400000e-02 5.88000000e-03 0.00000000e 00 0.00000000e 00
5.88000000e-03 3.33000000e-03 2.16000000e-03 0.00000000e 00 0.00000000e 00
TS 0 1.98853209e 08
0.00000000e 00 0.00000000e 00 0.00000000e 00 0.00000000e 00 0.00000000e 00
1.25400000e-02 5.88000000e-03 3.33000000e-03 0.00000000e 00 0.00000000e 00
TS 0 1.98860419e 08
3.33000000e-03 2.16000000e-03 1.08000000e-03 0.00000000e 00 0.00000000e 00
0.00000000e 00 0.00000000e 00 0.00000000e 00 0.00000000e 00 0.00000000e 00
TS 0 1.98864081e 08
1.08000000e-03 8.70000000e-04 7.20000000e-04 0.00000000e 00 0.00000000e 00
0.00000000e 00 0.00000000e 00 0.00000000e 00 0.00000000e 00 0.00000000e 00
TS 0 1.98867619e 08
0.00000000e 00 0.00000000e 00 0.00000000e 00 0.00000000e 00 0.00000000e 00
3.33000000e-03 2.16000000e-03 1.08000000e-03 0.00000000e 00 0.00000000e 00
I am also attaching the sample text file named "D50.bc" in this link: https://drive.google.com/file/d/1P5aFC0JsRLhwuUo7JENLg03DbDJ696lk/view?usp=sharing.
There is no column names but it is fine to add column names i.e., V1, V2 etc. In the real text file there are 14 columns and 1000s of lines/rows after each TS. Each row corresponds to a node and the columns corresponds to certain values(i.e., velocity/shear stress etc.) at that node
I want to extract all the data/tables under the lines "TS 0 XXX" into individual dfs based on their timestamps(TS) so that I can do column operations for each TS. The TS value XXXX which is in second can be added as a separate column in the dfs. Below is the regular expression I have tried in R but it returns nothing. Any help is highly appreciated.
library(dplyr)
rm(list=ls(all=TRUE))
a <- paste0(readLines("D:/D50 python/D50.bc"), collapse = "\n")
b <- regmatches(a, gregexpr("(?s)^TS[^\n] 0.*?\n\\K.*?\n\b", a,, perl = T))[[1]]
CodePudding user response:
spltxt <- split(txt, cumsum(grepl("^\\s*TS 0 ", txt)))[-1]
alldat <- Map(function(S, grp) {
out <- read.table(text = S[-1], header = FALSE)
out$grp <- grp
out$node <- seq_len(nrow(out))
TS <- trimws(strsplit(S[1], "\\s ")[[1]])
out$TS <- as.numeric(TS[length(TS)])
out$TS0 <- S[1]
out
}, spltxt, seq_along(spltxt))
out <- do.call(rbind, alldat)
out
# V1 V2 V3 V4 V5 grp node TS TS0
# 1.1 0.00330 0.01254 0.00588 0 0 1 1 198849600 TS 0 1.98849600e 08
# 1.2 0.00588 0.00333 0.00216 0 0 1 2 198849600 TS 0 1.98849600e 08
# 2.1 0.00000 0.00000 0.00000 0 0 2 1 198853209 TS 0 1.98853209e 08
# 2.2 0.01254 0.00588 0.00333 0 0 2 2 198853209 TS 0 1.98853209e 08
# 3.1 0.00333 0.00216 0.00108 0 0 3 1 198860419 TS 0 1.98860419e 08
# 3.2 0.00000 0.00000 0.00000 0 0 3 2 198860419 TS 0 1.98860419e 08
# 4.1 0.00108 0.00087 0.00072 0 0 4 1 198864081 TS 0 1.98864081e 08
# 4.2 0.00000 0.00000 0.00000 0 0 4 2 198864081 TS 0 1.98864081e 08
# 5.1 0.00000 0.00000 0.00000 0 0 5 1 198867619 TS 0 1.98867619e 08
# 5.2 0.00333 0.00216 0.00108 0 0 5 2 198867619 TS 0 1.98867619e 08
Walk-through:
cumsum(grepl(...))
groups the data by leadingTS 0
, so that we can split them up into individual matrices:cumsum(grepl("^\\s*TS 0 ", txt)) # [1] 0 0 0 0 0 0 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5
We split the text up by these numbers, so
0
contains all of the preamble;1
includes the firstTS 0
group, etc. In this case, the0
s correspond to all text before the firstTS 0
, so we can discard the first segment with[-1]
.Map(...
iterates ourfunction(S, grp)
over each of the numbered groups as well as counting along the groups, called five times in this case.read.table(...)
makes short work of converting blank-space delimited numbers into adata.frame
without pre-defined column names. The normal frame$
-assignment takes care of the group and node values, as well as parsing out the timestamp numeric in theTS 0
line.do.call(rbind, alldat)
is one way to combine all frames in a list into a single frame; alternatives includedplyr::bind_rows(alldat)
anddata.table::rbindlist(alldat)
. (All of these are significantly better/faster than iteratively building a frame in afor
loop.)
Assumption:
- all matrix groups have the same number of columns