I'm reading hundreds of model outputs from text files, where the first nrows lines are irrelevant text about the model run (note: nrows varies from file to file). However, each file contains comma-separated data that I want to import. This data can be found between the lines "BREAK THROUGH @ WT, ITERATION" and "END BREAK THROUGH @ WT" in all of the files (see below). My current approach of using nrows and skiprows in read_csv() doesn't work because these parameters vary from file to file. Any thoughts on how to import CSV data from text files using string "markers"? Thanks!
The model output files I want to read look like:
text
text
0.314347435514229
text
text
text
text
text
text
text
BREAK THROUGH @ WT, ITERATION
1 0.0
3 0.0
6 0.0
END BREAK THROUGH @ WT
The extracted data in the dataframe would look like:
1 0.0
3 0.0
6 0.0
CodePudding user response:
Using fake data with a column named "your_column":
import pandas as pd

words = ["BREAK THROUGH @ WT, ITERATION", "END BREAK THROUGH @ WT"]
df = pd.read_csv(...)  # read the raw file into a single column
# cumsum() equals 1 only between the two markers; also drop the marker rows themselves
is_marker = df["your_column"].isin(words)
df = df.loc[is_marker.cumsum().eq(1) & ~is_marker].reset_index(drop=True)
print(df)
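As a self-contained sketch of the idea above (the column name "your_column" and the inline sample data are made up; a tab separator is used only so that whole lines, commas included, land in one column):

```python
import io

import pandas as pd

# Fake single-column "file": the marker lines surround the rows we want.
raw = """your_column
text
BREAK THROUGH @ WT, ITERATION
1 0.0
3 0.0
6 0.0
END BREAK THROUGH @ WT
text
"""

words = ["BREAK THROUGH @ WT, ITERATION", "END BREAK THROUGH @ WT"]
# sep="\t" keeps each line intact as a single field (no tabs in the data)
df = pd.read_csv(io.StringIO(raw), sep="\t")

# cumsum() is 1 only between the markers; exclude the marker rows themselves
is_marker = df["your_column"].isin(words)
df = df.loc[is_marker.cumsum().eq(1) & ~is_marker].reset_index(drop=True)
print(df)
```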
CodePudding user response:
Seems like I was able to find a solution without regex, but I'm still curious how regex could have simplified my life.
import pandas as pd

beg_id = "BREAK THROUGH @ WT, ITERATION = 1\n"
end_id = "END BREAK THROUGH @ WT"
# for f in cmtp_fnames:
f = 'data/cmtp/PFOS_Dry_LS_1m_AD R.OUT'
with open(f) as fname:
    data = fname.read()
# slice out the text between the two markers, excluding the markers themselves
data = data[data.find(beg_id) + len(beg_id):data.find(end_id)]
data = data.splitlines(False)
# split each row on commas; drop the extra third column
data = pd.DataFrame(sub.split(",") for sub in data).drop(labels=2, axis=1)
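For the record, a regex version of the same extraction could look like the sketch below. The inline text is a stand-in for a real .OUT file, and the markers are taken from the question; the sample rows are whitespace-separated as in the question's example, so sep=r"\s+" is used (for truly comma-separated rows the default sep="," applies instead):

```python
import io
import re

import pandas as pd

# Hypothetical file content standing in for one of the model .OUT files.
text = """text
text
0.314347435514229
text
BREAK THROUGH @ WT, ITERATION
1 0.0
3 0.0
6 0.0
END BREAK THROUGH @ WT
text
"""

# Capture everything between the two markers; DOTALL lets .*? span newlines,
# and the non-greedy quantifier stops at the first end marker.
match = re.search(
    r"BREAK THROUGH @ WT, ITERATION\n(.*?)\nEND BREAK THROUGH @ WT",
    text,
    flags=re.DOTALL,
)
block = match.group(1)

# Parse the captured block directly with read_csv via an in-memory buffer.
df = pd.read_csv(io.StringIO(block), sep=r"\s+", header=None)
print(df)
```

This replaces the two find() slices with a single search, which is the main simplification regex buys here.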