I have a text file with the following structure:
"ts": "2021-01-29T00:06:46.929363"
"from": "text"
"to": "text"
"body": "text"
The txt file is quite large.
How can I create a dataframe with the following structure?
ts | from | to | body |
---|---|---|---|
timestamp | text | text | text |
timestamp | text | text | text |
timestamp | text | text | text |
timestamp | text | text | text |
timestamp | text | text | text |
Any help is much appreciated!
CodePudding user response:
Read the file and use each line to update a dict. When the dict has 4 keys, save it and start a new one. Finally, build the dataframe from the saved dicts.
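import pandas as pd

with open("data.txt") as f:
    batch = {}
    result = []
    for line in f:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        # split on the first colon only; the timestamp itself contains colons
        key, value = line.split(":", maxsplit=1)
        batch[key.strip('" ')] = value.strip('" ')
        if len(batch) == 4:  # one complete record
            result.append(batch)
            batch = {}

df = pd.DataFrame(result)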
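Since the desired output has a timestamp column, it may also be worth converting `ts` from strings to datetimes after building the dataframe. Below is a minimal sketch of the same approach wrapped in a function and run on an in-memory sample. The function name `parse_records` and the sample values ("alice", "bob", "hello") are made up for illustration, and it assumes every record has exactly the four keys shown in the question.

```python
import io

import pandas as pd


def parse_records(lines, n_fields=4):
    """Group every n_fields '"key": "value"' lines into one dict (hypothetical helper)."""
    batch, records = {}, []
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        # split on the first colon only; the timestamp itself contains colons
        key, value = line.split(":", maxsplit=1)
        batch[key.strip('" ')] = value.strip('" ')
        if len(batch) == n_fields:  # one complete record
            records.append(batch)
            batch = {}
    return records


# Stand-in for the real file; the values here are invented sample data.
sample = io.StringIO(
    '"ts": "2021-01-29T00:06:46.929363"\n'
    '"from": "alice"\n'
    '"to": "bob"\n'
    '"body": "hello"\n'
)

df = pd.DataFrame(parse_records(sample))
df["ts"] = pd.to_datetime(df["ts"])  # strings -> datetime64 column
```

For the real file, pass the open file handle instead of `sample`. Because the file is read line by line, only the parsed records are held in memory, not the raw text.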