I have a file in the following format:
Heading 1
Some text here
Heading 2
Some text here
Heading 3
Some text here
I need to create a dataframe in the format
df=
Heading | text |
---|---|
Heading 1 | Some text here |
Heading 2 | Some text here |
Heading 3 | Some text here |
CodePudding user response:
You can try the code below:
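import pandas as pd

with open('data.txt') as fp:
    # Keep only non-empty lines, stripped of surrounding whitespace
    data = [line.strip() for line in fp if line.strip()]
# Even-indexed lines are headings, odd-indexed lines are the matching text
df = pd.DataFrame(list(zip(data[::2], data[1::2])), columns=['Heading', 'text'])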
Output:
>>> df
Heading text
0 Heading 1 Some text here
1 Heading 2 Some text here
2 Heading 3 Some text here
Content of the data.txt file:
Heading 1
Some text here
Heading 2
Some text here
Heading 3
Some text here
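The slicing approach in this answer can also be tried without a file on disk. The following is a minimal sketch, assuming the sample from the question is held in a string (the `sample` variable and the use of `io.StringIO` are illustrative, not part of the original answer):

```python
import io

import pandas as pd

# Stand-in for the contents of data.txt (sample taken from the question).
sample = """Heading 1
Some text here
Heading 2
Some text here
Heading 3
Some text here
"""

# Keep non-empty lines only, stripped of surrounding whitespace.
lines = [line.strip() for line in io.StringIO(sample) if line.strip()]

# Even positions (0, 2, 4, ...) are headings; odd positions are their text.
df = pd.DataFrame({"Heading": lines[::2], "text": lines[1::2]})
print(df)
```

Note that this relies on the file strictly alternating between one heading line and one text line.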
CodePudding user response:
Assuming the headings are on odd-numbered lines and the text on even-numbered lines:
first read all the lines into a NumPy array, then reshape it to n×2 (one heading/text pair per row), and finally create a DataFrame from it.
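import numpy as np
import pandas as pd

with open("data.dat", "r") as the_file:
    # Skip blank lines so they do not break the heading/text pairing
    data = np.array([d.strip() for d in the_file if d.strip()])
# Each row of the reshaped array is one (heading, text) pair
df = pd.DataFrame(data.reshape((-1, 2)), columns=["Heading", "text"])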
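One caveat with reshaping: it raises an error when the number of lines is odd, for example if the last heading has no text. A hedged sketch of a guard around the same idea follows; `lines_to_frame` is a hypothetical helper name, not something from the answer above:

```python
import numpy as np
import pandas as pd


def lines_to_frame(lines):
    """Build a two-column frame from alternating heading/text lines."""
    # Drop blank lines so they cannot shift the heading/text pairing.
    cleaned = [line.strip() for line in lines if line.strip()]
    if len(cleaned) % 2 != 0:
        raise ValueError(f"expected an even number of lines, got {len(cleaned)}")
    data = np.array(cleaned, dtype=object)
    # -1 lets NumPy infer the number of rows from the array length.
    return pd.DataFrame(data.reshape(-1, 2), columns=["Heading", "text"])
```

It could be called as `lines_to_frame(open("data.dat"))`, since a file object iterates over its lines.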
CodePudding user response:
The following more explicit version could be adapted if you want to see how the parsing can be structured step by step.
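import pandas as pd

header = []
text = []
with open("sample.txt", "r") as f:
    for f_line in f:
        f_line = f_line.strip()
        if not f_line:
            continue  # skip blank lines
        if f_line.startswith("Heading"):  # heading lines in the sample begin with "Heading"
            header.append(f_line)
        else:
            text.append(f_line)
# Assumes every heading has exactly one text line, so both lists have equal length
df = pd.DataFrame({"header": header, "text": text})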
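The prefix-based loop has one advantage over the slicing and reshaping answers: it does not need strict alternation. As a rough sketch only, assuming a heading may be followed by several text lines that should be joined with spaces (the question does not say this happens, and `parse_sections` is a hypothetical name):

```python
import pandas as pd


def parse_sections(lines, prefix="Heading"):
    """Group text lines under the most recent line starting with `prefix`."""
    rows = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(prefix):
            # Start a new row for each heading.
            rows.append({"Heading": line, "text": ""})
        elif rows:
            # Append this line to the current heading's text.
            current = rows[-1]
            current["text"] = f"{current['text']} {line}".strip()
        # Text that appears before the first heading is ignored.
    return pd.DataFrame(rows, columns=["Heading", "text"])
```

With the sample file from the question this gives the same three rows as the other answers.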